Improving your Data Warehouse Testing Toolbelt
- Hannah Dowse
- Oct 6, 2021
During my time as a Data Engineer, I have seen first-hand how testing Data Warehouses, and communicating the results between teams, presents new and interesting challenges to overcome.
Data Warehouses are used and maintained for Analytics and Business Intelligence, so as a test developer for one of these systems you will find a number of notable obstacles standing between you and efficient testing procedures.
The most impactful of these obstacles is communication. Data Warehouses interface with all aspects of a business, so different specialists each have their own distinct requirements to be implemented in the warehouse. Until these requirements are satisfied, a warehouse in production cannot be firmly integrated into regular use. Those tasked with testing the Data Warehouse face a unique complication: they have to understand all of the specialist requirements in order to definitively test whether any of these rules are broken in the current build of the system.
An obvious solution is to communicate these rules in clear and concise business language. However, such a document lacks the technical detail of how to perform the tests, which testers would have to produce alongside it, meaning twice the effort to maintain both documents. Testers often condense the two into a mixture of clear business language and the underlying technical information that makes each rule enforceable. Tools that do this are referred to as business readable testing frameworks.
Business readable testing frameworks are written in plain English that can be directly mapped to a test procedure that determines whether a rule is implemented correctly. As a result, anybody can read and understand the document, much like the business language specifications discussed previously. However, there is an important difference: each sentence in the plain English specification is linked to a function in code, meaning that the business language is in fact executable. Not only are the tests significantly easier to construct, but they are also simpler to communicate, more modular and, as such, more reusable. The most important aspect of this approach is that the results clearly indicate to anyone, technical or not, which aspects of the Data Warehouse are functioning as intended.
Implementations of business readable testing frameworks commonly use Cucumber, or a Cucumber-style tool, to connect plain English Gherkin tests to Python functions that each carry out one section of the test.
Gherkin tests clearly convey the test procedure through a “Given”, “When”, “Then” format. This indicates the situation the test is relevant to, what the user does to cause an effect, and finally what effect the user should expect from this action. This modular format makes it possible for anybody, not just the testers, to take the building blocks and design new tests.
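As a rough illustration of how each plain English sentence can be tied to a function, the sketch below uses only the Python standard library rather than Cucumber itself. The `step` decorator, the `run_scenario` helper, the `sales` table and the wording of the scenario are all hypothetical names invented for this example. A real project would rely on a Gherkin runner instead of this hand-rolled registry, and an actual warehouse connection instead of an in-memory SQLite database.

```python
import re
import sqlite3

# Registry of (compiled sentence pattern, function) pairs.
STEPS = []


def step(pattern):
    """Register a function as the implementation of a plain English sentence."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r'the "(\w+)" table has been loaded')
def table_loaded(ctx, table):
    # Stand-in for a warehouse load: an in-memory table with one bad row.
    ctx["db"] = sqlite3.connect(":memory:")
    ctx["db"].execute(f"CREATE TABLE {table} (id INTEGER, amount REAL)")
    ctx["db"].executemany(
        f"INSERT INTO {table} VALUES (?, ?)", [(1, 10.0), (2, None)]
    )
    ctx["table"] = table


@step(r'I count rows where "(\w+)" is missing')
def count_missing(ctx, column):
    query = f"SELECT COUNT(*) FROM {ctx['table']} WHERE {column} IS NULL"
    ctx["count"] = ctx["db"].execute(query).fetchone()[0]


@step(r"the count should be (\d+)")
def check_count(ctx, expected):
    assert ctx["count"] == int(expected), (
        f"expected {expected}, found {ctx['count']}"
    )


def run_scenario(text):
    """Run each Given/When/Then line and return a list of (line, status)."""
    ctx, results, failed = {}, [], False
    for line in filter(None, (raw.strip() for raw in text.splitlines())):
        # The keyword is for the reader; only the sentence is matched.
        sentence = re.sub(r"^(Given|When|Then|And)\s+", "", line)
        if failed:
            results.append((line, "skipped"))
            continue
        for pattern, func in STEPS:
            match = pattern.fullmatch(sentence)
            if match:
                try:
                    func(ctx, *match.groups())
                    results.append((line, "passed"))
                except AssertionError:
                    results.append((line, "failed"))
                    failed = True
                break
        else:
            # No registered function matches this sentence.
            results.append((line, "undefined"))
            failed = True
    return results


SCENARIO = """
Given the "sales" table has been loaded
When I count rows where "amount" is missing
Then the count should be 0
"""

for line, status in run_scenario(SCENARIO):
    print(f"{status:>9}  {line}")
```

Because the sample data deliberately contains one row with a missing amount, the first two steps are reported as passed and the final “Then” step as failed. The sentences themselves are the reusable building blocks: a new rule about a different table or column only needs a new scenario, not new code.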
Cucumber and many similar frameworks can be configured to output test results in JSON format. On its own, JSON isn’t particularly good to present to a non-technical team. However, it is easy to feed this file into a display that makes the results clear, concise and easily understandable. One possible way of displaying this is an Allure Report, which presents test results on a website that can be accessed company-wide to check the current state of testing.
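To make the JSON step more concrete, here is a small sketch that condenses a Cucumber-style JSON report into per-feature counts, the kind of summary a reporting tool would turn into a dashboard. The sample report is made up, and the code assumes a commonly seen layout: a list of features, each with `elements` holding scenarios, each with `steps` carrying a `result.status`. Check the output of your own runner before relying on these field names.

```python
import json
from collections import Counter

# Hypothetical report in the layout assumed above.
SAMPLE_REPORT = json.dumps([
    {
        "name": "Sales fact table",
        "elements": [
            {"name": "No missing amounts",
             "steps": [{"result": {"status": "passed"}},
                       {"result": {"status": "failed"}}]},
            {"name": "Row counts match source",
             "steps": [{"result": {"status": "passed"}}]},
        ],
    }
])


def scenario_status(scenario):
    """A scenario passes only if it has steps and every one of them passed."""
    statuses = [s.get("result", {}).get("status", "undefined")
                for s in scenario.get("steps", [])]
    if statuses and all(status == "passed" for status in statuses):
        return "passed"
    return "failed"


def summarise(report_text):
    """Return {feature name: Counter of scenario outcomes}."""
    summary = {}
    for feature in json.loads(report_text):
        outcomes = Counter(
            scenario_status(sc) for sc in feature.get("elements", [])
        )
        summary[feature["name"]] = outcomes
    return summary


for feature, outcomes in summarise(SAMPLE_REPORT).items():
    print(f"{feature}: {outcomes['passed']} passed, {outcomes['failed']} failed")
```

This sketch treats anything other than a fully passed scenario as failed, which is a simplification. A dedicated reporting tool will usually distinguish skipped, pending and undefined steps as well.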
These reports eliminate the need to repeatedly distribute new test results, as well as any confusion that doing so could cause. The output is accessible and always up to date, which is particularly helpful for those who develop the warehouse, as they can see which areas need their attention.
Another advantage of this approach is that these tests are designed for longevity. Because rules can be described in plain business language, the tests become the definition of the warehouse: a living, executable specification. As the project develops, any changes to the specification also manifest in code, meaning that the testing environment is always up to date with the needs of the project. In addition, the modularity of the available sentences makes adjusting and tweaking the specification as simple as possible, so more time is spent developing new building blocks for tests than constructing the tests themselves.
These aspects of testing Data Warehouses are often pivotal to avoiding miscommunication and ensuring that the specification matches the needs of the warehouse. Adding them to your testing toolbelt will not only make your work more robust but also more efficient in the long run, leaving you with more time to prepare for hurdles you may encounter in the future.



