Why you need a Data Vault for your Data Vault
- Rhys Hanscombe

- Apr 29, 2023
In the latest Data Vault User Group meet-up, Christopher Siegfried, senior architect at Infovia, presented on “why you need a Data Vault for your Data Vault”. Christopher discussed the innovative work he has been doing with three of his clients, and how metadata has helped solve issues with the long-term maintenance of Data Vault solutions. In addition, he explored the recent emergence of ChatGPT-4, and how it could help you to develop the code for your Data Vault. In this blog, we’ll look at what Christopher talked about, and add our own views on the topic.
The Metadata layer of the Data Vault
Put yourself in this situation – you’re modelling a Data Vault. You know why each element is where it is. Every element of your model is there for a reason, and a Data Vault is meant to last the lifetime of the platform. So, what happens to your organisation’s Data Vault when you leave, move on to a different project, or try to scale the team with new people? Unless you’re careful, you’ll more than likely run into the following issues.
Yes, you know everything there is to know about the model (because you created it), but that knowledge and understanding are not adequately captured. Without context, the decisions you made will not be clear to future team members. If you rely on standard data modelling alone, a lot of the understanding behind your model may be lost as time progresses.
The solution is to expand your Data Vault to capture this knowledge and context. As Christopher suggests, build a MetaVault layer which explains the reason behind each part of the model, providing an expanded semantic layer, a richer Data Vault “vocabulary”, and integrated metadata. This, in turn, may empower artificial intelligence (which many believe to be the future).
This is where Christopher showed how three client problems could be alleviated by the MetaVault.
Client 1 – New Data Engineers
Christopher’s first client operated with two technically competent data engineers who had built a Data Vault model from a high-value business process. However, they were both reassigned to a different area of the organisation. The new engineers who took over the existing model faced a problem: they didn’t have enough context to understand the Data Vault.
There was nothing wrong with the Data Vault model, but it didn’t carry enough information for someone new to change it. It’s important to remember that a model can be correct and still lack the information needed to extend it, because there isn’t a deep semantic layer.
A solution is to hold all the metadata you need about the Data Vault model in the Data Vault itself.
You won’t get the information you need from your Data Vault if the metadata is too vague, or so descriptive that it becomes confusing.
Instead, it’s important to address what is in the Data Vault. As with anything when working with the Data Vault method, you need to stay business-centric, ensuring that the Data Vault is semantically equivalent to the organisation. Descriptive metadata then becomes a resource for discovering and identifying the elements within the Data Vault, meaning that anyone can work with the model.
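To make the idea of keeping descriptive metadata inside the Data Vault a little more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not Christopher’s implementation: the table names (hub_dv_object, sat_dv_object_description), the columns, and the sample row are all hypothetical, chosen only to show the model’s own objects being described with a hub and a satellite.

```python
import sqlite3


def build_metavault(conn):
    """Create a hypothetical MetaVault: a hub of model objects plus a descriptive satellite."""
    conn.executescript("""
        CREATE TABLE hub_dv_object (
            object_name   TEXT PRIMARY KEY,  -- business key: name of a hub, link or satellite
            object_type   TEXT NOT NULL,     -- 'hub', 'link' or 'satellite'
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE sat_dv_object_description (
            object_name         TEXT NOT NULL REFERENCES hub_dv_object(object_name),
            load_date           TEXT NOT NULL,
            business_definition TEXT NOT NULL,  -- what the object means to the business
            design_rationale    TEXT NOT NULL,  -- why the modeller built it this way
            PRIMARY KEY (object_name, load_date)
        );
    """)


def describe(conn, object_name):
    """Return (type, definition, rationale) for the latest description, or None."""
    return conn.execute(
        """
        SELECT h.object_type, s.business_definition, s.design_rationale
        FROM hub_dv_object h
        JOIN sat_dv_object_description s ON s.object_name = h.object_name
        WHERE h.object_name = ?
        ORDER BY s.load_date DESC
        LIMIT 1
        """,
        (object_name,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
build_metavault(conn)

# Illustrative sample data only; not taken from any of the clients discussed.
conn.execute(
    "INSERT INTO hub_dv_object VALUES (?, ?, ?, ?)",
    ("hub_customer", "hub", "2023-04-01", "modelling_workshop"),
)
conn.execute(
    "INSERT INTO sat_dv_object_description VALUES (?, ?, ?, ?)",
    (
        "hub_customer",
        "2023-04-01",
        "A person or organisation that has placed at least one order.",
        "Keyed on customer number because every source system shares it.",
    ),
)

print(describe(conn, "hub_customer"))
```

Because the descriptions live in a satellite, a revised definition is simply a new row with a later load date, so the history of why the model changed is kept alongside the model itself.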
Client 2 – Mature Data Vault
The second client that Christopher described already had a mature Data Vault with 8 to 12 data sources, but faced a similar issue when onboarding new team members. New joiners needed months to get up to speed on object relationships, which became a particular problem whenever an existing team member left.
Structural metadata solves this problem. It is metadata about containers of data, and it indicates how compound objects are put together: for example, how pages are ordered to form chapters. In a Data Vault, that means recording how the hubs, links, and satellites relate to one another.
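As a rough illustration of structural metadata, the sketch below records which links and satellites hang off which parents and then answers the questions a new team member would ask first. Everything here is an assumption made for the example: the Relationship class, the helper functions, and the object names are hypothetical and do not come from the client’s model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    parent: str  # hub or link that the child is attached to
    child: str   # link or satellite
    kind: str    # "link" or "satellite"


# Hypothetical structure for a small customer/order model.
STRUCTURE = [
    Relationship("hub_customer", "sat_customer_details", "satellite"),
    Relationship("hub_customer", "link_customer_order", "link"),
    Relationship("hub_order", "link_customer_order", "link"),
    Relationship("link_customer_order", "sat_customer_order_status", "satellite"),
]


def attached_to(parent, structure=STRUCTURE):
    """List every link or satellite attached directly to the given parent."""
    return sorted(r.child for r in structure if r.parent == parent)


def hubs_joined_by(link, structure=STRUCTURE):
    """List the hubs that a given link connects."""
    return sorted(r.parent for r in structure if r.child == link and r.kind == "link")


print(attached_to("hub_customer"))
print(hubs_joined_by("link_customer_order"))
```

In practice these relationships would sit in the MetaVault rather than in a Python list, but the questions stay the same: what is attached to this hub, and what does this link connect?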
Client 3 – Large organisation
The final client was a large organisation which had acquired several smaller organisations under its umbrella. As you can imagine, this creates some messy situations. Different teams across the organisation had modelled their own Data Vault based on their specific branch of the organisation. The teams then came together to try to make a single Data Vault for the parent organisation. However, each team struggled to match its own hubs, links, and satellites with the equivalent elements in the other teams’ models.
Metadata once again proved to be the solution for this client. Likewise, being business-centric and using a company-wide Data Vault vocabulary when describing each element helped to resolve this issue long-term.
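One way to picture a company-wide vocabulary is as a mapping from each team’s local object names to an agreed business term. The sketch below is purely illustrative: the team names, hub names, business terms, and the group_by_business_term helper are all hypothetical, and the client’s real approach may well have looked different.

```python
from collections import defaultdict

# Hypothetical mapping: (team, local object name) -> agreed company-wide business term.
VOCABULARY = {
    ("team_a", "hub_client"): "Customer",
    ("team_b", "hub_customer"): "Customer",
    ("team_b", "hub_purchase"): "Order",
    ("team_c", "hub_account_holder"): "Customer",
    ("team_c", "hub_order"): "Order",
}


def group_by_business_term(vocabulary):
    """Group each team's objects under the shared business term they represent."""
    groups = defaultdict(list)
    for (team, object_name), term in sorted(vocabulary.items()):
        groups[term].append(f"{team}.{object_name}")
    return dict(groups)


for term, objects in group_by_business_term(VOCABULARY).items():
    print(term, "->", objects)
```

Once every object carries a shared term, hubs that describe the same business concept under different names become candidates for merging into the single parent Data Vault.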
ChatGPT
At the end of his presentation, Christopher took the Data Vault User Group on a journey into what he believes could be the future of most of what we do, not only in the data industry but in life too.
ChatGPT-4 can generate usable SQL code for building a Data Vault in seconds. Because Data Vault has been around for many years and is widely documented, the model has learned about hubs, links, satellites, and the other building blocks of a Data Vault from publicly available material in its training data.
Summary
The future is here, and it has arrived quickly with the recent developments from OpenAI. At Datavault, we believe the approach of creating a MetaVault to manage metadata within the Data Vault itself is a useful innovation and a good way of improving the long-term maintainability of data platform solutions.
