Satellites in Data Vault
- Rhys Hanscombe
- Sep 30
Satellites are a cornerstone of Data Vault 2.0, providing the descriptive attributes that contextualize Hubs and Links. In this article, we explore the core concepts of Satellites, their various types, and best practices for implementation, drawing from our recent podcast discussion.
What Are Satellites in Data Vault?
Satellites are essentially attribute tables that describe the business entities represented by Hubs and the relationships defined by Links. As Alex Higgs explained, "Satellites are basically attributes that describe a link or a hub." They capture the evolving details surrounding these core elements, allowing for comprehensive historical tracking.
While various types exist, including technical Satellites and those tailored for specific data patterns, we primarily focus on the "vanilla" one-value-only Satellites, the most common type.
A Practical Example: Customer Data
Consider a Hub representing a "customer." The corresponding Satellite would contain attributes such as customer name, email address, and phone number, sourced from a specific system. When a customer interacts with this system and provides personal details, this information is stored in the Satellite. As Neil Strange noted, "So you will have some details about me being entered," and those details end up in the Satellite: name, username, email address, or whatever is appropriate.
When customer information is updated, a new row is inserted into the Satellite, time-stamped with the load date. Existing rows are never overwritten, so the full history is preserved, which is crucial for auditing and analysis. As Alex Higgs pointed out, "So I can then go back into that satellite and say what was the truth about me as of a particular reference date?"
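The "truth as of a reference date" lookup can be sketched in a few lines of Python. This is only an illustration over an in-memory list, not how a warehouse would run it; the column names `customer_hk` and `load_datetime` and the function `as_of` are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical in-memory Satellite: one row per detected change.
sat_customer = [
    {"customer_hk": "c1", "load_datetime": datetime(2024, 1, 5), "email": "old@example.com"},
    {"customer_hk": "c1", "load_datetime": datetime(2024, 3, 1), "email": "new@example.com"},
    {"customer_hk": "c2", "load_datetime": datetime(2024, 2, 1), "email": "other@example.com"},
]


def as_of(rows, parent_hk, reference_date):
    """Return the row for parent_hk that was current at reference_date, or None."""
    candidates = [
        r for r in rows
        if r["customer_hk"] == parent_hk and r["load_datetime"] <= reference_date
    ]
    # The most recently loaded row at or before the reference date wins.
    return max(candidates, key=lambda r: r["load_datetime"], default=None)


feb_view = as_of(sat_customer, "c1", datetime(2024, 2, 15))
apr_view = as_of(sat_customer, "c1", datetime(2024, 4, 1))
```

Here `feb_view` holds the January row and `apr_view` the March row, because each lookup picks the latest row loaded on or before the reference date.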
Handling Large Datasets and Satellite Splitting
A common challenge is dealing with large source tables containing numerous columns. Creating a single wide Satellite is generally discouraged. As Neil Strange advised, "Generally it's a bit of an anti-pattern. You want to avoid doing that if you can."
Instead, it's recommended to split Satellites based on related columns or grain. This approach, known as Satellite splitting, improves performance and manageability. For instance, customer contact details and address details can be separated into distinct Satellites. As Alex Higgs suggested, "I've seen people kind of group them and split them into, say, customer contact details, customer address details, and those could be two separate satellites, for example."
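As a rough sketch of the idea, the snippet below routes the columns of one wide source record into two narrower Satellite payloads. The Satellite names, column names, and the `split_record` helper are all hypothetical, chosen only to mirror the contact/address example above.

```python
# Hypothetical mapping of Satellite name -> source columns it carries.
SATELLITE_COLUMNS = {
    "sat_customer_contact": ["email", "phone"],
    "sat_customer_address": ["street", "city", "postcode"],
}


def split_record(record, parent_key="customer_hk"):
    """Split one wide source record into one narrow payload per Satellite."""
    return {
        sat_name: {parent_key: record[parent_key], **{c: record[c] for c in cols}}
        for sat_name, cols in SATELLITE_COLUMNS.items()
    }


wide = {
    "customer_hk": "c1",
    "email": "a@example.com",
    "phone": "555-0100",
    "street": "1 High St",
    "city": "Leeds",
    "postcode": "LS1 1AA",
}
parts = split_record(wide)
```

With this layout, a change of address only produces a new row in the address Satellite, and the contact Satellite is left untouched.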
Satellite Metadata
Typical Satellite metadata includes:
Parent Hash: The hash key of the parent Hub or Link. It links each row back to its parent and, combined with the Load Datetime, forms the Satellite's primary key.
Load Datetime: The timestamp indicating when the record was loaded, enabling time-travel queries.
Record Source: The source system from which the data originated.
Hash Diff: A checksum of all attribute columns, used for efficient change detection.
The Hash Diff is particularly important for columnar databases, as it optimizes change detection. As Neil Strange explained, "So you don't have to compare every column against every column on the way. If one of them changes, the whole thing changes, therefore the hash diff changes."
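A minimal sketch of a Hash Diff calculation follows, using Python's standard `hashlib`. The choice of SHA-256, the `||` delimiter, the whitespace trimming, and the treatment of missing values as empty strings are all assumptions made for illustration; real implementations fix these conventions in their own standards.

```python
import hashlib


def hash_diff(record, attribute_columns, delimiter="||"):
    """Hash the descriptive attributes of a record in a fixed column order."""
    parts = []
    for col in attribute_columns:
        value = record.get(col)
        # Assumed normalisation: missing values become empty strings,
        # everything else is cast to text and trimmed.
        parts.append("" if value is None else str(value).strip())
    return hashlib.sha256(delimiter.join(parts).encode("utf-8")).hexdigest()


columns = ["name", "email", "phone"]
before = {"name": "Ada", "email": "ada@example.com", "phone": None}
after = {"name": "Ada", "email": "ada@new.example.com", "phone": None}

unchanged = hash_diff(before, columns) == hash_diff(dict(before), columns)
changed = hash_diff(before, columns) != hash_diff(after, columns)
```

Comparing two hash values replaces a column-by-column comparison: a change in any one attribute produces a different Hash Diff. The column order must stay fixed, otherwise identical records would hash differently.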
Loading Satellites and Handling Deltas
Loading Satellites involves effectively managing changes or "deltas." This process typically involves comparing the Hash Diff of incoming records with the latest Satellite records. As Alex Higgs described, "So finding the latest record in the satellite, it's usually a CTE [common table expression] to do that."
When a change is detected, a new row is inserted into the Satellite. Special care is needed when the same record changes several times within a single batch, which requires careful ordering and processing. As Alex Higgs continued, "Especially if you have a change to the same record multiple times in a single batch. So if someone's updated their name, for example, three times in a day, for whatever reason, then you're going to get three different deltas in that same batch. So you have to order and carefully handle that."
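One way to picture the ordering problem is the sketch below. It processes a batch in load order and compares each incoming row with the state immediately before it, whether that state came from the Satellite or from an earlier row in the same batch. It is an in-memory illustration under assumed key names (`parent_hk`, `load_datetime`, `hash_diff`) and a hypothetical `load_satellite` function, not a description of any particular loader.

```python
from datetime import datetime


def load_satellite(satellite, batch):
    """Append batch rows whose hash diff differs from the preceding state."""
    # Latest known hash diff per parent, taken from the existing Satellite.
    latest = {}
    for row in sorted(satellite, key=lambda r: r["load_datetime"]):
        latest[row["parent_hk"]] = row["hash_diff"]

    inserted = []
    # Process the batch in load order so each row is compared with the state
    # just before it, including earlier rows from the same batch.
    for row in sorted(batch, key=lambda r: r["load_datetime"]):
        if latest.get(row["parent_hk"]) != row["hash_diff"]:
            inserted.append(row)
            latest[row["parent_hk"]] = row["hash_diff"]
    satellite.extend(inserted)
    return inserted


satellite = [
    {"parent_hk": "c1", "load_datetime": datetime(2024, 1, 1, 8), "hash_diff": "A"},
]
batch = [
    {"parent_hk": "c1", "load_datetime": datetime(2024, 1, 2, 11), "hash_diff": "A"},
    {"parent_hk": "c1", "load_datetime": datetime(2024, 1, 2, 9), "hash_diff": "B"},
    {"parent_hk": "c1", "load_datetime": datetime(2024, 1, 2, 10), "hash_diff": "B"},
]
inserted = load_satellite(satellite, batch)
```

In this example the 09:00 row is a real change and is inserted, the 10:00 row repeats the same Hash Diff and is skipped, and the 11:00 row reverts to the earlier value and is inserted again. Comparing every batch row only against the Satellite's pre-batch state would miss that final reversion.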
Additional Considerations
While standard Data Vault 2.0 Satellites don't include effective dates, some implementations may use them. The relationship between Satellites and Slowly Changing Dimensions (SCDs) is also a frequent topic of discussion, with Satellites providing a similar time-series tracking capability.
Final Thoughts
Satellites are vital for capturing and maintaining the historical context of your data in a Data Vault. By understanding how to effectively design and load Satellites, you can ensure your data warehouse accurately reflects the evolving nature of your business data.
For more insights and discussions on Data Vault, check out our Data Vault User Group website, where you can access past meetups, Q&A forums, and additional learning resources.