Understanding Business Keys in Data Vault
- Rhys Hanscombe

- Sep 19
Business keys are a core concept in Data Vault modeling.
They play a key role in identifying and linking data across multiple systems, ensuring consistency and accuracy. In this article, we’ll break down what business keys are, why they matter, and how to handle common challenges when defining and managing them.
What is a Business Key?
A business key is the unique identifier for each instance of a business concept, and it is what a Data Vault hub is built around. A hub represents a core business concept (such as customers, suppliers, invoices, products, or accounts) that your business relies on to operate.
Examples of Business Concepts:
Customer
Supplier
Invoice
Product
Account
To manage these business concepts effectively, each one needs a unique label — the business key. This key allows you to consistently identify and link data about that concept across multiple systems.
Types of Business Keys
There are two main types of business keys:
1. Natural Keys
A natural key is derived from the business itself, using attributes that naturally describe the business concept.
Natural keys are usually consistent across systems and meaningful to the people using the data.
Example:
For a customer, a natural key could be:
Company name
Company registration number
VAT number (if applicable)
DUNS number (if available)
A natural key is ideal because it makes sense at face value and can be used consistently across systems.
2. System-Generated Keys
Sometimes a natural key isn’t available or practical.
In these cases, systems generate internal reference numbers or IDs to track records.
Example:
Customer ID
Sequence number
GUID (Globally Unique Identifier)
The problem with system-generated keys is that they are arbitrary and difficult to interpret. They also vary across systems, making integration more complex.
Why Natural Keys Are Preferred
Using natural keys simplifies data integration and makes it easier to identify and link data across systems.
Advantages of Natural Keys:
Consistent across systems
Easy to understand and verify
Support passive integration: records from different systems link automatically because they share the same key
Example:
If you have a customer with a company registration number appearing in multiple systems, you can easily integrate data from those systems using the registration number as the business key.
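As a rough illustration of passive integration, the sketch below loads one customer hub from two sources that both carry the registration number. The system names, field names, and the `load_hub` helper are hypothetical, and the hub is simplified to a dictionary keyed by business key.

```python
# Hypothetical rows from two source systems; field names are illustrative only.
crm_rows = [
    {"registration_number": "01234567", "name": "Acme Ltd"},
    {"registration_number": "07654321", "name": "Globex Ltd"},
]
billing_rows = [
    {"registration_number": "01234567", "billing_email": "accounts@acme.example"},
]


def load_hub(hub, rows, record_source):
    """Add each previously unseen business key, noting where it was first seen."""
    for row in rows:
        key = row["registration_number"]
        # setdefault leaves an existing entry untouched, so a key that was
        # already loaded from another system is not duplicated.
        hub.setdefault(key, {"record_source": record_source})
    return hub


customer_hub = {}
load_hub(customer_hub, crm_rows, "CRM")
load_hub(customer_hub, billing_rows, "BILLING")

print(sorted(customer_hub))  # ['01234567', '07654321']
```

The billing row lands on the entry the CRM load already created, so the two systems are integrated without any explicit matching step.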
Challenges with Natural Keys:
Not all systems store natural keys.
Different systems may use different formats for the same natural key.
For global businesses, natural keys like VAT numbers may not be consistent across countries.
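One common way to deal with format differences is to standardize the key before it is loaded. The function below is a minimal sketch only: the rules it applies (strip punctuation and spaces, upper-case, left-pad to eight characters) are assumptions chosen for illustration, not a rule from any particular registry or from Data Vault itself.

```python
import re


def normalize_registration_number(raw: str) -> str:
    """Standardize a registration number so that format variants compare equal.

    Assumed rules (illustrative): drop anything that is not a letter or digit,
    upper-case the result, and left-pad with zeros to eight characters.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw).upper()
    return cleaned.zfill(8)


variants = [" 1234567", "01-234-567", "01234567"]
print({normalize_registration_number(v) for v in variants})  # {'01234567'}
```

The same rules would need to be applied to every source that feeds the hub, otherwise the variants still load as separate keys.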
When Natural Keys Aren't Available
If no suitable natural key exists, you may have to rely on system-generated keys. However, this introduces challenges in identifying and linking records across systems.
Example:
System A assigns Customer ID = 001
System B assigns Customer ID = 200
System C assigns Customer ID = 775
Without a shared natural key, nothing tells you that these three IDs refer to the same customer, so you could end up with multiple entries for that customer in your hub.
Handling Key Collisions with Collision Codes
To resolve conflicts when different systems use the same key value for different entities, you can use collision codes.
What’s a Collision Code?
A collision code is an additional attribute stored in the hub.
It identifies which system generated the key.
This allows you to distinguish between different keys from different systems.
Example:
| System   | Customer ID | Collision Code |
| System A | 001         | A              |
| System B | 001         | B              |
| System C | 001         | C              |
This allows you to tell that Customer ID = 001 in System A is different from Customer ID = 001 in System B or C.
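The effect of the collision code can be sketched by treating the hub key as a pair rather than a single value. The `HubKey` type below is a hypothetical name used for illustration; the rows mirror the example of three systems that each issue Customer ID 001.

```python
from typing import NamedTuple


class HubKey(NamedTuple):
    """Hub key made of the collision code plus the source system's own ID."""
    collision_code: str
    customer_id: str


# (collision code, customer ID) pairs as delivered by Systems A, B and C.
rows = [("A", "001"), ("B", "001"), ("C", "001")]

# Keyed on the ID alone, three different customers collapse into one entry.
hub_without_code = {customer_id for _, customer_id in rows}

# Keyed on collision code + ID, each customer keeps its own entry.
hub_with_code = {HubKey(code, customer_id) for code, customer_id in rows}

print(len(hub_without_code), len(hub_with_code))  # 1 3
```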
Handling Shared Keys Across Systems
If one system creates a key and other systems use it, the collision code should reflect the originating system.
Example:
A customer is created in System A with ID = 123.
System B and C reference the same customer using ID = 123.
The collision code should be "A" to reflect the origin.
This ensures that the key is treated consistently across systems.
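One way to apply this rule is a small lookup from the delivering system to the system that originally issued its keys. The `KEY_ORIGIN` mapping and `hub_key` helper below are hypothetical and only sketch the idea.

```python
# Hypothetical configuration: for each delivering system, the system that
# originally issued the customer IDs it carries. Here B and C reuse A's IDs.
KEY_ORIGIN = {"A": "A", "B": "A", "C": "A"}


def hub_key(source_system, customer_id):
    """Build the hub key from the originating system's collision code."""
    return (KEY_ORIGIN[source_system], customer_id)


# The same customer arriving from three systems resolves to one hub key.
keys = {hub_key(system, "123") for system in ("A", "B", "C")}
print(keys)  # {('A', '123')}
```

If System B later started issuing its own IDs, its entry in the mapping would change to "B" and its keys would no longer merge with System A's.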
The "Golden Record" Approach
Once you've handled key collisions, you may want to define a golden record — a single version of the truth for each business concept.
Golden Record Strategy:
Choose one source of truth (e.g., System A).
Use business rules to determine the most reliable version of a record.
Link other keys to the golden record using Data Vault links.
This makes reporting and data analysis more reliable and consistent.
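The strategy above can be sketched as a mapping from every source key to the key chosen as the golden record. Everything here is an assumption for illustration: the match groups stand in for whatever business rules decide that keys refer to the same customer, and the priority order simply treats System A as the most trusted source.

```python
# Hypothetical match groups: (source system, customer ID) keys that business
# rules have decided refer to the same customer.
match_groups = [
    [("A", "001"), ("B", "200"), ("C", "775")],
    [("B", "314")],
]

# Assumed rule: System A is the source of truth, then B, then C.
SOURCE_PRIORITY = ["A", "B", "C"]


def pick_golden(group):
    """Return the key from the most trusted system in the group."""
    return min(group, key=lambda k: SOURCE_PRIORITY.index(k[0]))


# Each source key points at its golden key, much like rows in a link table.
same_as = {key: pick_golden(group) for group in match_groups for key in group}

print(same_as[("C", "775")])  # ('A', '001')
print(same_as[("B", "314")])  # ('B', '314')
```

Reports can then group by the golden key while the original source keys remain available for tracing.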
What If There's No Key at All?
Sometimes you encounter data sets without any identifiable key.
Example:
A list of shareholders in a company report
Data dump from legacy systems
How to Handle Missing Keys:
Attempt to construct a key using available attributes (e.g., name + address + date of birth).
If that fails, assign a surrogate key — a generated number to track the record.
Use business rules to map these records to existing keys where possible.
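The first two steps can be sketched as a single function that tries to build a key from attributes and falls back to a generated number. The field names, the `|` separator, and the `SK-` prefix are all hypothetical choices made for this example.

```python
import itertools

# Simple in-memory sequence for surrogate keys (illustrative only).
_surrogate_counter = itertools.count(1)


def derive_key(record):
    """Build a key from name + address + date of birth, else use a surrogate."""
    parts = [record.get(field) for field in ("name", "address", "date_of_birth")]
    if all(parts):
        # All three attributes are present and non-empty: construct the key.
        return "|".join(part.strip().upper() for part in parts)
    # At least one attribute is missing: assign a generated surrogate key.
    return f"SK-{next(_surrogate_counter)}"


constructed = derive_key(
    {"name": "Jane Doe", "address": "1 High St", "date_of_birth": "1980-01-01"}
)
fallback = derive_key({"name": "John Smith"})

print(constructed)  # JANE DOE|1 HIGH ST|1980-01-01
print(fallback)     # SK-1
```

A constructed key is only as reliable as its attributes, so a change of address would produce a different key; mapping such records to existing keys is where the business rules in the third step come in.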
Key Takeaways
Business keys are essential for organizing and linking data in Data Vault.
Natural keys are preferred because they are meaningful and consistent.
When natural keys aren't available, you can fall back on system-generated keys, with collision codes to keep identical key values from different systems apart.
The golden record approach creates a single source of truth for reporting and analysis.
When no key exists, use surrogate keys and business rules to create structure.
Understanding business keys is fundamental to mastering Data Vault. By carefully defining and managing keys, you ensure that your data is consistent, integrated, and ready for analysis.
