Data quality is important. Business performance depends on good decisions, and good business decisions need to be illuminated by data if management are to avoid making decisions in the dark.
For example, if sales are below target, sales data may show sales representative performance by product and by region. That rich picture can be analysed to suggest a hypothesis about what is going wrong, which in turn suggests an approach to fixing the problem. Without data, the problem might never have been identified, analysed, or solved. Data allows a more scientific, informed approach to decision-making.
If decisions depend on data, then that data needs the right level of quality. Without it, decisions may be made on incorrect information, leading to mistakes and potential damage to the business. Data quality can be described in terms of completeness, timeliness, correctness, and relevance:
- completeness – data is complete when all relevant items are supplied; for example, does an address include every component, such as building number, street, postal town, post code, and country?
- timeliness – how current the data is: does it reflect today’s truth, or is the data set several months out of date, accurate when captured but misleading today?
- correctness – does the data reflect the data subject’s real circumstances (is the address the subject’s real address, is the date of birth right?), and are data items right in context (is it a real address, and is the post code right for the address described by the other components?)
- relevance – is the data appropriate input for the problem at hand? Address data may be useful, but it offers little insight into an age-profiled analysis.
So, given these attributes, data can be correct and yet be incomplete, out of date, and irrelevant to a particular decision.
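To make the distinction between these attributes concrete, here is a minimal Python sketch that checks a single address record for completeness and timeliness. The `Address` fields, the `last_verified` date, and the 365-day freshness window are illustrative assumptions rather than any standard. Correctness and relevance are not checked, because they need outside knowledge (a reference set, or the decision at hand).

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    # Hypothetical address record; field names are illustrative only.
    building_number: Optional[str]
    street: Optional[str]
    postal_town: Optional[str]
    post_code: Optional[str]
    country: Optional[str]
    last_verified: date  # when the address was last confirmed with the subject


def missing_fields(addr: Address) -> List[str]:
    """Completeness: names of address components that are absent or blank."""
    missing = []
    for f in fields(addr):
        if f.name == "last_verified":
            continue  # metadata, not an address component
        value = getattr(addr, f.name)
        if value is None or not value.strip():
            missing.append(f.name)
    return missing


def is_timely(addr: Address, today: date, max_age_days: int = 365) -> bool:
    """Timeliness: was the address verified within the allowed window?"""
    return (today - addr.last_verified).days <= max_age_days


# A record that may well be correct, yet is incomplete and out of date.
addr = Address("12", "High Street", "Reading", None, "UK", date(2020, 1, 15))
print(missing_fields(addr))                      # ['post_code']
print(is_timely(addr, today=date(2021, 6, 1)))   # False
```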
And here comes the challenge…
Many businesses are sitting on a pile of poor-quality data, locked away inside a number of different systems across the business. Getting at this data can be time-consuming and costly, so decisions are made on cursory analysis or gut feel, and data quality issues stay hidden below the surface.
When enterprise management systems, and especially business intelligence tools, are implemented, data becomes more widely available. The tools extract data from source systems and integrate it to form an enterprise-wide view of the operation. This is when the problems start, because poor data quality is a barrier to integration. First, data is difficult to collate into a sensible enterprise-wide view: customer records in different systems don’t match, addresses don’t agree, and classifications are imprecise. Second, once data is integrated it produces spurious results, because data isn’t classified correctly or totals are incorrect.
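Mismatched customer records can be surfaced before integration rather than after. The sketch below assumes two hypothetical systems (a CRM and a billing system) that each hold `(name, post_code, street_address)` tuples. It matches customers on a normalised name plus post code key, then reports records found in only one system and matched customers whose addresses disagree. Exact-key matching is a deliberate simplification; real record matching usually needs fuzzier comparison.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical customer record: (name, post_code, street_address)
Record = Tuple[str, str, str]


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())


def match_key(rec: Record) -> Tuple[str, str]:
    name, post_code, _ = rec
    # Post codes are compared without spaces, so "RG1 1AA" matches "rg11aa".
    return normalise(name), normalise(post_code).replace(" ", "")


def reconcile(crm: List[Record], billing: List[Record]) -> Dict[str, list]:
    """Compare two systems' customer lists on a normalised name + post code key.

    If one system holds several records with the same key, only the last is
    kept here; such internal duplicates are a quality problem in their own right.
    """
    crm_by_key = {match_key(r): r for r in crm}
    billing_by_key = {match_key(r): r for r in billing}
    shared = crm_by_key.keys() & billing_by_key.keys()
    return {
        "only_in_crm": sorted(crm_by_key.keys() - billing_by_key.keys()),
        "only_in_billing": sorted(billing_by_key.keys() - crm_by_key.keys()),
        # Same customer in both systems, but the street addresses disagree.
        "address_conflicts": sorted(
            k for k in shared
            if normalise(crm_by_key[k][2]) != normalise(billing_by_key[k][2])
        ),
    }
```

Each list in the result is a work queue: unmatched customers need investigating, and address conflicts need a decision about which system holds the truth.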
Decisions based on such poor-quality data are dangerous. Clearing up the confusion it causes can be costly. And data protection laws typically require personal data to be accurate and kept up to date.
So the only answer is to clean the data up. Invest time to verify, validate, cross-reference, and update all of the major data sets: customers, products, organisation structures, regions, employees, financial transactions, and so on. This is a step-wise programme of work covering controls, cross-referencing, cleaning, archival, and process change. Measuring data quality and reporting it on a scorecard shows how quality indicators move over time as quality improves, and shows the impact of each quality control mechanism as it is implemented.
What techniques can be used to drive improved data quality?
- Data capture – make key data mandatory at capture, and validate it. If address information is important, don’t let records be saved without it, and ensure that complete rather than partial addresses are captured. Operator education may be needed. Check that date ranges are reasonable, ages are sensible, and names don’t contain unexpected characters such as digits (while still allowing legitimate ones such as hyphens and apostrophes). Remember that there is a balance between rigour and usability.
- Look up – where data already exists in a master record set, look it up. If you have a master customer record set, let customer order systems find customer records and copy data across, rather than asking for it to be re-entered. Create master record sets as part of the cleaning strategy.
- Reference sets – use publicly available or in-house reference sets. Publish these sets at stable HTTP URIs (served over HTTPS with access control for restricted data sets). Use XML or RDF to publish the data at each URI in human- or machine-readable form. Examples include address master sets, and data sets published by governments or other open data bodies that want to promote a standard.
- Quality calculators – develop software that measures quality by counting records with missing fields, duplicate records, or other quality indicators. Plot graphs and trends over time to track quality improvement.
- Quality smells – some data may be strictly valid but unusual, and need manual intervention to either correct or confirm it. Examples include a typically female first name recorded with male gender, ages outside the expected range, a single name only, very long names, or a "new" customer with the same name and address as an existing one. Use prompts at data capture to confirm that unusual data is correct, and use back-office checks to identify these smells for manual follow-up.
- Master data management (MDM) – a whole discipline concerned with taking active management control over company data: its quality, utility, commonality, and suitability for use across the enterprise. Persuading senior management to fund an MDM post and activity can be a hard sell. However, where data is important to the business, the comprehensive approach MDM brings to data management really pays back.
- Archiving – archive old data sets rather than trying to clean them up, or move them to a demarcated area where it is clear that their quality may be suspect, or that they comply only with older data quality standards.
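The data capture technique can be illustrated with a minimal validation function that runs before a record is saved and refuses it if anything is returned. The field set, the 120-year age ceiling, and the name pattern (letters, spaces, hyphens and apostrophes) are assumptions for illustration; a real form would apply the business’s own rules.

```python
import re
from datetime import date
from typing import List, Optional

# Assumed rule: words of letters (including accented letters), joined by
# single spaces, hyphens or apostrophes, e.g. "Mary O'Brien", "Smith-Jones".
NAME_PATTERN = re.compile(r"^[^\W\d_]+(?:[ '\-][^\W\d_]+)*$")
MAX_AGE_YEARS = 120  # assumed upper bound for a plausible age


def age_on(dob: date, today: date) -> int:
    """Whole years between dob and today."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


def validate_capture(name: Optional[str], dob: Optional[date],
                     post_code: Optional[str], today: date) -> List[str]:
    """Return a list of problems; an empty list means the record can be saved."""
    errors = []
    # Mandatory fields: a record with key data missing is not saved.
    if not name or not name.strip():
        errors.append("name is required")
    elif not NAME_PATTERN.match(name.strip()):
        errors.append("name contains unexpected characters")
    if dob is None:
        errors.append("date of birth is required")
    elif dob > today:
        errors.append("date of birth is in the future")
    elif age_on(dob, today) > MAX_AGE_YEARS:
        errors.append("age is outside the expected range")
    if not post_code or not post_code.strip():
        errors.append("post code is required")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the operator fix the whole record in one pass, which helps with the balance between rigour and usability.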
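A quality calculator can start very small. This sketch assumes records arrive as dictionaries mapping field names to string values. It reports the percentage of records missing any required field, and the number of duplicate records (every copy beyond the first of records sharing a normalised key). Which fields count as required or as the key is the caller’s choice; these are not standard metric definitions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    return value is None or not value.strip()


def quality_indicators(rows: List[Row], required: Iterable[str],
                       key_fields: Iterable[str]) -> Dict[str, float]:
    """Compute simple quality indicators for one snapshot of a data set."""
    required = list(required)
    key_fields = list(key_fields)
    total = len(rows)
    incomplete = sum(1 for r in rows if any(is_missing(r.get(f)) for f in required))
    # Normalise key fields (trim, lower-case) so trivial variants collide.
    keys = Counter(
        tuple((r.get(f) or "").strip().lower() for f in key_fields) for r in rows
    )
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    return {
        "records": total,
        "pct_incomplete": 100.0 * incomplete / total if total else 0.0,
        "duplicates": duplicates,
    }
```

Running it against a snapshot of the data set each month and plotting `pct_incomplete` and `duplicates` over time gives the kind of trend line a data quality scorecard needs.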
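Quality smells differ from validation failures: the record is saved, but queued for a person to check. The sketch below uses a hypothetical `NewCustomer` record and assumed thresholds to flag a few of the smells listed above: a single name, a very long name, an age outside the expected range, and a "new" customer whose name and address match an existing one. The gender check is left out because it would need a reference list of names.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Assumed thresholds; tune these to the business's own expectations.
MAX_NAME_LENGTH = 60
EXPECTED_AGE_RANGE = (18, 100)


@dataclass
class NewCustomer:
    # Hypothetical fields for a customer captured as "new".
    full_name: str
    age: int
    address: str


def tidy(text: str) -> str:
    """Collapse runs of whitespace so spacing differences don't hide matches."""
    return " ".join(text.split())


def smells(c: NewCustomer, existing: Set[Tuple[str, str]]) -> List[str]:
    """Return reasons to queue a record for manual review (not rejection).

    `existing` holds (lower-cased name, lower-cased address) pairs for
    customers already on file.
    """
    reasons = []
    name = tidy(c.full_name)
    if len(name.split()) == 1:
        reasons.append("only one name supplied")
    if len(name) > MAX_NAME_LENGTH:
        reasons.append("unusually long name")
    low, high = EXPECTED_AGE_RANGE
    if not low <= c.age <= high:
        reasons.append("age outside expected range")
    # A "new" customer whose name and address match an existing record.
    if (name.lower(), tidy(c.address).lower()) in existing:
        reasons.append("possible existing customer")
    return reasons
```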
With coordinated effort, data quality can improve over time, and so will the quality of the analysis carried out. The right decisions can be made, there will be less confusion over conflicting or misleading records, and the business will be able to demonstrate that it is in control of its own data.