architecture

It’s the Data Quality, Stupid

Written March 13th, 2012
Categories: architecture, business intelligence, management

Data Quality is important. Business performance depends on good decisions. And good business decisions need to be illuminated by data if management are to avoid making decisions in the dark.

For example, if sales are below target, then sales data may provide information on sales representative performance, by product and by region: a rich picture that can be analysed to suggest a hypothesis as to what is going wrong, which in turn suggests an approach to fix the problem. Without data the problem may not have been identified, analysed, or solved. Data allows for a more scientific, informed approach to the decision-making process.

If decisions depend on data, then it is important for that data to have the right level of quality. Without quality, decisions may be made using incorrect information, leading to mistakes and potential business damage. Data quality can be described in terms of completeness, timeliness, correctness, and relevance:

  • completeness – data is complete when all relevant items are supplied; for example, in an address, have we filled in all items such as post code, postal town, street, building number, and country?
  • timeliness – how current is the data: does it reflect today’s truth, or is the data set several months out of date, accurate when captured but misleading today?
  • correctness – does the data reflect the data subject’s real circumstances (is the address the subject’s real address, is the date of birth right, etc.), and are data items right in context (for an address, is it a real address, and is it the right postcode for the address described by the other address components)?
  • relevance – is the data appropriate input to solve the problem at hand? (Address data may be useful, but it won’t give much insight into age-profiled data sets.)

So given these attributes of quality, it is possible for data to be correct – yet also be incomplete, out of date, and irrelevant for the needs of a particular decision.
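
As a rough illustration of how two of these attributes can be checked mechanically, here is a minimal Python sketch. The field names, the `AddressRecord` type, and the 180-day freshness threshold are assumptions made for the example, not a standard. Correctness and relevance are left out on purpose: they generally need an external reference set or human judgement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical field list and freshness threshold; adjust to your own data standard.
REQUIRED_ADDRESS_FIELDS = ("building_number", "street", "postal_town", "post_code", "country")
MAX_AGE_DAYS = 180


@dataclass
class AddressRecord:
    building_number: Optional[str]
    street: Optional[str]
    postal_town: Optional[str]
    post_code: Optional[str]
    country: Optional[str]
    captured_on: date


def is_complete(record: AddressRecord) -> bool:
    """Complete when every required item is supplied and non-blank."""
    return all((getattr(record, f) or "").strip() for f in REQUIRED_ADDRESS_FIELDS)


def is_timely(record: AddressRecord, today: date) -> bool:
    """Timely when the record was captured within the freshness window."""
    return (today - record.captured_on).days <= MAX_AGE_DAYS


# An address whose supplied items may all be correct, yet the record is
# both incomplete (no post code) and out of date.
record = AddressRecord("10", "High Street", "Anytown", None, "UK", date(2011, 1, 5))
today = date(2012, 3, 13)
print(is_complete(record), is_timely(record, today))  # prints: False False
```

The point of the example is the one made above: every item that was supplied can be right and the record can still fail on other quality attributes.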

And here comes the challenge…

Many businesses are sitting on a pile of poor quality data. This data is locked away inside a number of different systems across the business. Getting at this data can be time consuming and costly – so decisions are made based on cursory data analysis, or gut-feel. Data quality issues are hidden below the surface.

When enterprise management systems and especially business intelligence tools are implemented, data becomes more widely available. The tools suck data out of systems and integrate it to form an enterprise-wide view of the operation. This is when the problems start: poor data quality is a barrier to integration. First, data is difficult to collate into a sensible enterprise-wide view: customer records in different systems don’t match, addresses don’t agree, and classifications are imprecise. Second, when data is integrated it throws out spurious results, because data isn’t classified correctly or totals are incorrect.

Decisions based on such poor quality data are dangerous. Clearing up confusion caused by poor quality can be costly. And data protection laws expect you to hold high-quality data.

So the only answer is to clean the data up. Invest time to verify, validate, cross-reference, and update all of the major data sets – customers, products, organisation structures, regions, employees, financial transactions, and so on. This is a step-wise programme of work, implementing controls, cross-referencing, cleaning, archival, and process change. Measuring data quality and reporting a data quality scorecard will show quality indicators move over time as quality improves, and show the impact of each quality control mechanism as it is implemented.

What techniques can be used to drive improved data quality?

  • Data capture – make data mandatory at data capture and validate it. If address information is important, don’t let records be captured without address data, and ensure complete rather than partial address data is captured. Operator education may be needed. Make sure date ranges are reasonable, ages sensible, that names don’t contain non-alpha characters, etc. Remember that there is a balance between rigour and usability.
  • Look up – where data exists elsewhere in a master record set, look it up. If you have a master customer record set then allow customer order systems to look up and find customer records and copy data across, rather than ask for it to be re-entered. Create master record sets as part of the cleaning strategy.
  • Reference sets – use publicly available, or in-house, reference sets. Publish these sets as http: URIs (or restricted access https: addresses for secure data sets). Use XML or RDF technology to publish data at each URI as human- or machine-readable content. Examples include address master sets, and information sets published by governments or other open data bodies who want to promote a standard.
  • Quality calculators – develop software that measures quality, calculating the number of records with missing data fields, duplicate data, or other quality indicators. Plot graphs and trends over time to track quality improvement.
  • Quality smells – some data may be strictly correct but unusual. It may need manual intervention to check it out and either correct or confirm the data item. For example – female names for male gender, ages out of expected ranges, one name only, very long names, people with the same name and address who claim to be a new customer, etc. Use prompts on data capture to confirm data is correct, and use back-office software checks to identify these smells for manual follow-up.
  • Master data management – a whole discipline involved in taking active management control over company data, its quality, utility, commonality and suitability for use across an enterprise. It can be hard to persuade senior management to fund an MDM post and activity. However, where data is important for a business, the comprehensive approach MDM brings to data management really pays back.
  • Archive old data sets rather than try to clean them up. Or move the data sets to a demarcated area where it is clear the data quality may be suspect, or that data complies with older data quality standards.
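
The "quality calculators" idea in the list above can start very small. The following Python sketch counts missing fields and surplus duplicate records for a batch of customer records held as plain dictionaries. The field names and the choice of name plus address as the duplicate key are assumptions for illustration; a real matching rule would usually be fuzzier.

```python
from collections import Counter

# Hypothetical set of fields that every customer record is expected to carry.
REQUIRED_FIELDS = ("name", "address", "date_of_birth")


def quality_scorecard(records):
    """Return simple quality indicators for a list of dict records."""

    def text(record, field):
        # Treat None, missing keys and whitespace-only values as empty.
        return (record.get(field) or "").strip()

    missing = {f: sum(1 for r in records if not text(r, f)) for f in REQUIRED_FIELDS}

    # Records sharing a case-insensitive name and address count as duplicates.
    keys = Counter(
        (text(r, "name").lower(), text(r, "address").lower())
        for r in records
        if text(r, "name") and text(r, "address")
    )
    surplus = sum(n - 1 for n in keys.values() if n > 1)

    complete = sum(1 for r in records if all(text(r, f) for f in REQUIRED_FIELDS))
    return {
        "total": len(records),
        "missing": missing,
        "duplicates": surplus,
        "complete_pct": round(100 * complete / len(records), 1) if records else 0.0,
    }


records = [
    {"name": "Ann Lee", "address": "1 Mill Lane", "date_of_birth": "1970-02-01"},
    {"name": "ann lee", "address": "1 Mill Lane ", "date_of_birth": ""},
    {"name": "Bob Ray", "address": None, "date_of_birth": "1985-07-19"},
    {"name": "Cy Dunn", "address": "4 Kiln Road", "date_of_birth": "1990-11-30"},
]
card = quality_scorecard(records)
print(card)
```

Running the same calculation on each data load and storing the result with a date gives the trend line for a data quality scorecard.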

With coordinated effort, over time, data quality can improve, and so will the quality of the analysis carried out – so that the right decisions can be made, there will be less confusion over conflicting or misleading records, and the business will be able to demonstrate it is in control of its own data.

Requirements Have Levels – Sorting and Linking

This is article 2 on the use of requirements management techniques for business intelligence projects.

When you try to assemble requirements for your new project, to agree what it is you are going to do and not going to do (whether you use traditional or agile approaches to requirements elicitation) you end up with all kinds of statements. These can be high level statements of intent (‘we’d like to work faster’), constraints (‘it has to be like this’), low level detail (‘make sure the end date value if provided is always greater than the start date’), and so on. Most logically minded analysts will try to sort these statements out by grouping them together using some scheme or other.

I use the following structure for all my business change projects.

1) I classify requirements into the following levels of detail:

  • strategic requirements – what is set out in the business strategy, in terms of targets set, action plans agreed to deliver these targets, resourcing constraints, and risks that need to be managed
  • business constraints – things that constrain any solution because of the way the business is operated, for example – legal constraints (compliance with legislation), parts of the business that are considered out of scope but which have an interface with the solution in some way
  • solution requirements – what the project will deliver, the solution at a high-level of detail, the main features as seen by an outside observer
  • module requirements – a component of the solution, the requirements for that module and how the module interacts with other modules or the business to deliver the solution (one set of requirements per module, requires some architectural work to identify the components and how they interact)

I maintain a separate list for each level. At the simplest, use a spreadsheet with columns to capture metadata about the requirements (date, owner, approval status, tranche, annotations, previous phrasing, etc.) – but eventually a specialist requirements tool becomes essential. Typically there are a few strategic requirements, more business constraints, and a growing number of requirements as the list progresses down into solution and module levels. I capture both stories and more classical forms of requirement (shall, should) the same way. Strictly speaking, stories don’t need formal treatment like this for development purposes – however I find that traceability (see below) helps when dealing upwards in the organisation, whether explaining the case for a system or defending against arbitrary cuts.

2) Requirements may be further classified as:

  • functional – a feature, what the system should be able to do, expressed as a binary condition, either the delivered system can or cannot do this, there is no partial delivery of a function
  • non-functional – a measured quality of the solution, e.g. volumes, responsiveness, mean time to failure, security, etc.; these usually require architectural design and become a parameter for all other requirements at that level, non-functional requirements can usually be measured – so it is possible to deliver 50% or 150% of a target value
  • constraint – a requirement that a solution is implemented in a particular way, imposed by legislation, branding, policies, or user preference
  • architectural (optional) – setting out how the solution must be designed in architectural terms, artefacts that only make sense in developing an architectural capability

There are uses for all types of requirement; however, the majority of requirements should be functional (what the solution should do) rather than the other three types.

3) Establish traceability – within levels and across levels.

  • within levels there may be traceability, for example:
    • some strategic requirements are consequences of other, higher-order strategic requirements
    • for example – the business may wish to improve share price by 50%, so it decides to increase revenue by 20% to contribute to the valuation target, it may subsequently decide to transform the sales process to deliver the 20% growth
    • in the example we have three requirements that map back onto each other – the leaf (end of chain, lowest level) which is to transform sales, a branch – to improve revenue, and a root (top of the tree) to improve share price; try to keep a clean hierarchy, avoiding loops or circular requirement dependencies
    • note that at this level no business intelligence requirements are expressed – we are only interested in strategic-type requirements, which stop at the decision to change the sales process
  • across levels traceability:
    • all leaf requirements (those with no downstream dependencies) can be traced across into a lower level
    • strategic requirements may trace across to business requirements or solution requirements, business requirements may trace to solution requirements, solution may trace down into module requirements
    • each time a level is spanned some design work is needed to decide how the higher level requirements will be delivered, the link across can be delivered in many ways, it is for the solution designer to work out how
    • for example – if the strategy is to transform the sales process to deliver 20% growth, then the solution must decide how the transformation will work and specify the new sales process in enough detail to show how the link back and benefits can be delivered
    • at this point we may add a business intelligence requirement - competitor intelligence may be needed to feed the new sales process
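
The hierarchy described above can be sketched as a small data structure. In this Python example each requirement records the single requirement it supports; the IDs (`STR-1`, `SOL-1`, and so on) and the one-parent simplification are assumptions for illustration. The trace function also rejects the loops the text warns against.

```python
from typing import Dict, List, Optional, Tuple

# id -> (description, id of the requirement it supports, or None for a root).
# IDs are made up for the example; wording follows the share price example.
REQUIREMENTS: Dict[str, Tuple[str, Optional[str]]] = {
    "STR-1": ("improve share price by 50%", None),
    "STR-2": ("increase revenue by 20%", "STR-1"),
    "STR-3": ("transform the sales process", "STR-2"),
    "SOL-1": ("specify the new sales process", "STR-3"),
    "SOL-2": ("provide competitor intelligence", "SOL-1"),
}


def trace_to_root(req_id: str, reqs: Dict[str, Tuple[str, Optional[str]]]) -> List[str]:
    """Follow 'supports' links upwards; fail loudly on a circular dependency."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = req_id
    while current is not None:
        if current in seen:
            raise ValueError(f"circular dependency at {current}")
        seen.add(current)
        chain.append(current)
        current = reqs[current][1]
    return chain


def leaves(reqs: Dict[str, Tuple[str, Optional[str]]]) -> List[str]:
    """Requirements that nothing else supports, i.e. the ends of chains."""
    parents = {parent for _, parent in reqs.values() if parent is not None}
    return sorted(r for r in reqs if r not in parents)


print(trace_to_root("SOL-2", REQUIREMENTS))
# ['SOL-2', 'SOL-1', 'STR-3', 'STR-2', 'STR-1']
print(leaves(REQUIREMENTS))
# ['SOL-2']
```

The link from `STR-3` to `SOL-1` is where the chain crosses from the strategic level to the solution level, which is the point at which design work is needed.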

4) Use the requirements template to show traceability:

  • if we use the format ‘As a <role> I need to <do what> in order to <deliver what benefit>’ we have a natural phrasing that can express dependencies – the following chain traces across from Strategic to Solution requirements:
    • as a Director I need to grow valuation by 50% in order to deliver the strategic plan
    • as a Sales Director I need to grow revenue by 20% in order to grow valuation by 50% (in practice there may be several changes planned to deliver the valuation target)
    • as a Sales Director I need to implement a new sales process in order to grow revenue by 20%
    • as a Sales Director I need to better deploy the sales staff in order to support the new sales process
    • as a Sales Manager I need to plan sales tactics in order to better deploy the sales staff
    • as a Sales Manager I need to analyse competitor activity across our market segment so I can plan sales tactics
  • These all dovetail into each other neatly using the phrasing provided. Note that where two or more conditions need to be met to deliver an outcome they can be linked using conjunctions (and, or, nor), or by constructing intermediate requirements if the combination is complex
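
Because the template is regular, the dovetailing can be checked by a short script. The sketch below parses statements that use the exact "in order to" wording and links each one to the statement whose need matches its benefit. The regular expression and the exact-match rule are assumptions for illustration; looser phrasings would need a more forgiving matcher.

```python
import re

# Matches 'As a <role> I need to <do what> in order to <deliver what benefit>'.
PATTERN = re.compile(
    r"^as an? (?P<role>.+?) I need to (?P<need>.+?) in order to (?P<benefit>.+)$",
    re.IGNORECASE,
)


def parse(statement):
    """Split a templated requirement into role, need and benefit."""
    match = PATTERN.match(statement.strip())
    if match is None:
        raise ValueError(f"statement does not follow the template: {statement!r}")
    return {key: value.strip() for key, value in match.groupdict().items()}


def link(statements):
    """For each statement, return the index of the statement it supports, or None."""
    parsed = [parse(s) for s in statements]
    by_need = {p["need"].lower(): i for i, p in enumerate(parsed)}
    return [by_need.get(p["benefit"].lower()) for p in parsed]


statements = [
    "As a Director I need to grow valuation by 50% in order to deliver the strategic plan",
    "As a Sales Director I need to grow revenue by 20% in order to grow valuation by 50%",
    "As a Sales Director I need to implement a new sales process in order to grow revenue by 20%",
    "As a Sales Director I need to better deploy the sales staff in order to support the new sales process",
]
print(link(statements))  # [None, 0, 1, None]
```

The last statement comes back unlinked because "support the new sales process" is not worded the same as "implement a new sales process". A near miss like this is a useful prompt for a manual review of the phrasing.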

5) Using the results:

  • many projects forget to capture the strategic requirements – these are important because they are related to the management justification for doing the work, and help produce a logical and powerful business case
  • traceability helps with impact assessment – when we decide to drop a feature we can see, through the traceability, what solution, business or strategic impacts may occur (judgement has to be made to assess the degree of impact, sometimes it places delivery at increased risk, other times it completely blocks delivery of some benefits)
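
Impact assessment over the traceability links is a simple walk up the graph. This Python sketch lists every higher-level requirement that depends, directly or indirectly, on a dropped one. The IDs and the module-level items (`MOD-1`, `MOD-2`) are hypothetical; the code only finds what is affected, and judging how badly remains a human decision.

```python
from collections import deque

# requirement -> the higher-level requirements it helps to deliver.
# A requirement may support more than one parent. IDs are illustrative only.
SUPPORTS = {
    "MOD-1": ["SOL-1"],           # hypothetical competitor data feed module
    "MOD-2": ["SOL-1", "SOL-2"],  # hypothetical reporting module
    "SOL-1": ["SOL-2"],           # analyse competitor activity
    "SOL-2": ["STR-3"],           # plan sales tactics
    "STR-3": ["STR-2"],           # implement a new sales process
    "STR-2": ["STR-1"],           # grow revenue by 20%
    "STR-1": [],                  # grow valuation by 50%
}


def impacted_by_dropping(req_id, supports):
    """Breadth-first walk upwards, returning each affected requirement once."""
    impacted = []
    seen = {req_id}
    queue = deque([req_id])
    while queue:
        current = queue.popleft()
        for parent in supports.get(current, []):
            if parent not in seen:
                seen.add(parent)
                impacted.append(parent)
                queue.append(parent)
    return impacted


print(impacted_by_dropping("MOD-1", SUPPORTS))
# ['SOL-1', 'SOL-2', 'STR-3', 'STR-2', 'STR-1']
```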

Enterprise Debt – Architecture Gets the Banking Treatment

Written January 28th, 2012
Categories: architecture, product management, strategy

If you’ve taken a look at agile forms of software development you may have come across the concept of Technical Debt.

Wikipedia defines Technical Debt as: “…referring to the eventual consequences of poor software architecture and software development within a codebase.

Common causes of technical debt include (a combination of):

    • Business pressures, where the business considers getting something released sooner is of more value than avoiding technical debt
    • Lack of process or understanding, where businesses are blind to the concept of technical debt, and make decisions without considering the implications
    • Lack of building loosely coupled components, where functions are hard-coded; when business needs change, the software is inflexible.
    • Lack of documentation, where code is created, but may be difficult or time consuming for anyone other than the author to understand, as functions are not documented

“Interest payments” are both in the necessary local maintenance and the absence of maintenance by other users of the project. Ongoing development in the upstream project can increase the cost of “paying off the debt” in the future.

Best Practice in paying down technical debt is to refactor code as part of ongoing development.”  (Wikipedia)

This is a very useful concept – and one that certainly resonates with my experience of coding and software development. I like this because it allows for reality – yes you will cut corners, yes the architecture can get out of kilter, and the result is technical debt. This is a normal situation because of the business reality within which all development occurs – yet to avoid getting into too much debt, it needs to be paid off, by investing in technical tidying up from time to time.

Steve McConnell has an excellent article on technical debt that classifies it – depending on how intentional the debt was and how the debt was incurred. This tries to put the debt into business terms – an exchange of quality for time. Cutting corners leads to a reduction in quality, yet time to complete a project is reduced. The quality needs to be fixed later (paying down the debt) but doing so incurs additional costs (the cost of tearing down the area of poor quality and then building it right, and the cost of building it wrong in the first place).

All this is very well – however I’ve recently come across the term Enterprise Debt (see PEAF for a slide show).

This is used in the context of business architecture. Business architecture gives the design of a business area or function. It contains processes, organisation units, competencies, IT systems, etc., and explains how they all fit together.

As an intellectual exercise, business architecture is similar to software design. And therefore it too is subject to debt – except in this case under the title ‘enterprise debt’.

We’ve all worked in organisations where management have taken short cuts in setting up the organisation – where processes may be inefficient or just plain wrong, the team lacks the right skills, or any one of a whole range of everyday niggles about how the business functions. And these niggles result in the business just not performing to its ultimate capability – there is some kind of system yield or impedance at work causing loss in efficiency and effectiveness. This can be quite significant. If, for example, you are running at only 70% effectiveness and 70% efficiency, and the two compound, then you are achieving 49% of your full potential (0.7 × 0.7 = 0.49) and losing just over 50%. A business could double performance if only it could architect its business to become 100% effective and efficient (a laudable wish, but not really possible). Yet surely we can improve on the 70/70 performance – and a modest investment that leads to a small improvement in performance usually offers an excellent ROI.
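
The arithmetic behind the 70/70 figure is easy to reproduce. This short Python sketch assumes, as the example implies, that effectiveness and efficiency multiply to give realised potential; the 75/75 comparison is an added illustration of a modest improvement, not a figure from any study.

```python
def realised_potential(effectiveness: float, efficiency: float) -> float:
    """Fraction of full potential achieved, assuming the two factors compound."""
    return effectiveness * efficiency


current = realised_potential(0.70, 0.70)
improved = realised_potential(0.75, 0.75)  # hypothetical modest improvement

print(f"realised today: {current:.0%}")                       # realised today: 49%
print(f"lost today: {1 - current:.0%}")                       # lost today: 51%
print(f"headroom to 100/100: x{1 / current:.2f}")             # headroom to 100/100: x2.04
print(f"gain from 70/70 to 75/75: {improved / current - 1:.1%}")  # gain from 70/70 to 75/75: 14.8%
```

Under this assumption, lifting both factors by five points raises output by roughly 15%, which is why small architectural improvements can pay back well.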

This whole concept has changed how I look at those difficult discussions with management about why we need to spend time doing maintenance instead of developing new things. The language of debt, and of the consequences of bad debt, is easy to communicate, and the organisation benefits from having an adult discussion about the options rather than the pointing and blaming that often goes on in its place. Take a look at the links and see what you think.