April 22, 2012
Obey the principles without being bound by them.
- Bruce Lee
Taking a practical approach to developing a well-formed enterprise data warehouse – and by that, I mean one that is accurate, efficient and productive – involves basing it on sound design principles. These principles are specific to each sector of the reference architecture, each of which enables specific capabilities and serves specific functions. Here, I would like to lay out the principles of the Information Warehousing layer’s normalized central repository – the system of record.
The Information Warehousing layer is designed as a normalized repository for data that has been processed “upstream”. The data arrives cleansed, transformed and mastered, and is consolidated here into a single “System of Record”. The discipline of normalization restructures the data, freeing it from the single perspective of the source system and opening it to the multiple perspectives found across the enterprise. The data is modelled according to its “essence” rather than its “use”.
This process of redesign imposes a strict order on the data that promotes data integrity and retains a high degree of flexibility. Breaking the data apart into separate tables makes it challenging to query, but ease of querying is not the main purpose of this layer. Data integrity and flexibility are the primary goals, and tuning is geared towards load performance rather than data access.
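To make "essence rather than use" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`party`, `party_role`) and the sample company names are hypothetical, chosen only for illustration. The sketch assumes two upstream feeds that each know a company in just one capacity, as an issuer or as a counterparty. In the normalized design the company is stored once as a party, and each use becomes a separate role row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: a party is stored once, whatever its use.
    CREATE TABLE party (
        party_id INTEGER PRIMARY KEY,   -- surrogate key
        name     TEXT NOT NULL UNIQUE   -- mastered natural key
    );
    -- Each use of a party (issuer, counterparty, ...) is a separate row.
    CREATE TABLE party_role (
        party_id INTEGER NOT NULL REFERENCES party(party_id),
        role     TEXT NOT NULL,
        PRIMARY KEY (party_id, role)
    );
""")

def load_party(name, role):
    """Reuse the existing party row if present, then record the role."""
    conn.execute("INSERT OR IGNORE INTO party (name) VALUES (?)", (name,))
    (party_id,) = conn.execute(
        "SELECT party_id FROM party WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO party_role (party_id, role) VALUES (?, ?)",
        (party_id, role),
    )
    return party_id

# The same company arrives from two source perspectives.
load_party("Acme Corp", "issuer")
load_party("Acme Corp", "counterparty")
load_party("Globex", "counterparty")

party_count = conn.execute("SELECT COUNT(*) FROM party").fetchone()[0]
acme_roles = [
    row[0]
    for row in conn.execute(
        "SELECT role FROM party_role JOIN party USING (party_id) "
        "WHERE name = ? ORDER BY role",
        ("Acme Corp",),
    )
]
print(party_count, acme_roles)
```

Note that even this small question ("which roles does Acme Corp play?") needs a join, which is the query cost the normalized layer accepts in exchange for integrity and flexibility.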
- Data Integrity
Take a proactive stance to protect referential integrity and reduce instances of redundancy or potential for inconsistency.
Allow for increases in volumes or additional sources of existing information, both within subject areas (e.g., issuers, counterparties) and core concepts (e.g., parties, locations).
Allow for additional sources or changes in existing sources, so that the design is neither tied to a given source nor a mirror of it. Design will give primary consideration to reuse, then extension and finally modification of existing structures.
Apply standard patterns for data design to promote efficiencies of data and ETL design. This expedites the decision-making process as well as data modelling work and ETL development.
Focus efficiency on three aspects:
Use of repeatable patterns for data design will minimize data modelling and ETL work effort.
Ease ongoing maintenance by keeping the number of data objects to a minimum, maintaining consistent standards and applying logical structures for ease of navigation and use.
- Load Performance
Give priority to the performance of ETL load processes, including those that use the System of Record as a source to load the Data Mart sector.
- Enterprise Perspective
For all data objects, retain, and remain open to, a full range of existing and potential relationships between entities, so that the data reflects an enterprise perspective rather than the requirements of any given project.
These foundational principles are implemented through strategies that impact storage of history, hierarchical structures, degree of normalization, classifications, surrogate keys and a number of other aspects of design. The principles form the criteria to judge the best approach to take in a given situation. It’s not always straightforward, even with the principles in place – at times one has to favour one principle over another (e.g., flexibility over load performance), but this list provides the guidance to frame the debate and take a considered approach.
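As one illustration of how the Data Integrity principle might be enforced in the schema itself, here is a minimal sketch using Python's built-in sqlite3 module. The `location` and `party` tables, their columns and the sample rows are hypothetical. SQLite stands in for whatever database actually hosts the System of Record, and it only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- Hypothetical core concepts, each with a surrogate key.
    CREATE TABLE location (
        location_id  INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL UNIQUE
    );
    CREATE TABLE party (
        party_id    INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE,  -- guards against redundant rows
        location_id INTEGER NOT NULL REFERENCES location(location_id)
    );
""")
conn.execute("INSERT INTO location (location_id, country_code) VALUES (1, 'CA')")
conn.execute("INSERT INTO party (name, location_id) VALUES ('Acme Corp', 1)")

def try_insert(name, location_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO party (name, location_id) VALUES (?, ?)",
            (name, location_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = try_insert("Globex", 99)        # no such location
duplicate_accepted = try_insert("Acme Corp", 1)   # party already exists
valid_accepted = try_insert("Initech", 1)         # satisfies both constraints
print(orphan_accepted, duplicate_accepted, valid_accepted)
```

Declaring the constraints in the schema is the proactive stance: bad rows are rejected at load time rather than discovered later. Whether to enforce them in the database or in the ETL layer is exactly the kind of trade-off against load performance that the principles are meant to frame.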
As suggested by the opening quote, I’m not advocating a blind conformance to a set of rules; but, in my experience, one of the greatest obstacles to efficient and effective development is the decision-making process. Limiting the parameters of debate with intelligent guidelines can facilitate decisions being made quickly and correctly.
Feel free to suggest additions or amendments to this list of design principles for the System of Record.