Gamified Data Governance and Employee Engagement

Gamified Data Governance (GDG) refers to an approach to governance that applies game mechanics to increase participation levels and enable performance monitoring. In a GDG program, the responsibility for Data Governance is taken on by the business and IT user community as a whole.

The process uses gamification features, such as points, levels, and awards, to set group targets and monitor the progress of the program through measurable sets of activities and milestones. The first blog in this series defined the need for data governance gamification (Gamified Data Governance: Defining the Need). The second blog introduced how a GDG program can help facilitate an organization’s progression through the levels of the capability maturity model (Gamified Data Governance and the Maturity Model). This blog looks at how gamification addresses the problem of employee engagement.

Employee involvement in data governance programs has traditionally faced a number of challenges:

  • Complexity. This is a general problem with technology, which presents users with ever-changing sets of controls, often involving multi-step processes. The array of features on offer can pose a barrier to new users.
  • Scope. The scale of the undertaking can be daunting. Business glossaries are comprised of thousands of terms, each categorized, defined and associated with business policies and rules.
  • Ownership. A shared understanding of business language is essential to the accuracy, efficiency, and productivity of the enterprise. How information is communicated and received is a shared responsibility. When employees have no ownership of that language, communication breaks down.

Gamified data governance addresses these concerns directly:

  1. Keeping it Simple. The gamified approach narrows the focus of user activities, according to the goals of the program. This is done through assigning specific tasks according to roles and through realignment of the user interface to those roles.
  2. Targeting Goals. The full scope of the program is broken down into more manageable units. This can align with strategic goals, such as focusing on a given subject area (for example, Customer information) or focusing on a specific aspect of governance (for example, ensuring that all terms have corresponding business policies identified). A key part of GDG is that these targeted goals are measurable.
  3. Engaging Users. Gamified data governance is a collective activity, with immediate feedback and acknowledgement of each participant’s contribution. The program invites active involvement in making terminology clear and concise, thus providing ownership of the process to participants.

The GDG program revolves around the activities of three groups of users, each with a distinct set of tasks to perform.

Group 1: Data Stewards

The primary goal of the data stewards is to enter business terms into the business glossary. Data stewards are presented with a form allowing them to enter, modify, or delete various aspects of a term. They can also load batches of terms at once.

Group 2: Data Owners

It is the data owners’ job to ensure the quality of business glossary entries. The data owners review and approve or critique selections and entries made by the data stewards. As part of the gamification workflow, these actions are automatically fed back to the data stewards, with email alerts and running scores reflected on a leaderboard.
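The feedback loop described above can be sketched in a few lines of Python. The point values, action names, and steward names here are hypothetical, invented for illustration; a real GDG platform would define its own scoring scheme.

```python
from collections import defaultdict

# Hypothetical point values for steward actions; a real program would tune these.
POINTS = {"term_added": 10, "term_approved": 5, "term_rejected": -2}

def record_action(scores, steward, action):
    """Apply the points for an action and return the steward's new total."""
    scores[steward] += POINTS[action]
    return scores[steward]

def leaderboard(scores):
    """Return stewards ranked by descending score."""
    return sorted(scores.items(), key=lambda kv: -kv[1])

scores = defaultdict(int)
record_action(scores, "alice", "term_added")      # steward adds a term
record_action(scores, "alice", "term_approved")   # owner approves it
record_action(scores, "bob", "term_added")
record_action(scores, "bob", "term_rejected")     # owner sends it back
top = leaderboard(scores)                          # [("alice", 15), ("bob", 8)]
```

The immediate, visible score change after each owner review is what closes the feedback loop between data owners and data stewards.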

Group 3: IT Stewards

It falls to the IT stewards to attach data assets to terms. The IT stewards link data points in reports and databases to the business terminology entries. This is an essential step to relate business concepts to usage and to be able to measure data quality in light of the meaning and rules associated with it.

In Loyalty 3.0, Rajat Paharia outlines five intrinsic motivators that encourage people to do something for its own sake, rather than because of some external force compelling them to do it. He argues that these key motivators are inherent in gamification.

  1. Autonomy. The gamified approach places governance in the hands of the stakeholders, democratizing the process and empowering the participants to define business terminology.
  2. Mastery. Determining names and definitions are creative acts, requiring communication skills that improve with practice.
  3. Purpose. Data governance is a business imperative, and so the purpose of the program is built in.
  4. Progress. The program builds up a body of work, the business glossary, and participants see the measurable progression of their efforts, both in the gamified metrics, (e.g., counting the number of terms being added) and in the daily use of the business glossary throughout the organization.
  5. Social Interaction. Participants in the program engage in the work together, in part through division of labor and also collaboratively through peer reviews. Combining a competitive element through teams and individual acknowledgment also contributes to social interactivity.

Gamification can help to overcome some of the biggest challenges to a successful data governance program: simplifying the process, providing management levers, and engaging users. The GDG program is open to customization and experimentation, providing a mechanism to align an actively engaged set of users with strategic data governance goals.

Stream Integration Data Governance specialist, Eric Landis, was consulted for this article.

This article was originally published on the Stream Integration website.



Gamified Data Governance and the Maturity Model

There’s a classic maxim applied to many games: a minute to learn, a lifetime to master. In taking a gamified approach to data governance (GDG), it would be wise to keep this in mind. The steps are relatively simple, with the end goal being to bring order to complex systems.

Gamified Data Governance (GDG) refers to an approach to governance that applies game mechanics to increase participation levels and enable performance monitoring. In a GDG program, the responsibility for Data Governance is taken on by the business and IT user community as a whole.

The process uses gamification techniques, such as points, levels, and awards, to set group targets and monitor the progress of the program through measurable sets of activities and milestones. The need for the gamification of data governance has been discussed. This blog introduces how a GDG program can help facilitate an organization’s progression through maturity levels.

Data Governance Capability Maturity Model

Many organizations follow the five levels of the Capability Maturity Model, which takes a set of business processes from an initial ad hoc state, to one that is managed, measured, and continually improving. The five levels of maturity, in the context of data governance, are:

  1. Initial. This level is characterized as being chaotic, driven through the singular efforts of individuals. At this stage, business users become aware of inconsistencies in their understanding of report fields, leading to siloed data quality requirements. Efforts tend to be piecemeal and uncoordinated, though individuals work hard to remedy problems as they arise.
  2. Repeatable. At this level it is possible to repeat documented processes. Processes to address data concerns are captured in guidelines and manuals; basic checklists have been created for data design, data quality assurance, and ETL error handling. However, there remains a minimal level of formality.
  3. Defined. At this level, standard practices are established, documented, and managed. Data governance is formally embedded in the workflow processes of the data management development lifecycle, from business requirements gathering through analysis, data quality assurance, data design and integration, and business intelligence.
  4. Managed. Beyond establishing defined processes, this level incorporates measuring results. Here, the efficacy of data governance processes can be measured in terms of data accuracy, data management efficiencies, and overall productivity. The value of the organization’s data, as well as the quality of the governance being applied, is continually assessed and remediated.
  5. Optimizing. The last level ensures a continual evolution of the process management by striving for ongoing improvement, documentation, enforcement and measurement.

Gamification and the Capability Maturity Model

Gamification of data governance provides a platform on which to build towards an optimizing level of governance. It does this through a number of means:

  • GDG recognizes individual contributions, providing instant feedback in the form of points, badges, and achievement levels. This aspect of gamification acknowledges the heroic aspects of the initial maturity level.
  • GDG processes are repeatable, with an interface that isolates specific tasks. This is partly a byproduct of engineering the process to facilitate scoring, but it serves to clarify and streamline the processes as well.
  • GDG processes are explicitly defined, in terms of the steps to be taken and the roles of participants. Built into the gamification are the checks and balances of peer reviews and workflow controls.
  • Also part of the gamified aspect of governance is the ability to measure results in terms of raw productivity numbers, levels of participation, and impact to data quality and usage. The program is designed to be managed, with capabilities to guide user activities by setting and monitoring specific goals.
  • With management controls in place and active participation from data owners, data stewards, and IT stewards, GDG is positioned to be continuously optimizing, putting the focus on the most valuable activities, tweaking the generation of data policies and rules, refining business terminology, and managing data assets with some level of dexterity.

A gamified data governance program may not jump the organization to level 5 out of the box, but it will provide all the pieces to get there faster. With business and IT playing from the same rulebook, data governance can be mastered.

Stream Integration Data Governance specialist, Eric Landis, was consulted for this article, which was originally posted on the Stream Integration website.

Data Design Principles

Obey the principles without being bound by them.

– Bruce Lee

Taking a practical approach to developing a well-formed enterprise data warehouse – and by that, I mean one that is accurate, efficient, and productive – involves basing it on sound design principles. These principles are specific to each sector of the reference architecture, each of which enables specific capabilities and serves specific functions. Here, I would like to lay out the principles of the Information Warehousing layer’s normalized central repository – the system of record.

The Information Warehousing layer is designed as a normalized repository for the data that has been processed “upstream”. It arrives cleansed, transformed, and mastered, and is consolidated here into a single “System of Record”. The discipline of normalization restructures the data, freeing it from the confines of the source system’s single perspective and opening it to the multiple perspectives across the enterprise. The data is modelled according to its “essence” rather than its “use”.

This process of redesign imposes a strict order on the data that promotes data integrity and retains a high degree of flexibility. The way the data is broken apart into separate tables makes it challenging to query, but this is not its main purpose. Data integrity and flexibility are the primary goals, and tuning is geared towards load performance rather than data access.
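As an illustration of modelling by “essence” rather than “use”, the sketch below splits a hypothetical flat source-system customer row into separate party and location records. The field names are invented for the example and do not come from any particular source system.

```python
def normalize_customer(row):
    """Split a flat source record into entity-specific records.

    The source models the data by its use (a single 'customer' record);
    the target models it by essence: a party, and a location related to it.
    """
    party = {"party_id": row["cust_id"], "name": row["cust_name"]}
    location = {
        "party_id": row["cust_id"],
        "city": row["city"],
        "country": row["country"],
    }
    return party, location

# One source row carrying two distinct business concepts.
source_row = {"cust_id": 101, "cust_name": "Acme Corp",
              "city": "Toronto", "country": "CA"}
party, location = normalize_customer(source_row)
```

Breaking the record apart this way is what makes the repository harder to query directly but easier to protect and extend, which is the trade-off described above.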

  1. Data Integrity
    Take a proactive stance to protect referential integrity and reduce instances of redundancy or potential for inconsistency.
  2. Scalability
    Allow for increases in volumes or additional sources of existing information, both within subject areas (e.g., issuers, counterparties) and core concepts (e.g., parties, locations).
  3. Flexibility
    Allow for additional sources or changes in existing sources, so that design is not tied to a given source or mirrors the source. Design will give primary consideration to reuse, then extension and finally modification of existing structures.
  4. Consistency
    Apply standard patterns for data design to promote efficiencies of data and ETL design. The decision-making process will be expedited as will data modelling work and ETL development.
  5. Efficiency
    Focus efficiency on three aspects:

    1. Implementation
      Use of repeatable patterns for data design will minimize data modelling and ETL work effort.
    2. Operation
      Ease ongoing maintenance by keeping the number of data objects to a minimum; maintain consistent standards; and apply logical structures for ease of navigation and use.
    3. Load Performance
      Priority is given to the performance of ETL load processes, including those that use the System of Record as a source to load the Data Mart sector.
  6. Enterprise Perspective
    For all data objects, retain, and remain open to, a full range of existing and potential relationships between entities, to ensure that the data reflects an enterprise perspective that is not limited to the perspective of any given project’s requirements.

These foundational principles are implemented through strategies that impact storage of history, hierarchical structures, degree of normalization, classifications, surrogate keys and a number of other aspects of design. The principles form the criteria to judge the best approach to take in a given situation. It’s not always straightforward, even with the principles in place – at times one has to favour one principle over another (e.g., flexibility over load performance), but this list provides the guidance to frame the debate and take a considered approach.

As suggested by the opening quote, I’m not advocating a blind conformance to a set of rules; but, in my experience, one of the greatest obstacles to efficient and effective development is the decision-making process. Limiting the parameters of debate with intelligent guidelines can facilitate decisions being made quickly and correctly.

Feel free to suggest additions or amendments to this list of design principles for the System of Record.

Nine ETL Design Principles

The principles of ETL design define the guidelines by which data migration will be constructed. Below are nine principles to guide architectural decisions.

1. Performance
In almost all cases, the prime concern of any ETL implementation is to migrate the data from source to target as quickly as possible. There is usually a load “window” specified as part of the non-functional requirements; a duration of time that is available to complete the process. The constraints are based on either the availability of the source system data, the need for the business to have access to the information, or a combination of both.

2. Simplicity
As with all programming, a premium is placed on simplicity of design. This is in the interests of productivity of development time, consideration of ongoing maintenance, and a likely improvement in performance. The fewer steps involved, the less chance of mistakes being made, or places for things to go wrong. When changes need to be made, or fixes applied, the fewer touch points, the better. During the life of the processes, ownership will likely change hands. The system’s simplicity will aid clarity for those who need to take it on.

3. Repeatability
One needs to be able to re-run jobs to achieve consistent and predictable results each time. This means it needs to be applicable to all relevant incoming sources, and in no way dependent on specific time parameters. If sources change, the process needs to handle those changes gracefully and consistently.
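A minimal sketch of the repeatability principle: the load below is idempotent, keyed on a business identifier, so re-running it against the same source leaves the target in the same state. The table shape and key name are hypothetical.

```python
def load(target, source_rows, key="id"):
    """Upsert source rows into the target, keyed on `key`.

    Running the load twice with the same input leaves the target unchanged
    after the first run -- the property the repeatability principle demands.
    """
    for row in source_rows:
        target[row[key]] = row  # insert or overwrite; never duplicate
    return target

target = {}
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
once = dict(load(target, rows))   # snapshot after the first run
twice = load(target, rows)        # re-run: no change to the target state
```

Contrast this with an append-only load, which would double the row count on every re-run and fail the principle.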

4. Modularity
Units of work should be designed with an eye to repeatable patterns, interdependencies, and discrete operations that function in isolation. The goal is to have as few modules as possible to be applied as templates to all future development work. This principle assists with clarity and efficiency of design, as well as reusability.

5. Reusability
As a principle, the goal should be not only to repeat modular patterns, but where possible re-use existing jobs and apply parameterization. This optimizes the efficiency of development and reduces the cycles required for testing.
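One way to apply the parameterization idea is a single generic job template from which concrete jobs are produced by supplying parameters, rather than hand-writing one job per table. All names here are illustrative, not part of any particular ETL tool.

```python
def make_job(transform):
    """Return a reusable ETL step, parameterized by a transform function."""
    def job(rows):
        return [transform(r) for r in rows]
    return job

# Two concrete jobs from one template, differing only in their parameters.
upper_names = make_job(lambda r: {**r, "name": r["name"].upper()})
tag_source = make_job(lambda r: {**r, "src": "CRM"})

# Jobs compose, so tested building blocks are reused rather than rewritten.
out = tag_source(upper_names([{"name": "acme"}]))
```

Because each concrete job shares the template’s tested mechanics, only the parameters need fresh testing, which is the reduction in test cycles the principle refers to.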

6. Extensibility
Rather than “bring everything” from a given source when a data migration process is first built, it should be possible to include only that which is identified as valuable to the business in the context of a given project or release cycle. Over time, additional data elements from the sources can be added to the ETL jobs, potentially with new targets. The ETL job should take this iterative approach into account.

7. Revocability
This refers to the ability to reset the database after a run and to return to the state it was in prior to running the process. This will be important for testing cycles during the development process, but also in production, in the event the database becomes corrupted or requires rolling back to a previous day.

8. Subject-orientation
Workloads are to be divided into units based on business subject areas rather than source-system groupings or strictly target table structures. This recognizes that a given source table may contain information about more than one subject area (e.g., Customers and Accounts). In addition, a given subject area may be composed of multiple source tables, which may populate multiple targets. Simply limiting the ETL jobs to a single source and a single target may compromise the other principles, particularly performance and simplicity. Similarly, orienting the ETL jobs to either the source or target layouts may degrade the efficiency of the design.
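The subject-orientation principle can be sketched by partitioning a mixed source extract into subject-area work units, each keeping only its own fields. The field-to-subject mapping below is invented for the example.

```python
def split_by_subject(rows, subject_fields):
    """Partition a mixed source extract into subject-area work units.

    A single source table may carry fields about several subjects
    (e.g., customers and accounts); each unit keeps only its own fields.
    """
    units = {}
    for subject, fields in subject_fields.items():
        units[subject] = [{f: r[f] for f in fields} for r in rows]
    return units

# One source row mixing customer and account information.
rows = [{"cust_id": 1, "cust_name": "Acme", "acct_no": "A-9", "balance": 50}]
units = split_by_subject(rows, {
    "customer": ["cust_id", "cust_name"],
    "account": ["cust_id", "acct_no", "balance"],
})
```

Each resulting unit can then feed its own targets, so the workload boundary follows the business subject rather than the source layout.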

9. Auditability
It is essential to be able to trace the path that data takes from source to target and be able to identify any transformations that are applied on values along the way.
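One simple way to realize the auditability principle is to carry a lineage trail alongside each value, appending the name of every transformation as it is applied. The record structure and step names here are invented for illustration.

```python
def apply_step(record, step_name, fn):
    """Apply a transformation and append its name to the record's lineage."""
    record["value"] = fn(record["value"])
    record["lineage"].append(step_name)
    return record

# The trail starts with the source field the value was extracted from.
rec = {"value": " acme ", "lineage": ["source:crm.customer.name"]}
apply_step(rec, "trim", str.strip)
apply_step(rec, "uppercase", str.upper)
```

After the run, the lineage list answers the audit question directly: where the value came from and exactly which transformations touched it on the way to the target.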

With these guiding principles, specific strategies can be employed with a set of criteria to judge their applicability.

I would be interested to hear feedback on this list; and am open to any additions.

In Defense of Surrogate Keys

Employing surrogate keys is an essential strategy to protect the integrity of data assets in the EDW. For those looking to defend such a position against those who would prefer human-readable business keys (a.k.a. natural keys), this article lays out some arguments. I hope by the end the non-believers will be converted, and those who already know the value of surrogates can move ahead without opposition.

Michelle A. Poolet has provided four criteria to determine whether a surrogate key is required:

1. Is the primary key unique?

Where a natural key is not unique, such as a person’s full name, a surrogate key must be used. However, a unique composite alternate key will be needed to match on the record for updates, to attach attribution or establish relationships.

2. Does it apply to all rows?

Take an example such as Customer. Customers may all come from the same source and be loaded into a Party table. However, parties that play different roles and that are loaded from different source systems may have natural keys of different formats (e.g., company business number, vendor identifier).

3. Is it minimal?

Composite keys and long descriptive fields are obvious candidates for transformation to surrogate keys (e.g., Postal Address).

4. Is it stable over time?

Identifiers generated by a Master Data Management (MDM) solution are designed not to alter over time. However, if the MDM solution was to change technology, or some sources were to be passed into the EDW through a different channel, the MDM key would be problematic. It is preferable for surrogate keys to be generated internally to the Information Warehousing layer.

Rationale for preference of Surrogate Keys over Natural Keys: 

  1. Primary and Foreign Key Size
    Natural Key: Natural keys and their indexes can be large, depending on the format and components involved in making a record unique.
    Surrogate Key: Surrogate keys are usually a single column of type integer, making them as small as possible.
  2. Optionality and Applicability
    Natural Key: The population of natural keys may not be enforced in the source data.
    Surrogate Key: Surrogate keys can be attached to any record.
  3. Uniqueness
    Natural Key: Natural keys always have the possibility of contention. Even where a single source has protected against it, contention may exist across sources.
    Surrogate Key: Surrogate values are guaranteed to be unique.
  4. Privacy
    Natural Key: Natural keys may be prone to privacy or security concerns.
    Surrogate Key: Surrogate keys are cryptic values with no intrinsic meaning.
  5. Cascading Updates
    Natural Key: If the business or source changes the format or datatype of the key, it will need to be updated everywhere it is used.
    Surrogate Key: Surrogate values never change.
  6. Load Efficiencies
    Natural Key: With the exception of very short text values, VARCHAR values are less efficient to load than INTEGER values.
    Surrogate Key: Surrogate values can be loaded more efficiently.
  7. Join Efficiencies
    Natural Key: With the exception of very short text values, joins using VARCHAR values are less efficient than INTEGER joins.
    Surrogate Key: Surrogate values can be joined more efficiently.
  8. Common Key Strategy
    Natural Key: A lack of a common key strategy will lead to a longer decision-making process and more patterns for code development.
    Surrogate Key: A common key strategy based on the use of surrogates will streamline design and development.
  9. Relationships
    Natural Key: Natural keys with different data types will lead to the creation of more objects in the database to accommodate the associations between entities.
    Surrogate Key: The use of surrogates promotes reuse of associative entities for multiple types of relationships.
  10. History
    Natural Key: Related to the points above, the use of natural keys inhibits the flexibility and reuse of objects and patterns to support change capture.
    Surrogate Key: Surrogate keys enable a strategic design to store history using associatives.

The list above is drawn in part from an article by Lee Richardson, discussing the pros and cons of surrogate keys.
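The preference for generating surrogates internally to the Information Warehousing layer can be sketched as a small registry that assigns a stable integer to each natural key on first sight. The class and key formats below are hypothetical, shown only to make the mechanism concrete.

```python
class KeyRegistry:
    """Assign stable integer surrogates to natural keys on first sight."""

    def __init__(self):
        self._keys = {}   # natural key -> surrogate
        self._next = 1

    def surrogate(self, natural_key):
        """Return the existing surrogate, or mint a new one."""
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]

reg = KeyRegistry()
# Natural keys of different shapes from different sources all map to
# small, uniform integers; repeat lookups return the same value.
a = reg.surrogate(("CRM", "CUST-001"))
b = reg.surrogate(("ERP", "VND-9"))
a_again = reg.surrogate(("CRM", "CUST-001"))
```

Because the mapping lives inside the warehouse, a change of MDM technology or source channel alters only the natural-key side of the registry, never the surrogates already embedded in the model.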

In summary, the reasons to employ surrogate keys as a design policy relate to two factors:

1. Performance

In the majority of cases the surrogate key will perform more efficiently for both loading and joining tables.

2. Productivity

Even in the minority of cases where a natural key will perform as well as a surrogate, by adopting a consistent surrogate key policy there are benefits to efficiencies gained in design, development and maintenance.