Data Warehouse Quality Management

 


By Laura Hadley

This article examines ways in which organizations can think about quality in the data warehouse environment, as well as ideas for possible metrics to measure quality and extend warehouse value. In addition to their involvement in data warehouse research and consulting, David Wells and James Thomann are instructors with The Data Warehousing Institute in Gaithersburg, MD.

David Wells and James Thomann are serious about data warehouse quality. They define “quality,” however, not in terms of data, although it is important. To them quality refers to a bigger picture, the act of measuring the progress of the data warehouse in terms of its ability to satisfy your customer base. In addition, they emphasize the best time to begin building in quality is before the warehouse is first developed. You measure what's been done, see where you are, make adjustments, and plan the next iteration using the measurement data.

Citing the work of Larry English, Wells claims that quality is meeting customer expectations but not exceeding them. This is because it costs more to get to a higher level of quality, but getting there doesn't necessarily give you a higher value or return.

“Why measure?” he asks. “Without measuring, everything is subjective. You don't know if you're improving or degrading over time.” Data warehousing is really a process, so process-oriented measures should be used. They measure “what activity,” and for how long, Wells says. “These are opposed to product measures, which might measure volumes of data or accesses to the data warehouse.” These individual measures feed into larger sets of metrics, which are surrounded by an overall data warehouse quality program.

Quality, Measurement, and Data Warehousing

In fact, Wells says that Philip Crosby (author of Quality is Free) wasn't telling the whole truth: “Quality is not free. Still, measurement doesn't cost as much as bad quality.” Thomann adds that it costs less to start a measurement program once and sustain it than spending time collecting ad hoc, uneven information. However, quality measurement does incur costs in the way of planning, implementation, ongoing measurement, and re-planning—when today's problems go away and tomorrow's problems need attention. “Re-planning” is especially important, because the last thing an organization needs is for the data warehouse to become tomorrow's legacy system.

It's critical to understand where the business value is in data warehousing. A data warehouse is a solution for businesspeople to make decisions and take the right action. Given this definition, the primary measurement is whether you have business impact as a result of the warehouse. It's not something that you want to do just for the sake of doing it; over time, you'll have to show business impact to upper management, says Thomann.

Of course, it goes without saying that a data warehouse is not a silver bullet for IT or the business. “It will not solve a backlog of one-time reports,” warns Thomann. “A data warehouse consists of a lot of pieces of technology; not one single technology. You can't buy it. Even with SAP, you have to customize.”

Data warehouses also need to evolve over time, so they are incrementally built. “It really is an evolutionary process,” Thomann says. “You can't afford a one- or two-year project anymore, but you can't put a complete, large-scale data warehouse up in three to six months, either.”

Ways of Looking at Measurement

To be successful throughout the process, it's important to know what you did right and wrong. In essence, you need “meta data” about what you're doing to accurately measure quality. Successful measurement is the key to warehouse quality, but how do you measure a data warehouse? Wells and Thomann offer a number of different methods, dimensions, and levels for understanding data warehouse measurement. To begin, Thomann describes three “types” of success and their definitions as they relate to data warehousing:

  • Economic success - the data warehouse has a positive impact on the bottom line.
  • Political success - people like what you've done. If the data warehouse isn't used, it's obvious that you failed politically.
  • Technical success - this is the easiest to accomplish. However, don't overwhelm your users with too much technology. Success also means that the chosen technologies are appropriate for the task and are applied correctly.

Since there are three ways to succeed, Wells responds, there are three ways to fail. Quality, he says, can be defined as the degree of excellence of something. “But this is a very subjective measure, and you can make it more subjective through measurement.” Three main areas are detailed in the paragraphs below, which can be used to assess the overall quality of a data warehouse.

Business Quality

Directly related to economic success, business quality is the ability of the data warehouse to provide information to those who need it, in order to have a positive impact on the business.

Business quality is made up of business drivers, or concepts that point out a company's strategic plans. So organizations should be concerned with how well the data warehouse helps accomplish these drivers, such as changing economic factors, environmental concerns, and government regulation.

Does the data warehouse align with business strategy, and how well does it support the process of strengthening core competencies and improving competitive position? What about the enablement of business tactics? Does the data warehouse play a tactical role, so that it makes a positive day-to-day difference?

Information Quality

Information doesn't have value if it's not used. Therefore to have information quality, the focus should be on the integration of information into the fabric of business processes, not on data quality itself.

Information quality is the key to political success, which was described above as people actually using the data warehouse. “Some companies don't tell users it's there,” Thomann says, “so they may not understand the warehouse or know how to use it.” Success in this area means providing awareness, access tools, and the knowledge and skills to use what they're given. For example, could your users easily make the shift from using greenbar reports to a multidimensional data model? Then, assuming they understand the warehouse, can they get to the data easily? Who gets the data? How frequently? How and when is the data used? You may be able to provide 24x7 access, but what if users are at home?

Wells and Thomann believe that information quality also encompasses data quality and performance. This can be a problem area, because everyone is happy if they can get their data overnight. Then they want it in half a day. Then immediately. So these expectations must be managed.

Technical Quality

Technical quality is the ability of the data warehouse to satisfy users' dynamic information needs. Wells describes four important technical quality factors. The first is “reach,” or whether the data warehouse can be used by those who are best served by its existence. In today's information-dependent business climate, organizations need to reach beyond the narrow and typical customer base of suppliers, customers, and a few managers.

“Range” is also important. As its name implies, this defines a range of services provided by the data warehouse. In general, these include “What data do I have, and, can I get the data?” For example, Web enablement, such as Hotmail, are services which allow users to get information from wherever they are.

“Manuverability” is the ability of the data warehouse to respond to changes in the business environment. The data warehouse doesn't remain stable, so manuverability becomes particularly important. It is also the single most important factor not given attention today in data warehousing, according to Wells. Manuverability sub-factors include managing:

  • users and their expectations,
  • upper management,
  • the overall business,
  • technology,
  • data sources, and
  • technical platform.

Finally, “capability” is an organization's technical capability to build, operate, maintain, and use a data warehouse.

Starting the Measurement Process

Thomann recommends that organizations apply the goal-question-metric (GQM) approach for measuring data warehouse quality. This approach encompasses the following steps:

  • Identify the type of impact you want to achieve with the data warehouse—business, information, or technical quality (from above).
  • Define quality goals that are specific to your business. These are specific statements that relate to the type of impact.
  • Generate questions to ask in order to know if you've achieved your goal. Some examples include questions relating to usage, response time, meeting the needs of users, errors, and on-time delivery of cubes.

Identify Quality Areas and Create Goals

Drawing from the main quality areas defined earlier, Wells suggests that organizations generate some distinguishing characteristics that would define quality for the data warehouse:

  • Business quality - focus on business drivers—those things that help a business achieve its overall goals.
  • Information quality - users know when and how the data warehouse can help them make business decisions.
  • Technical quality - this relates to “reach,” or the ability to access the necessary information in the warehouse.

Once you've defined three to five characteristics for each quality area, focus on creating some specific goals to implement them in your organization. Don't try to achieve them all, warns Wells; that would be too costly and non-achievable. Instead focus on the things that make the most sense to you. Let's say you defined a total of 10 quality characteristics (encompassing one or all of the three main quality areas). Set goals for a few of them. For each goal, come up with some questions to answer in order to determine progress of the goal in the data warehouse environment.

Describe Metrics and Measures

First of all, Wells believes that organizations should quit misusing the term “metric” and “measure.” A metric is actually a set of measures, or a methodology used to measure. Measures are the specific pieces of data you need to collect. In data warehousing, the general number of access to the data warehouse would be a metric. Measures, on the other hand, would be the number of specific accesses to SQL, accesses to certain data tables, etc.

He also points out the difference between objective measures and subjective measures. Objective measures can only measure those things which are “visible” and “countable” in the data warehouse process. Subjective measures are people's perceptions. Most organizations collect subjective data using surveys or user interviews, and they are no less valuable than objective measures. Of course, subjective measures are not as easily counted as objective measures.

Thomann recognizes the difficulty of counting things using subjective measures, so it's easy to get misleading data. The key to subjective measurement is to set up what you expect in return. Without being overly intrusive to your users, put in controls to ensure that you get the data back. For example, gather responses during user login. If you do user interviews, ask questions in different ways, but always do things in a structured manner. It's also important to provide feedback to the participants. Finally, close the loop by asking yourself: Am I surveying the right people? Did I survey at the right time?

In addition to different types of measures, there are also different levels:

  • Existence - does the warehouse exist or doesn't it? This sounds overly simple, but it's important: Have users accessed the data base or not?
  • Quantity - this refers to “how much,” or how many times the warehouse was used.
  • Quality - the most difficult level, assessing “How good did we do?” Thomann warns this third level is the fuzziest until you understand the first two levels.

Metrics also have a number of components, and for data warehousing can be broken down in the following manner:

  • Objects - the “themes” in the data warehouse environment which need to be assessed. Objects can include business drivers, warehouse contents, refresh processes, accesses, and tools.
  • Subjects - things in the data warehouse to which we assign numbers, or a quantity. For example, subjects include the cost or value of a specific warehouse activity, access frequency, duration, and utilization.
  • Strata - a criterion for manipulating metric information. This might include day of the week, specific tables accessed, location, time, or accesses by department.

These metric components may be combined to define an “application,” which states how the information will be applied. For example: “When actual monthly refresh cost exceeds targeted monthly refresh cost, the value of each data collection in the warehouse must be re-established.”

The Data Warehouse and Change

An important characteristic in data warehousing is the concept of process; in this sense, the realization that the warehouse will constantly change. Wells suggests that organizations anticipate change in data warehousing and expect it. We're surprised by change, he says, but we should just accept it—and manage it.

Growth is a form of change, but it's more predictable and thus more manageable. For example, in data warehousing, growth can be defined by the following:

  • the number of users,
  • how they use the warehouse,
  • the addition of new data, and
  • the addition of different types of data.

Wells suggests using a chart to help manage data warehouse growth. The chart can state expectations of the warehouse (which can be defined in measurable terms), and for each expectation, list the goals, metrics, and measures to be used to manage those expectations. Don't forget adding what you'll do to monitor the growth (is reality matching the data?), and plan to update the goals, metrics, and measures as the warehouse changes over time.

Thomann adds that change in data warehousing is desirable, because it needs to grow. Otherwise you'll have tomorrow's legacy system. But to keep the warehouse valuable you have to strive for continuous improvement. To illustrate, he describes a “typical” data warehousing curve: When the warehouse is first implemented, after a week or so the usage level is very high because news about the warehouse has spread and users are exploring. In a month, usage drops off significantly as users learn what the data warehouse cannot do. After the next release or feature addition usage goes up slightly, although there isn't as much interest as with the initial release. Then, a few days later it dips again. This is a typical pattern, but what you're ideally looking for is a curve with definite increases and no dips. “You can't stop entropy,” he says, “but you can delay it by being proactive in your management. So use data—like you give your users—only it's for you.”

Further Steps

Of course, the objective of measuring is to take the measurement data and brainstorm possible future improvements. For example, once you have measurement data you can do cost/benefit analyses for new data warehouse projects. Thomann and Wells suggest building a set of priorities, because you can't do everything. Then plan future projects, packaging the good ideas together if they are compatible. In addition, what you don't want to do with the data warehouse is important, so put a boundary around your projects based on the organization's specific needs. In the end, however, a measurement program isn't about just getting data—you have to apply the knowledge and take action to make it work.

Source

“Data warehouse quality management,” presented by Dr. James Thomann and David Wells at The Data Warehousing Institute's Fourth Annual Implementation Conference, held February 14-19, 1999 in Anaheim, CA.

Reference

Data Warehouse Quality Management (coursebook DWH-DWQ-004), © Copyright 1998 by SOFTWARE AG Americas, Inc.


   Go to:

      Developing a Data Warehouse Architecture | Go to the top of this page

This site was designed and authored by Laura
Site last updated 2002