

2017-02-28 11:28

Real-Time Data Warehouse: Data Warehousing, Enterprise Portals and the Information Supply Chain

By Michael Haisten

What is Unique about DW Projects that Succeed?

Many articles have been written that warn about all the ways a data warehouse project can fail. But is there something more to be learned by looking at those who succeed? Is there something the successful teams do other than to not fail?

When we researched this question, we found a common theme that should not have been surprising. The teams that succeed understand their consumers. In a more complete sense, they understand the role each consumer plays in creating, using and acting on information.

Information analysis or decision support is not just a one-step process of generating a report and reaching a conclusion. Raw data and intermediate results flow from individual to individual through many channels. Along the way, numerous facts are generated, trends are analyzed, hypotheses are tested and decisions are made. This flow of content from original source(s) to ultimate action(s) is what we call the information supply chain.

Information Supply Chain

A common tendency in data warehousing is to focus on storage rather than flow of information. In manufacturing, warehousing supports the optimal flow of product from site of production through intermediate locations to the site of final use. Successful data warehouse teams adapt this dynamic view to information flows.

Picture this: Information is a fluid that comes from many sources and is blended in unique ways as it is moved in buckets from one location to another as it is needed. A data warehouse might be seen as a refinery that takes in undifferentiated raw material and produces many distilled variants that serve a specific purpose well. This creates the optimal product with which to begin analysis. The error is to think of this stage as the end of anything other than the duty assignment of this one analyst. It is only the beginning of the chain; information still moves in buckets the rest of the way.

1. Very few information consumers use raw data.

When you build a data warehouse, you acquire data from multiple sources, you validate it, you integrate it and you prepare it for use. A consumer may be able to access the data interactively, and the data warehouse may provide a variety of channels to deliver the right content in the right form. However, no matter how well you perform these difficult tasks, no more than 15-20 percent of enterprise information consumers are likely to use your data resources directly. In fact, the penetration rate is more likely to be as low as 5 percent.

Why? The easy answer is to blame the technology or people's willingness to accept the technology. "If only the tools were easier!" "If only we didn't have so many computer-phobic users!" Yes, technology advances are reducing barriers to broader use. Yes, more computer-literate employees and better training will mean more people can use what you offer. But the degree of data warehouse adoption has a (low) natural ceiling that cannot be exceeded by traditional means.

Regardless of the degree of aggregation and summary or the amount of transformation and derivation, most data warehouses are purveyors of raw data. The simple truth is the vast majority of information consumers do not use raw data. Their work begins with intermediate results, such as spreadsheets and documents, prepared by someone else. The average consumer adds value to these results and then sends this modified content on to be used by others. Most information consumers are not "end users" at all; they are "middle users."

2. Most effective information is five to seven steps from the last IT-managed source.

Let's define "effective information" as content which is the direct cause of action. Let's define the "last IT-managed source" as a deliverable created by the official information technology group which is the original source for content in a presentation document. Examples of IT-managed sources include a database table in a data warehouse or a scheduled production report. The presentation document may be a hard copy report, an electronic spreadsheet or a slide presentation.

With these definitions in mind, we suggest you conduct a trial sources-and-uses study. Identify a true decision-maker in your organization. Find the presentation documents that contain the effective information she relies on. Trace these documents back through all the intermediate destinations to the IT-managed source(s). If your enterprise is typical, you will find, as we did, that data has been manually manipulated in five to seven separate steps on its way from the original raw data source to final action.
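The trace described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool used in the original study: the `Artifact` class, the `trace_to_it_sources` function and every document name in the sample chain are hypothetical. Each artifact records what it was derived from, and the function walks backward from a presentation document, counting the manual steps to each IT-managed source it reaches.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One stop in the chain: a table, report, spreadsheet or presentation."""
    name: str
    it_managed: bool = False  # True for a warehouse table or production report
    derived_from: list["Artifact"] = field(default_factory=list)


def trace_to_it_sources(artifact, hops=0, found=None):
    """Walk back from a presentation document to its IT-managed sources.

    Returns {source name: number of steps between that source and the
    starting document}. If a source is reachable by several paths, the
    longest path is reported.
    """
    if found is None:
        found = {}
    if artifact.it_managed:
        found[artifact.name] = max(hops, found.get(artifact.name, 0))
        return found
    for parent in artifact.derived_from:
        trace_to_it_sources(parent, hops + 1, found)
    return found


# Hypothetical chain, invented for illustration only.
sales_table = Artifact("sales_fact table", it_managed=True)
weekly_report = Artifact("weekly production report", it_managed=True)
extract = Artifact("regional extract", derived_from=[sales_table])
workbook = Artifact("analyst workbook", derived_from=[extract])
summary = Artifact("division summary", derived_from=[workbook, weekly_report])
rollup = Artifact("finance roll-up", derived_from=[summary])
deck = Artifact("executive deck", derived_from=[rollup])

steps = trace_to_it_sources(deck)
for source, count in steps.items():
    print(f"{source}: {count} steps from the decision-maker's document")
```

In a real study the links would come from interviews and document review rather than from a data structure, but recording them this way makes the step count explicit and easy to compare across decision-makers.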

For instance, at a major computer manufacturer in the early 1990s, top managers relied on the green, red, blue and yellow sheets to make decisions. The green sheet contained current sales with notable highlights such as large orders or key customer trends. This content originated from numerous systems and was actively massaged and repacked at least four times before being hand-assembled into the green sheet. The data for the red cost sheet went through so many hands we never knew for sure. The blue backlog sheet was sometimes accused of being wild guesses and little white lies. Nobody knew for sure where the supporting data came from or how it was used to produce the results. The yellow outside influences sheet was composed of news and clues and espionage that did not come from IT sources at all.

A well-designed data warehouse will eliminate sourcing problems and may eliminate several steps in the overall flow, but it will never eliminate the management dynamic that exists in all organizations. The person who first gets the data may not use it. The person who uses the data may not act on it. The person who acts on it may not know the underlying data. Very rarely is the decision-maker the data analyst, much less the original data gatherer. This is the essence of the information supply chain from "get" to "use" to "act."


3. The greatest productivity and consistency loss occurs in the middle of the chain.

In a data warehouse, we build a rich and deep base of data. We buy, install and support expensive OLAP tools. It is understandable that we assume our direct customer is the ultimate consumer. The start of the chain may be where the power user is, but it is not where the action is.

A data warehouse is generally designed to serve the first step (or, at most, the first few steps) in the information supply chain. We help the initial consumer get the raw data. We may provide support for early usage steps. Rarely does our reach extend to the middle, much less to the end, of the chain.

The cost of not serving the middle to the end of the chain can be high. The desktop processing at the upper end of the chain is massively inefficient when expensive personnel are used as data entry clerks. Consistency is jeopardized by high error rates caused by re-keying and manual manipulation, the lack of checks and balances and capriciously volatile business processes.

Several factors are the direct cause of loss of productivity and information consistency in the upper parts of the chain:

  • People are unaware that the information exists elsewhere.
  • If they cannot find what they need, they reproduce it.
  • If they can find something relevant, they can't identify its sources.
  • The higher you go, the more you need broad support for a more specified proposition. You need multiple sources of correlated information, some of which is only indicative or anecdotal rather than quantitative and definitive.
  • Politics and cultural barriers may limit information sharing.

Enter the Enterprise Portal

An enterprise information portal (EIP), when combined with an extended services data warehouse, provides the first chance to comprehensively support the full information supply chain. I am not one to make bombastic claims; and, in fact, I am somewhat cynical about most claims by technology advocates and product vendors. That being said, here is my claim: The portal/warehouse combination will begin addressing long-standing information management issues like nothing else I have seen in my 24 years in this business. No panacea, just the best new thing in a long time.

I will not define EIPs in detail since much has been published on this topic. However, it is critical for you to understand that I am referring to a specific subset of the broad range of contenders for this title. The subset includes those portal products that provide an extensive array of document and unstructured information management services in addition to structured data access and analysis. The large number of "business intelligence" portals, which primarily offer report, query and OLAP services, need not apply.

This form of EIP provides top-down integration to complement the bottom-up integration of data warehousing. The EIP provides the mechanism to catalog the document sets that are the content delivery vehicles for the middle-to-end of the information supply chain.

An EIP can address the productivity and consistency issues from the last section. First, an EIP provides a mechanism for content creators to catalog their results for others to use. Navigation facilities help information consumers find relevant information regardless of the document or data type, the storage form or the location.

Second, an EIP helps eliminate reinventing the wheel while cataloging multiple answers to the same question along with the supporting evidence. When people cannot easily find what they need, the first problem is the productivity loss in reproducing the results. The second problem may be more severe. People unwittingly introduce a redundant "answer" to the information flow, which is highly likely to be inconsistent in source, form, or results with other analyses.

The existence of multiple, even conflicting, answers is not the problem. It is reasonable for different analysts to produce different results when their assumptions, their sources or their methods are different. The information value is higher when a decision-maker has multiple conjectures available supported by clearly defined calculation methods.

Often, though, the multiple results are found and presented without the defining background that may account for their differences. If an analyst comes across multiple "answers," they must either accept one on faith or be forced to reconcile them. This outcome is better than the typical scenario where conflicting results are presented to a key decision-maker without explanation. Her trust in the whole process is undermined.
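As a rough sketch of the cataloging idea, the following Python shows how several answers to the same question might be stored together with the background that explains their differences. It does not represent any actual portal product or its API. `CatalogEntry`, `Catalog`, the field names and the sample entries are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A published result plus the background needed to interpret it."""
    topic: str
    author: str
    result: str
    sources: tuple[str, ...]
    method: str


class Catalog:
    """Hypothetical in-memory catalog keyed by topic."""

    def __init__(self):
        self._by_topic = defaultdict(list)

    def publish(self, entry):
        self._by_topic[entry.topic].append(entry)

    def find(self, topic):
        # Look up before recreating: returns every answer already published.
        return list(self._by_topic.get(topic, []))

    def explain_differences(self, topic):
        """Name the background fields on which the answers to a topic differ."""
        entries = self.find(topic)
        differing = []
        for field_name in ("sources", "method"):
            if len({getattr(e, field_name) for e in entries}) > 1:
                differing.append(field_name)
        return differing


# Sample entries invented for illustration only.
catalog = Catalog()
catalog.publish(CatalogEntry(
    "Q3 backlog", "analyst A", "1,200 units",
    ("order entry system",), "open orders at quarter end"))
catalog.publish(CatalogEntry(
    "Q3 backlog", "analyst B", "1,450 units",
    ("order entry system", "field forecasts"),
    "open orders plus committed forecasts"))

differences = catalog.explain_differences("Q3 backlog")
print(differences)
```

The point of the design is that the conflicting results stay in the catalog side by side. What changes is that a decision-maker sees the differing sources and methods next to the numbers instead of two unexplained figures.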

Third, an EIP can be used to prepare and present a complete information package. It allows you to store together all content (documents and data) about a topic. It can include references or copies of news articles with supporting facts or ideas. It can include a link to any intranet or Internet source that may be relevant. Because it is based on the interactive media of the World Wide Web, it can include audio clips, video presentations, tutorials or demonstrations.

Fourth, an EIP can create a more egalitarian environment for information sharing. The existence of a familiar and ubiquitous mechanism for publishing and subscribing to information breaks down traditional barriers between departments and business units. Those whose power comes solely from limiting the flow of information will resist the introduction of an EIP into their domain. Be watchful for this hidden motivation.

Summary

The highest goal of information management should be optimizing the information supply chain of the enterprise. This requires increased attention to how information flows through your organization. Data warehouse services can be extended further up the supply chain by supporting multiple stages of structured data analysis. An enterprise portal can be introduced to handle the organization of the document-centric, middle-to-end of the supply chain. Instead of today's bucket brigade, using our fluid example, we will be building subject-oriented pipelines to move content efficiently throughout the enterprise.

 


 

Michael Haisten, vice president of information architecture services at Daman Consulting, is considered one of a handful of visionaries who have helped shape the data management industry. He has accrued more than 22 years of leadership in information management and architecture development. Haisten served as chief information architect at Apple Computer Corporation, where he developed the data warehouse management team. He has been chief architect, designer and technical expert for more than 72 data warehouse and decision support projects. In addition, Haisten is the author of Data Access Architecture Guide and Data Warehouse Master Plan and has published extensively on data warehouse planning, data access facilitation and other key aspects of data warehousing. He may be contacted at mhaisten@damanconsulting.com.

 

 
