

2017-02-24 11:45

Real-Time Data Warehouse: Portals and Closed-Loop Analysis in a Real-Time Environment

By Michael Haisten

Closed-loop analysis (CLA) is a formal business intelligence process intended to set business goals, monitor progress, assess impact or effectiveness and realign objectives as required. It becomes a real-time process when data is continuously collected and evaluated. The role of an enterprise portal is to serve as a dashboard for all the analytic and operational components in the mix.

CLA is not a new concept. It is an outgrowth of more traditional, analytically based strategic planning processes that operate on a much longer cycle. Planning cycles of a year or more were once the norm. In the case of a new marketing program, months might be dedicated to research, formulation of goals and program design. The program is then implemented and allowed to run for six months to a year. While the program is running, data is gathered and compared to expected results, generally on a monthly basis. At the end of the year, the program managers evaluate the results and a decision is made: Is the program continued as is, tweaked, wholly revamped or abandoned?

CLA progresses from investigation to execution to evaluation to revision in cycles as short as a quarter or even a month. The change is not just a compression of the same activities into a shorter unit of time. While the sequence stays the same, the discrete activities are fundamentally different. The process is more hands-on, dynamic and iterative. It is more like piloting an airplane than being a passenger on an ocean liner. You are in control, and the results of your actions have more immediate consequences.

Some believe that CLA is an artifact of the Internet age and, in a technical sense, it is. The insanely quick mass adoption, and rejection, of new offerings requires a continuous eye on the ball and immediate responsiveness. But the real inducement to adopt CLA techniques is to master the information-driven economy. The Internet is the medium; closed-loop analysis is one of the means.

Let’s explore an example. A fictitious company called Acme.com has developed a Web applet that tracks useful Internet activity such as logging into registered sites and submitting forms. Most clickstream data is acquired for a single site or a collection of sites that share information. The Acme.com data covers tens of thousands of popular sites. Acme’s information is more focused and more valuable than the aimless wandering-in-the-wilderness nature of raw clickstream data. They have the potential to uncover associated patterns of interaction.

For instance, are Web surfers who visit multiple book-selling sites more likely to buy online than single site browsers? Do consumers who buy stocks online sign up with the financial account aggregators more than the average clickthrough visitor? What are the top 10 sites visited by frequent users of a specific site such as WeHaveItAll.com?

Acme.com’s business model is to exploit this data to create precisely targeted market opportunities for their corporate partners. WeHaveItAll.com might be willing to host banner advertisements for the correlated sites identified by Acme and split the ad revenue with Acme. Armed with Acme’s data, WeHaveItAll.com could charge higher rates.

So how does Acme.com start with a massive repository of event data and end up with a profitable marketing business? There are two possible starting points in the CLA process for Acme. On the one hand, they could forge an agreement with the owners of key Web sites they already have in their tracking inventory and ferret out correlated behaviors for them. On the other hand, they could first identify strong associations in the data and seek out the involved sites armed with this valuable information. In either case, we have established a premise that is the first step in a closed-loop process.

The CLA phases are analogous to the steps in the PDCA process of total quality management. PDCA means plan, do, check, act. Plan involves setting objectives and designing a program or process. Do means to execute the process, while check is the milestone step of using a predefined evaluation framework to monitor and test the success of your efforts. Act is a dynamic process of course correction that loops back to the plan step. You take your learnings and plow them back into the next PDCA cycle.

Closed-loop analysis starts with a premise as in our example. The investigate phase entails the most sophisticated analysis. You search for correlations and associations that represent discernible patterns in the data. You may use clustering or segmentation techniques to find sets of common factors or behaviors. You may set out to validate or reject a hypothesis. In all cases, the result is a proposed course of action such as a marketing program or an additional Web site feature or even a new product.

The next step is to establish the evaluation framework and monitoring mechanism. We call this the initiation phase. A critical facet of CLA is that measurable success criteria are defined up front. The measurement mechanism may be as basic as periodic reporting, a more sophisticated method such as dynamic alerts or some form of continuous tracking against a baseline target.

With the evaluation framework in place, you are ready to begin the execution phase. Unlike the sequential nature of the PDCA process, monitoring takes place throughout the execution phase – doing and checking as you go. Instead of monthly checkpoints, we are more likely to monitor progress on a weekly, daily or continuous basis.

The evaluation framework should define thresholds, which can be volumes, rates or both. Examples include a sales volume target, an adoption rate or a percentage of clickthroughs that generate sales. You generally define an unqualified success level, a just-getting-by or marginal target, and an unacceptable attainment or oops threshold. Thresholds can be a single number by a specified point in time but are more effective if they are a trended projection over time.

With thresholds defined, the monitoring process has an ongoing evaluation component. Generally, you time box the execution phase but can pull the plug at any time. Time boxing means you set an outside limit on how long the program runs before you force an evaluation milestone. At an evaluation milestone, you decide to expand, continue, revise, replan or terminate.

If you hit the unqualified success threshold, you might continue a successful program or expand the process (e.g., more clients, more products or more similar programs). If you only attain a marginal performance level, you may need to revise or replan. Revision involves minor corrections to the execution strategy and may potentially require additional investigation. A replan involves substantial new investigative analysis and potentially a modification to the original defining premise.

Tracking at or below the oops threshold may require you to terminate the program early. If the results are disastrous, you have to determine if the execution was fundamentally flawed. If it was, you must consider the value and feasibility of staging a new trial or redesigning your execution process. If it was not flawed, you may be forced to reject the whole premise and start from scratch.
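As a minimal sketch, the milestone decision described above can be reduced to a simple threshold rule. The function name is an invention of this sketch, and the default numbers are borrowed from the Acme trial discussed later in the column (50,000 success, 20,000 marginal, 10,000 oops clickthroughs per week):

```python
def milestone_decision(weekly_clickthroughs,
                       success=50_000, marginal=20_000, oops=10_000):
    """Map a weekly attainment figure onto a course-of-action decision.

    Thresholds follow the three-tier scheme in the text: unqualified
    success, marginal ("just getting by"), and the oops threshold.
    """
    if weekly_clickthroughs >= success:
        return "expand"      # unqualified success: grow the program
    if weekly_clickthroughs >= marginal:
        return "continue"    # on track: keep executing as planned
    if weekly_clickthroughs >= oops:
        return "revise"      # marginal attainment: minor course correction
    return "terminate"       # at or below the oops threshold: pull the plug

milestone_decision(240_000)  # returns "expand"
milestone_decision(8_000)    # returns "terminate"
```

In practice the inputs would be trended projections rather than single weekly numbers, as noted above, but the decision structure is the same.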

In summary, the CLA steps are:

  1. Propose a premise.
  2. Investigate the data.
  3. Design a program.
  4. Establish an evaluation framework.
  5. Develop the monitoring mechanism.
  6. Execute the program.
  7. Monitor progress.
  8. Evaluate success.
  9. Act on results.
  10. Do it again (recycle).

Let’s return to our Acme.com example:

  1. Premise. Sell marketing services to major dot-com entities by developing profiles of associated activity by Web browsers.
  2. Investigation. Identify dominant Internet companies by some criteria such as capitalization, volume of traffic or percentage of clickthroughs. Sift the data to identify inbound and outbound sites. Rank the associated sites by volume of traffic between target site and associated site. Analyze repeat traffic. Mine the data to reveal any distinct segments of common behavior that might offer the opportunity for more focused targeting. This example investigation stage uses a myriad of different analytic methods and, thus, different tools. A real-world process might potentially involve even more.
  3. Design. Acme.com decides to trial a handful of target sites such as WeHaveItAll.com. They decide to concentrate on profiling outbound clickthroughs from the targeted sites since this seems to offer the highest value opportunity. They will sell banner ads on the targeted sites that bring more traffic to the associated sites.
  4. Evaluation Framework. Acme decides to establish an eight-week trial window. Their marginal target is 20,000 clickthroughs per week. Anything over 50,000 will be considered a rousing success. On the other hand, if they have less than 10,000 per week minimum or less than 50,000 after four weeks, the trial will be considered a failure.
  5. Monitoring Mechanism. Event data from consumer Web interactions continuously comes into Acme’s back-office system from their Web server farm. They place a filter directly on the message stream to select and store only clickthrough events from WeHaveItAll.com. A Java applet is designed to display a daily counter and update a trend line for each of the top 10 selected sites. This will allow anyone on the marketing team to see the instantaneous and historical results on a corner of their portal home page at any time. Once a week, or on demand, they will be able to run a report that is a summary of all outbound site events (moving from WeHaveItAll.com to somewhere else), not just clickthroughs to the 10 selected sites. At any time, they can invoke an OLAP tool that has detailed results from the beginning of the trial through last night for in-depth analysis.
  6. Execute. The banner control database for the WeHaveItAll site is updated Tuesday night to be ready for the media tracking week that begins on Wednesday. The marketing teams want outside validation by Internet auditing companies to corroborate their results.
  7. Monitor. The click counters move up more slowly than expected until the weekend when heavy surfing begins. By the second week, they are tracking over-the-top on one site with 240,000 clickthroughs. Three others are above the marginal goal of 20,000 per week. Unfortunately, the other six are lagging far behind minimum expectations. They decide to continue the trial with all 10 ads in random rotation, but they immediately begin investigation of surfer behavior in and out of the six low-running sites. By the fourth week, the six sites are still below the oops threshold.
  8. Evaluate. By now they have already looped back to investigation and must decide whether to pull the plug on the nonperforming ads. Someone on the investigation team had a breakthrough insight: might it be possible that a high proportion of the people that frequent the WeHaveItAll site were already frequent users of the six low activity sites? This would mean there are many fewer people who would be attracted to a new promotion for these sites. Sure enough, this turns out to be true of four of the six sites.

    What about the other two? By looking backward in the clickstream for patterns, they find that many people come from a consistent set of sites, in the same order, into WeHaveItAll.com and on to one of the two remaining sites. The conclusion is that these folks followed a Web-ring and did not come to these sites on their own. This was not an independent affinity decision.

    At this point, one ad is making five times the money they expected. Three are in the green. The remaining six are now known to be low performers.

  9. Act. Acme decides to continue the blockbuster ad until the activity drops below the marginal threshold. This is an expansion of the original plan. The three other performers are continued to the end of the trial. The under-performers are terminated at the end of the four weeks.
  10. Recycle. The results of the first cycle reveal that targeting must be refined in two ways. First, eliminate any sites known to be in a Web-ring with WeHaveItAll.com. A side implication is that Acme must research means of identifying Web-rings. Second, they must add a new factor to the selection process: deselect sites where the consumer is already a heavy user of both WeHaveItAll and the associated site. The trick is to define the frequency of returns to the associated site that is likely to make consumers less responsive to a new inducement. More investigation is warranted.
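The monitoring mechanism in step 5 above can be sketched as a stream filter feeding per-site counters. This is a minimal illustration, not Acme’s actual implementation; the event format and field names ("type", "source", "target") are assumptions of this sketch:

```python
from collections import Counter

def filter_clickthroughs(events, source_site="WeHaveItAll.com"):
    """Keep only clickthrough events originating from the target site,
    mirroring the filter Acme places directly on the message stream."""
    return [e for e in events
            if e["type"] == "clickthrough" and e["source"] == source_site]

def clickthrough_counts(clickthroughs):
    """Tally clickthroughs per destination site; these counts would feed
    the daily counters and trend lines on the portal home page."""
    return Counter(e["target"] for e in clickthroughs)

# A few sample events to exercise the filter (invented for this sketch).
events = [
    {"type": "clickthrough", "source": "WeHaveItAll.com", "target": "books.example"},
    {"type": "pageview",     "source": "WeHaveItAll.com", "target": "books.example"},
    {"type": "clickthrough", "source": "other.example",   "target": "books.example"},
    {"type": "clickthrough", "source": "WeHaveItAll.com", "target": "stocks.example"},
]

counts = clickthrough_counts(filter_clickthroughs(events))
# counts == Counter({"books.example": 1, "stocks.example": 1})
```

A production version would run incrementally against the live message stream rather than over a list in memory, but the select-then-tally structure is the same.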

One other high-value opportunity area remains for new investigation. What is unique about the one super performer? Is it just something uniquely compelling about their offering? If this is true, there may be little Acme can do to identify other hot possibilities. However, what if there is something identifiable about the pattern of behavior of WeHaveItAll.com customers that will predispose them to a larger class of related sites? If this is the case, then Acme has identified a hot track to pursue.

The real-time implications of closed-loop analysis should be apparent. Batch processes with delayed collection of data and many physically segregated stores of data are significant barriers to quick and responsive action. It is actually far easier to collect data from key sources as it is generated than it is to detect and collect it from application data stores after the fact.

You can exploit CLA techniques with more traditional data sourcing and staged data warehouse data. However, I would not want to do it without an enterprise portal. The portal acts as the integration layer for the documents and data from each stage in the process. It is the single point of access to the tools, applications and applets used along the way. The plan and interim conclusions (in document form) can be viewed side by side with the dynamic monitors and results from previous cycles.

One of the most effective ways to exploit portals within an enterprise is the visceral, tangible and dynamic integration of an extended business process. People from multiple disciplines work together toward common goals in the same virtual environment. Closed-loop analysis is an active, information-intensive process that benefits most from being supported by an enterprise portal.

In closing, this column highlights an Internet-style company because it demonstrates CLA in a fast-paced, information-rich mode. These techniques are being applied to supply chain management, product positioning, development of niche markets in several industries, portfolio management and many more situations. There is likely an opportunity to exploit closed-loop analysis where you work.

 


 

Michael Haisten, vice president of information architecture services at Daman Consulting, is considered one of a handful of visionaries who have helped shape the data management industry. He has accrued more than 22 years of leadership in information management and architecture development. Haisten served as chief information architect at Apple Computer Corporation, where he developed the data warehouse management team. He has been chief architect, designer and technical expert for more than 72 data warehouse and decision support projects. In addition, Haisten is the author of Data Access Architecture Guide and Data Warehouse Master Plan and has published extensively on data warehouse planning, data access facilitation and other key aspects of data warehousing. He may be contacted at mhaisten@damanconsulting.com.

 
