||View Job Listings|
||Post a job|
Ask the Experts
Welcome to DMReview.com's Ask the Experts moderated by Sid Adelman, president of Sid Adelman & Associates, a consulting firm specializing in data warehousing and strategic data architecture. Ask the Experts allows business intelligence and data warehousing professionals to ask questions of industry leaders including Adelman, Majid Abai, Les Barbusinski, Steve Hoberman, Chuck Kelley, Danette McGilvray, Larissa Moss, Joe Oates and Clay Rehm.
Ask the Experts has been a long-running feature of DMReview.com. For your convenience, we have an archive of the questions which have been answered. Be sure to check the Question Archive before submitting a new question because one of the experts may have already answered your question. It can take up to eight weeks to post answers to new questions received. Please note, the "experts" do not answer every question, especially vendor-specific questions. You are welcome to post any type of questions on the forums section of www.dataWarehouse.com to get a response from other professionals in the field. Thank you for visiting Ask the Experts.
Meet the Experts
Ask a Question
(Names of individuals and companies will not be used.)
Ask the Experts Home
MOST RECENT QUESTIONS & ANSWERS
(Posted January 3, 2006)
We are in the process of building a data warehouse. We would like our project to be one our number-one priority and need to come up with a business plan. I have been asked to explain 1) urgency of need 2) elimination of risk. Can you help?
||Sid Adelman's Answer: Urgency of need - Research what others in your industry are doing, especially your competition. Nothing motivates senior management more than the fear that your hated competitors are doing something better or smarter than you are.
Elimination of risk - You will never be able to eliminate risk, but these are some of the things you can do to minimize that risk:
1. If the mission and the objectives for the DW have not been defined:
- Identify the sponsor of the DW.
- Insist (strongly recommend) that the mission and objectives be defined prior to any serious activity.
- Develop a straw man for the mission and objectives and propose it to the DW sponsor.
2. If the mission and objectives of the DW do not map to those of the enterprise:
- If there are no explicit enterprise objectives, there are probably assumed objectives to which most people in the enterprise would subscribe. These should be documented and mapped to the DW objectives.
- If enterprise objectives exist but the DW does not support them, rethink what you are trying to accomplish with the DW.
3. When there are problems with the quality of the source data (there always are):
- If the quality of the source data is unknown, use a quality evaluation tool to determine just how bad things are. Identify operational data with quality problems to someone high in the organization (perhaps the CIO), and then step back (it is not your responsibility to clean up dirty operational data).
- Because the quality of the source data will be highly variable, try to convince the user to implement the cleanest data first (this will sometimes work).
- If the user insists on putting dirty data in the DW, at least flag the data in metadata, indicating its level of quality.
4. To provide the skills to support the DW:
- Define the functional responsibilities of data administrators, database administrators, application developers and user liaisons. Define the skill levels required for each of these positions.
- Sell management on the need to have skilled people on the DW team.
- Sell management on the need to have these people sufficiently dedicated to the project.
5. To be sure you have an adequate budget in place:
- Compile industry publications, presentations, etc. that indicate what a DW will normally cost. Watch out for those who give figures for selected subsets of the effort or who disregard costs assigned to some other departments.
- Itemize each of the costs for your project. Don't pad the numbers, but don't underestimate just because you think the true cost will frighten management into paralysis.
- If the numbers are too high, consider a smaller project or one that does not require some big ticket items (such as a new DBMS, other expensive software, or major new hardware).
6. For supporting software (extract, cleansing, BI tools, DBMS, etc.):
- Understand the benefit of this software to the project. If it does not benefit this specific project, justification can only be accomplished if major follow-on projects will significantly benefit from its use.
- Quantify the costs of not using the software. These costs should include the additional effort to write the code, the ongoing costs to maintain the code, the costs of delay and the potential for reduced quality of the implementation.
- Identify only the software that can make a major contribution. Avoid recommending a piece of software that is fun, leading edge and a resume enhancer, but does not significantly make a contribution to the project.
7. Focus on the source data:
- If source data has been neither inventoried nor modeled, it is probably because IT management does not recognize the importance of these activities. Any such recommendations would probably be seen as delaying the project. In fact, the inventory and modeling effort is long and laborious. If management has not already recognized their benefits, it's unlikely that the DW project will sell it. The DW should not be used as justification for data modeling or for inventorying the data.
- If a data modeling tool is in place that has reverse engineering capability (the ability to take database definitions [DDL], capture them in the data model encyclopedia and generate rough models), this reverse engineering could be the least costly and most acceptable course of action.
8. To be sure you have a strong, well-placed, reasonable user sponsor:
- Take your time. Make a list of sponsors that match the above criteria, and put the strongest ones on top. Research their decision support requirements, and determine which problems could be well-served by the DW. Invite #1 to lunch, sell that user on the DW, outline what would be needed from them and from their department, and ask for their sponsorship.
- If #1 is not agreeable, invite #2.
- When you are down to The User from Hell, stop and do something else.
9. Focus on the primary business users of the data warehouse:
- If your users are not computer literate, budget more money for user support.
- Allow more time for the expected volume to be achieved. Readjust your expectations.
- Revamp the training so as not to frighten the students.
- Provide mentors in the training process.
- Develop a more comprehensive set of predefined queries.
- Choose an extra-user-friendly front end (choose warm and fuzzy over power and function).
10. Address the users' expectations for the DW:
- Be honest. Don't misrepresent what the users will be getting, their required involvement, the costs or the schedules.
- Never, never, never be coerced by anyone to accept unrealistic time frames or budgets.
- Document what the users will be getting and when (some installations ask the users to sign this document).
- Continue to remind the users of what they will be getting and when.
- If you have a user who is unwilling to accept your estimates, give someone else the opportunity to work with that user.
Tom Haughey's Answer: First, I would strongly suggest starting by getting business managers to specify their goals, such as reduce customer attrition by x percent this year, decrease disability costs by y percent this year or improve customer cross-sell by z percent each year. You want to make sure your warehouse eventually supports these goals by storing the necessary data. A well-known practice is to ask managers for their top-ten questions or what answers they most need. Initially focus on big-ticket questions - questions that would have a significant payback. Try to get them to put a value on the answer to each question. Look at it this way: "If we give you the answer to this question, what is it worth to you?" See if they are willing to put a value on it. Nobody has to commit to the number but it makes the question tangible. For other examples, "If we could reduce customer onsite service calls by one call per customer per year, we could save x dollars (the cost of one visit per year times the total number of customers)." Or, "If we could anticipate potential disability accidents and thereby reduce disability payments by 1% per year, we could save xyz dollars." In fact, maybe they could even pay for the warehouse in one year!
Second, I usually create the DW strategy using two means: a simple process and a simple framework. The simple process consists of five steps. The time to complete these steps could be two to four months, depending on size, complexity and resources:
- Define goals: what business goals you need to achieve.
- Define future vision: what the warehouse will look like in the long term.
- Assess current state: what it looks like now.
- Determine gaps: find the holes that need to be fixed.
- Formulate a plan (perhaps over three years, to go from where you are to where you need to be).
The simple framework is called BIAT (business process, information, application and technology). Here is a description of BIAT:
Business Processes: Gross business processes or collections of elementary activities (e.g., implement campaign, accept customer contact, change customer data).
Information: Primary entities and, in some cases, subject areas (e.g., subject, product, purchased product, prospect). The data flows across the business.
Application: Collections of implemented processes and procedures.
Technology: Technology types (e.g., messaging, relational DBMS, etc.).
Collect information on BIAT during each of the five steps above.
In my view there is one major way to avoid risk - don't do anything risky. More seriously, as I have said in other places many times, there are three major ways to eliminate risk once you have gone through steps 1 through 4 as mentioned: namely, incremental delivery, delivery in short intervals and prototyping. [See more details in my response to this month's question on "the relevance of a work breakdown structure in a data warehouse project."] Long-term promises and far-out schedules are risky. Deliver short-term results but on a stable base; in essence, this means on a broad stable database and a robust platform - but (and this cannot be said too often) built piece by piece. When you test, test end-to-end. A warehouse has three major processes: gather, store and delivery. Test all three fully. For example, we need some aggregates. We design them and put them in. Queries run faster. However, creation of the aggregates blows our batch window by 15 minutes and the warehouse is late! End-to-end testing will discover this.
There are other factors. Do vendor evaluations (even conduct shoot-outs among vendors) and work only with dependable vendors. Look up their stats, including their financial stats. Keep the number of vendors and technologies to a minimum. Don't be afraid to get the vendors to contribute some (non-billable) assistance to your strategy project. There are no longer dozens of vendors to pick form in each area, such as ETL (extract, transform and load), BI (business intelligence) and DBMS (database management systems). You can find vendor evaluations in Gartner and all the other broad industry consultants. They can help you with your thinking, but I would not use them for the detailed work. They are very expensive for one thing. In choosing products, look not only for ease of use and performance, but also and very importantly for scalability. Can you add to the existing structure as needs grow? You want your DW to be a success, but you want it to survive it. If the DW becomes such a success that it cannot sustain the workload, then you have to re-platform, which will be costly and time-consuming.
Get some help from consultants who have been there and done that. As a manager, I preferred to use different opinions rather than rely on one large consulting vendor. Avoid those that seem to have obsessive attitudes about you must do this and that. If you are looking for a long-term solution, I would avoid deploying a lot of data marts as your main strategy, especially independent data marts. Data marts will play a role, and you might start with a data mart as your first project. However, the more cross-functional the information you need, the more centralized your solution should be. Consider the painful lessons learned from companies today whose primary DW strategy is data mart consolidation! Look also for those experts who have had success in helping other companies be successful along the same lines as your plan.
Danette McGilvray's Answer: Urgency of Need. The urgency of need depends on your business situation and why you are building the data warehouse. Answer the following questions:
- Who will use the data warehouse?
- What kind of information will be provided?
- What questions is the business asking and how will the information from the data warehouse help answer them?
- Who is asking those questions? This should identify the people in your company who will benefit from the data warehouse.
- Why are those questions important to the business?
- How are those business questions being answered today? How long and how many people does it take to answer them currently?
- How can the data warehouse improve the ability to answer the questions accurately and reduce the amount of time to assemble the information? Quantify current time and resources and compare to estimated time and resources with the data warehouse.
Research the answers to these questions. Then create the story or stories that explain the situation and shows the business need specific to your company. For instance, it comes as a surprise to many managers that what seems to them a simple request takes many people several days to produce. It's not unusual to hear that the request for a list of the company's top-ten customers along with what was purchased in the last six months requires four people one week's worth of time to prepare the report. The actual effort is often hidden from those requiring the information.
Elimination of Risk. As with any project, you can never eliminate risk when building a data warehouse. However, you can lower your risk through proper management of specific potential problems. To manage your risk:
- Identify all potential risks from your project information and compile into one list. Risks typically fall into the categories of risk associated with scope, schedule, resources and quality. One area already known to be high-risk in a data warehouse project is data quality (i.e., poor data quality in the source systems being integrated into the warehouse).
- Analyze the risks by assessing the impact to the project of each risk (e.g., high, medium or low). Assess the likelihood that the risk will occur (high, medium, low or a specific percentage that estimates the probability of occurrence for each risk). Determine if it will be easy, difficult or not possible for the problem to be detected in advance. For example, poor data quality will have a high impact on the warehouse and has a high likelihood that it will occur. Detection of the quality problems will require specific action to be found in advance.
- Prioritize the project risks using the results of your analysis. Manage only those risks with the most potential to damage the project. Data quality should be prioritized as a risk to be managed.
- Manage those high-potential risks by: 1) preventing the causes where possible or 2) developing contingency plans to deal with the effects. Incorporate the prevention tasks or the contingency plans into your overall project plan. For example, you can prevent many problems associated with data quality by incorporating data profiling tasks early in the project plan and using the results to do clean-up in source systems where possible or develop appropriate transformation rules.
- Continue to reassess the risks throughout the project.
Clay Rehm's Answer: Your business plan and urgency of need must be authored and presented by your business partners that are in such dire straits because they don't have a data warehouse. If they are really having a problem, it should be fairly easy to identify the cost and benefits to be realized.
Your project will only get number-one priority if you can prove that your organization cannot function without one. Thoughts to keep in mind include: Can the warehouse reduce expenses? Can it produce revenue that did not exist before? How can your data warehouse impact the bottom line of your company? If your data warehouse cannot save or create money, what can it do? What is the purpose and goal of it? What will be the long-term goals of having it or not having it?
I am not sure that any project can completely eliminate all risk. However, I would list the known issues and risks and how you may mitigate each one of them. Keep in mind that one way to address risk is to not have a solution at all for it.
(Posted January 3, 2006)
What roadmap should I use for an insurance system data warehouse?
Sid Adelman's Answer: The roadmap for insurance is not that different than for other industries. Pick up Business Intelligence Roadmap by Larissa Moss and Shaku Atre. However, it is very important to understand the types of DW applications that are going on in the insurance industry. These are some of them:
Insurance Company 1: Established subject areas (i.e., claims, marketing)
- Incorporate both internal and external data (i.e., information on competitors and the insurance industry trends).
- Forecast and monitor changes in the industry, thereby allowing better positioning in the marketplace.
- Identify characteristics of profitable business.
- Analyze information related to retention of business at renewal including patterns of customers who do not renew, determine reasons why and resolve issues that will assist in retaining valued clients.
Insurance Company 2:
- Accurate, consolidated view of customer portfolios.
Insurance Company 3:
- Analysis of profitability by customer, product, geography and sales hierarchy.
- Analysis of sales offices for profit and loss.
Insurance Company 4:
- Analysis of quality of new business.
- Premium uplift.
- Catastrophe exposures - which customers are at risk.
- Early-term claims by ZIP code.
- Direct response analysis - customer responds to a coupon offer.
- Survival modeling - how long a customer will stay with you.
Property and Liability Insurance:
- Data mining review of claims by actuarial department.
- Analysis of valid VINs on insurance cards.
Workers' Compensation Insurance:
- Recommend health insurance deductibles.
- Analysis of claims by the employer, cause of injury, body part injured and the percentage of employees who have suffered similar injuries.
- Fraud analysis using data mining.
- Provider evaluation.
- Physicians profiles.
- Length of hospital stay.
- Procedure evaluations.
- Provider abuse.
- Impact on subscriber services and cost.
- Employee costs per employer.
- Service usage.
- Fraud detection.
- Actuarial analysis.
- Patterns of insurance usage.
- Fraud detection - Searching for claims where the service has not actually been provided. Looking for patterns that would suggest further inquiry into the claim.
- Abuse detection - Searching for patterns indicating that certain providers are performing unnecessary procedures, prescribing expensive medication where a less expensive drug would be as effective, performing unnecessary tests and keeping a patient in the hospital longer than necessary.
- Analysis of claims.
- Providers analysis.
- Reporting to groups, government agencies, trade associations.
- Analysis of quality of care and costs.
- Marketing managed-care contracts.
- Financial analysis (actual expenses vs. planned expenses).
- Profitability of manager care arrangements.
- Capitated contract performance.
- Fraud analysis based on the provider's healthcare specialty and the geography of the claim.
- Sales and marketing.
- Provider profiling.
- Government reporting.
- Integrating pharmaceutical information with medical claims.
- Cost analysis by patient demographics and geographical distribution.
- Cost analysis by provider, provider specialties and treatment protocols.
- Analysis by diagnosis/prescription.
- Generic/brand name drug comparisons.
Tom Haughey's Answer: I'm not sure that the roadmap for an insurance data warehouse would differ significantly from the roadmap for any warehouse. I offer first a general roadmap, as follows:
- Parallel planning, design and startup.
- Scope definition (define functionality over time).
- Ends with install of live DB with base data.
- User team tests and explores data, its usage and user interfaces.
- ETL team tests content and quality.
- Database team iteratively tests physical database.
- Ends when time-boxed targets are reached.
- Traditional production rollout.
- User training and preparation.
- Ends with install of new release.
This can be viewed more broadly and in more detail as follows:
1.0 Establish the DW Program:
- 1. Develop the business case.
- 2. Define the opportunities.
- 3. Define the DW architecture.
- 4. Create the DW program.
- Future vision.
- Current environment assessment.
- Gap analysis.
- The migration plan.
2.0 Plan the Project:
- 5. Identify the project tasks.
- 6. Establish the infrastructure.
- 7. Acquire the tools.
- 8. Collect the team.
3.0 Initiate the Database:
- 9. Gather user requirements.
- 10. Create the logical data model.
- 11. Identify the data sources.
- 12. Create the physical model.
- 13. Collect source data.
- 14. Populate the database.
4.0 Prototype the System:
- 15. Repeatedly test and revise the database design.
- 16. Have users interact with the database.
- 17. Tune design and add aggregates, indexes.
- 18. Plan production increments and migration.
5.0 Implement the Release:
- 19. Prepare for incremental release.
- 20. Train the users (very, very important).
- 21. Initiate support processes.
- 22. Migrate to production.
6.0 Enhance the DW
- 23. Manage the DW inventory of requirements.
- 24. Coordinate with business needs.
- 25. Market the warehouse and spread the word.
- 26. Repeat.
Clay Rehm's Answer: By "roadmap" I assume you mean a methodology or process to build a new insurance data warehouse.
Because your subject area is insurance, you must have people on your team that are subject matter experts (SME) at insurance - and not only insurance, but also understand how your insurance processes work. Additionally, you need team members who don't understand insurance but understand how to build a data warehouse. These folks will ask the questions that may seem obvious, but they will challenge your current business processes.
Regarding the roadmap, there are many schools of thought (i.e., Kimball, Inmon, etc). The path you choose will be the one that makes the most sense for your organization, in terms of culture, size, talent, resources, etc. No matter what path you choose, it is wise to start with a small pilot to prove the need and benefit of the data warehouse. Choose something easy and small that can be done quickly with minimal resources. Make sure you involve your business partners from day one and keep them involved and possibly in charge of the project.
(Posted January 3, 2006)
How does data warehousing apply to machine monitoring information? The automobile company I work for collects machine states and events real time and stores this information for later analysis for productivity improvements. After a year's worth of data is collected, the data is purged. What value can data warehousing provide?
Chuck Kelley's Answer: Keeping a long-term view of the states and events of your machines provides the ability for lots of different types of analysis. Remember that once the data is purged, it is lost. You might use a longer-term view for analysis of productivity improvements and/or quality analysis. For example, if you found that a change to widget A was needed and you did it (along with forecasting the expected results), then after collecting the new widget A data you can compare how much you have gained. Then you may make other improvements. You might be able to understand the effects of changes over the rest of the machine. Sometimes, there is an unexpected loss in one machine due to a productivity performance of another.
Tom Haughey's Answer: Data warehousing is an environment for gathering data over time (even long periods of time), storing that data in a reusable database and delivering it as information to business and IT users. A data warehouse has three major processes: gather, store and deliver. It is used for analyzing that data and reporting against it. While data warehouses store data over time, some warehouses (soft drink distributors are an example) might roll off data after two to three years. Data warehouses should be used to report on what happened (stage one), analyze why did it happen (stage two) and predict what will happen (stage three).
I suspect that machine state and events data could be used to determine the productivity and quality of people, processes, workstations and products. However, even decades of machine data alone is only of so much value. I propose that this data needs to be enriched with other data to see its full potential. It could be enriched by adding data from other business processes, adding more dimensional data (perhpas about employees, plants, processes, products) and even perhaps buying data from the outside.
I would talk to other mangers in your business area and related business areas, such as human resources, benefits, disability, manufacturing, product quality. Then ask them, "What are the top-ten questions you need answered?" [See my answer this month on "Risk."] Get them to be as tangible as possible in their answers - 10 different answers, from each manager if possible. Then ask, "What is this worth to you?" Add it up. [Again, see also my answer this month on "Risk."] Little by little, this will allow you to begin to form a picture, from which a strategy can eventually be derived. The thing to do is begin to ask management what answers they need and what value those answers are to them.
Consider this example. How much disability do you pay? Surely and unfortunately, people get injured around machines. They have to spend time with doctors, hospitals, occupational therapists, etc. This costs a bundle. Suppose you could reduce disability by one, five or 10 percent. What would that be worth to you? How would you do that? By analyzing current disability incidents and patterns, then finding employees or situations that have those characteristics (and are thereby likely to end in disability), and seek to address them in advance. I wouldn't be surprised if in large auto manufacturer you could pay for the warehouse in one year by reducing disability payments one percent.
In addition, I suggest focusing on other processes that relate to your machine data. Once enriched it could be used for many other purposes, such as HR. It could be used to determine qualities of good employees, the characteristics of good managers, disability understanding and avoidance (as we saw). The main closely related business areas I would look at would be productivity, HR, product quality, vendor quality, manufacturing and logistics.
(Posted January 3, 2006)
What is the relevance of a work breakdown structure in a data warehouse project? Can you give me an example?
Tom Haughey's Answer: It is often said that warehouses are different. In some ways, they are. Requirements are less tangible and predictable than in OLTP (online transactional processing). Queries are very data intensive, involving few or many tables, but with many, many rows. In OLTP, transactions are data selective, involving few or many tables and comparatively few rows. Metadata is always important, but in OLTP the meaning of fields is predetermined on a screen or report. In a warehouse, metadata (in some way, shape or form) becomes critical because you can ask any question. However, warehouse projects are real projects and, like all development projects, must be managed. To manage them, they need to follow a clear plan. In my experience, warehouse managers often have a more difficult job than those managing OLTP projects because there are so many pieces and sources to manage. Managers that don't follow a good plan typically have major problems, such as lots of rework before delivery, lots of surprises or even eventual failure. Two purposes of the work breakdown structure are to manage work and to ensure success. This is the same as in any project.
However, warehouse projects are unlike typical waterfall projects in that they are based on a RAD (rapid application development) approach. Three of the main principles of RAD are as follows:
- Iteration: Division of work into small increments.
- Timeboxing: Delivery of capability in short intervals. The first release typically around three to nine months (depending on complexity) and quarterly releases thereafter.
- Prototyping: Early delivery of a prototype. Deliver working database one-third of the way through.
(Incidentally, most RAD projects I have observed follow an essentially waterfall process within a given increment. It's sort of like a bunch of smaller waterfalls!)
These three principles are very important because even the best of warehouse plans will invite failure if you ignore them. An example of a failure waiting to happen, even with a fully detailed plan, is a large data warehouse that will gather all requirements upfront and where the warehouse will be delivered all-at-once in three years. It is not the "large" that is the problem, but the "all requirements upfront" and the "all-at-once in three years." Even enterprise data warehouses are delivered piece-by-piece using these three (and other) principles. The feedback you can gather from increment to increment is critical to the success of the future increments. There are many examples of successful enterprise warehouses, but they have all adhered to such principles. [See my answer this month to " An Insurance System Data Warehouse" for an example of a work breakdown structure.]
Danette McGilvray's Answer: A work breakdown structure (WBS) is relevant for any type of project, and a data warehousing project is no exception. A work breakdown structure is a list of project tasks along with deliverables and owners responsible for ensuring the tasks are completed. (The owner may or may not be the person actually executing the task.) The WBS is sometimes known as a project plan, a workplan or a task list.
The work breakdown structure provides a foundation for a realistic project schedule and helps the project team breakdown the deliverables into manageable tasks that can be tracked. By identifying the tasks upfront, it helps reduce surprises about activities, time and resources needed for the project. An additional benefit comes in the form of greater commitment when team members are included in creating the work breakdown structure.
The goal of the task estimation process is to determine the tasks needed to complete the project, estimate effort and create a draft schedule for completion.
When developing the initial task list and timeline with a project team, it is helpful to use a facilitator. The facilitator expedites the process so the project manager and team members are free to focus on determining the actual tasks and effort needed. Depending on the size and scope of the project, you may have more than one project team creating their own project plan for their specific deliverables. The overall project manager then brings together the plans into a master plan. The approach below works whether you have one team or several teams.
Caution: Do not expect this plan to be set in stone. The plan will continue to change as the project progresses; new information is available; scope, resources and priorities change; deliverables are (or are not) completed on time, etc. The process of estimating and modifying your plan should be repeated many times throughout the project. Even the initial plan could take several iterations before you have gathered enough information to create a useable plan.
I will briefly explain a process I have used successfully for many projects.
Determine the general approach, high-level phases and the methodology to be used in the project.
- This is usually done by the project manager and team leads. You may be using a methodology provided by a vendor, other experts or creating it yourself.
- For example, high-level phases might include plan project; gather data requirements; develop data model; design and develop physical database; source, profile and map data; extract transform, load data; etc.
Gather the project team.
- For each high-level phase in the project, follow the steps below to develop your plan.
- One technique is to use large sticky notes or index cards and masking tape. Have team members write tasks on the sticky notes and place on the wall or white board. Use one sticky note per task. It is easy to move the notes as you organize the order. Each sticky note will include as much information as you have at the time: task, owner, time estimates and dependencies.
- Once the manual work is done, transfer the information to your project tracking software (such as Microsoft Project).
- If you already have a fairly detailed methodology, you can enter the tasks into your project tracking software prior to the planning meeting. Breakdown or create additional tasks by entering the information directly into the software.
- Caution: It is easier for the team to understand the project as they see it develop visually using the manual process. It is more difficult when creating tasks and estimating times using only the software.
Benefits from creating the timeline manually first with the project team include: 1) tasks, effort and dependencies are visible to all team members, 2) team has a greater understanding of and commitment to the project, 3) team members have an opportunity to get to know each other and set the foundation for working together. This is particularly important if the team is dispersed geographically and will not be face-to-face throughout much of the project.
Develop task list and description.
- Review the project and business objectives, constraints and the high-level phases.
- Determine tasks needed to complete each phase.
- Keep the description simple. A verb-noun form works well (e.g., interview users, document requirements).
- Continue to decompose the tasks until they have rough durations of two to 20 days.
- Break down the tasks only to the level of detail that you are willing to track.
- Include key checkpoints or milestones as tasks to be completed. A noun-verb form works well for milestones (e.g., requirements completed, data model completed).
Identify owner for each task.
- For each task, identify a single owner who will be responsible for the task deliverable.
- Other people may help with the task, but the owner is the one accountable for ensuring the task is completed on time.
- Time can be estimated as effort and/or duration. Effort is the time needed to complete the task. Duration is the elapsed time in number of workdays before the task will be completed. For example, it may take 20 hours of effort over a duration of two weeks to complete a task.
- Determine which type of time estimate to use. It may be useful to capture what you know about both types while building the initial timeline.
- Some techniques for estimating time include using historical data, actual experience, experts, rules of thumb, known formulas, segment tasks into more detail or actually do part of the task, keep track of the time and extrapolate your estimate.
Note any dependencies.
- Describe any constraints or dependencies that impact starting or completing a task (e.g., data model must be completed before physical database is developed).
Organize the tasks.
- Place the tasks on the wall into a rough order of dependence. Group associated tasks into major categories of work.
- Filter the tasks for any duplicates.
- Number the tasks. The high-level phases could be 1, 2, 3, etc. Tasks under each phase would be sequenced as 1.2, 1.3, or 2.1, 2.2, 2.3, etc.
Create a timeline.
- Write time periods across the white board.
- Use the task time estimates to determine the earliest start and finish dates for each project task.
- Arrange tasks according to time needed and dependencies.
- Link the sticky notes on the board by drawing in the dependencies.
- Transfer the information into your project software.
- Transfer tasks, time, and dependencies into your project planning tool. The software will help you create a timeline with due dates factoring in resource availability.
Use your project plan to track progress. Review and modify your estimates and keep your project plan updated throughout the project.
Larissa Moss' Answer: A WBS is a list of all potential tasks that may be relevant to a DW project. Having access to a WBS, the project manager does not have to start with a blank page and recall from memory what tasks to perform on his/her DW project, but can refer to the list of tasks on the WBS. The book Business Intelligence Roadmap (a DW methodology, ISBN 0-201-78420-3) has a CD in the back with such a WBS (920 tasks in MS Project).
Joe Oates' Answer: The work breakdown structure (WBS) reflects the hierarchy of tasks that must be done in a data warehouse implementation project plan. In my opinion, a good project plan is critical to the success of a data warehouse (or any other software) project. It provides the crucial planning and estimation answers to: 1) what tasks must be done, 2) who is going to do a task, 3) when is the task going to start and when will it finish, 4) how many person hours will the task take, 5) dependencies on tasks that must be finished before a task can start, 6) the sequencing of the tasks and 7) how much is the project estimated to cost.
A good project plan is a very important management tool, because once you do a project plan, you don't just put it on a shelf and forget about it. You use the initial complete version of the project plan as a baseline to see how close your original estimates were. As each task is completed, the project manager should update the project plan with the actual start and end date, who worked on the task, how many person hours it actually took and how much did it actually cost.
Additionally, most project management software will initially provide the project manager with a critical path. The critical path is the sequence of tasks that determine how long the project will take to complete. Unfortunately, it is very rare that the initial critical path holds through a project. As problems come up and individual tasks are delayed, the critical path will change.
Unless the project manager provides tasks at a sufficient level of detail, the critical path will not be accurate. The reason that it is important for the critical path to be accurate is that these are the tasks that should receive management and resource priority. If the tasks are in sufficient detail and are sequenced properly, it is possible to keep a task on the critical path from taking longer than it should.
A properly prepared and maintained project plan provides critical feedback to help management adjust resources to complete the project efficiently. A project plan can also help to manage expectations for customers and management sponsors.
Clay Rehm's Answer: A work breakdown structure (known as WBS) is essential for any kind of project. It describes the work to be done; that is, it defines the scope of the project or phase. It does not describe who is doing it, when it will be done or what dependencies are required.
This is a deliverable that can be shared with every member of the team because it explains the work that is to be completed to meet an agreed-upon goal.
The beauty of it is that it can be as high-level or detailed as you want it. Obviously, the more detailed the better, but you may not have all of the details when the project starts. You will refine this as the project gets underway, and then you will need to freeze it at some point so you can manage change control.
(Posted January 3, 2006)
Is there any database management system (DBMS) which supports temporal databases?
Chuck Kelley's Answer: None that I am aware. However, there are lots of research papers and articles that discuss how to implement temporal within relational databases.
Tom Haughey's Answer: I am not completely versed on the new vendor products in this area but here is my general thinking. Currently, there are two vendor approaches to temporal databases: build a new DBMS or extend an existing DBMS. In terms of extending an existing DBMS, there are two approaches: extend a relational DBMS (RDBMS) or extend an object oriented DBMS (ODBMS). RDBMS are very oriented toward processing high volume, short-lived transactions, such as you see in financial transactions, whereas ODBMSs are oriented more toward complex objects and long-lived transactions, such as you see in GIS and spatio-temporal applications. The ODBMSs in my opinion are not capable of handling large volumes of transactions such as you seen in order processing or financial processing, or large volumes of queries (involving huge amounts of data) such as you see in data warehousing. That is the domain of the RDBMS. On the other hand, because the cost to build a new DBMS is huge, in the short-term I would look to existing DBMSs and to temporal extensions in the DBMS and SQL that existing major vendors have to offer. Each of the major vendors (IBM, Oracle, Sybase, etc.) have some offerings in these areas.
The design of temporal databases using conventional DBMSs is becoming more and more prominent. Even though technical features to support temporal data already exist in many products, it is my experience that they are not frequently used in mission-critical applications. Typically, time is handled directly in the design in one of several ways: by qualifying existing tables with time or by setting up separate tables to collect history. Qualifying existing tables means not just adding time but making time part of the natural key (whether you use a surrogate key physically is another matter). Setting up separate tables for time can be done in both OLTP and in data warehousing. Developers then handle temporal operations through SQL or a BI query tool. A true temporal DBMS has time as an inherent part of the architecture of the product and supports a fully temporal version of SQL. See the writings of Richard T. Snodgrass and Chris Date for more details on these. Snodgrass has been writing on this subject for years. Fasten your seat belt before you read them (especially Date) because it is not bedtime reading.
Given that most data warehouses, data marts and other BI applications are temporal in nature, one would have predicted a much greater use of advanced DBMS capabilities today. There are several reasons that temporal DBMS extensions haven't been used more widely. One is that it is not always a safe bet to be on the leading edge of such technologies due to the risk involved. Second, we have established practices for handling time/audit in existing OLTP applications and for dimensioning time in data warehouses. In addition, both RDBMS and multidimensional DBMSs (MDDBMS) can handle data warehouse history needs very well. In these, time becomes one of the dominant dimensions, even in the form of one or more calendars. Facts are inherently historical. As new facts happen, including changes to existing facts, new fact rows are entered. The time dimension, and time varying dimensions, works well for regularly scheduled data loads in which the data warehouse is refreshed daily, weekly or even monthly. It can even be used to support rolling summaries in the warehouse. Recently, there is a trend toward near real-time warehousing and to operational data stores. In these, feeds occur at very short intervals or even nearly synchronous with operational transactions and are often done through message brokers rather than standard ETL. In these situations, there might be a need to keep some amount of history where updates are volatile and unpredictable, which cannot readily be represented using a dimension table. This will be the specific domain of temporal databases, when they fully arrive.
One last point. Time-series data is another temporal subject. History is usually considered as the random occurrence of heterogeneous data and events, such as you see in customer changes, customer orders and order changes, etc. A time-series, on the other hand, is the regular collection of specific attributes at a regular interval. Though the data warehouse is sometimes called a "time-series," in reality is it partly a time series and partly history. Aggregates and periodic statuses, both important grains, represent time-series, such as customer by product by month, or policy monthly status. The changes to an order or to a customer are more historical; they can happen in any way at any time. Look at it this way. Every day every stock on the NYSE has a closing price, but you don't have to buy any stock on any given day. Closing prices are a time-series; stock trading is historical. Some existing DBMSs offer time-series extensions, plus there are specific time-series DBMSs. I believe that it is safer to consider temporal DBMSs and time-series DBMS as niche DBMSs, used for specific purposes, and to continue to use existing robust DBMSs and their extensions (together with established database design techniques) for mainstream applications.
Joe Oates' Answer: A temporal DBMS is one that comes with built-in time aspects, e.g., a temporal data model and a temporal version of structured query language. The goal of a temporal DBMS is to provide temporal aspects that usually include the concept of the time period in which a fact is true (valid-time) as well as the time period when the transaction was actually recorded and believed to be true in the database (transaction-time). These attributes are combined in a temporal DBMS to provide the ability to record bitemporal data which is the combination of both of valid-time and transaction-time.
I don't know of any commercially popular relational DBMS that supports the temporal aspects of a structured query language (e.g., "overlaps," etc.). The above concepts, while very important to BI analytics, were not important to transactional processing for which most popular DBMS systems were designed.
However, a properly designed BI database can simulate temporal DBMS features very well. The only caveat is that "normal" SQL statements must be used.
For example, consider if John Smith was a product manager for widgets from October 1, 2002, until December 31, 2004. Let's also say that the fact that John Smith started being the widget product manager was entered on October 3, 2002, with the "start date" recorded as October 1, 2002, and the "end date" with no entry because at the time that John Smith was appointed, no one knew how long he would be the product manager.
Let's also say that when John Smith became product manager for widgets the price was $12 per widget, and this was true until November 1, 2004. So, the original record for Widgets having the price of $12 had the "start date" recorded as July 1, 2001, which was entered on July 5, 2001, and the "end date" had no entry.
When the price was changed from $12 per widget to $13, it became necessary to update the above record by entering October 31, 2004, in the "end date." This update was recorded on November 7, 2004. Additionally, a new record was added for widgets that contained the new price of $13, the "start date" for this price was November 1, 2004, the "end date" had no entry, and the new record was entered on November 8, 2004.
On January 1, 2005, someone else took over as widget product manager from John Smith. Therefore, the record that recorded John Smith being the product manager for widgets will now have to be updated with a valid "end date" and the date on which the record was updated.
This simple example allows queries to be written that can tell us, among other things:
- When John Smith was the widget product manager and how long he held this position;
- That John Smith was the widget product manager when widgets were $12 (his time being the widget product manager overlapped the time in which widgets were $12);
- That anyone querying the database on November 3, 2004, would not have known that the price for Widgets had been increased to $13.
- What standard/guidelines should be implemented in the transactional systems to make the data business intelligence ready?
Sid Adelman's Answer:
- My locksmith distributes a keychain with the initials DIRTFT - Do it right the first time. That should be the standard (not guideline - guidelines are only honored when they are convenient). This means that the data should be captured correctly in the source transactional systems. The idea that bad data should be corrected in the ETL process is foolish. Look in the data quality archive section of this Ask the Experts forum and the books written by Larry English and David Loshin.
Tom Haughey's Answer:
(Posted December 12, 2005)
I don't know of any books or papers specifically on this but here are some practical ideas: capture time, capture measurement and goals data, improve quality and consolidate disparate sources. If a data warehouse is supposed to be a historical, integrated, read-only database of subject data, then make sure you can detect and keep history, that you have adequate quality data so it's worth reading, that you reduce the number of data sources and that you rationalize reference data in preparation for the warehouse. One way to proactively reduce the number of sources is when building operational systems to ensure that project teams reuse existing databases where possible instead of always creating "their own" new database and loading "their own" data into it.
The warehouse is based on time so time must be available for event or fact data (which it always is) and for reference or dimension data (which it always isn't). Of course, define your calendars; you may have several, such as a fiscal, periodic and Gregorian calendar.
Understand business goals and embed measurement of them into the operational data. First, you have to collect and understand the goals. Then you have to capture the data you need to perform the success measurement or metric implied in the goal. Here is an example. You want to measure customer order service levels, such as knowing how many potential orders never become orders. To do this, you need to know an order was commenced but later abandoned. It never actually became an order. Ensure you have the necessary base data to measure this.
The more integrated the data, the easier it is to use with business intelligence (BI). This is not easy and can be costly to correct, but seek to reduce redundant data stores. Find some way to consolidate data. Rewriting systems is a good way but is costly and is not the only way. Creating a system to integrate data from multiple, disparate operational systems is another, less expensive way. It is also, by the way, one justification for building an operational data store (ODS).
Improve data quality. What quality of data does the business require? Not every warehouse requires data that reconciles to the penny. Determine what your data quality requirements are and work progressively to achieve them. Find poor quality culprit systems and fix them, preferably at the source.
Finding change data is one of the real challenges in data warehousing. Change data is that data that has changed since the last warehouse extract and load. See what tools you have that will allow you to detect change. Is your DBMS capable of keeping an adequate log that you could examine for change data? See my earlier point on near-real time for more on this.
- I want to compare the top five ETL tools. What parameters should be considered?
Sid Adelman's Answer:
(Posted December 12, 2005)
- You should consider the following:
- Ease of use - some ETL tools are far more difficult than others, consider the learning curve.
- Support - How well the vendor responds when you have problems.
- Cost - Some of these products are expensive.
- Performance - This is especially important if you have large databases.
- Vendor stability - Will the vendor be around in 2007?
- Availability of training.
- Availability of experienced ETL jockeys to hire.
- Availability of experienced ETL consultants.
- Metadata capability of the ETL tool (if you are planning to use the ETL metadata repository).
- Metadata interfaces with your other tools (BI and modeling) that produce and use metadata.
- I am looking for cases on data warehouse and dashboard solutions. Can you provide any or give me links to resources that can?
Sid Adeleman's Answer:
- The Data Warehouse Institute (www.TDWI.org) and DMReview.com (you are already on their Web site) have “World Class” solution awards that document and publish these cases. Be wary, some of which are written by the vendors’ PR departments who are unabashed as they highlight speed to implement, delighted users and an astounding ROI. The DW vendors also have such cases that are unstinting in their praise for the vendors’ products.
Tom Haughey's Answer:
(Posted December 12, 2005)
- Go look at vendors and at vendor-independent sites. Vendors such as Teradata, IBM, Informatica, MicroStrategy, Cognos, Information Builders and others all have abundant case studies. Vendor-independent sites such as Gartner, TDWI, Yankee Group, etc. also have good case examples.
- What are the advantages of ETL tools over Pl/SQL loaders?
Chuck Kelley's Answer:
I look at this like I look at the use of assembly language versus C/Java/COBOL. You can write everything in assembly language, but it will be harder to maintain. PL/SQL (I assumed that you meant ETL to PL/SQL. SQL*Loader is equivalent to the “L” in ETL) can be used to do the “ET” part of ETL. But maintenance of PL/SQL is not quite as easy as maintain as ETL tools like Informatica or DataStage.
The one place where the ETL tools have a huge advantage is in metadata. Since I believe metadata is extremely important, I would lean toward using an ETL tool versus PL/SQL.
Tom Haughey's Answer:
(Posted December 12, 2005)
I presume you are referring to the full ETL process and not just loading. We’ll address both transformation and loading.
The fastest loaders are usually DBMS bulk load utilities. However, DBMS bulk loaders can create new tables and append to existing tables, but not update tables. If you need also to update the table during load, and have a limited batch window, products like OptiLoad for UDB are very good. I have not found SQL insert and update, or Import utilities, to have good performance for large data loads.
The advantages of ETL tools are as follows:
Retention of metadata. This is a major point because analytical applications are highly dependent on proper understanding of metadata.
Ease of use. Because most ETL tools are GUI based and have repositories, they have increased ease of use and ease of modification.
Built-ins. They have built in objects to handle recurring tasks such as aggregation, so these do not need to be coded and recoded.
Skill. Because of the above factors, the skill level requirements for ETL tools are less than with SQL.
Support. There is a large experience base to fall back on. This includes customer experiences and vendor support.
There is no such thing as a free lunch, so ETL tools have some disadvantages, among them:
Cost. They are costly and have large technology and space requirements.
Complexity. ETL tools may sometimes have difficulty with very complex transformation logic, as well as with complex staging requirements.
Performance. Because they are generic, and many of them are interpretive, there can sometimes be performance issues over SQL, for transformations.
MEET THE EXPERTS
Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts. His new book, Data Strategy, is scheduled for publication this year. He can be reached at 818-783-9634 or firstname.lastname@example.org. Visit his Web site at www.sidadelman.com.
Majid Abai is president of Seena Technologies Corporation, where focuses on high-level strategies, business intelligence and enterprise data architecture for clients. His two decades of IT experience have been primarily in planning, architecture and implementation of massive database systems in both operational and business intelligence arenas. Abai coauthored the book, Data Strategy, with Sid Adelman and Larissa Moss. He has also developed and teaches classes in Business Intelligence and Enterprise Data Architecture at the University of California, Los Angeles (UCLA) as well as several other seminars for national & international corporations. You can reach him at email@example.com.
Les Barbusinski is vice president of technology and co-founder of Digital Symmetry, LLC, a consulting firm that specializes in the design and development of data warehousing and business intelligence solutions. He has more than 20 years of experience in data warehouse and operational systems development and provides hands-on expertise in data warehouse design, development and project management. Les can be reached at firstname.lastname@example.org.
Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects for more than two decades. Haughey was former CTO for Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.
Steve Hoberman is a global reference data expert for Mars, Inc. and a data modeling consultant and trainer for organizations worldwide. He is the author of the book, Data Modeler's Workbench, Tools and Techniques for Analysis and Design. Hoberman specializes in data modeling training, design strategy and in creating techniques to improve the data modeling process and deliverables. He has an innovative approach to reviewing data models and is the founder of Design Challenges, a discussion group that tackles complex data modeling scenarios. You may visit his Web site at www.stevehoberman.com.
Chuck Kelley is a senior advisory consultant for Navigator Systems (www.navigatorsystems.com), an independent consulting firm fully focused on corporate performance management (CPM) and BI solutions. Kelley is an internationally known expert in database and data warehousing technology. He has more than 25 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 45 data warehouses and data marts. He also teaches seminars, coauthored three books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at email@example.com.
Evan Levy is a partner and co-founder of Baseline Consulting Group, a multivendor systems integration and consulting firm. As the partner in charge of Baseline?s largest practice, Levy leads both executives and practitioners in delivering technology solutions that help business users make better decisions. He has led strategic technology implementations at commercial and public sector organizations and advises vendors on their product development and delivery strategies. Levy has been published in a wide array of industry magazines and has lectured on a range of technology delivery experiences at leading conferences and vendor events. He has been a featured speaker at the Marcus Evans Analytical CRM symposium, DCI?s Data Warehousing conference, the CRM Association, DAMA International, the AMA and the Data Warehousing Institute. His current work involves delivering and lecturing extensively on the topic of data integration. You can contact him at firstname.lastname@example.org.
Danette McGilvray is president and principal of Granite Falls Consulting, Inc., a firm specializing in information quality management to support key business processes around customer satisfaction, decision support and operational excellence. Projects include enterprise data integration programs, data warehousing strategies and best practices for large-scale ERP data migrations for Fortune 50 organizations. For more than ten years she led information quality initiatives at Hewlett-Packard and Agilent Technologies. An accomplished program manager and facilitator, she is an internationally respected expert on data profiling, metrics, quality, audits, benchmarking, and tool acquisition and implementation. McGilvray is an invited speaker at conferences throughout the U.S. and Europe, where she trains other industry experts in enterprise information management and data stewardship. You can reach her at email@example.com.
Larissa Moss is founder and president of Method Focus Inc., a company specializing in improving the quality of business information systems. She has more than 20 years of IT experience with information asset management. Moss is coauthor of three books: Data Warehouse Project Management (Addison-Wesley, 2000), Impossible Data Warehouse Situations (Addison-Wesley, 2002) and Business Intelligence Roadmap: The Complete Project Lifecycle for Decision- Support Applications (Addison-Wesley, 2003). Moss can be reached at firstname.lastname@example.org.
Joe Oates is an internationally known speaker, author and consultant on data warehousing. Oates has more than 30 years of experience in the successful management and technical development of business, real-time and data warehouse applications for industry and government clients. He has designed or helped design and implement more than 30 successful data warehouse projects. He can be reached at email@example.com.
Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, meta data and systems analysis, design and development. He has worked in multiple platforms and his experience spans operational and data warehouse environments. Rehm is a technical book editor and is a coauthor of the book Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP), a certified Project Management Professional (PMP), holds a Bachelors of Science degree in Computer Science from Carroll College and is currently working on his Masters degree in Software Engineering. He can be reached at firstname.lastname@example.org.