
Passing the TiVo Test

Judging from recent events and articles, data warehouse appliances have come into their own. Netezza had a very successful IPO in July and new companies are emerging. Vendors like HP and IBM have come out with new offerings.

Why all this interest and activity now? Quite possibly it's because the mainstream business community has reached the data threshold and is looking hard for data management solutions in the multi-terabyte range. More and more, appliances are on the IT expenditure short list.

Just a few years ago, multi-terabyte data warehouses were few and far between. Today, many organizations are voluntarily collecting more data than ever because they know they can use it to guide their business. Retail and e-commerce companies are examples, with their collections of clickstream data, customer information and transaction details. Others have no choice: any organization affected by data retention regulations or reporting and auditing mandates is required to accumulate larger amounts of data. Almost everywhere you look, organizations have to, want to, or plan to capture, retain and eventually use vast amounts of data.

The data warehouse appliance was introduced specifically to address the needs of the "big data" vanguards. At that time, the wherewithal to amass and analyze large databases was in the hands of a few technologically sophisticated companies who sought to simplify their data warehousing infrastructure. The appliance approach relieved IT of having to build their own infrastructure out of a mix of iron, wiring and hand-coded software modules. If the infrastructure could be streamlined, more time could be spent on the data and information side of the equation, that is, the parts that brought tangible value to the business.

The data warehouse appliance achieved that by packaging together everything needed to build a data warehouse. Its goal was to deliver a "data warehouse in a box." The early ones were very powerful and expensive boxes that called for significant investment and expertise to implement successfully. Figure 1 below shows how the appliance simplifies the data warehouse infrastructure by reducing the number of components, vendors and connections that need to be managed by IT staff.

How Do You Know an Appliance When You See One?

These early appliances, or data warehouse boxes, fulfilled the strict definition of an appliance: they were built for a specific purpose. Implied in this definition, though, and certainly an important part of how we understand the term "appliance," is the sense that the appliance makes the task easier to perform. For example, we expect an appliance designed to grill slices of bread to make toasting easier. And if the bread browns more evenly, that's all the better.

An even better analogy to the data warehouse appliance is TiVo. The TiVo device is a package of very sophisticated hardware, connectivity and software. But it comes as a box with straightforward cables and is pretty close to "plug and play." Consumers don't know that there's a hard drive or Linux operating system in there. And frankly, they shouldn't have to. They bought TiVo so they wouldn't have to miss their favorite shows. TiVo's approach means that they get to spend their time personalizing the offering to match their objectives instead of configuring and tweaking low-level parameters.

Data warehouse appliances are far simpler to install and maintain than a typical database and storage infrastructure. They're easier to get up and running than a custom-built one. But is that enough? Where do they fall on the TiVo spectrum? How hard is it to personalize them?

Personalizing Infrastructure

Why talk about personalization in the context of data warehousing? Why add personalization to the list of desired characteristics for a data warehouse appliance? I believe personalization is the element that brings the data warehouse appliance to the next level of usefulness and relevance to the business drivers behind current data warehouse funding. Today's data warehouse appliances are very good at performing the operations they were designed and implemented for. There's one problem: things change.

A business buys a data warehouse appliance. That appliance's design is based on assumptions about how a data warehouse looks and is used. The business chooses the appliance that best matches its expectations of a data warehouse. Then it spends a lot of time and resources implementing (configuring, tuning) the data warehouse, often reshaping how entire departments touch data. Then things change - a new regulation, a new payment model, a merger or acquisition. Data warehouse infrastructure can change, too - but with enormous overhead and disruption.

Today, changing infrastructure to accommodate business shifts is cumbersome, and personalization is nearly impossible. By personalization I mean continually adjusting aspects of the appliance to suit an organization's needs. I believe that the market will soon demand the ability to personalize infrastructure as customers start expecting their data warehouses to contribute to their agility instead of defining its limits. Greater flexibility will translate into more demands for scalability: accommodating complex analytics, routine reporting and everything in between, and handling more users who want to do a greater variety of things with more data. And businesses will want to do this in ways that are unique to their enterprises, because therein lies their competitive advantage.

What's Your Infrastructure's TiVo Rating?

Understanding how close your data warehouse infrastructure is to a TiVo system will allow you to anticipate the requirements your business users will soon bring. The closer you are to matching TiVo, the more effectively you'll be able to respond.

To profile your infrastructure's flexibility, consider all the components that have to be touched when you make significant additions to users, data volume, or dependent applications:

  1. How many physical parts are involved? Include hardware, cables, power packs, etc.
  2. How many changes have to be made to the physical environment? Think green in terms of power usage and cooling arrangements, as well as space.
  3. How many changes to connected hardware? Include all backup, disaster recovery and additional storage hardware. Impacts to second-tier systems must be considered also.
  4. How many changes to dependent software? For example, the software used to support items covered in question 3.
  5. How much work is expected to get it to meet your basic operating requirements? Think of which skill sets you'll need, how much staff, and how much business user input is involved.
  6. How much work to test? Consider IT hours and hours on the business side to design tests, set up the environment and do actual testing.

The higher the numbers in your answers, the more work, complexity and risk are involved with every change. High numbers mean that your infrastructure has serious barriers to flexibility, on which personalization depends. Traditional tiered storage architectures have very high numbers because of their many moving parts and touchpoints. A first-generation data warehouse appliance has fewer internal touchpoints; however, changes to these often require vendor consultants. Interfaces to the systems outside the appliance require special expertise as well. The newest data warehouse appliances will have the lowest numbers because they are designed to be nondisruptive, requiring no adjustments to applications and no disturbance in the business user's world. From the IT perspective, they quickly become good data center citizens, scaling easily and requiring little attention and few resources. They reduce the impact of change, making change feasible and personalization possible.
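
If you want to keep the answers to the six questions in one place, a small script can help. The Python sketch below is one possible way to record them and compare two infrastructure options question by question. The class name, field names and the choice of hours as the unit for questions 5 and 6 are assumptions made for illustration; the questions themselves do not prescribe a format or a formula. Because the answers mix counts and hours, the sketch does not add them into a single score. It only reports which option has the lower number on each question.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChangeProfile:
    """Answers to the six questions for one planned change.

    Hypothetical structure: the field names and units are illustrative.
    """
    physical_parts: int        # Q1: hardware, cables, power packs touched
    environment_changes: int   # Q2: power, cooling and space changes
    connected_hw_changes: int  # Q3: backup, disaster recovery, storage, second-tier systems
    dependent_sw_changes: int  # Q4: software supporting the Q3 hardware
    setup_hours: float         # Q5: work to meet basic operating requirements
    test_hours: float          # Q6: IT and business hours spent on testing


def compare(a: ChangeProfile, b: ChangeProfile) -> dict:
    """Map each question to 'a', 'b' or 'tie', naming the option with the lower number."""
    result = {}
    for f in fields(ChangeProfile):
        x, y = getattr(a, f.name), getattr(b, f.name)
        result[f.name] = "a" if x < y else "b" if y < x else "tie"
    return result


def lower_count(comparison: dict, label: str) -> int:
    """Count the questions on which the given option has the lower number."""
    return sum(1 for winner in comparison.values() if winner == label)


if __name__ == "__main__":
    # Placeholder numbers invented for this example, not measurements.
    option_a = ChangeProfile(40, 3, 5, 4, 120.0, 80.0)
    option_b = ChangeProfile(6, 1, 1, 0, 16.0, 80.0)
    comparison = compare(option_a, option_b)
    for question, winner in comparison.items():
        print(f"{question}: lower for {winner}")
    print("Questions where option b is lower:", lower_count(comparison, "b"))
```

Recording a profile for each significant change over time would show whether your numbers are trending down, which is the direction that makes personalization practical.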

To build flexibility into your data warehouse, you need to start streamlining each of these change effects. The smaller the impact of a change, the more easily you can adapt to the next wave of business requirements. Bringing your infrastructure to the point where personalizing is business as usual is now an affordable option, and one you can't afford to ignore.


Foster Hinshaw, often referred to as the Father of Data Warehouse Appliances, brings a wealth of creativity, technical and operational expertise in both hardware and software to Dataupia. He is accomplished at designing and developing large complex systems for business-critical enterprise and departmental applications, as well as Web-based e-commerce systems. Prior to Dataupia, Hinshaw founded Netezza, the provider of enterprise-class business intelligence appliances.
