Computer scientists have spent decades considering how to process ever-increasing amounts of data effectively. The answer has always resembled the answer to the old question of how one would eat an elephant (small bites, please). In other words, overcome the big problem (a huge amount of data to analyze in a short amount of time) by breaking it down into a large number of small jobs running in parallel. For example, assume you had an acre of grass to mow. With 10 people and 10 lawnmowers attacking the problem, you could finish the job in roughly one-tenth the time.
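A minimal Python sketch of the "small bites" approach follows. It assumes a toy workload (summing a list of numbers) in place of real analysis, and the function names are hypothetical. The job is split into ten nearly equal chunks, each worker processes only its own chunk, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_workers):
    """Split data into n_workers contiguous slices of nearly equal size."""
    size, remainder = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `remainder` chunks take one extra item each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """The 'small bite': each worker handles only its own slice (here, a sum)."""
    return sum(chunk)


def parallel_total(data, n_workers=10):
    """Divide the job, run the pieces concurrently, then combine partial results."""
    chunks = split_into_chunks(data, n_workers)
    # Threads keep the sketch portable; CPU-bound Python code needs a
    # ProcessPoolExecutor (or separate machines) to gain real speedup.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


def ideal_elapsed(serial_time, n_workers):
    """Ideal (linear) speedup: 10 mowers finish in one-tenth the time."""
    return serial_time / n_workers
```

For example, `parallel_total(list(range(1_000_001)))` returns the same value as the built-in `sum`, and `ideal_elapsed(60, 10)` gives 6.0, the one-tenth figure from the mowing analogy. In practice the speedup falls somewhat short of this ideal, because splitting the work and combining the partial results take time of their own.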
This is essentially the theory behind massively parallel processing (MPP), and it is how today's largest corporate data warehouses cope with ever-growing volumes of corporate data. All of this sounds relatively simple and straightforward; however, massively parallel computing infrastructures are both expensive to buy and complex (another word for expensive) to manage.
To make matters worse, organizations of all sizes are becoming increasingly dependent on data analytics. Indeed, since 2000, data warehouse and business analytics infrastructure has become, or is becoming, a business-critical application for most companies. Companies have always searched for better ways to understand their customers and anticipate their needs. They have long wanted to improve the speed and accuracy of operational decision making. In short, they want to uncover all the secrets hidden within their ever-growing volumes of data. While the desire to improve the depth and timeliness of data analysis has been felt for more than 20 years, the practical capability to do so eluded all but the largest IT shops. Over the past few years, however, powerful trends have been reshaping the data warehousing space. These trends are bringing an organization's long-standing desire to derive value from its data together with the opportunity (and, more importantly, the capability) to meet the growing demand for business analytics with a simpler, cost-effective approach.
Now that most IT organizations have implemented the large ERP packages and Web-enabled most key customer applications, the focus has shifted toward data warehousing and analytics. Technology innovation continues to drive down the cost of server processing power and storage capacity. Software licensing costs are beginning to feel the effects of these trends, as well as the growing influence of open source software on commercial licensing and pricing. Greater computing capacity at a lower cost offers an opportunity to redefine what "big" means for a data warehouse or data mart: multiterabyte analytic stores will be the norm, not the exception. Yet while processing power is getting cheaper, organizations are chewing up that capacity as fast as it becomes available. By continuing to innovate in how they use data, or even by creating new classes of data (such as subtransactional data), organizations will keep stressing traditional analytic infrastructure. So how can the complexity issue be addressed?
The Data Warehouse Appliance
Since the concept of the data warehouse was first introduced, end users have wanted a solution that was less complex. Many end users wish they could simply purchase a data warehouse the way they purchase a payroll application. Unfortunately, business analytic needs are constantly evolving, making productization of the warehouse difficult. Even the word "evolving" is inaccurate in the context of an organization's business analytic needs, as it implies constant but slow-moving change. The reality is that analytic needs within an organization change very rapidly. Additionally, demands for immediate tactical analysis (versus longer-term strategic analysis) make analytic infrastructures inherently complex.
Innovative vendors are now emerging to attack warehouse complexity by taking advantage of many of the previously mentioned trends in hardware and software. While delivering a packaged data warehouse might be impractical, complexity can be addressed through the productization of a warehouse or data mart's underlying infrastructure.
The data warehouse appliance combines the price/performance of Intel-based processors, open source software and low-cost disk storage in a single cabinet. The combination is purpose-built to run analysis against terabytes of data quickly and simply. By using a massive number of CPUs, these appliances are designed specifically to eat the elephant that is a multiterabyte analytic data store.
The market for data warehouse appliances is growing quickly. Netezza is the pioneering vendor leading the data warehouse appliance trend. Its Netezza NPS system scales from less than one terabyte of user data up to as much as 27 terabytes of user data. Other vendors are already rushing to market with similar solutions, and users are buying. But why are large companies willing to take a flyer on such a new trend?
Total Cost of Ownership: The Key Differentiator
Total cost of ownership (TCO) is a top-of-mind issue for virtually every IT organization today, yet exactly what TCO includes is often ambiguous. We define it in three parts. The first is the initial purchase price of the solution. The second is how long it takes the vendor to deliver an acceptable, working production environment. The third is the cost of maintaining a stable, well-performing environment. It is this third piece that often accounts for as much as 80% of an application's TCO, and it consists primarily of the personnel costs of monitoring and tuning the system.
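The three-part definition above can be expressed as a simple cost model. This is a sketch under stated assumptions: converting the deployment time into money through a weekly cost rate, and fixing an ownership horizon in years, are illustrative parameters introduced here, not part of the definition itself. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TcoInputs:
    purchase_price: float            # part 1: initial purchase price
    deployment_weeks: float          # part 2: time to an acceptable production environment
    deployment_cost_per_week: float  # assumption: cost incurred per week while waiting
    annual_sustaining_cost: float    # part 3: mostly personnel to monitor and tune
    years: int                       # assumption: ownership horizon


def total_cost_of_ownership(x: TcoInputs) -> dict:
    """Sum the three TCO parts and report the share taken by sustaining cost."""
    purchase = x.purchase_price
    # Assumption: time-to-value is priced as weeks multiplied by a weekly cost rate.
    deployment = x.deployment_weeks * x.deployment_cost_per_week
    sustaining = x.annual_sustaining_cost * x.years
    total = purchase + deployment + sustaining
    return {
        "purchase": purchase,
        "deployment": deployment,
        "sustaining": sustaining,
        "total": total,
        "sustaining_share": sustaining / total,
    }
```

With purely illustrative inputs (a purchase price of 1,000,000, 4 weeks of deployment at 25,000 per week, and 1,100,000 per year of sustaining cost over 4 years), the model yields a total of 5,500,000, of which sustaining cost makes up 80%. Shortening the deployment weeks shrinks only the second term, which is why the sustaining piece tends to dominate the comparison between platforms.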
Because appliances are built specifically to handle large analytic workloads, the time-to-value piece of the TCO equation is rather simple. Time-to-value is an extremely important metric because it directly drives an organization's return on investment (ROI) for the warehouse or mart environment. Some early adopters of the Netezza appliance have reported provisioning times of four hours to get a working, sustainable analytic environment, compared with four weeks to do the same thing on an Oracle/Sun/EMC infrastructure. More importantly, performance was 10 to 50 times faster on the appliance.
To be fair, nonappliance vendors such as Teradata and IBM have also demonstrated good time-to-value. They are total solution providers with well-defined, configurable units that they can deliver quickly, based on their extensive warehousing experience and deep knowledge of their reference infrastructure's capabilities. However, IBM and Teradata are typically used for enterprise-wide strategic BI initiatives that require customized solutions and professional services. The data warehouse appliance may be used for the same strategic purposes in the future, but today there is enormous demand for tactical and operational analysis that must be done quickly. Users should strongly consider using the right tool for the right job. For example, a telco that needs to query 18 billion call detail records daily to keep its billing current has a job that is operational in its timeliness but fundamentally analytic in nature. The data warehouse appliance can accomplish this task in minutes rather than hours.
The data warehouse appliance also shines relative to its traditional warehouse infrastructure brethren in the area of maintenance. Appliances are "load and go" environments. Because they brute-force scan the data efficiently, using a high ratio of disk to processor to create a massively parallel query engine in a box, they don't require indexing. More importantly, they don't require any specific physical database design, or hints to persuade the database optimizer to use the indexes a DBA has painstakingly designed. So organizations spend the bulk of their time actually querying data, not tuning the database to query the data. What a concept!
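The following Python sketch is a toy model of the "load and go" idea, not a description of how Netezza or any other appliance is actually built. Rows are hash-distributed across a hypothetical number of slices, each slice fully scans its own rows against a predicate (no index is consulted), and the partial aggregates are merged. The call-detail-record fields (`account`, `day`, `minutes`) and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, n_slices, key):
    """Hash-distribute rows so each slice owns a disjoint subset of keys."""
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        slices[hash(row[key]) % n_slices].append(row)
    return slices


def scan_slice(rows, predicate, group_key, value_key):
    """Brute-force scan of one slice: test every row, aggregate locally, no index."""
    partial = Counter()
    for row in rows:
        if predicate(row):
            partial[row[group_key]] += row[value_key]
    return partial


def parallel_scan_aggregate(slices, predicate, group_key, value_key):
    """Scan all slices concurrently, then merge the per-slice partial sums."""
    # Threads keep the sketch simple; Python threads do not run CPU-bound
    # code in parallel, so a real engine would use separate processors.
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = pool.map(
            lambda s: scan_slice(s, predicate, group_key, value_key), slices
        )
        total = Counter()
        for p in partials:
            total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

A query such as "total minutes per account for a given day" is just a new predicate passed to `parallel_scan_aggregate`; nothing has to be indexed or tuned first. In a shared-nothing design where each slice has its own processor and disk, adding slices shortens a full scan roughly in proportion to the number of slices.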
With the demand for data analysis increasing, IT organizations must look for the proper tools to address the fast-changing needs of their business-user clientele. The data warehouse appliance may not be a data warehouse in a box (or should we say cabinet), but it does simplify the underlying analytic infrastructure. No tool yet addresses the entire spectrum of analytic needs, but the data warehouse appliance model is sure to be an option that most IT organizations will want in their analytic toolbox.
Charles Garry is a former vice president and director with META Group's Technology Research Services organization and has more than 18 years of experience in the database market. He can be reached at email@example.com.