The Virtual Data Appliance

Building Powerful BI Database Platforms Using Grid Computing and Virtualization

The phrase "grid computing" has been used to describe a variety of different computing architectures. In its broadest sense, it covers any array of computing resources, local or remote, that can be harnessed to perform a computational task. It is often used to describe the scenario where computationally intensive tasks are farmed out to exploit the spare capacity on remote computers. While this type of grid computing has interesting scientific applications, it is of little use in enterprise computing, where data security and timely results are important. However, grid computing is also used to describe a flexible, fully connected, shared pool of computer resources within a data center that can be deployed and redeployed between applications as demand requires. This second form of grid computing (sometimes called platform virtualization) can, with the right software and without any fixed-purpose appliances, be used as a very powerful platform for the analysis of complex, large-volume data.

Why is this important? The answer is twofold. First, virtualization is arguably the most powerful weapon an IT organization has in the ongoing battle to achieve more with less: it makes processing power a global resource that can be easily moved to accommodate business change and fluctuating demand. Second, many vendors are promoting the idea that complex, large-scale data analysis is only possible with boxes of specialized proprietary hardware, or "data appliances," so it is worth knowing that software solutions exist that can use the enormous processing power of this virtualized infrastructure for the same job.

The data appliance concept has a number of undoubted benefits, including low TCO, ease of deployment and high performance. However, appliances fit very uncomfortably in a virtualized grid architecture, because a data appliance is not a general-purpose processing resource. It can only ever run one application and, more importantly, the application the appliance provides cannot scale across other elements of the grid when demand increases. Another, bigger appliance must be purchased or the business will be constrained. So how can a virtualized grid infrastructure be used to enable the analysis of large and complex data?

The answer is to use massively parallel processing (MPP) database software to pool the processing power of multiple servers on the grid into a single database machine. MPP database software automatically splits the data to be analyzed across a number of servers, with each server working on its own portion of the data. The number of servers can be scaled to fit the amount of data being analyzed, and it can be increased and decreased as demand fluctuates. MPP software allows systems to scale in size from one server to thousands of servers. Each server looks after its own unique piece of the data and is able to search and filter the data locally, with only the result sets being sent across the network.
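The division of labor described above can be illustrated with a minimal sketch. It is not taken from any particular MPP product: the function names (`partition`, `local_scan`, `parallel_query`) are hypothetical, hash-based distribution is an assumed way of splitting the data, and worker threads stand in for the separate servers on the grid.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_servers, key):
    """Give each row to exactly one server, chosen by hashing its key.

    Hash distribution is an assumption made for this sketch; the article
    does not say how the data is split.
    """
    shards = [[] for _ in range(num_servers)]
    for row in rows:
        shards[hash(row[key]) % num_servers].append(row)
    return shards


def local_scan(shard, predicate):
    """Search and filter one server's own piece of the data."""
    return [row for row in shard if predicate(row)]


def parallel_query(shards, predicate):
    """Scan every shard concurrently and merge the partial result sets.

    Only the rows that pass the filter travel back to the coordinator,
    which mirrors the point that only result sets cross the network.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(local_scan, shards, [predicate] * len(shards)))
    return [row for partial in partials for row in partial]


if __name__ == "__main__":
    # Illustrative data only: 100 orders spread across 4 simulated servers.
    orders = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]
    shards = partition(orders, num_servers=4, key="order_id")
    large_orders = parallel_query(shards, lambda row: row["amount"] > 900)
    print(len(large_orders), "matching rows returned to the coordinator")
```

Scaling the sketch up or down is a matter of calling `partition` again with a different `num_servers`; the query logic does not change, which is the property that lets the server count follow demand.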

Deploying and administering large numbers of individual servers to form a single database machine sounds very complex, and inherently it is. But MPP database software that completely automates the process is available today. The system administrator simply has to tell the software which servers to use, and the software will automatically build an optimal system, fully configured and ready to go. In effect, the software builds a "virtual appliance" out of elements of the virtualized grid infrastructure.

Figure 1: Where the Virtual Appliance Fits

Architecturally this sounds very neat, but the best news of all is that these virtual appliances are both faster and cheaper than their proprietary hardware data appliance equivalents.

This is not "vaporware"; virtual appliances are here today. They are already powering many organizations' mission-critical BI systems, and HP Labs in Palo Alto, California, has achieved some remarkable benchmark results on a grid consisting of several hundred industry-standard servers.

The upshot of all of this is that organizations looking to follow the flexible, cost-effective IT infrastructure path need not be held back by the myth that only fixed-purpose, inflexible data appliances can solve their large-scale data analytics problems. There is another solution that fits perfectly into a virtualized grid infrastructure, while still being faster and cheaper: the virtual appliance.


Roger Gaskell is director of product development for Kognitio. Kognitio was formed in August 2005 with the merger of Kognitio and WhiteCross. Gaskell joined WhiteCross in 1988 and has overall responsibility for product development. He has been responsible for the development of WhiteCross's data appliance technology, evolving WhiteCross from a proprietary hardware appliance to a software-only virtual appliance built on industry-standard servers. He may be reached at roger.gaskell@kognitio.com.
