Published in DM Direct in March 2006.
Enterprise Data Fabric: Data Backbone for SOAs
by Bharath Rangarajan
Summary: A company that wants to compete in today's fast-paced landscape should employ a cohesive strategy that includes an enterprise data fabric-based information layer to fully capitalize on the huge potential of SOAs.
A recent survey of 473 enterprise buyers by the Yankee Group of Boston revealed that 75 percent plan to invest, within the next 12 months, in the technology and staffing necessary to enable a service-oriented architecture (SOA). More and more, enterprises are realizing the benefits of decoupling applications and reusing them as shared services. However, in many of these IT organizations, SOA is synonymous with simple, services-based functional integration of applications. Complex business processes and workflow elements are often mere figments of vendor presentations that rarely materialize in the real world. While enterprises have built standards-based interfaces (such as the Web Services Description Language, or WSDL) to enable application interoperability, the underlying infrastructure, especially the data infrastructure, is seldom revamped to support an SOA. The SOA revolution, therefore, is often confined to the service interface layer and isolated from the service infrastructure layer. The movement toward holistic SOA adoption will involve a broad technology strategy representing a strategic shift in IT thinking. New business processes must be deployed, and the infrastructure layer must support more users, larger data volumes and more stateful, real-time workflows. To make matters worse, service level agreements (SLAs) become more stringent as well.
Relevance of an Enterprise Data Fabric in SOAs
Traditional data architectures built upon databases and client/server systems were designed with a specific set of use cases in mind and are often unsuited to an SOA-based distributed application environment. In an SOA environment, clients and users are completely decoupled from the back-end systems, and data access patterns are unknown and unpredictable. As new services and clients are added, data volumes can grow unexpectedly and overwhelm existing systems, degrading reliability and scalability and leading to noncompliance with SLAs. Faced with a growing number of data sources and the subsequent data deluge, IT organizations usually force existing infrastructures such as databases and messaging systems into an SOA framework, resulting in redundant data transformations and data replication. Poor system performance and inefficient data distribution among services are natural consequences of such force-fitted architectures.
An enterprise data fabric (EDF), shown in Figure 1, is a middle-tier data platform that addresses the aforementioned challenges and augments SOAs in several ways, described in the sections that follow.
Figure 1: Enterprise Data Fabric in Service-Oriented Architectures
As illustrated in Figure 1, an EDF is positioned between the services layer and back-end data sources to complement an SOA. It provides a distributed data backbone that lets services access, query, transact on and share data held in random access memory (RAM) spanning multiple nodes.
Service-Orienting Your Data Infrastructure with an EDF
In SOA deployments, services need the ability to access enterprise data through standardized interfaces. However, most traditional data repositories necessitate the use of custom access protocols. An EDF alleviates this problem by service-orienting underlying data repositories and exposing relevant information from a variety of sources through application programming interfaces (APIs) such as the simple object access protocol (SOAP) and XML:DB, in addition to programmatic APIs such as Java, C++ and C#. Distributed querying through languages such as XPath, structured query language (SQL) and object query language (OQL) is also supported via a data fabric. An EDF thus enables "location transparency," whereby a deployed Web service can access relevant information without any knowledge of the actual data source(s) or access protocols.
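The location-transparency idea can be sketched in a few lines of Java. This is a minimal illustration, not a vendor API: the class and method names below (FabricFacade, registerSource, the "region:key" convention) are all hypothetical. A caller asks the fabric facade for data by logical key, and the facade routes the request to whichever registered source holds that data.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of "location transparency": services ask the fabric
// for data by logical key and never name the backing source or protocol.
class FabricFacade {
    private final Map<String, Function<String, String>> sourcesByRegion = new HashMap<>();

    // Register a backing source (e.g. a database or web service adapter).
    void registerSource(String region, Function<String, String> source) {
        sourcesByRegion.put(region, source);
    }

    // Callers use a logical "region:key" name; the facade picks the source.
    String get(String key) {
        String region = key.split(":")[0];
        Function<String, String> src = sourcesByRegion.get(region);
        if (src == null) throw new NoSuchElementException("no source for " + region);
        return src.apply(key);
    }
}
```

A service deployed against this facade can be re-pointed at a different back end simply by registering a new source for the region, with no change to the caller.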
An EDF's metadata management capabilities and its support for the aforementioned querying and data access standards fall in line with the data mediation services defined in the Service Data Objects (SDO) specification, a standard proposed by BEA and IBM. However, an EDF goes well beyond the requirements mandated by SDO. It not only provides a standards-based conduit for enterprise data access, but also serves as a persistent data store for XML documents, objects and relational data. Intelligent data placement strategies such as colocating/caching data within a service for performance or replicating data for failover are possible through an EDF. These strategies address another extremely important dimension of service orientation, which relates to compliance with SLA metrics for service latency and availability.
Fabric-Based Information Management for SOAs
The ability of an EDF to manage data in the middle tier across distributed memory systems enables effective information management patterns in SOAs through several key features:
Caching frequently used business data: Business workflows often require access to critical business data such as customer profiles or product data. Unfortunately, these data elements are often fragmented in multiple systems, making them unusable in an SOA environment. With an EDF, frequently accessed data can be cached in-memory and in a format of choice, so that I/O latency, network latency and transformation latency are avoided. The data held in-memory can be an operational subset of the entire data set held in a database or other systems of record. For instance, a claims-processing system can selectively load customer profiles of all California customers into an EDF for a particular workflow. Based on the client request patterns, the EDF determines which operational data subset needs to be held in-memory, ensuring only the most useful information resides in RAM. The physical memory limits of a single hardware node (typically approximately 4GB in a 32-bit server) are overcome by partitioning data across multiple nodes. The partitioned data is then presented as a single logical entity to the data consumers. Querying functions supported in an EDF enable further manipulation of the data held in-memory. As a result, instantaneous data availability and faster business processes that benefit the end user are made possible.
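The partitioning idea above can be sketched as follows. This is an illustrative toy, assuming the simplest possible scheme: entries are spread across several "nodes" (here, plain in-process maps) by hashing the key, yet callers use the cache as one logical map. A real fabric would distribute partitions across physical servers and handle rebalancing and failover.

```java
import java.util.*;

// Toy sketch of key-hash partitioning: data larger than one node's RAM is
// split across nodes but presented as a single logical cache.
class PartitionedCache<K, V> {
    private final List<Map<K, V>> nodes = new ArrayList<>();

    PartitionedCache(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
    }

    // Each key is owned by exactly one partition, chosen by its hash.
    private Map<K, V> nodeFor(K key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    void put(K key, V value) { nodeFor(key).put(key, value); }
    V get(K key)             { return nodeFor(key).get(key); }

    // Total entries across all partitions -- the "single logical entity" view.
    int size() { return nodes.stream().mapToInt(Map::size).sum(); }
}
```
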
Data sharing among services: In certain workflows a set of services share access to a common data entity, such as reference data in a financial trading workflow. Having this data in a centralized location such as a database can quickly lead to contention issues and I/O bottlenecks. An EDF provides a convenient model for multiple services to share access to a common data entity. Because data is inherently distributed with an EDF, no conflicts arise upon parallel access. If a service changes some elements of this common data entity, updates are propagated only to those services that are impacted by this modification. This intelligent and dynamic data propagation, seldom possible with traditional messaging systems, reduces network congestion.
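The interest-based propagation described above can be illustrated with a small sketch. The names (SharedRegion, registerInterest) are hypothetical, not a specific product's API: a service registers interest in particular keys, and only interested listeners are notified when those keys change, in contrast to a bus that broadcasts every event.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Illustrative sketch of a shared data region that propagates updates only
// to services that registered interest in the changed key.
class SharedRegion<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Map<K, List<BiConsumer<K, V>>> interests = new HashMap<>();

    void registerInterest(K key, BiConsumer<K, V> listener) {
        interests.computeIfAbsent(key, k -> new ArrayList<>()).add(listener);
    }

    void put(K key, V value) {
        data.put(key, value);
        // Notify only the services impacted by this modification.
        for (BiConsumer<K, V> l : interests.getOrDefault(key, List.of())) {
            l.accept(key, value);
        }
    }

    V get(K key) { return data.get(key); }
}
```

Updates to keys no one cares about cross no network hop at all, which is the congestion saving the text refers to.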
State and context management within business processes: A 2005 InformationWeek survey identified data sharing and collaboration as a key requirement for improving business processes. This is not surprising given that real-world business processes are often stateful by nature. For instance, in an online loan application process, the user data entered in the first form is in a contextual state that needs to be managed and shared during the course of the workflow. The verification service, the origination service and other services access and update this information. Because these Web services are usually stateless and deployed on different physical hardware and software systems, state must be managed in a distributed fashion across disparate entities. An EDF enables such a workflow by delivering contextual state instantaneously to a variety of services. By replicating state in distributed RAM, parallel tasks can be initiated and failed business processes restarted from the point of termination. Benefits of managing state in an EDF become evident when the same service is executed as a part of several different business processes or by several instances of the same business process.
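The loan-application example can be made concrete with a hedged sketch. Assuming the simplest representation (a per-process context map plus a marker for the last completed step; all names below are illustrative), any stateless service can read the shared context and a restarted process can resume where it stopped rather than from the beginning.

```java
import java.util.*;

// Sketch of shared workflow state: each process instance keeps a context map
// and the last completed step, so stateless services on different hosts can
// pick the workflow up mid-flight.
class WorkflowContext {
    private final Map<String, Map<String, String>> contexts = new HashMap<>();
    private final Map<String, String> lastCompletedStep = new HashMap<>();

    void put(String processId, String field, String value) {
        contexts.computeIfAbsent(processId, id -> new HashMap<>()).put(field, value);
    }

    String get(String processId, String field) {
        return contexts.getOrDefault(processId, Map.of()).get(field);
    }

    void markCompleted(String processId, String step) {
        lastCompletedStep.put(processId, step);
    }

    // A restarted process resumes after the last step that finished.
    String resumePoint(String processId) {
        return lastCompletedStep.getOrDefault(processId, "start");
    }
}
```

In an actual fabric this structure would be replicated across RAM on several nodes, which is what makes restart-from-point-of-termination survive a node failure.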
SOAP request and response caching: XML data processing is one of the main hurdles in Web service deployments. Parsing, encoding and decoding XML documents often impact service performance, especially under sporadic client load. An EDF can be used to cache SOAP requests and responses in-memory in native XML format. By storing SOAP responses in the cache, client requests can be handled without invoking the actual service implementations. Significant performance and scalability improvements result from this approach. Based on underlying data or application changes, SOAP responses cached in memory can be updated or invalidated as needed.
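The response-caching pattern above reduces to a keyed cache in front of the service. In this sketch the service invocation is simulated and the class names are hypothetical; an invocation counter makes the saving visible, and invalidate models the "underlying data changed" case.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of SOAP response caching: a cached response is returned without
// invoking the service implementation; entries are invalidated on change.
class ResponseCache {
    private final Map<String, String> responses = new HashMap<>();
    private int serviceInvocations = 0;

    // Serve from cache when possible; invoke the service only on a miss.
    String handle(String requestKey, Function<String, String> service) {
        return responses.computeIfAbsent(requestKey, k -> {
            serviceInvocations++;
            return service.apply(k);
        });
    }

    void invalidate(String requestKey) { responses.remove(requestKey); }

    int invocations() { return serviceInvocations; }
}
```

Repeated identical requests also skip the XML parse/encode cycle the text identifies as the main cost.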
Web session management: Because clients access services through a Web-tier in most scenarios, session management becomes a critical function within SOAs. Traditionally, session objects have been stored in databases, or more recently in Web containers such as J2EE application servers. However, these approaches are prone to data bottlenecks, cannot scale to increasing user loads and often cause timeouts. An EDF offers an efficient model for managing session data in distributed RAM across virtual machines. Session data is managed in a highly available manner through replication and disk persistence. An EDF can also scale to large numbers of user sessions by partitioning session data across a cluster or by overflowing passive sessions to disk.
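The overflow-to-disk behavior can be sketched with a small access-ordered map. This is a single-process toy, assuming a plain map stands in for disk: active sessions stay in RAM, and when the RAM limit is hit the least recently used session overflows rather than being lost, then faults back in on the next access.

```java
import java.util.*;

// Sketch of session overflow: hot sessions stay in RAM; when the limit is
// exceeded, the least recently used session spills to a secondary store
// (a HashMap standing in for disk) and is faulted back in on demand.
class SessionStore {
    private final LinkedHashMap<String, String> ram;
    final Map<String, String> disk = new HashMap<>();

    SessionStore(int ramLimit) {
        // Access-ordered map: eldest entry is the least recently used.
        this.ram = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                if (size() > ramLimit) {
                    disk.put(e.getKey(), e.getValue());   // overflow, don't lose
                    return true;
                }
                return false;
            }
        };
    }

    void put(String sessionId, String state) { ram.put(sessionId, state); }

    String get(String sessionId) {
        String s = ram.get(sessionId);
        if (s == null && (s = disk.remove(sessionId)) != null) ram.put(sessionId, s);
        return s;
    }
}
```
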
From a Messaging-Centric Workflow to a Data Fabric-Centric Workflow
In most enterprises, application integration is implemented using a messaging layer. Consider this real-life use case: a leading sports apparel manufacturer's order processing workflow involved data flow between its sales order, shipment and invoicing services. Purchase orders, shipment notices and invoice data (modeled as Java objects) were moved across these different services via a message bus. The challenge here was that these data objects had to be marshaled and unmarshaled for every transfer, and the receiving system had to reconstruct all the object relationships. Moreover, when changes were made to a purchase order, for instance, the message bus would only send an event notification, and the recipient had to extract the actual data from a system of record, such as a database. The net result was a highly inefficient, slow and cumbersome process.
Instead, if an EDF is adopted in such a scenario, the three systems can share data easily and maintain domain object relationships without the marshaling and unmarshaling overhead. Moreover, the fabric layer would automatically transmit any data changes, while making the underlying data objects instantly available to the business applications. In a data-fabric-centric workflow, applications deal only with their domain data objects and leave the underlying plumbing to an EDF, which offers the ability to support both pull-based (client-driven) and push-based (event-driven) workflows. Such a fabric offers the best of both worlds by embodying database-like persistence mechanisms and messaging-like distribution semantics to fully leverage an SOA.
The advent of service-oriented architectures represents great promise for organizations of all types, allowing firms to maximize investments in existing systems and substantially lower operating costs. However, the need for a robust data infrastructure cannot be overstated when one considers the overarching performance, reliability and scalability requirements of SOAs. The transition to an EDF-based data management platform can have far-reaching effects on an organization, especially when one considers the evolution to be expected with SOAs. One such concept that has already emerged is the event-driven architecture (EDA), touted by IT pioneers as an organic extension of SOAs. In the world of EDA, IT systems must deal with real-time streams of data/events, such as market data in financial services or RFID data in retail supply chains. An EDF provides the unparalleled ability to monitor, analyze and store such fast-changing streams of data, and correlate them with other enterprise information resources.
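The messaging-centric versus fabric-centric contrast in the order processing example can be sketched as follows. The names are illustrative and the system of record is simulated by a map: on a notification-only bus the recipient gets an event and must re-fetch the record, while a fabric-style update delivers the changed object with the event itself.

```java
import java.util.*;
import java.util.function.Consumer;

// Toy contrast of the two workflows: a counter makes the extra round trip
// of the notification-only bus visible.
class OrderWorkflow {
    private final Map<String, String> systemOfRecord = new HashMap<>();
    int recordFetches = 0;

    // Messaging-centric: the event carries only the order id, so the
    // recipient must fetch the actual data from the system of record.
    void busNotify(String orderId, Consumer<String> shipmentService) {
        recordFetches++;
        shipmentService.accept(systemOfRecord.get(orderId));
    }

    // Fabric-centric: the changed order object travels with the event.
    void fabricUpdate(String orderId, String order, Consumer<String> shipmentService) {
        systemOfRecord.put(orderId, order);
        shipmentService.accept(order);   // no extra fetch by the recipient
    }
}
```
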
While SOAs help streamline applications and IT resources and build customer-centric workflows, EDAs enable an organization to become more responsive to opportunities and threats both within and outside its four walls. A company that wants to compete in today's fast-paced, ever evolving landscape and reap the benefits of such architectures must employ a cohesive strategy that includes an EDF-based information layer to fully capitalize on the huge potential of SOAs.
Bharath Rangarajan is director of product marketing at GemStone Systems, where he oversees product positioning and market strategy for the GemFire product line. He has more than seven years of experience in enterprise software, dealing with data management issues in functional areas such as EAI, B2B collaboration and supply chain management. He has led technical and product teams at companies including i2 Technologies, SeeBeyond Corp. and Candle Corp. You can reach him at email@example.com.
Copyright 2007, SourceMedia and DM Review.