Making the Case for Complex Event Processing Software
Complex event processing (CEP) involves the continuous processing and analysis of high-volume, high-speed data streams from inside and outside an organization to detect business-critical issues as they happen. In contrast to traditional business intelligence processes, which provide delayed analysis, CEP software processes data streams and detects business events in real time. Some examples of CEP applications are:
- Real-time financial market data analysis and enrichment,
- Financial trade auditing and compliance,
- IT security event correlation,
- Asset management and tracking using RFID, and
- Manufacturing process, power grid or energy pipeline monitoring.
The vast majority of event processing applications today are custom-coded. Much of this custom coding effort, however, can be eliminated by using CEP software; the time and cost savings grow with the complexity of the event processing application. The remainder of this article lays out a framework for understanding where, and to what degree, CEP software can offer cost savings over custom development.
What Does CEP Offer?
CEP software offers two major components: a high-level language for programmers to easily describe how to process the streams, and an infrastructure engine for processing and analyzing high-volume data streams. Although CEP software performs different functions, the component structure is mildly analogous to database software, where there is a language (SQL) and an engine (the database server).
Because some of the operations a programmer wants to perform on data streams are similar to a relational model, a select number of CEP vendors offer a language that is based on SQL. This provides a familiar programming environment, speeding the creation of event processing applications.
The engine provides the core components to execute the analysis at run-time. The engine takes on many complex tasks typical in data management infrastructure software as well as those unique to event processing:
- Stream management: Data streams are analogous to a database table of infinite size, with each new event appending a row onto the table. As streams often travel over a network, there can be issues such as dropped, delayed or out-of-order messages. A good CEP engine will automatically handle all these issues without requiring programmer intervention, ensure reliable message delivery and generate a valid, dependable stream for processing.
- Memory management: Data streams can become very large and have many queries running against them. A good CEP engine needs to optimize how memory is managed to ensure high throughput. Special care must be taken to avoid copying and to ensure that every piece of data is stored only once.
- Parallel execution and synchronization: To maintain performance, a CEP engine will perform operations in parallel and synchronize data between the threads. Excess synchronization can hurt performance. Thus, a CEP engine not only has to automatically perform state synchronization for the programmer, but it must also balance the synchronization rates for efficient execution.
- Windows: Processing on data streams is performed in "windows," which are typically units of time. An efficient CEP engine must be able to expire messages properly, both on new events and on timer events.
- Indexing: Fast-moving data streams require indexes to be continually updated at a similar high rate for efficient processing. A good CEP engine will automatically manage these indexes so the programmer does not have to deal with such issues.
These and many more functions are abstracted from the programmer, making the development of CEP applications easier.
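To give a sense of what the engine takes off the programmer's hands, here is a minimal sketch of just one of these tasks: putting out-of-order messages back into timestamp order. The function name reorder, the max_delay parameter and the choice to drop messages that arrive too late are assumptions made for illustration; they are not taken from any particular CEP product.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload); hypothetical event shape


def reorder(events: Iterable[Event], max_delay: float) -> Iterator[Event]:
    """Yield events in timestamp order, tolerating arrivals up to max_delay late."""
    heap: List[Event] = []
    watermark = float("-inf")  # highest timestamp seen so far, minus max_delay
    for event in events:
        ts = event[0]
        if ts < watermark:
            continue  # too late to place in order; this sketch simply drops it
        heapq.heappush(heap, event)
        watermark = max(watermark, ts - max_delay)
        # Anything at or before the watermark can no longer be overtaken.
        while heap and heap[0][0] <= watermark:
            yield heapq.heappop(heap)
    while heap:  # end of input: flush whatever is still buffered
        yield heapq.heappop(heap)


arrivals = [(1.0, "a"), (3.0, "c"), (2.0, "b"), (6.0, "e"), (2.5, "late"), (5.0, "d")]
ordered = list(reorder(arrivals, max_delay=2.0))
```

Even this small piece forces a design decision (how long to wait for stragglers, and what to do with messages that miss the cutoff) that custom code would have to make and maintain for every stream.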
Types of Event Processing Applications
If a developer were to create a custom-coded event processing application, he or she would need to code some if not all of the CEP engine features mentioned above, depending on the complexity of the event processing application.
To simplify the framework for determining the applicability of CEP software, let's examine event processing applications in four tiers:
- Tier One: simple event processing applications,
- Tier Two: event processing applications involving multiple streams and/or stored data,
- Tier Three: complex analysis and pattern matching across event streams, and
- Tier Four: multiple, enterprise-class event processing applications.
As you move from Tier One to Tiers Two and Three, the analytical functions increase in complexity, raising the amount of code required to process the streams. Moving to Tier Four introduces infrastructure requirements, further increasing the programming task of the development team.
Tier One Applications
Basic event processing applications will work with a single message stream and typically perform processing one message at a time. The processing usually involves simple filtering, calculation and routing, looking for a specific event to notify a person or application.
An additional level of complexity in a Tier One application is aggregation over a window. Here the application continuously calculates a value, such as a moving average, based on fields in a single event over a specified period of time, called a window. A simple example query for such an application is:
As you can see, coding such queries in a high-level CEP language is very straightforward. But a CEP engine also has its costs, mostly centered on the learning curve for the language, the learning curve for the CEP engine, and licensing the software.
The custom-developed code for Tier One event processing applications would also be relatively straightforward and can be created quickly and efficiently by most developers. Thus, the initial costs of using a CEP engine may not outweigh the efficient custom-coding effort.
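As a rough illustration of how small the custom-coded version can be, the following sketch computes a moving average over a time window for a single stream. The class name MovingAverage, its methods and the 10-second window used in the example are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class MovingAverage:
    """Average of the values seen in the last window_seconds seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window ending at `now`.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_value = self._events.popleft()
            self._total -= old_value

    def add(self, timestamp: float, value: float) -> float:
        """Record a new event and return the updated average."""
        self._events.append((timestamp, value))
        self._total += value
        self._expire(timestamp)
        return self._total / len(self._events)

    def tick(self, now: float) -> Optional[float]:
        """Expire on a timer with no new event; None if the window is empty."""
        self._expire(now)
        return self._total / len(self._events) if self._events else None


avg = MovingAverage(window_seconds=10.0)
first = avg.add(0.0, 10.0)    # only one value in the window
second = avg.add(5.0, 20.0)   # two values in the window
third = avg.add(12.0, 30.0)   # the event at t=0 has expired
idle = avg.tick(30.0)         # timer fires after everything has expired
```

Note that expiry has to run both when a new event arrives and when a timer fires, which is the window behavior described earlier.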
An additional cost factor when comparing custom-coded event processing applications to a CEP engine, however, is maintenance. Custom code is often difficult to maintain: programmers change jobs or firms, and code is often not well documented. The high-level language approach of a CEP engine makes CEP applications easier to maintain, so maintenance costs should be part of the comparison.
For a single Tier One event processing application, custom coding is often faster and less expensive than a CEP engine. The maintenance benefits of a CEP engine, however, can lower the lifetime cost of the application, which could make a CEP engine a viable option even for simple applications. Lastly, if the team needs to support multiple Tier One CEP applications, it should look at the infrastructure cost structure outlined in Tier Four.
Tier Two Applications
Tier Two event processing applications introduce the need to process more complex data streams, integrate multiple data streams and use stored data. A programmer would need to write very complex code that is typically handled in data management software. To custom program the joins, unions, aggregations and other functions would require sophisticated code for indexing, merging, memory management and more.
CEP languages and engines provide a high-level abstraction of this functionality for the programmer, dramatically reducing the programming effort. The SQL-based CEP languages borrow much of their syntax from SQL, giving developers a familiar environment that shortens the learning curve. For example, the following query joins two data streams within a 10-second window:
The high-level language of a CEP engine offers a much easier programming model for the more advanced processing and analytical functions, and the engine abstracts these functions from the programmer, lowering the development cost of the application.
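For comparison, a hand-coded join of two streams within a 10-second window might look something like the sketch below. The class name WindowJoin, the "left" and "right" stream labels and the event fields are assumptions for illustration only. The sketch also assumes events arrive in timestamp order across both streams and scans the opposite buffer linearly; production code would need an index by key plus the memory management and synchronization discussed earlier.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple

Buffered = Tuple[float, str, Any]  # (timestamp, join key, value)


class WindowJoin:
    """Join events from two streams on a key when they fall in the same window."""

    def __init__(self, window_seconds: float = 10.0) -> None:
        self.window = window_seconds
        self._buffers: Dict[str, Deque[Buffered]] = {"left": deque(), "right": deque()}

    def on_event(self, side: str, ts: float, key: str, value: Any) -> List[Tuple[Any, Any]]:
        """Feed one event from `side`; return (left value, right value) matches."""
        other = "right" if side == "left" else "left"
        # Expire anything older than the window from both buffers.
        for buf in self._buffers.values():
            while buf and buf[0][0] < ts - self.window:
                buf.popleft()
        # Pair the new event with every buffered event on the other side.
        matches = [
            (value, v) if side == "left" else (v, value)
            for (_, k, v) in self._buffers[other]
            if k == key
        ]
        self._buffers[side].append((ts, key, value))
        return matches


join = WindowJoin(window_seconds=10.0)
r1 = join.on_event("left", 0.0, "IBM", 100)
r2 = join.on_event("right", 4.0, "IBM", "buy")
r3 = join.on_event("right", 5.0, "MSFT", "sell")
r4 = join.on_event("left", 12.0, "IBM", 101)    # left event at t=0 has expired
r5 = join.on_event("right", 30.0, "IBM", "hold")  # everything has expired
```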
With Tier Two event processing applications, a CEP engine starts to become a better option than custom coding. To make your justification, total the developer time to code the more advanced data processing functions (joins, integration, etc.), which will likely be larger than the CEP engine licensing costs and learning curve.
Tier Three Applications
A common task for event processing applications is to find patterns within the event streams or aggregates. With Tier Three applications, sophisticated pattern matching is added to the mix of stream processing functions such as joins and aggregates.
Custom-coded pattern matching functionality requires very complex state control and synchronization programs, in addition to the difficult merge, union and join code. The code needs to detect that events occurred, determine the sequence in which they happened, and detect that certain events did not occur. The high-speed data streams and the use of windows would require maintaining many parallel threads of execution, further stressing custom-built memory management and synchronization code.
Some CEP languages provide extensions for expressing event patterns. In the following example, the MATCHING clause specifies a search for event "a" and event "b," followed by event "c," followed by the absence of event "d," all within a 10-second window:
The high-level language of a CEP engine offers a much easier programming model for pattern matching, and the engine abstracts these functions from the programmer, lowering the development cost of the application.
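To show the kind of state control a custom implementation has to carry, here is a deliberately simplified sketch of the same pattern: "a" and "b" in either order, then "c", then no "d", all within one window. The class name AbcNotD and its methods are hypothetical, and this is one possible reading of the pattern, not the semantics of any specific CEP language. It tracks only one candidate match at a time and assumes events arrive in timestamp order on a single thread.

```python
from typing import Dict, Optional


class AbcNotD:
    """Detect "a" and "b", then "c", then the absence of "d", within one window."""

    def __init__(self, window_seconds: float = 10.0) -> None:
        self.window = window_seconds
        self._last: Dict[str, float] = {}       # latest timestamps of "a" and "b"
        self._deadline: Optional[float] = None  # set once a, b and c are seen in time

    def on_timer(self, now: float) -> bool:
        """Return True if a pending candidate survived until its window closed."""
        if self._deadline is not None and now >= self._deadline:
            self._deadline = None
            return True
        return False

    def on_event(self, name: str, ts: float) -> bool:
        """Feed one event; return True when a match is confirmed."""
        matched = self.on_timer(ts)  # an event also advances the clock
        if name in ("a", "b"):
            self._last[name] = ts
        elif name == "c" and self._deadline is None and len(self._last) == 2:
            start = min(self._last.values())
            if ts - start < self.window:
                # a, b and c are in place; now wait for "d" NOT to occur.
                self._deadline = start + self.window
                self._last.clear()  # consume the matched "a" and "b"
        elif name == "d":
            self._deadline = None  # "d" arrived inside the window: cancel
        return matched


def run(events, final_time):
    """Replay (name, timestamp) pairs, then fire one timer; report any match."""
    matcher = AbcNotD(window_seconds=10.0)
    hits = [matcher.on_event(name, ts) for name, ts in events]
    hits.append(matcher.on_timer(final_time))
    return any(hits)


clean = run([("a", 0.0), ("b", 1.0), ("c", 3.0)], final_time=10.0)
cancelled = run([("a", 0.0), ("b", 1.0), ("c", 3.0), ("d", 5.0)], final_time=10.0)
late_d = run([("a", 0.0), ("b", 1.0), ("c", 3.0), ("d", 12.0)], final_time=20.0)
stale = run([("a", 0.0), ("b", 11.0), ("c", 12.0)], final_time=30.0)
```

Detecting that "d" did not occur is what requires the timer: the match can only be confirmed once the window has closed. Tracking many overlapping candidates across parallel threads, as a real application would need, is considerably harder than this single-candidate version.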
With Tier Three event processing applications, a CEP engine really shows its mettle and is the logical choice over custom coding. The cost savings could tally to multiple man-years. To make your justification, total the developer time to code the advanced event processing functions (joins, integration, patterns, state control, etc.), which will be much larger than the CEP engine licensing costs and learning curve.
Tier Four Applications
At Tier Four, organizations are building multiple, distributed event processing applications in mission-critical environments. This will require the development teams to support the following functionality in their CEP applications:
- Application or query modules that componentize the application functionality and enable easy change,
- Multi-threading, clustering and pipelining for scaling the applications,
- Publish and subscribe connectivity to scale the number of clients,
- Persistence and failover for high availability, and
- Management consoles for deploying applications and monitoring performance.
Supporting this level of functionality in a custom CEP application would require sophisticated code and a significant development and testing effort. CEP engines are built with significant server infrastructure containing much of this functionality. But not all CEP engines are built alike. Key items you want to look for are:
- The ability to dynamically deploy query modules without the need to shut down existing servers or applications;
- Configurable clustering that requires no coding and supports mix-and-match configurations that combine parallel, pipelined and failover clusters;
- Client connectivity that allows clients to easily connect to output streams and automatically scales the number of clients;
- Graphical consoles that allow you to view all deployed applications and servers, and easily deploy new modules; and
- Support for SNMP management frameworks to allow monitoring by enterprise management consoles.
Quite often the effort to deploy, manage and scale an application is not taken into account when calculating the cost of developing an application. For Tier Four event processing applications, the infrastructure of CEP servers can offer a much easier and less expensive means to control many CEP applications. To make your justification, add up the time to develop "infrastructure" code and to manually deploy and manage custom applications. Then, compare this figure to the time needed for the same tasks using the managed infrastructure of a CEP engine.
For basic event processing applications (characterized by low volume, single-stream, simple aggregation), developers can often roll their own custom-coded applications as quickly and efficiently as using a packaged complex event processing engine. Even though certain CEP engines offer a SQL-like language, there is still a small learning curve involved. For such simple event processing applications, the switching costs, as small as they may be, are higher than the cost to maintain existing custom event processing code.
Using a CEP engine becomes more cost-effective versus custom-coded event processing applications when:
- Sources become more complex - The data is high speed and high volume; the application processes multiple streams, integrates stored data and mixes different data formats.
- Query complexity grows - The application queries become more complex (many joins, correlations); the application uses multiple types of queries and pattern matching.
- Application infrastructure requirements grow - The development team needs to support many query modules and applications; the applications require continual change or addition of queries; the high volume of processing requires the use of clusters; the mission-critical nature of the application requires high availability.
When faced with these three types of requirements, you can build a cost justification for using a CEP engine in your development projects by summing the development, testing and management time required to support this more sophisticated functionality. We believe you will find this cost will be significantly higher than licensing, learning and using a CEP engine.
John Morrell is the director of Product Marketing for Coral8, Inc., a provider of complex event processing software and innovator of the SQL-based Continuous Computation Language. Morrell has more than 20 years of experience in enterprise software and data management. He may be reached at firstname.lastname@example.org.