How to Bridge Stateful and Event-Sourced Systems

In the world of modern software architecture, there are two dominant paradigms for managing and persisting application state: the traditional stateful approach, and the newer event-sourced approach.

Stateful systems, as the name implies, keep track of the current state of the application by storing it in a database or other persistence layer. This could be a relational SQL database, a NoSQL key-value store, or any other technology that allows you to save and query the current snapshot of your application's entities and aggregates.

Event-sourced systems, on the other hand, persist state as a sequence of state-changing events. Rather than storing the current state, event sourcing stores the history of changes that led to that state. The current state can then be derived by replaying the events in order.
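To make the replay idea concrete, here is a minimal sketch in Python. The bank-balance domain, the `Event` class, and the event kinds `deposited` and `withdrawn` are assumptions chosen for illustration; they are not part of any particular framework.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event kinds: "deposited" or "withdrawn"
    amount: int


def apply(balance: int, event: Event) -> int:
    """Fold a single event into the running state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: list[Event], initial: int = 0) -> int:
    """Derive state by applying the events in order."""
    return reduce(apply, events, initial)


# The log stores the history of changes, never the balance itself.
log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]

current_balance = replay(log)        # state after all events
balance_after_two = replay(log[:2])  # a past state, from a prefix of the log
```

Replaying a prefix of the log is also what makes reconstructing past states possible: the same function that derives the current state can derive any earlier one.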

Both approaches have their pros and cons, and each suits different use cases. Stateful systems are simpler and more familiar to most developers. They make it easy to query and update the current state of the system. However, they can run into consistency and scalability issues in distributed environments.

Event-sourced systems provide better auditability, as they preserve the entire history of changes. They enable powerful capabilities like being able to reconstruct past states, or even do "what if" analysis by replaying alternative event sequences. Event sourcing also fits well with other design patterns like CQRS and Event-Driven Architecture. The downside is added complexity, both in terms of the code and the mental models developers need to understand.

The Need for Integration

In an ideal world, we could pick one state management approach for our entire application and stick with it. In the real world, however, we often need to integrate disparate systems that follow different models.

This is especially true in large enterprises with complex IT landscapes. There you'll find a mix of modern cloud-native microservices alongside legacy monoliths, third-party packages, and home-grown systems built up over decades. Getting all of these to play nicely together is a major challenge.

Even in newer, greenfield development, there are often good reasons to adopt a hybrid model. Some parts of the domain may be best modeled as a stateful system, while others are better suited for event sourcing.

The question then becomes: how do we bridge the gap between these two worlds? How can we have our stateful cake and eat the event-sourced icing too?

Challenges of Bridging the Gap

The crux of the problem lies in the fundamentally different way that stateful and event-sourced systems handle state. In a stateful system, the current state is front-and-center. You can easily retrieve the current value of any entity or aggregate. Updates are typically done by directly modifying that state.

In an event-sourced system, state is a second-class citizen. The real first-class citizens are the events. To get the current state, you need to read through the event log in order and apply each event to reconstruct the entity or aggregate. Updates are done by appending a new event to the log, not by directly modifying the state.

This leads to a number of challenges when trying to integrate the two:

Consistency and synchronization: How do we ensure that the stateful and event-sourced representations of state remain consistent? If we update one, how do we make sure the other is updated accordingly?

Dealing with unreliable event streams: In an event-sourced system, the event log is the source of truth. But what happens if the events delivered to a subscriber are missed or arrive out of order? How can a stateful system rely on an unreliable event stream?

Performance and scalability: Deriving current state from an event log can be expensive, especially for large, frequently-updated aggregates. How can we shield stateful systems from this performance hit?

Querying and analytics: Stateful systems make it easy to query current state. Event-sourced systems make it easy to analyze changes over time. How can we get the best of both worlds?

Fortunately, there are a number of architectural patterns we can apply to mitigate these challenges. Let's take a look at a few of the most powerful ones.

Patterns for Bridging the Gap

Command Query Responsibility Segregation (CQRS)

CQRS is a pattern that separates read and write operations into different models. When combined with event sourcing, the write model captures state changes as events, while the read model provides optimized views of the current state for querying.

In the context of integrating with stateful systems, CQRS allows us to keep the complexity of event sourcing contained within the write model. The read model can act as a simple, queryable facade over the event stream, providing the stateful view that other systems expect.
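As a rough sketch of that split, the Python below pairs an event-recording write model with a read model that projects the events into a queryable view. The inventory domain and every class and method name here are hypothetical, and the projection runs synchronously in-process purely to keep the example short.

```python
class InventoryWriteModel:
    """Write side: records every state change as an event."""

    def __init__(self):
        self.events = []        # append-only event log
        self._projections = []  # callables that receive each new event

    def register_projection(self, projection):
        self._projections.append(projection)

    def adjust_stock(self, sku, delta):
        event = {"type": "stock_adjusted", "sku": sku, "delta": delta}
        self.events.append(event)
        for projection in self._projections:
            projection(event)


class StockLevelReadModel:
    """Read side: a current-state view that stateful consumers can query."""

    def __init__(self):
        self._on_hand = {}

    def project(self, event):
        if event["type"] == "stock_adjusted":
            sku = event["sku"]
            self._on_hand[sku] = self._on_hand.get(sku, 0) + event["delta"]

    def on_hand(self, sku):
        return self._on_hand.get(sku, 0)


write_model = InventoryWriteModel()
read_model = StockLevelReadModel()
write_model.register_projection(read_model.project)

write_model.adjust_stock("sku-1", 10)
write_model.adjust_stock("sku-1", -3)
```

A consumer that only calls `read_model.on_hand(...)` never sees the events at all, which is the facade role described above.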

Event-Driven Architecture

In an event-driven architecture, components communicate via asynchronous event notifications rather than direct method invocations. This decouples systems and allows them to evolve independently.

For our purposes, we can use events to trigger updates in downstream stateful systems. When a relevant event occurs in the event-sourced system, it publishes a notification. Subscribers in the stateful system receive this notification and update their own state accordingly. This way, the stateful system doesn't need direct access to the event log.
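The sketch below illustrates this publish-and-subscribe flow in Python. The `EventBus` class, the topic name, and the customer table are all made up for the example, and delivery is synchronous and in-process; a real deployment would more likely use a message broker with asynchronous delivery.

```python
from collections import defaultdict


class EventBus:
    """A toy in-process bus standing in for a real message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # Topics without subscribers are silently ignored.
        for handler in self._handlers.get(topic, []):
            handler(payload)


# The stateful side: a plain table of current values.
customer_table = {"c-1": {"email": "old@example.com"}}


def on_email_changed(payload):
    """Subscriber in the stateful system: overwrite the current value."""
    row = customer_table.setdefault(payload["customer_id"], {})
    row["email"] = payload["new_email"]


bus = EventBus()
bus.subscribe("customer.email_changed", on_email_changed)

# The event-sourced side publishes a notification after recording the event.
bus.publish(
    "customer.email_changed",
    {"customer_id": "c-1", "new_email": "new@example.com"},
)
```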

Change Data Capture

Change Data Capture (CDC) is a technique for identifying changes in a data source and propagating those changes to a target system. It's commonly used for replicating data between databases.

We can apply CDC to capture state changes in an event-sourced system and pipe them to a stateful system. This could be done by tailing the event log and transforming each event into an equivalent state update. Or, if the event-sourced system also maintains a queryable view of state (e.g. for read-side CQRS), we can capture changes in that view and sync them over.
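Here is a minimal Python sketch of the first option, tailing the event log and turning each event into a state update. It assumes the log is available as an in-memory list and uses SQLite from the standard library as a stand-in for the stateful target; the `stock` table and the event shape are hypothetical.

```python
import sqlite3


def apply_new_events(event_log, conn, last_offset):
    """Apply events recorded after last_offset and return the new offset."""
    new_events = event_log[last_offset:]
    for event in new_events:
        # Make sure a row exists, then apply the change to the current value.
        conn.execute(
            "INSERT OR IGNORE INTO stock (sku, on_hand) VALUES (?, 0)",
            (event["sku"],),
        )
        conn.execute(
            "UPDATE stock SET on_hand = on_hand + ? WHERE sku = ?",
            (event["delta"], event["sku"]),
        )
    conn.commit()
    return last_offset + len(new_events)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, on_hand INTEGER NOT NULL)")

event_log = [{"sku": "sku-1", "delta": 10}, {"sku": "sku-1", "delta": -3}]
offset = apply_new_events(event_log, conn, 0)

# Later, more events arrive; only the ones past the saved offset are applied.
event_log.append({"sku": "sku-2", "delta": 5})
offset = apply_new_events(event_log, conn, offset)
```

In this sketch the offset lives only in a variable. A real pipeline would probably need to store it durably alongside the target data so that a restart neither skips nor reapplies events.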

The Skeptical Subscriber Pattern

The Skeptical Subscriber pattern, applied in the real-world example later in this post, takes a trust-but-verify approach to consuming events from an unreliable event stream. Rather than blindly accepting every event as gospel, the subscriber treats them as hints that may indicate a relevant state change.

Upon receiving an event, the subscriber queries the event-sourced system for the current state of the affected entity or aggregate. It then compares that state to the last known state it had. If there's a meaningful difference (from the subscriber's point of view), the subscriber updates its own state and initiates any relevant business processes. If not, it disregards the event as a false alarm.

This pattern decouples the subscriber from the quirks of the event stream. Missed or out-of-order events aren't a problem, because the subscriber always checks the authoritative state from the source. It also gives the subscriber the ability to decide for itself what state changes are relevant, rather than having to process every event.

Implementing the Skeptical Subscriber Pattern

Let's dive a bit deeper into what it takes to implement the Skeptical Subscriber pattern. There are a few key components:

Event subscription: The subscriber needs some way to be notified of potentially relevant events. This could be a pub/sub message queue, a webhook callback, a polling mechanism, or any other asynchronous integration technique.

State gateway: To check the current state in the event-sourced system, the subscriber needs a way to query that state. This is the role of the state gateway. It provides a simple API for fetching entity or aggregate state by ID. Depending on the implementation of the event-sourced system, this could be a direct query against the event log, a call to a CQRS read model, or an API provided by the event-sourced system itself.

Domain comparison logic: When the subscriber receives an event and fetches the corresponding current state, it needs to compare that state to its own last known state. This comparison is domain-specific. The subscriber needs to understand what state changes are meaningful from its own perspective, and disregard any irrelevant differences.

State update and process initiation: If the domain comparison detects a relevant state change, the subscriber needs to update its own state to match. This could involve updating a local database, cache, or in-memory representation. The subscriber may also need to initiate its own business processes in response to the state change.

With these components in place, the flow becomes:

  1. Receive an event notification
  2. Fetch the current state of the affected entity/aggregate from the event-sourced system via the state gateway
  3. Compare that state to the last known local state using the domain comparison logic
  4. If relevant changes are detected, update local state and initiate business processes

If no relevant changes are detected, disregard the event and do nothing.
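The flow above can be sketched in a few lines of Python. This is one possible shape under stated assumptions, not a reference implementation: the class name, the three injected callables, and the order example are all hypothetical. The state gateway is modeled as a `fetch_state` function, and the domain comparison as an `is_relevant_change` function.

```python
class SkepticalSubscriber:
    def __init__(self, fetch_state, is_relevant_change, on_change):
        self._fetch_state = fetch_state                # state gateway
        self._is_relevant_change = is_relevant_change  # domain comparison logic
        self._on_change = on_change                    # business process hook
        self._last_known = {}                          # local state by entity ID

    def handle_event(self, entity_id):
        """Treat the event as a hint; return True if local state was updated."""
        current = self._fetch_state(entity_id)                # step 2
        previous = self._last_known.get(entity_id)
        if not self._is_relevant_change(previous, current):   # step 3
            return False                                      # false alarm
        self._last_known[entity_id] = current                 # step 4
        self._on_change(entity_id, current)
        return True


# Stand-in for the event-sourced system's queryable current state.
orders_in_source = {"order-1": {"status": "paid", "note": "gift wrap"}}


def fetch_order(order_id):
    return dict(orders_in_source[order_id])  # copy, as a remote call would return


def status_changed(previous, current):
    # This subscriber only cares about the order status.
    return previous is None or previous["status"] != current["status"]


notified = []
subscriber = SkepticalSubscriber(
    fetch_order,
    status_changed,
    lambda order_id, state: notified.append((order_id, state["status"])),
)

first = subscriber.handle_event("order-1")      # unseen entity, so it is relevant
duplicate = subscriber.handle_event("order-1")  # nothing changed, so it is ignored
```

Note that `handle_event` takes only the entity ID. The event payload is never trusted as a source of state, which is the point of the pattern.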

Handling Edge Cases

The Skeptical Subscriber pattern helps us deal with a number of common edge cases when bridging stateful and event-sourced systems:

False positives: Not every event in the stream will necessarily indicate a state change that's relevant to the subscriber. By always checking the current state, the subscriber can safely ignore any events that don't result in meaningful changes.

Missed events: If an event gets dropped or missed for some reason, it's not a big deal. The next time the subscriber receives an event for that entity or aggregate, it will fetch the latest state and catch up on any missed changes.

Out-of-order events: The order of events doesn't matter to the subscriber, because it's always checking against the current state. Even if events get delivered out of sequence, the subscriber will eventually converge on the correct state.

Corrupt or invalid events: If an event is malformed or contains bad data, the subscriber's domain comparison logic acts as a safeguard. Because the subscriber acts on the fetched state rather than on the event payload, it will only update local state if that fetched state passes its validity checks.
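A small simulation helps show why missed, duplicated, and reordered notifications are tolerated. The Python below is an illustrative sketch with made-up data: it shuffles a list of notifications, drops one, and checks that the subscriber still ends up matching the source, because each notification only triggers a fresh read of the current state.

```python
import random

# Authoritative current state in the event-sourced system (hypothetical data).
source_state = {"sku-1": 7, "sku-2": 0}


def deliver(notifications):
    """Process notifications as hints and return the resulting local state."""
    local_state = {}
    for sku in notifications:
        current = source_state[sku]          # always ask the source
        if local_state.get(sku) != current:  # ignore false alarms
            local_state[sku] = current
    return local_state


# Five notifications in the order the source emitted them.
notifications = ["sku-1", "sku-2", "sku-1", "sku-1", "sku-2"]

# Simulate an unreliable stream: reorder everything, then lose one message.
random.Random(42).shuffle(notifications)
delivered = notifications[1:]

converged = deliver(delivered) == source_state
```

The convergence only holds for entities that receive at least one notification after their last change. An entity whose final notification is lost stays stale until the next hint arrives, which matches the "missed events" caveat above.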

Of course, the Skeptical Subscriber pattern isn't a silver bullet. It adds overhead and complexity: every notification triggers an extra query against the source system. And it relies on the event-sourced system providing a reliable way to query current state. But in many cases, the benefits of decoupling and robustness outweigh the costs.

Best Practices and Considerations

As you embark on integrating stateful and event-sourced systems, there are a few best practices and considerations to keep in mind:

Choose the right tool for the job: Event sourcing isn't the right fit for every scenario. If you have simple, stable domain models that fit well in a traditional database, there's no shame in sticking with a stateful approach. Conversely, if you need to maintain a full audit log, or if your domain is highly dynamic and event-driven, event sourcing can be a powerful tool in your toolbelt. The key is to understand the tradeoffs and choose the approach that best fits your needs.

Evolve incrementally: You don't need to go all-in on event sourcing from day one. Start small, perhaps by introducing it in a limited subdomain or bounded context. Gradually expand outward as you gain confidence and see the benefits. This incremental approach reduces risk and allows you to learn as you go.

Leverage polyglot persistence: Just because you're using event sourcing doesn't mean you can't also use traditional databases. In fact, using different persistence technologies for different parts of your system is often a good idea. You might use event sourcing for the core domain, but keep reference data in a relational database. Or use a document database for your CQRS read models. Choose the right storage for each job.

Design for resilience: When integrating disparate systems, there are many ways things can go wrong. Networks can fail, messages can get lost, data can be corrupted. Design your integrations to be resilient in the face of these failures. Use asynchronous messaging to decouple systems. Build in retry and failover mechanisms. And always assume that anything that can go wrong, will go wrong.
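As one small example of building in retries, the Python sketch below wraps a call in a retry loop with exponential backoff. The function name, the default attempt count, and the choice to retry only on `ConnectionError` are assumptions for illustration; a real integration would tune these to the failures it actually sees.

```python
import time


def call_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)  # wait 0.1s, then 0.2s, and so on


def fetch_current_state():
    """Placeholder for a network call to the event-sourced system."""
    return {"status": "paid"}


state = call_with_retry(fetch_current_state)
```

The `sleep` parameter is injectable so that tests can run without real delays.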

Real-World Example: Integrating an ERP System with Microservices

Let's look at a real-world example of how these patterns can be applied. Imagine a large enterprise that has a core ERP system managing financials, inventory, and customer orders. This ERP system is the backbone of the business, but it's also a monolith that's difficult to change.

Now imagine that the enterprise wants to start breaking down that monolith into smaller, more manageable microservices. They want to be able to iterate quickly on new features and integrate with modern SaaS applications. But they can't afford to rip and replace the ERP system overnight.

One approach is to start carving off individual subdomains into event-sourced microservices. For example, they might extract the inventory management subdomain into its own service. This service would capture all inventory-related events – receipts, shipments, adjustments, etc. It would maintain its own event log as the source of truth for inventory state.

But the ERP system still needs access to this inventory data for things like financial reporting and order promising. So we need to bridge the gap between the event-sourced inventory service and the stateful ERP system.

We can apply the Skeptical Subscriber pattern here. The ERP system subscribes to inventory-related events from the microservice. When it receives an event, it calls back to the microservice to fetch the current inventory state for the affected products. It compares this state to its own last known state. If there are relevant differences (e.g. the on-hand quantity has changed), it updates its own inventory tables and initiates any downstream processes that depend on inventory (like reordering or deallocating).
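In code, the ERP side of this integration might look roughly like the Python sketch below. Everything here is hypothetical: the `ErpInventory` class, the `fetch_inventory` callback standing in for the call to the microservice, the dictionary standing in for the ERP inventory tables, and the reorder threshold.

```python
REORDER_POINT = 10  # hypothetical threshold for triggering a reorder


class ErpInventory:
    def __init__(self, fetch_inventory):
        self._fetch_inventory = fetch_inventory  # calls back to the microservice
        self.on_hand = {}    # stands in for the ERP inventory tables
        self.reorders = []   # stands in for the downstream reorder process

    def on_inventory_event(self, product_id):
        """Handle an inventory event as a hint to re-check the product."""
        quantity = self._fetch_inventory(product_id)["on_hand"]
        if self.on_hand.get(product_id) == quantity:
            return False  # on-hand quantity unchanged: nothing to do
        self.on_hand[product_id] = quantity
        if quantity < REORDER_POINT:
            self.reorders.append(product_id)
        return True


# Stand-in for the inventory microservice's current-state query.
service_state = {"widget": {"on_hand": 25, "bin": "A-12"}}

erp = ErpInventory(lambda product_id: service_state[product_id])

updated = erp.on_inventory_event("widget")     # first sighting: tables updated
unchanged = erp.on_inventory_event("widget")   # same quantity: ignored

service_state["widget"]["on_hand"] = 4         # a shipment depletes stock
reordered = erp.on_inventory_event("widget")   # change detected, reorder queued
```

Changes to fields the ERP doesn't care about, such as the bin location in this sketch, never cause an update because the comparison looks only at the on-hand quantity.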

Over time, more and more subdomains can be carved out into event-sourced microservices. The ERP system gradually becomes less of a monolith and more of a subscriber and orchestrator of data from the microservices. This allows the organization to modernize incrementally, without a big bang rewrite.

Conclusion

Bridging stateful and event-sourced systems is a common challenge in modern software architectures. The two approaches to state management are fundamentally different, which can lead to consistency, reliability, and performance issues when trying to integrate them.

Patterns like CQRS, Event-Driven Architecture, Change Data Capture, and the Skeptical Subscriber provide tools for managing these challenges. By understanding and applying these patterns, we can get the benefits of both stateful and event-sourced paradigms in our systems.

The key is to be pragmatic and incremental in adoption. Start small, choose the right tool for each job, and design for resilience. Over time, you can evolve your architecture towards the best of both worlds – the simplicity and queryability of stateful systems, and the flexibility and auditability of event sourcing.

The future of software is in composing heterogeneous systems and making them work seamlessly together. Mastering the art of bridging stateful and event-sourced systems is a key skill in that future. May these patterns and principles guide you on your journey.
