Saga Pattern: How I Solve Data Consistency in Microservices

Database tutorial - IT technology blog

The Monolith’s False Security

Moving from a single-database monolith to a 15-service microservice cluster felt like a massive upgrade until I hit the reality of data consistency. In a monolith, I relied on ACID transactions. I could wrap ten different database calls in one BEGIN/COMMIT block. If the server crashed mid-way, the database handled the rollback. Everything just worked.

In microservices, that safety net is gone. Your Order service, Payment service, and Inventory service likely use different databases—perhaps a mix of PostgreSQL and MongoDB. Attempting a global transaction across these nodes using Two-Phase Commit (2PC) usually results in a slow, brittle system that fails as soon as one network request lags. This is why I rely on the Saga pattern.

The Relay Race: Anatomy of a Saga

Think of a Saga as a relay race of local transactions. Each service performs its own work, updates its local database, and then signals the next service to take the baton. If a service fails—say, because a credit card is declined or a warehouse is out of stock—the Saga triggers a series of “compensating transactions” to undo the previous steps.

I generally choose between two implementation styles depending on the complexity of the workflow:

1. Choreography: The Decentralized Dance

For simple flows with 2 or 3 steps, I prefer choreography. There is no central boss. Each service emits an event, and others react to it. It’s lightweight but can quickly turn into “event spaghetti” if you aren’t careful.

  • Order Service: Persists a ‘Pending’ order and fires OrderCreated.
  • Payment Service: Sees the event, processes a $49.99 charge, and fires PaymentSuccessful.
  • Inventory Service: Allocates 1 unit of SKU-101 and fires InventoryReserved.
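The event chain above can be sketched with a tiny in-memory event bus. This is an illustrative toy, not a real broker: the `EventBus` class, service handlers, and event names are all assumptions made for the sketch; in production each service would subscribe via Kafka, RabbitMQ, or similar.

```python
# Minimal in-memory event bus to sketch the choreography flow above.
# Service handlers and event names are illustrative only.
class EventBus:
    def __init__(self):
        self.handlers = {}

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers.get(event_type, []):
            handler(payload)

bus = EventBus()
log = []

def on_order_created(order):
    # Payment Service reacts to OrderCreated
    log.append(f"payment: charging ${order['amount']}")
    bus.publish("PaymentSuccessful", order)

def on_payment_successful(order):
    # Inventory Service reacts to PaymentSuccessful
    log.append(f"inventory: reserving {order['sku']}")
    bus.publish("InventoryReserved", order)

bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("PaymentSuccessful", on_payment_successful)

# Order Service persists a 'Pending' order, then fires the first event
bus.publish("OrderCreated", {"amount": 49.99, "sku": "SKU-101"})
```

Notice that no component sees the whole flow: each handler only knows which event it reacts to and which event it emits, which is exactly why this style degrades into "event spaghetti" as steps are added.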

2. Orchestration: The Central Conductor

When a business process involves 5 or more services, I switch to an Orchestrator. This is a centralized state machine that explicitly tells each service what to do. It makes debugging much easier because the entire state of a $5,000 transaction is visible in one place.

# Orchestrator logic in Python
class OrderSagaOrchestrator:
    def execute(self, order_id, amount):
        payment_ref = None  # stays None if the charge itself fails
        try:
            # Step 1: Charge the user
            payment_ref = payment_api.charge(amount)
            # Step 2: Lock the items
            inventory_api.reserve(order_id)
            # Step 3: Finalize
            order_db.mark_as_paid(order_id)
        except Exception:
            # Compensate completed steps; skip the refund if the charge never went through
            self.rollback(order_id, payment_ref)

    def rollback(self, order_id, payment_ref):
        if payment_ref is not None:
            payment_api.refund(payment_ref)
        order_db.cancel(order_id)

The Secret Sauce: Compensating Transactions

The success path is easy. The failure path is where Sagas are won or lost. Unlike a SQL rollback, a compensation is a new transaction that logically reverses the previous one. If you already sent a confirmation SMS to a user, you cannot “un-send” it; you must send a second SMS explaining the cancellation.

Sagas follow the ACD principle. They provide Atomicity, Consistency, and Durability, but they lack Isolation. This means while a Saga is running, other services can see the intermediate “Pending” state. You must design your UI to handle this—for example, by showing a “Processing” spinner rather than a “Confirmed” checkmark immediately.

Designing the “Undo” Button

I ensure every API endpoint has a matching reversal strategy:

  • Action: ReserveStock (Subtract 5 units) -> Compensation: ReleaseStock (Add 5 units)
  • Action: ApplyDiscount -> Compensation: RemoveDiscount
  • Action: CreateShippingLabel -> Compensation: VoidShippingLabel
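One way to keep those pairs honest is to register each step together with its undo, so the saga can walk completed steps back in reverse order. The sketch below is a minimal illustration under that assumption; the `reserve_stock`/`release_stock` handlers are stand-ins for real service calls.

```python
# Sketch: each saga step carries its own compensation, applied in reverse on failure.
calls = []

def reserve_stock(sku, qty):
    calls.append(("reserve", sku, qty))  # stand-in for an Inventory API call

def release_stock(sku, qty):
    calls.append(("release", sku, qty))  # the matching compensation

# (action, compensation) pairs registered together
steps = [
    (lambda: reserve_stock("SKU-101", 5), lambda: release_stock("SKU-101", 5)),
]

completed = []
for action, compensation in steps:
    action()
    completed.append(compensation)

# Simulate a downstream failure: undo completed steps, newest first
for compensation in reversed(completed):
    compensation()
```

Running compensations in reverse order of completion matters when steps depend on each other, e.g. you void a shipping label before releasing the stock it was printed for.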

Managing mock data for these flows can be tedious. When I need to transform large CSV catalogs into JSON objects for testing my local microservices, I use toolcraft.app/en/tools/data/csv-to-json. It runs locally in the browser, which keeps sensitive test data off external servers and speeds up my dev loop.

Production Hardening: Idempotency and Reliability

In a distributed environment, network glitches mean your services will receive the same message twice. If your Inventory service processes a PaymentSuccessful event twice, you’ll accidentally deduct double the stock.

1. The Idempotency Key

I never process a transaction without a unique identifier (like a UUID). The service must check its database: “Have I already handled order_6789?” If yes, it ignores the duplicate and returns a cached success response.
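That check-before-process logic can be sketched in a few lines. A dict stands in for the idempotency store here; in production this would be a database table with a unique constraint on the key.

```python
# Sketch of idempotent event handling keyed on a unique transaction id.
processed = {}  # idempotency store; a dict stands in for a DB table with a unique key
stock = {"SKU-101": 10}

def handle_payment_successful(idempotency_key, sku):
    if idempotency_key in processed:
        # Duplicate delivery: do NOT touch stock again, return the cached response
        return processed[idempotency_key]
    stock[sku] -= 1
    processed[idempotency_key] = {"status": "reserved", "remaining": stock[sku]}
    return processed[idempotency_key]

# The broker delivers the same event twice; stock is only deducted once
first = handle_payment_successful("order_6789", "SKU-101")
duplicate = handle_payment_successful("order_6789", "SKU-101")
```

The cached response is important: the retrying caller still gets a success, so it stops retrying, but the side effect runs exactly once.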

2. The Transactional Outbox Pattern

Never update your database and then try to send a message to RabbitMQ in two separate steps. If the broker is down, your database will be out of sync with the rest of the world. Instead, I save the message to an outbox table within the same local transaction as the business data. A background worker then pushes those messages to the broker reliably.
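Here is a minimal sketch of the outbox pattern using SQLite as the local database. The table names and the `place_order`/`relay_outbox` functions are assumptions made for illustration; the key point is that the business row and the outbox row commit in the same local transaction.

```python
import json
import sqlite3

# Sketch of the transactional outbox: the business write and the message write
# share one local transaction, so they succeed or fail together.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0)")

def place_order(order_id):
    with db:  # one transaction: both inserts commit, or neither does
        db.execute("INSERT INTO orders VALUES (?, 'pending')", (order_id,))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "OrderCreated", "order_id": order_id}),),
        )

def relay_outbox(publish):
    # Background worker: push unsent rows to the broker, then mark them sent
    for row_id, payload in db.execute("SELECT id, payload FROM outbox WHERE sent = 0"):
        publish(json.loads(payload))  # publish() would talk to RabbitMQ in production
        db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    db.commit()

sent = []
place_order("order_6789")
relay_outbox(sent.append)
```

If the worker crashes after publishing but before marking the row sent, the message goes out twice on the next run, which is exactly why consumers need the idempotency keys described above.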

Hard-Earned Lessons from the Trenches

After scaling Sagas for systems handling thousands of concurrent requests, here are my top takeaways:

  • Observability is Non-Negotiable: Attach a correlation_id to every log. If an order gets stuck for 120 seconds, you need to see exactly where the chain broke across five different service logs.
  • Keep Transactions Short: Because there is no isolation, long-running transactions increase the risk of race conditions. Aim for local transactions that finish in under 200ms.
  • Set Strict Timeouts: If the Payment gateway doesn’t respond within 10 seconds, don’t wait forever. Trigger the compensation flow automatically to release held inventory.
  • Avoid Circular Dependencies: In choreography, ensure Service A doesn’t wait for Service B, which is waiting for Service A. You’ll end up with a distributed deadlock.

Sagas are more complex than standard SQL transactions. However, they are the only way I’ve found to build a resilient, multi-database architecture that doesn’t suffer from data corruption. Start with a small 2-step flow, master your compensation logic, and always assume the network will fail.
