Data Consistency in Microservices

Why I default to a transactional outbox plus idempotent consumers across services, with TypeScript code, Debezium CDC, and two production scars that set the rule.

It was a Thursday at the combat-sports tournament platform I CTO’d in London, and I was standing at a whiteboard arguing for the third time that month about how a service should “just publish an event” after it wrote to its database. Two engineers on my team had already shipped the naive version. Write to Postgres. Publish to Kafka. Move on. We had hundreds of microservices in flight by then and Kafka was our async backbone. The naive version is the reason I now have strong opinions about data consistency. Specifically, the kind of opinions you only get after watching a payment event silently disappear while the row that triggered it sits happily in the writer.

I’ll cut to the position. For any cross-service workflow that has to survive a crash, use a transactional outbox plus idempotent consumers. Don’t reach for two-phase commit. Don’t default to a saga before you have the outbox in place. The outbox is the boring primitive. Sagas live on top of it. 2PC doesn’t live anywhere I want to be on-call for.

The dual-write problem

The trap is this. You want to write a row and emit an event. You probably write the row in a Postgres transaction, then publish to Kafka. If the publish fails after the commit, your database has the row and the rest of the world doesn’t. If you flip the order and publish before the commit, and the commit then fails, the rest of the world acts on a row that doesn’t exist. Either direction is a silent inconsistency. Retrying doesn’t save you. The two writes are not in the same atomic boundary, and no amount of try/catch makes them one.
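To make the failure mode concrete, here is a minimal in-memory sketch of the naive dual write. FakeDb and FakeBroker are hypothetical stand-ins I made up for illustration, not Postgres or Kafka clients. The only thing they model is that the commit and the publish are two separate steps, so a failure between them leaves the two sides disagreeing.

```typescript
// Hypothetical in-memory stand-ins for a database and a broker.
// They exist only to illustrate the failure mode; they are not real clients.
class FakeDb {
  readonly rows: string[] = [];
  commit(orderId: string): void {
    this.rows.push(orderId);
  }
}

class FakeBroker {
  readonly events: string[] = [];
  failNextPublish = false;
  publish(orderId: string): void {
    if (this.failNextPublish) {
      this.failNextPublish = false;
      throw new Error('broker unavailable');
    }
    this.events.push(orderId);
  }
}

// The naive version: commit first, then publish as a separate step.
function placeOrderNaive(db: FakeDb, broker: FakeBroker, orderId: string): void {
  db.commit(orderId);      // step 1: durable as soon as it returns
  broker.publish(orderId); // step 2: can fail after step 1 already happened
}

const fakeDb = new FakeDb();
const fakeBroker = new FakeBroker();

placeOrderNaive(fakeDb, fakeBroker, 'order-1');

fakeBroker.failNextPublish = true; // simulate a transient broker hiccup
try {
  placeOrderNaive(fakeDb, fakeBroker, 'order-2');
} catch {
  // The caller sees an error, but the row for order-2 is already committed.
}

// Rows the database has that the broker never heard about.
const orphanedOrders = fakeDb.rows.filter(
  (id) => fakeBroker.events.indexOf(id) === -1,
);
console.log(orphanedOrders); // order-2 is the only orphan
```

Wrapping step 2 in a retry narrows the window but does not close it: the process can still die between the two calls, and nothing durable records that a publish is owed.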

I watched the cleaner version of this play out on a Saturday at the federation platform. A live tournament was being broadcast. The match-events topic kept growing on the broker, but standings updates stopped reaching the public leaderboard. The page froze at 14:32 local time. We thought it was a consumer rebalance loop, and it was, but underneath there was an older bug: the match-results service was committing rows and then trying to publish, and a chunk of events had been quietly dropped over the previous week because of a transient broker hiccup nobody had paged on. Rebalances made the symptom visible. The root cause was a dual-write nobody had owned.

The outbox in plain code

The trick is to put the event in the same transaction as the row. You write to an outbox_events table. A separate process drains that table to the broker. Now you have one atomic write, and the publish is an eventually consistent follow-up.

import { Inject, Injectable } from '@nestjs/common';
import { DataSource } from 'typeorm';
import { randomUUID } from 'crypto';

type OutboxRow = {
  id: string;
  aggregate_type: string;
  aggregate_id: string;
  event_type: string;
  payload: Record<string, unknown>;
  created_at: Date;
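};

// Input shape inferred from how placeOrder uses it below.
type PlaceOrderDto = {
  id: string;
  customerId: string;
  totalCents: number;
};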

@Injectable()
export class OrdersService {
  constructor(@Inject('PG') private readonly ds: DataSource) {}

  async placeOrder(input: PlaceOrderDto) {
    return this.ds.transaction(async (tx) => {
      const order = await tx.query(
        `insert into orders (id, customer_id, total_cents, status)
         values ($1, $2, $3, 'pending') returning *`,
        [input.id, input.customerId, input.totalCents],
      );

      const event: OutboxRow = {
        id: randomUUID(),
        aggregate_type: 'order',
        aggregate_id: input.id,
        event_type: 'order.placed.v1',
        payload: { orderId: input.id, totalCents: input.totalCents },
        created_at: new Date(),
      };

      await tx.query(
        `insert into outbox_events
           (id, aggregate_type, aggregate_id, event_type, payload, created_at)
         values ($1,$2,$3,$4,$5,$6)`,
        [event.id, event.aggregate_type, event.aggregate_id,
         event.event_type, event.payload, event.created_at],
      );

      return order[0];
    });
  }
}

Notice what the service does not do. It does not call the Kafka producer. It does not even know Kafka exists. That decoupling is the whole point. If the transaction rolls back, the event vanishes with the row.

Draining the outbox

You have two reasonable ways to ship the rows out. A polling worker, or Debezium reading the Postgres write-ahead log. I have run both. For most teams that already operate Postgres, start with polling. It is one cron-shaped service and it gets you to correctness fast. Move to Debezium when polling latency or load becomes a real problem, not before.

The polling version is boring on purpose.

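import { Kafka } from 'kafkajs';
import { DataSource } from 'typeorm';
import { setTimeout as sleep } from 'timers/promises';
// OutboxRow is the type defined next to OrdersService above.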

const kafka = new Kafka({
  clientId: 'outbox-relay',
  brokers: process.env.KAFKA_BROKERS!.split(','),
});

const producer = kafka.producer({ idempotent: true, maxInFlightRequests: 5 });

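export async function runOutboxRelay(ds: DataSource) {
  await producer.connect();
  while (true) {
    // Select, publish, and mark inside one transaction. The row locks taken
    // by "for update skip locked" are released when the transaction ends, so
    // outside a transaction a second relay pod could grab the same rows.
    const published = await ds.transaction(async (tx) => {
      const batch: OutboxRow[] = await tx.query(
        `select * from outbox_events
         where published_at is null
         order by created_at asc
         limit 200 for update skip locked`,
      );
      if (batch.length === 0) return 0;

      await producer.sendBatch({
        topicMessages: [{
          topic: 'orders',
          messages: batch.map((r) => ({
            key: r.aggregate_id,
            value: JSON.stringify(r.payload),
            headers: {
              'event-id': r.id,
              'event-type': r.event_type,
              'trace-id': process.env.TRACE_ID ?? '',
            },
          })),
        }],
        acks: -1,
      });

      // If the process dies before this commits, the batch is sent again on
      // the next pass. That is at-least-once delivery; consumers dedupe.
      await tx.query(
        `update outbox_events set published_at = now()
         where id = any($1::uuid[])`,
        [batch.map((r) => r.id)],
      );
      return batch.length;
    });

    if (published === 0) await sleep(250);
  }
}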

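for update skip locked is the line that does the heavy lifting. It only works while the select and the update run inside one transaction, because the row locks are released at commit. Under that condition each relay pod locks its own batch, skips rows another pod holds, and multiple pods can run safely. acks: -1 plus idempotent: true on the producer means once Kafka takes the batch, you can mark the rows published without worrying that the broker secretly lost half of them. A crash between the send and the commit re-sends the batch, which is why the consumer side has to tolerate duplicates.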

When polling latency starts to hurt, Debezium is the upgrade path. You point it at the same outbox table and let the WAL do the work.

name: orders-outbox
config:
  connector.class: io.debezium.connector.postgresql.PostgresConnector
  database.hostname: orders-writer.internal
  database.dbname: orders
  database.user: debezium
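  # publication.name below only applies to the pgoutput logical decoding plugin
  plugin.name: pgoutput
  slot.name: orders_outbox_slot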
  publication.name: orders_outbox_pub
  table.include.list: public.outbox_events
  tombstones.on.delete: false
  transforms: outbox
  transforms.outbox.type: io.debezium.transforms.outbox.EventRouter
  transforms.outbox.route.by.field: aggregate_type
  transforms.outbox.route.topic.replacement: ${routedByValue}.events
  transforms.outbox.table.field.event.key: aggregate_id

The EventRouter transform turns each outbox row into a Kafka record on a topic chosen by aggregate_type. That gives you per-aggregate ordering for free, because the key is the aggregate id.
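As a rough illustration of what that routing amounts to, the function below mimics the mapping in plain TypeScript. It is not Debezium's implementation. routeOutboxRow, OutboxRowLike, and RoutedRecord are hypothetical names, and the only behaviour modelled is the topic template and the key choice from the config above.

```typescript
// Hypothetical types for illustration; column names follow the outbox table.
type OutboxRowLike = {
  id: string;
  aggregate_type: string;
  aggregate_id: string;
  payload: Record<string, unknown>;
};

type RoutedRecord = { topic: string; key: string; value: string };

// Mirrors route.topic.replacement from the connector config.
// This is a plain string on purpose, not a template literal.
const TOPIC_TEMPLATE = '${routedByValue}.events';

function routeOutboxRow(row: OutboxRowLike): RoutedRecord {
  return {
    // route.by.field: aggregate_type decides the topic
    topic: TOPIC_TEMPLATE.replace('${routedByValue}', row.aggregate_type),
    // table.field.event.key: aggregate_id becomes the record key
    key: row.aggregate_id,
    value: JSON.stringify(row.payload),
  };
}

const routedPlaced = routeOutboxRow({
  id: 'evt-1',
  aggregate_type: 'order',
  aggregate_id: 'order-42',
  payload: { orderId: 'order-42', totalCents: 4200 },
});
const routedPaid = routeOutboxRow({
  id: 'evt-2',
  aggregate_type: 'order',
  aggregate_id: 'order-42',
  payload: { orderId: 'order-42', status: 'paid' },
});
```

Both records land on the order.events topic with the key order-42. Records with the same key go to the same partition under Kafka's default partitioner, which is where the per-aggregate ordering comes from.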

Idempotent consumers

The outbox guarantees at-least-once delivery. Which means duplicates. Which means the consumer side has to be idempotent or your downstream tables will lie. The simplest version is a dedupe table keyed on event-id, written in the same transaction as the side-effect.

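import { Inject, Injectable } from '@nestjs/common';
import { DataSource } from 'typeorm';

// Shape of the order.placed.v1 payload written by OrdersService.
type OrderPlaced = { orderId: string; totalCents: number };

@Injectable()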
export class PaymentProjector {
  constructor(@Inject('PG') private readonly ds: DataSource) {}

  async handle(eventId: string, payload: OrderPlaced) {
    await this.ds.transaction(async (tx) => {
      const inserted = await tx.query(
        `insert into processed_events (event_id, consumer)
         values ($1, 'payment-projector')
         on conflict do nothing
         returning event_id`,
        [eventId],
      );
      if (inserted.length === 0) return;

      await tx.query(
        `insert into payments (order_id, amount_cents, status)
         values ($1, $2, 'authorized')
         on conflict (order_id) do nothing`,
        [payload.orderId, payload.totalCents],
      );
    });
  }
}

Notice the unique key on payments(order_id) too. Belt and braces. Anyone who tells you “we’ll just trust the broker” has never debugged a duplicate payment at midnight.
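If you want to watch the dedupe behave under redelivery without a database, this is a minimal in-memory sketch. InMemoryPaymentProjector is a hypothetical stand-in for the processed_events and payments tables. It skips the transaction entirely, so treat it as a model of the logic rather than of the durability.

```typescript
type OrderPlacedEvent = { orderId: string; totalCents: number };

// Hypothetical in-memory stand-in for processed_events and payments.
class InMemoryPaymentProjector {
  private readonly processed = new Set<string>();
  readonly payments = new Map<string, number>();
  sideEffects = 0;

  // Returns true if the event was applied, false if it was a duplicate.
  handle(eventId: string, placed: OrderPlacedEvent): boolean {
    // First guard: the dedupe key on the event id.
    if (this.processed.has(eventId)) return false;
    this.processed.add(eventId);

    // Second guard: the unique key on the business id.
    if (!this.payments.has(placed.orderId)) {
      this.payments.set(placed.orderId, placed.totalCents);
      this.sideEffects += 1;
    }
    return true;
  }
}

const projector = new InMemoryPaymentProjector();
const placedOrder = { orderId: 'order-1', totalCents: 4200 };

const firstDelivery = projector.handle('evt-1', placedOrder);
const redelivery = projector.handle('evt-1', placedOrder); // same event again
// A different event id for the same order still cannot create a second payment.
const replayedUnderNewId = projector.handle('evt-2', placedOrder);
```

The first call applies the payment, the redelivery is a no-op, and the third call is accepted as a new event but stopped by the second guard. Exactly one payment exists at the end.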

A scar from a different platform

A different story, same family of bug. The creator-economy platform I worked at had a Branded Mobile Apps pipeline. Apple’s server-to-server SubscriptionRenewal notifications hit our endpoint, our handler wrote to a creator_subscriptions table, and Apple retried aggressively if our response landed past their 30-second deadline. The handler had no idempotency check, so each retry created a new row. A creator opened a ticket: every one of their customers had been charged twice and the app showed two active subscriptions. Apple had already charged the cards.

The first “fix” was a frontend patch that hid the duplicate rows. That bought us nothing, because hiding a row does not refund anyone. The real fix had three parts: a database unique constraint on (apple_original_transaction_id, notification_uuid), a handler that returned 200 OK in under five seconds and pushed the work to a Sidekiq job, and a backfill script that used the same idempotency key.

The lesson is identical to the outbox case. Server-to-server notifications retry. Idempotency keys are not optional; they are the contract.
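The original fix lived in a Rails app with Sidekiq, so what follows is not that code. It is a small TypeScript sketch of the same shape under my own assumptions: acknowledge fast, queue the work, and let a composite key absorb the retries. SubscriptionStore, acknowledgeNotification, and drainNotifications are hypothetical names, and the array stands in for the job queue.

```typescript
type RenewalNotification = {
  originalTransactionId: string;
  notificationUuid: string;
};

// Hypothetical store that models a unique constraint on
// (original_transaction_id, notification_uuid).
class SubscriptionStore {
  private readonly seen = new Set<string>();
  rows = 0;

  // Returns true if a row was inserted, false if the key already existed.
  insertOnce(n: RenewalNotification): boolean {
    const key = JSON.stringify([n.originalTransactionId, n.notificationUuid]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.rows += 1;
    return true;
  }
}

// The endpoint does no real work: it enqueues and acknowledges immediately.
function acknowledgeNotification(
  pending: RenewalNotification[],
  n: RenewalNotification,
): number {
  pending.push(n);
  return 200;
}

// The background worker applies each queued notification at most once.
function drainNotifications(
  pending: RenewalNotification[],
  store: SubscriptionStore,
): void {
  while (pending.length > 0) {
    const next = pending.shift();
    if (next) store.insertOnce(next);
  }
}

const pendingJobs: RenewalNotification[] = [];
const subscriptionStore = new SubscriptionStore();
const renewal = { originalTransactionId: 'txn-1', notificationUuid: 'notif-1' };

const firstStatus = acknowledgeNotification(pendingJobs, renewal);
const retryStatus = acknowledgeNotification(pendingJobs, renewal); // a retry
drainNotifications(pendingJobs, subscriptionStore);
```

Both deliveries get a 200, both reach the worker, and only one row exists afterwards. The backfill script can go through the same insertOnce path, which is what makes it safe to rerun.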

Outbox vs saga vs 2PC

Pick by what the workflow actually needs. The outbox is the data-plane primitive. It guarantees that what one service published reflects what it wrote. A saga is a workflow primitive. It sits on top of the outbox and coordinates multi-step distributed flows with compensations when a step fails. 2PC tries to do both atomically across services and pays for it with blocking locks, a coordinator that is a single point of failure, and operational pain that does not pencil out in any production system I have run. I have not used 2PC in over a decade, and I would not start now.

If you only need “this service’s writes show up on the bus”, outbox. If you need “create order, charge card, reserve inventory, ship, and roll all of it back if any step fails”, saga, backed by outbox at every step. If you think you need 2PC, redesign the boundary so you don’t.
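For a feel of what “roll all of it back” means in saga terms, here is a minimal orchestration sketch. runSaga and SagaStep are hypothetical names. The steps are synchronous only to keep the example short; in a real system each step would be an async call into another service, and each step and each compensation would write its own outbox row.

```typescript
type SagaStep = {
  label: string;
  run: () => void;        // throws on failure
  compensate: () => void; // undoes a completed run
};

type SagaResult = { ok: boolean; failedStep?: string };

// Runs steps in order. On the first failure, compensates the steps that
// already completed, in reverse order, and reports which step failed.
function runSaga(steps: SagaStep[]): SagaResult {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
    } catch {
      for (const done of completed.slice().reverse()) {
        done.compensate();
      }
      return { ok: false, failedStep: step.label };
    }
  }
  return { ok: true };
}

const sagaLog: string[] = [];
const sagaResult = runSaga([
  {
    label: 'create order',
    run: () => { sagaLog.push('create order'); },
    compensate: () => { sagaLog.push('cancel order'); },
  },
  {
    label: 'charge card',
    run: () => { sagaLog.push('charge card'); },
    compensate: () => { sagaLog.push('refund card'); },
  },
  {
    label: 'reserve inventory',
    run: () => { throw new Error('out of stock'); },
    compensate: () => { sagaLog.push('release inventory'); },
  },
]);
```

The inventory step fails, so the log ends with the refund and then the cancellation. The failed step's own compensation never runs because it never completed. Note that compensations are new forward actions, not a database rollback, which is why each one needs the same outbox guarantee as the step it undoes.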

Operational notes worth keeping

Watch outbox lag like you watch consumer lag. The metric I care about is “oldest unpublished row age in seconds”. If that crosses thirty seconds, page. Add a partial index on outbox_events(created_at) where published_at is null, so the polling query, which filters on published_at and orders by created_at, stays cheap as the table grows. Archive published rows out to a cold table on a slow cron. Cap message size before the outbox row goes in. Carry trace ids through the headers so distributed traces follow the event across the broker.
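The lag metric is small enough to sketch. The SQL in the comment is what I would expect a metrics job to run against the outbox table; outboxLagSeconds and shouldPage are hypothetical helper names, and the thirty-second threshold is the one from the paragraph above.

```typescript
// Query a metrics job could run against the outbox table:
//   select min(created_at) as oldest
//   from outbox_events
//   where published_at is null;

const PAGE_THRESHOLD_SECONDS = 30;

// Age in seconds of the oldest unpublished row, or 0 when nothing is pending.
function outboxLagSeconds(oldest: Date | null, now: Date): number {
  if (oldest === null) return 0;
  return Math.max(0, (now.getTime() - oldest.getTime()) / 1000);
}

// True once the oldest unpublished row is older than the threshold.
function shouldPage(oldest: Date | null, now: Date): boolean {
  return outboxLagSeconds(oldest, now) > PAGE_THRESHOLD_SECONDS;
}

const checkTime = new Date('2024-01-01T00:01:00Z');
const stuckSince = new Date('2024-01-01T00:00:15Z');   // 45 seconds old
const recentRow = new Date('2024-01-01T00:00:45Z');    // 15 seconds old

const pageForStuck = shouldPage(stuckSince, checkTime);
const pageForRecent = shouldPage(recentRow, checkTime);
const pageForEmpty = shouldPage(null, checkTime);
```

Only the 45-second-old row trips the alert. Age catches a stalled relay even when the backlog is a single row, which a row count would miss.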

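And the bit that catches teams late. Make sure your Debezium replication slot cannot retain WAL without limit. An orphaned slot will pin WAL on the writer and eat disk until the pager goes off. On Postgres 13 and later, max_slot_wal_keep_size caps how much WAL a slot can hold back, at the cost of invalidating a slot that falls too far behind. Alert on slot lag as well, and drop slots whose connector is gone.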
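A slot check can be as simple as the sketch below. The SQL in the comment uses the pg_replication_slots view and pg_wal_lsn_diff, both standard Postgres. SlotStatus and slotsNeedingAttention are hypothetical names, and the one-gigabyte budget is an arbitrary number for the example, not a recommendation.

```typescript
// Query to run on the writer:
//   select slot_name, active,
//          pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as retained_bytes
//   from pg_replication_slots;

type SlotStatus = {
  slotName: string;
  active: boolean;
  retainedBytes: number;
};

// Flags slots that have no connected consumer or that hold back more WAL
// than the budget allows.
function slotsNeedingAttention(
  slots: SlotStatus[],
  maxRetainedBytes: number,
): string[] {
  return slots
    .filter((s) => !s.active || s.retainedBytes > maxRetainedBytes)
    .map((s) => s.slotName);
}

const ONE_GIB = 1024 * 1024 * 1024; // arbitrary budget for this example

const flaggedSlots = slotsNeedingAttention(
  [
    { slotName: 'orders_outbox_slot', active: true, retainedBytes: 16 * 1024 * 1024 },
    { slotName: 'old_experiment_slot', active: false, retainedBytes: 0 },
    { slotName: 'lagging_slot', active: true, retainedBytes: 5 * ONE_GIB },
  ],
  ONE_GIB,
);
```

The healthy slot passes; the inactive one and the lagging one are flagged. In practice you would want a grace period before alerting on an inactive slot, since a connector restart makes its slot briefly inactive.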

Takeaways

Outbox plus idempotent consumers is the default. Everything else is a special case.
Treat the event as part of the row. One transaction. No dual-write.
Start with a polling relay. Move to Debezium when latency demands it, not before.
Server-to-server callbacks retry. Idempotency keys are the contract.
2PC is a trap. Redesign the boundary before you reach for it.
Alert on oldest-unpublished-row age. It catches more bugs than queue length.

Thanks for reading. If you’ve got thoughts, send them my way.