Introducing Bounded Contexts Into Legacy Code

How I carved bounded contexts out of a large legacy codebase using bubble contexts, an anti-corruption layer, and a strangler fig migration with event-driven integration.

A Tuesday in November, at the London digital product agency I led. Rails monolith, eight years old, referring to a customer as a “client”, a “user”, and a “subscriber” depending on which file you opened. Someone suggested we just rewrite the whole thing. I said no. We did bounded contexts instead.

How do you carve a context out of a system that’s been running for years and isn’t going to stop running while you fix it? Slowly, and behind a wall.

Start with a bubble, not a rewrite

The trick I’ve seen actually ship is a bubble context. Don’t migrate the legacy code. Don’t even touch it, at first. Stand up a clean, well-modeled context next to it, with its own database and deployable. Then put a wall between them.

The wall is an anti-corruption layer. ACL. Sounds defensive but it’s just a translator. Legacy thinks in users and subscriptions and a bag of nullable columns. The new context thinks in Subscriber and Subscription with proper aggregates. The ACL turns one into the other so the legacy concepts don’t leak.

// contexts/billing/acl/legacy-user.adapter.ts
import { LegacyUserApi } from "../../../legacy/clients/user-api";
import { Subscriber, SubscriberId, Email, PlanCode } from "../domain";

export class LegacyUserAdapter {
  constructor(private readonly legacy: LegacyUserApi) {}

  async loadSubscriber(id: SubscriberId): Promise<Subscriber> {
    const raw = await this.legacy.fetchUser(id.value);
    if (!raw) throw new Error(`legacy user not found: ${id.value}`);

    // legacy stores plan as a string column with 14 historical values.
    // the new context only knows 3. we collapse here, on purpose.
    const plan = PlanCode.fromLegacy(raw.subscription_tier);
    const email = Email.parse(raw.email_address);

    return Subscriber.rehydrate({
      id,
      email,
      plan,
      activeSince: raw.activated_at ? new Date(raw.activated_at) : null,
    });
  }
}

PlanCode.fromLegacy is doing a lot of work. Eight years of business decisions folded into three. You want all that mess in one file, not sprinkled across the new domain logic.
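A minimal sketch of what a collapse like that could look like. The tier strings and the three plan names are invented for illustration (the real table would list all 14 historical values), and the real PlanCode may well be shaped differently. The point is one lookup table, and a loud failure for anything it doesn’t know.

```typescript
type Plan = "free" | "standard" | "pro"; // hypothetical plan names

// Hypothetical legacy values. A Map (not an object literal) so that keys
// like "constructor" cannot accidentally resolve to something.
const LEGACY_TIER_TO_PLAN = new Map<string, Plan>([
  ["free", "free"],
  ["trial_2017", "free"],
  ["basic", "standard"],
  ["basic_annual", "standard"],
  ["premium", "pro"],
  ["enterprise_legacy", "pro"],
]);

class PlanCode {
  readonly code: Plan;

  private constructor(code: Plan) {
    this.code = code;
  }

  // Normalizes whitespace and case, then looks the tier up.
  // Unknown or missing tiers throw instead of falling back to a default.
  static fromLegacy(tier: string | null | undefined): PlanCode {
    const key = (tier ?? "").trim().toLowerCase();
    const plan = LEGACY_TIER_TO_PLAN.get(key);
    if (plan === undefined) {
      throw new Error(`unmapped legacy subscription_tier: ${JSON.stringify(tier)}`);
    }
    return new PlanCode(plan);
  }
}
```

Throwing on an unmapped tier is a choice, not a requirement. It means a new legacy value breaks one adapter call visibly, which I’d take over a silent default plan.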

Strangler fig in real milestones

The other half is the strangler fig. Route at the edge, and the new context handles more traffic over time until the legacy code is dead branches. People talk about it like a single move, but it’s a sequence:

1. ACL in place, new context reads from legacy. No writes yet. The cheapest milestone, and where you find out if your domain model is right.
2. New context writes to its own database. Still talks back to legacy for the bits it doesn’t own. The dangerous one. Two sources of truth for a slice.
3. Legacy stops writing the slice the new context owns. Writes go to the new context only, and legacy reads through an ACL of its own. Symmetric.
4. Legacy code for that slice gets deleted. People forget this one. If you don’t delete it, you didn’t migrate, you doubled the surface area.

Most of the value is in milestone 1. The team learns the new domain by modeling it, the ACL forces you to be explicit, and you can stop there for months and still have shipped something useful.
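One way to keep the four milestones honest is to make the current one an explicit value per slice, and have every write path check it. This is a sketch, not something lifted from a real codebase: the stage names and the helper functions are hypothetical.

```typescript
// Hypothetical names: one stage per milestone, in order.
type Stage =
  | "acl-reads"
  | "new-side-writes"
  | "legacy-reads-through-acl"
  | "legacy-deleted";
type Side = "legacy" | "new";

// Who is allowed to write the slice at each stage.
const WRITERS: Record<Stage, readonly Side[]> = {
  "acl-reads": ["legacy"],
  "new-side-writes": ["legacy", "new"], // the dangerous one: two writers
  "legacy-reads-through-acl": ["new"],
  "legacy-deleted": ["new"],
};

function canWrite(stage: Stage, side: Side): boolean {
  return WRITERS[stage].includes(side);
}

// True only while both sides write the same slice.
function isDualWrite(stage: Stage): boolean {
  return WRITERS[stage].length > 1;
}

// Guard to call at the top of a write path for the slice.
function assertCanWrite(stage: Stage, side: Side): void {
  if (!canWrite(stage, side)) {
    throw new Error(`${side} must not write this slice during stage "${stage}"`);
  }
}
```

With something like this, the time spent in a dual-write stage is a thing you can query and alert on, not something the team has to remember.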

Edge routing with nginx

Edge routing is the least interesting part. Header or path-based routing in nginx, an API gateway rule, sometimes a feature flag.

# /etc/nginx/conf.d/billing.conf
upstream legacy_app { server legacy.internal:3000; }
upstream billing_ctx { server billing.internal:4000; }

map $http_x_billing_context $billing_upstream {
  default        legacy_app;
  "v2"           billing_ctx;
}

server {
  listen 443 ssl http2;
  server_name api.example.com;
  # ssl_certificate and ssl_certificate_key omitted here; nginx needs both for an ssl listener

  location /v1/subscriptions {
    proxy_pass http://$billing_upstream;
    proxy_set_header X-Request-Id $request_id;
    proxy_read_timeout 10s;
  }
}

The X-Billing-Context: v2 header is set by an internal flag service. We can move a single customer, a tenant, or a percentage. Rollback is one config change.
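The decision behind that header can be very small. Here is a sketch of how a flag service might pick between a single customer, a tenant, and a percentage. The BillingFlag shape and the function names are assumptions for illustration, and FNV-1a is just one stable hash that happens to be short enough to inline.

```typescript
// Hypothetical flag shape; field names are assumptions.
interface BillingFlag {
  customers: ReadonlySet<string>; // individually migrated customer ids
  tenants: ReadonlySet<string>;   // fully migrated tenants
  percentage: number;             // 0..100 of everyone else
}

// FNV-1a, 32-bit, over UTF-16 code units (fine for ASCII ids).
// Stable across processes, so a customer always lands in the same bucket.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Returns the value for X-Billing-Context, or undefined to leave it unset
// (which the nginx map above sends to legacy_app).
function billingContextHeader(
  flag: BillingFlag,
  customerId: string,
  tenantId: string,
): "v2" | undefined {
  if (flag.customers.has(customerId) || flag.tenants.has(tenantId)) return "v2";
  return fnv1a(customerId) % 100 < flag.percentage ? "v2" : undefined;
}
```

Hashing the customer id (not rolling a random number per request) matters here: a customer who flips between legacy and the new context from one request to the next is a much worse bug to chase.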

The dual-write that ate a weekend

Milestone 2 is where I broke production. This was at the live-video creator platform I led engineering at a few years back. We had a billing context running alongside the legacy app, and for two weeks both sides were writing to the legacy subscriptions table while the new context also wrote to its own. Dual-write, shadow-read, then flip.

What went wrong. A Friday afternoon (yeah, I know), the new context shipped a small change to how it computed expires_at. Legacy rounded to end-of-day UTC. The new context rounded to end-of-day in the customer’s local timezone. For two weeks, every nightly reconciliation job had hidden the drift with a fuzzy match. Then a billing report ran on Saturday morning and a slice of subscriptions showed as expired in legacy but active in the new view. Support woke me up at 7:14 a.m. local.

First wrong fix. The on-call ran a script that “synced” the legacy expires_at from the new value. Wrong direction. Legacy had downstream jobs (webhooks, reminders) keyed off the old value, and overwriting it triggered incorrect renewal-failure emails to customers whose subscriptions were genuinely fine.

Real fix. Pulled the dual-write. Made the new context authoritative, gave legacy a read-only view through an ACL that respected the old rounding rule for two more weeks while we drained the downstream consumers. Then changed the rounding in one place. Cost about 22 hours of inconsistent billing state for a slice of customers, no charges affected. Lesson: dual-write is a coordination problem, not a code problem. If the two sides disagree on one rule, the drift finds a way out.
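To make the drift concrete, here is a sketch of the two rounding rules and a reconciliation check with no tolerance window. It is a simplification under stated assumptions: “end of day” is taken as 23:59:59.999, the customer’s timezone is reduced to a fixed UTC offset (a real version would use an IANA zone), and all names are hypothetical.

```typescript
// Legacy rule: end of the calendar day in UTC.
function endOfDayUtc(at: Date): Date {
  return new Date(
    Date.UTC(at.getUTCFullYear(), at.getUTCMonth(), at.getUTCDate(), 23, 59, 59, 999),
  );
}

// New-context rule, simplified to a fixed offset in minutes east of UTC.
function endOfDayAtOffset(at: Date, offsetMinutes: number): Date {
  const offsetMs = offsetMinutes * 60000;
  // Shift so the UTC fields read as the customer's wall clock.
  const local = new Date(at.getTime() + offsetMs);
  const localEnd = Date.UTC(
    local.getUTCFullYear(), local.getUTCMonth(), local.getUTCDate(), 23, 59, 59, 999,
  );
  return new Date(localEnd - offsetMs); // shift back to a real instant
}

interface ExpiryRow {
  subscriptionId: string;
  expiresAt: Date;
}

// Exact comparison. Walks the legacy rows and reports every id whose
// expires_at is missing or different on the new side.
function findDrift(legacy: ExpiryRow[], next: ExpiryRow[]): string[] {
  const nextById = new Map<string, number>();
  for (const row of next) nextById.set(row.subscriptionId, row.expiresAt.getTime());
  const drifted: string[] = [];
  for (const row of legacy) {
    if (nextById.get(row.subscriptionId) !== row.expiresAt.getTime()) {
      drifted.push(row.subscriptionId);
    }
  }
  return drifted;
}

// Fails the job instead of logging and moving on.
function reconcileOrThrow(legacy: ExpiryRow[], next: ExpiryRow[]): void {
  const drifted = findDrift(legacy, next);
  if (drifted.length > 0) {
    throw new Error(`expires_at drift on ${drifted.length} subscription(s): ${drifted.join(", ")}`);
  }
}
```

For a timestamp of 10:00 UTC and a customer at UTC-5, the two rules land five hours apart. A fuzzy match with a generous window swallows that; an exact comparison fails the first night.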

Events, not synchronous calls

Once the new context owns writes, you almost never want it calling back into legacy synchronously. Legacy is the most fragile thing in the system, and coupling the new context’s latency to it makes the new context fragile too. Use events. The new context emits domain events. A thin integration layer translates each into a legacy-shaped integration event and feeds it into a queue legacy already understands.

// contexts/billing/integration/subscription-activated.publisher.ts
import { Kafka } from "kafkajs";
import { SubscriptionActivated } from "../domain/events";

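const producer = new Kafka({
  clientId: "billing-ctx",
  brokers: process.env.KAFKA_BROKERS!.split(","),
}).producer({ idempotent: true, maxInFlightRequests: 1 });

// kafkajs producers must be connected before send(); connect lazily, once.
let connecting: Promise<void> | undefined;

export async function publishSubscriptionActivated(evt: SubscriptionActivated) {
  await (connecting ??= producer.connect());
  await producer.send({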
    topic: "billing.subscription.activated.v1",
    messages: [
      {
        // partition by subscriber so legacy consumers see events in order per user
        key: evt.subscriberId.value,
        value: JSON.stringify({
          subscriber_id: evt.subscriberId.value,
          plan_code: evt.plan.code,
          activated_at: evt.activatedAt.toISOString(),
          schema_version: 1,
        }),
        headers: { "x-event-id": evt.eventId, "x-source": "billing-ctx" },
      },
    ],
  });
}

The domain event is internal to the new context. The integration event is the public contract. Not the same thing, and the schema version on the wire is what lets you evolve them independently. Idempotent producer and key partitioning are not optional once legacy is consuming.
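The consuming side needs the same care. An idempotent producer stops the producer’s own retries from writing duplicates to the log, but a consumer can still see the same message twice after a rebalance or a restart. Below is a sketch of a legacy-side handler that dedupes on the x-event-id header and refuses schema versions it doesn’t know. The payload shape is taken from the publisher above; the class, the outcome names, and the in-memory Set are hypothetical, and a real handler would keep seen ids in a durable store.

```typescript
// Wire shape as emitted by the publisher.
interface SubscriptionActivatedV1 {
  subscriber_id: string;
  plan_code: string;
  activated_at: string;
  schema_version: 1;
}

type Outcome = "applied" | "duplicate" | "unsupported-version";

class SubscriptionActivatedHandler {
  // Stand-in for a durable store, e.g. a table with a unique key on event id.
  private readonly seen = new Set<string>();
  private readonly apply: (evt: SubscriptionActivatedV1) => void;

  constructor(apply: (evt: SubscriptionActivatedV1) => void) {
    this.apply = apply;
  }

  // eventId comes from the "x-event-id" header, rawValue from the message value.
  handle(eventId: string, rawValue: string): Outcome {
    const parsed = JSON.parse(rawValue) as { schema_version?: unknown };
    if (parsed.schema_version !== 1) return "unsupported-version";
    if (this.seen.has(eventId)) return "duplicate";
    this.apply(parsed as SubscriptionActivatedV1);
    // Mark as seen only after apply succeeds, so a failed apply is retried.
    this.seen.add(eventId);
    return "applied";
  }
}
```

Returning an outcome (not throwing) on an unknown version is deliberate in this sketch. The caller decides whether that means dead-letter, skip, or page someone.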

When the ACL became a god object

Different engagement, same pattern. The combat-sports tournament platform I CTO’d in London. We were carving a Ratings context out of a monolith with a sprawling User model. The ACL started as a small adapter. Eighteen months later it was 2,400 lines and three engineers were afraid to touch it.

What went wrong. Every new field legacy added, we extended the ACL. Every edge case, a new private method. We never asked whether the legacy concept the ACL was translating should still exist in the new model.

First wrong fix. We split the ACL by sub-domain and ended up with three smaller god objects sharing helpers and importing the same gnarly legacy types.

Real fix. Pulled apart what was actually two patterns. Some methods were translations (legacy thinks X, new context thinks Y). Some were business decisions dressed up as translations (“we treat tier 4 customers as tier 3 for now”). Moved the second kind into the new context’s domain layer. Hard rule after that: the ACL can rename, reshape, and validate, but it cannot make a business decision. Four weeks of refactor, no production impact. An ACL has a maximum size in your head before it stops being a translator and starts being a second domain model. Watch for it.
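The split looks roughly like this. A sketch only: the legacy column name, the types, and the function names are made up for illustration. The tier 4 rule is the one quoted above.

```typescript
// Hypothetical legacy row and domain types.
interface LegacyUserRow {
  id: number;
  tier_level: string | null;
}
type Tier = 1 | 2 | 3 | 4;
interface RatedCustomer {
  customerId: string;
  tier: Tier;
}

function isTier(n: number): n is Tier {
  return n === 1 || n === 2 || n === 3 || n === 4;
}

// ACL: rename, reshape, validate. It reports tier 4 as tier 4.
function toRatedCustomer(row: LegacyUserRow): RatedCustomer {
  const tier = Number(row.tier_level); // null and "" become 0, which is rejected
  if (!isTier(tier)) {
    throw new Error(`invalid legacy tier_level for user ${row.id}: ${row.tier_level}`);
  }
  return { customerId: String(row.id), tier };
}

// Domain layer: the business decision, with a name you can grep for.
function effectiveTier(customer: RatedCustomer): Tier {
  return customer.tier === 4 ? 3 : customer.tier;
}
```

When “for now” ends, the change is a one-line edit to effectiveTier in the domain layer, and the ACL doesn’t move.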

Takeaways

Don’t rewrite the monolith. Stand up a bubble context with its own database and its own ubiquitous language, and a thin ACL between them.
Name the strangler fig milestones up front. ACL reads, new-side writes, legacy reads through ACL, delete the old code. The last one is the one people skip.
Dual-write is a coordination problem. Pick an authoritative side as early as you can, and make reconciliation fail loudly, not fuzzily.
Integrate back to legacy with events, not synchronous calls. Domain events are internal, integration events are the public contract.
An ACL that’s making business decisions is no longer an ACL. Move those decisions into the new context’s domain layer.

Thanks for reading. If you’ve got thoughts, send them my way.