Hexagonal Architecture in a DDD Monolith

How a portfolio-wide DDD migration at a London product agency landed on a hexagonal monolith, where driving and driven ports sit, and the folder layout that actually survived contact with a real team.

The first time I drew the hexagon on a whiteboard, somebody asked if we were about to rewrite everything in Java. Fair question. They’d been burned by an architect who showed up, drew shapes, and disappeared. I told them we were keeping the monolith, keeping Node, keeping PostgreSQL, and using the hexagon to stop domain code from knowing what an HTTP request was. Then we spent three weeks arguing about folder names.

This is the agency-portfolio era. The digital product agency I led engineering at had a stack of legacy projects and a flagship SaaS I’d built end to end. After we shipped that one and closed a funding round, I pushed the whole portfolio toward DDD. Hexagonal was the structural piece, the part that decided where framework code stopped and business code started.

Why a hexagon in a monolith

A lot of teams reach for hexagonal because they’re going microservices. We weren’t. Most of the portfolio was one Rails or NestJS monolith per product, and that’s how they were going to stay. The hexagon was about swapping the database for a fake in a test, swapping the HTTP entrypoint for a queue consumer when we needed to, and not having the domain care.

Honestly the trigger was tests. On older codebases every domain test pulled up a real Postgres connection and seeded six tables. A unit test took ~3 seconds to boot. Multiply by a few thousand tests and you get engineers who run them once before pushing and pray.

Driving versus driven, the part that matters

OK so the part people get backwards. Two kinds of ports.

Driving ports are how the outside world calls in. The adapters on that side are HTTP controllers, queue consumers, CLI commands, scheduled jobs: they drive the application. Each adapter handles its protocol and then calls the port.
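Here is a minimal sketch of the driving side. BookSessionPort is a hypothetical use-case interface, and plain functions stand in for an HTTP controller and a queue consumer. There is no framework, and the request and message shapes are made up for illustration. Each adapter only translates its protocol into the same port call.

```typescript
// Driving port: the use case the outside world calls into (hypothetical names).
interface BookSessionCommand {
  studioId: string;
  memberId: string;
  startsAt: Date;
}

interface BookSessionPort {
  execute(cmd: BookSessionCommand): Promise<{ bookingId: string }>;
}

// Driving adapter #1: HTTP-style handler. Parses the body, maps the outcome to a status.
async function handleBookingRequest(
  useCase: BookSessionPort,
  body: { studio_id: string; member_id: string; starts_at: string },
): Promise<{ status: number; json: unknown }> {
  const startsAt = new Date(body.starts_at);
  if (Number.isNaN(startsAt.getTime())) {
    return { status: 400, json: { error: "invalid starts_at" } };
  }
  const result = await useCase.execute({
    studioId: body.studio_id,
    memberId: body.member_id,
    startsAt,
  });
  return { status: 201, json: result };
}

// Driving adapter #2: queue consumer. Decodes a JSON payload and calls the same port.
async function onBookingMessage(useCase: BookSessionPort, payload: string): Promise<void> {
  const msg = JSON.parse(payload) as { studioId: string; memberId: string; startsAt: string };
  await useCase.execute({ ...msg, startsAt: new Date(msg.startsAt) });
}
```

With this shape, swapping the HTTP entrypoint for a queue consumer means writing another small adapter. The use case stays untouched.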

Driven ports go the other direction. The domain needs to load a customer, write an event, charge a card. It declares an interface for what it needs, and infrastructure provides the adapter. The domain says “I need a CustomerRepository”. A Postgres adapter provides one. The test suite provides one backed by a Map.

Same shape, opposite direction. Once that clicked the layout wrote itself.

// scheduling/domain/ports/booking-repository.ts
import { Booking } from "../booking";
import { BookingId } from "../booking-id";
import { StudioId } from "../studio-id";

// Driven port: persistence as the booking domain sees it.
export interface BookingRepository {
  findById(id: BookingId): Promise<Booking | null>;
  // Non-cancelled bookings for the studio on the calendar day of `on`.
  findActiveForStudio(studio: StudioId, on: Date): Promise<Booking[]>;
  save(booking: Booking): Promise<void>;
}

// scheduling/domain/ports/clock.ts
// Driven port: the current time, so tests can pin it.
export interface Clock {
  now(): Date;
}

// scheduling/domain/ports/event-publisher.ts
import { DomainEvent } from "../events";

// Driven port: hands domain events to whatever transport the adapter chooses.
export interface EventPublisher {
  publish(events: ReadonlyArray<DomainEvent>): Promise<void>;
}

Those interfaces live with the domain and are written in the language of the domain, not the language of Postgres. Whenever domain code needs something from outside, it goes through this folder. Adapters live somewhere else.

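// scheduling/infrastructure/persistence/booking-repository.postgres.ts
import { Pool } from "pg";
import { Booking } from "../../domain/booking";
import { BookingId } from "../../domain/booking-id";
import { BookingRepository } from "../../domain/ports/booking-repository";
import { StudioId } from "../../domain/studio-id";
import { hydrate, dump } from "./booking-mapper";

// Raised when another writer saved this booking first. The name is illustrative.
// So the application layer can catch it without importing infrastructure,
// it would normally be declared next to the port.
export class BookingVersionConflictError extends Error {
  constructor(readonly bookingId: string) {
    super(`booking ${bookingId} was modified concurrently`);
    this.name = "BookingVersionConflictError";
  }
}

export class PostgresBookingRepository implements BookingRepository {
  constructor(private readonly pool: Pool) {}

  async findById(id: BookingId): Promise<Booking | null> {
    const { rows } = await this.pool.query(
      "select * from bookings where id = $1",
      [id.value],
    );
    return rows[0] ? hydrate(rows[0]) : null;
  }

  async findActiveForStudio(studio: StudioId, on: Date): Promise<Booking[]> {
    const { rows } = await this.pool.query(
      `select * from bookings
       where studio_id = $1 and starts_at::date = $2::date and status <> 'cancelled'`,
      [studio.value, on],
    );
    return rows.map(hydrate);
  }

  async save(booking: Booking): Promise<void> {
    // dump() must carry the already-incremented version: the update only
    // applies when the stored row is exactly one version behind.
    const row = dump(booking);
    const result = await this.pool.query(
      `insert into bookings (id, studio_id, member_id, starts_at, status, version)
       values ($1, $2, $3, $4, $5, $6)
       on conflict (id) do update set
         starts_at = excluded.starts_at,
         status    = excluded.status,
         version   = excluded.version
       where bookings.version = excluded.version - 1`,
      [row.id, row.studio_id, row.member_id, row.starts_at, row.status, row.version],
    );
    // A false WHERE on the conflict branch is not an error in Postgres: the
    // existing row is left alone and the statement reports 0 rows affected.
    if (!result.rowCount) {
      throw new BookingVersionConflictError(row.id);
    }
  }
}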

The optimistic-lock dance lives in the adapter because it’s a Postgres concern, not a booking concern. The domain just calls save. If the version check fails, the update touches no row, the adapter throws, and the application layer handles the conflict.
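How the application layer handles the conflict depends on the use case. One option is to reload and retry a bounded number of times. Here is a minimal sketch of that approach. BookingVersionConflictError is a hypothetical error the adapter throws. The sketch also assumes the use-case body reloads the booking on every attempt, so a retry sees the winning writer's changes.

```typescript
// Hypothetical error the repository adapter throws on a lost optimistic-lock race.
class BookingVersionConflictError extends Error {
  constructor(readonly bookingId: string) {
    super(`booking ${bookingId} was modified concurrently`);
    this.name = "BookingVersionConflictError";
    // Keeps `instanceof` working when TypeScript compiles to ES5.
    Object.setPrototypeOf(this, BookingVersionConflictError.prototype);
  }
}

// Re-runs `attempt` only on version conflicts; any other error propagates at once.
// Each attempt must load the aggregate fresh rather than reuse a stale copy.
async function withConflictRetry<T>(attempt: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (!(err instanceof BookingVersionConflictError)) throw err;
      lastError = err;
    }
  }
  // Out of attempts: surface the last conflict to the caller.
  throw lastError;
}
```

Retrying only makes sense when re-running from a fresh load is safe. For user-facing edits, returning a 409 Conflict and letting the client refresh is the other common choice.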

The folder layout that actually stuck

We tried three layouts. First was clever and confused everyone. Second nested so deep that imports ran five segments long. Third was boring and we kept it.

src/
  scheduling/
    domain/
      booking.ts
      booking-id.ts
      events.ts
      studio-id.ts
      ports/
        booking-repository.ts
        clock.ts
        event-publisher.ts
    application/
      book-session.ts
      cancel-session.ts
    infrastructure/
      persistence/
        booking-repository.postgres.ts
        booking-mapper.ts
      http/
        booking-controller.ts
      events/
        event-publisher.kafka.ts
    test-doubles/
      booking-repository.in-memory.ts
      clock.fixed.ts
  shared/
    domain/
      ...

One folder per bounded context. domain is pure, application orchestrates use cases, infrastructure holds the adapters, test-doubles holds the fakes. The rule everyone could repeat without checking: domain never imports from infrastructure. We had a lint rule for it. Catching that drift at PR time was worth more than any architecture diagram.
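The lint rule itself isn't specified here. In ESLint, the import/no-restricted-paths rule from eslint-plugin-import covers this case: it takes zones with target and from paths. The sketch below is a dependency-free version of the same check. It assumes POSIX-style paths and relative import specifiers, and findDomainViolations and resolveRelative are hypothetical names.

```typescript
// Resolves a relative specifier against the importing file's folder, as path segments.
function resolveRelative(importer: string, specifier: string): string[] {
  const parts = importer.split("/").slice(0, -1); // drop the file name
  for (const seg of specifier.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== "." && seg !== "") parts.push(seg);
  }
  return parts;
}

// Returns every `import ... from "..."` specifier that lets a domain/ file
// reach into infrastructure/. Side-effect-only imports are not matched.
function findDomainViolations(importer: string, source: string): string[] {
  if (!importer.split("/").includes("domain")) return [];
  const violations: string[] = [];
  const importFrom = /from\s+["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = importFrom.exec(source)) !== null) {
    const specifier = match[1];
    // Package imports such as "pg" are out of scope for this sketch.
    if (!specifier.startsWith(".")) continue;
    if (resolveRelative(importer, specifier).includes("infrastructure")) {
      violations.push(specifier);
    }
  }
  return violations;
}
```

Wired into CI over every file under src/, a non-empty result fails the build. That is the same effect as the lint rule, just without a plugin.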

Adapter swaps make tests honest

Whole point of writing the domain against a port is you can hand it any adapter. In tests we hand it an in-memory one.

// scheduling/test-doubles/booking-repository.in-memory.ts
import { Booking } from "../domain/booking";
import { BookingId } from "../domain/booking-id";
import { BookingRepository } from "../domain/ports/booking-repository";
import { StudioId } from "../domain/studio-id";

// Map-backed fake for domain and application tests. No I/O, so tests stay fast.
export class InMemoryBookingRepository implements BookingRepository {
  private readonly store = new Map<string, Booking>();

  async findById(id: BookingId): Promise<Booking | null> {
    return this.store.get(id.value) ?? null;
  }

  async findActiveForStudio(studio: StudioId, on: Date): Promise<Booking[]> {
    // Same-day check uses the process's local time zone; the SQL adapter
    // casts to date in the database session's time zone.
    return [...this.store.values()].filter(
      (b) =>
        b.studioId.equals(studio) &&
        b.startsAt.toDateString() === on.toDateString() &&
        b.status !== "cancelled",
    );
  }

  async save(booking: Booking): Promise<void> {
    // Unlike the Postgres adapter, this fake does not enforce the version check.
    this.store.set(booking.id.value, booking);
  }
}

Domain tests boot in milliseconds. They don’t touch Postgres, they don’t touch Kafka, they don’t care about transactions. We kept a smaller integration suite for the real adapters, but the bulk of testing ran against fakes. That’s the change engineers thanked me for months later, not the architecture posters.
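Here is a minimal sketch of what one of those fast tests exercises. It assumes a hypothetical capacity rule: a studio takes at most capacity active bookings per day. Stripped-down Booking and StudioId types stand in for the real ones, and the fake mirrors the in-memory repository above. Nothing touches a database or the network.

```typescript
// Stripped-down stand-ins for the real domain types (illustrative only).
class StudioId {
  constructor(readonly value: string) {}
  equals(other: StudioId): boolean {
    return this.value === other.value;
  }
}

interface Booking {
  id: { value: string };
  studioId: StudioId;
  startsAt: Date;
  status: "active" | "cancelled";
}

interface BookingRepository {
  findActiveForStudio(studio: StudioId, on: Date): Promise<Booking[]>;
  save(booking: Booking): Promise<void>;
}

// Same idea as the test double above: a Map instead of Postgres.
class InMemoryBookingRepository implements BookingRepository {
  private readonly store = new Map<string, Booking>();

  async findActiveForStudio(studio: StudioId, on: Date): Promise<Booking[]> {
    return Array.from(this.store.values()).filter(
      (b) =>
        b.studioId.equals(studio) &&
        b.startsAt.toDateString() === on.toDateString() &&
        b.status !== "cancelled",
    );
  }

  async save(booking: Booking): Promise<void> {
    this.store.set(booking.id.value, booking);
  }
}

// Hypothetical use case: refuses a booking once the studio is full for that day.
class BookSession {
  constructor(private readonly repo: BookingRepository, private readonly capacity: number) {}

  async execute(id: string, studio: StudioId, startsAt: Date): Promise<"booked" | "full"> {
    const existing = await this.repo.findActiveForStudio(studio, startsAt);
    if (existing.length >= this.capacity) return "full";
    await this.repo.save({ id: { value: id }, studioId: studio, startsAt, status: "active" });
    return "booked";
  }
}
```

The use case only sees the BookingRepository port, so the same class runs against the Postgres adapter in production and the Map in tests.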

A migration that broke things

A boutique fitness product we built was the first rollout. The old BookingService talked to Eloquent models, called the mailer, hit Stripe, and emitted a webhook, all in one method. We split it into a Booking aggregate, a BookSession use case, a handful of ports.
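Here is a minimal sketch of the aggregate side of that split. It assumes a hypothetical rule (no booking for a session that has already started) and hypothetical event names. The aggregate enforces the rule and records events. It doesn't send mail, charge cards, or publish anything; those sit behind driven ports.

```typescript
type DomainEvent =
  | { type: "SessionBooked"; bookingId: string; memberId: string; startsAt: Date }
  | { type: "SessionCancelled"; bookingId: string };

class Booking {
  private pending: DomainEvent[] = [];

  private constructor(
    readonly id: string,
    readonly memberId: string,
    readonly startsAt: Date,
    private status: "active" | "cancelled",
  ) {}

  // Factory enforces the invariant here, not in a controller or an ORM hook.
  // `now` comes from the caller (e.g. a Clock port) so the rule stays testable.
  static book(id: string, memberId: string, startsAt: Date, now: Date): Booking {
    if (startsAt.getTime() <= now.getTime()) {
      throw new Error("cannot book a session that has already started");
    }
    const booking = new Booking(id, memberId, startsAt, "active");
    booking.pending.push({ type: "SessionBooked", bookingId: id, memberId, startsAt });
    return booking;
  }

  cancel(): void {
    if (this.status === "cancelled") return; // idempotent: no duplicate event
    this.status = "cancelled";
    this.pending.push({ type: "SessionCancelled", bookingId: this.id });
  }

  get isCancelled(): boolean {
    return this.status === "cancelled";
  }

  // Hands recorded events to whoever persists or publishes them, then clears them.
  pullEvents(): DomainEvent[] {
    const events = this.pending;
    this.pending = [];
    return events;
  }
}
```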

What went wrong: I let someone wire the EventPublisher adapter to publish synchronously inside the booking write’s transaction. We shipped it on a Wednesday afternoon, and Slack was on fire three hours later. A flaky downstream was rejecting events with a 500, the publish threw, the transaction rolled back, and customers couldn’t book classes. p95 booking latency went from ~180 ms to about 6 seconds while the publisher retried.

First wrong fix: someone wrapped the publish in try/catch so the booking would commit even if events failed. That fixed bookings and silently dropped events. Membership balances stopped decrementing. We caught it the next morning when a member booked the same class twice.

Real fix: outbox table. The repository wrote the booking and pending events in the same transaction. A separate worker drained the outbox and published. Domain didn’t change, port didn’t change, only the adapter and a worker. Driven ports look easy because they’re just interfaces. The hard part is the adapter gets to make decisions the domain cannot see, and “publish synchronously” is one of them.
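Here is a minimal in-memory sketch of the outbox shape. OutboxRow, FakeDb, saveBookingWithEvents and drainOutbox are hypothetical names. In the real adapter, the booking row and its outbox rows share one Postgres transaction; here a single synchronous method stands in for that atomic write. The relay marks a row published only after the publisher succeeds, so a failing downstream delays events instead of dropping them or blocking bookings.

```typescript
interface DomainEvent {
  type: string;
  bookingId: string;
}

interface EventPublisher {
  publish(events: ReadonlyArray<DomainEvent>): Promise<void>;
}

interface OutboxRow {
  seq: number;
  event: DomainEvent;
  publishedAt: Date | null;
}

// Stand-in for the bookings and outbox tables.
class FakeDb {
  readonly bookings = new Map<string, { id: string; status: string }>();
  readonly outbox: OutboxRow[] = [];
  private nextSeq = 1;

  // Booking and its pending events are written together or not at all.
  // Synchronous code makes this trivially atomic; Postgres needs BEGIN/COMMIT.
  saveBookingWithEvents(booking: { id: string; status: string }, events: DomainEvent[]): void {
    this.bookings.set(booking.id, booking);
    for (const event of events) {
      this.outbox.push({ seq: this.nextSeq++, event, publishedAt: null });
    }
  }
}

// One tick of the worker: publish pending rows in order, stop at the first failure.
async function drainOutbox(db: FakeDb, publisher: EventPublisher, batchSize = 100): Promise<number> {
  const pending = db.outbox.filter((r) => r.publishedAt === null).slice(0, batchSize);
  let published = 0;
  for (const row of pending) {
    try {
      await publisher.publish([row.event]);
    } catch {
      break; // leave this row and later ones for the next tick, preserving order
    }
    row.publishedAt = new Date();
    published++;
  }
  return published;
}
```

Delivery here is at-least-once. If the worker dies between publish and marking the row, the event goes out again on the next tick, so consumers should de-duplicate.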

A rebalance that taught me about ports

Different employer, same lesson.

At the combat-sports tournament platform I CTO’d in London, the standings projector consumed off Kafka. Live federation broadcast on a Saturday afternoon. Around the third bout, the consumer group started rebalancing every ~30 seconds. Standings page froze for about 12 minutes mid-broadcast.

What we tried first: rolling restart of the projector deployment. Pods came back cleanly. About 40 seconds later they kicked off another rebalance. We were just repeating what the consumer group was already doing to itself.

What actually worked: lining up the logs from each pod. One of the six was running a stale image with max.poll.interval.ms at 60 seconds instead of 300. That pod’s handler made a slow downstream call that sometimes took ~70 seconds, longer than its poll interval, so it kept getting evicted from the group and dragging everyone into a rebalance. We isolated the bad pod, the storm drained in ~90 seconds, and we SHA-pinned every Kafka deployment over the weekend.

The handler had no business doing a slow downstream call inside the consumer loop. With a clean driving port and side-effects pushed behind driven ports, we’d have moved that call out on day one.
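Here is a minimal sketch of the shape that paragraph describes. StandingsStore, SlowWorkQueue, handleBoutResult and the job kind are hypothetical names, and no particular Kafka client is assumed. The handler does only the fast projection update and hands the slow downstream call to a driven port drained by a separate worker. The time between polls then no longer depends on how slow that downstream is.

```typescript
interface BoutResult {
  boutId: string;
  winnerId: string;
  loserId: string;
}

// Driven port: fast local write for the standings projection.
interface StandingsStore {
  recordWin(athleteId: string): Promise<void>;
  recordLoss(athleteId: string): Promise<void>;
}

// Driven port: durable hand-off for slow side-effects, drained elsewhere.
interface SlowWorkQueue {
  enqueue(job: { kind: string; boutId: string }): Promise<void>;
}

// Body of the consumer callback: nothing here waits on the slow downstream.
async function handleBoutResult(
  msg: BoutResult,
  standings: StandingsStore,
  queue: SlowWorkQueue,
): Promise<void> {
  await standings.recordWin(msg.winnerId);
  await standings.recordLoss(msg.loserId);
  await queue.enqueue({ kind: "notify-downstream", boutId: msg.boutId });
}
```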

When the hexagon is overkill

This bites teams that go hexagon-everywhere. Three endpoints, no domain rules, no real invariants, you don’t need a hexagon. You need a controller, a query, a response. The portfolio had at least one context that was a read-only reporting surface on someone else’s data. We didn’t put ports around it. Wrote the SQL and moved on. Reach for the hexagon where there are real domain rules. Skip it where there aren’t.

Takeaways

Driving ports drive the app, driven ports are what the app drives. Both are interfaces owned by the inside of the hexagon; adapters call or implement them.
Keep adapter concerns in the adapter. Optimistic locks, retry policy, outbox tables. Domain shouldn’t know.
Fakes in test-doubles/ make domain tests honest. Single-digit-millisecond unit tests change how the team writes code.
One context, one hexagon. Don’t wrap a reporting query in three interfaces.
Lint the import rule. domain cannot import from infrastructure. CI failure beats a code review comment.
First event-publishing adapter you write should be the outbox version. Synchronous publish inside the domain transaction is a foot-gun.

Thanks for reading. If you’ve got thoughts, send them my way.