Modular Monolith with Bounded Contexts

How a portfolio-wide DDD migration at a London product agency settled on a modular monolith with strict namespace isolation, internal and published events, and a schema-per-module database discipline.

The whiteboard had three boxes on it. Orders, Billing, Catalog. I’d just told the room that we were not splitting these into services, and a backend lead I worked with at the agency squinted and asked, very politely, what the point of the meeting was then. Fair. The point was that we’d already paid the price of a tangled monolith on five projects in a row, and we weren’t going to pay it again. We were going to draw the seams inside the deploy unit and treat them like real walls.

This is the agency-portfolio era. The digital product agency I led engineering at had a long tail of legacy projects and one flagship SaaS I’d built end to end. After we closed a funding round, I made the strategic call to push the whole portfolio toward DDD. A modular monolith was the shape we landed on for most of them. Hundreds of projects, small teams per product, no appetite for the operational tax of microservices on day one.

Why not microservices yet

A lot of teams reach for services because they want independent deploys. We didn’t. Most of these products had three to six engineers on them. Splitting into services would have meant five engineers spending their week on infra instead of on the domain. The modular monolith bought us the boundary discipline of services without the network, the eventual-consistency tax, or the deploy choreography. We could promote a module to a service later, when it actually earned the move.

The honest reason I prefer modular monoliths for small teams: when you draw the boundaries inside a single deploy, you find out fast whether you drew them right. Move a method across modules in an afternoon and the test suite tells you if you cut wrong. Try doing that across two repos and a Kafka topic.

Namespaces as the wall

The wall was a folder. Each bounded context got its own top-level folder, its own public surface, and nothing else in the codebase could reach inside. We enforced it with a lint rule. No code outside orders/ could import from orders/internal/. Only orders/api.ts was importable from the outside.

// orders/api.ts
// the only file other modules can import from
export { OrdersFacade } from "./internal/application/orders-facade";
export type { CreateOrderInput, OrderSummary } from "./internal/application/dto";
export { OrderPlacedV1 } from "./public-events/order-placed";
export type { OrderPlaced } from "./public-events/order-placed";

// orders/internal/application/orders-facade.ts
import { OrderRepository } from "../domain/ports/order-repository";
import { Clock } from "../domain/ports/clock";
import { EventBus } from "../domain/ports/event-bus";
import { Order } from "../domain/order";
import { CreateOrderInput, OrderSummary } from "./dto";

export class OrdersFacade {
  constructor(
    private readonly repo: OrderRepository,
    private readonly clock: Clock,
    private readonly bus: EventBus,
  ) {}

  async create(input: CreateOrderInput): Promise<OrderSummary> {
    const order = Order.place(input, this.clock.now());
    await this.repo.save(order);
    await this.bus.publish(order.pullEvents());
    return OrderSummary.from(order);
  }
}

Outside callers only ever see the facade, its DTO types, and the public events. The aggregate, the repository, the value objects, the event bus implementation: all of that lives inside internal/, and the lint rule keeps it that way. It sounds petty. It is not petty. The day a junior engineer reaches for orders/internal/domain/order.ts and the rule fails their PR, you have just saved yourself a year of slow rot.
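The rule itself is small. Here is a minimal sketch of the check as a plain TypeScript function, not as a plugin for any particular linter; the name `isImportAllowed`, the repo-root-relative POSIX paths, and the assumption that every module is a top-level folder are all illustrative. ESLint’s built-in `no-restricted-imports` rule with a `patterns` entry can express a similar constraint without custom code.

```typescript
// Resolve a relative import specifier against the importing file's folder.
// Paths are repo-root-relative with "/" separators (an assumption).
function resolveFrom(importer: string, specifier: string): string {
  const parts = importer.split("/").slice(0, -1); // drop the file name
  for (const seg of specifier.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== "." && seg !== "") parts.push(seg);
  }
  return parts.join("/");
}

// Hypothetical boundary check: may `importer` import `specifier`?
export function isImportAllowed(importer: string, specifier: string): boolean {
  // Bare specifiers ("zod", "node:path") are third-party and out of scope.
  if (!specifier.startsWith(".")) return true;

  const target = resolveFrom(importer, specifier);
  const importerModule = importer.split("/")[0];
  const targetModule = target.split("/")[0];

  // Anything inside your own module is fair game.
  if (importerModule === targetModule) return true;

  // Across modules, only <module>/api is importable.
  return target === `${targetModule}/api` || target === `${targetModule}/api.ts`;
}
```

Wired into CI, a `false` here is what fails the PR that reaches for orders/internal/domain/order.ts from billing.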

Internal events versus published events

This is the part I see teams get wrong. There are two kinds of domain events in a modular monolith and they are not the same thing.

Internal events stay inside the module. The Order aggregate raises an OrderPlaced inside the orders boundary, and only handlers inside orders/ listen to it. These can change shape next sprint. They are an implementation detail.

Published events are a contract. When the orders module wants to tell billing that something happened, it emits a public-events/order-placed payload with a versioned schema, and billing subscribes to that. That schema does not change without a migration plan, the same way you would treat an HTTP API. The day orders becomes a service, those published events become Kafka topics. Same shape, same versioning, just a different transport.
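In code, the difference comes down to one mapping function at the edge of the module. A minimal sketch, assuming a hypothetical internal event shape (`OrderPlacedInternal`, with a money object, a `Date`, and an internal-only `lineCount`); the output mirrors the `orders.order_placed.v1` fields, written as a plain interface instead of zod so the sketch has no dependencies.

```typescript
// Internal event: free to change shape next sprint. Fields are hypothetical.
export interface OrderPlacedInternal {
  kind: "OrderPlaced";
  orderId: string;
  customerId: string;
  total: { amountCents: number; currency: string };
  placedAt: Date;
  lineCount: number; // implementation detail, never leaves the module
}

// Published contract: field for field what orders.order_placed.v1 promises.
export interface OrderPlacedV1Payload {
  schema: "orders.order_placed.v1";
  orderId: string;
  customerId: string;
  totalCents: number;
  currency: string;
  placedAt: string; // ISO-8601 timestamp
}

// The only place where the internal shape meets the public contract.
// Reshaping the internal event means editing this function, not billing.
export function toPublishedOrderPlaced(evt: OrderPlacedInternal): OrderPlacedV1Payload {
  return {
    schema: "orders.order_placed.v1",
    orderId: evt.orderId,
    customerId: evt.customerId,
    totalCents: evt.total.amountCents,
    currency: evt.total.currency,
    placedAt: evt.placedAt.toISOString(),
  };
}
```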

// orders/public-events/order-placed.ts
import { z } from "zod";

export const OrderPlacedV1 = z.object({
  schema: z.literal("orders.order_placed.v1"),
  orderId: z.string().uuid(),
  customerId: z.string().uuid(),
  totalCents: z.number().int().nonnegative(),
  currency: z.string().length(3),
  placedAt: z.string().datetime(),
});

export type OrderPlaced = z.infer<typeof OrderPlacedV1>;

// billing/internal/handlers/on-order-placed.ts
import { OrderPlacedV1 } from "../../../orders/api";
import { InvoiceService } from "../application/invoice-service";

export function onOrderPlaced(invoices: InvoiceService) {
  return async (raw: unknown) => {
    const evt = OrderPlacedV1.parse(raw);
    await invoices.draftFor({
      orderId: evt.orderId,
      customerId: evt.customerId,
      totalCents: evt.totalCents,
      currency: evt.currency,
    });
  };
}

In-process, the bus is just a function call wrapped in a queue. When the time comes to move billing to its own service, the handler signature does not change. The bus implementation does.
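A minimal sketch of such an in-process bus, assuming handlers are keyed by the payload’s `schema` string and share the `(raw: unknown) => Promise<void>` shape of `onOrderPlaced` above. `EventBus`, `InProcessBus`, and `PublishedEvent` are illustrative names, not the agency’s code. A Kafka-backed class implementing the same interface is what a later swap would replace it with.

```typescript
export type Handler = (payload: unknown) => Promise<void>;

export interface PublishedEvent {
  schema: string;
  [field: string]: unknown;
}

export interface EventBus {
  publish(events: PublishedEvent[]): Promise<void>;
  subscribe(schema: string, handler: Handler): void;
}

export class InProcessBus implements EventBus {
  private readonly handlers = new Map<string, Handler[]>();
  private readonly queue: PublishedEvent[] = [];
  private draining = false;

  subscribe(schema: string, handler: Handler): void {
    const list = this.handlers.get(schema) ?? [];
    list.push(handler);
    this.handlers.set(schema, list);
  }

  // Events are queued and drained one at a time. A publish made while a
  // drain is already running (e.g. from inside a handler) just enqueues and
  // returns; the outer drain delivers it afterwards, so there is no recursion.
  async publish(events: PublishedEvent[]): Promise<void> {
    this.queue.push(...events);
    if (this.draining) return;
    this.draining = true;
    try {
      while (this.queue.length > 0) {
        const evt = this.queue.shift()!;
        for (const handler of this.handlers.get(evt.schema) ?? []) {
          await handler(evt);
        }
      }
    } finally {
      // If a handler throws, remaining events stay queued for the next publish.
      this.draining = false;
    }
  }
}
```

The queue is the design choice that matters: delivery order stays predictable, and a handler that publishes follow-up events never re-enters itself mid-flight.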

Database ownership per module

The rule we held the hardest: each module owns its own tables, and nobody else writes to them. Reads through SQL across boundaries were also banned. If billing needs the customer’s currency, it gets it from a published event or from customers/api.ts, not from SELECT currency FROM customers WHERE id = ?.

Same Postgres database. Different schemas. Different migration folders. Different module owners. We backed it with a Postgres role per module, so even if someone wrote raw SQL by accident, the connection wouldn’t have grants to read or write across the wall.

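-- orders schema (run through psql; :'orders_password' is a psql variable quoted as a string literal)
CREATE SCHEMA orders;
CREATE ROLE orders_app LOGIN PASSWORD :'orders_password';
GRANT USAGE ON SCHEMA orders TO orders_app;
GRANT ALL ON ALL TABLES IN SCHEMA orders TO orders_app;
GRANT ALL ON ALL SEQUENCES IN SCHEMA orders TO orders_app;
-- default privileges cover objects created later by the role running this script
ALTER DEFAULT PRIVILEGES IN SCHEMA orders GRANT ALL ON TABLES TO orders_app;
ALTER DEFAULT PRIVILEGES IN SCHEMA orders GRANT ALL ON SEQUENCES TO orders_app;

-- other modules' roles get nothing on this schema, not even USAGE
REVOKE ALL ON SCHEMA orders FROM billing_app;
REVOKE ALL ON SCHEMA orders FROM catalog_app;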
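On the application side, the wall only holds if each module actually connects as its own role. A minimal sketch, assuming a libpq-style connection URI and made-up host, database, and password values; `connectionStringFor` is a hypothetical helper. The `options=-c search_path=<module>` query parameter is a libpq convention for setting per-connection settings, and whether your driver honors it is worth checking.

```typescript
export type ModuleName = "orders" | "billing" | "catalog";

export interface DbTarget {
  host: string;
  port: number;
  database: string;
}

// One connection string per module: log in as <module>_app and pin
// search_path to the module's own schema. The role's grants, not the
// search_path, are what actually enforce the wall.
export function connectionStringFor(module: ModuleName, password: string, target: DbTarget): string {
  const role = `${module}_app`;
  // encodeURIComponent turns the space into %20 and "=" into %3D.
  const options = encodeURIComponent(`-c search_path=${module}`);
  return (
    `postgres://${encodeURIComponent(role)}:${encodeURIComponent(password)}` +
    `@${target.host}:${target.port}/${encodeURIComponent(target.database)}` +
    `?options=${options}`
  );
}
```

Each module builds its own pool from its own string, so a stray billing query against the orders schema fails on permissions instead of quietly succeeding.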

Two stories from this era, both honest.

First one, from the federation platform I was CTO of in London. Different shop, same lesson. We had a standings-projector consumer reading off Kafka, and one Saturday afternoon during a live broadcast it started rebalancing every 30 seconds. The standings page froze at 14:32 local time. The first fix was a rollout restart, and the group rebalanced right back. The real cause was one pod out of six running a stale image with max.poll.interval.ms set to 60 seconds (60000) instead of 300 seconds (300000), so every slow downstream call pushed it past the interval and got it kicked out of the group. We cordoned the bad pod and the group settled in 90 seconds. Twelve minutes of stale standings on a publicly visible competition. The lesson was not about Kafka. It was about cross-module contracts. The slow downstream call the pod made had no timeout budget on the consumer side, because the team thought of it as “just another method”. Once you publish, you have a network, even when the wire is in-process.
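The missing piece there was a timeout budget on the downstream call, kept well under the poll interval. A minimal sketch using only `Promise.race` and `setTimeout`; `withTimeout` and `TimeoutError` are hypothetical names and the budgets in any real use would be yours to pick. Note that this stops waiting, it does not cancel: the slow call keeps running unless you also pass it something like an `AbortSignal`.

```typescript
export class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} exceeded its ${ms} ms budget`);
    this.name = "TimeoutError";
  }
}

// Races the call against a timer; whichever settles first wins.
// The timer is always cleared so a fast call leaves nothing pending.
export function withTimeout<T>(label: string, ms: number, call: () => Promise<T>): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
  });
  return Promise.race([call(), deadline]).finally(() => clearTimeout(timer));
}
```

Usage inside a handler would look like `await withTimeout("standings-lookup", 2000, () => client.fetch(id))`, where `client.fetch` stands in for whatever the slow dependency is.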

Second one, from the agency’s flagship SaaS. I once approved a migration that added a non-null column with a default to a hot table, using the “safe” gem helper. The deploy went out late in the evening, the migration acquired an ACCESS EXCLUSIVE lock, and the login error rate hit 100 percent for about 85 seconds during peak hours. The wrong fix I was halfway through typing was to roll the migration back; Rails does not have a clean rollback for that helper mid-flight. The right move was to let it finish, then split it the next day: add the column as nullable, backfill in batches, then enforce NOT NULL. The rule we added to CI after that: any add_column with a non-null default against a large table gets blocked. The modular-monolith angle: when each module owns its tables, you can enforce module-specific migration rules. The orders team’s hot tables are not the catalog team’s hot tables.
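The CI rule can be as blunt as a text scan over new migration files. A sketch under loud assumptions: Rails-style single-line `add_column` calls, a hand-maintained per-module list of hot tables (the table names below are made up), and a regex rather than a real Ruby parser, so multi-line calls will slip through. `findRiskyAddColumns` is a hypothetical name.

```typescript
// Hypothetical per-module registry of tables too hot for risky DDL.
// Each module team curates its own entry.
const HOT_TABLES: Record<string, ReadonlySet<string>> = {
  orders: new Set(["orders", "order_lines"]),
  catalog: new Set(["products"]),
};

export interface Violation {
  table: string;
  line: number; // 1-based line in the migration file
}

// Flags add_column calls that combine `null: false` with a default on a
// table the owning module has marked as hot.
export function findRiskyAddColumns(module: string, migrationSource: string): Violation[] {
  const hot = HOT_TABLES[module] ?? new Set<string>();
  const violations: Violation[] = [];
  migrationSource.split("\n").forEach((text, i) => {
    const call = /add_column\s+:(\w+)/.exec(text);
    if (!call || !hot.has(call[1])) return;
    if (/null:\s*false/.test(text) && /default:/.test(text)) {
      violations.push({ table: call[1], line: i + 1 });
    }
  });
  return violations;
}
```

The nullable-first split passes this check, which is the point: the CI failure nudges people toward the three-step version.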

The path to services later

In the year after, we promoted exactly one module to a service, on a different product: messaging, because its load profile was nothing like the rest of the app. The migration took about three weeks. Published events were already Kafka-shaped, so swapping the bus was easy. The hard part was the database: the module had its own schema and role but still lived in the shared cluster, so pulling it out meant a dump, a replay, and a switchover window. Everything else stayed put. One module promoted, the rest happy where they are.

Takeaways

Draw module boundaries inside the deploy first. Promote to a service only when load or team shape demands it.
Enforce the wall with a lint rule. A folder convention without a check fails by Q2.
Internal events stay internal. Published events are a contract with a version.
One module, one schema, one Postgres role. No cross-module SQL, ever.
Treat in-process bus calls like network calls. Timeouts, retries, idempotency, all of it.
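The last takeaway is the one teams skip, so here is a minimal sketch of retries plus idempotency around a bus handler. Assumptions: the caller supplies a dedupe key per payload (a real contract would more likely carry a dedicated event id), the processed-key store is an in-memory `Set` standing in for a table with a unique constraint, and the retry counts are placeholders. `withRetries` and `idempotent` are hypothetical names.

```typescript
export type Handler = (payload: unknown) => Promise<void>;

// Retries a failing handler with exponential backoff:
// waits baseMs, then 2*baseMs, then 4*baseMs, ... between attempts.
export function withRetries(handler: Handler, attempts: number, baseMs: number): Handler {
  return async (payload) => {
    for (let attempt = 1; ; attempt++) {
      try {
        return await handler(payload);
      } catch (err) {
        if (attempt >= attempts) throw err;
        await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** (attempt - 1)));
      }
    }
  };
}

// Skips payloads whose key was already handled successfully, so retries and
// redeliveries do not, say, draft a second invoice. Not safe under concurrent
// delivery of the same key; a unique constraint in the database would be.
export function idempotent(
  handler: Handler,
  keyOf: (payload: unknown) => string,
  seen: Set<string> = new Set<string>(),
): Handler {
  return async (payload) => {
    const key = keyOf(payload);
    if (seen.has(key)) return;
    await handler(payload);
    seen.add(key); // mark only after success, so a failed attempt can be retried
  };
}
```

Composed as `withRetries(idempotent(handler, keyOf), 3, 100)`, a transient failure is retried and a duplicate delivery is dropped, which is roughly what you would need from a Kafka consumer anyway.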

Thanks for reading. If you’ve got thoughts, send them my way.