Refactoring Toward a Rich Domain Model

A grounded, step-by-step refactor from anemic transaction scripts to DDD aggregates, with value objects, behavior pushdown, and domain events kept honest by characterization tests.

At the London product agency I led engineering at, we inherited a Laravel project that ran a boutique fitness product. The booking flow was one fat service method. About 380 lines. Validated inputs, talked to the database, hit Stripe, posted to Slack, sent the email, did the analytics ping, returned an array. I read it three times before I gave up trying to hold the whole thing in my head.

That’s the kind of code I want to talk about. Not in the abstract. The actual move from one of those god-methods toward an aggregate with behavior. I’ve done this maybe a dozen times by now, across the portfolio at the agency and on smaller legacy projects since, and there’s a sequence that works. Skip a step and you’ll wreck something on the way.

Start with characterization tests

The first thing I do, before touching any class, is pin the current behavior in tests. Not the behavior the team thinks the code has. The behavior it actually has. The funny thing about old transaction scripts is they’re full of small accidental rules that someone depended on three years ago.

// booking-service.characterization.spec.ts
import { BookingService } from "./booking-service";
import { buildFakeDb, buildFakeStripe } from "./test-doubles";

describe("BookingService current behavior", () => {
  it("books an open slot and charges the card", async () => {
    const db = buildFakeDb({ session: { capacity: 10, booked: 4 } });
    const stripe = buildFakeStripe();
    const svc = new BookingService(db, stripe);

    const result = await svc.book({ sessionId: "s1", memberId: "m1" });

    expect(result.status).toBe("confirmed");
    expect(stripe.charges).toHaveLength(1);
    expect(db.bookings).toHaveLength(1);
  });

  it("returns 'waitlisted' when full but never throws", async () => {
    const db = buildFakeDb({ session: { capacity: 10, booked: 10 } });
    const svc = new BookingService(db, buildFakeStripe());

    const result = await svc.book({ sessionId: "s1", memberId: "m1" });

    expect(result.status).toBe("waitlisted");
  });
});

These tests are ugly on purpose. They don’t describe the domain. They describe the current code. Once they’re green, I have a net. I can move under them without breaking what the rest of the system relies on.

Extract value objects first

The cheapest, safest move is value objects. Every transaction script is full of primitives standing in for domain ideas. A string that’s really a MemberId. A number that’s really a Money. Two Date fields that are really a TimeWindow.

// scheduling/domain/time-window.ts
export class TimeWindow {
  private constructor(readonly startsAt: Date, readonly endsAt: Date) {}

  static create(startsAt: Date, endsAt: Date): TimeWindow {
    if (endsAt <= startsAt) {
      throw new Error("TimeWindow: end must be after start");
    }
    return new TimeWindow(startsAt, endsAt);
  }

  overlaps(other: TimeWindow): boolean {
    return this.startsAt < other.endsAt && other.startsAt < this.endsAt;
  }
}

The trick is to pull these out one at a time and let the type checker carry you. Replace one primitive in one signature, follow the cascade, commit. Do not refactor “everything that looks like money” in one PR. I tried that once on a healthcare professional portal we shipped at the agency. Lost a week. The PR got too big to review and we ended up reverting the parts that touched billing because nobody could prove they were safe.
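An identifier is usually the smallest first extraction. Here is a minimal sketch of a MemberId, assuming ids are opaque, non-empty strings. The create factory and the trimming rule are my assumptions for illustration, not something the original service did.

```typescript
// Hypothetical sketch of scheduling/domain/member-id.ts.
// Assumption: member ids are opaque, non-empty strings.
class MemberId {
  private constructor(readonly value: string) {}

  // The only way in, so an invalid MemberId can never exist.
  static create(raw: string): MemberId {
    const value = raw.trim();
    if (value.length === 0) {
      throw new Error("MemberId: must not be empty");
    }
    return new MemberId(value);
  }

  // Value objects compare by value, not by reference.
  equals(other: MemberId): boolean {
    return this.value === other.value;
  }
}
```

Swap one `memberId: string` parameter for `memberId: MemberId`, fix what the compiler flags, commit, and repeat.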

Push behavior down to where the data lives

Once the value objects are in, the next step is moving behavior off the service and onto the thing that owns the data. Anemic models are easy to spot. The entity has only getters and setters. The service has all the verbs. Reverse it.

// scheduling/domain/class-session.ts
import { TimeWindow } from "./time-window";
import { MemberId } from "./member-id";
import { BookingDeclined } from "./booking-declined";

export class ClassSession {
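  private constructor(
    readonly id: string,
    readonly window: TimeWindow,
    private readonly capacity: number,
    private bookings: MemberId[],
  ) {}

  // Assumed factory name, added for illustration: the constructor is private,
  // so the repository needs some way to rebuild a session from stored state.
  static restore(id: string, window: TimeWindow, capacity: number, bookings: MemberId[]): ClassSession {
    return new ClassSession(id, window, capacity, [...bookings]);
  }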

  book(memberId: MemberId, now: Date): void {
    if (this.window.endsAt <= now) {
      throw new BookingDeclined(this.id, "session_already_ended");
    }
    if (this.bookings.some(b => b.equals(memberId))) {
      throw new BookingDeclined(this.id, "already_booked");
    }
    if (this.bookings.length >= this.capacity) {
      throw new BookingDeclined(this.id, "full");
    }
    this.bookings.push(memberId);
  }
}

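Now the service stops asking the session questions and making decisions on its behalf. It just calls book and handles the outcome. Handling the outcome includes translating a declined booking back into whatever the characterization tests pinned, such as the 'waitlisted' result for a full session. The invariants live where they belong. There’s no version of this where capacity is enforced in a controller and also a Sidekiq job and also a database constraint, all subtly different.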
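The aggregate now throws when a session is full, while the characterization tests pinned a 'waitlisted' result that never throws. One way to keep both is a thin translation in the service. This is a sketch, not the code we shipped: bookThroughAggregate, BookableSession, BookingResult, and the exact shape of BookingDeclined are assumptions for illustration.

```typescript
// Hypothetical sketch. Only the decline reasons and the "confirmed" /
// "waitlisted" statuses come from the listings above; the rest is assumed.
type DeclineReason = "session_already_ended" | "already_booked" | "full";

class BookingDeclined extends Error {
  constructor(readonly sessionId: string, readonly reason: DeclineReason) {
    super(`Booking declined for session ${sessionId}: ${reason}`);
    this.name = "BookingDeclined";
  }
}

// Minimal stand-in for the aggregate, just enough to exercise the service.
interface BookableSession {
  book(memberId: string, now: Date): void;
}

type BookingResult = { status: "confirmed" } | { status: "waitlisted" };

function bookThroughAggregate(
  session: BookableSession,
  memberId: string,
  now: Date,
): BookingResult {
  try {
    session.book(memberId, now); // tell, don't ask
    return { status: "confirmed" };
  } catch (err) {
    // Only "full" maps to the legacy waitlisted result the tests pinned.
    if (err instanceof BookingDeclined && err.reason === "full") {
      return { status: "waitlisted" };
    }
    // Every other decline, and any unexpected error, propagates unchanged.
    throw err;
  }
}
```

The rule stays in the aggregate and the legacy contract stays in the service, so the old tests keep passing while the model gets stricter.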

Find the aggregate boundary the hard way

This is the step everyone wants to skip and shouldn’t. The aggregate is whatever set of objects needs to be loaded, mutated, and persisted as one consistent unit. Not what looks tidy on a class diagram. What the invariants actually need.

For the booking flow, the aggregate was ClassSession. Members and studios lived outside, referenced by id. The waitlist almost ended up inside the session, until I traced the actual workflow. The waitlist gets promoted by a background job that doesn’t care about the session’s transactional consistency. It’s a separate aggregate. Keeping it outside saved us from a nasty optimistic locking storm later.
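To make "separate aggregate, referenced by id" concrete, here is a rough sketch of what a standalone waitlist could look like. Waitlist, empty, join, and promoteNext are names I am making up for illustration, and the first-in, first-out promotion order is an assumption.

```typescript
// Hypothetical sketch of a waitlist as its own aggregate.
class Waitlist {
  private constructor(
    readonly sessionId: string, // the session by id, never the ClassSession object
    private queue: string[],    // member ids in arrival order
  ) {}

  static empty(sessionId: string): Waitlist {
    return new Waitlist(sessionId, []);
  }

  join(memberId: string): void {
    // Assumption: joining twice is a silent no-op.
    if (this.queue.includes(memberId)) return;
    this.queue.push(memberId);
  }

  // Called by the background job when a seat frees up.
  // Returns the next member id, or undefined when nobody is waiting.
  promoteNext(): string | undefined {
    return this.queue.shift();
  }
}
```

Because the waitlist only holds a sessionId, the promotion job can load and save it without touching the session row, and the two never compete for the same version number.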

I learned this the painful way at the federation platform where I was acting CTO. We had a Tournament aggregate that quietly grew to include matches, fighters, rankings, and a rules snapshot. Loading it for a single match update meant pulling the entire object graph, which was fine until a Saturday afternoon live broadcast. Around 14:32 local, the standings page froze for about 12 minutes. The actual root cause was a Kafka consumer rebalance loop on the standings projector, traced to one pod running a stale image with a wrong max.poll.interval.ms. But the giant Tournament aggregate didn’t help. Every match update was loading half the tournament into memory. We split it later. Match, ranking, and tournament-config became three aggregates referencing each other by id. Updates got cheap. The lock contention went away.

Domain events at the seam

Once the aggregate has behavior, the side effects that used to live in the service, the email, the Slack post, the analytics ping, can move out behind domain events. The aggregate records what happened. Something else decides what to do about it.

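// Excerpt: constructor, fields, and imports are as in the earlier listing.
class ClassSession {
  // Events recorded during this unit of work, not yet dispatched.
  private events: DomainEvent[] = [];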

  book(memberId: MemberId, now: Date): void {
    // ...invariants as above...
    this.bookings.push(memberId);
    this.events.push({
      type: "MemberBooked",
      sessionId: this.id,
      memberId: memberId.value,
      occurredAt: now.toISOString(),
    });
  }

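  // Hands over the recorded events and clears the buffer,
  // so each event is dispatched at most once.
  pullEvents(): DomainEvent[] {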
    const out = this.events;
    this.events = [];
    return out;
  }
}

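The repository flushes those events after the transaction commits. Not before. If dispatch happens before commit and the transaction rolls back, the email has already gone out for a booking that does not exist, and a retry sends it again. Those are the ghost events. Moving dispatch to an after-commit hook removes a whole class of nasty bugs.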
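The ordering can be sketched in a few lines. None of these names come from a real library: saveAndDispatch, RecordsEvents, and TransactionRunner are hypothetical, and I am assuming the transaction runner resolves only after a successful commit and rejects after a rollback.

```typescript
// Hypothetical sketch of after-commit dispatch.
type DomainEvent = { type: string; occurredAt: string };

interface RecordsEvents {
  pullEvents(): DomainEvent[];
}

// Assumption: resolves only once the commit succeeded,
// rejects (after rolling back) if the work throws.
type TransactionRunner = (work: () => Promise<void>) => Promise<void>;

async function saveAndDispatch(
  aggregate: RecordsEvents,
  persist: () => Promise<void>,
  runInTransaction: TransactionRunner,
  dispatch: (event: DomainEvent) => Promise<void>,
): Promise<void> {
  // If this rejects, nothing below runs and no event leaves the process.
  await runInTransaction(persist);

  // Commit succeeded: now it is safe to tell the outside world.
  for (const event of aggregate.pullEvents()) {
    await dispatch(event);
  }
}
```

This sketch does nothing about a crash between commit and dispatch. If losing an event there is unacceptable, writing events to an outbox table inside the same transaction is one common answer.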

Takeaways

Pin behavior with characterization tests before you refactor.
Extract value objects first, one at a time, and let the types cascade.
Push verbs onto the entity that owns the data, not onto services.
Aggregate boundaries follow invariants, not class diagrams.
Domain events are recorded inside the transaction and dispatched only after it commits.
Most teams adopt this partially, and that’s fine. Pragmatism beats purity.

Thanks for reading. If you’ve got thoughts, send them my way.