Testing Strategies for Microservices

Why the testing pyramid flips toward integration in distributed systems, and how I use Pact, TestContainers, and chaos drills to stop production from teaching me lessons.

It was a Saturday afternoon at the combat-sports tournament platform I CTO’d in London. Live broadcast. The standings page froze at 14:32 local. Unit test coverage on the consumer service that fed standings was somewhere north of 85%, and every test was green that morning. What took us down was a single pod with max.poll.interval.ms set to 60s instead of 300s, talking to a downstream rules service that occasionally took 70s to respond. Every time a call ran past 60s, the consumer was treated as failed, dropped out of its group, and triggered a rebalance. No unit test on earth catches that. None of them ever will.

That afternoon is why I don’t trust the classic testing pyramid in a microservices world. The pyramid says lots of unit tests, fewer integration tests, a tiny bit of end-to-end. In a monolith that’s mostly right. Across hundreds of microservices, the shape inverts. Unit tests still matter, but the bugs that wake me up at 2 a.m. live in the seams. Contract drift. Broker config. Idempotency on a retry. Timeouts. Replica lag. None of these are unit-testable in any meaningful way.

So here’s the position. In a distributed system, integration is the load-bearing layer of the test suite, not the middle. Unit tests stop you from shipping garbage. Integration tests stop you from taking down production.

What changes in a distributed system

A unit test in a microservice tells you the function does the thing. Cool. The function is maybe 5% of what could go wrong. The other 95% is the wire between services, the broker in the middle, the database the function thinks it owns, the consumer on the other side that’s two versions behind your producer.

Three things start mattering more than they did in your monolith.

First, contracts between services. Your producer and consumer evolve independently. They will drift. The only question is whether you find out in CI or in a Slack thread on a Saturday.

Second, real infrastructure in tests. Mocking your message broker is approximately useless. Real Kafka, real Postgres, real Redis. Spin them up per test run, throw them away.

Third, failure modes. Network partitions, slow downstreams, retries gone wrong, partial writes. Production will hand you all of these eventually. A test suite that never exercises them is just optimism in a CI pipeline.
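A minimal sketch of the retry-plus-partial-write case, in TypeScript to match the rest of this post. Everything is in-memory and hypothetical: `LedgerStore` stands in for a points table plus a processed-message table, `flakyAfterWrite` injects a failure after the write has already landed, and 3 points per win is an arbitrary value. The property under test is that three deliveries of one message still award the points exactly once.

```typescript
// Hypothetical in-memory stand-in for a points table plus a processed-message table.
class LedgerStore {
  readonly points = new Map<number, number>();
  readonly processed = new Set<string>();

  // Applies the award at most once per messageId. In a real database the
  // dedupe insert and the points update would commit in one transaction.
  award(messageId: string, athleteId: number, delta: number): void {
    if (this.processed.has(messageId)) return; // duplicate delivery: no-op
    this.processed.add(messageId);
    this.points.set(athleteId, (this.points.get(athleteId) ?? 0) + delta);
  }
}

interface MatchCompleted {
  messageId: string;
  winnerId: number;
}

// Re-runs fn up to `attempts` times, like a consumer redelivering a
// message whose processing threw. Rethrows the last error if all fail.
async function withRetries(fn: () => Promise<void>, attempts: number): Promise<void> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      await fn();
      return;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Failure injection: the write happens, then the handler throws as if the
// commit or ack failed. After `failures` throws, calls succeed.
function flakyAfterWrite(store: LedgerStore, failures: number) {
  let remaining = failures;
  return async (event: MatchCompleted): Promise<void> => {
    store.award(event.messageId, event.winnerId, 3); // 3 points per win (hypothetical)
    if (remaining > 0) {
      remaining--;
      throw new Error('injected failure after write');
    }
  };
}
```

Remove the `processed` check and the same test awards 9 points instead of 3, which is exactly the retry bug that unit tests on the happy path never see. An in-memory `Set` only shows the shape of the idea; the real guarantee depends on the database transaction.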

Consumer-driven contracts with Pact

The first thing I add to any new microservice these days is Pact. Specifically the consumer-driven flavor, where the consumer writes the expectation and the producer verifies against it.

Why this way around. If the producer owns the contract, the producer ships breaking changes and the consumer finds out at runtime. If the consumer owns the contract, the producer’s CI fails the second it breaks the shape the consumer depends on. The blast radius lands in the right team’s PR, not in production.

A consumer-side test in NestJS looks roughly like this.

import { PactV3, MatchersV3 } from '@pact-foundation/pact';
import * as path from 'node:path';
import { TournamentsClient } from '../src/tournaments/tournaments.client';

const provider = new PactV3({
  consumer: 'standings-projector',
  provider: 'tournaments-api',
  dir: path.resolve(process.cwd(), 'pacts'),
});

describe('tournaments-api contract', () => {
  it('returns a tournament with rankings shape we depend on', async () => {
    provider
      .given('tournament 42 exists with three ranked athletes')
      .uponReceiving('a request for tournament 42')
      .withRequest({
        method: 'GET',
        path: '/tournaments/42',
        headers: { Authorization: MatchersV3.like('Bearer abc') },
      })
      .willRespondWith({
        status: 200,
        body: {
          id: MatchersV3.integer(42),
          name: MatchersV3.string('Spring Open'),
          rankings: MatchersV3.eachLike(
            // Example element; in MatchersV3 the second argument is the minimum length.
            { athleteId: MatchersV3.integer(7), points: MatchersV3.integer(12) },
            1,
          ),
        },
      });

    await provider.executeTest(async (mock) => {
      const client = new TournamentsClient(mock.url);
      const t = await client.getTournament(42);
      expect(t.rankings.length).toBeGreaterThan(0);
    });
  });
});

That generates a pact file. The producer’s CI pulls it from a broker (we self-host pact-broker in a small ECS task) and verifies against its own implementation. If the producer drops rankings or renames athleteId to athlete_id, their pipeline goes red and nothing ships. The consumer team didn’t have to know the change was coming. That’s the point.

TestContainers for real infra

Mocked brokers lie. I learned this the third time a Kafka consumer test passed against a mock and then deadlocked in staging against a real broker. So integration tests run real infra now. TestContainers in Node makes this stupidly easy.

import { KafkaContainer, StartedKafkaContainer } from '@testcontainers/kafka';
import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';
import { Kafka, Producer } from 'kafkajs';
import { StandingsProjector } from '../src/standings/standings.projector';

// Pulling and starting containers takes far longer than Jest's default 5s timeout.
jest.setTimeout(120_000);

describe('StandingsProjector against real Kafka and Postgres', () => {
  let kafka: StartedKafkaContainer;
  let pg: StartedPostgreSqlContainer;
  let producer: Producer;
  let projector: StandingsProjector;

  beforeAll(async () => {
    // Pin the same images as docker-compose.test.yml so dev and CI match.
    [kafka, pg] = await Promise.all([
      new KafkaContainer('confluentinc/cp-kafka:7.6.1').withExposedPorts(9093).start(),
      new PostgreSqlContainer('postgres:16').start(),
    ]);

    // Clients reach the broker through the host port mapped to 9093.
    const brokers = [`${kafka.getHost()}:${kafka.getMappedPort(9093)}`];

    producer = new Kafka({ brokers }).producer();
    await producer.connect();

    projector = await StandingsProjector.boot({
      kafkaBrokers: brokers,
      databaseUrl: pg.getConnectionUri(),
    });
  });

  afterAll(async () => {
    // Optional chaining keeps teardown safe if beforeAll failed part-way.
    await producer?.disconnect();
    await projector?.shutdown();
    await Promise.all([kafka?.stop(), pg?.stop()]);
  });

  it('projects a match-completed event into standings within 2s', async () => {
    await producer.send({
      topic: 'match-events',
      messages: [{ key: 'm-1', value: JSON.stringify({ type: 'completed', winnerId: 7 }) }],
    });

    await expect(
      projector.waitForAthletePoints(7, { minPoints: 1, timeoutMs: 2_000 }),
    ).resolves.toBeGreaterThan(0);
  });
});

That test does several things a mock never will. It exercises the real serializer. It hits real consumer-group semantics. It runs an actual SQL transaction. And it surfaces config bugs, the kind I was just talking about with max.poll.interval.ms. The containers take about 90 seconds to spin up. Worth every one of them.
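`waitForAthletePoints` belongs to the projector and isn’t shown here. Helpers like it usually reduce to one pattern: poll a read until a condition holds or a deadline passes, instead of sleeping a fixed 2s and hoping. A minimal sketch, where `waitFor` and its options are hypothetical names:

```typescript
interface WaitOptions {
  timeoutMs: number;
  intervalMs?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `read` until `done(value)` is true and returns that value.
// Throws with the last observed value once the deadline has passed.
async function waitFor<T>(
  read: () => Promise<T>,
  done: (value: T) => boolean,
  { timeoutMs, intervalMs = 50 }: WaitOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (done(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor timed out after ${timeoutMs}ms; last value: ${JSON.stringify(value)}`);
    }
    await sleep(intervalMs);
  }
}
```

In the projector test, `read` would query a (hypothetical) standings table for athlete 7 and `done` would check `points >= minPoints`. Polling keeps the happy path fast and turns a slow projection into a timeout that reports the last value it saw.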

For local development, a companion docker-compose.test.yml pins the same image versions, so dev and CI run against the same broker and database.

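services:
  kafka:
    image: confluentinc/cp-kafka:7.6.1
    environment:
      # Single-node KRaft: this node is both broker and controller.
      # KRaft needs a cluster ID; any base64-encoded 16-byte UUID works.
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:9093'
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      # Clients on the host connect via localhost:9092.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT'
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      # One broker, so internal topics can only have one replica.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
    ports:
      - '9092:9092'
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
    ports:
      - '5432:5432'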

Chaos testing earns its keep in production

Here’s the part where I lose half the room. I run chaos drills against real production. Not staging. Staging lies about steady-state load.

A few years back, at the real-time trading platform I architected, market open on the Tuesday after a long bank holiday weekend went sideways at 09:31:14. WebSocket reconnect storm. Every gateway pod pinned at 100% CPU within 90 seconds. My first move was to scale pods up 3x. The new pods walked straight into the storm and died inside 20 seconds. I was feeding the fire. The actual fix had two parts: jittered exponential backoff on the client, pushed out through a remote-config channel, plus a tight per-IP connection rate limit at the nginx layer. We took 14 minutes of degraded tick delivery in one of the most-watched 15-minute windows of the trading week.
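A sketch of the client-side half of that fix. It uses the “full jitter” variant of exponential backoff, where each delay is drawn uniformly between zero and an exponentially growing, capped ceiling. I’m not claiming it’s the exact formula we pushed that morning; `backoffDelayMs`, `reconnectWithBackoff`, and the config shape are illustrative, and the nginx rate limit isn’t shown.

```typescript
interface BackoffConfig {
  baseMs: number; // ceiling for the first retry's delay
  capMs: number; // upper bound on any single delay
}

// Full jitter: wait a uniformly random time in [0, min(cap, base * 2^attempt)).
// The randomness spreads out clients that dropped at the same instant.
function backoffDelayMs(
  attempt: number,
  cfg: BackoffConfig,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(cfg.capMs, cfg.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical reconnect loop. Resolves with the number of failed attempts
// before a successful connect; throws after maxAttempts failures.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  cfg: BackoffConfig,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return attempt;
    } catch {
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt, cfg));
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

The jitter matters as much as the exponent. Plain exponential backoff keeps every client that dropped at the same moment on the same schedule, so the storm just comes back in synchronized waves.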

That outage taught me what no test had ever taught me. Autoscaling is not a fix for a self-amplifying client-side bug. I would have learned that lesson cheaper if I’d ever drilled it.

So now I run chaos drills. Kill a pod during business hours. Inject latency into a downstream dependency. Drop a Kafka broker. Watch what your retries do. The drills follow a runbook and keep a tight blast radius, but they run against real traffic, because steady-state load is the only honest test bed for retries and timeouts.

A simple Chaos Mesh NetworkChaos manifest that adds latency between one canary projector pod and the rules service for 15 minutes.

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: standings-downstream-latency
spec:
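  action: delay
  mode: one # inject into a single pod picked from the selector below
  direction: to # delay packets from that pod to the target pods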
  selector:
    namespaces: [tournaments]
    labelSelectors:
      'app': 'standings-projector'
      'canary': 'true'
  delay:
    latency: '800ms'
    jitter: '200ms'
  duration: '15m'
  target:
    mode: all
    selector:
      namespaces: [tournaments]
      labelSelectors: { 'app': 'rules-service' }

If your circuit breaker doesn’t open, if your timeouts cascade, or if your p99 doesn’t recover within seconds of the drill ending, you just found a bug you didn’t have to find at 2 a.m.
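For that drill to pass, something has to open. A minimal sketch of the pattern it exercises: a consecutive-failure circuit breaker with a per-call timeout, so 800ms-plus responses from the rules service count as failures instead of quietly piling up behind the caller. `CircuitBreaker`, its thresholds, and the injectable clock are illustrative assumptions, not the projector’s actual code.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

interface BreakerOptions {
  failureThreshold: number; // consecutive failures before the breaker opens
  resetAfterMs: number; // how long to fail fast before allowing a trial call
  timeoutMs: number; // per-call deadline; a slow call counts as a failure
}

// Rejects if `p` hasn't settled within `ms`. This stops waiting;
// it does not cancel the underlying work.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(opts: BreakerOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now;
  }

  get currentState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.opts.resetAfterMs) {
        throw new Error('circuit open: failing fast');
      }
      // Cool-down elapsed: allow a trial call. This sketch doesn't stop
      // concurrent callers from also getting through while half-open.
      this.state = 'half-open';
    }
    try {
      const result = await withTimeout(fn(), this.opts.timeoutMs);
      this.consecutiveFailures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.consecutiveFailures++;
      // A failed trial re-opens immediately; otherwise open at the threshold.
      if (this.state === 'half-open' || this.consecutiveFailures >= this.opts.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While open, callers get an immediate error instead of queueing behind a slow dependency, which is what keeps the timeouts from cascading upstream. The drill is how you find out whether the thresholds you picked actually trip under real traffic.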

What I’d actually run in CI

A few simple rules. Unit tests run on every commit. Pact contract tests run on every PR, both consumer and provider sides, and fail the merge if the broker says the contract drifted. TestContainers integration tests run on PRs that touch the consumer or the data layer. Chaos drills don’t run in CI at all. They run on a schedule, against production, with a human watching, behind a feature flag, with an instant kill switch.
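One way to encode those rules, sketched as a small TypeScript function a CI job could call with the PR’s changed file list. The path prefixes and the `suitesToRun` name are hypothetical; most CI systems can express the same thing with native path filters.

```typescript
type Suite = 'unit' | 'pact' | 'integration';

interface CiContext {
  event: 'push' | 'pull_request';
  changedFiles: string[];
}

// Hypothetical path prefixes standing in for "the consumer or the data layer".
const INTEGRATION_TRIGGERS = ['src/standings/', 'src/db/', 'migrations/'];

// Unit tests always; Pact (consumer and provider) on every PR; the
// TestContainers suite only when a PR touches the integration surface.
// Chaos drills never appear here: they run on a schedule, not in CI.
function suitesToRun(ctx: CiContext): Suite[] {
  const suites: Suite[] = ['unit'];
  if (ctx.event !== 'pull_request') return suites;

  suites.push('pact');
  const touchesIntegrationSurface = ctx.changedFiles.some((file) =>
    INTEGRATION_TRIGGERS.some((prefix) => file.startsWith(prefix)),
  );
  if (touchesIntegrationSurface) suites.push('integration');
  return suites;
}
```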

End-to-end tests across more than three services I mostly don’t write. They’re flaky and they tell you something broke without telling you what. A solid Pact suite plus TestContainers covers the same surface with better signal.

Takeaways

The pyramid inverts. Integration carries the load in microservices, not unit tests.
Consumer-driven contracts catch breakages in the producer’s CI, where they belong.
Real brokers and real databases in integration tests. Mocks lie about everything that matters.
Chaos drills run in production, with a runbook and a kill switch. Staging isn’t honest enough.
End-to-end across many services is mostly a flaky way to learn things you should already know.

Thanks for reading. If you’ve got thoughts, send them my way.