Microservice Versioning Strategies

URL, header, and query-param API versioning compared, what counts as breaking, event upcasting, and consumer-driven contract tests.

It was a Wednesday at the combat-sports tournament platform I CTO’d in London. Hundreds of services, Kafka as the backbone. A small auth service shipped a “tiny refactor” renaming user_id to subject_id in its login response. Passed staging because everything there was on the same build. Production wasn’t. Within four minutes the rankings page, messaging service, and bracket page 401-ed for anyone logged in before the deploy. We rolled back. The postmortem went ninety minutes with zero finger-pointing because the rename wasn’t, on its own, a stupid change. We just had no rules about what counted as breaking.

That meeting is where the versioning policy was born. Not in an RFC. In a room of tired people explaining to a federation contact why his bracket page was empty.

Versioning is the social contract between you and every consumer you can’t reach on Slack. Get it wrong, you find out from PagerDuty.

What counts as a breaking change

Write this list down before picking a mechanism. Everything else is a footnote.

Breaking, no exceptions:

  • removing or renaming a response field or request parameter
  • changing the type of a field
  • changing the meaning of an enum value, or removing one
  • making an optional field required, or vice versa
  • tightening validation
  • changing the shape of an error response
  • changing default sort order or pagination behavior

Not breaking:

  • adding a new field, endpoint, or optional parameter with a safe default
  • loosening validation
  • adding a new enum value (only if consumers are tolerant readers)

That last one is where teams argue. I take the strict view. If a consumer treats unknown enum values as a hard error, that’s their bug, but it’ll still wake somebody up. Rather them than me. We publish enums as open sets and enforce tolerant readers in code review.

Pick one mechanism, ship it everywhere

Three options come up every time. URL path (/v1/orders). A custom header (X-API-Version: 2). Content negotiation via Accept. Some teams push a query param (?version=2). I’ve shipped all of them. I’ll plant my flag.

URL path for public HTTP APIs. Headers for internal service-to-service where you control both ends. Query params, never.

URL path is grep-able, testable from curl, CDN cache keys don’t lie about it. Headers are cleaner in theory and worse in practice for external consumers, every wrong call ends up untyped in a partner’s CI logs. Inside a mesh where every caller is your own service, a header is fine. Query params get stripped by proxies, end up in access logs, and the moment somebody adds a CDN it caches v1 and v2 together. Don’t.

A thin NestJS gateway controller with path-versioned routing:

import { Controller, Get, Param, Query, Headers, BadRequestException } from '@nestjs/common';
import { OrdersService } from './orders.service';
import { OrderResponseV1, OrderResponseV2 } from './dto/order-response';

@Controller({ path: 'orders', version: '1' })
export class OrdersV1Controller {
  constructor(private readonly orders: OrdersService) {}

  @Get(':id')
  async findOne(@Param('id') id: string): Promise<OrderResponseV1> {
    const order = await this.orders.findById(id);
    return OrderResponseV1.from(order);
  }
}

@Controller({ path: 'orders', version: '2' })
export class OrdersV2Controller {
  constructor(private readonly orders: OrdersService) {}

  @Get(':id')
  async findOne(@Param('id') id: string, @Headers('x-client-id') clientId?: string): Promise<OrderResponseV2> {
    if (!clientId) {
      throw new BadRequestException('x-client-id required for v2');
    }
    const order = await this.orders.findById(id);
    return OrderResponseV2.from(order, { includePayments: true });
  }
}

Two controllers, one service. DTOs are the contract surface. The day v3 ships, v1 still returns the shape it always did. For internal Nest-to-Nest traffic over Kafka or NATS, version lives on the message envelope, not the URL.

Versioning events, not just endpoints

Events are where versioning gets interesting. There’s no Accept header on a Kafka topic, and you don’t control when the consumer reads the message. You can publish now and have a stale consumer read six hours later on a different schema. So you version the payload, and upcast on read.

The pattern that’s held up for me, on Kafka at the federation platform and Lambda pipelines at the creator economy platform I worked at, is a thin envelope plus an upcaster chain.

// shared/events/envelope.ts
export interface EventEnvelope<T = unknown> {
  eventId: string;
  eventType: string;
  schemaVersion: number;
  occurredAt: string;
  producer: string;
  payload: T;
}

// orders/events/order-placed.ts
export interface OrderPlacedV1 {
  orderId: string;
  userId: string;
  total: number;
}

export interface OrderPlacedV2 {
  orderId: string;
  customerId: string;
  totalCents: number;
  currency: string;
}

type Upcaster<From, To> = (from: From) => To;

const v1ToV2: Upcaster<OrderPlacedV1, OrderPlacedV2> = (v1) => ({
  orderId: v1.orderId,
  customerId: v1.userId,
  totalCents: Math.round(v1.total * 100),
  currency: 'USD',
});

export function upcastOrderPlaced(envelope: EventEnvelope): OrderPlacedV2 {
  if (envelope.schemaVersion === 2) return envelope.payload as OrderPlacedV2;
  if (envelope.schemaVersion === 1) return v1ToV2(envelope.payload as OrderPlacedV1);
  throw new Error(`unsupported schemaVersion ${envelope.schemaVersion} for ${envelope.eventType}`);
}

Consumers always work with the latest in-memory shape. The upcaster lives next to the event definition so a new contributor sees, in one file, every version this event has had. Never delete an upcaster early. Drop the v1 interface only after the producer stops emitting it and topic retention has passed.

A war story about renaming a field

Back to the auth service. The fix wasn’t “communicate better next time”, which is what postmortems conclude when nobody wants to ship a control. We stopped allowing rename-in-place. The new contract: add the new field, keep the old one, mark it deprecated in the OpenAPI spec with an x-deprecated-at date, remove only after every known consumer had moved.

We first tried this with docs and a Slack ping. Lasted three weeks before another team broke a different downstream the same way. Doc-and-ping is a thoughts-and-prayers strategy.

The real fix was a CI check that diffed the OpenAPI spec on PRs and refused merges that removed or renamed a response field, unless the PR carried a breaking-change-approved label only platform engineers could add. The label triggered a workflow: post in #api-changes, ticket every consumer team, set a sunset date three weeks out, sign-off before removal.

Cost: maybe an extra day on schema-changing PRs. Lesson: encode the policy in CI. Docs don’t survive contact with a large team.

Consumer-driven contracts catch the rest

OpenAPI diffing catches removals and renames. It doesn’t catch “I changed an enum’s meaning” or “I tightened validation”. Consumer-driven contract tests do.

What worked best for us: Pact-style contracts checked into consumer repos, published to a broker, verified against the producer on every producer PR. Nest provider verification is straightforward:

// orders.pact.spec.ts
import { Verifier } from '@pact-foundation/pact';
import { Test } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import { AppModule } from '../src/app.module';

describe('OrdersService provider verification', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef = await Test.createTestingModule({ imports: [AppModule] }).compile();
    app = moduleRef.createNestApplication();
    await app.listen(3001);
  });

  afterAll(async () => {
    await app.close();
  });

  it('honors all consumer contracts', async () => {
    const opts = {
      provider: 'orders-service',
      providerBaseUrl: 'http://localhost:3001',
      pactBrokerUrl: process.env.PACT_BROKER_URL,
      pactBrokerToken: process.env.PACT_BROKER_TOKEN,
      providerVersion: process.env.GIT_SHA,
      publishVerificationResult: process.env.CI === 'true',
      stateHandlers: {
        'an order exists with id 123': async () => {
          // seed test data
        },
      },
    };

    await new Verifier(opts).verifyProvider();
  });
});

The contract test doesn’t test your service. It tests that you didn’t lie to a consumer. A breaking change shows up as a failed verification against a specific consumer, by name, with the exact interaction that broke. Way more useful than a generic “API changed” alert.

A war story about WebSocket protocol drift

Different transport, same lesson. At a real-time trading platform I architected, Socket.io between a Node gateway and browser clients. The gateway shipped a “minor” tick-frame change, renaming p to price because someone was tired of cryptic field names in logs. Backwards-incompatible. Browser code only updated on full reload, and most users keep tabs open through the session. Within fifteen minutes the support inbox filled up.

First attempt: a hotfix emitting both p and price for one release. Worked for active sessions. Next release dropped p, on the assumption everyone had reloaded. They hadn’t. Same incident, second time.

The real fix was a protocol version on the WebSocket handshake. The client announces its supported version in the connect query, the gateway pins the frame shape to that version for the lifetime of the socket, and we never silently change a shape inside a version. Two weeks of work. Never had that class of incident again.

Versioning isn’t only HTTP. It’s every contract crossing a process boundary, and the moment you have a stale consumer, you need a way to say “I know what shape I was promising you when we shook hands”.

Takeaways

  • Decide what counts as a breaking change in writing, before you need to.
  • URL path for external HTTP, header for internal mesh, query params never.
  • Version event payloads with a schemaVersion and upcast on read.
  • Encode the policy in CI. Docs are not a control.
  • Add consumer-driven contracts. They catch what diffs can’t.
  • Protocol versioning applies to WebSocket and async transports too.

Thanks for reading. If you’ve got thoughts, send them my way.

© 2026 Akin Gundogdu. All Rights Reserved.