gRPC for Service-to-Service Communication

Where gRPC actually pays off in service-to-service traffic, where REST still wins, and how Protobuf, streaming, Envoy, and schema evolution shake out in production.

At the combat-sports tournament platform I CTO’d in London, two services on the same Kubernetes cluster were chatting over REST and a JSON parser was showing up on a Datadog flame graph. The call was rankings-api to athlete-profile, twice per page render, both services my team owned. p99 on the chained call was around 180ms. Half of that was JSON. We swapped that one path to gRPC over a long weekend. p99 dropped under 60ms and the flame graph went quiet. That was the moment I stopped thinking of gRPC as a “fancy RPC” and started thinking of it as the default inside a bounded context.

I’ll give you my position up front. gRPC is the right default for synchronous service-to-service calls when both ends are owned by the same team or a small group of cooperating teams. It is the wrong default for public APIs, browser clients, and cross-org integrations where you do not control both ends. REST/JSON wins those by default and it isn’t close.

The rest of this is the why, with code, plus the parts that bit me.

Protobuf is the actual product

The wire protocol matters. The schema matters more. Protobuf gives you a contract you can lint, version, and review in PRs, and a code generator that hands typed clients to every consumer. That part is where the real value lives. The binary encoding is a nice tax break on top.

syntax = "proto3";

package rankings.v1;

import "google/protobuf/timestamp.proto";

service Rankings {
  rpc GetRank(GetRankRequest) returns (Rank);
  rpc ListRanksByDivision(ListRanksRequest) returns (stream Rank);
  rpc StreamRankChanges(StreamRankChangesRequest) returns (stream RankChange);
}

message GetRankRequest {
  string athlete_id = 1;
  string division_id = 2;
}

message Rank {
  string athlete_id = 1;
  string division_id = 2;
  int32 position = 3;
  int32 points = 4;
  google.protobuf.Timestamp updated_at = 5;
  // reserved for points_breakdown, added in v1.2
  reserved 6;
}

message RankChange {
  string athlete_id = 1;
  int32 old_position = 2;
  int32 new_position = 3;
  google.protobuf.Timestamp at = 4;
}

The two details I always check on a .proto review. Field numbers are forever, so reserving them when you remove a field is non-negotiable. And a plain proto3 scalar field has no presence tracking: a zero value is simply not serialized, which means “field missing” and “field set to zero” look identical to the reader. If that distinction matters, mark the field optional, or use a wrapper type or a oneof. I’ve watched a payment service silently treat a missing amount_cents as a free order because someone trusted the default. Once.
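
Here is a minimal sketch of that guard on the consuming side. It assumes the field is declared with explicit presence and that the decoder surfaces an unset field as undefined. ChargeRequest, amountCents, and resolveAmountCents are hypothetical names for illustration, not part of the rankings schema above.

```typescript
// Hypothetical decoded message: with explicit presence, an unset field
// arrives as undefined instead of collapsing to 0.
interface ChargeRequest {
  orderId: string;
  amountCents?: number;
}

// Rejects a missing amount instead of trusting the zero default.
function resolveAmountCents(req: ChargeRequest): number {
  if (req.amountCents === undefined) {
    throw new Error(`order ${req.orderId}: amount_cents is missing`);
  }
  if (!Number.isInteger(req.amountCents) || req.amountCents < 0) {
    throw new Error(`order ${req.orderId}: amount_cents is invalid`);
  }
  return req.amountCents;
}
```

An explicit zero still passes, which is the point: a free order has to be something the caller said, not something the decoder filled in.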

The four streaming modes, briefly

Unary, server streaming, client streaming, bidirectional. The names are accurate and boring. Where it gets interesting is matching the mode to the shape of the data.

Unary is your default. Server streaming is the one I actually reach for in production for paginated reads and live feeds, like the StreamRankChanges above, because it gives you backpressure for free over HTTP/2 flow control. Client streaming is rare in my work, useful for upload-style endpoints. Bidi is the one juniors over-reach for. If your problem fits a WebSocket and a message format, you probably don’t want bidi gRPC, you want a message broker.

Here’s a NestJS server-streaming handler I’d ship today.

import { Controller } from '@nestjs/common';
import { GrpcMethod } from '@nestjs/microservices';
import { Observable } from 'rxjs';
// RankingChangeBus is an app-specific injectable; the import path is assumed.
import { RankingChangeBus } from './ranking-change-bus';

interface StreamRankChangesRequest {
  divisionId: string;
}

interface RankChange {
  athleteId: string;
  oldPosition: number;
  newPosition: number;
  at: string;
}

@Controller()
export class RankingsController {
  constructor(private readonly bus: RankingChangeBus) {}

  // Server streaming: one request in, an Observable out. @GrpcStreamMethod is
  // for handlers whose input is itself a stream (client streaming and bidi).
  @GrpcMethod('Rankings', 'StreamRankChanges')
  streamRankChanges(req: StreamRankChangesRequest): Observable<RankChange> {
    return new Observable<RankChange>((subscriber) => {
      const unsubscribe = this.bus.subscribe(req.divisionId, (change) => {
        subscriber.next(change);
      });

      // Teardown runs on completion, on error, and when the stream is
      // unsubscribed (for example after the client cancels the call).
      return () => unsubscribe();
    });
  }
}

The thing that catches people: HTTP/2 keeps the connection alive, but a client that wanders off without canceling the stream will hold server resources until your idle timeout fires. Always release the bus subscription in the stream’s teardown, so it runs on completion, on error, and on cancellation. Always set deadlines on the client.
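
Deadlines only help if each hop passes on what is left of the budget instead of starting a fresh timer. Here is a dependency-free sketch of that arithmetic. It assumes the caller’s deadline arrives as an absolute epoch-millisecond timestamp. The downstreamDeadline name and the 50ms safety margin are my own illustrative choices, not part of any gRPC library.

```typescript
// Returns the absolute deadline (epoch ms) to use for a downstream call, or
// null when too little budget remains to make the call worth attempting.
function downstreamDeadline(
  incomingDeadlineMs: number,
  nowMs: number,
  marginMs: number = 50,
): number | null {
  // Keep a small margin so this service can still reply before its own
  // caller gives up.
  const deadline = incomingDeadlineMs - marginMs;
  return deadline > nowMs ? deadline : null;
}

// One second of budget left: the downstream call gets 950ms of it.
const next = downstreamDeadline(10_000, 9_000);
```

A null result is the signal to fail fast with a deadline error rather than start work nobody is waiting for. With @grpc/grpc-js, a value like this can be passed as the deadline call option.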

Envoy in front, not behind

gRPC over plain Kubernetes Service objects load-balances at L4. That means one TCP connection per client to one pod, sticky for the lifetime of the connection. HTTP/2 multiplexing happens inside that single connection, which means a hot client pins itself to a single server pod and your nice replica count does nothing. I have watched this exact problem starve five out of six replicas on a Saturday afternoon. Not fun.
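
A toy simulation makes the pinning concrete. It assumes one client, six pods, and 600 requests, and compares a pod chosen once per connection with a pod chosen per request. The distribute helper and the numbers are illustrative only, not measurements from the incident.

```typescript
// Counts how many requests each pod serves under a given pick strategy.
function distribute(
  requests: number,
  pods: number,
  pickPod: (requestIndex: number) => number,
): number[] {
  const counts: number[] = new Array<number>(pods).fill(0);
  for (let i = 0; i < requests; i++) {
    const pod = pickPod(i);
    counts[pod] = (counts[pod] ?? 0) + 1;
  }
  return counts;
}

// L4: the pod is chosen once, when the TCP connection opens, and every
// request multiplexed on that connection lands on the same pod.
const pinnedPod = 2; // arbitrary pod picked at connect time
const perConnection = distribute(600, 6, () => pinnedPod);

// L7, for contrast: a pod is picked for every request (round robin here).
const perRequest = distribute(600, 6, (i) => i % 6);

console.log(perConnection); // all 600 requests on pod 2
console.log(perRequest);    // 100 requests on each of the six pods
```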

The fix is an L7 proxy that speaks HTTP/2 and balances per-request. Envoy does this. Linkerd does this. The Kubernetes Service does not. Put Envoy in front of the gRPC service, as a standalone proxy or a mesh sidecar, and let it spread requests across replicas.

# envoy.yaml excerpt
static_resources:
  listeners:
  - name: rankings_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: HTTP2
          stat_prefix: rankings_ingress
          route_config:
            virtual_hosts:
            - name: rankings
              domains: ["*"]
              routes:
              - match: { prefix: "/rankings.v1.Rankings/" }
                route:
                  cluster: rankings_upstream
                  timeout: 2s
                  retry_policy:
                    retry_on: "cancelled,deadline-exceeded,unavailable"
                    num_retries: 2
                    per_try_timeout: 800ms
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
  - name: rankings_upstream
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
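    # Upstream HTTP/2 via typed_extension_protocol_options; the bare
    # http2_protocol_options cluster field is deprecated in the v3 API.
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: rankings_upstream
      endpoints:
      - lb_endpoints:
        # "rankings" should be a headless Service so DNS returns pod IPs;
        # a ClusterIP name resolves to one virtual IP and defeats the balancing.
        - endpoint: { address: { socket_address: { address: rankings, port_value: 50051 }}}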

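Two things worth flagging. First, retry_on: "cancelled,deadline-exceeded,unavailable" is the list I treat as safe for a read path like this one. Retrying on internal will replay non-idempotent writes and you will hate yourself. For write RPCs, even deadline-exceeded can replay a call that already landed, so trim the list further there. Second, per_try_timeout matters more than timeout. Envoy’s route timeout covers the whole request, retries included, so without per_try_timeout a slow first attempt eats the full 2s and the retries never get a turn. With 800ms per try, all three attempts (the original plus two retries) get to start inside the 2s cap.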
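
The budget arithmetic is easy to sanity-check in a few lines. This is a sketch under two assumptions: every attempt times out, and the proxy enforces the route timeout across all attempts, which is how Envoy documents it. Retry backoff is ignored. RetryBudget and worstCase are hypothetical names, not Envoy APIs.

```typescript
interface RetryBudget {
  routeTimeoutMs: number;   // Envoy route `timeout`
  perTryTimeoutMs?: number; // Envoy `per_try_timeout`, if set
  numRetries: number;       // Envoy `num_retries`
}

// Worst case when every attempt times out: how long the caller waits and how
// many attempts get to start before the route timeout cuts everything off.
function worstCase(b: RetryBudget): { totalMs: number; attempts: number } {
  // Without per_try_timeout, a single try may use the whole route timeout.
  const perTry = b.perTryTimeoutMs ?? b.routeTimeoutMs;
  const maxAttempts = b.numRetries + 1;
  const attempts = Math.min(maxAttempts, Math.ceil(b.routeTimeoutMs / perTry));
  const totalMs = Math.min(b.routeTimeoutMs, maxAttempts * perTry);
  return { totalMs, attempts };
}

const withPerTry = worstCase({ routeTimeoutMs: 2000, perTryTimeoutMs: 800, numRetries: 2 });
const withoutPerTry = worstCase({ routeTimeoutMs: 2000, numRetries: 2 });
```

For the config above, withPerTry comes out at three attempts inside a 2000ms cap, while withoutPerTry is a single attempt that burns the same 2000ms.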

Schema evolution, the only rule that matters

Never reuse a field number. That’s the whole rule. Adding a field is safe, removing a field is safe if you reserved the number, renaming a field on the wire is a no-op because the wire only sees numbers. Changing a field type is almost never safe. Changing int32 to int64 parses fine, but old clients will silently truncate any value that does not fit in 32 bits.

If you have to do something risky, ship the new field alongside the old one for at least one release cycle, migrate readers, then remove the old one in a follow-up. Treat the .proto repo like a database schema. I had a teammate at the creator-tools company where I spent the last few years try to “clean up” a points_breakdown field by re-numbering. Old mobile clients started seeing negative point totals because the old field number was now an enum. That’s a Saturday I want back.
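
The rule is mechanical enough to check in CI. Below is a toy version of such a check, assuming each message version has been reduced to a map from field number to name and type. FieldMap and findBreakingChanges are hypothetical names, and the field types in the usage are invented for illustration. In practice a dedicated tool such as buf breaking covers far more cases.

```typescript
type FieldMap = Record<number, { name: string; type: string }>;

// Flags field numbers whose type changed between two versions of a message,
// and numbers that were dropped without being reserved.
function findBreakingChanges(
  oldFields: FieldMap,
  newFields: FieldMap,
  reserved: number[],
): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(oldFields)) {
    const num = Number(key);
    const before = oldFields[num];
    const after = newFields[num];
    if (before === undefined) continue;
    if (after === undefined) {
      if (reserved.indexOf(num) === -1) {
        problems.push(`field ${num} (${before.name}) removed without reserving it`);
      }
    } else if (after.type !== before.type) {
      problems.push(`field ${num} changed type ${before.type} -> ${after.type}`);
    }
  }
  return problems;
}

// Hypothetical versions of a message: field 6 is reused with a new type.
const v1: FieldMap = { 4: { name: "points", type: "int32" }, 6: { name: "points_breakdown", type: "string" } };
const v2: FieldMap = { 4: { name: "points", type: "int32" }, 6: { name: "tier", type: "enum" } };
const problems = findBreakingChanges(v1, v2, []);
```

Here problems holds one entry, for field 6. Dropping field 6 from v2 and passing [6] as reserved yields an empty list.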

A war story about trusting the contract

At the creator-economy platform I worked at, our branded-mobile-app pipeline had a billing reconciliation step that called an internal subscriptions gRPC service from the renewal worker. The service exposed GetSubscriptionStatus(apple_original_transaction_id). Apple’s server-to-server renewal notification came in, the worker called gRPC, the response said “active”, we wrote a new row.

What went wrong: Apple started retrying renewal notifications aggressively after a slightly-too-slow 200 OK on our side. Our worker called gRPC each time, the gRPC service returned “active” each time because that’s a true statement about a subscription, and the worker created a new creator_subscriptions row each time. A few thousand customers got billed twice across dozens of branded apps. Apple had already taken the money.

First wrong fix went out from the frontend team within an hour. Show only the most recent subscription row per customer. The duplicate billing was untouched. Apple does not issue refunds because a UI hides things.

The real fix had two halves. One, a Sidekiq job behind the renewal endpoint so we returned 200 OK within 5 seconds and Apple stopped retrying. Two, a database unique constraint on (apple_original_transaction_id, notification_uuid) so the duplicate inserts blew up at the database instead of accumulating. The gRPC service was not the bug. The bug was treating “the contract is typed” as “the call is idempotent”. Types don’t give you idempotency. You add that yourself, with a key derived from the upstream’s identifiers, every time.
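
The second half of that fix can be sketched without a database. The class below is an in-memory stand-in for the unique constraint, assuming the idempotency key is the pair of upstream identifiers named above. RenewalLedger and record are hypothetical names, and a real implementation would let the database enforce the constraint rather than a Set.

```typescript
// In-memory stand-in for a table with a unique constraint on
// (apple_original_transaction_id, notification_uuid).
class RenewalLedger {
  private readonly seen = new Set<string>();

  // Returns true when the renewal was recorded, false when it was a replay.
  record(originalTransactionId: string, notificationUuid: string): boolean {
    // JSON-encode the pair so different pairs can never collide in one key.
    const key = JSON.stringify([originalTransactionId, notificationUuid]);
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    return true;
  }
}

const ledger = new RenewalLedger();
const first = ledger.record("txn-1", "notif-a");  // new renewal
const replay = ledger.record("txn-1", "notif-a"); // Apple retried the same notification
const later = ledger.record("txn-1", "notif-b");  // a genuinely new renewal
```

The retried notification is rejected while the next real renewal for the same subscription still goes through. That is the behavior the typed contract alone never gave us.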

gRPC vs REST, the actual answer

Inside a team’s bounded context, with HTTP/2-aware load balancing, with schema review discipline, gRPC is the better default. Smaller payloads, deadlines that propagate, real types, streaming when you need it, generated clients in every language.

Across team or org boundaries, REST/JSON wins. Browsers can’t speak gRPC directly without grpc-web and an Envoy translator, and that’s a tax most product teams will never want to pay. JSON is debuggable from a terminal. Postman exists. OpenAPI tooling is mature. The “tax” of typing your way through generated TypeScript clients is real, but it’s a one-team problem that you can solve with discipline. The “tax” of forcing every external consumer to learn Protobuf, install codegen, and accept binary payloads is a coordination problem you cannot solve with discipline.

Takeaways

Default to gRPC for service-to-service calls inside a team’s bounded context. Default to REST/JSON for anything that crosses an org or hits a browser.
Protobuf is the product. Field numbers are forever. reserved is not optional.
Pick the streaming mode that matches the data shape. Don’t reach for bidi unless you’ve ruled out a broker.
Put an L7 proxy in front, Envoy or Linkerd, or your replica count is decorative.
Set deadlines on every client and per_try_timeout on every retry policy. The retry policy is the contract.
Types are not idempotency. Add an idempotency key wherever a retry can fire.

Thanks for reading. If you’ve got thoughts, send them my way.