Serving 10 Million Requests Per Minute

A first-person scaling story from a real-time trading platform. Connection storms, cache stampedes, replica lag, and the rebuild that actually held.

The first time the platform did ten million requests per minute, I wasn’t celebrating. I was eight minutes into a war room channel, watching the gateway tier eat itself alive while charts on the client side froze on stale prices. It was a Tuesday after a long bank-holiday weekend. Markets opened at 09:30 local. By 09:31:14 we were on fire.

This is the story of how a real-time trading and charting platform I architected went from a few hundred thousand requests per minute to ten million per minute, and what broke along the way. I want to be honest about it. Most of the path was not pretty.

Where the scaling story started

We were running a Node.js and TypeScript gateway tier behind nginx, with Socket.io fan-out for tick-level price streams. PostgreSQL for the persisted side, Redis for the hot side. Single-region. Single-writer Postgres with a couple of read replicas. The kind of architecture that does a hundred thousand requests per minute without breaking a sweat and starts wheezing somewhere around half a million.

The first time we doubled traffic, nothing dramatic happened. The second time, the Postgres connection pool ran dry.

// db.ts - the version we shipped first, and regretted
import { Pool } from "pg";

export const pool = new Pool({
  host: process.env.PG_HOST,
  user: process.env.PG_USER,
  password: process.env.PG_PASSWORD,
  database: process.env.PG_DB,
  max: 20,
  idleTimeoutMillis: 30_000,
});

Twenty connections per pod. We had eight pods. Postgres max_connections was 200. On paper, eight pods at twenty apiece is 160, comfortably under the limit. Reality said otherwise, because every long query held its connection, every transaction inside an HTTP handler held its connection, and every retry held its connection twice. At peak we were starving the pool by 09:32 and timing out by 09:34.
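Little’s law makes the gap between “160 fits under 200” and reality concrete: the average number of checked-out connections equals the rate of requests that need one multiplied by how long each one holds it. A minimal sketch; the 100,000 DB-bound requests per minute and the hold times below are illustrative assumptions, not measurements from the platform:

```typescript
// Little's law: average busy connections = arrival rate × average hold time.
// Computed as (requests per minute × hold ms) / 60,000 ms per minute, rounded up.
function connectionsInUse(dbRequestsPerMinute: number, holdMs: number): number {
  return Math.ceil((dbRequestsPerMinute * holdMs) / 60_000);
}

// Fleet-wide pool from the db.ts config: eight pods × max: 20.
const fleetPoolSize = 8 * 20;

// Hypothetical load: 100k requests per minute that each need a Postgres connection.
const quickQueries = connectionsInUse(100_000, 50); // short, well-behaved checkouts
const heldAcrossWork = connectionsInUse(100_000, 150); // transaction kept open across other handler work

console.log({ fleetPoolSize, quickQueries, heldAcrossWork });
```

With those assumed numbers, 50 ms holds keep about 84 connections busy across the fleet, while stretching the hold to 150 ms pushes the average alone to 250, past the 160 the fleet can hand out. Little’s law gives an average, so the burst at the open sits above it.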

The reconnection storm at market open

I keep coming back to this one because it taught me the thing about scaling that I now believe more than anything else: autoscale is not a fix for a client-side bug.

It was the Tuesday after that bank-holiday weekend. 09:31:14, seventy-four seconds after the open. Connections started dropping en masse. Clients reconnected immediately, were dropped again, reconnected again. Within ninety seconds the reconnect storm had every gateway pod pinned at 100% CPU. p99 tick fan-out latency went from around 80 ms to roughly 3 seconds. Charts started showing stale prices. The worst possible failure mode for a trading product. I was on-call.

My first move was to scale gateway pods 3x via the autoscaler’s manual override. Straight to nine pods with kubectl scale. The new pods came online, ran straight into the reconnect storm, and went CPU-bound within about twenty seconds. I was feeding the fire. Worse, more pods meant more partial-success reconnects: clients latched onto a healthy pod for a heartbeat, got the “connection established” signal, then dropped again when that pod saturated.

The real fix was two things in parallel. First, an emergency client-side config push, through a remote-config channel we’d built for exactly this kind of moment, that switched reconnects to jittered exponential backoff.

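// Pushed to clients via remote config. Delays are in milliseconds.
const reconnectConfig = {
  minDelayMs: 200,
  maxDelayMs: 30_000,
  factor: 2,
  jitterRatio: 0.5, // spread each delay by ±50% so clients don't reconnect in lockstep
};

// Delay before reconnect attempt `attempt` (0-based).
function nextDelay(attempt: number): number {
  // Exponential growth, capped at maxDelayMs.
  const base = Math.min(
    reconnectConfig.maxDelayMs,
    reconnectConfig.minDelayMs * reconnectConfig.factor ** attempt,
  );
  // Uniform jitter in [-jitterRatio, +jitterRatio) of the base delay.
  const jitter = base * reconnectConfig.jitterRatio * (Math.random() * 2 - 1);
  // Clamp into [minDelayMs, maxDelayMs]; without the upper clamp, positive jitter
  // could push the delay up to 1.5x the configured maximum.
  return Math.min(
    reconnectConfig.maxDelayMs,
    Math.max(reconnectConfig.minDelayMs, Math.floor(base + jitter)),
  );
}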
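One thing `nextDelay` leaves open is when to reset `attempt`. Given the partial-success reconnects described above, resetting on “connection established” would drop a flapping client straight back to the 200 ms floor. A sketch of one way to avoid that, assuming a hypothetical `ReconnectTracker` helper and a 10-second stability window (both my assumptions, not the platform’s code):

```typescript
// Hypothetical helper (not the platform's code): counts reconnect attempts and only
// resets the backoff once a connection has stayed up for `stableAfterMs`.
class ReconnectTracker {
  private attempt = 0;
  private connectedAt: number | null = null;
  private readonly delayFor: (attempt: number) => number;
  private readonly stableAfterMs: number;
  private readonly now: () => number;

  constructor(
    delayFor: (attempt: number) => number, // e.g. nextDelay from the backoff config
    stableAfterMs = 10_000, // assumption: how long "connected" must last to count as healthy
    now: () => number = Date.now, // injectable clock, mainly for tests
  ) {
    this.delayFor = delayFor;
    this.stableAfterMs = stableAfterMs;
    this.now = now;
  }

  // Call when the socket reports "connection established".
  onConnected(): void {
    this.connectedAt = this.now();
  }

  // Call on every drop or failed attempt; returns the delay before the next try.
  onDisconnected(): number {
    const upFor = this.connectedAt === null ? 0 : this.now() - this.connectedAt;
    this.connectedAt = null;
    if (upFor >= this.stableAfterMs) {
      this.attempt = 0; // the last connection was genuinely healthy, so restart the schedule
    }
    const delay = this.delayFor(this.attempt);
    this.attempt += 1;
    return delay;
  }
}
```

Wire `onConnected` to the client’s connect event and `onDisconnected` to both drops and failed attempts, passing `nextDelay` as `delayFor`. A client that connects for a heartbeat and drops keeps climbing the backoff schedule instead of starting over.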

Second, a per-IP connection-rate limiter at the nginx layer. Deliberately tight.

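# http {} context: a 10 MB shared zone keyed by client IP, 3 requests/s per IP.
limit_req_zone $binary_remote_addr zone=ws_conns:10m rate=3r/s;

server {
  listen 443 ssl http2;
  location /ws {
    # Each WebSocket handshake is one request: allow a burst of 5 without queuing,
    # reject anything beyond that (503 by default).
    limit_req zone=ws_conns burst=5 nodelay;
    proxy_pass http://gateway_upstream;
    # WebSocket upgrades need HTTP/1.1 to the upstream plus the Upgrade/Connection headers.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    # Keep longer than the server's heartbeat interval so idle sockets aren't cut.
    proxy_read_timeout 75s;
  }
}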
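To see what “deliberately tight” means for a single client, here is a simplified model of the leaky-bucket accounting nginx documents for `limit_req`: excess requests drain at `rate`, each accepted request adds one, requests within `burst` are forwarded at once (that is what `nodelay` does), and anything beyond it is rejected. It is an approximation written for illustration, and the `LimitReqModel` name is hypothetical, not nginx’s implementation:

```typescript
// Simplified, hypothetical model of nginx limit_req for one key (one client IP)
// with `nodelay`: accepted requests are forwarded at once, the rest are rejected.
class LimitReqModel {
  private excess = 0; // requests above the steady rate still "in the bucket"
  private lastMs: number | null = null; // arrival time of the last accepted request
  private readonly ratePerSec: number;
  private readonly burst: number;

  constructor(ratePerSec: number, burst: number) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
  }

  // Returns true if a request arriving at `nowMs` would be let through.
  tryAccept(nowMs: number): boolean {
    if (this.lastMs === null) {
      // The first request seen for this key starts with an empty bucket.
      this.lastMs = nowMs;
      this.excess = 0;
      return true;
    }
    // Drain at `ratePerSec` since the last accepted request, then add this one.
    const drained = (this.ratePerSec * (nowMs - this.lastMs)) / 1000;
    const next = Math.max(0, this.excess - drained + 1);
    if (next > this.burst) {
      return false; // nginx answers with limit_req_status, 503 by default
    }
    this.excess = next;
    this.lastMs = nowMs;
    return true;
  }
}

// rate=3r/s burst=5, as in the config: hammer at t=0, then again one second later.
const perIp = new LimitReqModel(3, 5);
const atOpen = Array.from({ length: 10 }, () => perIp.tryAccept(0)).filter(Boolean).length;
const oneSecondLater = Array.from({ length: 10 }, () => perIp.tryAccept(1_000)).filter(Boolean).length;
console.log({ atOpen, oneSecondLater });
```

Under this model, an IP that starts hammering gets six handshakes through immediately and about three per second after that; the rest are rejected. A single client following the jittered backoff schedule stays inside that budget.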

About eight minutes after the config push, the connection pool stabilized and fan-out latency came back under 200 ms. Roughly fourteen minutes of degraded tick delivery during one of the most-watched windows of the trading week. No financial losses for our customers, but plenty of angry support tickets, and a hard lesson written into the post-mortem. Backoff lives on the client. Always.

Replica lag and the wrong instance bump

The other war story I think about is from a different gig, a creator-economy platform I worked at, but the shape of the failure was the same. On a Tuesday morning around 10:14 a.m. PT, Datadog fired its replica-lag alert. The Community read path ran on Aurora: three reader replicas behind a custom routing layer. The lag started at sixty seconds and climbed to fourteen minutes within four minutes. Community feed p99 read latency went from around 120 ms to over eight seconds.

The on-call’s first instinct was to bump the reader instance class by two doublings, from r6g.4xlarge to r6g.16xlarge. Reasonable on paper: “We’re CPU-bound on the readers.” Wrong root cause. The readers weren’t CPU-bound at all. They were starved of WAL.

The real fix lived on the writer. A long-running ANALYZE on one of the hottest tables was holding write-side locks and starving WAL emission. We killed the ANALYZE, and lag drained in roughly six minutes. The same week, we shipped a maintenance guardrail that refuses to run heavy commands during peak hours.

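# Heavy maintenance statements (ANALYZE and friends) go through run!,
# which refuses to execute them during the peak traffic window.
class DbSafeMaintenance
  # Inclusive range of UTC hours treated as peak (06:00 through 22:59).
  PEAK_WINDOW_UTC = (6..22)

  def self.run!(statement)
    hour = Time.now.utc.hour
    if PEAK_WINDOW_UTC.cover?(hour)
      raise "refusing to run during peak window: #{statement}"
    end
    ActiveRecord::Base.connection.execute(statement)
  end
end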

Twenty-two minutes of degraded read latency for millions of customers. No data loss. The runbook now leads with a literal sentence about checking pg_stat_activity on the writer before touching the reader tier. I’m the reason that sentence is there.
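The pg_stat_activity check is easy to script. A minimal TypeScript sketch, assuming a node-postgres-style driver that returns `query_start` as a `Date`; the `suspectMaintenance` helper, the 60-second threshold, and the statement pattern are hypothetical choices for illustration:

```typescript
// Run this against the writer, not the readers. All columns are standard pg_stat_activity fields.
const WRITER_ACTIVITY_SQL = `
  SELECT pid, state, wait_event_type, query_start, query
  FROM pg_stat_activity
  WHERE state <> 'idle'
  ORDER BY query_start`;

interface ActivityRow {
  pid: number;
  query_start: Date | null; // node-postgres parses timestamptz into a Date by default
  query: string;
}

// Hypothetical triage pattern: maintenance-style statements that can hurt a busy writer.
// Coarse text match; it also catches autovacuum workers ("autovacuum: VACUUM ...").
const HEAVY_STATEMENT = /\b(analyze|vacuum|create\s+(unique\s+)?index|reindex|cluster)\b/i;

// Returns maintenance statements that have been running for at least `minRuntimeMs`.
function suspectMaintenance(rows: ActivityRow[], now: Date, minRuntimeMs = 60_000): ActivityRow[] {
  return rows.filter(
    (row) =>
      row.query_start !== null &&
      now.getTime() - row.query_start.getTime() >= minRuntimeMs &&
      HEAVY_STATEMENT.test(row.query),
  );
}
```

Anything it flags is worth a look before anyone resizes readers; Postgres’s `pg_cancel_backend(pid)` stops the statement, with `pg_terminate_backend(pid)` as the heavier option.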

The rebuild that held

After the reconnection storm we did a proper rebuild on the trading platform. Not a rewrite. A surgical one. Three things. We split the WebSocket gateway from the HTTP gateway so a connection storm couldn’t take down the order-entry path. We moved hot price snapshots behind a small in-process LRU on each gateway pod, with a singleflight wrapper so a cold key only hits the upstream once even when ten thousand clients ask for it in the same millisecond. And we capped pool sizes by what Postgres could actually handle, not by what a single pod wished it had.
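Capping pools “by what Postgres could actually handle” comes down to simple division, with one trap the reconnect storm made obvious: the divisor should be the most pods the autoscaler can run, not the count running today, because every new pod brings a full pool with it. A sketch with a hypothetical `perPodPoolMax` helper; the reserved and other-client figures are assumptions you would replace with your own:

```typescript
// Hypothetical sizing helper: divide the connections Postgres can actually accept
// by the largest pod count the gateway deployment can scale to.
interface PoolBudget {
  maxConnections: number; // Postgres max_connections
  reserved: number; // superuser_reserved_connections (default 3) and any other held-back slots
  otherClients: number; // assumption: migrations, cron jobs, admin sessions, other services
  maxPods: number; // autoscaler ceiling, not today's replica count
}

function perPodPoolMax(b: PoolBudget): number {
  const usable = b.maxConnections - b.reserved - b.otherClients;
  const perPod = Math.floor(usable / b.maxPods);
  if (!Number.isFinite(perPod) || perPod < 1) {
    // Refuse to over-commit: fewer than one connection per pod means the budget is wrong.
    throw new Error(`cannot give ${b.maxPods} pods a connection each from ${usable} usable`);
  }
  return perPod;
}

const gatewayPoolMax = perPodPoolMax({ maxConnections: 200, reserved: 3, otherClients: 20, maxPods: 12 });
console.log(gatewayPoolMax); // 14
```

With those assumed figures, each pod gets 14 connections, 168 in total, inside the 177 usable; the result is what you would pass as `max` to the `pg` `Pool`. If the numbers cannot be divided safely, the function throws rather than quietly over-committing.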

The cache stampede fix was the smallest code change and the biggest reliability win. Honestly. One mutex per key in front of the upstream call. Boring. Worked.
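Node has no threads to block, so “one mutex per key” naturally becomes a map of in-flight promises: the first caller for a cold key starts the upstream fetch and everyone else awaits the same promise, the pattern Go ships as `golang.org/x/sync/singleflight`. A minimal sketch of an LRU with that wrapper, assuming a hypothetical `SnapshotCache` class and a short TTL (stale prices being the failure mode that matters most here); it is not the platform’s actual code:

```typescript
// Hypothetical in-process snapshot cache: a small LRU with a TTL, plus a
// singleflight map so concurrent misses on the same key share one upstream call.
interface SnapshotEntry<V> {
  value: V;
  expiresAt: number; // epoch ms after which the entry is treated as a miss
}

class SnapshotCache<V> {
  // Map iterates in insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, SnapshotEntry<V>>();
  // One pending promise per key: the promise-based stand-in for "one mutex per key".
  private readonly inflight = new Map<string, Promise<V>>();
  private readonly capacity: number;
  private readonly ttlMs: number;

  constructor(capacity: number, ttlMs: number) {
    this.capacity = capacity;
    this.ttlMs = ttlMs;
  }

  get(key: string, load: (key: string) => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > Date.now()) {
      // Re-insert to mark the key as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return Promise.resolve(hit.value);
    }

    // Cold or expired key: join an in-flight fetch if one exists.
    const pending = this.inflight.get(key);
    if (pending !== undefined) return pending;

    // First caller fetches; failures are not cached, so the next caller retries.
    const promise = load(key)
      .then((value) => {
        this.store(key, value);
        return value;
      })
      .finally(() => {
        this.inflight.delete(key);
      });
    this.inflight.set(key, promise);
    return promise;
  }

  private store(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

Ten thousand concurrent `get` calls for the same cold symbol produce a single `load` call; every caller receives the same result or the same rejection. Expired entries are refetched through the same single-flight path.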

Takeaways

Backoff lives on the client. Servers cannot apologize their way out of a self-amplifying reconnect loop.
Autoscale is not a fix. If new pods inherit the bug, more pods means more bug.
Replica lag is usually a writer problem. Check pg_stat_activity before touching reader sizing.
Pool sizes should be derived from upstream limits, not from per-pod wishful thinking.
Singleflight in front of hot keys is the cheapest stampede fix you’ll ever ship.
Real-time products fail in public. Treat backoff, rate limits, and kill switches as production features, not nice-to-haves.

Thanks for reading. If you’ve got thoughts, send them my way.