Rate Limiting and Throttling in NestJS

Why I moved off @nestjs/throttler to a Redis-backed limiter, and how to pick between token bucket and sliding window without burning a weekend.

It was a Tuesday morning at the real-time trading platform I architected. Market open at 09:30 London time. By 09:31 the gateway tier was on fire, clients were reconnecting in a tight loop, and our nice in-memory throttler was happily letting each pod think it was the only one in the world. The limiter had no idea the same client had just been kicked off seven other pods. It was, in effect, no limiter at all.

That’s the moment I stopped trusting @nestjs/throttler defaults. Not because the library is bad. It’s fine for a single instance. But the day you autoscale, every pod is its own little kingdom with its own counter, and the math stops working.

This is the rate-limiting setup I keep reaching for in NestJS now. Redis-backed, per-tier, with proper headers so clients can actually behave.

Why the in-memory limiter breaks

@nestjs/throttler ships with an in-memory store. One Node process, one map of ip -> hits. If you scale to six pods and a client lands on a different pod each request, your “100 req/min” limit is really “600 req/min” in the worst case. Sticky sessions kind of help, but the moment a pod restarts or rolls during a deploy, the counter resets and your limit is fiction again.
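To make the arithmetic concrete, here is a small simulation of that worst case. It is a sketch rather than the throttler’s real store: InMemoryCounter and allowedAcrossPods are hypothetical names, the window never expires, and the load balancer is assumed to be plain round-robin.

```typescript
// Each "pod" keeps its own counter, the way a per-process in-memory store would.
class InMemoryCounter {
  private hits = new Map<string, number>();

  // Returns true while this pod has seen fewer than `limit` hits from the client.
  allow(clientId: string, limit: number): boolean {
    const seen = this.hits.get(clientId) ?? 0;
    if (seen >= limit) return false;
    this.hits.set(clientId, seen + 1);
    return true;
  }
}

// Sends `requests` from one client, round-robin across `podCount` pods,
// all inside a single window. Returns how many requests got through.
function allowedAcrossPods(podCount: number, limit: number, requests: number): number {
  const pods = Array.from({ length: podCount }, () => new InMemoryCounter());
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    if (pods[i % podCount].allow('client-a', limit)) allowed++;
  }
  return allowed;
}

console.log(allowedAcrossPods(1, 100, 1000)); // 100
console.log(allowedAcrossPods(6, 100, 1000)); // 600
```

Same limit, same client, six times the traffic. Nothing is broken in any single pod, which is exactly why this is easy to miss in testing.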

The fix is moving the counter out of the pod. Redis is the obvious place. NestJS makes that swap clean: you just provide a different storage backend.

import { Module } from '@nestjs/common';
import { ThrottlerModule } from '@nestjs/throttler';
import { ThrottlerStorageRedisService } from 'nestjs-throttler-storage-redis';
import Redis from 'ioredis';

@Module({
  imports: [
    ThrottlerModule.forRootAsync({
      useFactory: () => ({
        throttlers: [{ ttl: 60_000, limit: 120 }],
        storage: new ThrottlerStorageRedisService(
          new Redis({
            host: process.env.REDIS_HOST,
            port: Number(process.env.REDIS_PORT),
            enableOfflineQueue: false,
            maxRetriesPerRequest: 1,
          }),
        ),
      }),
    }),
  ],
})
export class AppModule {}

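enableOfflineQueue: false matters more than people think. If Redis is down, you don’t want your API to queue every limiter call in memory and then explode 30 seconds later. With the offline queue disabled, commands are rejected immediately while the connection is down, so the failure is visible right away. Fail fast, fail open or fail closed depending on your policy, but don’t pretend.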
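The open-or-closed decision can be made explicit in one small wrapper. This is a sketch under assumptions: checkWithPolicy and FailurePolicy are hypothetical names, and the limiter check is assumed to reject its promise when Redis is unreachable.

```typescript
type FailurePolicy = 'open' | 'closed';

// Wraps any async limiter check. If the check throws (for example because the
// Redis client rejected the command right away), the policy decides the outcome.
async function checkWithPolicy(
  check: () => Promise<boolean>,
  policy: FailurePolicy,
): Promise<boolean> {
  try {
    return await check();
  } catch {
    // 'open' lets traffic through unmetered, 'closed' rejects it.
    return policy === 'open';
  }
}
```

A reasonable split is to fail open on read-heavy public routes and fail closed on login and password reset, but that is a policy call for each route family, not something the limiter can decide for you.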

Token bucket vs sliding window

These two cover almost everything I’ve shipped. Pick one per route family, not one globally.

Token bucket is for bursts. You refill at a steady rate, but a client can spend a chunk of tokens all at once if they’ve been quiet. Good fit for write APIs where users sometimes upload a batch, then go silent for an hour. The implementation is one Lua script in Redis, atomic, ~200 microseconds when the connection is warm.
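Before the Redis version, it can help to see the refill math on its own. The sketch below is an in-memory illustration of one token-bucket step; takeToken and BucketState are hypothetical names, and timestamps are passed in as milliseconds so the behavior is easy to test.

```typescript
interface BucketState {
  tokens: number; // tokens currently in the bucket
  ts: number;     // time of the last update, in milliseconds
}

// One token-bucket step: refill for the elapsed time, then try to spend `cost`.
function takeToken(
  state: BucketState,
  capacity: number,
  refillPerSec: number,
  now: number,
  cost = 1,
): { allowed: boolean; state: BucketState } {
  const elapsedSec = Math.max(0, now - state.ts) / 1000;
  const refilled = Math.min(capacity, state.tokens + elapsedSec * refillPerSec);
  const allowed = refilled >= cost;
  return {
    allowed,
    state: { tokens: allowed ? refilled - cost : refilled, ts: now },
  };
}

// A full bucket of 5 absorbs a burst of 5 at t=0, then rejects the 6th request.
let state: BucketState = { tokens: 5, ts: 0 };
const burst: boolean[] = [];
for (let i = 0; i < 6; i++) {
  const step = takeToken(state, 5, 1, 0);
  burst.push(step.allowed);
  state = step.state;
}
console.log(burst); // five true values, then false

// Two seconds later, at 1 token per second, two tokens are back.
const later = takeToken(state, 5, 1, 2_000);
console.log(later.allowed, later.state.tokens); // true 1
```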

Sliding window is for fairness. No bursts, no surprise. The last N seconds are what they are. Good fit for login, password reset, anything that’s a step in a security flow.
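For comparison, a sliding window can be sketched as a log of accepted timestamps. This version is in-memory and only meant to show the semantics; SlidingWindowLimiter is a hypothetical name. A shared version would keep the same log in Redis, and a sorted set per key trimmed with ZREMRANGEBYSCORE and counted with ZCARD is one common way to do that.

```typescript
// Sliding window log: remember when each accepted request happened and
// count only the ones inside the last `windowMs` milliseconds.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private log = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop entries that have slid out of the window.
    const recent = (this.log.get(key) ?? []).filter((ts) => ts > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.log.set(key, recent);
    return allowed;
  }
}

// Three attempts per minute: the fourth inside the window is rejected.
const login = new SlidingWindowLimiter(3, 60_000);
console.log([0, 1_000, 2_000, 3_000].map((t) => login.allow('user-1', t)));
// [ true, true, true, false ]

// Just after the first attempt leaves the window, one slot opens up.
console.log(login.allow('user-1', 60_001)); // true
```

Unlike the bucket, being quiet for an hour buys the client nothing extra. That is the property you want on a login route.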

A real Lua-backed token bucket guard inside NestJS looks like this. I lifted the shape from a limiter I built for the creator economy platform I worked at, slightly simplified.

import {
  CanActivate,
  ExecutionContext,
  HttpException,
  HttpStatus,
  Injectable,
} from '@nestjs/common';
import { Reflector } from '@nestjs/core';
import Redis from 'ioredis';

// Atomic token bucket: refill for the elapsed time, then try to spend `cost`.
const SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refillPerSec = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

-- A missing key means a new client: start with a full bucket.
local data = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or capacity
local ts = tonumber(data[2]) or now

-- Timestamps are in milliseconds; clamp at zero to tolerate clock skew between pods.
local delta = math.max(0, now - ts) / 1000
tokens = math.min(capacity, tokens + delta * refillPerSec)

local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

-- HSET accepts multiple field/value pairs; HMSET is deprecated.
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, 60000)
return { allowed, tokens }
`;

@Injectable()
export class TokenBucketGuard implements CanActivate {
  constructor(
    private readonly redis: Redis,
    private readonly reflector: Reflector,
  ) {}

  async canActivate(ctx: ExecutionContext): Promise<boolean> {
    const req = ctx.switchToHttp().getRequest();
    const res = ctx.switchToHttp().getResponse();
    const tier = req.user?.plan ?? 'anonymous';
    const { capacity, refillPerSec } = limitsForTier(tier);
    const key = `tb:${tier}:${req.user?.id ?? req.ip}`;

    // Redis converts Lua numbers to integers in replies, so `remaining` arrives truncated.
    const [allowed, remaining] = (await this.redis.eval(
      SCRIPT, 1, key, capacity, refillPerSec, Date.now(), 1,
    )) as [number, number];

    res.setHeader('X-RateLimit-Limit', capacity);
    res.setHeader('X-RateLimit-Remaining', Math.floor(remaining));
    // Seconds until the bucket is full again.
    res.setHeader('X-RateLimit-Reset', Math.ceil((capacity - remaining) / refillPerSec));

    if (!allowed) {
      // Seconds until at least one token has been refilled.
      res.setHeader('Retry-After', Math.ceil(1 / refillPerSec));
      // Returning false from a guard produces a 403 Forbidden, so throw the 429 explicitly.
      throw new HttpException('Too Many Requests', HttpStatus.TOO_MANY_REQUESTS);
    }
    return true;
  }
}

A couple of details that aren’t obvious from the code. The Lua script is one round trip, atomic, no race between read and write. The limitsForTier function is a plain object lookup, not a database call, you do not want a Postgres hit on every request. And the headers are not optional, see the war story below.
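The header arithmetic is easier to unit test when it is pulled out of the guard into a pure function. A minimal sketch follows; rateLimitHeaders is a hypothetical helper, not part of NestJS or the throttler package, and it treats X-RateLimit-Reset as seconds until the bucket is full again.

```typescript
// Computes the rate-limit headers from the bucket state after a request.
function rateLimitHeaders(
  capacity: number,
  refillPerSec: number,
  remaining: number,
  allowed: boolean,
): Record<string, number> {
  const headers: Record<string, number> = {
    'X-RateLimit-Limit': capacity,
    'X-RateLimit-Remaining': Math.floor(remaining),
    // Seconds until the bucket has refilled completely.
    'X-RateLimit-Reset': Math.ceil((capacity - remaining) / refillPerSec),
  };
  // Only rejected requests get Retry-After: seconds until one token is back.
  if (!allowed) headers['Retry-After'] = Math.ceil(1 / refillPerSec);
  return headers;
}

// Free tier (120 tokens, 2 per second) with 90 tokens left: full again in 15 seconds.
console.log(rateLimitHeaders(120, 2, 90, true));
// { 'X-RateLimit-Limit': 120, 'X-RateLimit-Remaining': 90, 'X-RateLimit-Reset': 15 }

// Anonymous tier (0.5 tokens per second), empty bucket: come back in 2 seconds.
console.log(rateLimitHeaders(30, 0.5, 0, false)['Retry-After']); // 2
```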

Tiered limits without the if-else mess

Tiers live in config, not in code. Free plan, paid plan, internal service-to-service, admin tools. They all share the same guard, just different numbers.

type Tier = 'anonymous' | 'free' | 'pro' | 'internal';

export function limitsForTier(tier: Tier) {
  switch (tier) {
    case 'internal': return { capacity: 5000, refillPerSec: 100 };
    case 'pro':      return { capacity: 600,  refillPerSec: 10  };
    case 'free':     return { capacity: 120,  refillPerSec: 2   };
    default:         return { capacity: 30,   refillPerSec: 0.5 };
  }
}

The thing I learned the hard way is to keep internal traffic on the same path. Don’t bypass the limiter for service-to-service calls, just give them a fat tier. The day a runaway cron job starts looping is the day you’ll wish your own services were measurable.
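If the numbers really are to live in config, one option is to keep defaults in code and let a JSON blob override them at boot. The sketch below assumes a hypothetical TIER_LIMITS_JSON environment variable and a hypothetical loadTierTable helper; the defaults are the numbers from the table above.

```typescript
interface TierLimit {
  capacity: number;
  refillPerSec: number;
}
type TierTable = Record<string, TierLimit>;

const DEFAULT_TIERS: TierTable = {
  internal: { capacity: 5000, refillPerSec: 100 },
  pro: { capacity: 600, refillPerSec: 10 },
  free: { capacity: 120, refillPerSec: 2 },
  anonymous: { capacity: 30, refillPerSec: 0.5 },
};

// Merges a JSON override over the defaults and rejects anything that is not
// a pair of positive numbers, so a bad deploy fails at boot instead of at 09:30.
function loadTierTable(json: string | undefined, defaults: TierTable = DEFAULT_TIERS): TierTable {
  const table: TierTable = { ...defaults };
  if (!json) return table;
  const parsed = JSON.parse(json) as Record<string, Partial<TierLimit>>;
  for (const [tier, limit] of Object.entries(parsed)) {
    const { capacity, refillPerSec } = limit;
    if (
      typeof capacity !== 'number' ||
      typeof refillPerSec !== 'number' ||
      !(capacity > 0) ||
      !(refillPerSec > 0)
    ) {
      throw new Error(`Invalid limits for tier "${tier}"`);
    }
    table[tier] = { capacity, refillPerSec };
  }
  return table;
}

// Bump only the pro tier; every other tier keeps its default.
const tiers = loadTierTable('{"pro":{"capacity":1200,"refillPerSec":20}}');
console.log(tiers.pro.capacity, tiers.free.capacity); // 1200 120
```

At startup this would be called once, for example as loadTierTable(process.env.TIER_LIMITS_JSON), and the resulting table handed to the guard. The lookup on the request path stays a plain object access.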

A war story about missing headers

A few years back at a live-video creator startup I led, we shipped a public API for creators to query their own analytics. The limiter was in place and returned 429 on overage, but with no X-RateLimit-* headers and no Retry-After. A few weeks later, support tickets piled up, all from one creator who’d hit 429 and then retried in a tight while-loop because their SDK saw an error and had no signal to slow down.

First fix was the obvious one. We sent them a Slack message asking them to back off. That worked for one creator, did nothing for the next.

Real fix was to add the headers, publish a small client-side helper that respected Retry-After, and write a doc page that pinned both. The behavior on the API side did not change at all. The clients changed. Once they could see the budget, they used it.

The lesson stuck. Headers are not decoration, they are the limiter’s API. If the client can’t read them, you don’t have rate limiting, you have rate punishment.
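The client-side helper does not need to be big. Here is a sketch of the idea, not the helper we actually shipped: requestWithBackoff, retryAfterMs and ResponseLike are hypothetical names, and only the delay-in-seconds form of Retry-After is handled.

```typescript
// Minimal response shape, so the helper works with fetch or with a test double.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Reads Retry-After as a number of seconds. HTTP-date values are not handled
// here and fall back to `fallbackMs`.
function retryAfterMs(res: ResponseLike, fallbackMs: number): number {
  const raw = res.headers.get('Retry-After');
  const seconds = raw === null ? NaN : Number(raw);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
}

// Retries on 429 only, waiting as long as the server asked, up to `maxAttempts` sends.
async function requestWithBackoff<T extends ResponseLike>(
  send: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let res = await send();
  for (let attempt = 1; res.status === 429 && attempt < maxAttempts; attempt++) {
    await sleep(retryAfterMs(res, 1000));
    res = await send();
  }
  return res;
}
```

With fetch this would be called as requestWithBackoff(() => fetch(url)). The point is the shape: the wait time comes from the server, and the loop has an upper bound.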

Takeaways

Move the counter to Redis the day you scale past one pod.
Pick token bucket for write bursts, sliding window for security flows.
Tiers live in config, not in code, and internal traffic uses the same path with a fat budget.
Send X-RateLimit-* and Retry-After on every limited response, every time.
Edge-layer connection limits matter more than app-layer ones during a reconnect storm.

Thanks for reading. If you’ve got thoughts, send them my way.