Rate Limiting in Rails

From Rack::Attack defaults to Redis-backed sliding windows with Lua atomics. How we shut down a scraping pattern abusing our branded-app admin API without bricking real creators.

OK so, Tuesday afternoon at the creator-economy platform I worked at. Our branded-mobile-app admin API was getting hit hard from a single ASN. Not enough to page anyone, but enough that the p99 on /api/v1/branded_apps/:id/builds was sitting at 480ms when it normally sat at 90ms. The pattern was obvious once we pulled the logs. Same caller, rotating through hundreds of branded app IDs, fetching build metadata every two seconds, twenty-four hours a day. They were scraping creator app catalogues.

Rack::Attack was on. It had been on for years. It just wasn’t catching this, because the caller was rotating user-agent and IP enough to stay under the default per-IP threshold while still pulling more data than any real admin would ever look at.

So we tightened the rate limiter. Properly this time. Sliding-window counts in Redis with a Lua atomic, tiered per authenticated user, with the proper headers so the legit callers knew where they stood. The scraping pattern died in about an hour. No real creator got blocked. This is the writeup.

Rack::Attack is the floor

I still ship Rack::Attack in every Rails app I touch. It’s free, it’s fast, it sits in the Rack middleware stack and rejects garbage before any of your controllers wake up. But the defaults are a floor, not a ceiling.

# Gemfile
gem "rack-attack", "~> 6.7"

# config/initializers/rack_attack.rb
class Rack::Attack
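  # Shared Redis store so every app server sees the same counters.
  # On Rails 7.1+ the pool options are spelled pool: { size: 10, timeout: 5 };
  # the pool_size / pool_timeout form is deprecated there.
  cache.store = ActiveSupport::Cache::RedisCacheStore.new(
    url: ENV.fetch("REDIS_URL"),
    pool_size: 10,
    pool_timeout: 5
  )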

  throttle("ip/req/min", limit: 300, period: 1.minute) do |req|
    req.ip unless req.path.start_with?("/health", "/assets")
  end

  throttle("login/email/5min", limit: 5, period: 5.minutes) do |req|
    if req.path == "/users/sign_in" && req.post?
      req.params["user"]&.dig("email")&.downcase
    end
  end

  blocklist("fail2ban/abusive-paths") do |req|
    Rack::Attack::Fail2Ban.filter("abusive-#{req.ip}", maxretry: 6, findtime: 10.minutes, bantime: 1.hour) do
      req.path.match?(/\.(php|asp|env|git)/i)
    end
  end

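  self.throttled_responder = lambda do |request|
    # Rack::Attack throttles are fixed windows, so the real wait is whatever
    # is left of the current window, not the full period.
    match = request.env["rack.attack.match_data"] || {}
    period = match[:period].to_i
    retry_after = period.zero? ? 0 : period - (match[:epoch_time].to_i % period)
    [429, {"Content-Type" => "application/json", "Retry-After" => retry_after.to_s},
     [{error: "rate_limited", retry_after: retry_after}.to_json]]
  end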
end

A few details that matter. The cache store is Redis, not memory, because you have more than one app server and you want them sharing counts. The login throttle is keyed on email, not IP, which is the only thing that actually slows a credential-stuffing run. And the throttled_responder returns JSON with a real Retry-After so clients don’t have to guess.

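This catches dumb attacks. It will not catch a careful scraper, and it’s a poor fit for “5 requests per second for free plan, 50 per second for pro plan, 500 per second for enterprise”. Its throttles are fixed-window counters that run in middleware, before your controllers have worked out who the user is or what plan they’re on. For that you need your own layer.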

Token bucket vs sliding window

Two algorithms are worth knowing. Token bucket is what you usually want for burst tolerance. Sliding window is what you want for fairness.

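Token bucket gives every user a bucket of N tokens, refilling at a steady rate. They can burn the whole bucket in one second if they want, then they wait for the refill. Cheap. Easy to reason about. The problem is the burst. A user who has been idle can spend a full bucket at once and then keep spending at the refill rate, so over any single window they can get close to 2N requests through. Refill the bucket in one lump per minute, which is really a fixed-window counter, and it gets worse: all of this minute’s tokens at 11:59:59, then all of next minute’s at 12:00:01. Two seconds, two buckets’ worth. If you’re rate-limiting an expensive endpoint, that hurts.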
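To make the burst behaviour concrete, here’s a minimal in-memory sketch of a token bucket. It’s single-process and for illustration only, not what we ran in production. The class name TokenBucket, its parameters, and the injectable clock are my own assumptions for the example.

```ruby
# Minimal in-memory token bucket (illustrative; single process, no Redis).
class TokenBucket
  attr_reader :tokens

  # capacity: maximum burst size.
  # refill_per_second: steady refill rate.
  # clock: injectable time source so the behaviour can be exercised without sleeping.
  def initialize(capacity:, refill_per_second:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity.to_f
    @refill_per_second = refill_per_second.to_f
    @clock = clock
    @tokens = @capacity
    @last_refill = @clock.call
  end

  # Spends one token and returns true if one is available, false otherwise.
  def allow?
    refill
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end

  private

  # Adds tokens for the time elapsed since the last call, capped at capacity.
  def refill
    now = @clock.call
    elapsed = now - @last_refill
    @tokens = [@capacity, @tokens + elapsed * @refill_per_second].min
    @last_refill = now
  end
end
```

A full bucket lets the whole burst through immediately. After that the caller is held to the refill rate, which is exactly the trade you’re making when you pick this algorithm.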

Sliding window log keeps the timestamps of the last N requests. When a new one comes in, you drop any timestamp older than the window and count what’s left. Strict. Fair. Slightly more expensive in memory because you’re storing timestamps. For an API surface I actually care about getting right, this is what I reach for.
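Here’s the same idea as a tiny in-memory sketch, before Redis gets involved. SlidingWindowLog is a name I made up for the example, and the caller passes the current time in so the behaviour is deterministic.

```ruby
# In-memory sliding window log for a single caller (illustrative only).
class SlidingWindowLog
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window = window_seconds
    @timestamps = [] # kept in ascending order
  end

  # now: current time in seconds, passed in so the example is deterministic.
  def allow?(now = Time.now.to_f)
    cutoff = now - @window
    # Drop every timestamp that has aged out of the window.
    @timestamps.shift while @timestamps.any? && @timestamps.first <= cutoff
    return false if @timestamps.size >= @limit

    @timestamps << now
    true
  end
end
```

With a limit of 2 per 60 seconds, requests at t=59 and t=60.5 both count against the same window, so a third at t=61 is rejected. There’s no minute boundary to straddle.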

There’s a middle ground people reach for called sliding window counter, which interpolates between two fixed windows. It’s fine. It’s not better than the log version once Redis is doing the work for you.
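For comparison, a minimal sketch of the interpolation, assuming the usual weighting: the previous fixed window’s count is scaled by how much of it still overlaps the sliding window, then added to the current window’s count. SlidingWindowCounter is a hypothetical name for the example.

```ruby
# In-memory sliding window counter (illustrative only).
class SlidingWindowCounter
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window = window_seconds
    @counts = Hash.new(0) # fixed-window index => request count
  end

  # now: current time in seconds, passed in so the example is deterministic.
  def allow?(now)
    index = (now / @window).floor
    elapsed_fraction = (now % @window) / @window.to_f
    # Weight the previous window by the share of it still inside the sliding window.
    estimate = @counts[index - 1] * (1.0 - elapsed_fraction) + @counts[index]
    return false if estimate >= @limit

    @counts[index] += 1
    @counts.delete_if { |i, _| i < index - 1 } # only two windows are ever needed
    true
  end
end
```

With limit 10 and a 60-second window, ten requests at t=30 followed by a burst at t=75 gives an estimate of 10 × 0.75 = 7.5, so three more requests get through. The log version would reject all of them, because the original ten are still inside the last 60 seconds. That’s the approximation you’re paying for the smaller memory footprint.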

The Lua script

The reason sliding window log is practical in production is that Redis evaluates Lua scripts atomically. The full check-and-update happens inside Redis, on one node, without a round trip in the middle. No race.

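# app/services/rate_limiter.rb
require "digest"
require "securerandom"

class RateLimiter
  LUA_SCRIPT = <<~LUA
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local limit = tonumber(ARGV[3])
    local member = ARGV[4]

    -- Drop entries that have aged out of the window, then count what is left.
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
    local count = redis.call('ZCARD', key)

    if count < limit then
      redis.call('ZADD', key, now, member)
      -- now and window are in milliseconds, so PEXPIRE rather than EXPIRE.
      redis.call('PEXPIRE', key, window)
      return {1, limit - count - 1, window}
    else
      local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
      local retry_after = window - (now - tonumber(oldest[2]))
      return {0, 0, retry_after}
    end
  LUA

  # Redis identifies a cached script by the SHA1 of its source, so the SHA can
  # be computed locally once instead of being reloaded per RateLimiter instance.
  SCRIPT_SHA = Digest::SHA1.hexdigest(LUA_SCRIPT).freeze

  def initialize(redis: Sidekiq.redis_pool)
    @redis = redis
  end

  def check(key:, limit:, window_seconds:)
    now_ms = (Time.now.to_f * 1000).to_i
    window_ms = window_seconds * 1000
    # Unique member per request, generated here so two requests landing in the
    # same millisecond can't collapse into one sorted-set entry.
    member = "#{now_ms}:#{SecureRandom.hex(8)}"

    result = @redis.with do |conn|
      begin
        conn.evalsha(SCRIPT_SHA, keys: [key], argv: [now_ms, window_ms, limit, member])
      rescue Redis::CommandError => e
        raise unless e.message.include?("NOSCRIPT")
        # The script cache was emptied: load the script again, then retry.
        conn.script(:load, LUA_SCRIPT)
        retry
      end
    end

    allowed, remaining, retry_after_ms = result
    Result.new(allowed: allowed == 1, remaining: remaining, retry_after_ms: retry_after_ms)
  end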

  Result = Struct.new(:allowed, :remaining, :retry_after_ms, keyword_init: true) do
    def retry_after_seconds = (retry_after_ms / 1000.0).ceil
  end
end

The NOSCRIPT rescue is the part people forget. The Redis script cache isn’t durable. If your app server boots, holds on to a SHA, and Redis later loses the script because of a restart, a failover, or a SCRIPT FLUSH, your next evalsha blows up. Reload and retry. Otherwise you’ll page yourself at 3am because half the pods can rate-limit and half can’t.

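Note I’m using Sidekiq.redis_pool rather than $redis or a fresh Redis.new. Connection pools matter, and Sidekiq already runs one in every Rails process. Don’t open a second one. One caveat: the calls above are redis-rb style, which is what the pool hands you on Sidekiq 6 and earlier. Sidekiq 7 moved to redis-client, where the evalsha arguments and the error class are different, so check what your pool actually yields.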

Tiered limits, surfaced as headers

The point of all this is to give different callers different budgets. Free plan, pro plan, enterprise, internal-service-to-service. And to tell them what the budget is in the response, because clients that can see their budget behave better than clients that can’t.

# app/controllers/concerns/rate_limited.rb
module RateLimited
  extend ActiveSupport::Concern

  TIERS = {
    "free"       => {limit: 60,   window: 60},
    "pro"        => {limit: 600,  window: 60},
    "enterprise" => {limit: 6000, window: 60},
    "internal"   => {limit: 60_000, window: 60}
  }.freeze

  included do
    before_action :enforce_rate_limit
  end

  private

  def enforce_rate_limit
    return if current_user.nil?

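    tier = current_user.api_tier || "free"
    # Unknown tier names fall back to the free budget instead of raising KeyError.
    config = TIERS.fetch(tier) { TIERS.fetch("free") }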
    key = "rl:#{tier}:user:#{current_user.id}:#{controller_name}"

    result = RateLimiter.new.check(
      key: key,
      limit: config[:limit],
      window_seconds: config[:window]
    )

    response.set_header("X-RateLimit-Limit", config[:limit].to_s)
    response.set_header("X-RateLimit-Remaining", result.remaining.to_s)
    response.set_header("X-RateLimit-Reset", (Time.now + result.retry_after_seconds).to_i.to_s)

    return if result.allowed

    response.set_header("Retry-After", result.retry_after_seconds.to_s)
    render json: {error: "rate_limited", retry_after: result.retry_after_seconds},
           status: :too_many_requests
  end
end

The controller_name in the key is intentional. It gives each controller its own counter, so if /builds is hot but /profile is cold, they live in separate keyspaces and one can’t starve the other. The flip side is that the tier budget applies per controller, not per user overall, so size the numbers with that in mind. And the headers are non-negotiable. A well-behaved client backs off on Retry-After. A misbehaving one doesn’t, and now you know which one to ban.
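For what “well-behaved” looks like from the other side, here’s a rough sketch of a client-side helper that reads those headers. The method name seconds_to_wait and the one-second fallback are assumptions of mine, not part of any client we shipped.

```ruby
# Hypothetical client-side helper: how long to wait before retrying a request.
# status:  HTTP status code of the response.
# headers: response headers as a Hash (any key casing).
# now:     injectable for testing.
def seconds_to_wait(status:, headers:, now: Time.now)
  return 0 unless status == 429

  normalized = headers.transform_keys { |k| k.to_s.downcase }

  # Prefer Retry-After when it is a plain number of seconds.
  retry_after = normalized["retry-after"].to_s
  return retry_after.to_i if retry_after.match?(/\A\d+\z/)

  # Otherwise fall back to the epoch timestamp in X-RateLimit-Reset.
  reset = normalized["x-ratelimit-reset"]
  return [reset.to_i - now.to_i, 0].max if reset

  1 # assumed default when the server gave no hint
end
```

A client that sleeps for this long before retrying never shows up in your ban list. One that ignores it is easy to spot in the logs.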

When this isn’t enough

Rate limits are application-level. They are not your only defense.

A rate limiter at the application layer is great for fairness. It is not a DDoS mitigation. It is not a circuit breaker. It will not save you from a thundering herd of your own clients reconnecting. Different problems, different tools.
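The thundering herd in particular is a client-side fix. A minimal sketch, assuming exponential backoff with full jitter; backoff_delay and its base and cap values are hypothetical and would need tuning for a real client.

```ruby
# Exponential backoff with full jitter (illustrative).
# attempt: zero-based retry count.
# base:    first-retry ceiling in seconds.
# cap:     upper bound on any single delay in seconds.
# rng:     injectable random source so the behaviour can be tested.
def backoff_delay(attempt, base: 0.5, cap: 30.0, rng: Random.new)
  ceiling = [cap, base * (2**attempt)].min
  # Pick a random point below the ceiling so reconnecting clients spread out
  # instead of all retrying at the same instant.
  rng.rand * ceiling
end
```

Without the random factor, every client that dropped at the same moment retries at the same moment, and the rate limiter just sees the same spike again a little later.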

Takeaways

Rack::Attack is the floor. Tier limits per authenticated user above it.
Sliding window log in Redis via a Lua atomic. No races, no boundary cheating.
Always handle NOSCRIPT. Always use a connection pool.
Surface X-RateLimit-* and Retry-After. Well-behaved clients will back off.
Rate limiting is one defense. Pair it with edge limits and client-side backoff.

Thanks for reading. If you’ve got thoughts, send them my way.