From Rack::Attack defaults to Redis-backed sliding windows with Lua atomics. How we shut down a scraping pattern abusing our branded-app admin API without bricking real creators.
OK so, Tuesday afternoon at the creator-economy platform I worked at. Our branded-mobile-app admin API was getting hit hard from a single ASN. Not enough to page anyone, but enough that the p99 on /api/v1/branded_apps/:id/builds was sitting at 480ms when it normally sat at 90ms. The pattern was obvious once we pulled the logs. Same caller, rotating through hundreds of branded app IDs, fetching build metadata every two seconds, twenty-four hours a day. They were scraping creator app catalogues.
Rack::Attack was on. It had been on for years. It just wasn’t catching this, because the caller was rotating user-agent and IP enough to stay under the default per-IP threshold while still pulling more data than any real admin would ever look at.
So we tightened the rate limiter. Properly this time. Sliding-window counts in Redis with a Lua atomic, tiered per authenticated user, with the proper headers so the legit callers knew where they stood. The scraping pattern died in about an hour. No real creator got blocked. This is the writeup.
I still ship Rack::Attack in every Rails app I touch. It’s free, it’s fast, it sits in the Rack middleware stack and rejects garbage before any of your controllers wake up. But the defaults are a floor, not a ceiling.
# Gemfile
gem "rack-attack", "~> 6.7"
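# config/initializers/rack_attack.rb
class Rack::Attack
  # Shared Redis store so every app server sees the same counters.
  cache.store = ActiveSupport::Cache::RedisCacheStore.new(
    url: ENV.fetch("REDIS_URL"),
    pool_size: 10,
    pool_timeout: 5
  )

  # Blanket per-IP ceiling; health checks and static assets are exempt.
  throttle("ip/req/min", limit: 300, period: 1.minute) do |req|
    req.ip unless req.path.start_with?("/health", "/assets")
  end

  # Login attempts keyed on the submitted email rather than the IP.
  throttle("login/email/5min", limit: 5, period: 5.minutes) do |req|
    if req.path == "/users/sign_in" && req.post?
      req.params["user"]&.dig("email")&.downcase
    end
  end

  # Ban IPs that keep probing for files a Rails app never serves.
  blocklist("fail2ban/abusive-paths") do |req|
    Rack::Attack::Fail2Ban.filter("abusive-#{req.ip}", maxretry: 6, findtime: 10.minutes, bantime: 1.hour) do
      req.path.match?(/\.(php|asp|env|git)/i)
    end
  end

  self.throttled_responder = lambda do |request|
    match_data = request.env["rack.attack.match_data"] || {}
    period = match_data[:period].to_i
    # Throttle windows are fixed and epoch-aligned, so report the time left
    # in the current window rather than the full period.
    retry_after = period.positive? ? period - (Time.now.to_i % period) : 0
    [429, {"Content-Type" => "application/json", "Retry-After" => retry_after.to_s},
     [{error: "rate_limited", retry_after: retry_after}.to_json]]
  end
end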
A few details that matter. The cache store is Redis, not memory, because you have more than one app server and you want them sharing counts. The login throttle is keyed on email, not IP, which is the only thing that actually slows a credential-stuffing run. And the throttled_responder returns JSON with a real Retry-After so clients don’t have to guess.
This catches dumb attacks. It will not catch a careful scraper, and it will not let you do “5 requests per second for free plan, 50 per second for pro plan, 500 per second for enterprise”. For that you need your own layer.
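For context on why the built-in throttle is coarse: as far as I understand it, Rack::Attack counts hits per fixed window aligned to the epoch. Here is a rough plain-Ruby sketch of that bookkeeping. FixedWindowCounter and its method names are mine, not Rack::Attack’s, and this is the idea rather than the gem’s code.

```ruby
# Fixed-window bookkeeping, sketched in plain Ruby. FixedWindowCounter is a
# hypothetical name; this illustrates the idea, it is not Rack::Attack's code.
class FixedWindowCounter
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0) # [window_index, key] => hits
  end

  # Returns [allowed, retry_after_seconds] for a hit by `key` at epoch second `now`.
  def hit(key, now)
    window_index = now / @period            # integer division picks the window
    count = (@counts[[window_index, key]] += 1)
    retry_after = @period - (now % @period) # seconds until the next window opens
    [count <= @limit, retry_after]
  end
end

counter = FixedWindowCounter.new(limit: 2, period: 60)
counter.hit("1.2.3.4", 118) # => [true, 2]
counter.hit("1.2.3.4", 119) # => [true, 1]
counter.hit("1.2.3.4", 119) # => [false, 1]
counter.hit("1.2.3.4", 120) # => [true, 60]
```

The last call is the weak spot: a caller blocked at second 119 gets a completely fresh allowance one second later.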
Two algorithms are worth knowing. Token bucket is what you usually want for burst tolerance. Sliding window is what you want for fairness.
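Token bucket gives every user a bucket of N tokens, refilling at a steady rate. They can burn the whole bucket in one second if they want, then they wait. Cheap. Easy to reason about. The problem is the burst. A user who has been idle can spend a full bucket at 11:59:59 and then keep spending tokens as they refill, so the minute that follows sees close to two buckets’ worth of requests. (The harsher version, a whole fresh allowance landing at 12:00:00 on the dot, is what a fixed-window counter does, and that’s how Rack::Attack’s throttle counts.) If you’re rate-limiting an expensive endpoint, that hurts.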
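To make the token-bucket behaviour concrete, here is a minimal in-memory sketch. TokenBucket and its parameter names are hypothetical, and time is passed in explicitly so the example is deterministic. It is an illustration, not something to put in front of production traffic.

```ruby
# Minimal in-memory token bucket. TokenBucket is a hypothetical class for
# illustration; it is not part of Rack::Attack or any gem.
class TokenBucket
  def initialize(capacity:, refill_per_second:, now: 0.0)
    @capacity = capacity.to_f
    @refill_per_second = refill_per_second.to_f
    @tokens = @capacity # start full
    @last_refill = now.to_f
  end

  # Spends one token and returns true if one is available at time `now` (seconds).
  def allow?(now)
    refill(now)
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end

  private

  # Add tokens for the time elapsed since the last call, capped at capacity.
  def refill(now)
    elapsed = [now - @last_refill, 0.0].max
    @tokens = [@capacity, @tokens + elapsed * @refill_per_second].min
    @last_refill = now
  end
end

bucket = TokenBucket.new(capacity: 60, refill_per_second: 1)
burst = (1..60).count { bucket.allow?(0.0) }          # whole bucket spent at t=0
trickle = (1..60).count { |s| bucket.allow?(s.to_f) } # one refilled token per second
```

Here both burst and trickle come out at 60: a full bucket up front, then one request per second as tokens refill, which is roughly twice the nominal per-minute budget across that first minute.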
Sliding window log keeps a timestamp for every request still inside the window. When a new one comes in, you drop any timestamp older than the window and count what’s left. Strict. Fair. Slightly more expensive in memory because you’re storing one timestamp per request, up to the limit, for every key. For an API surface I actually care about getting right, this is what I reach for.
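The same idea in a few lines of plain Ruby, before Redis gets involved. SlidingWindowLog is a hypothetical name, and the sketch assumes timestamps arrive in non-decreasing order.

```ruby
# In-memory sliding window log. SlidingWindowLog is a hypothetical class used
# for illustration; it assumes `now` never goes backwards.
class SlidingWindowLog
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window = window_seconds
    @timestamps = [] # accepted request times, oldest first
  end

  # Records the request and returns true if fewer than `limit` accepted
  # requests fall inside the window ending at `now` (seconds).
  def allow?(now)
    cutoff = now - @window
    @timestamps.shift while @timestamps.any? && @timestamps.first <= cutoff
    return false if @timestamps.size >= @limit

    @timestamps << now
    true
  end
end

log = SlidingWindowLog.new(limit: 3, window_seconds: 60)
[0, 1, 2, 30, 60.5, 60.6, 61.5].map { |t| log.allow?(t) }
# => [true, true, true, false, true, false, true]
```

A slot only frees up once the oldest accepted request is a full window old, so there is no moment at which the whole budget resets at once.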
There’s a middle ground people reach for called sliding window counter, which interpolates between two fixed windows. It’s fine. It’s not better than the log version once Redis is doing the work for you.
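For reference, the interpolation is a one-liner. This is a sketch with a hypothetical helper name; it weights the previous fixed window by how much of it still overlaps the sliding window, which quietly assumes those requests were spread evenly.

```ruby
# Sliding window counter estimate. sliding_window_estimate is a hypothetical
# helper for illustration. previous_count and current_count are the totals of
# two adjacent fixed windows; elapsed is how many seconds into the current
# window we are.
def sliding_window_estimate(previous_count:, current_count:, elapsed:, window_seconds:)
  overlap = (window_seconds - elapsed).to_f / window_seconds # share of previous window still in view
  current_count + previous_count * overlap
end

sliding_window_estimate(previous_count: 40, current_count: 10, elapsed: 15, window_seconds: 60)
# => 40.0  (10 + 40 * 0.75)
```

Two integers per key instead of a timestamp per request, at the cost of an estimate rather than an exact count.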
The reason sliding window log is practical in production is that Redis evaluates Lua scripts atomically. The full check-and-update happens inside Redis, on one node, without a round trip in the middle. No race.
# app/services/rate_limiter.rb
require "digest"

class RateLimiter
  # Sliding window log in a sorted set: score = request time in milliseconds.
  # Returns {allowed (1 or 0), remaining, retry_after_ms}.
  LUA_SCRIPT = <<~LUA
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local limit = tonumber(ARGV[3])

    -- Drop entries that have aged out of the window.
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
    local count = redis.call('ZCARD', key)

    if count < limit then
      redis.call('ZADD', key, now, now .. ':' .. math.random())
      -- window is in milliseconds, so PEXPIRE rather than EXPIRE.
      redis.call('PEXPIRE', key, window)
      return {1, limit - count - 1, window}
    else
      local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
      local retry_after = window - (now - tonumber(oldest[2]))
      return {0, 0, retry_after}
    end
  LUA

  # SCRIPT LOAD returns the SHA1 of the script body, so compute it once here
  # instead of loading the script on every RateLimiter.new.
  SCRIPT_SHA = Digest::SHA1.hexdigest(LUA_SCRIPT).freeze

  def initialize(redis: Sidekiq.redis_pool)
    @redis = redis
  end

  def check(key:, limit:, window_seconds:)
    now_ms = (Time.now.to_f * 1000).to_i
    window_ms = window_seconds * 1000

    result = @redis.with do |conn|
      begin
        conn.evalsha(SCRIPT_SHA, keys: [key], argv: [now_ms, window_ms, limit])
      rescue Redis::CommandError => e
        raise unless e.message.include?("NOSCRIPT")
        # Script cache was emptied (restart, failover, SCRIPT FLUSH): reload and retry.
        conn.script(:load, LUA_SCRIPT)
        retry
      end
    end

    allowed, remaining, retry_after_ms = result
    Result.new(allowed: allowed == 1, remaining: remaining, retry_after_ms: retry_after_ms)
  end

  Result = Struct.new(:allowed, :remaining, :retry_after_ms, keyword_init: true) do
    def retry_after_seconds = (retry_after_ms / 1000.0).ceil
  end
end
The NOSCRIPT rescue is the part people forget. The script cache isn’t durable. If your app server boots, caches a SHA, and Redis later loses the script because of a restart, a failover to a node that never loaded it, or a SCRIPT FLUSH, your next evalsha blows up. Reload and retry. Otherwise you’ll page yourself at 3am because half the pods can rate-limit and half can’t.
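Note I’m using Sidekiq.redis_pool rather than $redis or a fresh Redis.new. Connection pools matter, and Sidekiq already runs one in every Rails process. Don’t open a second one. One caveat: script, evalsha with keys: and argv:, and Redis::CommandError are the redis-rb API. Sidekiq 7 moved its pool onto redis-client, so check what your pool actually yields before copying this verbatim.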
The point of all this is to give different callers different budgets. Free plan, pro plan, enterprise, internal-service-to-service. And to tell them what the budget is in the response, because clients that can see their budget behave better than clients that can’t.
# app/controllers/concerns/rate_limited.rb
module RateLimited
  extend ActiveSupport::Concern

  # Requests allowed per window (in seconds) for each plan.
  TIERS = {
    "free" => {limit: 60, window: 60},
    "pro" => {limit: 600, window: 60},
    "enterprise" => {limit: 6000, window: 60},
    "internal" => {limit: 60_000, window: 60}
  }.freeze

  included do
    before_action :enforce_rate_limit
  end

  private

  def enforce_rate_limit
    return if current_user.nil?

    # Unknown or missing tiers fall back to the free budget instead of raising KeyError.
    tier = TIERS.key?(current_user.api_tier) ? current_user.api_tier : "free"
    config = TIERS.fetch(tier)
    key = "rl:#{tier}:user:#{current_user.id}:#{controller_name}"

    result = RateLimiter.new.check(
      key: key,
      limit: config[:limit],
      window_seconds: config[:window]
    )

    response.set_header("X-RateLimit-Limit", config[:limit].to_s)
    response.set_header("X-RateLimit-Remaining", result.remaining.to_s)
    response.set_header("X-RateLimit-Reset", (Time.now + result.retry_after_seconds).to_i.to_s)
    return if result.allowed

    # Rendering inside a before_action halts the request.
    response.set_header("Retry-After", result.retry_after_seconds.to_s)
    render json: {error: "rate_limited", retry_after: result.retry_after_seconds},
           status: :too_many_requests
  end
end
The controller_name in the key is intentional. It lets you tune limits per controller without rewriting the concern. If /builds is hot but /profile is cold, you keep them in separate keyspaces. And the headers are non-negotiable. A well-behaved client backs off on Retry-After. A misbehaving one doesn’t, and now you know which one to ban.
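From the client’s side, backing off on those headers can be as small as this. It is a sketch: backoff_seconds is a hypothetical helper, it assumes Retry-After arrives as a number of seconds (the header can also be an HTTP date, which this ignores), and the one-second fallback is an arbitrary choice.

```ruby
# Client-side reading of the rate-limit headers. backoff_seconds is a
# hypothetical helper; it assumes Retry-After is given in seconds.
def backoff_seconds(status:, headers:, now: Time.now.to_i)
  return 0 unless status == 429

  retry_after = headers["Retry-After"]
  return [retry_after.to_i, 0].max if retry_after

  # No Retry-After: fall back to the reset timestamp if the server sent one.
  reset = headers["X-RateLimit-Reset"]
  return [reset.to_i - now, 0].max if reset

  1 # no hint from the server: small fixed pause (arbitrary)
end

backoff_seconds(status: 429, headers: {"Retry-After" => "12"})                  # => 12
backoff_seconds(status: 429, headers: {"X-RateLimit-Reset" => "1060"}, now: 1000) # => 60
backoff_seconds(status: 200, headers: {})                                       # => 0
```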
Rate limits are application-level. They are not your only defense.
A rate limiter at the application layer is great for fairness. It is not a DDoS mitigation. It is not a circuit breaker. It will not save you from a thundering herd of your own clients reconnecting. Different problems, different tools.
Handle NOSCRIPT. Always use a connection pool.
Send X-RateLimit-* and Retry-After. Well-behaved clients will back off.
Thanks for reading. If you’ve got thoughts, send them my way.