A production take on graphql-ruby in Rails API mode: Dataloader for N+1, complexity limits, persisted queries, ActionCable subscriptions, and why versioned endpoints are the wrong instinct.
It was a Tuesday morning at the creator economy platform I worked at when the Community feed went sideways. Around 10:14 a.m. Pacific, p99 on /communities/:id/posts climbed from about 120 ms to over 8 seconds. The dashboard read like a textbook reader-replica lag fire. The actual fire was upstream of the readers. A maintenance ANALYZE was holding write-side locks on community_posts, one of the hottest tables on the platform, starving WAL emission. Replicas had nothing to apply. Lag at 14 minutes and growing.
I wasn’t on-call that week. I was in joker mode with three squads and the Community team pinged me because I’d lived inside the Aurora layer for months. We killed the analyze. Lag drained in about six minutes. Same week we shipped db_safe_maintenance.rb and the runbook got a literal sentence at the top: “Before touching reader scaling, check pg_stat_activity on the writer.”
That morning is a big part of why I’m careful with GraphQL on Rails. The resolver layer hides exactly the kind of database behaviour that turns a quiet Tuesday into a 22-minute blackout. Used carefully, graphql-ruby in Rails API mode is the right call for client-shaped data. Used naively, it’s an N+1 generator with a friendly DSL.
Rails API mode strips middleware you don’t need for a GraphQL server. No view rendering, no flash, no cookies by default. The endpoint is a single POST. Auth happens once, in a middleware or a base resolver. That’s the whole shape.
# Gemfile
gem "graphql", "~> 2.3"
# GraphQL::Dataloader ships with the graphql gem, so no separate batching gem is needed.
# app/controllers/graphql_controller.rb
class GraphqlController < ApplicationController
  def execute
    result = AppSchema.execute(
      params[:query],
      variables: prepare_variables(params[:variables]),
      context: {
        current_user: current_user,
        request_id: request.request_id
      },
      operation_name: params[:operationName]
    )
    render json: result
  rescue GraphQL::ParseError, GraphQL::ExecutionError => e
    render json: { errors: [{ message: e.message }] }, status: :unprocessable_entity
  end

  private

  # Normalise the variables param: a JSON string, a plain Hash, or (for JSON
  # request bodies) ActionController::Parameters.
  def prepare_variables(raw)
    case raw
    when String then raw.present? ? JSON.parse(raw) : {}
    when Hash then raw
    when ActionController::Parameters then raw.to_unsafe_hash
    when nil then {}
    else raise ArgumentError, "Unexpected variables param: #{raw.class}"
    end
  end
end
One controller. One method. The rest is types and resolvers, which is where the work actually is.
graphql-ruby ships with GraphQL::Dataloader because the naive resolver pattern is a footgun. Each field that touches the database resolves once per parent, so a list of 50 posts asking for their author becomes 51 queries: one for the posts and 50 for the authors. On a hot table, that’s how you ship the Aurora Tuesday I just described, except as a slow burn instead of an outage.
# app/graphql/sources/record_source.rb
class Sources::RecordSource < GraphQL::Dataloader::Source
  def initialize(model_class)
    @model_class = model_class
  end

  def fetch(ids)
    records = @model_class.where(id: ids).index_by(&:id)
    ids.map { |id| records[id] }
  end
end

# app/graphql/types/post_type.rb
module Types
  class PostType < BaseObject
    field :id, ID, null: false
    field :body, String, null: false
    field :author, Types::UserType, null: false

    def author
      dataloader.with(Sources::RecordSource, User).load(object.author_id)
    end
  end
end
That’s it. The dataloader batches every author_id requested in the current operation into a single IN (...) query against users. 50 posts, 1 query for authors. The fix is mechanical once you internalise it.
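If “batch by key” still feels abstract, here is a plain-Ruby sketch of the idea with no graphql-ruby involved. FakeUsersTable and BatchLoader are hypothetical stand-ins I made up for illustration: the table counts round trips, and the loader queues keys until the first value is actually needed.

```ruby
# Plain-Ruby model of "batch by key". FakeUsersTable stands in for the users
# table and counts round trips; none of this is graphql-ruby API.
class FakeUsersTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # { id => name }
    @query_count = 0
  end

  # One call models one "SELECT ... WHERE id IN (...)" round trip.
  def where_id_in(ids)
    @query_count += 1
    @rows.slice(*ids)
  end
end

class BatchLoader
  def initialize(table)
    @table = table
    @pending = []
    @results = {}
  end

  # Queue the key and return a lazy lookup instead of querying right away.
  def load(id)
    @pending << id unless @results.key?(id) || @pending.include?(id)
    -> { resolve_pending; @results[id] }
  end

  private

  # Flush every queued key in a single query; missing ids resolve to nil.
  def resolve_pending
    return if @pending.empty?

    ids = @pending
    @pending = []
    @results.merge!(@table.where_id_in(ids))
    ids.each { |id| @results[id] = nil unless @results.key?(id) }
  end
end

users = FakeUsersTable.new({ 1 => "ana", 2 => "ben", 3 => "cy" })
author_ids = [1, 2, 1, 3, 2] # five posts, three distinct authors

loader = BatchLoader.new(users)
lazy_authors = author_ids.map { |id| loader.load(id) } # nothing queried yet
authors = lazy_authors.map(&:call)                     # first call flushes the batch

puts authors.inspect   # ["ana", "ben", "ana", "cy", "ben"]
puts users.query_count # 1
```

Five parents, three distinct keys, one round trip. The real Dataloader does the deferral with fibers rather than lambdas, but the accounting is the same.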
Where Dataloader does not save you: aggregations, counts, has-many through joins that need the whole graph. For those, write a hand-rolled source. The pattern is “batch by key”, not “magic”. I’ve seen squads load comments-of-post-of-feed through three layers of nested Dataloaders and produce a query plan that made the writer cry. When the shape gets weird, drop into a service object and call it from one place.
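A hand-rolled source for counts might look roughly like the sketch below. It is plain Ruby so it runs on its own: COMMENTS is a made-up stand-in for the comments table and CommentCountSource is a hypothetical name. In the app, the class would subclass GraphQL::Dataloader::Source and the grouping would be one Comment.where(post_id: post_ids).group(:post_id).count query.

```ruby
# COMMENTS is an in-memory stand-in for the comments table (an assumption
# for this sketch, not data from the article).
COMMENTS = [
  { id: 1, post_id: 10 },
  { id: 2, post_id: 10 },
  { id: 3, post_id: 12 }
].freeze

class CommentCountSource
  # Receives every post id requested in the batch and returns one count per
  # id, in the same order, with 0 for posts that have no comments.
  def fetch(post_ids)
    counts = COMMENTS
      .select { |row| post_ids.include?(row[:post_id]) }
      .group_by { |row| row[:post_id] }
      .transform_values(&:size)
    post_ids.map { |id| counts.fetch(id, 0) }
  end
end

counts = CommentCountSource.new.fetch([10, 11, 12])
puts counts.inspect # [2, 0, 1]
```

The detail that matters is the last line of fetch: results come back in the order the keys were requested, and a key with no rows still gets a value.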
GraphQL’s killer feature for clients is also its operational hazard. A client can request a query that joins five lists, each of which loads twenty, each of which loads ten. That’s 1000 records on one POST. Worse, it’s all behind a single endpoint, so your usual “rate limit per route” doesn’t help.
# app/graphql/app_schema.rb
class AppSchema < GraphQL::Schema
  use GraphQL::Dataloader
  # Needed for the ActionCable-backed subscriptions used later.
  use GraphQL::Subscriptions::ActionCableSubscriptions

  max_depth 12
  max_complexity 2000
  default_max_page_size 50
  validate_max_errors 5

  query(Types::QueryType)
  subscription(Types::SubscriptionType)

  def self.unauthorized_object(error)
    raise GraphQL::ExecutionError, "Unauthorized"
  end
end
max_complexity is the one that actually matters at scale. Set it tight on day one. Every field with a list resolver should declare its complexity so the calculation reflects reality.
field :posts, [Types::PostType], null: false do
  argument :first, Integer, required: true
  complexity ->(_ctx, args, child_complexity) {
    args[:first] * child_complexity
  }
end
I’ve watched a perfectly nice client app request 2500-complexity queries because a junior dev nested two paginated lists. The endpoint refused with a clean error and we added pagination defaults on the client. Without the limit, that’s a slow-motion replica-lag incident waiting to happen.
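To see how quickly two nested paginated lists cross the limit, here is the arithmetic as a small plain-Ruby sketch. It is a simplified model, not graphql-ruby’s analyzer: a scalar field costs 1 and a list field costs first times the sum of its children, mirroring the lambda above. The query shape is a hypothetical example, not the one from that incident.

```ruby
# Simplified complexity model: scalars cost 1, list fields cost
# `first * (sum of children)`. Illustration only.
def complexity(node)
  children = node.fetch(:fields, [])
  return 1 if children.empty?

  child_total = children.sum { |child| complexity(child) }
  node.fetch(:first, 1) * child_total
end

# posts(first: 50) { id body comments(first: 50) { id } }
query = {
  name: "posts", first: 50, fields: [
    { name: "id" },
    { name: "body" },
    { name: "comments", first: 50, fields: [{ name: "id" }] }
  ]
}

MAX_COMPLEXITY = 2000
score = complexity(query)

puts score                  # 50 * (1 + 1 + 50 * 1) = 2600
puts score > MAX_COMPLEXITY # true
```

Dropping the inner list to first: 10 gives 50 * (1 + 1 + 10) = 600, which is why pagination defaults on the client were enough to fix it.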
Sending the full query string on every request is fine for development. In production it’s a waste of bandwidth and a security smell. Clients can still send arbitrary queries, which means your complexity limit is the only thing between you and a bad actor.
The pattern I use is straightforward. At build time, the client extracts every named operation, hashes it, ships a manifest. At runtime, the client sends the hash, not the query. Server validates the hash, executes the cached document, returns the result.
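The build-time half could be sketched like this in plain Ruby. The operation names, the query text, and the choice of SHA-256 are all assumptions for illustration; any stable content hash that client and server agree on would do.

```ruby
require "digest"
require "json"

# Hypothetical operations a client build might extract.
operations = {
  "CommunityFeed" => "query CommunityFeed($id: ID!) { community(id: $id) { posts(first: 20) { id body } } }",
  "Viewer"        => "query Viewer { viewer { id } }"
}

# Build time: hash each document and emit a hash => query manifest.
manifest = operations.each_with_object({}) do |(_name, document), out|
  out[Digest::SHA256.hexdigest(document)] = document
end

# Runtime: the client sends only the hash and variables.
request_body = {
  queryHash: Digest::SHA256.hexdigest(operations["Viewer"]),
  variables: {}
}

puts JSON.generate(request_body)
puts manifest.key?(request_body[:queryHash]) # true
```

Because the hash is derived from the document text, any change to an operation produces a new hash, and the old one keeps resolving to the old document until you retire it.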
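# app/graphql/persisted_query_store.rb
class PersistedQueryStore
  def self.fetch(hash)
    return nil if hash.blank?

    # skip_nil keeps unknown hashes out of the cache, so a query registered
    # later is found on the next request instead of after the TTL expires.
    Rails.cache.fetch(["pq", hash], expires_in: 1.day, skip_nil: true) do
      PersistedQuery.find_by(query_hash: hash)&.query
    end
  end
end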
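# in GraphqlController#execute
def execute
  # Raw query strings are accepted outside production only; in production the
  # client must send the hash of a registered operation.
  query_string = PersistedQueryStore.fetch(params[:queryHash])
  query_string ||= params[:query] unless Rails.env.production?
  raise GraphQL::ExecutionError, "Unknown query" if query_string.nil?
  # ... unchanged below
end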
What this buys you: the client cannot send anything the server hasn’t seen at build time. Request body is a hash, not a 4 KB query. If you ever need to kill a misbehaving operation, flip a flag on the row and the world stops sending it. That’s the part I care about.
The default story for real-time GraphQL on Rails is graphql-anycable or graphql-ruby’s built-in ActionCableSubscriptions. I’ve shipped both. For a Rails-API-mode app where you’re not running a separate WebSocket service, ActionCable is the lower-friction call.
# config/application.rb
config.action_cable.mount_path = "/cable"
# Anchor the pattern so an origin like https://x.example.com.evil.test cannot match.
config.action_cable.allowed_request_origins = [%r{\Ahttps://[^/]+\.example\.com\z}]
# app/channels/graphql_channel.rb
class GraphqlChannel < ApplicationCable::Channel
  def execute(data)
    result = AppSchema.execute(
      data["query"],
      context: {
        current_user: current_user,
        channel: self
      },
      variables: data["variables"] || {},
      operation_name: data["operationName"]
    )
    transmit({ result: result.to_h, more: result.subscription? })
  end

  def unsubscribed
    AppSchema.subscriptions.delete_channel_subscriptions(self)
  end
end
The thing that bit me on a real-time trading and charting platform I architected was treating subscription fan-out like it was free. Concurrent connections subscribing to “all ticks” is not a subscription, it’s a broadcast. For GraphQL subs, set a hard per-user subscription count, scope subs to a small entity (a single community, a single channel), and rate-limit publish at the source.
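A hard per-user subscription count can be as small as the sketch below. SubscriptionLimiter, its limit, and the ids are hypothetical; this version keeps state in process memory, so a multi-process deployment would need a shared store instead.

```ruby
# Hypothetical per-user cap on live subscriptions (in-memory, single process).
class SubscriptionLimiter
  LimitExceeded = Class.new(StandardError)

  def initialize(max_per_user:)
    @max_per_user = max_per_user
    @active = Hash.new { |hash, user_id| hash[user_id] = [] }
    @mutex = Mutex.new
  end

  # Call before accepting a new subscription; raises once the cap is hit.
  def register(user_id, subscription_id)
    @mutex.synchronize do
      subs = @active[user_id]
      raise LimitExceeded, "subscription limit reached" if subs.size >= @max_per_user

      subs << subscription_id
    end
  end

  # Call when the client unsubscribes or the channel closes.
  def release(user_id, subscription_id)
    @mutex.synchronize { @active[user_id].delete(subscription_id) }
  end

  def count(user_id)
    @mutex.synchronize { @active.key?(user_id) ? @active[user_id].size : 0 }
  end
end

limiter = SubscriptionLimiter.new(max_per_user: 2)
limiter.register(42, "community:1")
limiter.register(42, "community:2")

begin
  limiter.register(42, "community:3")
rescue SubscriptionLimiter::LimitExceeded => e
  puts e.message # subscription limit reached
end

limiter.release(42, "community:1")
puts limiter.count(42) # 1
```

The natural places to call register and release are the channel’s execute and unsubscribed hooks, so the count tracks the connection lifecycle.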
Here’s the opinion. Do not version your GraphQL endpoint. Not /v1/graphql, not ?version=2, not separate schemas per client. Evolve the schema in place.
The Rails crowd loves a v2 namespace because that’s REST muscle memory. For GraphQL it’s wrong. The whole point of the contract is that clients ask for the fields they want, and the server adds fields without breaking anyone. Versioning the endpoint throws away the one property that makes GraphQL different from REST.
The three moves that replace versioning:
# adding a field, safe, always
field :reaction_count, Integer, null: false

# deprecating a field, mark it, keep serving
field :likes_count,
  Integer,
  null: false,
  deprecation_reason: "Use reactionCount; scheduled for removal"

# replacing a type, additive, long deprecation window
field :author_v2, Types::AuthorType, null: false
The instinct that keeps database maintenance safe applies to GraphQL schema changes too. There is no “safe and instant” breaking change. You do it in steps. Add the new shape, mark the old one deprecated, give clients a window, only then remove it. The endpoint stays the same. Clients never switch URLs. The migration is in the schema, not the routing table.
If a change is so structural you feel like you need a v2, the right move is a new type, not a new endpoint. UserV2 next to User. Old clients keep working. New clients adopt the new shape. Six months later, delete User. Done.
Per-resolver duration is table stakes. The two I care about more on a real Rails-API GraphQL service: complexity score distribution, and deprecated-field usage. The first tells you when clients are starting to ask for too much. The second tells you when you can finally delete the type you’ve been carrying for a year.
class AppSchema < GraphQL::Schema
  # trace_with takes a trace module; DataDogTracing is the legacy tracer class.
  trace_with GraphQL::Tracing::DataDogTrace
end
That gives you spans per resolver; add a custom counter for field_deprecated_used keyed by field name. Three lines of Datadog config get you most of the way.
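For the two metrics I care about, the bookkeeping is simple enough to sketch in plain Ruby. ComplexityHistogram, DeprecatedFieldCounter, and the bucket boundaries are assumptions for illustration; a real service would emit these to its metrics backend rather than hold them in memory.

```ruby
# In-memory stand-ins for the two metrics; names and buckets are hypothetical.
class ComplexityHistogram
  BUCKETS = [100, 500, 1000, 2000].freeze

  def initialize
    @counts = Hash.new(0)
  end

  # Record one query's complexity score in the first bucket that fits.
  def observe(score)
    upper = BUCKETS.find { |limit| score <= limit }
    @counts[upper ? "<=#{upper}" : ">#{BUCKETS.last}"] += 1
  end

  def to_h
    @counts.dup
  end
end

class DeprecatedFieldCounter
  def initialize
    @counts = Hash.new(0)
  end

  # Call whenever a deprecated field is resolved.
  def record(field_name)
    @counts[field_name] += 1
  end

  # True when nobody has asked for the field, i.e. it is safe to delete.
  def unused?(field_name)
    @counts[field_name].zero?
  end
end

histogram = ComplexityHistogram.new
[40, 450, 1800, 2600].each { |score| histogram.observe(score) }
puts histogram.to_h.inspect

deprecated = DeprecatedFieldCounter.new
deprecated.record("Post.likesCount")
puts deprecated.unused?("Post.likesCount") # false
puts deprecated.unused?("Post.authorV1")   # true
```

A growing top bucket says clients are asking for too much; a deprecated field that stays at zero for the whole deprecation window is one you can delete.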
GraphQL::Dataloader solves the default N+1. Where the shape is weird, drop into a service object.
max_complexity on day one.
The single endpoint is your whole rate-limit surface.
Thanks for reading. If you’ve got thoughts, send them my way.