GraphQL in Rails API Mode

A production take on graphql-ruby in Rails API mode: Dataloader for N+1, complexity limits, persisted queries, ActionCable subscriptions, and why versioned endpoints are the wrong instinct.

It was a Tuesday morning at the creator economy platform I worked at when the Community feed went sideways. Around 10:14 a.m. Pacific, p99 on /communities/:id/posts climbed from about 120 ms to over 8 seconds. The dashboard read like a textbook reader-replica lag fire. The actual fire was upstream of the readers. A maintenance ANALYZE was holding write-side locks on community_posts, one of the hottest tables on the platform, starving WAL emission. Replicas had nothing to apply. Lag at 14 minutes and growing.

I wasn’t on-call that week. I was in joker mode with three squads and the Community team pinged me because I’d lived inside the Aurora layer for months. We killed the analyze. Lag drained in about six minutes. Same week we shipped db_safe_maintenance.rb and the runbook got a literal sentence at the top: “Before touching reader scaling, check pg_stat_activity on the writer.”

That morning is a big part of why I’m careful with GraphQL on Rails. The resolver layer hides exactly the kind of database behaviour that turns a quiet Tuesday into a 22-minute blackout. Used carefully, graphql-ruby in Rails API mode is the right call for client-shaped data. Used naively, it’s an N+1 generator with a friendly DSL.

Why API mode

Rails API mode strips middleware you don’t need for a GraphQL server. No view rendering, no flash, no cookies by default. The endpoint is a single POST. Auth happens once, in a middleware or a base resolver. That’s the whole shape.

# Gemfile
# GraphQL::Dataloader ships inside the graphql gem, so graphql-batch is not needed
gem "graphql", "~> 2.3"

# app/controllers/graphql_controller.rb
class GraphqlController < ApplicationController
  def execute
    result = AppSchema.execute(
      params[:query],
      variables: prepare_variables(params[:variables]),
      context: {
        current_user: current_user,
        request_id: request.request_id
      },
      operation_name: params[:operationName]
    )
    render json: result
  rescue GraphQL::ParseError, GraphQL::ExecutionError => e
    render json: { errors: [{ message: e.message }] }, status: :unprocessable_entity
  end

  private

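  def prepare_variables(raw)
    case raw
    when String  then raw.present? ? JSON.parse(raw) : {}
    when Hash    then raw
    # JSON request bodies arrive as ActionController::Parameters, not Hash
    when ActionController::Parameters then raw.to_unsafe_hash
    when nil     then {}
    else raise ArgumentError, "Unexpected variables: #{raw.class}"
    end
  end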
end

One controller. One method. The rest is types and resolvers, which is where the work actually is.

Dataloader fixes N+1, mostly

graphql-ruby ships with GraphQL::Dataloader because the naive resolver pattern is a footgun. Each field that touches the database resolves once per parent. A list of 50 posts asking for their author becomes 51 queries: one for the posts, then one per author. On a hot table, that’s how you ship the Aurora Tuesday I just described, except as a slow burn instead of an outage.

# app/graphql/sources/record_source.rb
class Sources::RecordSource < GraphQL::Dataloader::Source
  def initialize(model_class)
    @model_class = model_class
  end

  def fetch(ids)
    records = @model_class.where(id: ids).index_by(&:id)
    ids.map { |id| records[id] }
  end
end

# app/graphql/types/post_type.rb
module Types
  class PostType < BaseObject
    field :id, ID, null: false
    field :body, String, null: false
    field :author, Types::UserType, null: false

    def author
      dataloader.with(Sources::RecordSource, User).load(object.author_id)
    end
  end
end

That’s it. The dataloader batches every author_id requested in the current operation into a single IN (...) query against users. 50 posts, 1 query for authors. The fix is mechanical once you internalise it.
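
To make the batching concrete without booting Rails, here is a minimal plain-Ruby sketch of the idea. FakeUserTable and BatchLoader are hypothetical stand-ins I made up for illustration, not graphql-ruby classes. The real Dataloader defers work with fibers; this sketch swaps that for an explicit request-then-flush step, which is enough to show the query count.

```ruby
# Hypothetical in-memory stand-in for the users table that counts "queries".
class FakeUserTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # One call models one SELECT ... WHERE id IN (...).
  def where_ids(ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row[:id]) }
  end
end

# Hypothetical batch loader: collect keys first, fetch once, hand results back by key.
class BatchLoader
  def initialize(table)
    @table = table
    @pending = []
  end

  def request(id)
    @pending << id
  end

  def flush
    ids = @pending.uniq
    @pending = []
    records = @table.where_ids(ids).to_h { |row| [row[:id], row] }
    # Same idea as Source#fetch: every requested id gets an entry, nil when missing.
    ids.to_h { |id| [id, records[id]] }
  end
end

users = (1..5).map { |i| { id: i, name: "user-#{i}" } }
posts = (1..50).map { |i| { id: i, author_id: (i % 5) + 1 } }

# Naive: one lookup per post.
naive_table = FakeUserTable.new(users)
posts.each { |post| naive_table.where_ids([post[:author_id]]) }

# Batched: register every key, then fetch once.
batched_table = FakeUserTable.new(users)
loader = BatchLoader.new(batched_table)
posts.each { |post| loader.request(post[:author_id]) }
authors_by_id = loader.flush

puts "naive: #{naive_table.query_count} author queries, batched: #{batched_table.query_count}"
```

The naive loop issues one author lookup per post; the batched version issues one in total, whatever the number of posts.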

Where Dataloader does not save you: aggregations, counts, has-many through joins that need the whole graph. For those, write a hand-rolled source. The pattern is “batch by key”, not “magic”. I’ve seen squads load comments-of-post-of-feed through three layers of nested Dataloaders and produce a query plan that made the writer cry. When the shape gets weird, drop into a service object and call it from one place.
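
A hand-rolled source for counts follows the same “batch by key” contract: take a list of keys, return one value per key, in the same order. The sketch below is plain Ruby so it runs on its own. COMMENTS and CommentCountSource are hypothetical names; in a real app the class would subclass GraphQL::Dataloader::Source, and the tally would be a single grouped COUNT query rather than an in-memory group_by.

```ruby
# Hypothetical stand-in for the comments table.
COMMENTS = [
  { id: 1, post_id: 10 },
  { id: 2, post_id: 10 },
  { id: 3, post_id: 11 }
].freeze

# Sketch of a count source's fetch method (assumption: one grouped query in production).
class CommentCountSource
  def fetch(post_ids)
    counts = COMMENTS
      .select { |comment| post_ids.include?(comment[:post_id]) }
      .group_by { |comment| comment[:post_id] }
      .transform_values(&:size)
    # One value per requested key, in order; posts with no comments get 0, not nil.
    post_ids.map { |post_id| counts.fetch(post_id, 0) }
  end
end

result = CommentCountSource.new.fetch([10, 11, 12])
puts result.inspect
```

The detail that matters is the last line of fetch: a key with no rows still needs a value, otherwise a non-null count field blows up on the first post nobody commented on.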

Complexity limits or you ship a DDoS endpoint

GraphQL’s killer feature for clients is also its operational hazard. A client can request a query that joins five lists, each of which loads twenty, each of which loads ten. That’s 1000 records on one POST. Worse, it’s all behind a single endpoint, so your usual “rate limit per route” doesn’t help.

# app/graphql/app_schema.rb
class AppSchema < GraphQL::Schema
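  use GraphQL::Dataloader
  # Required for the ActionCable-backed subscriptions served by GraphqlChannel
  use GraphQL::Subscriptions::ActionCableSubscriptions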

  max_depth 12
  max_complexity 2000
  default_max_page_size 50
  validate_max_errors 5

  query(Types::QueryType)
  subscription(Types::SubscriptionType)

  def self.unauthorized_object(error)
    raise GraphQL::ExecutionError, "Unauthorized"
  end
end

max_complexity is the one that actually matters at scale. Set it tight on day one. Every field with a list resolver should declare its complexity so the calculation reflects reality.

field :posts, [Types::PostType], null: false do
  argument :first, Integer, required: true
  complexity ->(_ctx, args, child_complexity) {
    args[:first] * child_complexity
  }
end

I’ve watched a perfectly nice client app request 2500-complexity queries because a junior dev nested two paginated lists. The endpoint refused with a clean error and we added pagination defaults on the client. Without the limit, that’s a slow-motion replica-lag incident waiting to happen.
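
The arithmetic behind that 2500 is worth seeing once. This is a simplified model of the rule in the field definition above, assuming a scalar costs 1 and a list costs first times the sum of its children. It is an illustration, not graphql-ruby's full calculation, and the Field struct is a hypothetical helper.

```ruby
# Simplified complexity model: scalar = 1, list = first * (sum of child costs).
Field = Struct.new(:name, :first, :children) do
  def complexity
    return 1 if children.empty?

    child_complexity = children.sum(&:complexity)
    first * child_complexity
  end
end

# posts(first: 50) { comments(first: 50) { body } }
body     = Field.new("body", nil, [])
comments = Field.new("comments", 50, [body])
posts    = Field.new("posts", 50, [comments])

MAX_COMPLEXITY = 2000

score = posts.complexity
puts "complexity #{score}, allowed: #{score <= MAX_COMPLEXITY}"
```

Two nested lists of 50 with a single scalar leaf already multiply out past a 2000 limit. Drop the inner page size to 20 and the same query costs 1000.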

Persisted queries in production

Sending the full query string on every request is fine for development. In production it’s a waste of bandwidth and a security smell. Clients can still send arbitrary queries, which means your complexity limit is the only thing between you and a bad actor.

The pattern I use is straightforward. At build time, the client extracts every named operation, hashes it, ships a manifest. At runtime, the client sends the hash, not the query. Server validates the hash, executes the cached document, returns the result.

# app/graphql/persisted_query_store.rb
class PersistedQueryStore
  def self.fetch(hash)
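    # skip_nil: do not cache misses, so a hash registered after a miss resolves right away
    Rails.cache.fetch(["pq", hash], expires_in: 1.day, skip_nil: true) do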
      record = PersistedQuery.find_by(query_hash: hash)
      record&.query
    end
  end
end

# in GraphqlController#execute
def execute
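  # Production accepts only known hashes; raw query strings are a development convenience
  query_string = params[:query] unless Rails.env.production?
  query_string ||= PersistedQueryStore.fetch(params[:queryHash])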
  raise GraphQL::ExecutionError, "Unknown query" if query_string.nil?
  # ... unchanged below
end

What this buys you, once production rejects raw query strings: the client cannot send anything the server hasn’t seen at build time. Request body is a hash, not a 4 KB query. If you ever need to kill a misbehaving operation, flip a flag on the row and the server stops executing it, provided the lookup checks that flag and the one-day cache entry is expired. That’s the part I care about.
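
The build-time half can be sketched in a few lines. I'm assuming SHA-256 as the hash and a plain hash-to-query manifest; the operation names, build_manifest, lookup, and the disabled list are all hypothetical, and a real client build would normally do this step in its own toolchain.

```ruby
require "digest"

# Hypothetical named operations a client build step might extract.
OPERATIONS = {
  "CommunityFeed" => "query CommunityFeed($id: ID!) { community(id: $id) { posts(first: 20) { id body } } }",
  "Viewer"        => "query Viewer { viewer { id } }"
}.freeze

# Build time: hash each document; the manifest maps hash => query text.
def build_manifest(operations)
  operations.values.to_h { |query| [Digest::SHA256.hexdigest(query), query] }
end

# Runtime: resolve a hash to a document, honouring a kill list of disabled hashes.
def lookup(manifest, query_hash, disabled: [])
  return nil if disabled.include?(query_hash)

  manifest[query_hash]
end

manifest    = build_manifest(OPERATIONS)
viewer_hash = Digest::SHA256.hexdigest(OPERATIONS["Viewer"])

puts lookup(manifest, viewer_hash)
puts lookup(manifest, "not-a-known-hash").inspect
```

An unknown hash and a disabled hash look the same to the caller: both come back nil, and the controller answers “Unknown query”.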

Subscriptions over ActionCable

The default story for real-time GraphQL on Rails is graphql-anycable or graphql-ruby’s built-in ActionCableSubscriptions. I’ve shipped both. For a Rails-API-mode app where you’re not running a separate WebSocket service, ActionCable is the lower-friction call.

# config/application.rb
config.action_cable.mount_path = "/cable"
config.action_cable.allowed_request_origins = [/https:\/\/.*\.example\.com/]

# app/channels/graphql_channel.rb
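class GraphqlChannel < ApplicationCable::Channel
  def subscribed
    @subscription_ids = []
  end

  def execute(data)
    result = AppSchema.execute(
      data["query"],
      context: {
        current_user: current_user,
        channel: self
      },
      variables: data["variables"] || {},
      operation_name: data["operationName"]
    )
    # Remember the subscription so it can be removed when the socket closes
    sid = result.context[:subscription_id]
    @subscription_ids << sid if sid
    transmit({ result: result.to_h, more: result.subscription? })
  end

  def unsubscribed
    @subscription_ids.each { |sid| AppSchema.subscriptions.delete_subscription(sid) }
  end
end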

The thing that bit me on a real-time trading and charting platform I architected was treating subscription fan-out like it was free. Concurrent connections subscribing to “all ticks” is not a subscription, it’s a broadcast. For GraphQL subs, set a hard per-user subscription count, scope subs to a small entity (a single community, a single channel), and rate-limit publish at the source.
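
A hard per-user cap is the easiest of those three to sketch. SubscriptionLimiter and its limit are assumptions for illustration, not part of graphql-ruby. This version keeps state in memory, which only works in a single process; anything running several Puma workers would need shared storage such as Redis.

```ruby
# Hypothetical per-user cap on open subscriptions (in-memory, single process only).
class SubscriptionLimiter
  def initialize(max_per_user:)
    @max_per_user = max_per_user
    @open = Hash.new { |hash, user_id| hash[user_id] = [] }
  end

  # Returns true and records the subscription, or false when the user is at the cap.
  def register(user_id, subscription_id)
    return false if @open[user_id].size >= @max_per_user

    @open[user_id] << subscription_id
    true
  end

  # Call when the socket closes or the client unsubscribes.
  def release(user_id, subscription_id)
    @open[user_id].delete(subscription_id)
  end
end

limiter = SubscriptionLimiter.new(max_per_user: 2)
results = %w[sub-a sub-b sub-c].map { |sid| limiter.register(42, sid) }
puts results.inspect
```

The natural place to call register is in the channel's execute, before running the operation, with release in unsubscribed.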

Schema evolution without versioned endpoints

Here’s the opinion. Do not version your GraphQL endpoint. Not /v1/graphql, not ?version=2, not separate schemas per client. Evolve the schema in place.

The Rails crowd loves a v2 namespace because that’s REST muscle memory. For GraphQL it’s wrong. The whole point of the contract is that clients ask for the fields they want, and the server adds fields without breaking anyone. Versioning the endpoint throws away the one property that makes GraphQL different from REST.

The three moves that replace versioning:

# adding a field, safe, always
field :reaction_count, Integer, null: false

# deprecating a field, mark it, keep serving
field :likes_count,
      Integer,
      null: false,
      deprecation_reason: "Use reactionCount; scheduled for removal"

# replacing a type, additive, long deprecation window
field :author_v2, Types::AuthorType, null: false

Treat schema changes with the same caution as any change to a hot production system. There is no “safe and instant” breaking change. You do it in steps. Add the new shape, mark the old one deprecated, give clients a window, only then remove it. The endpoint stays the same. Clients never switch URLs. The migration is in the schema, not the routing table.

If a change is so structural you feel like you need a v2, the right move is a new type, not a new endpoint. UserV2 next to User. Old clients keep working. New clients adopt the new shape. Six months later, once nothing is still asking for User, delete it. Done.

What I actually monitor

Per-resolver duration is table stakes. The two I care about more on a real Rails-API GraphQL service: complexity score distribution, and deprecated-field usage. The first tells you when clients are starting to ask for too much. The second tells you when you can finally delete the type you’ve been carrying for a year.

class AppSchema < GraphQL::Schema
  # trace_with takes the module-based tracer; DataDogTracing is the legacy class-based one
  trace_with GraphQL::Tracing::DataDogTrace
end

That gives you spans per resolver. On top of it I add a custom counter, field_deprecated_used, keyed by field name. Three lines of Datadog config get you most of the way.
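
The deprecated-field counter is simple enough to sketch in plain Ruby. DeprecatedFieldCounter is a hypothetical class and the field names are examples; in production the increment would go to a metrics client instead of a Hash, and how the selected fields are collected from each operation is left out here.

```ruby
# Hypothetical counter for deprecated-field usage, keyed by "Type.field" name.
class DeprecatedFieldCounter
  attr_reader :counts

  def initialize(deprecated_fields)
    @deprecated = deprecated_fields
    @counts = Hash.new(0)
  end

  # Call once per operation with the field names it selected.
  def record(selected_fields)
    selected_fields.each do |field|
      @counts[field] += 1 if @deprecated.include?(field)
    end
  end

  # A deprecated field is a removal candidate only once nobody asks for it.
  def removable
    @deprecated.reject { |field| @counts[field].positive? }
  end
end

counter = DeprecatedFieldCounter.new(["Post.likesCount", "Post.author"])
counter.record(["Post.id", "Post.likesCount"])
counter.record(["Post.likesCount", "Post.reactionCount"])

puts counter.counts.inspect
puts counter.removable.inspect
```

A field that shows zero usage over a full release cycle is the signal that the deprecation window has done its job.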

Takeaways

Use Rails API mode. One controller, one POST. The rest is types and resolvers.
GraphQL::Dataloader solves the default N+1. Where the shape is weird, drop into a service object.
Set max_complexity on day one. The single endpoint is your whole rate-limit surface.
Persisted queries shrink payloads, block arbitrary queries, and let you kill bad operations on the fly.
ActionCable is fine for subscriptions in API mode, until fan-out becomes your problem. Limit per user.
Do not version the endpoint. Evolve the schema in place. Add, deprecate, remove. New type instead of new URL.

Thanks for reading. If you’ve got thoughts, send them my way.