Rails Engines for Modular Monoliths

Carving team boundaries inside a Rails monolith with mountable engines, inter-engine contracts, and private gems, without going microservices.

A Wednesday afternoon at the creator economy platform I worked at. Shape Up pitch meeting. Someone, half-joking, suggested we extract billing into a microservice. The Rails monolith had been growing for years and squads were tripping over each other in app/models/billing/. The pitch was “let’s just split it out.” I argued the opposite. Not microservices. Not yet. We needed boundaries, not a network call.

That afternoon I sketched the engine skeleton on a whiteboard. Three weeks later, billing was a mountable Rails engine living inside the same repo, the same deploy, the same Sidekiq cluster. No HTTP between services. No new on-call rotation. Just a namespace, a contract, and a gemspec.

This is the part of Rails I think is criminally underused. If you have a large monolith and two or more squads stepping on each other, engines are the cheapest team boundary you can buy.

Why engines, not services

The default reflex when a Rails app gets big is to break it apart over the network. I’ve watched teams do that. It almost always costs more than they expect. Network failure modes where you used to have method calls. Schema drift. A new deploy pipeline. A new on-call rotation.

A Rails engine gives you most of what you actually want, code ownership and a clean import surface, without the operational tax. The engine has its own routes, controllers, models, and migrations. Ruby won’t stop the rest of the app from touching its internal classes, but every such reference has to spell out the engine’s namespace (Billing:: for ours), so the coupling is visible at the call site and easy to lint for.

You still share a database. You still share a deploy. You still share Sidekiq. That’s the trade. For a team carving boundaries inside a healthy monolith, that’s the right trade nine times out of ten.

Generating the engine

The Rails generator does most of the heavy lifting. From the monolith’s root:

bundle exec rails plugin new engines/billing \
  --mountable \
  --database=postgresql \
  --skip-test \
  --dummy-path=spec/dummy

That creates engines/billing/ with its own lib/, app/, config/routes.rb, and a billing.gemspec. Mount it from the host app:

# config/routes.rb
Rails.application.routes.draw do
  mount Billing::Engine => "/billing"
  # ... rest of the app
end

# Gemfile
gem "billing", path: "engines/billing"

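The --mountable flag is the important one. It generates an engine class that calls isolate_namespace Billing, which gives the engine its own route set and helpers and prefixes its tables with billing_, so Billing::Invoice stays genuinely distinct from any top-level Invoice. Without it, you get a partial split whose routes and helpers leak back into the host. Don’t take the shortcut.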

The contract is the whole game

The reason engines work as a boundary is that you decide, deliberately, what the engine exposes. Everything else is private.

I keep the public surface in a single file, lib/billing/public.rb inside the engine. No host code is allowed to call into Billing:: anywhere except through it, plus the one mount of Billing::Engine in the host’s routes.

# engines/billing/lib/billing/public.rb
module Billing
  module Public
    Result = Struct.new(:ok?, :value, :error, keyword_init: true)

    def self.charge(account_id:, amount_cents:, currency:, idempotency_key:)
      command = Billing::Commands::Charge.new(
        account_id: account_id,
        amount_cents: amount_cents,
        currency: currency,
        idempotency_key: idempotency_key
      )
      outcome = command.call
      Result.new(ok?: outcome.success?, value: outcome.value, error: outcome.error)
    rescue Billing::Errors::DuplicateCharge => e
      Result.new(ok?: false, value: nil, error: e)
    end

    def self.subscription_state(account_id:)
      Billing::Queries::SubscriptionState.call(account_id: account_id)
    end
  end
end

The host calls Billing::Public.charge(...). It does not call Billing::Invoice.find(...). Ever. I enforce this with a RuboCop rule that fails the build if any file outside engines/billing/ references Billing:: other than Billing::Public or Billing::Engine. Cheap, automatic, brutal.

The point isn’t decoupling for its own sake. It’s that two years from now, when you actually do want to lift billing out into a separate service, the cutover is a transport swap, not a refactor. You replace the body of Billing::Public.charge with an HTTP client. Nothing else moves.
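The rule itself is a custom RuboCop cop. For readers who want the idea without RuboCop’s API, here is a minimal plain-Ruby sketch of the same check, assuming the engine lives under engines/billing/ and that only Billing::Public and Billing::Engine may be referenced from outside it. The method names and message format are hypothetical.

```ruby
# Minimal sketch of the engine boundary rule (a plain-Ruby stand-in, not a RuboCop cop).
# Assumptions: the engine lives under engines/billing/, and only
# Billing::Public and Billing::Engine may be referenced from outside it.
ENGINE_ROOT = "engines/billing/"
ALLOWED_ENTRY_POINTS = %w[Public Engine].freeze
BILLING_CONSTANT = /\bBilling::([A-Z]\w*)/

# Returns one message per forbidden Billing:: reference found in `source`.
def boundary_violations(path, source)
  # Code inside the engine may use its own internals freely.
  return [] if path.start_with?(ENGINE_ROOT)

  violations = []
  source.each_line.with_index(1) do |line, lineno|
    line.scan(BILLING_CONSTANT).flatten.each do |const|
      next if ALLOWED_ENTRY_POINTS.include?(const)

      violations << "#{path}:#{lineno}: Billing::#{const} is private to the billing engine"
    end
  end
  violations
end

# Scans the host app's Ruby files; a CI step can fail the build when this is non-empty.
def scan_host(globs = ["app/**/*.rb", "lib/**/*.rb", "config/**/*.rb"])
  Dir.glob(globs).flat_map { |path| boundary_violations(path, File.read(path)) }
end
```

Because it matches text, this sketch also flags Billing:: mentions in comments and strings; a real cop avoids that by walking the AST.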

Migrations inside the engine

Engine migrations live inside the engine. The host installs them with one command:

bundle exec rails billing:install:migrations
bundle exec rails db:migrate

This is also where I learned to be careful, the hard way.

The 87-second login outage

Late evening deploy. Past midnight UTC, but only around 6 p.m. Pacific. We were shipping a schema change to add a non-null column to users, a table with hundreds of millions of rows. The migration was using strong_migrations’ add_column_with_default helper. I’d reviewed it that morning and ack’d it as safe.

The migration acquired an ACCESS EXCLUSIVE lock on users while applying the default backfill. That lock mode blocks reads as well as writes, so on Aurora at our row count it meant about 90 seconds in which nothing could touch users. Login. Sign-up. Password reset. Every webhook tied to user creation. Login error rate hit 100% for around 85 seconds. PagerDuty woke half the senior engineers in California.

First instinct: roll back the migration. But Rails doesn’t have a clean rollback for a partially applied add_column_with_default. Killing it mid-flight risked leaving the table in an inconsistent metadata state.

The real fix was to let the migration finish. It took 87 seconds. Login recovered within fifteen seconds of the lock release because the dependent service had a tight retry loop. The postmortem split the change into three steps: add_column with null: true and default: nil; backfill in batches from a separate job; then change_column_null once the backfill is complete. We also added a strong_migrations rule in CI that blocks any add_column with a non-null default against hot tables.
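The middle step is where the lock window actually gets controlled. A minimal plain-Ruby sketch of the batching arithmetic, assuming an integer id primary key: split the id range into fixed-size slices and issue one short UPDATE per slice, so no single statement holds row locks for long. The helper names, column, and value are hypothetical; a real job would run each statement through ActiveRecord, pause between batches, and read min_id and max_id from the table.

```ruby
# Sketch of a batched backfill plan (step two of the three-step migration).
# Hypothetical: helper names, column, value, and batch size are illustrative.

# Splits the id range [min_id, max_id] into contiguous slices of batch_size ids.
def batch_ranges(min_id, max_id, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?
  return [] if min_id > max_id

  (min_id..max_id).step(batch_size).map do |start|
    start..[start + batch_size - 1, max_id].min
  end
end

# Builds one short UPDATE per slice. "IS NULL" makes each batch safe to re-run if
# the job dies halfway; the id bounds keep each statement's lock footprint small.
# Identifiers and value are interpolated directly: use only trusted, hard-coded inputs.
def backfill_statements(table:, column:, value:, min_id:, max_id:, batch_size: 10_000)
  batch_ranges(min_id, max_id, batch_size).map do |range|
    "UPDATE #{table} SET #{column} = #{value} " \
      "WHERE id BETWEEN #{range.begin} AND #{range.end} AND #{column} IS NULL"
  end
end
```

On PostgreSQL, the third step deserves the same care: SET NOT NULL scans the whole table under an ACCESS EXCLUSIVE lock unless a validated CHECK (column IS NOT NULL) constraint already exists, which PostgreSQL 12 and later use to skip the scan. Adding that constraint as NOT VALID and validating it separately keeps the final step short.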

What that taught me about engines: the boundary is not a magic shield. A migration inside engines/billing/ still hits the shared writer. The contract has to extend to “no engine ships a hot-table migration without the three-step dance.” We added that to the engine README and a custom cop that flags it in CI.
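The cop itself isn’t reproduced here. A minimal plain-Ruby sketch of the rule it enforces, assuming a hand-maintained list of hot tables and the conventional add_column :table, :column, :type, ... call shape: flag any add_column on a hot table that passes both a default and null: false. HOT_TABLES and hot_table_offenses are hypothetical names.

```ruby
# Sketch of the "three-step dance" guard for engine migrations.
# Hypothetical: HOT_TABLES is a hand-maintained list; the real check is a custom cop.
HOT_TABLES = %w[users].freeze

# Captures the table symbol and the remaining arguments of an add_column call.
ADD_COLUMN = /\badd_column\s*\(?\s*:(\w+)\s*,(.*)$/

# Returns offending lines: add_column on a hot table with a default and null: false.
def hot_table_offenses(source)
  source.each_line.with_index(1).filter_map do |line, lineno|
    match = ADD_COLUMN.match(line)
    next unless match && HOT_TABLES.include?(match[1])

    args = match[2]
    has_default = args.match?(/\bdefault:/)
    not_null = args.match?(/\bnull:\s*false\b/)
    "line #{lineno}: add_column on #{match[1]} with default + NOT NULL" if has_default && not_null
  end
end
```

Being line-based, it misses calls split across lines or tables passed as strings; matching on the AST, as a cop does, catches both.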

Packaging as a private gem

Once the engine settles, you can move it out of engines/billing/ and into its own repo, packaged as a private gem. We pulled this trigger on a different engine, a notification one, after about a year of in-tree life.

# billing.gemspec
require_relative "lib/billing/version" # defines Billing::VERSION used below
Gem::Specification.new do |spec|
  spec.name        = "billing"
  spec.version     = Billing::VERSION
  spec.authors     = ["Platform Team"]
  spec.summary     = "Internal billing engine."
  spec.files       = Dir["{app,config,db,lib}/**/*", "MIT-LICENSE", "Rakefile", "README.md"]

  spec.add_dependency "rails", ">= 7.1"
  spec.add_dependency "sidekiq", ">= 7"
  spec.add_dependency "stripe", "~> 12.0"
end

# Gemfile (in host app)
gem "billing", "~> 1.4", source: "https://gem.example.internal"

Private gems force you to version the contract. That sounds heavy. It is, a little. But it pays off the day a junior engineer in another squad sends a PR that would have silently broken a downstream consumer. Now they have to bump the version, and a breaking change to the contract means a major bump that the host’s ~> 1.4 constraint won’t pick up until someone edits the Gemfile. Everyone sees the change in a Gemfile.lock diff. The discipline is the feature.
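What ~> 1.4 admits is easy to check with RubyGems’ own version classes, which ship with Ruby. A short sketch with an illustrative release history: minor and patch releases satisfy the constraint, while a major release, the one a breaking contract change should carry under semantic versioning, does not.

```ruby
require "rubygems" # Gem::Requirement and Gem::Version; loaded by default in most Rubies

# Returns the versions from `candidates` that a Gemfile constraint would accept.
def admitted_versions(constraint, candidates)
  requirement = Gem::Requirement.new(constraint)
  candidates.select { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
end

# Hypothetical release history of the billing gem.
releases = %w[1.3.9 1.4.0 1.5.2 1.9.0 2.0.0]
p admitted_versions("~> 1.4", releases)   # prints ["1.4.0", "1.5.2", "1.9.0"]
p admitted_versions("~> 1.4.0", releases) # prints ["1.4.0"]
```

The three-part form is stricter: ~> 1.4.0 stops before 1.5, which suits hosts that only want patch releases without a deliberate Gemfile change.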

Idempotency at the boundary

Adjacent lesson from the same era. Native billing for our branded mobile apps: Apple and Google in-app purchases. Shipped, quietly running for about six months.

A creator opened a support ticket. Their customers had been charged twice. Apple’s subscription-renewal server-to-server notification had been retried after our endpoint returned 200 OK slightly past its 30-second deadline. We had no idempotency check on the renewal handler. Every retry created a new creator_subscriptions row. A few thousand customers across dozens of branded apps were affected.

The first fix was the wrong one: a frontend patch that hid the duplicate rows. The bill was still real. Apple did not unbill the cards just because we hid the rows. The creator escalated to legal.

Real fix: a Sidekiq job behind the webhook, plus a database-level unique constraint on (apple_original_transaction_id, notification_uuid). The endpoint now returns 200 OK within five seconds because it only enqueues the work, so Apple rarely has a reason to retry, and a retry that does arrive hits the unique constraint instead of creating a second row.

The reason I’m telling it here: when you carve out an engine like Billing, the public API of that engine is where idempotency lives. Not in the host. Not in the controller that called you. Inside the engine, at the Billing::Public.charge boundary. The contract owns the safety.
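A minimal in-memory sketch of that shape, assuming the caller supplies a key: the idempotency_key passed to Billing::Public.charge, or the (apple_original_transaction_id, notification_uuid) pair from the webhook. The IdempotencyLedger class is hypothetical and only stands in for the database unique constraint; in production the guarantee has to come from the index, because an in-process Hash is not shared across Puma workers or Sidekiq processes.

```ruby
# Stand-in for a unique constraint: the first call for a key runs the work and
# records its result; every later call with the same key returns that result.
class IdempotencyLedger
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  # Holding the lock while the block runs keeps two concurrent callers from both
  # doing the work for one key (at the cost of serializing all callers).
  # If the block raises, nothing is recorded, so a retry runs the work again.
  def once(key)
    @lock.synchronize do
      return @results[key] if @results.key?(key)

      @results[key] = yield
    end
  end
end

ledger = IdempotencyLedger.new
charge_count = 0
# Apple delivers the same renewal notification twice.
2.times do
  ledger.once(%w[orig-txn-1 notif-uuid-1]) { charge_count += 1 }
end
puts charge_count # prints 1
```

In Rails, the equivalent is inserting a row guarded by a unique index and treating ActiveRecord::RecordNotUnique as “already processed”.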

Takeaways

  • Engines are the cheapest team boundary in Rails. Use them before you reach for microservices.
  • The public surface is one file. Everything else is private. Enforce it with RuboCop.
  • Engine migrations still hit the shared writer. The three-step Rails-on-Aurora dance applies inside engines too.
  • Idempotency lives at the engine’s public API, with a database-level unique constraint, not at the controller.
  • Move the engine out to a private gem when the contract is stable. Versioning is the feature.
  • The day you actually need to go over the network, the migration is a transport swap, not a refactor.

Thanks for reading. If you’ve got thoughts, send them my way.

© 2026 Akin Gundogdu. All Rights Reserved.