Rails 7 to Rails 8 Upgrade Strategy

How I ran a dual-boot Rails 7 to Rails 8 upgrade across a portfolio of legacy client apps, with Zeitwerk fixes, a real gem audit, and a feature-flag gated rollout.

The first Rails 8 upgrade I owned was a Monday-morning decision at the digital product agency where I led engineering. The portfolio was legacy client projects sitting on Rails 7.0 and 7.1, half of them midway through the DDD migration I was running. Leadership wanted “Rails 8, every project, end of quarter”. I wrote one number on the whiteboard: 1,400 deprecation warnings in the largest app. Then I rewrote the plan.

Big-bang Rails upgrades fail. I’ve watched them fail at three different companies. The thing that actually works is dual-boot, a gem audit you take seriously, Zeitwerk done properly, and a feature-flag gated rollout you can yank. Everything else is bookkeeping.

Dual-boot the Gemfile

The dual-boot trick is older than Rails 8 but I’m surprised how many teams skip it. You let the same branch run on both Rails versions, gated by an env var. CI runs the test suite twice. Production stays on 7.1 until you flip the switch.

# Gemfile
def next?
  ENV['RAILS_NEXT'] == '1'
end

if next?
  gem 'rails', '~> 8.0.0'
  gem 'sprockets-rails', '>= 3.5'
  gem 'propshaft', '~> 0.9'
else
  gem 'rails', '~> 7.1.0'
end

gem 'pg', '~> 1.5'
gem 'puma', '~> 6.4'
gem 'sidekiq', '~> 7.2'
gem 'redis', '~> 5.0'

group :development, :test do
  gem 'rspec-rails', '~> 6.1'
  gem 'strong_migrations', '~> 1.8'
end

Two lockfiles: Gemfile.lock and Gemfile.next.lock. Bundler names the lockfile after the Gemfile it is pointed at, so with BUNDLE_GEMFILE set to Gemfile.next (usually a symlink to Gemfile) it reads and writes Gemfile.next.lock. The CI matrix runs both. PRs that pass on 7.1 but break on 8 don’t merge until they pass on both. We kept shipping features the whole quarter, with no upgrade-only freeze.
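A CI matrix with two jobs is the usual way to run both. The same idea as a small Ruby runner looks roughly like this. It is a sketch: BOOT_TARGETS, failing_targets, and the Gemfile.next filename are names I am assuming for illustration, not something lifted from our pipeline.

```ruby
# Hypothetical dual-boot runner: run one command under each Rails target.
BOOT_TARGETS = {
  'current' => { 'BUNDLE_GEMFILE' => 'Gemfile' },
  'next'    => { 'BUNDLE_GEMFILE' => 'Gemfile.next', 'RAILS_NEXT' => '1' }
}.freeze

# Runs `command` once per target and returns the names of the targets that failed.
# `runner` is injectable so the logic can be exercised without shelling out.
def failing_targets(command, runner: ->(env, cmd) { system(env, cmd) })
  BOOT_TARGETS.reject { |_name, env| runner.call(env, command) }.keys
end

# Example use in a CI script:
#   failed = failing_targets('bundle exec rspec')
#   abort("failed on: #{failed.join(', ')}") unless failed.empty?
```

A PR only merges when the returned list is empty, which is the "pass on both" rule in code form.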

Detail that catches people: bin/rails and bin/rake need a header that flips BUNDLE_GEMFILE when RAILS_NEXT=1. Otherwise you get a confusing “wrong Rails version” error in development.
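That header can be as small as the sketch below. gemfile_for is a hypothetical helper name, and Gemfile.next is an assumed filename. It relies on the generated config/boot.rb only setting BUNDLE_GEMFILE when it is not already set.

```ruby
# Hypothetical helper: choose the Gemfile path from the environment.
def gemfile_for(env, app_root)
  name = env['RAILS_NEXT'] == '1' ? 'Gemfile.next' : 'Gemfile'
  File.join(app_root, name)
end

# At the top of bin/rails and bin/rake, before config/boot is required:
#   ENV['BUNDLE_GEMFILE'] ||= gemfile_for(ENV, File.expand_path('..', __dir__))
```

With that in place, RAILS_NEXT=1 bin/rails console boots against the next lockfile without anyone having to remember a second variable.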

Zeitwerk migration

Most apps on Rails 7 are already on Zeitwerk for autoloading. But “on Zeitwerk” and “Zeitwerk-clean” are not the same thing. Rails 8 is stricter about constant resolution, and apps that limped along on classic-style require statements suddenly break at boot.

The check that surfaces every issue:

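# bin/check_zeitwerk
require_relative '../config/environment'

begin
  # Eager load everything so naming mismatches surface now, not in production.
  Rails.application.eager_load!
  Zeitwerk::Loader.eager_load_all

  puts 'Zeitwerk: clean'
rescue Zeitwerk::Error, NameError => e
  # Zeitwerk::NameError inherits from ::NameError, not Zeitwerk::Error,
  # so both have to be rescued to catch file/constant mismatches.
  warn "Zeitwerk error: #{e.message}"
  exit 1
end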

Run that in CI on every PR. The errors are usually one of three shapes: a class file named pdf_generator.rb defining PDFGenerator (Zeitwerk wants PdfGenerator unless you tell it otherwise), a constant defined in the wrong file, or a require at the top of a model that used to work and now throws a circular load.

For the acronym case, the fix is an inflection rule in config/initializers/inflections.rb:

ActiveSupport::Inflector.inflections(:en) do |inflect|
  inflect.acronym 'PDF'
  inflect.acronym 'API'
  inflect.acronym 'URL'
end
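To see why the acronym rule changes what Zeitwerk expects, here is a deliberately simplified model of the filename-to-constant mapping. expected_constant is a hypothetical helper, not the real inflector; ActiveSupport's camelize handles more cases than this.

```ruby
# Simplified model: snake_case basename -> constant name, with optional acronyms.
def expected_constant(basename, acronyms = [])
  lookup = acronyms.to_h { |acronym| [acronym.downcase, acronym] }
  basename.split('_').map { |segment| lookup.fetch(segment, segment.capitalize) }.join
end

puts expected_constant('pdf_generator')           # PdfGenerator
puts expected_constant('pdf_generator', ['PDF'])  # PDFGenerator
```

Without the rule, pdf_generator.rb has to define PdfGenerator. With it, the same file has to define PDFGenerator, and so does every other file with a pdf segment in its name, so check the whole app after adding an acronym.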

For the misplaced-constant case there’s no shortcut. Move the file. Run the check again. One of the apps in the portfolio had 47 of these. We did them over two afternoons, one engineer pairing with me.

Gem audit done honestly

The gem audit is the unglamorous part. It’s also where every “we’ll just bump Rails” plan dies. I keep a single CSV. Five columns: gem name, current version, Rails 8 compatibility (yes / no / requires fork), maintenance status, replacement candidate.
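A CSV in that shape is easy to query. The sketch below pulls out every gem that is not a clean “yes”. The column keys, the audit_blockers name, and the sample rows are all made up for illustration; none of them are real gems from the portfolio.

```ruby
require 'csv'

# Returns the names of gems whose compatibility column is anything but "yes".
def audit_blockers(csv_text)
  CSV.parse(csv_text, headers: true)
     .reject { |row| row['rails_8_compatible'] == 'yes' }
     .map { |row| row['gem'] }
end

# Invented sample data, in the five-column layout described above.
SAMPLE_AUDIT = <<~CSV
  gem,current_version,rails_8_compatible,maintenance_status,replacement_candidate
  example_gem_a,1.2.0,yes,active,
  example_gem_b,0.4.1,no,abandoned,vendor the code we use
  example_gem_c,2.0.3,requires fork,slow,
CSV

puts audit_blockers(SAMPLE_AUDIT).inspect
```

The returned list is the upgrade backlog: every name on it needs a decision before Rails 8 can boot.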

The ones that bit us were always the same shape: a low-traffic gem the original author had moved on from, no Rails 8 release, no clear replacement. The instinct is to fork and patch. The right move, four times out of five, is to vendor the small piece you actually use and delete the gem. We pulled three gems out that way. Total replacement code: about 80 lines across all three.

The other category is gems with a Rails 8 release that quietly changes behavior. activerecord-import and strong_migrations both had subtle changes in our version range. Don’t trust the changelog. Run your real test suite.

Feature-flag gated rollout

You don’t deploy a Rails upgrade. You deploy the new image to a small slice of traffic and watch.

We had a feature flag service in the Rails app that was already gating product launches. I bolted Rails-version targeting onto it. The flag wasn’t “use Rails 8”, because by the time a request hits a Rails app it’s already running on whichever Rails version booted the process. The flag was “send this request to the pod group running Rails 8”.

class RailsTargetingMiddleware
  def initialize(app)
    @app = app
  end

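  def call(env)
    tag_target(env)
    @app.call(env)
  end

  private

  # The rescue covers only the targeting logic. Wrapping @app.call in it would
  # swallow downstream errors and run the request a second time.
  def tag_target(env)
    request = ActionDispatch::Request.new(env)
    bucket = traffic_bucket_for(request)
    env['HTTP_X_RAILS_TARGET'] = '8' if FeatureFlag.enabled?(:rails_8_pods, bucket: bucket)
  rescue StandardError => e
    Rails.logger.warn("rails_targeting: #{e.message}")
  end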

  def traffic_bucket_for(request)
    return :internal if request.headers['X-Internal-Caller'] == 'true'
    return :canary if request.cookies['canary'] == '1'
    :default
  end
end

The middleware sets a header. The ingress (nginx in front of the cluster) reads the header and routes to one of two backend pools. Internal traffic and canary cookies hit the Rails 8 pool first. Then 1% of default traffic. Then 5%. Then 25%. On each app we held at 25% for three days before opening the gate further.
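How the percentage steps pick requests lived inside our flag service, so I won’t reproduce it here. A common approach, sketched below as an assumption rather than what we ran, is to hash a stable identifier into one of 100 buckets. in_rollout? and the idea of a stable_id (a session or account id) are hypothetical names.

```ruby
require 'zlib'

ROLLOUT_STEPS = [1, 5, 25].freeze # percentages of default traffic, from the text

# Deterministic bucketing: the same id always lands in the same bucket (0..99),
# so a user who is in at 5% stays in at 25%.
def in_rollout?(stable_id, percent)
  Zlib.crc32(stable_id.to_s) % 100 < percent
end
```

The useful property is that the rollout is monotonic. Raising the percentage only adds users to the Rails 8 pool, so nobody bounces between versions mid-session.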

What I watched on the Datadog dashboard: p99 request latency per endpoint, ActiveRecord query count per request, Sidekiq job runtime distribution, and the error rate split by Rails version. The split was the whole point. If something regressed on Rails 8 but not on 7.1, the version diff was your suspect list.
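The split itself is simple arithmetic. A minimal sketch, assuming each request record carries the Rails version that served it and the response status; the record shape and the error_rate_by_version name are invented for illustration and have nothing to do with Datadog’s API.

```ruby
# Groups request records by Rails version and computes the share of 5xx responses.
def error_rate_by_version(requests)
  requests.group_by { |req| req[:rails] }.transform_values do |rows|
    errors = rows.count { |req| req[:status] >= 500 }
    (errors.to_f / rows.size).round(4)
  end
end

# Invented sample records.
sample = [
  { rails: '7.1', status: 200 },
  { rails: '7.1', status: 500 },
  { rails: '8.0', status: 200 },
  { rails: '8.0', status: 200 }
]
puts error_rate_by_version(sample).inspect
```

In practice the dashboard does this for you once every metric and log line is tagged with the Rails version of the process that emitted it.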

Deprecation burn-down

The 1,400 deprecation warnings on day one. We burned them down weekly, each cycle picking the top three deprecation messages by occurrence and fixing them across the codebase. By the end of the quarter the largest app was at 12, all acknowledged in a tracked issue, not silenced. The mistake I see teams make is silencing deprecations to make the noise go away. Don’t. The noise is the whole point.
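Picking the top three each week can be scripted against a test-run log. This is a sketch of one way to do it, not the exact script we used; top_deprecations is a hypothetical name, and it assumes the standard “DEPRECATION WARNING:” prefix that Rails puts on these messages.

```ruby
# Tallies deprecation messages in log lines and returns the most frequent ones
# as [message, count] pairs, highest count first.
def top_deprecations(log_lines, limit = 3)
  log_lines
    .filter_map { |line| line[/DEPRECATION WARNING: (.+?)(?: \(called from .*)?$/, 1] }
    .tally
    .max_by(limit) { |_message, count| count }
end

# Example use:
#   top_deprecations(File.readlines('log/test.log')).each do |message, count|
#     puts "#{count}\t#{message}"
#   end
```

Stripping the “(called from ...)” suffix matters, because otherwise every call site counts as its own message and nothing ever reaches the top of the list.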

Takeaways

Dual-boot, always. Same branch, one Gemfile, two lockfiles, CI runs both.
Zeitwerk-clean is a precondition, not a side quest. Run the eager-load check in CI.
Take the gem audit seriously. Vendor the 80 lines and delete the gem.
Roll out via traffic-routed pod pools, not by flipping a deploy.
Don’t silence deprecations. Burn them down weekly.

Thanks for reading. If you’ve got thoughts, send them my way.