Full-Text Search in Rails

When PostgreSQL full-text search is enough, when to reach for Searchkick on Elasticsearch, and where Meilisearch fits. Reindexing cost, ranking, facets, and typo tolerance from a Rails app I actually ran in production.

A Thursday afternoon at the creator-economy platform I was working at. A creator filed a ticket saying their community search was useless. Searching “onboarding checklist” returned nothing. “checklist onboarding” returned the right post in third place. Same words. Different order. Different results.

We were running LIKE '%query%' over a single column. That was the whole search. Nobody had needed better until that community got big enough that they did.
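The word-order failure falls straight out of how substring matching works. A minimal plain-Ruby sketch of it, assuming a single text column; the helper names and the sample title are made up for illustration and are not code from that app.

```ruby
# Hypothetical helpers showing the failure mode in miniature.

# Roughly what a case-insensitive LIKE '%query%' does: the whole query
# has to appear as one contiguous substring.
def substring_match?(text, query)
  text.downcase.include?(query.downcase)
end

# What users expect: every term appears somewhere, in any order.
def all_terms_match?(text, query)
  words = text.downcase.scan(/\w+/)
  query.downcase.scan(/\w+/).all? { |term| words.include?(term) }
end

title = "Creator checklist onboarding flow"

substring_match?(title, "checklist onboarding") # => true
substring_match?(title, "onboarding checklist") # => false
all_terms_match?(title, "onboarding checklist") # => true
```

Term-based matching is the minimum a real search needs. Stemming and ranking come on top of that.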

I’ve shipped three search stacks in Rails. Postgres full-text search, Elasticsearch via Searchkick on the combat-sports tournament platform I CTO’d in London, and Meilisearch on a smaller side product where I wanted typo tolerance without running a JVM. They’re not interchangeable. Pick wrong and you’ll either pay for a cluster you don’t need, or fight Postgres for ranking quality it was never going to give you.

Start with Postgres

Small corpus, normal write volume, no fuzzy matching, no aggressive ranking? Postgres full-text search is enough. You already have the database. Indexing happens inside the transaction that wrote the row. You can’t get out of sync.

# db/migrate/add_search_to_community_posts.rb
class AddSearchToCommunityPosts < ActiveRecord::Migration[7.1]
  def up
    add_column :community_posts, :search_vector, :tsvector

    execute <<~SQL
      CREATE INDEX index_community_posts_on_search_vector
      ON community_posts USING GIN (search_vector)
    SQL

    execute <<~SQL
      CREATE TRIGGER community_posts_search_vector_update
      BEFORE INSERT OR UPDATE ON community_posts
      FOR EACH ROW EXECUTE FUNCTION
      tsvector_update_trigger(
        search_vector, 'pg_catalog.english', title, body
      )
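    SQL

    # The trigger only fires on future writes. A no-op UPDATE fires it once for
    # every existing row; on a large table, run this in batches instead.
    execute "UPDATE community_posts SET title = title"
  end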

  def down
    execute "DROP TRIGGER community_posts_search_vector_update ON community_posts"
    remove_column :community_posts, :search_vector
  end
end

The trigger is the part people skip and then regret. Without it you maintain search_vector from Rails callbacks, which means anything that bypasses those callbacks (a raw SQL update, a data migration, an upsert_all) silently leaves your index stale. The trigger fires no matter who wrote the row. Use the trigger.

The query side is straightforward.

class CommunityPost < ApplicationRecord
  scope :search, ->(query) {
    return none if query.blank?

    sanitized = ActiveRecord::Base.sanitize_sql(["websearch_to_tsquery('english', ?)", query])

    select("community_posts.*, ts_rank(search_vector, #{sanitized}) AS rank")
      .where("search_vector @@ #{sanitized}")
      .order("rank DESC, created_at DESC")
      .limit(50)
  }
end

websearch_to_tsquery (PostgreSQL 11 and later) is the one to use. Accepts the same shape your users type into Google: quoted phrases, minus signs, OR. Doesn’t throw on bad input. The older to_tsquery does, which is how so many old Postgres search articles have at least one production bug in them.

This is enough for a lot of products. Ranking is mediocre but predictable. Typo tolerance is zero. Faceting is whatever you can express in WHERE.

When Postgres stops being enough

You’ll know. The ticket queue is the signal.

Three signs I’ve watched repeatedly:

Misspellings return zero results. pg_trgm helps a little, but combining trigram similarity with full-text ranking in one ORDER BY is the kind of SQL you write once and never want to touch again.
You need synonyms or per-locale stemming and you’re maintaining dictionaries by hand.
Search queries are starting to spike replica lag.
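To see why trigram similarity is a separate signal from full-text rank, here is a rough plain-Ruby approximation of what pg_trgm measures. It is a sketch, not the extension's actual implementation: the helper names are hypothetical, and the padding rule (two leading spaces, one trailing space per word) follows the pg_trgm documentation.

```ruby
require "set"

# Approximate pg_trgm trigram extraction: lowercase, split into alphanumeric
# words, pad each word with two leading spaces and one trailing space.
def trigrams(text)
  grams = text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end
  grams.to_set
end

# Shared trigrams divided by distinct trigrams across both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  return 0.0 if ta.empty? || tb.empty?

  (ta & tb).size.to_f / (ta | tb).size
end

trigram_similarity("checklist", "checklst") # 7 shared of 12 distinct, about 0.58
```

That score clears pg_trgm's default similarity threshold of 0.3, so the typo would match. The pain is that this number knows nothing about stemming or term frequency, and ts_rank knows nothing about typos, so blending the two in one ORDER BY means inventing your own weighting.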

That last one bit me on a Tuesday morning at the same creator platform. Aurora reader replica lag alert fired around 10:14 a.m. Pacific. /communities/:id/posts p99 went from about 120 ms to over 8 seconds within four minutes. The on-call’s first move was to bump the reader instance class up two tiers. Lag didn’t move. The readers weren’t CPU-starved, they were starved of WAL. I pulled pg_stat_activity on the writer and found a long-running ANALYZE on community_posts holding write-side locks. Killed it. Lag drained in about six minutes. The runbook now leads with one sentence: “Before touching reader scaling, check pg_stat_activity on the writer.” I’m the reason that sentence is in there.

The reason I’m telling you that story in a search article: community_posts was the same table we were running text search against. Search queries were hitting a busy table on a multi-terabyte cluster doing real work. The GIN index helps, but you cannot pretend the search workload is free. At some point search wants to be a separate system.

Searchkick on Elasticsearch

Searchkick is a thin gem on top of Elasticsearch with sane Rails defaults. I ran it on the federation platform for the public rankings page. Hot read path, fed by a Kafka projection, athlete names needing fuzzy matching across multiple alphabets, and a UX that punished anything slower than 200 ms.

# Gemfile
gem "searchkick", "~> 5.3"
gem "elasticsearch", "~> 8.10"

class Athlete < ApplicationRecord
  searchkick(
    word_start: [:name, :nickname],
    suggest: [:name],
    language: "english",
    text_start: [:slug],
    callbacks: :async
  )

  def search_data
    {
      name: name,
      nickname: nickname,
      slug: slug,
      federation_id: federation_id,
      weight_class: weight_class,
      country_code: country_code,
      ranking_points: ranking_points,
      active: active?,
      # Indexed so monitoring can sort on it and compare against Postgres.
      updated_at: updated_at
    }
  end
end

A few things there aren’t optional in production. callbacks: :async pushes index updates through Active Job (Sidekiq, in our case) instead of blocking the request that saved the model. word_start enables prefix matching for autocomplete. search_data is the explicit projection of what goes into the index, and it’s the most important method on this class. Indexes drift when search_data and your model diverge. Treat changes to it like a schema migration.
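One way to make "treat it like a schema migration" enforceable is to pin the shape of the projection in a test, so a changed key set fails loudly until someone decides whether a reindex is needed. A minimal sketch in plain Ruby; projection_fingerprint is a hypothetical helper, not part of Searchkick, and the sample hashes are illustrative.

```ruby
require "digest"

# Hypothetical helper: a short, stable fingerprint of a projection's shape.
# It depends only on the key names, not on key order or values.
def projection_fingerprint(search_data)
  Digest::SHA256.hexdigest(search_data.keys.map(&:to_s).sort.join(","))[0, 12]
end

original   = { name: "A. Silva", slug: "a-silva" }
same_shape = { slug: "j-doe", name: "J. Doe" }
new_field  = { name: "A. Silva", slug: "a-silva", country_code: "BR" }

projection_fingerprint(original) == projection_fingerprint(same_shape) # => true
projection_fingerprint(original) == projection_fingerprint(new_field)  # => false
```

Check the fingerprint of a sample record's search_data against a value committed in the test suite. Updating that value then happens in the same pull request as the reindex plan.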

The query side gets the ranking you actually wanted.

def self.public_search(query, weight_class: nil, country: nil, page: 1)
  conditions = { active: true }
  conditions[:weight_class] = weight_class if weight_class.present?
  conditions[:country_code] = country if country.present?

  Athlete.search(
    query.presence || "*",
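    # Per-field match types: name and nickname are indexed with word_start,
    # slug with text_start, so each field queries the sub-field it actually has.
    fields: [{ "name^3" => :word_start }, { "nickname^2" => :word_start }, { slug: :text_start }],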
    misspellings: { below: 5, edit_distance: 2 },
    where: conditions,
    aggs: [:federation_id, :weight_class, :country_code],
    order: [{ _score: :desc }, { ranking_points: :desc }],
    page: page,
    per_page: 25
  )
end

Field boosting (name^3) is the lever that fixes the “wrong order” problem I started with. Misspellings with edit distance 2 is the typo tolerance Postgres can’t give you cleanly, and below: 5 means the fuzzy pass only runs when the exact query finds fewer than five results. Aggregations are facets for free, returned in the same response. p99 on this query against a large athletes index sat under 80 ms.
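For a feel of what "edit distance 2" buys you, here is a small plain-Ruby sketch of the metric. It assumes the optimal-string-alignment variant, where swapping two adjacent characters counts as one edit, which is how Elasticsearch's fuzzy matching behaves by default. The matching itself is done by Lucene, not by anything like this code, and the names below are only examples.

```ruby
# Optimal string alignment distance: insertions, deletions, substitutions,
# and adjacent transpositions each cost one edit.
def edit_distance(a, b)
  a = a.chars
  b = b.chars
  # d[i][j] = distance between the first i chars of a and the first j of b.
  d = Array.new(a.size + 1) { |i| [i] + [0] * b.size }
  d[0] = (0..b.size).to_a

  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min

      # Adjacent transposition, e.g. "vl" vs "lv".
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[a.size][b.size]
end

edit_distance("silva", "sivla")        # => 1 (one transposition)
edit_distance("mcgregor", "macgregor") # => 1 (one insertion)
edit_distance("kitten", "sitting")     # => 3 (out of reach at distance 2)
```

Two edits cover most real typos in a name. Past that you start matching different people, which is why I left the ceiling at 2.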

The cost of Elasticsearch is real. Three master-eligible nodes plus data nodes. Reindexing a large index is a careful operation, not a rake task you run blindly. And the index can drift from the system of record in ways that are silent and embarrassing.

That last one hit us during a federation tournament on a Saturday night. The new champion’s ranking was supposed to update within minutes. Eight hours later the rankings page was still parading the old number one around. The athlete had a verified account, spotted it before we did, and tweeted a screenshot tagging the federation. The rankings-indexer had stopped projecting events overnight but was still consuming from Kafka, so nothing paged. Logs were quiet. A restart cleared the offset and started reprojecting from a 12-hour-stale checkpoint. The fix was a full reindex from PostgreSQL into a new ES index, then atomically swapping the read alias over to it. About 25 minutes. Root cause: the indexer’s bulk-write client had silently entered “circuit open” after a transient cluster blip the night before, with no automated retry back to closed. The lesson has stuck with me on every search project I’ve touched since: measure freshness, not throughput. “Is the consumer consuming” is not the same question as “is the index correct”.

# app/jobs/search_freshness_check_job.rb
class SearchFreshnessCheckJob < ApplicationJob
  queue_as :monitoring
  STALENESS_THRESHOLD = 5.minutes

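  def perform
    pg_max = Athlete.maximum(:updated_at)
    return if pg_max.nil? # empty table, nothing to compare

    # With load: false the results are raw index documents, so updated_at comes
    # back as an ISO 8601 string (and only exists if search_data includes it).
    raw = Athlete.search("*", load: false, limit: 1, order: { updated_at: :desc })
                 .results.first&.updated_at
    es_max = raw && Time.zone.parse(raw)
    drift = es_max && (pg_max - es_max)

    return if drift && drift < STALENESS_THRESHOLD

    # A nil es_max (empty index) is treated as drift and alerts as well.
    Honeybadger.notify(
      "Search index drift",
      context: { pg_max: pg_max, es_max: es_max, drift_seconds: drift&.to_i }
    )
  end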
end

That job runs every minute. It’s the dumbest possible freshness check and it’s saved us twice since.
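The freshness check catches the symptom. The root cause in that incident was a breaker that opened and never tried to close again. A minimal plain-Ruby sketch of a breaker that probes its way back after a cooldown; this is a hypothetical class for illustration, not the client we were running, and the thresholds are arbitrary.

```ruby
# Hypothetical circuit breaker with a half-open probe.
class CircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(failure_threshold: 3, cooldown: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @cooldown = cooldown # seconds to wait before probing again
    @clock = clock       # injectable so tests can control time
    @failures = 0
    @state = :closed
    @opened_at = nil
  end

  def call
    if @state == :open
      # Fail fast during the cooldown; after it, let one probe through.
      raise OpenError, "circuit open" if @clock.call - @opened_at < @cooldown

      @state = :half_open
    end

    result = yield
    @failures = 0
    @state = :closed
    result
  rescue OpenError
    raise # fast failures are not counted as new failures
  rescue StandardError
    @failures += 1
    # A failed probe reopens immediately; otherwise open at the threshold.
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
    raise
  end
end
```

Wrap the bulk write in breaker.call { ... } and alert when state stays :open for longer than a few cooldowns. The half-open probe is the piece that was missing that night.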

Where Meilisearch fits

Meilisearch is what I reach for when I want typo tolerance and instant-search latency without the operational weight of Elasticsearch. Single binary, forgiving indexes, default ranking is shockingly good out of the box. I ran it on a smaller side product, not at the scale of the creator platform. Trade-off is honest. You give up custom analyzers, complex aggregation pipelines, and the broader monitoring ecosystem. You get a search engine that’s faster to set up and easier to keep happy.

If your corpus fits on one machine and your team is two people, Meilisearch is a serious answer. If it’s sharded, multi-tenant, and you need fine-grained relevance tuning per index, you want Elasticsearch. I would not put Meilisearch in front of millions of customers without a careful eval first.

How I actually decide

Small corpus, no typo tolerance, moderate writes: Postgres FTS with tsvector and a trigger.
Need ranking quality, facets, fuzzy match, multi-locale, or search reads are competing with the rest of the database: Searchkick on Elasticsearch.
Small product, want typo tolerance for free, no JVM: Meilisearch.

Most common mistake I see is jumping to Elasticsearch on day one. You’ll spend a week on the cluster instead of a day on the feature, and your users won’t notice until your corpus is two orders of magnitude bigger.

Takeaways

Postgres FTS with a tsvector column, GIN index, and database trigger is the right default for most Rails apps.
Use websearch_to_tsquery, not to_tsquery. Same input users actually type.
When you outgrow Postgres, Searchkick on Elasticsearch is the boring, proven move.
Field boosting and an explicit search_data projection are how you stop “wrong-order results” tickets.
Index drift is silent. Run a freshness check job. Measure correctness, not throughput.
Meilisearch is a serious option at small scale. Not my default at the high end.

Thanks for reading. If you’ve got thoughts, send them my way.