
Vector vs Full-text vs Hybrid Search: When to Use Which

January 8, 2026


TL;DR

  • Vector search wins on meaning. It loses on exact tokens. Error codes, SKUs, function names, version numbers, anything where the literal string matters.
  • Full-text search (BM25) wins on exact tokens and rare terms. It loses on synonyms and questions phrased differently from the answer.
  • Hybrid search (vectors plus BM25 plus filters in one query) is what most RAG pipelines need. Not because it's trendy. Because real user queries mix both modes. "Show me the auth bug from last Tuesday" needs auth bug matched literally and last Tuesday resolved by metadata.
  • Most teams default to vector-only because that's what their database makes easy. Pure vector search will happily return "Error 222" when you asked about "Error 221". Close enough in embedding space, completely wrong in production. That's the failure mode.
  • LambdaDB ships hybrid as the default. One query, vectors + BM25 + filters, $0 idle, up to 90% cheaper than Pinecone for typical workloads. Code at the bottom.

What is hybrid search?

Three one-sentence definitions, if that's all you came for:

Vector search ranks documents by semantic similarity between dense embeddings of the query and the document. BM25 ranks documents by weighted term overlap, where rarer matching terms count more. Hybrid runs both, fuses the scores (usually via Reciprocal Rank Fusion), and applies metadata filters in the same query. So you get semantic recall, exact-token precision, and scoping in one round trip.

For RAG, hybrid is the correct default. The rest of this post is the why, with concrete failure modes, a decision tree, and working code.


The three options, in one paragraph each

Vector search turns text into a 1024- or 1536-dimensional embedding and ranks by cosine similarity. "Cancel my subscription" and "How do I unsubscribe?" land near each other in vector space. Strong on meaning. Weak on literals. Error 221 and Error 222 are essentially the same point to a model that doesn't care about the digit.
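To make "close in vector space" concrete, here's a minimal cosine-similarity sketch. The 4-dimensional vectors are made-up stand-ins for real 1024/1536-dim model output, chosen only to show that paraphrases score high and unrelated text scores low.

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" (not real model output):
const cancelQuery = [0.9, 0.1, 0.2, 0.1];       // "cancel my subscription"
const unsubscribeDoc = [0.85, 0.15, 0.25, 0.1]; // "to unsubscribe, go to..."
const weatherDoc = [0.1, 0.9, 0.1, 0.8];        // unrelated content

cosineSimilarity(cancelQuery, unsubscribeDoc); // high: paraphrases sit close
cosineSimilarity(cancelQuery, weatherDoc);     // low: different topic
```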

Full-text search in practice means BM25. It ranks by term overlap weighted by inverse document frequency. Rare terms count more, common ones count less. Error 222 matches Error 222 exactly and ignores documents that just say "an error happened." Strong on literals. Weak on paraphrase. cancel subscription and unsubscribe share zero terms.
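The IDF weighting is the whole trick. A toy scorer (simplified BM25 with textbook constants, not any engine's exact implementation) shows why the rare token 221 dominates while the ubiquitous "error" barely counts:

```typescript
// Minimal BM25 over a three-document toy corpus.
const K1 = 1.2, B = 0.75;

const docs = [
  "error 221 invalid api key",
  "error 222 rate limit exceeded",
  "an error happened during checkout",
];
const tokenized = docs.map((d) => d.split(/\s+/));
const avgLen = tokenized.reduce((s, t) => s + t.length, 0) / tokenized.length;

// Inverse document frequency: rare terms get a large weight,
// terms that appear in every document get one near zero.
function idf(term: string): number {
  const n = tokenized.filter((t) => t.includes(term)).length;
  return Math.log(1 + (tokenized.length - n + 0.5) / (n + 0.5));
}

function bm25(query: string, docIndex: number): number {
  const terms = tokenized[docIndex];
  let score = 0;
  for (const q of query.split(/\s+/)) {
    const tf = terms.filter((t) => t === q).length;
    score +=
      (idf(q) * (tf * (K1 + 1))) /
      (tf + K1 * (1 - B + B * (terms.length / avgLen)));
  }
  return score;
}

// "221" is rare, so doc 0 wins decisively; "error" is everywhere, so it
// contributes almost nothing to any document's score.
const scores = docs.map((_, i) => bm25("error 221", i));
```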

Hybrid search runs both, fuses the scores (RRF is the common default), and applies metadata filters in the same call. Each side covers the other's blind spot. The cost is one extra index and a slightly more complex query. The benefit is recall that doesn't silently collapse on the queries your users type. Practitioners on r/Rag put it bluntly: hybrid retrieval gives you better precision than pure vector.
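RRF itself is a few lines. A sketch (k = 60 is the conventional constant; the doc ids are hypothetical) that fuses one vector ranking and one BM25 ranking:

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document;
// documents ranked well in both lists accumulate the highest fused score.
function rrfFuse(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, i) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Vector side likes the paraphrase match; BM25 side likes the literal match.
const vectorRanking = ["doc-cancel", "doc-faq", "doc-errors"];
const bm25Ranking = ["doc-errors", "doc-cancel", "doc-pricing"];

const fused = rrfFuse([vectorRanking, bm25Ranking]);
// "doc-cancel" places high in both lists, so it tops the fused ranking.
```

Because RRF only looks at ranks, it never has to reconcile a cosine score in [0, 1] with an unbounded BM25 score. That's why it's the common default.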


When each one fails

I keep these three on a sticky note. Every time someone asks me "do I really need hybrid?" I read them back.

Vector-only fails on literals

Query:  "Why am I getting Error 221 from the SDK?"

A vector index ranks by semantic closeness. To the embedding model, Error 221, Error 222, and error response are all nearby points. Your top result is just as likely to be a doc about Error 401 as the one ticket that mentions 221. Your RAG generator then hallucinates a fix for the wrong error.

This is the "close enough" failure. It does not throw. It does not warn. It returns five plausible-looking results and one of them happens to be right. Cute in a demo. Catastrophic in production.

BM25-only fails on synonyms

Query:  "How do I cancel my subscription?"
Doc:    "To unsubscribe, navigate to Account > Billing > End plan."

Term overlap: zero. BM25 returns nothing relevant. Your support bot replies "I couldn't find anything about that." The doc is sitting right there.

Hybrid handles both

Vectors catch the cancel ↔ unsubscribe synonymy. BM25 catches the Error 221 literal. Filters scope by product_version, customer_tier, lang=en. One query, one ranking, one round trip.


When to use vector search alone

Use it alone when all three hold:

  1. Queries are paraphrased, not literal. "summarize the meeting", "what's the policy on refunds", "find me something about cold starts"
  2. Documents are prose, not structured strings. Articles, transcripts, customer messages
  3. There are no exact tokens that must match. No error codes, no IDs, no version strings, no proper nouns

Customer support over a knowledge base of help articles. Semantic recommendations. Summarization-style retrieval. Finding similar long documents. These fit cleanly.

Code search doesn't fit (function names are exact tokens). Neither does e-commerce (SKUs and brand names), log retrieval (timestamps and error codes), or legal search (statute numbers).

A useful gut check: if your users would ever paste a string from their screen into the search box, you need full-text in the mix.


When to use BM25 alone

Use it alone when queries are short and literal (product codes, error messages, IDs), the corpus is structured (logs, code, catalog entries), or you don't have an embedding budget. Sometimes the simplest answer is the right one.

BM25 is also the right baseline to measure your vector search against. If your vector setup can't beat BM25 on your own query log, something is wrong. The embedding model, the chunking, or the assumption that your queries are paraphrased to begin with. Simon Willison made this point on HN: LLMs paired with grep or full-text search are already great at fuzzy lookup. I've watched teams ship pure vector, get worse results than the keyword search they replaced, and not notice for weeks because everyone assumed "vectors are smarter."
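Measuring that baseline takes an afternoon, not a project. A sketch of recall@k over a labeled query log, assuming a hypothetical Retriever function type you'd back with your vector and BM25 paths:

```typescript
// recall@k: for each logged query, did the known-relevant doc make the top k?
type Retriever = (query: string, k: number) => string[];

function recallAtK(
  retriever: Retriever,
  queryLog: { query: string; relevantId: string }[],
  k = 5,
): number {
  const hits = queryLog.filter(({ query, relevantId }) =>
    retriever(query, k).includes(relevantId),
  ).length;
  return hits / queryLog.length;
}

// Toy stand-in retriever: always returns the same ids.
const stub: Retriever = () => ["doc-a", "doc-b"];
const queries = [
  { query: "cancel subscription", relevantId: "doc-a" }, // hit
  { query: "error 221", relevantId: "doc-c" },           // miss
];
recallAtK(stub, queries); // 0.5
```

Run it twice on the same log, once per retriever. The comparison is the point.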

There's also the language problem. If your corpus is Korean, Japanese, or Chinese, default whitespace BM25 silently tanks recall. 맛집을 and 맛집 look like different terms to a whitespace tokenizer. LambdaDB ships Korean and Japanese morphological analyzers as defaults; most other engines make you go install Nori or MeCab manually. Annoying.
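The failure is two lines long. 맛집을 is 맛집 ("good restaurant") plus the object particle 을, but a whitespace tokenizer has no idea:

```typescript
// Whitespace tokenization keeps the 을 particle attached, so the inflected
// form in the document never matches the bare noun a user would type.
const doc = "서울 맛집을 추천해 주세요"; // "please recommend Seoul restaurants"
const tokens = doc.split(/\s+/);

tokens.includes("맛집을"); // true: the inflected form is indexed as a term
tokens.includes("맛집");   // false: the bare-noun query silently misses
```

A morphological analyzer like Nori indexes the stem 맛집 instead, which is exactly what makes the query match.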


Why hybrid is the right default for RAG

For the AI/RAG case (and especially for agent memory, where the corpus is a mix of chat history, tool docs, retrieved facts, and code) hybrid is what I recommend. Three reasons.

Real user queries mix modes. "Find the doc about webhook signature verification" has one term that needs paraphrase tolerance (doc about → an article) and one that's a literal (webhook signature verification). Vector-only loses the literal. BM25-only loses the paraphrase. The user doesn't know or care which.

RAG cost is dominated by the LLM, not retrieval. Your retrieval cost is usually under 5% of total. Spending an extra few milliseconds and a tiny bit of storage to run BM25 alongside vectors is cheap insurance against feeding garbage into a $5/million-token generator. Do the math.
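Here's that math with illustrative numbers. The per-query retrieval price and answer length are assumptions, not anyone's published rates:

```typescript
// Back-of-envelope cost per RAG query: generation dwarfs retrieval.
const GEN_PRICE_PER_TOKEN = 5 / 1_000_000; // $5 per million tokens
const tokensPerAnswer = 800;               // assumed generated answer length
const retrievalCostPerQuery = 0.0001;      // assumed hybrid query cost

const genCost = tokensPerAnswer * GEN_PRICE_PER_TOKEN; // ≈ $0.004
const retrievalShare =
  retrievalCostPerQuery / (genCost + retrievalCostPerQuery); // ≈ 2.4%
```

Even doubling the retrieval side moves the total by a couple of percent. Feeding the generator a wrong document wastes the other ~98%.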

Filters are the unsung hero. Most "vector search isn't working" stories are filter problems. tenant_id, language, created_after, is_published. These turn a 10M-document corpus into the right 50K to search. A real hybrid system applies them as part of the query, not as a post-filter that breaks your top-k.
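The top-k breakage is easy to demonstrate. A toy ranked list with a tenant filter (field names are illustrative):

```typescript
// Post-filtering breaks top-k: take the k best overall, then filter, and you
// can end up with fewer than k results even though enough matches exist.
type Doc = { id: string; score: number; tenant: string };

const ranked: Doc[] = [
  { id: "a", score: 0.95, tenant: "acme" },
  { id: "b", score: 0.93, tenant: "acme" },
  { id: "c", score: 0.91, tenant: "globex" },
  { id: "d", score: 0.88, tenant: "globex" },
  { id: "e", score: 0.85, tenant: "globex" },
];

const k = 3;
const wantTenant = "globex";

// Post-filter: top-k first, filter second → only 1 of 3 slots survives.
const postFiltered = ranked.slice(0, k).filter((d) => d.tenant === wantTenant);

// Pre-filter (what an integrated hybrid query does): filter first, then
// top-k → a full 3 results.
const preFiltered = ranked.filter((d) => d.tenant === wantTenant).slice(0, k);
```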

The catch: most vector databases don't make hybrid as effortless as it should be.

Pinecone supports sparse-dense hybrid, but you supply the sparse vectors yourself (typically via a separate encoder) and you pay storage for a second index. Even Pinecone's own docs nudge users toward alternatives when hybrid keyword + vector search is a hard requirement. Qdrant and Weaviate support BM25 + dense natively, but the fusion + filter API is verbose enough that teams skip it on day one. pgvector technically lets you bolt on Postgres FTS, but you're managing two indexes and writing the fusion yourself. Elasticsearch has BM25 natively and recently added vectors, but you pay enterprise infra costs and there is no $0 idle.

This is the gap LambdaDB fills.


Decision tree

Pick the first one that fits.

  1. Will your users ever search for exact strings? (error codes, SKUs, names, code symbols) → You need full-text in the mix. Skip to hybrid.
  2. Are your queries always paraphrased prose, with no literal terms? (rare in practice; be honest with yourself) → Vector alone is fine. Save the BM25 cost.
  3. Is the corpus highly structured? (logs, IDs, catalog rows) → BM25 alone may be enough. Try it before adding vectors.
  4. RAG over mixed content? (docs + code + tickets + chat) → Hybrid. Always. Don't overthink it.
  5. Need to scope by tenant/user/language/date? → You need metadata filters in the same query. Hybrid systems give you this. Two-system Frankenstein setups do not.

If you're building agent memory specifically (chat history, retrieved facts, tool descriptions, session transcripts) that's case 4 with extra metadata. Half the recent r/LocalLLaMA and r/LangChain posts on agent memory describe pipelines that fuse FTS, dense vectors, and recency decay through RRF. Hybrid + filters is the answer; the question is whether you want to wire it together yourself or use one query.


Why most RAG teams default to vector-only anyway

Three reasons, none of them good.

The tutorial economy. Every "build a RAG app in 10 minutes" tutorial is OpenAI embed → Pinecone upsert → cosine similarity → LLM. Hybrid adds two lines and a paragraph of explanation. Gets cut from blog posts. So the default mental model is vector-only.

The vendor's path of least resistance. If your database makes hybrid hard (separate sparse index, separate query, manual fusion) you don't add it on day one. By month three you've shipped to users and you've never measured what your retrieval is missing.

The "vectors are smarter" assumption. Embeddings feel like the AI part. BM25 feels old. So teams skip it without measuring. Then they spend months tuning chunk sizes and reranker models to fix problems BM25 would've handled for free.

The retrieval layer your agents need is hybrid + filters + cheap idle. Not pure vector with an enterprise minimum.


How LambdaDB does hybrid (with code)

One index. Vectors and full-text live together. One query, one ranking. Filters apply to both sides. Korean and Japanese analyzers on by default. $0 to start, $0 when idle.

Install:

npm install @functional-systems/lambdadb

Create a collection. Declare your vector and text fields up front; that's it. (Collection docs →)

import { LambdaDBClient } from "@functional-systems/lambdadb";

const client = new LambdaDBClient({
  projectApiKey: process.env.LAMBDADB_API_KEY!,
  baseUrl: process.env.LAMBDADB_BASE_URL!,
  projectName: process.env.LAMBDADB_PROJECT!,
});

await client.createCollection({
  collectionName: "support-docs",
  indexConfigs: {
    text:    { type: "text", analyzers: ["english"] },
    vector:  { type: "vector", dimensions: 1536, similarity: "cosine" },
    product: { type: "keyword" },
    lang:    { type: "keyword" },
    version: { type: "keyword" },
  },
});

Insert. The text field is auto-indexed for BM25. The vector is your embedding. Keyword fields are filterable. (Upsert docs →)

const collection = client.collection("support-docs");

await collection.docs.upsert({
  docs: [
    {
      id: "doc-001",
      text: "To unsubscribe, navigate to Account > Billing > End plan.",
      vector: await embed("To unsubscribe, navigate to Account > Billing..."),
      product: "billing",
      lang: "en",
      version: "4",
    },
    {
      id: "doc-002",
      text: "Error 221: invalid API key. Regenerate from the dashboard.",
      vector: await embed("Error 221: invalid API key. Regenerate from..."),
      product: "auth",
      lang: "en",
      version: "4",
    },
  ],
});

Hybrid query. Vectors + BM25 + filter, one call. RRF (Reciprocal Rank Fusion) fuses the two rankings server-side (Hybrid search docs →):

const results = await collection.query({
  size: 5,
  query: {
    rrf: [
      { queryString: { query: "cancel subscription", defaultField: "text" } },
      { knn: { field: "vector", queryVector: await embed("How do I cancel?"), k: 5 } },
    ],
  },
  filter: {
    queryString: { query: "lang:en AND version:4" },
  },
  consistentRead: true,
});

That's the whole API. No separate sparse index. No client-side fusion. No "now go install the FTS plugin" step. RRF is the default fuser. mm (min-max) and l2 (L2-norm) are also available if you want different rescoring.

Pure-vector for cases where it's correct? Drop the queryString side and use knn directly:

const results = await collection.query({
  size: 5,
  query: {
    knn: { field: "vector", queryVector: await embed("summarize Q3 customer feedback"), k: 5 },
  },
  filter: { queryString: { query: "product:feedback" } },
  consistentRead: true,
});

Pure-BM25? Drop knn:

const results = await collection.query({
  size: 5,
  query: {
    queryString: { query: "Error 221", defaultField: "text" },
  },
  filter: { queryString: { query: "product:auth" } },
  consistentRead: true,
});

Same API, same collection, three modes. Pick what fits the query.


What hybrid costs (vs Pinecone)

A note on cost, because this is the question every solo founder asks me. Pinecone's pricing change nuked some hobby projects. $50/mo minimum even when your index is idle. One thread documents a RAG support chatbot starting at $50/mo, climbing to $380, then $2,847/mo as usage grew. The author's point: vector database costs that scale linearly with usage don't fit teams that need predictable infrastructure budgets. And hybrid on Pinecone means a sparse index too. More storage, more bill.

LambdaDB runs entirely on serverless components. No servers. No idle cost. You pay per query and per GB stored. A side project with 10K vectors and 100 queries a day costs cents. A hobby agent that goes silent overnight costs $0 overnight.

Hybrid does not change that math. Same pricing whether you query vectors, BM25, or both.


FAQ

Q: What's the difference between vector search and full-text search? A: Vector search ranks by semantic similarity between dense embeddings. Strong on paraphrase, weak on exact tokens. BM25 ranks by weighted term overlap. Strong on exact tokens and rare terms, weak on synonyms.

Q: Is hybrid better than vector for RAG? A: For most RAG pipelines, yes. Real queries mix paraphrase and literal terms (error codes, function names, IDs), and vector-only silently fails on the literal half. Hybrid covers both modes plus metadata filters in one query.

Q: When should I use vector search alone? A: When all queries are paraphrased prose, all documents are long-form text, and no exact-string matching is ever required. Examples: semantic recommendation over articles, summarization retrieval. If users ever paste a string from their screen, you need full-text.

Q: When is BM25 enough on its own? A: When the corpus is structured (logs, catalog rows, code) and queries are short literal terms (SKUs, error codes, names). BM25 is also the correct baseline to measure any vector search against. If your vector setup can't beat BM25 on your real query log, something is wrong.

Q: Does Pinecone support hybrid? A: Yes, but you supply the sparse vectors yourself (via a separate encoder) and pay storage for a second index. LambdaDB exposes hybrid as a single query() call with one index. Vectors, BM25, and filters in one query, no separate sparse index to manage.

Q: What is Reciprocal Rank Fusion (RRF)? A: The standard score-fusion method for hybrid search. It ranks each result list independently, then scores each document as the sum of 1 / (k + rank) across lists. It's robust to score-scale differences between vector cosine similarity and BM25, which is why most production hybrid systems default to it.

Q: Is hybrid the right default for AI agent memory? A: Yes. Agent memory blends chat transcripts (paraphrase-heavy), tool docs and code (literal-heavy), and metadata like timestamps and tenant IDs. Vector-only loses tool names and error codes; BM25-only loses semantic recall over conversations. Production agent-memory pipelines on r/LocalLLaMA almost universally fuse dense + BM25 + recency through RRF.


TL;DR, again, for skimmers

  • Vector search: meaning, not literals.
  • BM25: literals, not meaning.
  • Hybrid: both, plus filters, in one query. The right default for RAG and for agent memory.
  • Pure vector is the most common day-one mistake because tutorials skip BM25.
  • LambdaDB ships hybrid + filters + multilingual analyzers as defaults. One query, $0 idle.

Try it

Free tier, no credit card, $0 idle.

If your RAG pipeline is vector-only today, run BM25 against the same query log for one afternoon. The result usually settles the question.