Precise Hybrid Search with LambdaDB

In this tutorial, we will build a small support knowledge base search experience with LambdaDB.

You will:

Create a free LambdaDB Cloud project.
Install the Python SDK.
Create a collection with text, keyword, and managed embedding vector fields.
Load sample support articles.
Try increasingly precise queryString searches using Lucene syntax.
Combine lexical search with vector search using RRF hybrid ranking.

The goal is to show how LambdaDB can handle both exact product language and natural user questions in one retrieval layer, then express the final hybrid retrieval strategy as a single query object.

If you have used Elasticsearch, many of the lexical controls in this tutorial will feel familiar: phrases, proximity, fuzziness, boosts, minimum-should-match, and filters. The difference is the operating model: LambdaDB packages those Lucene-style controls with managed embeddings and hybrid ranking in a fully serverless database, so teams can get precise search without provisioning or tuning a search cluster.

Scenario

Imagine a user asks:

Can I get my money back if I cancel my annual plan?

Your support articles may not use those exact words. Some may say “refund policy”, “annual subscription cancellation”, “billing credit”, or “money-back guarantee”.

Pure keyword search is precise, but it can miss paraphrases. Pure vector search catches semantic meaning, but it may not prioritize exact policy wording. With LambdaDB, we can use both.
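To make the gap concrete, here is a small standalone sketch (it does not call LambdaDB) that compares the words in the user's question with three of the article titles used later in this tutorial. The tokens helper is a hypothetical, crude stand-in for a real text analyzer: it only lowercases and splits on non-alphanumeric characters.

```python
import re


def tokens(text):
    """Lowercase word tokens; a crude stand-in for a real analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


question = "Can I get my money back if I cancel my annual plan?"
titles = [
    "Refund policy for annual plans",
    "Cancel an annual subscription",
    "Billing credits after plan changes",
]

for title in titles:
    # Terms that appear in both the question and the title.
    shared = sorted(tokens(question) & tokens(title))
    print(f"{title!r}: shared terms = {shared}")
```

The question never uses the word refund, so a purely lexical match has very little to connect it to the refund policy article. That is the gap the vector side of a hybrid query is meant to close.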

Prerequisites

You need:

a LambdaDB Cloud account
Python 3.9+
the LambdaDB Python SDK

If you do not have an account yet, open LambdaDB Cloud, sign up, and create a project. Accounts without a payment method start on the Free plan.

When you create a project, copy:

your project API key
your region-specific base URL
your project name

Set them as environment variables:

export LAMBDADB_PROJECT_API_KEY="YOUR_API_KEY"
export LAMBDADB_BASE_URL="YOUR_BASE_URL"
export LAMBDADB_PROJECT_NAME="YOUR_PROJECT_NAME"

Install the SDK:

pip install lambdadb
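Before moving on, it can help to confirm that all three variables are visible to Python, since the script reads them with os.environ[...] and would otherwise stop with a KeyError. The following is a minimal sketch; the helper name missing_env_vars is ours and is not part of the LambdaDB SDK.

```python
import os

REQUIRED_VARS = (
    "LAMBDADB_PROJECT_API_KEY",
    "LAMBDADB_BASE_URL",
    "LAMBDADB_PROJECT_NAME",
)


def missing_env_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All LambdaDB environment variables are set.")
```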

Step 1: Create a Collection

Create a file called support_search_tutorial.py.

We will use:

title: text search over article titles
content: text search over article bodies
section: exact keyword filtering
url: exact keyword filtering
content_embedding: managed embedding vector generated from content

import os
import time

from lambdadb import LambdaDB


API_KEY = os.environ["LAMBDADB_PROJECT_API_KEY"]
BASE_URL = os.environ["LAMBDADB_BASE_URL"]
PROJECT_NAME = os.environ["LAMBDADB_PROJECT_NAME"]
COLLECTION_NAME = "support-search-demo"


def wait_until_active(client, collection_name, timeout_s=120):
    """Poll the collection every 3 seconds until its status reports ACTIVE."""
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        meta = client.collection(collection_name).get()
        status = str(meta.collection_status)
        if "ACTIVE" in status:
            return
        print(f"Collection status: {status}. Waiting...")
        time.sleep(3)
    raise TimeoutError(f"Collection {collection_name} did not become ACTIVE in time")


with LambdaDB(
    project_api_key=API_KEY,
    base_url=BASE_URL,
    project_name=PROJECT_NAME,
) as client:
    try:
        client.collections.create(
            collection_name=COLLECTION_NAME,
            index_configs={
                "title": {
                    "type": "text",
                    "analyzers": ["english"],
                },
                "content": {
                    "type": "text",
                    "analyzers": ["english"],
                },
                "section": {"type": "keyword"},
                "url": {"type": "keyword"},
                "content_embedding": {
                    "type": "vector",
                    "managedEmbedding": True,
                    "embedding": {
                        "provider": "openai",
                        "model": "text-embedding-3-small",
                        "sourceField": "content",
                    },
                },
            },
        )
        print(f"Created collection: {COLLECTION_NAME}")
    except Exception as exc:
        print(f"Create skipped or failed: {exc}")

    wait_until_active(client, COLLECTION_NAME)

Run it once:

python support_search_tutorial.py

At this point, you have a collection that supports both Lucene-style lexical search and semantic vector search.

Step 2: Load Sample Support Articles

Append this sample data to the same script, inside the with LambdaDB(...) as client: block, after the wait_until_active(...) call. The snippets in Steps 2 and 4-12 are shown without indentation, so indent them one level to keep them inside the with block. The Full Script section near the end has everything assembled and is copy-paste ready.

coll = client.collection(COLLECTION_NAME)

docs = [
    {
        "id": "refund-policy",
        "title": "Refund policy for annual plans",
        "content": (
            "Customers on an annual subscription may request a refund within "
            "30 days of purchase. Refund eligibility depends on account usage "
            "and cancellation timing."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/refund-policy",
    },
    {
        "id": "cancel-annual-plan",
        "title": "Cancel an annual subscription",
        "content": (
            "You can cancel an annual plan from billing settings. Cancellation "
            "stops renewal at the end of the billing period. Contact support "
            "for refund review."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/cancel-annual-plan",
    },
    {
        "id": "billing-credit",
        "title": "Billing credits after plan changes",
        "content": (
            "When a customer downgrades or changes a paid plan, unused time may "
            "be converted into account credit depending on the subscription terms."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/credits",
    },
    {
        "id": "money-back-guarantee",
        "title": "Money-back guarantee",
        "content": (
            "New customers can ask for their money back during the trial period. "
            "This policy is separate from annual contract cancellation."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/money-back-guarantee",
    },
    {
        "id": "change-password",
        "title": "Change your password",
        "content": (
            "Users can update their password from account settings. If you forgot "
            "your password, use the reset link on the sign-in page."
        ),
        "section": "account",
        "url": "https://docs.example.com/account/change-password",
    },
    {
        "id": "api-rate-limits",
        "title": "API rate limits",
        "content": (
            "API requests are rate limited by project. Higher limits are available "
            "on paid plans."
        ),
        "section": "developers",
        "url": "https://docs.example.com/developers/rate-limits",
    },
]

coll.docs.upsert(docs=docs)
print(f"Upserted {len(docs)} support articles")

Run the script again:

python support_search_tutorial.py

The regular upsert path generates the managed embeddings and stores them with the documents, which is why the sample documents carry no content_embedding value of their own. LambdaDB is eventually consistent by default, so the queries below pass consistent_read=True to make the freshly upserted sample documents visible immediately.
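Because the embedding is derived from the content field, it is worth checking documents before upserting them. The sketch below is a plain-Python pre-flight check and does not call LambdaDB. The function name find_doc_problems is hypothetical, and the assumption that an empty content field is undesirable is ours; this tutorial does not state how the service handles that case.

```python
def find_doc_problems(docs, required=("id", "title", "content", "section", "url")):
    """Return human-readable problems: missing or empty fields and duplicate ids."""
    problems = []
    seen_ids = set()
    for index, doc in enumerate(docs):
        for field in required:
            value = doc.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"doc #{index}: missing or empty {field!r}")
        doc_id = doc.get("id")
        if doc_id is not None:
            if doc_id in seen_ids:
                problems.append(f"doc #{index}: duplicate id {doc_id!r}")
            seen_ids.add(doc_id)
    return problems


# Two made-up documents: the second has empty content and reuses the first id.
sample = [
    {"id": "a", "title": "T", "content": "Some text", "section": "billing", "url": "u"},
    {"id": "a", "title": "T", "content": "  ", "section": "billing", "url": "u"},
]
for problem in find_doc_problems(sample):
    print(problem)
```

Calling find_doc_problems(docs) on the tutorial's sample list before coll.docs.upsert(...) should return an empty list.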

Step 3: Add a Helper to Print Results

Add this helper near the top of the file, above the with block (for example, right after wait_until_active):

def print_results(label, res):
    print(f"\n=== {label} ===")
    for item in res.docs:
        doc = item.doc
        # str() keeps the width specs from raising TypeError if a field is missing.
        print(
            f"{str(doc.get('id')):<24} "
            f"score={item.score:.4f} "
            f"section={str(doc.get('section')):<10} "
            f"title={doc.get('title')}"
        )

Now we can run several searches and compare the results.

Step 4: Start with a Basic Lexical Query

Add this query after the upsert call:

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "refund cancellation subscription",
            "defaultField": "content",
        }
    },
    consistent_read=True,
)
print_results("Basic queryString search", res)

This looks for documents that contain terms such as refund, cancellation, or subscription in the content field.

This is a useful baseline, but it is still broad. Next, we will make it more precise.

Step 5: Filter by Exact Metadata

Because section is a keyword field, we can filter exact values such as billing.

res = coll.query(
    size=5,
    query={
        "bool": [
            {
                "queryString": {
                    "query": "refund cancellation subscription",
                    "defaultField": "content",
                }
            },
            {
                "queryString": {
                    "query": "billing",
                    "defaultField": "section",
                },
                "occur": "filter",
            },
        ]
    },
    consistent_read=True,
)
print_results("Lexical search filtered to billing", res)

Now unrelated account or developer documents are removed before ranking.
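The idea of filtering before ranking can be modelled in a few lines of plain Python. This is only a toy illustration, not how LambdaDB scores documents: the three articles are made up for the example (one developers article deliberately mentions subscription), and the score is a simple count of matching words.

```python
import re

# Made-up mini corpus; the developers article also mentions "subscription".
articles = [
    {"id": "refund-policy", "section": "billing",
     "content": "Request a refund after subscription cancellation."},
    {"id": "billing-credit", "section": "billing",
     "content": "Unused subscription time becomes account credit."},
    {"id": "webhooks", "section": "developers",
     "content": "Subscription events are delivered by webhook."},
]


def term_hits(text, terms):
    """Count how many times any of the terms occurs in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(words.count(term) for term in terms)


def search(articles, terms, section=None):
    # Filter first: articles in other sections never reach the scoring step.
    candidates = [a for a in articles if section is None or a["section"] == section]
    scored = [(term_hits(a["content"], terms), a["id"]) for a in candidates]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked]


terms = ["refund", "cancellation", "subscription"]
print(search(articles, terms))                     # all sections
print(search(articles, terms, section="billing"))  # billing only
```

The filter changes which documents are eligible, not how the remaining ones are scored relative to each other.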

For literal values with special characters, such as URLs, use skipSyntax:

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "https://docs.example.com/billing/refund-policy",
            "defaultField": "url",
            "skipSyntax": True,
        }
    },
    consistent_read=True,
)
print_results("Exact URL lookup", res)
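To see why a raw URL is awkward in query syntax, it helps to look at which characters the classic Lucene query syntax treats as operators. The sketch below backslash-escapes them. It is a standalone helper with a hypothetical name (escape_lucene), and whether LambdaDB's parser accepts backslash escapes is an assumption on our part; skipSyntax, as shown above, is the approach this tutorial relies on.

```python
import re

# Characters and operators with special meaning in classic Lucene query syntax:
# + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_lucene(text):
    """Prefix each special character (or operator pair) with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


url = "https://docs.example.com/billing/refund-policy"
print(escape_lucene(url))
```

The colon, the slashes, and the hyphen in the URL would all be read as syntax if left unescaped, which is why a literal lookup needs either escaping or a flag that turns the syntax off.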

Step 6: Search for Exact Phrases

Some terms are more meaningful as a phrase. For example, refund policy is more specific than refund and policy separately.

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "\"refund policy\"",
            "defaultField": "content",
        }
    },
    consistent_read=True,
)
print_results("Phrase search", res)

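Phrase search is helpful for policy names, product names, error messages, and compliance terms. In the sample data, the words refund policy appear next to each other only in a title, so this query against content will likely return no results; set defaultField to title to see a match.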
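As a rough mental model, a phrase query asks whether the query words occur at consecutive positions. The standalone sketch below checks that with plain Python against the title and content of the sample refund article. It ignores stemming and stop words, which a real analyzer such as english would apply, so treat it as an approximation.

```python
import re


def words(text):
    """Lowercase word tokens in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the phrase's words occur consecutively, in order, in the text."""
    haystack, needle = words(text), words(phrase)
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


title = "Refund policy for annual plans"
content = (
    "Customers on an annual subscription may request a refund within "
    "30 days of purchase. Refund eligibility depends on account usage "
    "and cancellation timing."
)

print(contains_phrase(title, "refund policy"))    # adjacent in the title
print(contains_phrase(content, "refund policy"))  # refund is never followed by policy
```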

Step 7: Use Proximity Search When the Words Are Close, but Not Adjacent

In real articles, important terms may appear near each other without forming an exact phrase.

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "\"refund cancellation\"~5",
            "defaultField": "content",
        }
    },
    consistent_read=True,
)
print_results("Proximity search", res)

This matches documents where refund and cancellation appear within roughly five positions of each other. It is more flexible than phrase search, but still more targeted than a broad OR query.
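A simplified way to think about the ~N suffix is as a cap on how many words may sit between the two terms. The sketch below measures that gap in plain Python. It is an approximation under stated assumptions: Lucene's actual slop counts position moves and treats reversed order differently, and a real analyzer may stem terms or drop stop words.

```python
import re


def words(text):
    """Lowercase word tokens in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_gap(text, first, second):
    """Smallest number of words between any occurrence of the two terms, or None."""
    toks = words(text)
    pos_a = [i for i, w in enumerate(toks) if w == first]
    pos_b = [i for i, w in enumerate(toks) if w == second]
    gaps = [abs(a - b) - 1 for a in pos_a for b in pos_b]
    return min(gaps) if gaps else None


def within(text, first, second, max_gap):
    """True if the two terms occur with at most max_gap words between them."""
    gap = min_gap(text, first, second)
    return gap is not None and gap <= max_gap


content = (
    "Customers on an annual subscription may request a refund within "
    "30 days of purchase. Refund eligibility depends on account usage "
    "and cancellation timing."
)

print(min_gap(content, "refund", "cancellation"))
print(within(content, "refund", "cancellation", 5))
print(within(content, "refund", "cancellation", 6))
```

In this toy model, the sample refund article has six words between the closest refund and cancellation, so a window of 5 misses it and a window of 6 catches it. If the real query returns fewer results than you expect, widening the ~N value is a reasonable first experiment.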

Step 8: Add Fuzzy Matching for Typos

User queries often contain typos. Fuzzy matching can recover useful results.

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "cancelllation~2 refund~1",
            "defaultField": "content",
        }
    },
    consistent_read=True,
)
print_results("Fuzzy search", res)

Here, cancelllation~2 tolerates up to two single-character edits (insertions, deletions, or substitutions), while refund~1 tolerates only one.
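The number after ~ is an edit distance. The sketch below computes the classic Levenshtein distance in plain Python so you can see how many edits separate a typo from the intended word. Two caveats, both assumptions about the engine: Lucene-style fuzzy matching usually also counts a transposition of adjacent characters as one edit, and the comparison happens against analyzed index terms, which may be stemmed, so real matches can differ from this toy.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


for typo, intended in [("cancelllation", "cancellation"), ("refnd", "refund")]:
    print(f"{typo} -> {intended}: {edit_distance(typo, intended)} edit(s)")
```

Both typos are one edit away from the intended word, so a budget of ~1 would already cover them in this simplified model; ~2 leaves room for a second mistake.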

Step 9: Boost Strong Signals

A title match should often matter more than a body match. Use ^ to boost important clauses.

lexical_query = (
    "title:(refund OR cancellation)^2.5 "
    "OR content:(\"refund policy\"~4 OR subscription)"
)

res = coll.query(
    size=5,
    query={"queryString": {"query": lexical_query}},
    consistent_read=True,
)
print_results("Boosted lexical search", res)

This query says:

refund or cancellation in the title is very important.
A near match for refund policy in the content is also useful.
subscription keeps recall broad enough for annual plan questions.
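If you build boosted queries from application code, assembling the string from parts keeps the parentheses and boosts easy to audit. The helper below is a hypothetical convenience (boosted_clause is not an SDK function) that only does string formatting; it reproduces the lexical_query used in this step.

```python
def boosted_clause(field, terms, boost=None):
    """Build field:(t1 OR t2 ...) and append ^boost when a boost is given."""
    clause = f"{field}:({' OR '.join(terms)})"
    if boost is not None:
        clause += f"^{boost}"
    return clause


lexical_query = " OR ".join([
    boosted_clause("title", ["refund", "cancellation"], boost=2.5),
    boosted_clause("content", ['"refund policy"~4', "subscription"]),
])
print(lexical_query)
```

The terms are inserted as-is, so any user-supplied text should be escaped or validated before it is passed in.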

Step 10: Require Enough Optional Terms

If a query becomes too broad, use minimum-should-match: append @N to a group of optional terms to require that at least N of them match.

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "(refund cancellation subscription billing)@2",
            "defaultField": "content",
        }
    },
    consistent_read=True,
)
print_results("Minimum should match", res)

This requires at least two of the four terms to match.
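The rule is easy to model in plain Python: count how many of the optional terms a document contains and compare that with the threshold. The sketch below applies it to two of the sample article bodies. It matches exact lowercase words only, whereas a real analyzer may also match stemmed variants.

```python
import re


def matched_terms(text, terms):
    """Return the query terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [term for term in terms if term in words]


def min_should_match(text, terms, minimum):
    """True if at least `minimum` of the terms occur in the text."""
    return len(matched_terms(text, terms)) >= minimum


terms = ["refund", "cancellation", "subscription", "billing"]
cancel_plan = (
    "You can cancel an annual plan from billing settings. Cancellation "
    "stops renewal at the end of the billing period. Contact support "
    "for refund review."
)
billing_credit = (
    "When a customer downgrades or changes a paid plan, unused time may "
    "be converted into account credit depending on the subscription terms."
)

for name, text in [("cancel-annual-plan", cancel_plan), ("billing-credit", billing_credit)]:
    print(name, matched_terms(text, terms), min_should_match(text, terms, 2))
```

In this toy model the cancellation article clears the threshold with three matching terms, while the billing credit article matches only subscription and is dropped.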

Step 11: Use Interval Functions for Ordered Relationships

For advanced lexical control, interval functions let you express word order and position.

res = coll.query(
    size=5,
    query={
        "queryString": {
            "query": "fn:ordered(refund cancellation subscription)",
            "defaultField": "content",
        }
    },
    consistent_read=True,
)
print_results("Interval function search", res)

This looks for documents where refund, cancellation, and subscription appear in that order, even if other words appear between them.
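The ordering requirement can be pictured as a subsequence check: walk through the document's words once and look for each term after the previous one. The sketch below does that in plain Python with two made-up sentences. It is only a mental model; real interval queries also deal with analysis, overlapping matches, and scoring.

```python
import re


def appears_in_order(text, terms):
    """True if the terms occur in the given order, with any words in between."""
    words = iter(re.findall(r"[a-z0-9]+", text.lower()))
    # `term in words` consumes the iterator up to the match, so each later
    # term is only searched for after the position of the previous one.
    return all(term in words for term in terms)


terms = ["refund", "cancellation", "subscription"]
print(appears_in_order("Request a refund after cancellation of a subscription", terms))
print(appears_in_order("Subscription cancellation may lead to a refund", terms))
```

The second sentence contains all three words but in the opposite order, so it does not qualify.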

Step 12: Combine Query String and Vector Search with RRF

Now we can answer the original natural language question:

Can I get my money back if I cancel my annual plan?

The lexical side captures exact support language:

refund
cancellation
subscription
boosted title matches
phrase and proximity matches

The vector side captures paraphrases:

“money back”
“cancel my annual plan”
“billing credit”
“refund review”

Use RRF to combine the two ranked lists:

user_question = "Can I get my money back if I cancel my annual plan?"

rrf_query = {
    "rrf": [
        {
            "queryString": {
                "query": (
                    "title:(refund OR cancellation)^2.5 "
                    "OR content:(\"refund policy\"~4 "
                    "OR (refund cancellation subscription billing)@2)"
                )
            }
        },
        {
            "knn": {
                "field": "content_embedding",
                "queryText": user_question,
                "k": 10,
            }
        },
    ]
}

res = coll.query(
    size=5,
    query=rrf_query,
    consistent_read=True,
)
print_results("RRF hybrid search", res)

This is where the retrieval layer becomes more than a keyword box. Query string syntax gives you exact control over product language and metadata, while vector search catches semantic variations in how users ask questions.

Because the lexical clauses, semantic knn, and fusion rule all live inside the same query, the application sends one retrieval request instead of coordinating multiple searches and merging results itself.
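For intuition about what the fusion step does, Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) for every list it appears in and sums those scores. The sketch below implements that in plain Python. The constant k = 60 is a commonly used default and is an assumption here, since this tutorial does not state which constant or tie-breaking rule LambdaDB uses, and the two ranked lists are hypothetical rather than real query output.

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of ids; returns (id, score) pairs, best first."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical rankings from the lexical and vector sides.
lexical = ["refund-policy", "cancel-annual-plan", "billing-credit"]
semantic = ["money-back-guarantee", "refund-policy", "cancel-annual-plan"]

for doc_id, score in rrf_fuse([lexical, semantic]):
    print(f"{doc_id:<24} {score:.4f}")
```

Only ranks are used, never the raw scores, which is why the lexical and vector sides can be combined without normalizing their scores first. Documents that appear in both lists rise above documents that top only one.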

Full Script

Here is the complete script in one place.

import os
import time

from lambdadb import LambdaDB


API_KEY = os.environ["LAMBDADB_PROJECT_API_KEY"]
BASE_URL = os.environ["LAMBDADB_BASE_URL"]
PROJECT_NAME = os.environ["LAMBDADB_PROJECT_NAME"]
COLLECTION_NAME = "support-search-demo"


def wait_until_active(client, collection_name, timeout_s=120):
    """Poll the collection every 3 seconds until its status reports ACTIVE."""
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        meta = client.collection(collection_name).get()
        status = str(meta.collection_status)
        if "ACTIVE" in status:
            return
        print(f"Collection status: {status}. Waiting...")
        time.sleep(3)
    raise TimeoutError(f"Collection {collection_name} did not become ACTIVE in time")


def print_results(label, res):
    print(f"\n=== {label} ===")
    for item in res.docs:
        doc = item.doc
        # str() keeps the width specs from raising TypeError if a field is missing.
        print(
            f"{str(doc.get('id')):<24} "
            f"score={item.score:.4f} "
            f"section={str(doc.get('section')):<10} "
            f"title={doc.get('title')}"
        )


docs = [
    {
        "id": "refund-policy",
        "title": "Refund policy for annual plans",
        "content": (
            "Customers on an annual subscription may request a refund within "
            "30 days of purchase. Refund eligibility depends on account usage "
            "and cancellation timing."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/refund-policy",
    },
    {
        "id": "cancel-annual-plan",
        "title": "Cancel an annual subscription",
        "content": (
            "You can cancel an annual plan from billing settings. Cancellation "
            "stops renewal at the end of the billing period. Contact support "
            "for refund review."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/cancel-annual-plan",
    },
    {
        "id": "billing-credit",
        "title": "Billing credits after plan changes",
        "content": (
            "When a customer downgrades or changes a paid plan, unused time may "
            "be converted into account credit depending on the subscription terms."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/credits",
    },
    {
        "id": "money-back-guarantee",
        "title": "Money-back guarantee",
        "content": (
            "New customers can ask for their money back during the trial period. "
            "This policy is separate from annual contract cancellation."
        ),
        "section": "billing",
        "url": "https://docs.example.com/billing/money-back-guarantee",
    },
    {
        "id": "change-password",
        "title": "Change your password",
        "content": (
            "Users can update their password from account settings. If you forgot "
            "your password, use the reset link on the sign-in page."
        ),
        "section": "account",
        "url": "https://docs.example.com/account/change-password",
    },
    {
        "id": "api-rate-limits",
        "title": "API rate limits",
        "content": (
            "API requests are rate limited by project. Higher limits are available "
            "on paid plans."
        ),
        "section": "developers",
        "url": "https://docs.example.com/developers/rate-limits",
    },
]


with LambdaDB(
    project_api_key=API_KEY,
    base_url=BASE_URL,
    project_name=PROJECT_NAME,
) as client:
    try:
        client.collections.create(
            collection_name=COLLECTION_NAME,
            index_configs={
                "title": {"type": "text", "analyzers": ["english"]},
                "content": {"type": "text", "analyzers": ["english"]},
                "section": {"type": "keyword"},
                "url": {"type": "keyword"},
                "content_embedding": {
                    "type": "vector",
                    "managedEmbedding": True,
                    "embedding": {
                        "provider": "openai",
                        "model": "text-embedding-3-small",
                        "sourceField": "content",
                    },
                },
            },
        )
        print(f"Created collection: {COLLECTION_NAME}")
    except Exception as exc:
        print(f"Create skipped or failed: {exc}")

    wait_until_active(client, COLLECTION_NAME)

    coll = client.collection(COLLECTION_NAME)
    coll.docs.upsert(docs=docs)
    print(f"Upserted {len(docs)} support articles")

    searches = [
        (
            "Basic queryString search",
            {
                "queryString": {
                    "query": "refund cancellation subscription",
                    "defaultField": "content",
                }
            },
        ),
        (
            "Lexical search filtered to billing",
            {
                "bool": [
                    {
                        "queryString": {
                            "query": "refund cancellation subscription",
                            "defaultField": "content",
                        }
                    },
                    {
                        "queryString": {
                            "query": "billing",
                            "defaultField": "section",
                        },
                        "occur": "filter",
                    },
                ]
            },
        ),
        (
            "Exact URL lookup",
            {
                "queryString": {
                    "query": "https://docs.example.com/billing/refund-policy",
                    "defaultField": "url",
                    "skipSyntax": True,
                }
            },
        ),
        (
            "Phrase search",
            {
                "queryString": {
                    "query": "\"refund policy\"",
                    "defaultField": "content",
                }
            },
        ),
        (
            "Proximity search",
            {
                "queryString": {
                    "query": "\"refund cancellation\"~5",
                    "defaultField": "content",
                }
            },
        ),
        (
            "Fuzzy search",
            {
                "queryString": {
                    "query": "cancelllation~2 refund~1",
                    "defaultField": "content",
                }
            },
        ),
        (
            "Boosted lexical search",
            {
                "queryString": {
                    "query": (
                        "title:(refund OR cancellation)^2.5 "
                        "OR content:(\"refund policy\"~4 OR subscription)"
                    )
                }
            },
        ),
        (
            "Minimum should match",
            {
                "queryString": {
                    "query": "(refund cancellation subscription billing)@2",
                    "defaultField": "content",
                }
            },
        ),
        (
            "Interval function search",
            {
                "queryString": {
                    "query": "fn:ordered(refund cancellation subscription)",
                    "defaultField": "content",
                }
            },
        ),
        (
            "RRF hybrid search",
            {
                "rrf": [
                    {
                        "queryString": {
                            "query": (
                                "title:(refund OR cancellation)^2.5 "
                                "OR content:(\"refund policy\"~4 "
                                "OR (refund cancellation subscription billing)@2)"
                            )
                        }
                    },
                    {
                        "knn": {
                            "field": "content_embedding",
                            "queryText": "Can I get my money back if I cancel my annual plan?",
                            "k": 10,
                        }
                    },
                ]
            },
        ),
    ]

    for label, query in searches:
        res = coll.query(size=5, query=query, consistent_read=True)
        print_results(label, res)

Run it:

python support_search_tutorial.py

The exact scores may differ, but the RRF hybrid query should surface the billing and refund-related articles near the top while still benefiting from semantic matches like “money back” and “annual plan”.

Clean Up

When you are done experimenting, you can delete the collection. This snippet reuses the imports and constants defined at the top of the script:

with LambdaDB(
    project_api_key=API_KEY,
    base_url=BASE_URL,
    project_name=PROJECT_NAME,
) as client:
    client.collections.delete(collection_name=COLLECTION_NAME)

Why This Matters

Most AI and RAG systems need more than one kind of search.

You need lexical search when exact terms matter:

policy names
URLs
section tags
product terminology
error codes

You need vector search when users ask in natural language:

“Can I get my money back?”
“Do I lose access if I cancel?”
“What happens to unused time on my plan?”

LambdaDB lets you combine both in one query. Query string search gives you precise control over exact language and metadata. RRF hybrid search adds semantic recall without forcing you to manually tune score weights or stitch together multiple result lists in application code.

That combination is especially useful for support search, documentation search, agent memory, and RAG retrieval.