Qdrant vs. LambdaDB: A 10M-Vector Benchmark

Upsert and query performance comparison between Qdrant Cloud and LambdaDB on the Cohere Wikipedia embeddings dataset.

This benchmark is not meant to prove that one system is universally faster in every setting. Qdrant Cloud and LambdaDB use different operating models: Qdrant Cloud is a provisioned cluster, while LambdaDB is a serverless, pay-as-you-go database. A fair comparison therefore looks not at raw infrastructure shape but at the user-visible result under the same workload: throughput, latency, recall, scaling behavior, and operational complexity.

Key Takeaways

Qdrant had the lowest median query latency: 33.54 ms p50, compared with 78.44 ms for LambdaDB and 53.44 ms for LambdaDB with 16 partitions.
Tail latency was close at p95: 157.11 ms for Qdrant, 170.34 ms for LambdaDB, and 150.45 ms for LambdaDB with 16 partitions. Recall@10 was also similar at 0.98, 0.97, and 0.97 respectively.
Qdrant scaled quickly at low concurrency, then flattened around the limit of this provisioned configuration: 391.59 qps at 16 concurrency and 385.90 qps at 32.
LambdaDB continued scaling without database provisioning. With 16 partitions, it reached 488.85 qps at 32 concurrency, about 26.7% higher than Qdrant in this run, while also improving latency compared with non-partitioned LambdaDB.

The benchmark is open source and reproducible: lambdadb/lambdadb-bench.

Benchmark Methodology

The benchmark used the same dataset and client region for both systems, then compared throughput, latency, and Recall@10 under identical upsert and query workloads.

Area	Configuration
Environment	AWS us-east-1; client on r8g.2xlarge with 8 cores and 64 GB RAM
Qdrant Cloud	8 cores, 32 GB RAM, 240 GB balanced disk x 3 nodes for HA, replication factor 2, $1,673.73/month
LambdaDB	Standard Plan, $0 minimum, pay-as-you-go, cold start
Dataset	Cohere Wikipedia 2023-11 multilingual embeddings; 1024-dimensional vectors; English subset
Upsert workload	Ingest 1M vectors into a single collection with 2,000-document batches and 64 concurrent writers
Query workload	Search a preloaded 10M-vector collection at 1, 2, 4, 8, 16, and 32 concurrent clients for 30 seconds each
Metrics	Throughput, p50/p95/p99 latency, and Recall@10 against brute-force ground truth
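
For readers who want to check the metric definitions, here is a minimal sketch of how Recall@10 and latency percentiles are commonly computed. It is an illustration, not code from lambdadb/lambdadb-bench: the function names are hypothetical, and the nearest-rank percentile method is an assumption, since the percentile method used in the benchmark is not stated here.

```python
import math


def recall_at_k(retrieved, ground_truth, k=10):
    """Share of the true top-k neighbors that appear in the retrieved top-k.

    `retrieved` and `ground_truth` are ranked lists of document ids; the
    ground truth would come from an exact (brute-force) search.
    """
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = len(set(retrieved[:k]) & truth)
    return hits / len(truth)


def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (assumed method)."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Averaging `recall_at_k` over all test queries gives a single Recall@10 figure, and calling `percentile(latencies_ms, 50)`, `percentile(latencies_ms, 95)`, and `percentile(latencies_ms, 99)` on the collected latencies gives p50, p95, and p99.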

A Note on Cost

We do not present monthly cost as a single headline comparison because the pricing models are different. Qdrant Cloud is priced around provisioned cluster capacity; in this run, the tested configuration was $1,673.73/month regardless of how fully the cluster is used.

LambdaDB is serverless and usage-based with no minimum charge: $0.33/GB-month for storage, $1.00/GB for writes, and $5.00/PB for reads. The actual monthly cost depends on storage, write volume, query volume, and how much data each query searches. For workloads that can use partitioning, the amount of data searched per query can be much smaller, which can materially change both latency and cost.

A fair cost comparison should therefore be scenario-based, using a fixed workload such as stored vector count, monthly query volume, monthly write volume, recall target, p95 latency target, and whether partition pruning applies.
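
As a rough sketch of such a scenario-based estimate, the function below applies the LambdaDB unit prices quoted above to a fixed monthly workload and returns a breakdown that can be set against the fixed $1,673.73/month Qdrant configuration. The function name, the decimal GB-to-PB conversion, and the idea of passing "GB searched per query" as an input are all assumptions made for illustration; how each system actually meters reads should be taken from its own billing documentation.

```python
# Prices quoted in this article.
QDRANT_MONTHLY_USD = 1673.73          # tested provisioned configuration
STORAGE_USD_PER_GB_MONTH = 0.33       # LambdaDB storage
WRITE_USD_PER_GB = 1.00               # LambdaDB writes
READ_USD_PER_PB = 5.00                # LambdaDB reads
GB_PER_PB = 1_000_000                 # assumption: decimal units


def lambdadb_monthly_cost(stored_gb, written_gb, queries, gb_searched_per_query):
    """Estimate a monthly LambdaDB bill for one fixed scenario (hypothetical helper).

    `gb_searched_per_query` is the amount of data one query is billed for;
    partition pruning would lower this input.
    """
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    writes = written_gb * WRITE_USD_PER_GB
    reads = queries * gb_searched_per_query / GB_PER_PB * READ_USD_PER_PB
    return {
        "storage": storage,
        "writes": writes,
        "reads": reads,
        "total": storage + writes + reads,
    }
```

To compare scenarios, call the function twice with the same storage, write, and query volumes but different `gb_searched_per_query` values (with and without partition pruning), and compare each `total` with `QDRANT_MONTHLY_USD`.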

Results

Upsert Throughput

LambdaDB ingested documents about 10.4% faster than Qdrant in this run.

Because LambdaDB uses a distributed serverless architecture, it may scale further with additional client-side concurrency and write parallelism, depending on workload shape, batch size, and payload size.

Query Throughput

Qdrant was much faster at low concurrency, which is expected from a warm, provisioned cluster. As concurrency increased, Qdrant flattened around 16 concurrent clients, while LambdaDB continued scaling up to the highest tested concurrency.
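
A concurrency sweep like this can be approximated with a closed-loop load generator, where each client sends its next query as soon as the previous one returns. The sketch below is a generic illustration under that assumption, not the harness from lambdadb/lambdadb-bench; `run_level` and `query_fn` are hypothetical names, and `query_fn` stands in for whatever call issues one search against the database under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_level(query_fn, concurrency, duration_s):
    """Run `concurrency` closed-loop clients for `duration_s` seconds.

    Returns the query count, throughput in qps, and per-query latencies in ms.
    """
    started = time.perf_counter()
    deadline = started + duration_s

    def worker():
        latencies = []
        while time.perf_counter() < deadline:
            t0 = time.perf_counter()
            query_fn()  # one search request (placeholder)
            latencies.append((time.perf_counter() - t0) * 1000.0)
        return latencies

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        latencies = [ms for f in futures for ms in f.result()]

    elapsed = time.perf_counter() - started
    return {
        "concurrency": concurrency,
        "queries": len(latencies),
        "qps": len(latencies) / elapsed,
        "latencies_ms": latencies,
    }
```

Calling `run_level(query_fn, c, 30)` for `c` in 1, 2, 4, 8, 16, and 32 mirrors the workload described in the methodology. In a closed loop, throughput stops growing once added clients only add queueing delay, which is the flattening pattern described for the provisioned cluster.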

At 32 concurrency, non-partitioned LambdaDB reached 385.27 qps, nearly the same as Qdrant's 385.90 qps. Partitioned LambdaDB reached 488.85 qps, exceeding Qdrant by about 26.7%.

Query Latency and Recall

Qdrant had the best p50 latency. Partitioned LambdaDB had the best p95 latency in this run and narrowed the p50 gap substantially compared with non-partitioned LambdaDB.

Query latency profile (lower is better): 10M documents, overall latency during the run

Partitioning improved LambdaDB latency at every reported percentile and raised throughput at 32 concurrency:

Metric	Non-partitioned LambdaDB	LambdaDB with 16 partitions	Improvement
p50	78.44 ms	53.44 ms	31.9% lower
p95	170.34 ms	150.45 ms	11.7% lower
p99	330.65 ms	207.82 ms	37.1% lower
32-concurrency throughput	385.27 qps	488.85 qps	26.9% higher
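
The percentages in this table, and the 26.7% figure quoted against Qdrant, follow directly from the reported numbers. The short sketch below reproduces the arithmetic; the helper names are illustrative only.

```python
def pct_lower(baseline, value):
    """How much lower `value` is than `baseline`, in percent."""
    return (baseline - value) / baseline * 100


def pct_higher(baseline, value):
    """How much higher `value` is than `baseline`, in percent."""
    return (value - baseline) / baseline * 100


# Non-partitioned LambdaDB vs. LambdaDB with 16 partitions (latency in ms).
p50_gain = round(pct_lower(78.44, 53.44), 1)      # 31.9
p95_gain = round(pct_lower(170.34, 150.45), 1)    # 11.7
p99_gain = round(pct_lower(330.65, 207.82), 1)    # 37.1

# Throughput at 32 concurrency (qps).
vs_unpartitioned = round(pct_higher(385.27, 488.85), 1)  # 26.9
vs_qdrant = round(pct_higher(385.90, 488.85), 1)         # 26.7
```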

What the Results Mean

Qdrant is strongest when the workload is latency-sensitive, the cluster is already provisioned, and the team is comfortable sizing and operating vector database infrastructure. In this benchmark, it delivered excellent median latency and high throughput at low to medium concurrency.

LambdaDB is strongest when the workload needs managed serverless scaling, lower operational overhead, and built-in production features such as multi-region deployment, continuous backups, point-in-time recovery, and high availability. It does not require users to choose or resize database nodes, and in this benchmark it scaled steadily as concurrency increased.

Partitioning is especially important for large knowledge bases. If documents have a natural routing key, such as tenant, customer, project, domain, or URL, LambdaDB can search a smaller slice of the collection for each query. That improves latency and throughput while reducing query work.
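
To make the routing idea concrete, here is a generic sketch of mapping a routing key to one of 16 partitions with a stable hash. It is not LambdaDB's API or its actual partitioning scheme, neither of which is shown in this article; `partition_for` and the key format are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 16  # matches the partition count used in this benchmark


def partition_for(routing_key, num_partitions=NUM_PARTITIONS):
    """Map a routing key (tenant, project, domain, ...) to a partition index.

    Uses SHA-256 rather than Python's built-in hash() so that the mapping
    is stable across processes and restarts.
    """
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

If documents are written to `partition_for(key)` and queries carrying the same key are sent only to that partition, each query searches roughly one sixteenth of an evenly distributed collection instead of all of it.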

When to Use Qdrant

You need extremely low median query latency.
You have DevOps capacity to provision, tune, monitor, and scale the cluster.
Your workload is stable enough that provisioned capacity can be planned ahead of time.
You want direct control over cluster sizing and infrastructure behavior.

When to Use LambdaDB

You want a managed serverless vector database with no database provisioning.
You have bursty or growing workloads where capacity planning is hard.
You need multi-region deployments, continuous backups, point-in-time recovery, and high availability out of the box.
You have a large knowledge base that can benefit from partitioning.
You want costs to follow usage instead of committing to a fixed monthly cluster baseline.

Detailed Results