General
“Serverless” Databases Are Not Serverless
Jul 15, 2025
TL;DR
LambdaDB is a serverless-native vector database built entirely on serverless components like AWS Lambda and S3. It completely eliminates the need for infrastructure management by separating database logic from infrastructure. LambdaDB offers rich search (e.g., vector, full-text, filtering) and advanced features like point-in-time restore and zero-copy cloning out of the box. LambdaDB is 5-10x cheaper than alternatives such as Elastic and Pinecone, scales instantly from zero to practically infinite, and is BYOC (Bring Your Own Cloud)-friendly by design.
Try it now: [Link to Playground Project]
Apply for early access: [Link to Early Access Application]
The Illusion of “Serverless” Databases
"Serverless" has become a core trend in cloud technology. It's a revolutionary paradigm that allows developers to focus solely on business logic without the burden of server management. The core value is simple: you pay only for what you use, and operational overhead is virtually zero.
Google Trends for serverless computing (2010 - present)
In line with this trend, numerous "serverless databases" have entered the market. Existing leaders like Elastic, Confluent, and Pinecone, as well as new challengers like Neon, WarpStream, Upstash, and Turbopuffer, are all competing with serverless offerings.
But here’s the hidden truth: look under the hood, and you’ll find they aren't truly serverless. Most of these services are built on a cloud-native architecture, a brilliant but decade-old design that separates compute clusters from cloud storage. This model, pioneered by Snowflake [1], was a revolution for the serverful era. But it was never designed for modern serverless components like AWS Lambda.
As a result, providers create an illusion of serverless. Behind the scenes, they are still running clusters of servers, using complex software and human intervention to predict load, manage capacity, and ensure reliability.
Why This Mismatch Matters to You
This architectural mismatch isn't just a technical detail—it creates real problems for users (see [2, 3, 4, 5] for examples):
You're paying for idle servers: Their server clusters are always running for various purposes, from handling requests to performing background tasks. This is why most "serverless" plans come with monthly base fees and significant cost increases as usage grows—costs that don't truly reflect your actual usage.
Scaling is slow and capped: Spinning up new servers in a cluster takes minutes, not milliseconds. Providers often scale down conservatively, leaving resources idle even longer to avoid performance issues. This sluggishness means you can't handle sudden traffic spikes instantly, and you'll often face restrictive limits on data size and requests.
You're locked into their cloud: Because providers manage the infrastructure for you in their cloud, you are stuck with limited region availability. If you want to run the database in your own cloud account (BYOC), you're most likely forced into a complex and expensive enterprise contract.
Monthly charges as database usage grows: ideal vs. reality.
The Real Solution: Serverless-Native Architecture
In the early days of cloud computing, most "cloud databases" were just legacy databases running on cloud VMs with local disks. It took a decade for a truly cloud-native architecture to emerge and unlock the full potential of the (serverful) cloud.
Now, nearly a decade after AWS Lambda’s launch, we are at another inflection point. The solution isn't to put a serverless sticker on old architecture. It's to build differently from the ground up. We call this new approach serverless-native.
A serverless-native architecture offloads all infrastructure management to the cloud itself. It doesn't use server clusters; it uses stateless functions and serverless services for everything. The database logic is completely separated from the infrastructure, allowing it to scale instantly and operate on a true pay-per-request model.
| | Legacy (On-Prem) | Cloud-Native | Serverless-Native |
|---|---|---|---|
| Compute | Your own servers | Managed server clusters (EC2, etc.) | Stateless functions (Lambda) |
| Storage | Local disks / SAN | Object storage (S3) | Object storage (S3) |
| Management | Manual (DBAs) | Provider-managed servers | Cloud provider |
| Scaling | Manual / vertical | Cluster-based / slow | Per-request / instant |
There’s a simple litmus test to identify if a database is truly serverless-native: Can you deploy it into your own cloud account without provisioning a Kubernetes cluster, a single VM, or any other compute servers? If the answer is no, it’s not serverless-native.
Of course, building a high-performance database this way presents a unique set of challenges. You must architect a consistent, distributed system on top of inherently ephemeral compute resources. You have to orchestrate concurrent reads and writes over high-latency object storage (S3) and work within the resource limits of individual functions. Additionally, you need to consider different pricing models for various serverless components to create a cost-effective solution. These are the hard problems we have focused on solving.
Introducing LambdaDB: The First Serverless-Native Database
LambdaDB is the result of solving these challenges. It is a vector database built from first principles to be serverless-native, allowing it to deliver on the original promise of the serverless model. It provides vector and full-text search with reranking, filtering, and sorting capabilities, as well as advanced features including continuous backup, point-in-time restore, and zero-copy data cloning—all without infrastructure management.
LambdaDB operates as a fully serverless service on AWS. User requests flow through a regional Gateway, which routes them to either control or data functions. A builder function periodically persists all buffered data to S3 storage.
The LambdaDB architecture: All the components inside LambdaDB are based on serverless cloud services
The Gateway verifies the project API key on each incoming request. If the key is valid, it checks whether the project has exceeded its configured rate limit, then routes the request to either control or data functions based on the type of work needed.
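As a rough illustration, the Gateway's validate-then-route flow might look like the sketch below. The operation names, in-memory project table, and status codes are illustrative assumptions, not LambdaDB's actual API.

```python
# Hypothetical sketch of the Gateway's routing logic; CONTROL_OPS and the
# project table shape are invented for illustration.
CONTROL_OPS = {"create_collection", "delete_collection", "restore", "clone"}

def handle_request(request, projects):
    """Validate the project API key, enforce the rate limit, then route."""
    project = projects.get(request["project_id"])
    if project is None or request["api_key"] != project["api_key"]:
        return {"status": 401, "error": "invalid API key"}
    if project["requests_this_second"] >= project["rate_limit"]:
        return {"status": 429, "error": "rate limit exceeded"}
    project["requests_this_second"] += 1
    # Control-plane work and data-plane work go to different functions.
    target = "control" if request["op"] in CONTROL_OPS else "data"
    return {"status": 200, "routed_to": target}
```

In production the key lookup and rate-limit counters would live in a shared store rather than a local dict, but the decision order (authenticate, throttle, route) is the same.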
Control functions handle project/collection CRUD operations and data management requests such as point-in-time restore and zero-copy clone. They also perform maintenance tasks triggered by EventBridge Scheduler, such as adjusting the number of virtual shards for each collection based on its size to enable parallel query execution. They use DynamoDB to store metadata and to coordinate among concurrent readers, writers, and background tasks.
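For instance, the shard-sizing maintenance task could be sketched as follows. The 512 MB per-shard target and the 64-shard cap are invented thresholds, not LambdaDB's actual policy.

```python
import math

def virtual_shard_count(collection_bytes: int,
                        target_shard_bytes: int = 512 * 1024 * 1024,
                        max_shards: int = 64) -> int:
    """Choose enough virtual shards that each executor scans roughly
    target_shard_bytes of data, capped so query fan-out stays bounded.
    Thresholds here are illustrative assumptions."""
    return max(1, min(max_shards, math.ceil(collection_bytes / target_shard_bytes)))
```

A control function would persist the chosen count in DynamoDB, where the query router reads it to decide how many executor functions to invoke.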
Data functions perform the actual data writes and reads. When a writer function receives a request to upsert, update, or delete records in a collection, it records the request details in a log along with a monotonically increasing sequence number. This request log is written to a serverless write buffer (EFS) before a response is returned to the client, guaranteeing durability. Later, a builder function writes the buffered logs to S3 in batches and deletes them once the data is successfully committed. In S3, the data is organized as a tree: a root object references intermediate objects, which point to leaf objects storing the actual data. The root object thus acts as a commit point that always exposes a consistent view of the collection. This on-storage structure, combined with S3 versioning and lifecycle policies, lets us implement multi-version concurrency control and advanced features like point-in-time restore efficiently and robustly without reinventing the wheel.
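The write path described above can be condensed into an in-memory toy model, with plain Python stand-ins for the EFS buffer and S3 objects; the object layout and names are illustrative, not LambdaDB's actual format.

```python
import itertools

class WriteBuffer:
    """Toy model of the durable write path: a monotonically numbered log
    (EFS in LambdaDB) drained in batches to object storage (S3)."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.log = []            # stands in for the EFS write buffer
        self.storage = {}        # stands in for S3 objects
        self.root = {"leaves": [], "committed_seq": 0}

    def upsert(self, records):
        """Writer function: append to the durable log, then acknowledge."""
        entry = {"seq": next(self._seq), "op": "upsert", "records": records}
        self.log.append(entry)   # durable before the client sees a response
        return entry["seq"]

    def build(self):
        """Builder function: flush buffered logs to storage as a new leaf,
        then advance the root object, which acts as the commit point."""
        if not self.log:
            return self.root["committed_seq"]
        batch, self.log = self.log, []
        key = f"leaf-{batch[-1]['seq']}"
        self.storage[key] = batch
        self.root = {"leaves": self.root["leaves"] + [key],
                     "committed_seq": batch[-1]["seq"]}
        return self.root["committed_seq"]
```

Because readers only ever follow the current root, a half-written batch is invisible until the root is swapped, which is what makes the root a consistent commit point.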
When a query is received, the router validates it and then invokes executor functions based on the number of virtual shards assigned to the collection by a control function. If the client requests strong consistency, the router also runs the query against the buffered logs. Each executor scans its assigned shard data and returns a list of top candidates to the router. Shard data is typically cached in the executor's memory and local storage; if it isn't cached, the executor fetches it from S3 in block units and caches it for future queries. The router then compiles all results, merges and deduplicates them with results from the buffered logs if needed, selects the final top_k candidates with optional reranking, and returns them to the client.
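The router's scatter-gather merge step might be sketched like this. Scoring, executor internals, and the parallel Lambda invocations are simplified placeholders here.

```python
import heapq

def route_query(query_vec, shards, buffered_hits=(), top_k=10):
    """Sketch of the router's merge: gather each executor's candidates,
    fold in hits from the buffered logs (strong consistency), deduplicate
    by id keeping the best score, and return the global top_k."""
    best = {}
    for shard in shards:                 # in production: parallel Lambdas
        for doc_id, score in shard(query_vec):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    for doc_id, score in buffered_hits:  # newer buffered data wins ties
        if doc_id not in best or score >= best[doc_id]:
            best[doc_id] = score
    return heapq.nlargest(top_k, best.items(), key=lambda kv: kv[1])
```

Each `shard` callable stands in for one executor function returning `(id, score)` candidates for its virtual shard.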
Upsert latency distribution as concurrency increases, using 960-dimensional vectors
To demonstrate LambdaDB's performance and scalability, we added one million 960-dimensional vectors to a collection with varying concurrency levels. With 10 upserts per second, the median latency is just 43 ms with a very short tail (133 ms at the 99th percentile). Write scalability is particularly impressive, as the latency remains similar even when traffic increases 100-fold.
Query latency distribution as concurrency increases, using 1 million 960-dimensional vectors (4 GB)
Similarly, query latency remains stable and scalable under varying loads—the 99th percentile ranges from 172 ms to 210 ms. You may occasionally experience a few seconds of latency due to function cold starts, but our production statistics show these occur in less than 0.01% of invocations. Cold starts are more common in development/test environments than in production. We're continuously optimizing query functions to improve both cold and warm latency through infrastructure-level enhancements [6, 7, 8]. For most production search applications, the current performance is already excellent, and the cost savings easily justify the rare latency spike from an occasional cold query.
How LambdaDB reduces infrastructure costs compared to a serverful database like Elasticsearch.
Query function invocations over a one-week period for a production collection
As expected, weekday traffic is higher than weekend traffic, and daytime traffic is much higher than traffic around midnight. Specifically, the daily peak is about 7x higher than the trough. Interestingly, traffic drops sharply around 12 PM on weekdays due to lunch breaks.
A note on compute pricing: in us-east-1, running a 4 GB ARM Lambda for one full hour costs $0.19188, versus $0.04488 for an m8g.medium instance. Suppose the system handles a steady 10 concurrent queries every second (stable, constant traffic is the worst case for serverless computing); matching that load without performance degradation would take roughly four cores, so on AWS the costs come out about the same. In real traffic patterns, however, queries do not arrive at a stable, constant rate, so serverless delivers genuine cost savings. In the Wrtn production case, the maximum concurrent executions of the query-controller and request-controller functions are stable, but GB-second usage varies enormously within a week, and the cold start rate is around 0.1%.
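These figures can be checked with simple arithmetic. Prices are as quoted above; the four-core estimate and the 25% duty cycle are assumptions for illustration.

```python
# Back-of-the-envelope cost comparison (us-east-1, prices as quoted).
lambda_4gb_hour = 0.19188          # one 4 GB ARM Lambda running for 1 hour
m8g_medium_hour = 0.04488          # on-demand m8g.medium (1 vCPU)
server_hour = 4 * m8g_medium_hour  # ~4 cores assumed to match the steady load

# Worst case for serverless: perfectly steady traffic keeps Lambda busy
# around the clock, so it costs about the same as the server.
steady_ratio = lambda_4gb_hour / server_hour
print(f"steady-state Lambda/server cost ratio: {steady_ratio:.2f}")   # ~1.07

# Real traffic is bursty: if peak load runs only 25% of the time, Lambda
# bills for that time alone while the server bills around the clock.
duty_cycle = 0.25
bursty_ratio = (lambda_4gb_hour * duty_cycle) / server_hour
print(f"bursty (25% duty) cost ratio: {bursty_ratio:.2f}")            # ~0.27
```

The further real traffic is from a flat line, the larger the serverless advantage becomes, which matches the weekly usage pattern shown above.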
In summary, our unique architecture provides tangible benefits:
Dramatically lower costs: With no idle servers to pay for or manage, LambdaDB is 5-10x cheaper than leading cloud-native alternatives like Elastic and Pinecone. You only pay for the requests you make and the storage you use.
Truly instant & infinite scale: LambdaDB scales from zero to thousands of parallel functions in milliseconds. It handles unpredictable traffic and massive query loads without any pre-warming or configuration.
Simple to start, simple at scale: Build powerful AI applications with rich search capabilities—including vector, full-text, reranking, filtering, and sorting. As you grow, the architecture remains just as simple and cost-effective.
Advanced features, out of the box: Get enterprise-grade capabilities like point-in-time restore and zero-copy cloning without the enterprise-grade complexity or cost.
LambdaDB is currently serving millions of requests daily across billions of documents with zero management. This is just the beginning. We are continuously optimizing LambdaDB for improved latency and cost efficiency. In the long term, we plan to develop other data models—including relational, stream, key-value, and graph—all with the same serverless-native architecture. If you find this intriguing, subscribe to this Substack where I'll regularly share updates on our progress in building the future of databases.
If you want to know more, here are the links:
Docs: https://docs.lambdadb.ai
Community Discord: https://discord.gg/RrmHZrN2
References
[1] The Snowflake Elastic Data Warehouse, 2016.
[2] 3 AM thoughts: Turbopuffer broke my brain
[3] total IO 0, but Aurora serverless costs too much, Is it really true that it works with scale to 0?
[4] Serverless OpenSearch seems like a huge deal, but am I crazy about the pricing?
[5] AWS OpenSearch SearchOCU keeps hitting the max limit
[6] New – Accelerate Your Lambda Functions with Lambda SnapStart
[7] AWS Lambda functions now scale 12 times faster when handling high-volume requests
[8] AWS Lambda SnapStart for Python and .NET functions is now generally available
[9] Debunking serverless myths, 2018.