Build AI Agent Memory in 5 Minutes with LambdaDB
TL;DR
Your AI agent forgets everything after each conversation. This tutorial adds persistent long-term memory using LambdaDB as the storage layer. You'll store conversations as vectors, retrieve relevant memories with hybrid search, and pay $0 when your agent is idle. Total setup: ~60 lines of Python.
The Problem: Agents With Amnesia
Every LLM conversation starts from zero. Your agent helped a user debug a CORS issue yesterday. Today, the user says "that fix you suggested broke something else." The agent has no idea what fix. No context. No continuity.
The standard workaround is stuffing the entire conversation history into the prompt. This works until it doesn't:
- Token limits: GPT-4o gives you 128K tokens. A few weeks of conversations will blow past that.
- Cost: Sending 50K tokens of history per request adds up fast.
- Relevance: Most of that history is noise. The agent needs the 3 relevant memories, not the last 200 messages.
What you need is a memory layer. Store everything, retrieve only what matters.
Why LambdaDB for Agent Memory
A memory layer for agents needs three things:
- Vector search to find semantically similar past interactions ("that CORS fix" matches even if the user doesn't use the exact words).
- Full-text search to catch exact terms, error codes, and names that vector search misses.
- Low idle cost because most agents sit idle most of the time.
LambdaDB handles all three. It runs on AWS Lambda + S3. There are no servers running when nobody is querying. That means $0 at idle.
Compare that to Pinecone's $50/month minimum per index. If you're running memory for 100 agents, that's $5,000/month in Pinecone costs for indexes that are mostly sitting there. With LambdaDB, idle agents cost nothing.
Setup
Install the SDK:
pip install lambdadb openai
Get your API key from lambdadb.ai. Free tier, no credit card.
Step 1: Initialize the Client and Create a Collection
from lambdadb import LambdaDB
import openai
from datetime import datetime

# Initialize clients
db = LambdaDB(
    project_api_key="your-api-key",
    base_url="your-base-url",
    project_name="your-project-name",
)
oai = openai.OpenAI()

# Create a collection for agent memories
db.collections.create(
    collection_name="agent-memory",
    index_configs={
        "vector": {"type": "vector", "dimensions": 1536, "similarity": "cosine"},  # OpenAI text-embedding-3-small
        "text": {"type": "text", "analyzers": ["english"]},  # full-text search
        "user_id": {"type": "keyword"},  # filter by user
        "type": {"type": "keyword"},  # memory type tag
        "timestamp": {"type": "keyword"},  # temporal filtering
    },
)

# Get a handle to the collection
memory = db.collection("agent-memory")
That's the infrastructure. No YAML files. No Kubernetes. No 120 configuration parameters.
Step 2: Store Memories
Every time your agent completes an interaction, store it:
def store_memory(user_id: str, conversation: str, metadata: dict = None):
    """Store a conversation as a searchable memory."""
    # Generate embedding
    response = oai.embeddings.create(
        input=conversation,
        model="text-embedding-3-small"
    )
    vector = response.data[0].embedding

    # Store in LambdaDB
    doc = {
        "id": f"{user_id}:{datetime.now().isoformat()}",
        "vector": vector,
        "text": conversation,
        "user_id": user_id,
        "timestamp": datetime.now().isoformat(),
    }
    if metadata:
        doc.update(metadata)

    memory.docs.upsert(docs=[doc])
Each memory gets:
- A vector for semantic retrieval ("find conversations about authentication" matches "we discussed login flows").
- A text field for full-text search ("find the exact error code ERR_CORS_422").
- Metadata for filtering (user ID, timestamps, tags), as in the example call below.
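For example, a quick call to the helper above might look like this (a usage sketch; the conversation text is illustrative):
# Illustrative example: store one finished support interaction
store_memory(
    user_id="user-42",
    conversation=(
        "User: My FastAPI app throws CORS errors when the frontend calls /api/orders.\n"
        "Assistant: Add CORSMiddleware and allow your frontend's origin."
    ),
    metadata={"type": "conversation"},
)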
Step 3: Retrieve Relevant Memories
Before your agent responds, pull in the relevant context:
def recall_memories(user_id: str, query: str, top_k: int = 5) -> list[dict]:
    """Retrieve the most relevant memories for a query."""
    # Generate query embedding
    response = oai.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    )
    query_vector = response.data[0].embedding

    # Hybrid search: vector similarity + full-text + metadata filter
    results = memory.query(
        size=top_k,
        query={
            "rrf": [
                {"queryString": {"query": query, "defaultField": "text"}},
                {"knn": {"field": "vector", "queryVector": query_vector, "k": top_k}},
            ]
        },
        filter={"queryString": {"query": f"user_id:{user_id}"}},
        consistent_read=True,
    )

    return [
        {"text": hit.doc["text"], "score": hit.score, "timestamp": hit.doc.get("timestamp", "")}
        for hit in results.docs
    ]
This is where hybrid search matters. Pure vector search would match "CORS error" to any networking conversation. By combining vector similarity (knn) with full-text matching (queryString) via RRF (Reciprocal Rank Fusion), you get the most relevant memories for this user about this topic.
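You can sanity-check retrieval by calling the helper directly and printing what comes back (a quick usage sketch; the query string is illustrative):
# Illustrative example: what does the agent remember about CORS for this user?
for m in recall_memories("user-42", "that CORS fix you suggested", top_k=3):
    print(m["timestamp"], m["score"])
    print(m["text"])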
Step 4: Wire It Into Your Agent
def agent_respond(user_id: str, user_message: str) -> str:
    """Generate a response with memory-augmented context."""
    # Recall relevant memories
    memories = recall_memories(user_id, user_message)

    # Build the prompt
    memory_context = "\n".join(
        f"[{m['timestamp']}] {m['text']}" for m in memories
    )

    response = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a helpful assistant with long-term memory.
Here are relevant memories from past conversations with this user:
{memory_context}
Use these memories to provide continuity. Reference past interactions
when relevant. If no memories are relevant, ignore them.""",
            },
            {"role": "user", "content": user_message},
        ],
    )
    assistant_reply = response.choices[0].message.content

    # Store this interaction as a new memory
    store_memory(
        user_id=user_id,
        conversation=f"User: {user_message}\nAssistant: {assistant_reply}",
        metadata={"type": "conversation"},
    )

    return assistant_reply
Now your agent:
- Receives a message.
- Searches its memory for relevant past interactions.
- Includes those memories in the system prompt.
- Responds with full context.
- Stores the new interaction for future recall.
The user says "that fix you suggested broke something else" and the agent knows which fix.
Step 5: Run It
# First conversation
print(agent_respond("user-42", "How do I fix CORS errors in my FastAPI app?"))
# ... days later ...
# The agent remembers the previous conversation
print(agent_respond("user-42", "That CORS fix you suggested is now blocking my webhook endpoint"))
Five minutes, roughly 60 lines of code, and your agent has persistent memory.
What About Multi-Agent Systems?
If you're building with CrewAI, LangGraph, or OpenAI Agents SDK, the pattern extends naturally. Each agent gets its own namespace, or they share a common memory pool:
# Shared memory across agents
def store_agent_memory(agent_id: str, task: str, result: str):
    response = oai.embeddings.create(
        input=f"{task} {result}",
        model="text-embedding-3-small"
    )
    db.collections.docs.upsert(
        collection_name="agent-memory",
        docs=[{
            "id": f"agent:{agent_id}:{datetime.now().isoformat()}",
            "vector": response.data[0].embedding,
            "text": f"Task: {task}\nResult: {result}",
            "agent_id": agent_id,
            "timestamp": datetime.now().isoformat(),
            "type": "agent_task",
        }],
    )

# Any agent can search shared memory
def search_team_memory(query: str, top_k: int = 10):
    response = oai.embeddings.create(
        input=query, model="text-embedding-3-small"
    )
    return db.collections.query(
        collection_name="agent-memory",
        query={
            "rrf": [
                {"knn": {"field": "vector", "queryVector": response.data[0].embedding, "k": top_k}},
                {"queryString": {"defaultField": "text", "query": query}},
            ]
        },
        size=top_k,
    )
A research agent stores its findings. A writing agent searches those findings before drafting. A review agent checks both. Shared memory, zero coordination overhead.
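Here's what that hand-off looks like with the two helpers above (a sketch; the agent ID and task text are made up, and it assumes the query response exposes hits via .docs the same way memory.query does in Step 3):
# Illustrative example: a research agent stores a finding, a writing agent reads it
store_agent_memory(
    agent_id="researcher-1",
    task="Summarize LambdaDB's pricing model",
    result="Pay per query and per GB stored; idle collections cost nothing.",
)

findings = search_team_memory("LambdaDB pricing", top_k=5)
for hit in findings.docs:
    print(hit.doc["text"])  # the writing agent drafts from these findings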
Cost: The Part Nobody Talks About
Let's do the math for a system with 100 agents:
| | Pinecone | LambdaDB |
| --- | --- | --- |
| Minimum cost | $50/mo per index | $0 |
| 100 agents (mostly idle) | $5,000/mo | ~$3/mo |
| 10K queries/day | ~$70/mo additional | ~$5/mo |
| Total | $5,070/mo | ~$8/mo |
Pinecone charges you $50/month per index whether you query it or not. That's the "serverless" tax of running servers behind the scenes.
LambdaDB runs on Lambda + S3. No query, no compute, no cost. You pay per query and per GB stored. Nothing else.
For a solo founder running AI agents as a side project, the difference between $5,000/month and $8/month is the difference between "viable" and "dead."
Hybrid Search: Why It Matters for Memory
Pure vector search has a known failure mode: it returns results that are semantically close but factually wrong. Ask for "error code 422" and vector search might return conversations about "HTTP status codes" or "API validation". Close in meaning, wrong in specifics.
LambdaDB runs vector and full-text search in a single query via RRF (Reciprocal Rank Fusion). The vector component (knn) finds semantically relevant memories. The full-text component (queryString) ensures exact matches on error codes, function names, and specific terms.
# This single query combines vector + full-text search
results = db.collections.query(
    collection_name="agent-memory",
    query={
        "rrf": [
            {"knn": {"field": "vector", "queryVector": query_vector, "k": 5}},
            {"queryString": {"defaultField": "text", "query": "ERR_CORS_422"}},
        ]
    },
    size=5,
)
One query. Two search modes fused by RRF. No post-processing.
What's Next
This tutorial covers the core pattern. From here, you can:
- Add memory decay. Weight recent memories higher with timestamp-based scoring; see the sketch after this list.
- Categorize memories. Tag memories as "facts," "preferences," "decisions" for structured retrieval.
- Use zero-copy branching. Fork an agent's memory for A/B testing different personas without duplicating data.
- Build per-user personalization. Each user gets their own filtered memory space within the same collection.
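As a starting point for memory decay, you don't need to change anything about storage or querying; you can re-rank what recall_memories returns in plain Python, discounting older memories. A minimal sketch (the 30-day half-life is an arbitrary choice):
from datetime import datetime

def decay_weighted(memories: list[dict], half_life_days: float = 30.0) -> list[dict]:
    """Re-rank recalled memories so older ones count for less."""
    now = datetime.now()
    rescored = []
    for m in memories:
        age_days = (now - datetime.fromisoformat(m["timestamp"])).days if m["timestamp"] else 0
        decay = 0.5 ** (age_days / half_life_days)  # halve the weight every half_life_days
        rescored.append({**m, "score": m["score"] * decay})
    return sorted(rescored, key=lambda m: m["score"], reverse=True)

# Usage: decay_weighted(recall_memories("user-42", "CORS fix"))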
The full API reference covers these patterns: docs.lambdadb.ai.
Get Started
pip install lambdadb
Free tier. No credit card. No minimum spend. Your agent's memory starts at $0 and scales with usage.