Building a Stateful AI Chatbot on AWS Lambda
March 3, 2026

AWS Lambda is event-driven, scales to zero, and costs nothing when idle. For chatbot workloads that are bursty and unpredictable, it’s a natural fit. Most bot architectures are webhook-based anyway. A user sends a message on Slack or Telegram or your own UI, it hits API Gateway, Lambda runs your handler, and a response comes back. Simple.
But Lambda is effectively stateless. Execution environments are ephemeral and recycled without warning, and in-memory state is never shared across concurrent instances, so nothing reliably survives between requests: no shared memory, no built-in way to know what happened two messages ago. If you’re building anything beyond a single-turn Q&A bot, you need to solve persistence.
What happens without it
Think about the flow when a user sends a message:
- Lambda spins up
- Your handler receives the message
- You need to call your LLM, but you only have this one message. No prior context.
- You generate a response based on that single message alone
- Lambda shuts down
Next message from the same user? Brand new invocation. No conversation history. No memory of preferences or prior questions. No awareness of whether you’re in the middle of a multi-step flow or starting from scratch. The user might reference something from three messages ago and your bot has absolutely no idea what they’re talking about.
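Concretely, a context-free handler looks something like this. It's a sketch: `callYourLLM` is a stand-in for whatever model call you actually make, stubbed here so the example is self-contained.

```typescript
// A stateless handler: every invocation sees only the current message.
// callYourLLM is a stub standing in for your actual model call.
async function callYourLLM(
  messages: { role: string; content: string }[]
): Promise<string> {
  return `echo: ${messages.map((m) => m.content).join(" ")}`;
}

export const handler = async (event: { body: string }) => {
  const { message } = JSON.parse(event.body);

  // The only context available is this single message -- nothing from
  // earlier turns survives into this invocation.
  const response = await callYourLLM([{ role: "user", content: message }]);

  return { statusCode: 200, body: JSON.stringify({ response }) };
};
```

Ask it "what is my name?" after telling it your name, and the second invocation has no idea the first one ever happened.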
For a demo, this is fine. For anything real, you need a persistence layer.
The usual approaches
Most developers reach for what’s already in their AWS account.
DynamoDB is the obvious first choice. It’s serverless, fast, and right there. You can store messages in it. But conversations aren’t simple key-value lookups. You need to model participants, message ordering, threading, metadata, and state that changes over time. You’ll design a schema, realize it doesn’t work for your access patterns, and redesign it. Then you’ll need pagination for long conversations, selective retrieval so you’re not blowing your LLM’s context window, and eventually some kind of summarization strategy.
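To make the schema problem concrete, here's one common single-table sketch: one partition per conversation, with a sort key whose zero-padded timestamp keeps messages in chronological order under a Query. The key names and layout are illustrative, not a recommendation:

```typescript
// Illustrative single-table DynamoDB key design for chat messages.
// The partition key groups a conversation; the sort key's zero-padded
// timestamp makes lexicographic order match chronological order.
interface MessageItem {
  PK: string; // "CONV#<conversationId>"
  SK: string; // "MSG#<zero-padded epoch ms>#<messageId>"
  role: "user" | "assistant";
  content: string;
}

function toItem(
  conversationId: string,
  timestampMs: number,
  messageId: string,
  role: "user" | "assistant",
  content: string
): MessageItem {
  return {
    PK: `CONV#${conversationId}`,
    // Pad to 15 digits so string comparison agrees with numeric time.
    SK: `MSG#${String(timestampMs).padStart(15, "0")}#${messageId}`,
    role,
    content,
  };
}
```

Even this minimal layout forces decisions, and it only covers ordered retrieval: threading, edits, per-user lookups, and summaries each mean more key patterns or secondary indexes.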
RDS or Aurora gives you relational modeling, which is arguably a better fit for conversation data. But Lambda and connection pools don’t get along. Lambda’s ephemeral execution model means connections get opened and abandoned. Cold starts get worse. You end up adding RDS Proxy to manage it, which is another service to configure and pay for.
Redis is fast but volatile. It can work as a cache layer, but you still need durable storage behind it.
Any of these will get you basic message storage. The problem is that basic message storage is about 30% of what you need.
What you’re actually signing up for
Storing messages is the straightforward part. The real work starts after that.
Semantic search. At some point a user will say something like “what was that thing you suggested yesterday” and your bot needs to find the right message across a potentially long history. That’s not a database query. That’s vector search. So now you need an embedding model, a vector database like Pinecone or Qdrant, and a pipeline that vectorizes messages as they come in. That’s a separate system with its own deployment, scaling, and cost.
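The retrieval half of that pipeline is nearest-neighbor search over embeddings. Assuming the messages have already been embedded (the expensive part, done by a separate model), the lookup itself reduces to something like this in-memory sketch; a real system delegates it to a vector database:

```typescript
// Nearest-neighbor search over precomputed message embeddings.
// In production the vectors come from an embedding model and live in a
// vector database; this in-memory version just shows the retrieval step.
interface EmbeddedMessage {
  content: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k messages most similar to the query embedding.
function topK(query: number[], messages: EmbeddedMessage[], k: number): EmbeddedMessage[] {
  return [...messages]
    .sort((x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector))
    .slice(0, k);
}
```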
Memory management. Conversations get long. Context windows have hard limits. You need strategies for what to keep in context, what to summarize, and what to store for later retrieval. This is a meaningful engineering problem on its own.
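A minimal version of the "what stays in context" decision is a token budget applied from the newest message backwards. The 4-characters-per-token estimate below is a rough heuristic, not a real tokenizer:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Rough heuristic: ~4 characters per token. A real system should use
// the model's actual tokenizer.
const estimateTokens = (m: ChatMessage) => Math.ceil(m.content.length / 4);

// Keep the most recent messages that fit within the budget; everything
// older becomes a candidate for summarization or vector-store retrieval.
function fitToBudget(messages: ChatMessage[], budget: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

That covers truncation. Deciding what to summarize, when to summarize it, and how to retrieve it later is where the real engineering lives.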
State tracking. Beyond just messages, you often need to track where a user is in a flow, what preferences they’ve expressed, what actions have been taken. This state needs to persist across invocations and be available instantly on the next request.
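One way to think about that state is as a small object each invocation loads, updates, and writes back. A sketch of the update step, with made-up field names:

```typescript
// Hypothetical per-user conversation state persisted between invocations.
interface ConversationState {
  step: string;                        // position in a multi-step flow
  preferences: Record<string, string>; // accumulated user preferences
  turnCount: number;
}

const initialState: ConversationState = { step: "start", preferences: {}, turnCount: 0 };

// Pure update: take the state loaded at the start of the invocation,
// fold in what this turn revealed, and return what gets written back.
function advance(
  state: ConversationState,
  update: { step?: string; preferences?: Record<string, string> }
): ConversationState {
  return {
    step: update.step ?? state.step,
    preferences: { ...state.preferences, ...update.preferences },
    turnCount: state.turnCount + 1,
  };
}
```

The tricky part isn't the update logic; it's persisting this atomically alongside the messages so a crashed invocation doesn't leave the two out of sync.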
Retention and storage. How long do you keep conversation data? Where does message content live? How do you handle deletion requests? These are operational questions that need answers before you go to production.
Multi-tenancy. If your bot serves more than one user or organization, every single query needs to be scoped correctly. Conversations must be fully isolated. This isn’t optional and it’s easy to get wrong.
Add it up and you’re looking at DynamoDB or Postgres for storage, a vector database for search, an embedding pipeline, custom retrieval logic, retention policies, and tenant isolation. You started building a bot and ended up building a conversation platform.
What you actually need
A persistence layer for a Lambda chatbot needs to handle:
- Message storage with ordering, roles, timestamps, and metadata
- Sub-second retrieval (Lambda doesn’t have time to wait)
- Conversation state that persists across invocations
- Semantic search across conversation history
- Memory management for long-running conversations
- Automatic scaling from development through production
- Storage and retention without manual intervention
- Tenant and user isolation
- Tool access, so your bot can query its own conversation history as part of its reasoning
- Portability, so conversations from one bot or platform can be accessed from another
That’s a significant engineering effort to build yourself, and an ongoing maintenance commitment.
DialogueDB
DialogueDB is a managed service built specifically for AI conversation data. It handles chat history, conversation state, memory, and vector search as a single service, so you don’t need to stitch together multiple tools to cover the requirements above.
```bash
npm install dialogue-db
```
Here’s what a Lambda handler looks like with the SDK:
```typescript
import { DialogueDB } from 'dialogue-db';

const db = new DialogueDB({ apiKey: process.env.DIALOGUE_DB_API_KEY });

export const handler = async (event) => {
  const { userId, message } = JSON.parse(event.body);

  // Resumes existing conversation or starts a new one
  const dialogue = await db.getOrCreateDialogue({ id: userId });
  await dialogue.loadMessages();

  await dialogue.saveMessage({ role: 'user', content: message });
  const response = await callYourLLM(dialogue.messages);
  await dialogue.saveMessage({ role: 'assistant', content: response });

  return { statusCode: 200, body: JSON.stringify({ response }) };
};
```
The getOrCreateDialogue call is what makes this work across invocations. If a dialogue with that ID already exists, it loads it with the full conversation history. If it doesn’t exist yet, it creates one. Either way, the next Lambda invocation picks up exactly where the last one left off, which is the core problem this article is about.
Beyond message storage, conversation state is attached directly to each dialogue and persists across invocations. You can use it to track flow position, user preferences, or any accumulated context your bot needs. Messages are also vectorized automatically, so semantic search works without requiring a separate embedding pipeline or vector database. When it comes to context window management, you can retrieve messages selectively by relevance or load the most recent ones, giving you control over what the LLM sees rather than dumping the full history into the prompt.
For multi-tenant use cases, namespaces provide full data isolation:
```typescript
const dialogue = await db.getOrCreateDialogue({
  id: `${clientId}:${userId}`,
  namespace: clientId,
});
```
Every query scoped to a namespace only returns data within that namespace. For full environment separation, each project gets its own API key and data boundary.
DialogueDB also provides an MCP server (GitHub) that lets your bot query its own conversation history as a tool during reasoning. Rather than writing retrieval logic for every situation where the bot might need historical context, the bot can search for and load past conversations on its own. This also means conversations stored by one service are accessible from another, so you’re not locked into a single integration point.
The service runs on AWS with regional data storage. Storage, retention, and scaling are all managed, and the same configuration works from your first test message through production traffic.
Where to spend your time
The interesting work in building a chatbot is the conversation design, the prompt engineering, the tool integrations, and the evaluation of how well it all performs. That’s what determines whether the bot is useful, and it’s where your time should go.
DialogueDB has a free tier. Sign up, grab an API key, and you can have persistent conversations running in your Lambda function this afternoon.
Sign up free | Documentation | SDK on npm | MCP server on npm | MCP server on GitHub