
How to Store AI Chat History: 4 Approaches Compared

February 15, 2026


Every AI application that holds a conversation needs to answer the same question: where do the messages go?

If you’re building with OpenAI, Anthropic, or any LLM, the model itself is stateless. It doesn’t remember what you said five minutes ago. Your application is responsible for storing conversation history, feeding it back into context windows, and managing what gets kept and what gets dropped.
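That loop is simple in shape: on every request, you send the relevant stored history along with the new user message. A minimal sketch in TypeScript (the system prompt and message shape are illustrative; the resulting array is the format chat APIs like OpenAI's Chat Completions expect):

```typescript
// Assemble the payload for a stateless LLM call from stored history.
// The model sees only what you put in this array on each request.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildChatMessages(
  systemPrompt: string,
  history: ChatMessage[], // loaded from your conversation store, oldest first
  newUserMessage: string
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    ...history,
    { role: "user", content: newUserMessage },
  ];
}

// You would pass the result to your LLM client, e.g.
// openai.chat.completions.create({ model: "...", messages }) -- not shown here.
```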

This post compares four real approaches to solving this, with code for each one. Pick the one that fits your constraints.

What Chat History Storage Actually Requires

Before comparing tools, here’s what a production conversation store needs to handle:

  • Per-user isolation. User A’s conversations must never leak into User B’s context.
  • Ordered messages. Chronological ordering within a conversation, with role attribution (user/assistant/system).
  • Metadata. Timestamps, token counts, tags, conversation state.
  • Retrieval patterns. Fetch the last N messages, search by content, list conversations by user.
  • Cleanup. TTL or expiration for temporary conversations, storage management at scale.
  • Concurrency. Multiple conversations happening simultaneously without conflicts.

Simple at small scale. Hard at production scale.
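Concretely, those requirements reduce to a small record shape that every approach below ends up persisting in some form. A sketch (field names are illustrative, not a prescribed schema):

```typescript
type Role = "user" | "assistant" | "system";

// The minimum every store below needs to keep per message.
interface StoredMessage {
  conversationId: string; // per-user isolation hangs off the conversation
  role: Role;             // role attribution
  content: string;
  tokenCount?: number;    // lets you budget context windows later
  createdAt: number;      // epoch ms; gives chronological ordering
}

// Ordered retrieval is a contract worth asserting in tests.
function isChronological(messages: StoredMessage[]): boolean {
  return messages.every((m, i) => i === 0 || messages[i - 1].createdAt <= m.createdAt);
}
```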

Storing AI Chat History in PostgreSQL

PostgreSQL is the most common starting point, and for good reason. If your application already has a Postgres database, adding conversation storage feels natural. You create a conversations table, a messages table, wire up a few queries, and you’re done.

This approach works well when conversations are one feature among many in a broader application. Your team already knows SQL. You already have backups, monitoring, and connection management in place. The marginal cost of adding two more tables is low.

Where it gets interesting is at the edges. Once you need semantic search, you’re adding pgvector and managing an embeddings pipeline. Once you need multi-tenancy, you’re designing row-level security or schema-per-tenant patterns. Once you need context windowing for LLM calls, you’re writing the summarization and token-counting logic yourself. Postgres gives you a strong foundation, but everything above the storage layer is your responsibility.
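The context-windowing piece, for example, looks roughly like this: keep the most recent messages whose token counts fit a budget, dropping the oldest first. A simplified sketch (production versions typically also summarize what they drop rather than discarding it):

```typescript
type Msg = { role: string; content: string; tokenCount: number };

// Keep the newest messages that fit within maxTokens, preserving order.
function trimToTokenBudget(messages: Msg[], maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let used = 0;
  // Walk backwards from the newest message, stopping when the budget is exceeded.
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokenCount > maxTokens) break;
    used += messages[i].tokenCount;
    kept.unshift(messages[i]); // restore chronological order
  }
  return kept;
}
```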

If you’re using a managed Postgres service like Supabase or Neon, you skip the server management. But you still own the schema design, migrations, and all of the application-level conversation logic.

Here’s what a typical schema looks like:

CREATE TABLE conversations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  title TEXT,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system')),
  content TEXT NOT NULL,
  token_count INTEGER,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_messages_conversation ON messages(conversation_id, created_at);
CREATE INDEX idx_conversations_user ON conversations(user_id, updated_at DESC);

Retrieval looks like this:

import { Pool } from "pg";

const pool = new Pool(); // connection details come from the standard PG* environment variables

async function getConversationHistory(conversationId: string, limit = 50) {
  const result = await pool.query(
    `SELECT role, content, created_at
     FROM messages
     WHERE conversation_id = $1
     ORDER BY created_at DESC
     LIMIT $2`,
    [conversationId, limit]
  );
  return result.rows.reverse(); // newest-last, i.e. chronological order
}
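The write side is symmetric, with one detail worth getting right: the message insert and the conversation's updated_at bump should happen in a single transaction, and with node-postgres a transaction must run on one checked-out client (via pool.connect()), not pool.query, which may route each statement to a different connection. A sketch with the query executor injected so the transaction logic is testable without a live database:

```typescript
// Executor abstraction: in production, bind this to a single checked-out
// pg client (const client = await pool.connect(); client.query.bind(client)).
type Exec = (sql: string, params?: unknown[]) => Promise<unknown>;

async function addMessage(
  exec: Exec,
  conversationId: string,
  role: string,
  content: string,
  tokenCount?: number
) {
  await exec("BEGIN");
  try {
    await exec(
      `INSERT INTO messages (conversation_id, role, content, token_count)
       VALUES ($1, $2, $3, $4)`,
      [conversationId, role, content, tokenCount ?? null]
    );
    await exec(
      `UPDATE conversations SET updated_at = NOW() WHERE id = $1`,
      [conversationId]
    );
    await exec("COMMIT");
  } catch (err) {
    await exec("ROLLBACK");
    throw err;
  }
}
```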

Pros:

  • You already know SQL
  • Strong consistency and ACID transactions
  • Rich querying with JSONB metadata
  • pgvector extension enables semantic search

Cons:

  • You own the schema, migrations, indexes, and connection pooling
  • No built-in conversation semantics, so you’re modeling everything yourself
  • Scaling requires read replicas, partitioning, or sharding
  • Semantic search with pgvector means managing embeddings yourself

Best for: Teams that already run Postgres and have database expertise, or applications where conversations are a small feature alongside other relational data.

The takeaway: PostgreSQL is a solid foundation that’s unlikely to paint you into a corner. Just keep in mind that choosing Postgres means building and maintaining the conversation layer yourself. Schema design, migrations, context windowing, token counting, and search infrastructure are all on your team. If conversations are central to your product, that investment can add up over time.

Caching AI Chat History with Redis

Redis shows up in conversation architectures when latency is the primary concern. Sub-millisecond reads make it ideal for real-time chat interfaces where every millisecond of delay affects the user experience.

A common pattern is to use Redis as a hot cache in front of a durable store. Messages get written to both Redis and a primary database like Postgres or DynamoDB. Recent conversations are served from Redis. Older ones fall through to the persistent layer. This gives you the speed of in-memory storage without the risk of data loss.
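The cache-aside read path in that pattern is only a few lines. Sketched here with the cache and the durable loader injected (the interface names are illustrative), which also makes it testable without live servers:

```typescript
interface HotCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

type Loader = (conversationId: string) => Promise<object[]>;

// Serve recent conversations from the cache; fall through to the durable
// store on a miss and populate the cache for next time.
async function getHistoryCacheAside(
  cache: HotCache,
  loadFromDb: Loader,
  conversationId: string
): Promise<object[]> {
  const key = `conv:${conversationId}:messages`;
  const hit = await cache.get(key);
  if (hit !== null) return JSON.parse(hit);
  const messages = await loadFromDb(conversationId); // e.g. a Postgres or DynamoDB query
  await cache.set(key, JSON.stringify(messages));
  return messages;
}
```

In production the HotCache methods map directly onto Redis GET/SET (with a TTL on the set), and writes go to both layers.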

You can also use Redis as your only store for conversations that are inherently temporary. Think customer support sessions that expire after 7 days, or chatbot interactions where the user doesn’t expect persistence between sessions. Redis has built-in TTL support that handles this cleanly.

The limitation is that Redis is memory-bound. Every conversation lives in RAM, and RAM is expensive. At scale, storing millions of conversations in Redis gets costly fast. There’s also no native way to search across conversations. If you need to find “all conversations about billing,” you need a separate search layer.

Here’s the basic pattern:

import Redis from "ioredis";
const redis = new Redis();

async function addMessage(
  conversationId: string,
  role: string,
  content: string
) {
  const message = JSON.stringify({
    role,
    content,
    timestamp: Date.now(),
  });

  await redis
    .multi()
    .rpush(`conv:${conversationId}:messages`, message)
    .expire(`conv:${conversationId}:messages`, 86400 * 7) // 7-day TTL
    .exec();
}

async function getHistory(conversationId: string, limit = 50) {
  const messages = await redis.lrange(
    `conv:${conversationId}:messages`,
    -limit,
    -1
  );
  return messages.map((m) => JSON.parse(m));
}

Pros:

  • Sub-millisecond reads
  • Built-in TTL for automatic expiration
  • Simple key-value model maps well to conversation threads

Cons:

  • Memory-bound, so storing millions of conversations gets expensive fast
  • No native search (you can’t query “find conversations about billing”)
  • Persistence is opt-in and can lose data on restart
  • No multi-tenancy primitives, so you build namespace isolation yourself
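That last point usually comes down to a key-building convention enforced in one place. A minimal sketch (the tenant-prefix scheme and the allowed-character rule are illustrative choices):

```typescript
// Build namespaced Redis keys so one tenant's data can never collide with
// another's. Reject separator characters rather than silently mangling IDs.
function conversationKey(tenantId: string, conversationId: string): string {
  for (const id of [tenantId, conversationId]) {
    if (!/^[A-Za-z0-9_-]+$/.test(id)) {
      throw new Error(`invalid id: ${id}`);
    }
  }
  return `tenant:${tenantId}:conv:${conversationId}:messages`;
}
```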

Best for: Ephemeral conversations, real-time features where you also persist to a durable store, or caching hot conversations in front of a primary database.

The takeaway: Redis is excellent at what it does, but it’s rarely the complete answer for conversation storage on its own. It works best as a speed layer paired with something more durable underneath. If your conversations are truly temporary, Redis alone can work. For anything persistent, plan on running two systems.

AI Chat History on DynamoDB (Self-Managed)

DynamoDB is the go-to choice for teams that are already invested in AWS and want true serverless scale. There are no connections to pool, no servers to patch, and no capacity planning to worry about. You define your table, set it to on-demand mode, and it scales from zero to millions of requests without any architecture changes.

For conversation storage, DynamoDB’s single-table design works well. Each conversation becomes a partition key, and messages are stored as items sorted by timestamp. Queries are fast and predictable because DynamoDB guarantees single-digit millisecond reads at any scale.

The tradeoff is that you own the entire data model. DynamoDB is not a relational database. There are no JOINs, no ad-hoc queries, and no full-text search. Every access pattern needs to be planned upfront and supported by your key schema or a Global Secondary Index. If you need semantic search, you’re adding OpenSearch as a separate service. If you need conversation summarization, you’re building that pipeline yourself.

This approach tends to work best for teams with prior DynamoDB experience. If you’re comfortable with partition keys, sort keys, and GSI design, you can build a highly performant conversation store. If single-table design is new to you, expect a learning curve.
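Before any code, the key design itself is the work. For messages, one workable convention is a composite sort key of a fixed-width timestamp plus a UUID: DynamoDB compares sort keys as strings, so zero-padding the timestamp makes lexicographic order match chronological order, and the UUID keeps simultaneous writes from colliding. A sketch of that convention (the padding width is an illustrative choice):

```typescript
import { randomUUID } from "node:crypto";

// Sort keys compare as strings in DynamoDB, so the timestamp must be
// fixed-width for lexicographic order to equal chronological order.
function messageSortKey(timestampMs: number = Date.now()): string {
  const ts = String(timestampMs).padStart(14, "0"); // 14 digits covers epoch ms well past year 5000
  return `MSG#${ts}#${randomUUID()}`;
}
```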

Here’s what the core read and write operations look like:

import { randomUUID } from "node:crypto";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function addMessage(conversationId: string, role: string, content: string) {
  await client.send(new PutCommand({
    TableName: "Messages",
    Item: {
      pk: `CONV#${conversationId}`,
      sk: `MSG#${Date.now()}#${randomUUID()}`, // timestamp plus UUID avoids collisions
      role,
      content,
      createdAt: new Date().toISOString(),
    },
  }));
}

async function getHistory(conversationId: string, limit = 50) {
  const result = await client.send(new QueryCommand({
    TableName: "Messages",
    KeyConditionExpression: "pk = :pk AND begins_with(sk, :prefix)",
    ExpressionAttributeValues: {
      ":pk": `CONV#${conversationId}`,
      ":prefix": "MSG#",
    },
    ScanIndexForward: false, // newest first, so Limit takes the most recent messages
    Limit: limit,
  }));
  return (result.Items ?? []).reverse(); // back to chronological order
}

Pros:

  • True serverless with no connections to pool and no servers to manage
  • Scales to millions of conversations without architecture changes
  • Pay-per-request pricing that scales to zero
  • Built-in TTL for automatic cleanup

Cons:

  • Single-table design is powerful but has a learning curve
  • No full-text search without adding OpenSearch
  • No semantic or vector search without a separate service
  • You build and maintain the conversation model, access patterns, and GSIs yourself

Best for: AWS-native teams building high-scale applications who are comfortable with NoSQL data modeling.

The takeaway: DynamoDB gives you serverless scale and predictable performance, which is hard to beat for high-throughput workloads. The cost is upfront complexity. You need to design your access patterns carefully before writing any code, and adding features like search or summarization means bringing in additional AWS services. If your team already thinks in partition keys and GSIs, this can be a great fit.

AI Chat History with DialogueDB (Managed Service)

The three approaches above all share a common characteristic: you’re using a general-purpose database and building the conversation layer on top of it. That means designing schemas, writing retrieval logic, implementing search, handling multi-tenancy, and maintaining all of it over time.

DialogueDB takes a different approach. Instead of giving you a blank database and leaving you to figure out conversation storage, it gives you a purpose-built API where conversations, messages, memory, and search are already modeled and ready to use. There’s no database to set up, no schema to design, and no infrastructure to manage.

That changes the math on how you spend engineering time. With any general-purpose database, your team needs to design a conversation schema, write retrieval and pagination logic, implement multi-tenant isolation, build a search pipeline, and then maintain all of it as your product evolves. That work can easily take a week or more before you write a single line of AI application code. With DialogueDB, you install a package, pass in your API key, and start storing conversations in minutes.

Conversations are not just rows in a table. They’re ordered collections of messages with roles, timestamps, state, tags, memory, and threading. By treating conversations as a first-class data type, DialogueDB handles the patterns that every AI application eventually needs: semantic search across conversation history, parent-child threading, structured memory for cross-session context, and automatic deduplication. These are features you’d otherwise build and maintain yourself on top of a general-purpose database.

For teams building AI-powered products, the question comes down to where you want to invest your engineering effort. If conversation storage is a solved problem you’d rather not re-solve, a managed service lets you skip straight to the features your users actually care about.

Here’s what integration looks like:

import { DialogueDB } from "dialogue-db";

const db = new DialogueDB({ apiKey: process.env.DIALOGUEDB_API_KEY });

// Create a conversation
const dialogue = await db.createDialogue({
  label: "Support Chat",
  tags: ["support", "billing"],
});

// Add messages (stored with automatic embeddings for search)
await dialogue.saveMessage({ role: "user", content: "How do I upgrade my plan?" });
await dialogue.saveMessage({ role: "assistant", content: "You can upgrade from Settings > Billing." });

// Retrieve history
await dialogue.loadMessages();
const messages = dialogue.messages;

// Search across all conversations semantically
const results = await db.searchDialogues("billing questions");

// Store structured memory (project-scoped, useful for cross-session context)
await db.createMemory({
  label: "user_plan",
  value: "Currently on Starter, asked about upgrading",
});

What you get out of the box:

  • Multi-tenancy. Namespace isolation per project with scoped API keys.
  • Semantic search. Messages and memories are automatically embedded for vector search.
  • Threading. Parent-child conversation threads via threadOf.
  • Memory. Structured key-value memory per project or namespace with semantic retrieval. Memories are top-level entities, not scoped to individual dialogues. Use labels and tags to associate them with specific conversations.
  • State management. Arbitrary JSON state per conversation with unlimited updates.
  • TTL. Automatic conversation expiration.
  • Deduplication. SHA256 content dedup on message storage.

Pros:

  • Minutes to integrate, not days. Works in serverless environments like Vercel, Lambda, and Cloudflare Workers with no connection pooling needed.
  • Conversation semantics are first-class: dialogues, messages, memories, state, and threads.
  • Semantic search is included without managing embeddings infrastructure.
  • Scales from prototype to production on the same API.

Cons:

  • External dependency. Your conversations live in a third-party service. Evaluate data export options and understand the availability model before going to production.
  • Adds a network hop compared to a co-located database. This is fine for most AI chat workloads, but worth benchmarking if you need sub-millisecond storage latency.
  • Less flexibility than a raw database for non-conversation data.
  • Monthly limits vary by plan tier. See pricing for details.

Best for: Teams that want to move fast on AI features and are comfortable with a managed service dependency for conversation data. Especially valuable when you need search, memory, or multi-tenancy without building the infrastructure yourself.

The takeaway: The biggest advantage here is not having to maintain a database at all. No schemas to migrate, no indexes to tune, no connection pools to manage, no scaling decisions to make. You install a package, call an API, and conversation storage is handled. That frees your team up to focus entirely on the AI experience your users care about. The tradeoff is depending on an external service, so it’s worth evaluating that before going to production.

Decision Matrix

Each of the four approaches above has real strengths. The right choice depends less on which technology is “best” and more on your team’s existing expertise, your timeline, and how central conversations are to your product.

To make the comparison concrete, here’s how each approach stacks up across the requirements we outlined at the top. All four can get you to most of these capabilities eventually. The difference is how much you build and maintain yourself versus what’s included out of the box.

| Requirement | DialogueDB | PostgreSQL | Redis | DynamoDB |
| --- | --- | --- | --- | --- |
| Setup time | Minutes | Hours to days | Minutes | Hours |
| Semantic search | Yes, included | Yes, with pgvector (you manage embeddings) | No | No (add OpenSearch) |
| Multi-tenancy | Yes, included | Yes (you build isolation) | Yes (you build isolation) | Yes (you build isolation) |
| Memory and state | Yes, included | Yes (you build it) | Yes (you build it) | Yes (you build it) |
| Serverless-friendly | Yes | No (connection pooling needed) | No | Yes |
| Cost at zero scale | $0 (free tier) | Server cost (~$15+/mo) | Server cost (~$15+/mo) | $0 (on-demand) |
| Cost at 50K conversations/mo | $99/mo (Pro plan) | ~$50-200/mo (RDS) | ~$100-300/mo (ElastiCache) | ~$5-30/mo (on-demand) |
| Full control over data model | No (managed) | Yes | Yes | Yes |
| Threading | Yes, included | Yes (you build it) | Yes (you build it) | Yes (you build it) |

The Real Question

After walking through all four approaches, one pattern stands out. Every general-purpose database approach requires your team to build and maintain a conversation layer from scratch. Schema design, retrieval logic, multi-tenancy, search, memory, and threading all become your responsibility, and that work compounds over time as your product evolves.

The real question is whether that’s where your engineering time is best spent. If conversations are central to your product, the infrastructure underneath them should accelerate your team, not slow it down. A purpose-built service like DialogueDB handles the storage, search, and memory so your team can stay focused on building the AI features your users actually interact with.

Your team’s time is valuable. Spend it on the parts of your product that make it unique.

Getting Started with DialogueDB

If you want to try the managed approach, the fastest way is to grab a free API key and install the SDK. The whole process takes about five minutes from signup to your first stored conversation.

DialogueDB’s free tier includes 1,000 dialogues, 5,000 messages, and 1,000 memories per month with 500 MB of storage. That’s enough to build and validate a real application. Paid plans start at $29/mo for production workloads and scale up from there. See full pricing.

Sign up at dialoguedb.com to get your free API key. No credit card required.

Once you have your key, install the SDK and you’re ready to go:

npm install dialogue-db
import { DialogueDB } from "dialogue-db";

const db = new DialogueDB({ apiKey: "your-api-key" });

const dialogue = await db.createDialogue({ label: "My First Conversation" });
await dialogue.saveMessage({ role: "user", content: "Hello!" });
await dialogue.saveMessage({ role: "assistant", content: "Hi there! How can I help?" });

await dialogue.loadMessages();
console.log(dialogue.messages);

Full documentation at docs.dialoguedb.com. SDK on npm. Questions? Open an issue on GitHub.