Where Should Your Agent's Memory Live?
June 10, 2026

Every AI agent that persists state between sessions needs a place to put it. The options range from a JSON file on disk to a distributed graph database to a managed service like DialogueDB that handles the storage layer entirely, and the right answer depends on what your agent actually does, not on what sounds most sophisticated.
This post walks through five storage backends for agent memory, starting from the simplest and adding complexity only when the use case demands it. Each section covers what the backend handles well, where it breaks, and when to reach for the next tier. If you’ve already read about why retrieval alone doesn’t constitute memory, this picks up where that piece left off: you’ve decided your agent needs real memory, and now you need to choose where it lives.
What Does Agent Memory Actually Store?
Before evaluating backends, it helps to be concrete about what “agent memory” means at the data level. Agents that maintain state across sessions typically store some combination of these:
- Conversation history. The ordered sequence of messages (user, assistant, system, tool calls) within a session. This is the raw transcript.
- Extracted facts. Distilled observations from conversations: user preferences, decisions made, entities mentioned. Smaller and more structured than raw history.
- Session state. Arbitrary key-value data that tracks where a workflow stands: which step the user is on, what’s been approved, what’s pending.
- Episodic memories. Timestamped records of specific interactions or events that the agent should be able to recall later.
- Semantic indexes. Vector embeddings that let the agent search memory by meaning rather than by exact keys.
Different backends handle these data types differently. A system that’s excellent for ordered message logs may be poor at semantic search. One that handles flexible metadata well may struggle with relational queries across sessions.
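One way to make these categories concrete is to write them down as types. The sketch below is illustrative only: the type and field names (`MemoryRecord`, `kind`, `sessionId`, and so on) are assumptions made for this example, not a schema from any particular framework.

```typescript
// Hypothetical record shapes for the five kinds of agent memory listed above.
type Message = {
  kind: "message";
  sessionId: string;
  seq: number; // position within the session transcript
  role: "user" | "assistant" | "system" | "tool";
  content: string;
};
type Fact = { kind: "fact"; content: string; sourceSessionId: string; confidence: number };
type SessionState = { kind: "state"; sessionId: string; values: Record<string, unknown> };
type Episode = { kind: "episode"; occurredAt: string; summary: string };
type SemanticEntry = { kind: "embedding"; refId: string; vector: number[] };

type MemoryRecord = Message | Fact | SessionState | Episode | SemanticEntry;

// Count records per kind, for example to see which data types dominate a workload.
function countByKind(records: MemoryRecord[]): Record<MemoryRecord["kind"], number> {
  const counts: Record<MemoryRecord["kind"], number> = {
    message: 0, fact: 0, state: 0, episode: 0, embedding: 0,
  };
  for (const record of records) counts[record.kind] += 1;
  return counts;
}

const sample: MemoryRecord[] = [
  { kind: "message", sessionId: "s1", seq: 0, role: "user", content: "Use vitest, please." },
  { kind: "fact", content: "User prefers vitest", sourceSessionId: "s1", confidence: 0.9 },
  { kind: "state", sessionId: "s1", values: { step: "write migration plan" } },
];
const counts = countByKind(sample);
```

Knowing which kinds dominate is a reasonable first input to the backend choice: a workload that is mostly ordered messages has different needs from one that is mostly embeddings.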
Flat Files: The Simplest Starting Point
For a surprising number of use cases, flat files are the right starting point.
Flat-file memory stores agent state as JSON or Markdown files on the local filesystem. No database process, no network calls, no schema migrations. The agent reads a file, updates it, and writes it back.
This pattern shows up more often than you’d expect in production. Claude Code stores session memory as Markdown files in a project directory. Many coding assistants and local dev tools use JSON files for user preferences and session state. The MemGPT project initially stored agent memories as JSON documents.
What flat files handle well:
- Single-agent, single-user scenarios. A personal assistant, a dev tool, a CLI agent.
- Human-readable state. Markdown and JSON are inspectable without special tooling. You can debug memory by opening a file.
- Version control. Memory files can be committed to git, giving you a full audit trail of what the agent knew and when.
- Zero infrastructure. No database to provision, no connections to manage, no service to keep running.
```json
{
  "user_preferences": {
    "language": "TypeScript",
    "test_framework": "vitest",
    "updated_at": "2026-06-09T14:30:00Z"
  },
  "session_state": {
    "current_task": "refactor auth module",
    "completed_steps": ["audit existing code", "write migration plan"],
    "pending_steps": ["implement changes", "run tests"]
  },
  "facts": [
    {
      "content": "User prefers functional style over class-based",
      "source_session": "2026-06-05",
      "confidence": 0.9
    }
  ]
}
```
Where flat files break:
- Concurrent writes. Two processes writing to the same file can corrupt it or silently overwrite each other’s updates. File locking helps, but it’s fragile across platforms.
- Search. Finding a specific fact means reading the entire file and scanning through it. Fine for hundreds of records, slow for thousands.
- Multi-user. You need one file per user, directory naming conventions, and cleanup logic. It works, but it starts resembling a database built from duct tape.
- Atomicity. A crash mid-write can leave you with a partial file. Atomic-replace patterns (write to a temp file, then rename it over the original) mitigate this, but you’re reimplementing database guarantees.
Choose flat files when: your agent runs as a single process, serves one user at a time, and stores fewer than a few thousand memory entries. Move on when you need concurrent access, search, or multiple users.
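If you stay on flat files, the write-temp-then-rename pattern is worth adopting early. The following is a minimal sketch using only Node’s standard library. The function names and the temp-file naming scheme are assumptions for illustration. It relies on `rename` replacing the target atomically, which holds on POSIX filesystems when both paths are in the same directory.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write JSON to a temp file next to the target, then rename it over the target.
// Readers see either the old file or the new one, never a partial write.
function saveMemoryAtomic(path: string, memory: unknown): void {
  const tmpPath = `${path}.${process.pid}.tmp`; // same directory as the target
  writeFileSync(tmpPath, JSON.stringify(memory, null, 2), "utf8");
  renameSync(tmpPath, path);
}

function loadMemory<T>(path: string): T {
  return JSON.parse(readFileSync(path, "utf8")) as T;
}

// Usage: the directory and file name here are placeholders.
const dir = mkdtempSync(join(tmpdir(), "agent-memory-"));
const file = join(dir, "memory.json");
saveMemoryAtomic(file, { session_state: { current_task: "refactor auth module" } });
const loaded = loadMemory<{ session_state: { current_task: string } }>(file);
```

This protects against partial files but not against two writers racing each other, and it does not call `fsync`, so a power loss can still drop the most recent write.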
How to Move Agent Memory to SQLite
SQLite is the step up from flat files that avoids the jump to a full database server. It runs as a library inside your application process, stores everything in a single file, and provides SQL queries, transactions, and full-text search without any external dependencies.
SQLite for agent memory gives you a single-file embedded database with ACID transactions, SQL queries, and full-text search (FTS5), without requiring a separate database server.
For agent memory, SQLite covers a lot of ground. You get proper concurrent reads, atomic writes, and the ability to query across conversations. FTS5 gives you keyword search without an external search service. JSON functions let you store and query flexible metadata. Throughput is rarely the bottleneck: for small indexed reads, SQLite can reach hundreds of thousands of operations per second on ordinary hardware, though write throughput depends heavily on the journal mode and on batching writes into transactions.
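```sql
CREATE TABLE memories (
  id INTEGER PRIMARY KEY,
  agent_id TEXT NOT NULL,
  category TEXT NOT NULL,
  content TEXT NOT NULL,
  metadata JSON,
  created_at TEXT DEFAULT (datetime('now')),
  updated_at TEXT DEFAULT (datetime('now'))
);

-- External-content FTS5 index over the memories table
CREATE VIRTUAL TABLE memories_fts USING fts5(
  content, category,
  content='memories', content_rowid='id'
);

-- Keep the index in sync on insert (UPDATE and DELETE need similar triggers)
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
  INSERT INTO memories_fts(rowid, content, category)
  VALUES (new.id, new.content, new.category);
END;

-- Find memories by keyword (FTS5 matches words, not meaning)
SELECT m.* FROM memories m
JOIN memories_fts f ON m.id = f.rowid
WHERE memories_fts MATCH 'deployment AND kubernetes'
ORDER BY rank;
```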
The architecture works cleanly for single-process agents. A coding assistant on a developer’s machine, a personal knowledge manager, a research agent that runs locally. Several production agent frameworks use SQLite as their default local store, and it holds up under real workloads.
Where SQLite falls short for agent memory:
- Concurrent writes from multiple processes. SQLite uses file-level locking. A web server with multiple workers writing to the same database will hit contention. WAL mode helps, but it doesn’t eliminate the single-writer constraint.
- Network access. SQLite is an embedded database. If your agent and your API server run on different machines, they can’t safely share a SQLite file over a network filesystem, because file locking there is unreliable. (Services like Turso and LiteFS replicate SQLite over the network, but at that point you’re running distributed infrastructure.)
- Semantic search. FTS5 does keyword matching, not semantic similarity. If your agent needs to find memories by meaning rather than by exact words, you’ll need to add an embedding pipeline and a vector search layer on top.
Choose SQLite when: your agent runs as a single process, needs structured queries and full-text search, and you want to avoid running a database server. Move on when you need multi-process access, network-accessible storage, or semantic search.
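One practical detail with FTS5 is that `MATCH` takes a query language, not plain text, so raw user input containing characters like `-` or `:` can produce syntax errors. A small sketch of one way to handle this is to quote every term. The helper name `toFtsQuery` is hypothetical. The quoting rule it uses (wrap in double quotes, double any embedded quote) follows FTS5’s string syntax.

```typescript
// Turn free-form input into an FTS5 MATCH expression.
// Each whitespace-separated term is wrapped in double quotes, with embedded
// quotes doubled, so punctuation is treated as text rather than query syntax.
function toFtsQuery(input: string, operator: "AND" | "OR" = "AND"): string {
  const terms = input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`);
  return terms.join(` ${operator} `);
}

const query = toFtsQuery("deployment kubernetes");
// Bind `query` as a parameter: ... WHERE memories_fts MATCH ?
```

An empty input yields an empty string, which FTS5 rejects, so callers would skip the search in that case.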
PostgreSQL as a Production Memory Store
For teams already running a web application with a Postgres database, adding agent memory tables is the path of least resistance. Postgres brings relational modeling, JSONB for flexible metadata, row-level security for multi-tenancy, and with pgvector you get semantic search in the same database.
PostgreSQL as an agent memory store provides relational data modeling, JSONB metadata, row-level security for multi-tenancy, and vector similarity search via pgvector, all in a single database.
This is where most teams building multi-user agent applications end up, and for good reason. The schema can model conversations, messages, memories, and state as separate tables with proper foreign keys. Queries can join across them. Permissions can enforce that one tenant’s agent memory is invisible to another. All of this runs on infrastructure your team likely already knows how to operate.
```sql
-- pgvector must be enabled before the vector type is available
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  namespace TEXT NOT NULL,
  agent_id TEXT NOT NULL,
  category TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB DEFAULT '{}',
  expires_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_memories_namespace ON agent_memories(namespace, agent_id);

-- IVFFlat clusters existing rows, so build this index after loading data
CREATE INDEX idx_memories_embedding ON agent_memories
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Semantic search within a namespace
-- $1 = query embedding, $2 = namespace, $3 = agent id
SELECT content, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM agent_memories
WHERE namespace = $2 AND agent_id = $3
  AND (expires_at IS NULL OR expires_at > NOW())
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
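Two details of that query are easy to trip over from application code: how the `$1` parameter is formatted, and what the `similarity` column means. The sketch below is illustrative, and both helper names are assumptions. pgvector accepts a vector as a text literal such as `[0.1,0.2,0.3]`, and `<=>` is its cosine distance operator, so `1 - distance` is cosine similarity.

```typescript
// Format an embedding as the text literal that `$1::vector` can cast.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// The same quantity the query computes with `1 - (a <=> b)`, done client-side.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const literal = toVectorLiteral([0.1, 0.2, 0.3]);
```

With a Postgres client that supports positional parameters, you would pass `literal` as the first parameter, followed by the namespace and agent id.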
The tradeoffs are operational, not capability-based:
- Connection management. Serverless functions (Lambda, Cloudflare Workers) and Postgres don’t mix well without a connection pooler like PgBouncer or a managed proxy. Each cold start risks opening a new connection, and connection limits are a real constraint at scale.
- Embedding pipeline. pgvector stores and searches vectors, but you still need to generate the embeddings. That means an embedding model (OpenAI, Cohere, or a self-hosted option), a pipeline that embeds content before insertion, and logic to re-embed when content changes.
- Schema ownership. Every change to your memory model is a migration. Adding a new field, changing an index, restructuring how memories relate to conversations means writing and testing DDL, applying it in production, and handling backward compatibility. This is manageable but adds up over time.
- Scaling. A single Postgres instance handles a lot, but when you outgrow it the options (read replicas, partitioning, Citus) each bring their own complexity.
Individually, none of these are deal-breakers. But they compound: by the time you’ve wired up connection pooling, built an embedding pipeline, written migration tooling, and implemented tenant isolation, you’re maintaining a meaningful platform alongside your agent, one that needs its own monitoring, on-call rotation, and upgrade path. If agent memory infrastructure isn’t core to your product, that operational load is worth weighing against a managed alternative.
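The “re-embed when content changes” part of the embedding pipeline is a good example of that hidden work. One common approach, sketched here under assumptions, is to store a hash of the text alongside each embedding and re-embed only when the hash no longer matches. The `EmbeddedRow` shape and the `embeddedHash` column are hypothetical and are not part of the schema above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: a hash of the text that produced the stored embedding.
type EmbeddedRow = { id: string; content: string; embeddedHash: string | null };

function contentHash(text: string): string {
  // Normalize whitespace so cosmetic edits do not trigger a re-embed.
  const normalized = text.trim().replace(/\s+/g, " ");
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Rows whose current content no longer matches the hash recorded at embedding time.
function rowsNeedingEmbedding(rows: EmbeddedRow[]): EmbeddedRow[] {
  return rows.filter((row) => row.embeddedHash !== contentHash(row.content));
}

const rows: EmbeddedRow[] = [
  { id: "a", content: "User prefers Terraform", embeddedHash: contentHash("User prefers Terraform") },
  { id: "b", content: "User prefers Pulumi", embeddedHash: contentHash("User prefers Terraform") },
  { id: "c", content: "Deploys on Fridays", embeddedHash: null },
];
const stale = rowsNeedingEmbedding(rows).map((row) => row.id);
```

Here row `a` is current, row `b` was edited after it was embedded, and row `c` was never embedded, so only `b` and `c` would be sent to the embedding model.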
For a detailed comparison of Postgres against other storage options with code examples, see How to Store AI Chat History: 4 Approaches Compared.
Choose Postgres when: you’re building a multi-user or multi-tenant agent application, your team is comfortable with relational databases, and you’re willing to own the schema and embedding pipeline. Move on when semantic search becomes the primary access pattern and pgvector’s limitations (index rebuild time, recall accuracy at scale) become constraints.
When Do You Need a Dedicated Vector Database?
A dedicated vector database becomes the right choice when semantic retrieval is the dominant access pattern for your agent’s memory, and when the volume or performance requirements exceed what pgvector can provide in a general-purpose Postgres instance.
A vector database for agent memory is a purpose-built system optimized for storing, indexing, and querying high-dimensional vector embeddings at scale. Common options range from managed services like Pinecone to open-source engines like Qdrant, Weaviate, Chroma, and Milvus.
What a dedicated vector database gives you over pgvector:
- Purpose-built indexing. HNSW and other ANN algorithms tuned for high-dimensional data, often with better recall and latency at scale than pgvector’s ivfflat or HNSW indexes.
- Filtered search. Most vector databases apply metadata filters as part of the vector query itself. Find memories that are semantically similar to the query AND tagged with a specific user AND created after a certain date. Many also offer true hybrid search, which blends vector similarity with keyword scoring.
- Managed scaling. Serverless options like Pinecone scale index size and query throughput independently of your application database.
```typescript
import { randomUUID } from "node:crypto";
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: "http://localhost:6333" });

// Assumes the "agent_memories" collection already exists with a vector size
// matching your embedding model, and that `embedding` and `queryEmbedding`
// are number[] values produced by that model.
await client.upsert("agent_memories", {
  points: [
    {
      id: randomUUID(),
      vector: embedding, // from your embedding model
      payload: {
        agent_id: "agent-1",
        namespace: "project-alpha",
        content: "User prefers Terraform over CloudFormation",
        category: "preference",
        created_at: "2026-06-09T14:30:00Z",
      },
    },
  ],
});

// Similarity search restricted to one namespace and one agent
const results = await client.search("agent_memories", {
  vector: queryEmbedding,
  filter: {
    must: [
      { key: "namespace", match: { value: "project-alpha" } },
      { key: "agent_id", match: { value: "agent-1" } },
    ],
  },
  limit: 5,
});
```
The costs of a dedicated vector database:
- Another service to run. Even managed options add a dependency, an API key, network latency, and a failure mode to your architecture.
- Embedding pipeline. Same as with pgvector. You still generate embeddings yourself.
- Not a primary database. Vector databases store and retrieve vectors. They don’t replace your conversation storage, session state, or relational data. Most architectures that include a vector database also include Postgres or DynamoDB for the structured data. You’re running two systems, each with its own monitoring and failure modes, plus the embedding pipeline that connects them.
- Consistency. When memory lives in both a primary database and a vector index, keeping them synchronized is ongoing application-level work, and a stale vector index means your agent silently retrieves the wrong context.
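To give a sense of what that synchronization work looks like, here is a minimal reconciliation sketch. It assumes the primary database records an `updatedAt` timestamp per memory and the vector index stores an `indexedAt` timestamp per point. Both field names, and the `planSync` function itself, are hypothetical.

```typescript
type PrimaryRecord = { id: string; updatedAt: number }; // epoch milliseconds
type IndexEntry = { id: string; indexedAt: number };
type SyncPlan = { upsert: string[]; remove: string[] };

function planSync(primary: PrimaryRecord[], index: IndexEntry[]): SyncPlan {
  const indexedAt = new Map(index.map((entry) => [entry.id, entry.indexedAt] as const));
  const primaryIds = new Set(primary.map((record) => record.id));

  // Re-embed records that are missing from the index or changed after indexing.
  const upsert = primary
    .filter((record) => {
      const at = indexedAt.get(record.id);
      return at === undefined || at < record.updatedAt;
    })
    .map((record) => record.id);

  // Remove vectors whose source record no longer exists.
  const remove = index.filter((entry) => !primaryIds.has(entry.id)).map((entry) => entry.id);

  return { upsert, remove };
}

const plan = planSync(
  [{ id: "m1", updatedAt: 100 }, { id: "m2", updatedAt: 300 }, { id: "m3", updatedAt: 50 }],
  [{ id: "m1", indexedAt: 150 }, { id: "m2", indexedAt: 200 }, { id: "m9", indexedAt: 10 }],
);
```

In this example `m2` changed after it was indexed, `m3` was never indexed, and `m9` is an orphaned vector. A periodic job like this is a backstop and does not replace writing to both stores at update time.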
At this level of stack complexity, it’s worth asking whether assembling and maintaining these systems is the best use of your team’s engineering time.
Choose a dedicated vector database when: semantic search is the primary way your agent accesses memory, your corpus exceeds tens of millions of entries, or you need sub-100ms similarity search at scale. Skip it when pgvector inside Postgres covers your recall and latency requirements.
Graph Databases for Relationship-Rich Memory
Graph databases like Neo4j and Apache AGE (a Postgres extension) model memory as nodes and relationships rather than rows and columns. This maps naturally to certain agent memory patterns where the connections between facts matter as much as the facts themselves.
A graph database for agent memory stores knowledge as entities (nodes) and their relationships (edges), enabling traversal queries like “what does this user know about topic X through their interactions with agent Y.”
A user node connects to conversation nodes, which connect to topic nodes, which connect to other users who discussed the same topics. When an agent needs to answer “what context do we have about this customer’s deployment issues,” a graph traversal can pull together relevant memories from across sessions, agents, and users that a relational query would need multiple joins to assemble.
```cypher
// Neo4j: find relevant context across sessions
MATCH (u:User {id: $userId})-[:PARTICIPATED_IN]->(c:Conversation)
      -[:CONTAINS]->(m:Memory)-[:ABOUT]->(t:Topic {name: "deployment"})
WHERE m.created_at > datetime() - duration({days: 30})
RETURN m.content, c.label, m.created_at
ORDER BY m.created_at DESC
LIMIT 10
```
Where graph databases shine for agent memory:
- Multi-hop queries. “Find all memories related to topics this user has discussed across any conversation” is natural in a graph and awkward in SQL.
- Knowledge graph construction. Agents that build up a structured understanding of a domain over time, extracting entities and relationships from conversations, benefit from native graph storage.
- Provenance tracking. Tracing a memory back through the conversation and facts that produced it is a graph traversal problem.
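To make “multi-hop” concrete, the sketch below runs a breadth-first traversal over a tiny in-memory graph. It is a toy model of what a graph database does natively, not a substitute for one. The node ids and the `withinHops` helper are invented for illustration.

```typescript
// A tiny in-memory graph: node id -> ids of directly connected nodes.
type Graph = Map<string, string[]>;

// Breadth-first search returning every node reachable within `maxHops` edges.
function withinHops(graph: Graph, start: string, maxHops: number): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start); // report only the nodes reached, not the starting point
  return seen;
}

const graph: Graph = new Map([
  ["user:ana", ["conv:1", "conv:2"]],
  ["conv:1", ["memory:rollback-plan"]],
  ["conv:2", ["memory:k8s-upgrade"]],
  ["memory:rollback-plan", ["topic:deployment"]],
  ["memory:k8s-upgrade", ["topic:deployment"]],
]);
const reachable = withinHops(graph, "user:ana", 2);
```

Two hops from the user reach both conversations and both memories. A third hop reaches the shared topic. In SQL each hop is typically another join, which is why deep traversals get awkward there.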
Where they add friction:
- Operational complexity. Neo4j is a separate database with its own query language (Cypher), deployment model, and operational requirements. It’s the highest-complexity option on the spectrum.
- Overkill for most agents. If your agent’s memory access patterns are “get recent messages,” “search by topic,” and “look up user state,” a relational or document model handles these without the overhead of graph semantics.
- Limited ecosystem for AI. Vector search in graph databases is relatively new. Neo4j added vector indexes in 2023, but the tooling around embedding generation, hybrid search, and retrieval pipelines is less mature than in dedicated vector databases or Postgres with pgvector.
Choose a graph database when: your agent builds explicit knowledge graphs from conversations, your queries involve multi-hop relationship traversals, or provenance tracking across connected entities is a core requirement. Skip it when your access patterns are simpler.
Storage Backend Comparison
The summary below captures the tradeoffs across the five backends discussed above. Every backend can be made to handle every capability with enough engineering effort. The table reflects what each handles natively or with minimal additional work.
| Capability | Flat Files | SQLite | PostgreSQL | Vector DB | Graph DB |
|---|---|---|---|---|---|
| Setup complexity | None | Minimal | Moderate | Moderate | High |
| Concurrent writes | No | Single-writer | Yes | Yes | Yes |
| Structured queries | No | SQL | SQL + JSONB | Metadata filters | Cypher / GQL |
| Full-text search | No | FTS5 | Built-in | Varies | Varies |
| Semantic search | No | No | pgvector | Native | Limited |
| Multi-tenancy | Manual | Manual | Row-level security | Namespace filters | Labels / partitions |
| Serverless-friendly | Yes | Yes (embedded) | Needs pooler | Managed options | No |
| Human-readable | Yes | No | No | No | No |
| Horizontal scaling | N/A | No | With effort | Native (managed) | With effort |
Decision Flowchart
Start from the simplest option that covers your needs:
- Single user, single process, < 1K memories? Flat files.
- Single process, need queries or search, no network access needed? SQLite.
- Multi-user, multi-tenant, or need shared network access? PostgreSQL.
- Semantic search is the primary access pattern at scale? Add a vector database (or use pgvector if Postgres is already in the stack).
- Building explicit knowledge graphs with multi-hop traversals? Add a graph database.
- Want conversation storage, semantic search, and multi-tenancy without owning the infrastructure? A managed memory service like DialogueDB.
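The same checklist can be written as a small function, which is sometimes a useful way to force the questions to be answered explicitly. This is a rough encoding of the list above, not a rule engine. The `Requirements` fields and the 1,000-entry cutoff are assumptions taken from the wording of the list.

```typescript
type Requirements = {
  multiUser: boolean;
  sharedAccess: boolean; // multiple processes or network access to the store
  needsQueries: boolean; // structured queries or keyword search
  semanticSearchAtScale: boolean;
  multiHopTraversals: boolean;
  memoryCount: number;
  preferManaged: boolean; // would rather not own the infrastructure
};

const FLAT_FILE_LIMIT = 1_000; // rough cutoff, mirroring "< 1K memories"

function chooseBackends(req: Requirements): string[] {
  if (req.preferManaged) return ["managed memory service"];

  // Pick the simplest primary store that covers the access requirements.
  let base: string;
  if (req.multiUser || req.sharedAccess) base = "PostgreSQL";
  else if (req.needsQueries || req.memoryCount >= FLAT_FILE_LIMIT) base = "SQLite";
  else base = "flat files";

  // Specialized stores are additions to the primary store, not replacements.
  const picks = [base];
  if (req.semanticSearchAtScale) picks.push("vector database");
  if (req.multiHopTraversals) picks.push("graph database");
  return picks;
}

const cliAgent = chooseBackends({
  multiUser: false, sharedAccess: false, needsQueries: false,
  semanticSearchAtScale: false, multiHopTraversals: false,
  memoryCount: 200, preferManaged: false,
});
```

For the single-user CLI agent in this example the result is flat files alone. Turning on `multiUser` and `semanticSearchAtScale` would yield PostgreSQL plus a vector database.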
PostgreSQL with pgvector is a common starting point for production agent systems, and for good reason. But as the sections above illustrate, each capability brings its own operational surface area: embedding pipelines, connection pooling, schema migrations, tenant isolation. That work is ongoing and cumulative. Whether it makes sense depends on whether agent memory infrastructure is where your engineering time creates the most value.
The Case for a Managed Memory Layer
There’s a pattern in the sections above: each backend adds query flexibility and multi-tenancy, but also adds operational surface area. Connection pooling, embedding pipelines, schema migrations, vector index tuning, sync logic between systems. This is real engineering work, and it’s ongoing. Every schema change, every index rebuild, every embedding model upgrade is your team’s responsibility for as long as the agent is running.
A managed memory layer sits at the end of this spectrum. Instead of assembling the storage stack yourself, you get conversation persistence, semantic search, multi-tenancy, and lifecycle management through an API. The operational concerns threaded through every section above (embedding generation, connection management, schema evolution, tenant isolation) are handled by the service.
DialogueDB is purpose-built for this. The SDK provides structured storage for conversations, messages, memories, and state with semantic search built in. Namespaces handle multi-tenancy natively. Connections work with serverless runtimes without a pooler. Embeddings are generated and deduplicated automatically. You don’t design a conversation schema, build embedding pipelines, or tune vector indexes. You call an API that already models the access patterns most agents need, and put the engineering time you’d spend on infrastructure toward the agent itself.
The tradeoff is straightforward: you give up control over storage internals in exchange for velocity and operational simplicity. If your agent’s memory access patterns are genuinely unusual enough to require a custom data model, own the stack. But if you’ve read through the five options above and recognized the same patterns repeating (conversation history, extracted facts, semantic search, multi-tenancy), that’s exactly what a purpose-built service handles. You can be running in hours instead of building for months.
For a practical comparison of self-managed versus managed approaches with code examples, see How to Store AI Chat History.