Mem0 vs Cognee vs Letta: Which AI Agent Memory Actually Sticks?

TL;DR

Mem0 is the fastest way to bolt persistent memory onto an existing agent. Five lines of Python, user-preference recall that works out of the box. Cognee is a knowledge graph engine that shines when your memory problem is document-shaped: entities, relationships, and structured retrieval across large corpora. Letta (formerly MemGPT) treats the agent itself as a stateful service with self-editing memory, which is overkill for a chatbot but the only real option for agents that run autonomously for days. All three are open-source, all three are Python-first, and picking the wrong one wastes weeks.

The Agent Memory Problem

Every LLM call starts from zero. Ask Claude or GPT the same question twice, and it has no idea you already asked. For a single-turn chatbot, that’s fine. For an AI coding agent that’s supposed to learn your codebase over months, or a customer-support agent that should remember a user’s last five conversations, it’s a dealbreaker.

I hit this wall building a Claude Code workflow that needed to track decisions across sessions. The agent would make the same mistake on Tuesday that I’d corrected on Monday, because Monday’s context was gone. I needed memory: a layer that persists facts between runs and feeds them back when they’re relevant.

The agent memory space exploded in late 2025 and through 2026. There are now at least eight frameworks competing for this niche. But after testing the major contenders over several weeks, the field narrows to three architecturally distinct approaches: Mem0, Cognee, and Letta. Each solves memory differently, and understanding how they differ saves you from picking one and rebuilding two months later.

The Three Approaches

Before comparing features, you need to understand what each tool actually is.

Mem0 is a memory layer you attach to an existing agent. Your agent handles reasoning, Mem0 handles remembering. It extracts facts from conversations, deduplicates them, and retrieves relevant memories when the agent needs them. Think of it as a smart key-value store with semantic search.
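To make the "smart key-value store with semantic search" idea concrete, here is a toy sketch in plain Python. It is not Mem0's implementation: `ToyMemory`, the bag-of-words `_vector` helper, and the cosine scoring are stand-ins invented for illustration, and a real system would use an embedding model instead of word counts.

```python
import math
import re
from collections import Counter, defaultdict


def _vector(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Hypothetical per-user memory store with similarity search."""

    def __init__(self) -> None:
        self._by_user: dict[str, list[str]] = defaultdict(list)

    def add(self, text: str, user_id: str) -> None:
        # Naive exact-match deduplication; real systems compare meaning.
        if text not in self._by_user[user_id]:
            self._by_user[user_id].append(text)

    def search(self, query: str, user_id: str, limit: int = 3) -> list[str]:
        q = _vector(query)
        scored = [(_cosine(q, _vector(m)), m) for m in self._by_user.get(user_id, [])]
        scored = [(score, m) for score, m in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in scored[:limit]]


memory = ToyMemory()
memory.add("Alex uses VS Code with Vim keybindings", user_id="alex")
memory.add("Alex deploys on Hetzner", user_id="alex")
memory.add("Sam uses Emacs", user_id="sam")

top = memory.search("Which editor keybindings does Alex use", user_id="alex", limit=1)
print(top)
```

The `user_id` argument is what keeps one user's memories out of another user's results, which is the scoping the real library also exposes.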

Cognee is a knowledge graph engine. It ingests documents, PDFs, codebases, and conversation logs, then builds a traversable graph of entities and relationships. It’s closer to graph-RAG than to chat memory. When an agent queries Cognee, it traverses connected nodes rather than searching a flat list.

Letta is an agent runtime where memory is the architecture. Born from the MemGPT research paper at UC Berkeley, Letta gives each agent a self-editing “core memory” (small, always in context) plus a massive “archival memory” (searched on demand). The agent itself decides what to remember and what to forget. You deploy agents inside Letta.

GitHub stars: Mem0 59K+, Cognee 26K+, Letta 23K+.

Feature Comparison

Feature	Mem0	Cognee	Letta
Architecture	Vector store + graph hybrid	Knowledge graph + vector + relational	Self-editing core + archival memory
Memory model	Extracted facts from conversations	Entity-relationship graph from any data	Agent-managed tiered memory
Setup complexity	~5 lines of Python	~10 lines of Python	Server deployment + client SDK
Self-hosted	Yes (Docker)	Yes (Docker/pip)	Yes (Docker)
Cloud option	Mem0 Platform (managed)	Cognee Cloud	Letta Cloud
License	Apache 2.0	Apache 2.0	Apache 2.0
MCP support	Yes	Yes	Yes
Multi-user	Yes (user_id scoping)	Yes (user-level permissions)	Yes (per-agent isolation)
Data sources	Conversations, text	PDFs, docs, code, conversations, images	Conversations, tool outputs
LLM required	Yes (for extraction)	Yes (for entity extraction)	Yes (for reasoning + memory management)
Benchmarks	91.6 LoCoMo, 93.4 LongMemEval	0.93 HotPotQA correctness	Not published
Best for	Chat memory, user preferences	Document-heavy knowledge bases	Long-running autonomous agents

Setup: How Quickly Can You Start?

Mem0: Five Lines to Working Memory

Mem0’s pitch is speed-to-value. Install it, point it at a conversation, and it extracts and stores memories automatically.

from mem0 import Memory

m = Memory()
m.add("I prefer dark mode and use VS Code with Vim keybindings.", user_id="alex")
m.add("My project uses FastAPI with PostgreSQL, deployed on Hetzner.", user_id="alex")

results = m.search("What IDE does Alex use?", user_id="alex")
print(results)
# Returns: relevant memories about VS Code, Vim keybindings, dark mode

That’s it for local mode. Mem0 runs an in-process LanceDB vector store by default, so there is no database to stand up, though the extraction step still needs an LLM API key. For production, you swap in Qdrant, Pinecone, or Mem0’s managed platform, but the API stays identical.

The graph memory feature (released in v0.1.x) adds entity extraction on top of the vector store: Mem0 identifies “Alex → uses → VS Code” as a triple and stores it in a Neo4j-compatible graph. This means you can query by entity (“what do we know about Alex?”) rather than just by semantic similarity.
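The difference between querying by entity and querying by similarity is easier to see with a small example. The sketch below is a plain-Python illustration of a triple index, not Mem0's graph layer; `Triple` and `TripleStore` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Hypothetical index of (subject, predicate, object) facts by entity."""

    def __init__(self) -> None:
        self._by_entity: dict[str, list[Triple]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = Triple(subject, predicate, obj)
        # Index the triple under both endpoints so either one can be looked up.
        for entity in {subject, obj}:
            if triple not in self._by_entity[entity]:
                self._by_entity[entity].append(triple)

    def about(self, entity: str) -> list[Triple]:
        """Everything known about an entity, in insertion order."""
        return list(self._by_entity.get(entity, []))


store = TripleStore()
store.add("Alex", "uses", "VS Code")
store.add("Alex", "prefers", "dark mode")
store.add("Project", "deployed_on", "Hetzner")

alex_facts = store.about("Alex")
for fact in alex_facts:
    print(f"{fact.subject} -{fact.predicate}-> {fact.obj}")
```

An entity lookup returns every stored fact that touches "Alex", even ones a similarity search on a specific question would rank low.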

Cognee: Graph-First in Ten Lines

Cognee’s setup is slightly heavier because it runs a full ingestion pipeline: entity linking, relationship mapping, and graph construction on top of fact extraction.

import asyncio

import cognee


async def main():
    cognee.config.set_llm_config({"provider": "openai", "model": "gpt-4o-mini"})

    await cognee.add("The FastAPI service connects to PostgreSQL via SQLAlchemy. "
                     "Redis handles session caching. Nginx sits in front as a reverse proxy.")
    await cognee.cognify()  # Builds the knowledge graph

    results = await cognee.search("What database does the service use?")
    print(results)
    # Returns: structured results with entity relationships


asyncio.run(main())  # Cognee's API is async, so run it inside an event loop
# Returns: structured results with entity relationships

The cognify() call is where Cognee earns its keep. It runs a six-stage pipeline: classify the input, check permissions, chunk the text, extract entities and relationships with an LLM, generate summaries, then embed everything into both a vector store and a graph store. The output is a navigable graph where “PostgreSQL” is connected to “FastAPI” via “connects_to” and to “SQLAlchemy” via “accessed_through”.

Cognee 1.0 simplified this further with the remember/recall/forget/improve API. cognee.remember() runs the full pipeline under the hood. cognee.recall() auto-routes queries across 14 retrieval modes (vector similarity, graph traversal, chain-of-thought over multi-hop relationships). And cognee.forget() does targeted deletion from both the graph and vector store, which is useful for GDPR compliance.

Letta: Deploy an Agent With Built-In Memory

Letta is the heaviest setup. You’re deploying a server or using their cloud.

pip install letta-client

import os

from letta_client import Letta

# Read the key from the environment instead of hard-coding it
client = Letta(api_key=os.environ["LETTA_API_KEY"])

agent = client.agents.create(
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "persona", "value": "You are a senior code reviewer."},
        {"label": "human", "value": ""},
    ],
)

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Our project uses Go 1.26 with the standard library. No frameworks."}],
)

After that message, the agent’s human memory block has been updated by the agent itself. Letta agents call their own memory tools: core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search. The agent decides what’s worth remembering and edits its own context window on every turn.

Core memory stays in the system prompt on every call, always visible, like RAM. Archival memory lives in a vector database and gets searched on demand (cold storage). Recall memory is a timestamped log of every conversation turn. This three-tier approach comes directly from the MemGPT paper: treat the LLM’s context window like virtual memory in an operating system.
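A rough way to picture the three tiers is as one object that assembles the prompt on every call. This is a minimal sketch under my own assumptions, not Letta's code: `TieredMemory`, its keyword-overlap search, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TieredMemory:
    """Hypothetical three-tier memory: core, archival, recall."""

    core: dict[str, str] = field(default_factory=dict)   # always in the prompt
    archival: list[str] = field(default_factory=list)    # searched on demand
    recall: list[tuple[datetime, str, str]] = field(default_factory=list)  # full log

    def log_turn(self, role: str, content: str) -> None:
        self.recall.append((datetime.now(timezone.utc), role, content))

    def search_archival(self, query: str, limit: int = 2) -> list[str]:
        # Keyword overlap stands in for a vector search.
        words = set(query.lower().split())
        hits = [p for p in self.archival if words & set(p.lower().split())]
        return hits[:limit]

    def build_prompt(self, user_message: str) -> str:
        blocks = "\n".join(
            f"<{label}>\n{value}\n</{label}>" for label, value in self.core.items()
        )
        retrieved = "\n".join(f"- {p}" for p in self.search_archival(user_message))
        return (
            f"{blocks}\n\nRetrieved from archival memory:\n{retrieved}"
            f"\n\nUser: {user_message}"
        )


mem = TieredMemory(core={
    "persona": "You are a senior code reviewer.",
    "human": "Uses Go 1.26, no frameworks.",
})
mem.archival.append("The billing service was migrated from Python to Go")
mem.archival.append("Team lunch happens on Fridays")

question = "Why is billing written in Go?"
mem.log_turn("user", question)
prompt = mem.build_prompt(question)
print(prompt)
```

Core blocks are copied into every prompt in full, while only the archival passages that match the current message are pulled in, which is the RAM versus cold storage split described above.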

How They Differ Under the Hood

Mem0’s Single-Pass Extraction

Mem0 processes conversations through a single LLM call that extracts discrete facts. “Alex prefers dark mode” becomes a memory entry. A second LLM call decides whether this fact already exists (deduplication) or conflicts with a stored memory (update). The 2026 algorithm adds entity linking: Mem0 identifies named entities and builds a lightweight graph alongside the vector store.
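The second step, deciding whether an extracted fact is new, a duplicate, or a replacement, can be sketched without an LLM. In the sketch below a simple key comparison stands in for the model's judgment; `Fact`, `reconcile`, and the ADD/NOOP/UPDATE labels are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str


def reconcile(store: dict[tuple[str, str], Fact], new: Fact) -> str:
    """Decide what to do with a newly extracted fact and apply it.

    A real memory layer would ask an LLM to compare meanings; here the
    (subject, attribute) pair is the identity and the value is compared exactly.
    """
    key = (new.subject, new.attribute)
    existing = store.get(key)
    if existing is None:
        store[key] = new
        return "ADD"
    if existing.value == new.value:
        return "NOOP"  # duplicate: nothing to store
    store[key] = new
    return "UPDATE"  # conflict: the newer fact wins


store: dict[tuple[str, str], Fact] = {}
events = [
    reconcile(store, Fact("alex", "theme", "dark")),
    reconcile(store, Fact("alex", "theme", "dark")),
    reconcile(store, Fact("alex", "theme", "light")),
]
print(events)
```

The "newer fact wins" rule is a design choice in this sketch; it is also why this style of memory keeps the current state but not the history of changes.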

Mem0 excels at user-preference recall: stable, slowly-changing facts about individuals. It’s less useful for complex relationships between entities or for temporal reasoning (“what changed between last week and this week”).

Cognee’s Six-Stage Pipeline

Cognee runs every piece of input through classification, a permissions check, chunking, entity and relationship extraction, summarization, and embedding. The result is a full knowledge graph with typed edges and provenance tracking, in which each entity is linked to the entities it relates to.

Ingestion is slower than Mem0’s single-pass approach. A large PDF takes seconds, not milliseconds. But the payoff is query power: ask “which services depend on PostgreSQL?” and get a traversal-based answer with full relationship context. For codebases, org charts, or compliance documentation, that graph structure is worth the ingestion cost.
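A question like "which services depend on PostgreSQL?" becomes a walk over incoming edges once the data is a graph. The sketch below shows that traversal in plain Python; it does not use Cognee, and the edge list and the `dependents` helper are hypothetical (only `connects_to` comes from the example above).

```python
from collections import defaultdict, deque

# Hypothetical edges: (source, relation, destination)
edges = [
    ("FastAPI", "connects_to", "PostgreSQL"),
    ("FastAPI", "uses", "Redis"),
    ("Nginx", "proxies_to", "FastAPI"),
]


def dependents(edges, target):
    """Breadth-first walk over incoming edges, collecting direct and
    transitive dependents together with the path that links them to target."""
    incoming = defaultdict(list)
    for source, relation, dest in edges:
        incoming[dest].append((source, relation))

    found = []
    seen = {target}
    queue = deque([(target, [])])
    while queue:
        node, path = queue.popleft()
        for source, relation in incoming.get(node, []):
            if source in seen:
                continue
            seen.add(source)
            hop_path = [f"{source} -{relation}-> {node}"] + path
            found.append((source, hop_path))
            queue.append((source, hop_path))
    return found


result = dependents(edges, "PostgreSQL")
for name, path in result:
    print(name, "|", " , ".join(path))
```

A flat vector search would likely surface the FastAPI sentence but has no way to infer that Nginx depends on PostgreSQL through FastAPI; the traversal finds it in two hops.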

Cognee 1.0 consolidated its backing store onto a single Postgres instance. The graph layer, vector store, and metadata all run on one database. That eliminates the operational headache of managing separate Neo4j and Qdrant instances that earlier versions required.

Letta’s Self-Editing Memory

Letta’s agents don’t have an external memory service. Memory is part of the agent’s tool set. When an agent encounters information worth remembering, it calls core_memory_append or archival_memory_insert as a tool action, the same way it’d call a web search tool or a code execution tool.
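To show what "memory as a tool call" means mechanically, here is a small dispatcher that applies model-emitted tool calls to memory state. The four tool names come from the text above; everything else (the `AgentMemory` class, the argument names, the JSON call shape) is an assumption made for this sketch and is not Letta's actual interface.

```python
import json


class AgentMemory:
    """Hypothetical memory state edited through tool calls."""

    def __init__(self) -> None:
        self.core = {"persona": "", "human": ""}
        self.archival: list[str] = []

    def core_memory_append(self, label: str, content: str) -> str:
        self.core[label] = (self.core[label] + "\n" + content).strip()
        return "ok"

    def core_memory_replace(self, label: str, old_content: str, new_content: str) -> str:
        if old_content not in self.core[label]:
            return "error: old_content not found"
        self.core[label] = self.core[label].replace(old_content, new_content)
        return "ok"

    def archival_memory_insert(self, content: str) -> str:
        self.archival.append(content)
        return "ok"

    def archival_memory_search(self, query: str) -> list[str]:
        # Substring match stands in for a vector search.
        return [p for p in self.archival if query.lower() in p.lower()]


ALLOWED_TOOLS = {
    "core_memory_append",
    "core_memory_replace",
    "archival_memory_insert",
    "archival_memory_search",
}


def dispatch(memory: AgentMemory, tool_call_json: str):
    """Run one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    if call["name"] not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {call['name']}")
    return getattr(memory, call["name"])(**call["arguments"])


memory = AgentMemory()
dispatch(memory, json.dumps({
    "name": "core_memory_append",
    "arguments": {"label": "human", "content": "Project uses Go 1.26, standard library only."},
}))
dispatch(memory, json.dumps({
    "name": "core_memory_replace",
    "arguments": {"label": "human", "old_content": "Go 1.26", "new_content": "Go 1.26 (pinned in go.mod)"},
}))
dispatch(memory, json.dumps({
    "name": "archival_memory_insert",
    "arguments": {"content": "Reviewed the billing service pull request"},
}))
hits = dispatch(memory, json.dumps({
    "name": "archival_memory_search",
    "arguments": {"query": "billing"},
}))
print(memory.core["human"], hits)
```

In a real agent loop the JSON strings would come from the model's tool-calling output rather than from your code, which is the whole point: the model chooses when to call them.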

That makes Letta the right tool for agents that operate independently for extended periods. A coding agent that runs for days, managing its own knowledge about a large repo, can prune outdated information and update its understanding as the codebase changes. No human needs to orchestrate the memory operations.

The downside is cost: every memory operation burns an LLM call. Mem0 and Cognee separate storage from reasoning. Letta bundles them, which means more token spend and a higher floor of complexity.

When to Pick What

After using all three, here’s how I’d pick:

Mem0 is the right call when your agent talks to users and needs to remember their preferences, past interactions, and context across sessions. Customer support bots, personal coding assistants that remember your stack, tutoring systems that adapt to learning pace. It’s also the right choice if you have an existing agent and need memory added in under an hour. The MemPalace review I wrote earlier compared several tools on the LongMemEval benchmark, and Mem0 was a top performer there; its published scores are 93.4 on LongMemEval and 91.6 on LoCoMo.

Go with Cognee when your memory problem is document-shaped. You have PDFs, codebases, internal wikis, or compliance docs, and your agent needs to reason over the relationships between entities in those documents. It’s also the better choice for multi-agent systems where multiple agents share a knowledge base, since the graph structure gives every agent a consistent view of the same data. If you’ve built something with LangChain or LlamaIndex and found that flat RAG retrieval misses relationship-heavy queries, Cognee is the upgrade path.

Letta makes sense when your agent needs full autonomy over its own memory. An agent that runs for days or weeks, makes its own decisions about what to remember and forget, and operates as a persistent service. The MemGPT research showed that self-editing memory beat fixed-context approaches by a wide margin on multi-session tasks. If your agent’s memory needs are closer to “managing a personal knowledge base” than “remembering user preferences,” Letta is the architecture you want.

Skip all three if your agent only runs single-turn interactions, or if the memory need is simple enough to solve with a system prompt and a JSON file. A “memory” system that never gets queried is dead weight.
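For reference, the "system prompt and a JSON file" baseline is only a few lines of standard-library Python. This is a minimal sketch; the file name, the helper names, and the prompt layout are arbitrary choices, and it has no locking, so it suits a single process only.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict[str, str]:
    """Return the stored facts, or an empty dict if nothing was saved yet."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def remember(path: Path, key: str, value: str) -> None:
    memory = load_memory(path)
    memory[key] = value  # last write wins
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")


def system_prompt(path: Path, base: str) -> str:
    """Append every stored fact to the base system prompt."""
    memory = load_memory(path)
    if not memory:
        return base
    facts = "\n".join(f"- {key}: {value}" for key, value in sorted(memory.items()))
    return f"{base}\n\nKnown facts:\n{facts}"


# A temporary directory keeps the example from touching real files.
memory_file = Path(tempfile.mkdtemp()) / "memory.json"
remember(memory_file, "editor", "VS Code with Vim keybindings")
remember(memory_file, "stack", "FastAPI + PostgreSQL")

prompt = system_prompt(memory_file, "You are a coding assistant.")
print(prompt)
```

This works as long as the full set of facts fits comfortably in the prompt. Once it does not, you need retrieval, and that is where the frameworks above start to pay for themselves.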

What About Zep?

Zep deserves an honorable mention. It builds a temporal knowledge graph from conversations, extracting entities, tracking how facts change over time, and supporting “what did we know about X on date Y” queries. It’s the strongest option for customer-facing agents where chronological accuracy is more relevant than relationship depth.
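The "what did we know about X on date Y" query boils down to keeping every version of a fact with the date it became valid. Here is a minimal sketch of that idea in plain Python; it is not Zep's API, and `TimedFact`, `as_of`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedFact:
    entity: str
    attribute: str
    value: str
    valid_from: date


def as_of(facts: list[TimedFact], entity: str, attribute: str, when: date) -> Optional[str]:
    """Return the value that was current on the given date, or None."""
    candidates = [
        f for f in facts
        if f.entity == entity and f.attribute == attribute and f.valid_from <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.valid_from).value


# Made-up history for one customer account
facts = [
    TimedFact("acme", "plan", "free", date(2026, 1, 5)),
    TimedFact("acme", "plan", "pro", date(2026, 3, 1)),
]

in_february = as_of(facts, "acme", "plan", date(2026, 2, 1))
in_march = as_of(facts, "acme", "plan", date(2026, 3, 1))
print(in_february, in_march)
```

Nothing is overwritten here: an update appends a new version, so older answers stay reproducible.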

I focused on Mem0, Cognee, and Letta because they represent three cleanly distinct architectural approaches. Zep overlaps with Mem0 on the “conversation memory” use case, with the temporal graph as its differentiator. It has no equivalent to Cognee’s document ingestion pipeline or Letta’s agent-as-runtime model. If your primary requirement is temporal tracking of conversation state, evaluate Zep alongside Mem0.

Pricing and Self-Hosting

All three are Apache 2.0 licensed, so self-hosting is free beyond your compute costs.

Item	Mem0	Cognee	Letta
Self-hosted cost	Free + LLM API calls	Free + LLM API calls	Free + LLM API calls + server resources
Cloud free tier	10,000 memories, 1K searches/mo	1M tokens, unlimited API calls	Free tier available
Cloud paid	$19/mo (Starter) to $249/mo (Pro)	$2.50/1M tokens (Standard)	Usage-based
LLM cost per operation	~1-2 LLM calls per add (extraction, then dedup/update)	~2-3 LLM calls per cognify	~1-3 LLM calls per interaction
Storage backend	LanceDB (default), 20+ vector DBs	Kuzu + LanceDB (default), Postgres	Postgres + vector store

The LLM cost is the hidden variable. Every memory operation in all three systems requires at least one LLM call for extraction or reasoning. At scale, the token spend on memory management can approach the spend on the agent’s primary task. Mem0’s single-pass extraction is the cheapest per-operation. Cognee’s six-stage pipeline costs more per ingestion but amortizes well when the same graph is queried repeatedly. Letta’s per-interaction memory management is the most expensive because every conversation turn triggers memory tool calls.
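A back-of-the-envelope estimate makes the hidden variable visible. The calculation below is only arithmetic; the operation volume, tokens per call, and price per million tokens are placeholder assumptions, not measured or quoted figures, so substitute your own.

```python
def monthly_memory_cost(
    ops_per_month: int,
    llm_calls_per_op: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimated monthly LLM spend on memory operations alone."""
    tokens = ops_per_month * llm_calls_per_op * tokens_per_call
    return tokens / 1_000_000 * usd_per_million_tokens


# Placeholder inputs: replace with your own traffic and pricing.
OPS_PER_MONTH = 100_000
TOKENS_PER_CALL = 1_500
USD_PER_MILLION_TOKENS = 0.50

estimates = {
    "1 call per operation": monthly_memory_cost(
        OPS_PER_MONTH, 1, TOKENS_PER_CALL, USD_PER_MILLION_TOKENS
    ),
    "3 calls per operation": monthly_memory_cost(
        OPS_PER_MONTH, 3, TOKENS_PER_CALL, USD_PER_MILLION_TOKENS
    ),
}
for label, cost in estimates.items():
    print(f"{label}: ${cost:.2f}/month")
```

With these inputs the cost scales linearly with calls per operation, so the gap between a one-call and a three-call design is a straight 3x on the memory bill.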

Integration With Existing Agent Frameworks

All three provide Python SDKs and MCP server implementations. The integration patterns differ:

Mem0 works as a middleware layer. Wrap your existing LangChain or LlamaIndex agent with Mem0’s add() and search() calls. The agent framework handles reasoning; Mem0 handles memory. The MCP server exposes add_memory, search_memory, and get_all_memories as tools.
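The middleware pattern amounts to "search before the call, add after it". The sketch below shows that wrapper with stand-ins so it runs without any framework: `MemoryLayer`, `InMemoryLayer`, `with_memory`, and `echo_agent` are hypothetical, and the real library's `search()` returns richer results than the plain strings used here.

```python
from typing import Callable, Protocol


class MemoryLayer(Protocol):
    def add(self, text: str, user_id: str) -> None: ...
    def search(self, query: str, user_id: str) -> list[str]: ...


class InMemoryLayer:
    """Stand-in memory backend using keyword overlap instead of embeddings."""

    def __init__(self) -> None:
        self._items: dict[str, list[str]] = {}

    def add(self, text: str, user_id: str) -> None:
        self._items.setdefault(user_id, []).append(text)

    def search(self, query: str, user_id: str) -> list[str]:
        words = set(query.lower().split())
        return [m for m in self._items.get(user_id, []) if words & set(m.lower().split())]


def with_memory(agent: Callable[[str], str], memory: MemoryLayer) -> Callable[[str, str], str]:
    """Wrap an agent so each call recalls first and stores afterwards."""

    def run(user_message: str, user_id: str) -> str:
        recalled = memory.search(user_message, user_id)
        if recalled:
            context = "\n".join(f"- {m}" for m in recalled)
            prompt = f"Relevant memories:\n{context}\n\nUser: {user_message}"
        else:
            prompt = f"User: {user_message}"
        reply = agent(prompt)
        memory.add(user_message, user_id)  # store after answering
        return reply

    return run


def echo_agent(prompt: str) -> str:
    """Stand-in for an LLM call: returns the prompt it was given."""
    return prompt


chat = with_memory(echo_agent, InMemoryLayer())
first = chat("I deploy on Hetzner", "alex")
second = chat("Where do I deploy", "alex")
print(second)
```

Because the agent only ever sees a prompt string, removing the memory layer later means deleting the wrapper and nothing else.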

Cognee is a knowledge backend. Your agent queries Cognee the way it’d query a database. search() returns structured results with entity relationships and provenance. The agentic memory research from Alibaba explored a similar pattern: agents that learn to manage their own memory queries.

Letta replaces your agent framework entirely. You deploy agents inside Letta and interact with them through Letta’s API. It’s a bigger architectural commitment, but it gives you the most capable memory model of the three.

FAQ

What is AI agent memory?

AI agent memory is a system that lets LLM-based agents retain and recall information across separate sessions. Without memory, each conversation starts from scratch. Memory systems extract facts from interactions, store them persistently, and inject relevant context back into the agent’s prompt when needed.

How does Mem0 compare to Cognee?

Mem0 is a drop-in memory layer optimized for user-preference recall. It extracts facts from conversations and retrieves them by semantic similarity. Cognee builds a knowledge graph from any data source (documents, code, conversations) and supports relationship-based queries. Mem0 is faster to set up and cheaper per-operation. Cognee handles document-heavy, relationship-rich use cases that flat memory retrieval can’t serve.

Which AI agent memory framework should I use in 2026?

Start with your data shape. If you’re storing user preferences from conversations, Mem0 gets you there fastest. If you need to ingest documents and codebases with relationship-aware retrieval, Cognee is the better fit. If your agent runs for days and needs to manage its own knowledge autonomously, look at Letta. And if you’re unsure, Mem0 is the easiest to add and remove, so try it first.

How does Letta differ from Mem0?

Mem0 is a library you import into your existing agent. Letta is a server that hosts agents. In Mem0, your code decides when to store and retrieve memories. In Letta, the agent itself decides. It has tool-calling access to its own memory and edits its context autonomously. Mem0 is simpler; Letta is more capable for complex, multi-day agent workflows.

Can I use these memory systems with Claude Code or GPT?

Yes. All three provide MCP server implementations that work with Claude Code, and Python SDKs that integrate with OpenAI’s API. Mem0 has the broadest LLM support and works with any model that supports chat completions. Cognee and Letta both default to OpenAI but support Anthropic, local models via Ollama, and other providers through configuration.

Is self-hosting production-ready?

Mem0 is the simplest to self-host: it runs in-process with LanceDB by default and only needs an external vector store for production scale. Cognee 1.0 consolidated to a single Postgres instance, which replaces the separate graph and vector databases that earlier versions required. Letta requires a dedicated server process, which adds operational complexity but gives you a REST API and built-in multi-agent support.

Sources

Mem0 GitHub repository — source code, documentation, and benchmarks
Cognee GitHub repository — source code and API documentation
Letta GitHub repository — source code, formerly MemGPT
MemGPT: Towards LLMs as Operating Systems (arXiv:2310.08560) — the research paper behind Letta’s architecture
Mem0 State of AI Agent Memory 2026 — benchmark methodology and results
Cognee 1.0 announcement — unified Postgres architecture and remember/recall/forget API
Best AI Agent Memory Systems in 2026 (Vectorize) — independent comparison with framework analysis
AI Agent Memory in 2026 (DEV Community) — practical guide with code examples

Bottom Line

These are three architecturally different approaches to the same problem. Mem0 is the right default for most developers because it’s the fastest path from no-memory to working-memory with the smallest surface area of new concepts. Cognee earns its complexity when your agents need to reason over structured knowledge. And Letta is the tool you reach for when the agent needs to be a persistent, self-managing service.

Start with Mem0. Graduate to Cognee when vector search stops answering your queries. Move to Letta when you need the agent to run without you.
