Agent Memory Systems
How agents maintain context, learn from past interactions, and build persistent knowledge across sessions using layered memory architectures.
Without memory, every agent interaction starts from scratch. The model has no knowledge of prior conversations, no record of user preferences, and no way to learn from past successes or failures. Memory systems address this by providing structured persistence at multiple timescales: the current conversation turn, the current session, across sessions, and across entire workflows. Well-designed memory architectures can significantly reduce token consumption compared to naive full-history approaches while preserving the information the agent actually needs.
The four canonical types — working, short-term, long-term, and episodic — are not mutually exclusive but complementary. Most production systems use at least two in combination. The choice of implementation for each layer involves meaningful trade-offs between latency, cost, retrieval fidelity, and engineering complexity.
Memory frameworks like Mem0 can achieve significant token reductions while preserving fidelity through selective extraction, summarization, and retrieval: instead of carrying the entire conversation history in context, the agent stores discrete facts and retrieves only those relevant to the current query.
Types of Agent Memory
┌─────────────────────────────────────────────────────────────┐
│ WORKING MEMORY │
│ Current conversation in context window │
│ Scope: Current turn │ Capacity: Model's context limit │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SHORT-TERM MEMORY │
│ Facts extracted from current session │
│ Scope: Current session │ Storage: In-memory │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LONG-TERM MEMORY │
│ Persistent facts and knowledge │
│ Scope: Cross-session │ Storage: Vector DB │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ EPISODIC MEMORY │
│ Specific past experiences and outcomes │
│ Scope: Cross-session │ Storage: Indexed experiences │
└─────────────────────────────────────────────────────────────┘

| Type | Scope | Implementation | Use Case |
|---|---|---|---|
| Working | Current conversation | Context window | Immediate task context |
| Short-term | Current session | In-memory store | Session-specific facts |
| Long-term | Cross-session | Vector DB + metadata | User preferences, knowledge |
| Episodic | Specific interactions | Indexed experiences | Learning from past tasks |
Memory System Implementation
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.memory import ConversationSummaryBufferMemory
from langchain_core.messages import HumanMessage, AIMessage
from datetime import datetime
class AgentMemorySystem:
def __init__(self, persist_directory: str = "./memory_store"):
self.llm = ChatOpenAI(model="gpt-4")
# Working memory with auto-summarization when limit exceeded
self.working_memory = ConversationSummaryBufferMemory(
llm=self.llm,
max_token_limit=2000,
return_messages=True
)
# Short-term: in-memory for current session
self.short_term: list[dict] = []
# Long-term: Chroma vector store
self.embeddings = OpenAIEmbeddings()
self.long_term = Chroma(
collection_name="agent_memory",
embedding_function=self.embeddings,
persist_directory=persist_directory
)
def add_to_working_memory(self, human_msg: str, ai_msg: str):
self.working_memory.save_context(
{"input": human_msg},
{"output": ai_msg}
)
    def store_long_term(self, content: str, metadata: dict | None = None):
        self.long_term.add_texts(
            texts=[content],
            metadatas=[metadata or {}],
            ids=[f"mem_{datetime.now().timestamp()}"]
        )

    def add_short_term(self, content: str, metadata: dict | None = None):
        # Session-scoped facts; discarded when the process ends
        self.short_term.append({"content": content, "metadata": metadata or {}})
    def recall(self, query: str, n_results: int = 5) -> list[str]:
        results = []
        # Long-term: semantic similarity search over the vector store
        long_term_docs = self.long_term.similarity_search(query, k=n_results)
        results.extend([doc.page_content for doc in long_term_docs])
        # Short-term: keep session facts that lexically overlap the query
        for mem in self.short_term:
            if self._is_relevant(query, mem["content"]):
                results.append(mem["content"])
        # Working memory: include the summarized recent conversation
        working = self.working_memory.load_memory_variables({})
        if working.get("history"):
            results.append(f"Recent context: {working['history']}")
        return results[:n_results]

    def _is_relevant(self, query: str, content: str, min_overlap: int = 2) -> bool:
        # Cheap lexical heuristic; swap in embedding similarity for better recall
        query_terms = set(query.lower().split())
        content_terms = set(content.lower().split())
        return len(query_terms & content_terms) >= min_overlap
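A quick usage sketch, assuming an OPENAI_API_KEY is configured and the class above is in scope (the example facts are illustrative):

memory = AgentMemorySystem(persist_directory="./memory_store")

# After each exchange, update working memory and persist durable facts
memory.add_to_working_memory(
    "I prefer Python and mostly build data pipelines",
    "Noted - I'll tailor examples to Python data engineering."
)
memory.add_short_term("Current task: debugging an Airflow DAG")
memory.store_long_term(
    "User prefers Python; primary domain is data pipelines",
    metadata={"category": "preferences"}
)

# Later (or in a new session, for long-term facts), recall before responding
for fact in memory.recall("Which language should code examples use?"):
    print(fact)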
Working Memory: Summarization
The context window has finite capacity. When conversations grow long, you need strategies to compress history while retaining the information that matters. The most effective approach keeps recent messages verbatim — preserving immediate context — while summarizing older exchanges into a compact narrative.
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory
class ConversationSummarizer:
def __init__(self, max_token_limit: int = 4000):
self.llm = ChatOpenAI(model="gpt-4")
# Automatically summarizes when buffer exceeds limit
self.memory = ConversationSummaryBufferMemory(
llm=self.llm,
max_token_limit=max_token_limit,
return_messages=True
)
def add_exchange(self, human_input: str, ai_output: str):
self.memory.save_context(
{"input": human_input},
{"output": ai_output}
)
    def get_context(self) -> list:
        # With return_messages=True, "history" holds the summary of older turns
        # plus the most recent messages verbatim
        return self.memory.load_memory_variables({})["history"]
# LangGraph with built-in persistence
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

llm = ChatOpenAI(model="gpt-4")
tools = []  # register the agent's tools here

# MemorySaver checkpoints conversation state per thread, so the agent
# resumes with full history on the next turn
agent = create_react_agent(
    llm,
    tools,
    checkpointer=MemorySaver()
)
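Checkpointed state is keyed by a thread ID supplied at invocation time. A minimal sketch of resuming the same conversation (the thread ID value is arbitrary):

config = {"configurable": {"thread_id": "session-42"}}

# First turn is checkpointed under this thread
agent.invoke({"messages": [("user", "My name is Ada.")]}, config)

# A later call on the same thread still sees the earlier messages
result = agent.invoke({"messages": [("user", "What's my name?")]}, config)
print(result["messages"][-1].content)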
Keep recent messages verbatim (last 5–10) and summarize older ones. This preserves immediate context while retaining key facts from earlier in the conversation.
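The same strategy is straightforward to implement without a library buffer. A minimal sketch, assuming LangChain-style message objects (the split point and summarization prompt are illustrative):

from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage, SystemMessage

def compress_history(messages: list[BaseMessage], keep_recent: int = 8) -> list[BaseMessage]:
    """Summarize older turns into one system message; keep recent turns verbatim."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f"{m.type}: {m.content}" for m in older)
    summary = ChatOpenAI(model="gpt-4").invoke(
        "Summarize this conversation, preserving names, decisions, and open "
        f"questions:\n\n{transcript}"
    )
    return [SystemMessage(content=f"Conversation so far: {summary.content}"), *recent]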
Episodic Memory: Learning from Experience
Episodic memory stores complete interaction trajectories — the full sequence of thoughts, actions, and observations — enabling agents to learn from past successes and failures. When a new task arrives, the agent can retrieve similar past episodes and use them as few-shot examples.
from dataclasses import dataclass, field
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document
from datetime import datetime
import json
@dataclass
class Episode:
task: str
trajectory: list[dict]
outcome: dict
timestamp: datetime = field(default_factory=datetime.now)
class EpisodicMemory:
def __init__(self, persist_path: str = "./episodic_memory"):
self.embeddings = OpenAIEmbeddings()
self.vectorstore = Chroma(
collection_name="episodes",
embedding_function=self.embeddings,
persist_directory=persist_path
)
def record_episode(self, task: str, trajectory: list[dict], outcome: dict) -> str:
episode_id = f"ep_{datetime.now().timestamp()}"
search_content = f"{task}\n{outcome.get('result', '')}"
doc = Document(
page_content=search_content,
metadata={
"task": task,
"trajectory": json.dumps(trajectory),
"outcome": json.dumps(outcome),
"success": outcome.get("success", False),
"timestamp": datetime.now().isoformat()
}
)
self.vectorstore.add_documents([doc], ids=[episode_id])
return episode_id
def retrieve_similar(self, current_task: str, k: int = 3) -> list[Episode]:
results = self.vectorstore.similarity_search(current_task, k=k * 2)
episodes = [
Episode(
task=doc.metadata["task"],
trajectory=json.loads(doc.metadata["trajectory"]),
outcome=json.loads(doc.metadata["outcome"]),
timestamp=datetime.fromisoformat(doc.metadata["timestamp"])
)
for doc in results
]
episodes.sort(key=lambda e: (not e.outcome.get("success"), -e.timestamp.timestamp()))
return episodes[:k]
Episodic memory enables dynamic few-shot learning. Instead of hardcoded examples, the agent retrieves relevant past experiences to guide current tasks — and these examples improve automatically as the agent accumulates more successful trajectories.
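To use retrieved episodes this way, fold them into the prompt ahead of the new task. A sketch of that assembly, assuming each trajectory step is a dict with an "action" key (the prompt wording is illustrative):

def build_prompt_with_episodes(memory: EpisodicMemory, task: str) -> str:
    examples = []
    for ep in memory.retrieve_similar(task, k=3):
        status = "succeeded" if ep.outcome.get("success") else "failed"
        steps = " -> ".join(str(step.get("action", "?")) for step in ep.trajectory)
        examples.append(f"Past task: {ep.task}\nSteps: {steps}\nOutcome: {status}")
    examples_block = "\n\n".join(examples) if examples else "No similar past tasks found."
    return (
        f"Relevant past experiences:\n{examples_block}\n\n"
        f"Current task: {task}\n"
        "Reuse approaches that succeeded above and avoid repeating failures."
    )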
Production Memory: Mem0
Mem0 is a purpose-built framework for production memory systems. It handles the full pipeline: extracting relevant facts from conversations, storing them with appropriate metadata, resolving conflicts when facts change, and retrieving them efficiently. Adopting Mem0 or a similar abstraction typically carries far less engineering burden than building memory infrastructure from scratch.
from mem0 import Memory
config = {
"llm": {"provider": "openai", "config": {"model": "gpt-4"}},
"embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
"vector_store": {
"provider": "chroma",
"config": {"collection_name": "agent_memories", "path": "./mem0_data"}
}
}
memory = Memory.from_config(config)
# Add memories with user context
memory.add(
"User prefers dark mode and uses VS Code",
user_id="user_123",
metadata={"category": "preferences"}
)
# Search memories
results = memory.search("What IDE does the user prefer?", user_id="user_123")
for result in results:
print(f"Memory: {result['memory']}, Score: {result['score']}")
# Update when facts change
memory.update(
memory_id=results[0]["id"],
data="User prefers dark mode, uses VS Code with Vim keybindings"
)
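A common integration pattern is to search memories before each model call and write the exchange back afterwards so Mem0 can extract any new facts. A sketch of that loop, assuming the memory instance configured above and the standard OpenAI client:

from openai import OpenAI

client = OpenAI()

def chat_with_memory(user_id: str, user_message: str) -> str:
    # Recall: pull facts relevant to this message
    recalled = memory.search(user_message, user_id=user_id)
    facts = "\n".join(f"- {r['memory']}" for r in recalled)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Known facts about this user:\n{facts}"},
            {"role": "user", "content": user_message},
        ],
    )
    answer = response.choices[0].message.content

    # Write back: Mem0 extracts and stores new facts from the exchange
    memory.add(
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": answer}],
        user_id=user_id,
    )
    return answer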
Memory Design Patterns
| Pattern | Description | When to Use |
|---|---|---|
| Rolling Window | Keep last N messages only | Simple chatbots, low-stakes tasks |
| Summarize + Recent | Summarize old, keep recent verbatim | Most agent applications |
| Entity Memory | Track entities and their states | Complex workflows, state machines |
| Knowledge Graph | Store facts as relationships | Domain-specific agents, reasoning |
| Hierarchical | Multiple summary levels | Very long conversations (100+ turns) |
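Of the patterns above, Rolling Window is simple enough to show in full; a minimal sketch (the window size is arbitrary):

from collections import deque

class RollingWindowMemory:
    """Keeps only the last N messages; anything older is silently dropped."""

    def __init__(self, max_messages: int = 20):
        self.messages: deque[dict] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def get_context(self) -> list[dict]:
        return list(self.messages)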
Common Pitfalls
Storing everything leads to retrieval degradation: as the memory store grows, similarity search surfaces increasingly irrelevant results alongside relevant ones. Use importance scoring to gate what enters long-term memory. A related problem is conflicting memories — when user preferences or facts change, stale memories can contradict current ones. Implement update and invalidation mechanisms rather than appending new facts indefinitely. Aggressive summarization trades token savings for information loss; always test recall of specific facts after compression to ensure critical details survive. Finally, vector search adds non-trivial latency; for real-time applications consider caching frequently accessed memories or prefetching based on predicted query patterns.
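One lightweight way to apply importance scoring is to have the model rate each candidate fact before it is written to long-term memory. A sketch reusing the AgentMemorySystem class from earlier (the rubric and threshold are illustrative):

from langchain_openai import ChatOpenAI

scorer = ChatOpenAI(model="gpt-4")

def maybe_store_long_term(memory: AgentMemorySystem, fact: str, threshold: int = 6) -> bool:
    """Persist a fact only if the model rates it important enough to keep."""
    rating = scorer.invoke(
        "On a scale of 1-10, how important is this fact to remember about the user "
        f"in future sessions? Reply with a single integer only.\n\nFact: {fact}"
    )
    try:
        score = int(rating.content.strip())
    except ValueError:
        score = 0  # unparseable rating: treat as unimportant
    if score >= threshold:
        memory.store_long_term(fact, metadata={"importance": score})
        return True
    return False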