Memory and State Management

The Agent Memory Problem

When you chat with ChatGPT, you might notice: it remembers what was said in this conversation, but not what you discussed last week. Close the window and reopen — everything starts from zero.

For simple Q&A, that's fine. But for agents, memory is a core problem:

  • A coding agent needs to remember which files it has read and what approaches it has tried
  • A customer service agent needs to remember what issues the user previously reported
  • A research agent working on a long-term project needs to recall earlier findings

An agent without memory is like an employee with amnesia — starting from scratch every time.

Three Types of Memory

We can categorize agent memory into three types, analogous to human memory:

Short-Term Memory: The Context Window

The most direct form of memory — all messages in the current conversation. Everything the model can see lives in this window.

[System prompt] + [User msg 1] + [Assistant reply 1] + [Tool call 1] + [Tool result 1] + ...

Characteristics:

  • Immediately available: The model sees it directly, no retrieval needed
  • Capacity-limited: Bounded by the context window, anywhere from 4K to over 1M tokens depending on the model
  • Gone when the session ends: Doesn't persist across conversations

For single tasks, short-term memory is usually sufficient. A bug-fixing agent completing all work in one conversation has all intermediate results in context.
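In code, short-term memory is nothing more than the ordered list of messages resent in full on every model call. A minimal sketch, assuming the common chat-API message shape (role/content dicts; exact fields vary by provider):

```python
# Short-term memory: an ordered list of messages. The whole list is sent
# on every request; nothing outside it is visible to the model.
context = [
    {"role": "system", "content": "You are a bug-fixing agent."},
    {"role": "user", "content": "Fix the failing test in src/api.ts"},
]

def add_turn(role: str, content: str) -> None:
    """Append a message; everything in `context` is what the model 'remembers'."""
    context.append({"role": role, "content": content})

add_turn("assistant", "Reading the project structure first.")
add_turn("tool", "src/ contains api.ts, db.ts, routes/")
```

Close the session and discard this list, and the memory is gone, which is exactly the limitation the next two memory types address.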

Working Memory: Task Intermediate State

When tasks are too complex for short-term memory alone, the agent needs a "scratchpad" for intermediate state.

Current plan:
- [✓] Read project structure
- [✓] Find relevant files
- [→] Modify src/api.ts
- [ ] Update tests
- [ ] Run tests to verify

Key findings:
- Project uses Express framework
- Database is PostgreSQL
- API version is v2

Working memory can be implemented as:

  • Dynamic sections in the system prompt: Updated with current state at each step
  • A dedicated scratchpad tool: The agent can actively write and read notes
  • A structured state object: Maintained at the code level as a task state

The key value of working memory is keeping the model from getting lost. In a 20-step reasoning chain, without explicit state tracking, the model may forget what it's doing or what it has already done.
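One way to realize the structured-state-object approach is a small object that is re-rendered into the system prompt at each step. A sketch, with hypothetical field names:

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    """Hypothetical working-memory object: a plan checklist plus key findings,
    re-rendered into the prompt at every step so the model never loses track."""
    steps: list[tuple[str, str]] = field(default_factory=list)  # (status, description)
    findings: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Produce the scratchpad text to inject into the system prompt."""
        plan = "\n".join(f"- [{s}] {d}" for s, d in self.steps)
        notes = "\n".join(f"- {f}" for f in self.findings)
        return f"Current plan:\n{plan}\n\nKey findings:\n{notes}"

state = TaskState(
    steps=[("✓", "Read project structure"), ("→", "Modify src/api.ts"), (" ", "Update tests")],
    findings=["Project uses Express framework", "Database is PostgreSQL"],
)
```

Because the rendered text is regenerated from the state object on every step, a checklist item can never silently drift out of sync with what the agent has actually done at the code level.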

Long-Term Memory: Cross-Session Persistence

When agents need to retain information across multiple conversations, they need long-term memory.

Common implementations:

Vector database storage: Encode important information as vectors and store in databases like Pinecone or Chroma. Retrieve via semantic search when needed later.

Session 1: Agent learns user prefers TypeScript, project uses monorepo structure
            → stored in vector database

Session 2: Agent retrieves previous information
            → writes code in TypeScript, follows monorepo structure

File system storage: Write memories directly to files. Simple but effective.

Claude Code's CLAUDE.md is a form of long-term memory —
it stores project conventions and tech stack info in a file,
read on every startup.
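File-based memory can be as small as an append-and-read pair. A minimal sketch (the filename AGENT_MEMORY.md is made up for illustration):

```python
from pathlib import Path

# Hypothetical memory file; Claude Code's CLAUDE.md plays a similar role.
MEMORY_FILE = Path("AGENT_MEMORY.md")

def remember(fact: str) -> None:
    """Append a fact; the file persists across sessions."""
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")

def recall() -> str:
    """Read all stored facts at startup, to be injected into the system prompt."""
    return MEMORY_FILE.read_text(encoding="utf-8") if MEMORY_FILE.exists() else ""
```

The simplicity is the point: the memory is human-readable, versionable, and trivially debuggable, at the cost of having no relevance ranking at all.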

Database storage: Store user preferences, historical decisions, and project information in a structured format.

The hard part of long-term memory isn't storage — it's retrieval. How do you find the right memory at the right time? If you've stored a thousand pieces of information but can't retrieve the one you need, it's as good as not stored.
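To make the retrieval problem concrete, here is a toy ranker that scores stored memories by word overlap with the query. A real system would substitute embedding cosine similarity, but the shape of the problem is the same: turn "find the right memory" into "score and rank":

```python
def score(query: str, memory: str) -> float:
    """Toy relevance score: fraction of query words present in the memory.
    A real system would use embedding cosine similarity instead."""
    q = set(query.lower().split())
    m = set(memory.lower().split())
    return len(q & m) / len(q) if q else 0.0

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring memories for this query."""
    return sorted(memories, key=lambda mem: score(query, mem), reverse=True)[:k]

memories = [
    "user prefers TypeScript",
    "project uses a monorepo structure",
    "database is PostgreSQL",
]
top = retrieve("what language does the user prefer", memories, k=1)
```

Even this toy version illustrates the failure mode: "prefer" does not match "prefers" under exact word overlap, which is precisely why production systems reach for semantic (embedding-based) search.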

Context Window Management

Even with 100K-1M context windows, complex tasks can exhaust them. An agent making 50 tool calls, each returning hundreds of lines — context fills up fast.

Several common management strategies:

Truncation

The simplest approach — when the window is full, discard the earliest messages.

Keep: [System prompt] + ... + [most recent N turns]
Discard: [early conversation content]

The problem: discarded content may contain critical information. The model might redo work it has already done.
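Truncation fits in a few lines: pin the system prompt, keep the tail, drop the middle. A sketch, assuming role/content message dicts:

```python
def truncate(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the system prompt plus the most recent `keep_last` messages;
    everything in between is simply dropped."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

A real implementation would trim by token count rather than message count, but the structure is the same.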

Summarization

Compress earlier conversation into summaries, preserving key information in fewer tokens:

Original (2000 tokens):
  [20 turns of project structure exploration and discussion]

Compressed (200 tokens):
  "Project is a Next.js app using TypeScript with PostgreSQL.
   Main APIs are in src/api/. Bug confirmed in the user route."

This is a trade-off between space and information fidelity — summaries may lose details.
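A sketch of when and how summarization kicks in. The `summarize` parameter stands in for an LLM call, and character counts stand in for token counts; both are simplifications:

```python
def compress(messages: list[dict], budget: int, summarize) -> list[dict]:
    """If the context exceeds `budget` characters, fold everything except the
    last 4 messages into one summary message. In practice `summarize` would
    be an LLM call and `budget` a token limit."""
    total = sum(len(m["content"]) for m in messages)
    if total <= budget or len(messages) <= 4:
        return messages
    old, recent = messages[:-4], messages[-4:]
    summary = summarize("\n".join(m["content"] for m in old))
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```

Note that compression is lossy and irreversible: once the original turns are replaced by the summary, any detail the summarizer dropped is gone for good.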

Sliding Window + Keyframes

Keep the most recent N turns plus a few "keyframes" (important milestone information):

[System prompt]
[Keyframe: project structure and tech stack]
[Keyframe: confirmed bug cause]
[Most recent 10 turns]

This combines the advantages of truncation and summarization — detailed recent context plus preserved historical milestones.
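The strategy above can be sketched as a context builder that pins keyframes ahead of the sliding window:

```python
def build_context(system: dict, keyframes: list[str],
                  history: list[dict], window: int = 10) -> list[dict]:
    """Pin milestone 'keyframes' right after the system prompt, then append
    only the most recent `window` turns; old detail drops out of the window,
    but milestones survive."""
    pinned = [{"role": "system", "content": f"Keyframe: {k}"} for k in keyframes]
    return [system] + pinned + history[-window:]
```

Deciding what counts as a keyframe is the hard part; one common heuristic is to promote a finding to a keyframe whenever the agent would otherwise have to rediscover it after truncation.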

How Real Products Handle This

Claude Code: Uses the file system as "external memory." CLAUDE.md files store project-level long-term memory. Automatic context compression summarizes earlier content as conversations grow long.

ChatGPT: Offers a Memory feature that explicitly stores user preferences and facts. Users can view and delete these memories.

Cursor: Achieves long-term memory through .cursorrules files and project indexing. Reads the index each time a project is opened to understand code structure.

The common pattern: move information that needs persistence outside the model's context window, and retrieve it when needed.

Memory System Design Trade-offs

There's no perfect memory solution — every approach involves trade-offs:

Aspect                    | Pure Context   | Summarization       | Vector Retrieval     | File System
--------------------------|----------------|---------------------|----------------------|-----------------
Implementation complexity | Low            | Medium              | High                 | Low
Information fidelity      | High           | Medium              | Depends on retrieval | High
Cross-session persistence | No             | No                  | Yes                  | Yes
Capacity                  | Window-limited | Window-limited      | Nearly unlimited     | Nearly unlimited
Latency                   | None           | Yes (summarization) | Yes (retrieval)      | Yes (file reads)

Practical advice: start with the simplest approach (pure context), and add complexity only when you hit problems. In most scenarios, a large enough context window plus simple summarization is sufficient.

Key Takeaways

  1. Agents have three types of memory: short-term (context window), working memory (task intermediate state), and long-term (cross-session persistence).
  2. Working memory prevents the model from getting lost in long tasks — implement via scratchpad, state objects, or similar mechanisms.
  3. The hard part of long-term memory is retrieval, not storage — stored information you can't find is as good as not stored.
  4. Three context management strategies: truncation, summarization, and sliding window + keyframes, each with its own trade-offs.
  5. Start simple — pure context is sufficient for most scenarios. Consider more complex solutions only when you hit bottlenecks.