How Good Agent Memory Actually Works in Production

Good agent memory is not one vector store plus chat history. It is a governed system for deciding what gets scoped, promoted, compressed, pinned, and retrieved.

Most memory talk in agent systems is still too vague.

Teams say things like:

That language sounds fine until you try to build a system that has to survive real use.

Then the cracks show.

The problem is that memory often gets used to mean four different things at once:

Those are not the same layer.

And if you collapse them into one blurred bucket, you get exactly the kind of sloppy agent memory system that feels impressive in a demo and brittle in production.

This is the practical companion to Agent Memory Is Growing Up - Why Agents Are Starting to Remember How, Not Just What.

The opinion piece makes the directional argument.

This article is the system-design version.

My view is simple:

Good agent memory in production is not one store. It is a governed system for deciding what should stay visible, what should be retrievable, what should persist, what should be compressed, and who gets to update it.

If you want the operational version, use this:

A serious memory design needs five decisions: Scope, Class, Ownership, Promotion, and Exposure.

Why Most Memory Talk Is Still Too Loose

The site has already covered two foundational distinctions:

Those basics still matter.

But once you move from conceptual explanation to production design, the real problem changes.

It is no longer just:

Does the agent have memory?

It becomes:

What kind of memory is this, how long should it live, when should it be visible, and who should be allowed to change it?

That is a much better engineering question.

Because most failures in agent memory do not come from having no storage at all.

They come from bad decisions about:

In other words, memory quality is an architecture problem.

What Current Systems Are Actually Converging On

The set of memory systems worth taking seriously today is more useful than the hype makes it sound.

It is also narrower.

The interesting thing is not that every framework has a memory feature.

The interesting thing is that the better ones are converging on a few practical ideas.

Letta: some memory should stay pinned

Letta treats memory blocks as editable pieces of in-context state. Some of that memory is always visible rather than retrieved on demand.

That is an important production lesson.

Some memory matters because it should shape behavior continuously:

If the agent has to retrieve those every time, the system is already designed incorrectly.

Mem0: memory needs layers and promotion rules

Mem0 separates conversation, session, user, and organizational memory. That is not just tidy product design. It is a real systems lesson.

Different memory belongs to different scopes and different lifetimes.

And just as important, Mem0’s public model explicitly includes promotion:

capture -> promote -> retrieve

That is the right shape.

Raw interaction is not durable memory yet.

Something has to decide what graduates.
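The capture -> promote -> retrieve shape can be sketched in a few lines. This is a minimal illustration, not Mem0's API: the `MemoryPipeline` class, its method names, and the `should_promote` policy are all hypothetical, and a keyword match stands in for real retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryPipeline:
    """Toy capture -> promote -> retrieve loop (hypothetical, not Mem0's API)."""

    should_promote: Callable[[str], bool]
    raw: list[str] = field(default_factory=list)      # captured, not yet durable
    durable: list[str] = field(default_factory=list)  # promoted memory

    def capture(self, text: str) -> None:
        # Capture only records the interaction; nothing is durable yet.
        self.raw.append(text)

    def promote(self) -> int:
        # Move items that pass the policy into durable memory, then clear the buffer.
        count = 0
        for text in self.raw:
            if self.should_promote(text) and text not in self.durable:
                self.durable.append(text)
                count += 1
        self.raw.clear()
        return count

    def retrieve(self, query: str) -> list[str]:
        # Naive substring match stands in for real retrieval; only durable items are searched.
        q = query.lower()
        return [text for text in self.durable if q in text.lower()]


# Assumed policy for the example: only explicit preferences graduate.
pipeline = MemoryPipeline(should_promote=lambda t: t.lower().startswith("preference:"))
pipeline.capture("Preference: reply in British English")
pipeline.capture("ok thanks")
promoted_count = pipeline.promote()
hits = pipeline.retrieve("british")
```

The point of the sketch is the separation: retrieval never sees raw capture, so whatever the promotion policy rejects cannot shape later runs.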

LangGraph and Deep Agents: writes need strategy and permissions

LangGraph and LangChain long-term memory make two useful things explicit:

The Deep Agents memory docs go one step further by emphasizing scope, retrieval mode, and whether memory is writable or read-only.

That matters because a writable memory system without a write policy is just a very expensive way to accumulate bad state.

MIA: compression and planning belong inside the memory story

The MIA paper is interesting less because it proves production readiness and more because it makes the right problem visible.

Its Manager -> Planner -> Executor architecture treats memory as experience that should improve future planning, not just retrieval.

That is the most useful lesson to borrow.

If your memory layer never improves planning, never compresses history, and never influences future strategy, then it is probably doing less than you think.

The S.C.O.P.E. Model

If I were designing an agent memory system in production, this is the framework I would use.

Call it S.C.O.P.E.: Scope, Class, Ownership, Promotion, and Exposure.

That is the minimum set of decisions a serious memory design needs to make.

1. Scope

The first question is:

Who does this memory belong to, and how long should it last?

That sounds obvious.

It is also where a lot of systems fail first.

Memory can belong to:

Those scopes should not be mixed casually.

A user preference is not the same thing as a task-local scratchpad.

A workspace policy is not the same thing as a personal memory.

An agent that cannot separate those scopes eventually pollutes itself with the wrong kind of continuity.

This is one reason layered systems like Mem0 are useful. They force the designer to admit that lifetime and audience are part of the architecture, not just metadata cleanup.
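One way to make lifetime and audience structural is to put them in the key. The sketch below is an assumption-heavy illustration: the four scope names, the `DEFAULT_TTL` lifetimes, and the `ScopedMemory` class are hypothetical choices, not any framework's API.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    TASK = "task"
    SESSION = "session"
    USER = "user"
    WORKSPACE = "workspace"


# Assumed lifetimes in seconds; None means no automatic expiry.
DEFAULT_TTL = {
    Scope.TASK: 60 * 60,
    Scope.SESSION: 24 * 60 * 60,
    Scope.USER: None,
    Scope.WORKSPACE: None,
}


@dataclass(frozen=True)
class ScopedKey:
    scope: Scope
    owner_id: str  # task id, session id, user id, or workspace id
    name: str


class ScopedMemory:
    def __init__(self) -> None:
        self._items: dict[ScopedKey, tuple[str, Optional[float]]] = {}

    def put(self, key: ScopedKey, value: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        ttl = DEFAULT_TTL[key.scope]
        expires_at = None if ttl is None else now + ttl
        self._items[key] = (value, expires_at)

    def get(self, key: ScopedKey, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._items[key]  # expired: drop it rather than leak it forward
            return None
        return value


memory = ScopedMemory()
scratch = ScopedKey(Scope.TASK, "task-1", "scratchpad")
pref = ScopedKey(Scope.USER, "user-7", "tone")
memory.put(scratch, "try the retry path first", now=0.0)
memory.put(pref, "concise answers", now=0.0)

two_hours_later = 2 * 60 * 60
scratch_later = memory.get(scratch, now=two_hours_later)
pref_later = memory.get(pref, now=two_hours_later)
```

Because scope and owner are part of the key, a task scratchpad cannot be read back as a user preference, and it expires on its own schedule.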

2. Class

The second question is:

What kind of memory is this?

The most useful production distinction is still semantic, episodic, and procedural memory.

Semantic memory is durable knowledge.

Examples:

Episodic memory is memory of what happened.

Examples:

Procedural memory is memory of how to do something.

Examples:

This is where memory starts touching two other topics: Planning and Task Decomposition, and ReAct and the Basic Reasoning Loop.

If you want agents to get better over time, procedural memory is the category that matters most.

It is also the category most systems handle worst.

That does not mean most production systems already have durable procedural memory solved.

They do not.

In most real systems, procedural memory is still partial, brittle, or heavily scaffolded by prompts, rules, and human-maintained workflows. That is exactly why it deserves separate treatment instead of being blurred into user preferences or generic retrieval.
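Giving each class separate treatment can start with something as small as a required label on every record. The following is a minimal sketch under my own assumptions: `MemoryClass`, `MemoryRecord`, `ClassifiedMemory`, and the example contents are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryClass(Enum):
    SEMANTIC = "semantic"      # durable knowledge: what is true
    EPISODIC = "episodic"      # what happened
    PROCEDURAL = "procedural"  # how to do something


@dataclass(frozen=True)
class MemoryRecord:
    memory_class: MemoryClass  # required, so nothing lands in an unlabeled bucket
    content: str


class ClassifiedMemory:
    def __init__(self) -> None:
        self._by_class: dict[MemoryClass, list[MemoryRecord]] = {}

    def add(self, record: MemoryRecord) -> None:
        self._by_class.setdefault(record.memory_class, []).append(record)

    def of_class(self, memory_class: MemoryClass) -> list[str]:
        return [r.content for r in self._by_class.get(memory_class, [])]


store = ClassifiedMemory()
store.add(MemoryRecord(MemoryClass.SEMANTIC, "The staging database is read-only"))
store.add(MemoryRecord(MemoryClass.EPISODIC, "Tuesday's deploy failed on the migration step"))
store.add(MemoryRecord(MemoryClass.PROCEDURAL, "Run migrations before restarting workers"))

# A planner can ask for "how to" memory without wading through facts and events.
planning_hints = store.of_class(MemoryClass.PROCEDURAL)
```

Once the label exists, each class can get its own storage, exposure, and promotion rules later without a migration.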

3. Ownership

The third question is:

Who is allowed to write, approve, or override this memory?

This part is consistently under-discussed.

Not every memory store should be agent-writable.

Some memory should be:

This is where memory design becomes governance rather than storage.

If an agent can freely rewrite durable business facts, policy, or long-lived procedures based on noisy runs, the problem is not “bad retrieval.”

The problem is that the memory system has no authority model.

That is also why I would treat writable memory as adjacent to Tool Use: How Agents Take Action. The moment an agent can write memory, memory itself becomes a tool with consequences.
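An authority model does not need to be elaborate to be useful. Here is a minimal sketch, assuming three writer roles and three kinds of memory; the `WRITE_AUTHORITY` table, the `GovernedMemory` class, and the review queue are hypothetical illustrations, not a specific framework's permissions API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Writer(Enum):
    AGENT = "agent"
    USER = "user"
    ADMIN = "admin"


# Assumed authority table: which writers may change each kind of memory directly.
WRITE_AUTHORITY = {
    "scratchpad": {Writer.AGENT, Writer.USER, Writer.ADMIN},
    "user_preference": {Writer.USER, Writer.ADMIN},
    "workspace_policy": {Writer.ADMIN},
}


@dataclass
class GovernedMemory:
    values: dict[tuple[str, str], str] = field(default_factory=dict)
    pending: list[tuple[str, str, str, Writer]] = field(default_factory=list)

    def write(self, kind: str, key: str, value: str, writer: Writer) -> bool:
        """Apply the write if the writer has authority; otherwise queue it for review."""
        if writer in WRITE_AUTHORITY[kind]:
            self.values[(kind, key)] = value
            return True
        self.pending.append((kind, key, value, writer))
        return False

    def approve(self, index: int, approver: Writer) -> None:
        """Let someone with authority accept a queued proposal."""
        kind, key, value, _ = self.pending[index]
        if approver not in WRITE_AUTHORITY[kind]:
            raise PermissionError(f"{approver.value} cannot approve {kind} writes")
        self.values[(kind, key)] = value
        del self.pending[index]


memory = GovernedMemory()
memory.write("workspace_policy", "deploys", "no Friday deploys", Writer.ADMIN)

# The agent's attempt to rewrite policy becomes a proposal, not a fact.
applied = memory.write("workspace_policy", "deploys", "deploy any time", Writer.AGENT)
policy = memory.values[("workspace_policy", "deploys")]
```

The design choice worth noting is that an unauthorized write is queued rather than dropped, so a noisy run can still surface a suggestion without becoming durable policy on its own.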

4. Promotion

The fourth question is:

What gets upgraded from raw trace into durable memory?

This may be the most important part of the whole system.

A lot of memory architectures quietly assume that saving more history is the same thing as learning.

It is not.

Raw traces are not durable memory yet.

Something has to decide what moves from:

into a state that should still influence later runs.

That promotion step may involve:

This is where MIA is directionally useful. It treats compression and memory evolution as core parts of the design rather than an afterthought.

That is exactly right.

Good memory is not what the system stores.

It is what the system decides is worth carrying forward.
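As one illustration of a promotion rule, the sketch below only promotes a lesson that showed up in successful runs across more than one session. The rule itself, the `Trace` shape, and the `min_sessions` threshold are my assumptions, not MIA's method; extracting a normalized lesson from a raw trace is the hard part and is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    session_id: str
    lesson: str      # normalized takeaway extracted from the run (extraction not shown)
    succeeded: bool


def select_for_promotion(traces: list[Trace], min_sessions: int = 2) -> list[str]:
    """Return lessons seen in successful runs across at least `min_sessions` sessions."""
    sessions_by_lesson: dict[str, set[str]] = defaultdict(set)
    for trace in traces:
        if trace.succeeded:
            sessions_by_lesson[trace.lesson].add(trace.session_id)
    return sorted(
        lesson
        for lesson, sessions in sessions_by_lesson.items()
        if len(sessions) >= min_sessions
    )


traces = [
    Trace("s1", "pin the dependency version before upgrading", True),
    Trace("s2", "pin the dependency version before upgrading", True),
    Trace("s2", "skip the test suite to save time", False),
    Trace("s3", "clear the build cache first", True),
]
promoted = select_for_promotion(traces)
```

Here only the repeated, successful lesson graduates. The one-off success and the failed shortcut stay in the trace log where they can be inspected but do not shape later runs.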

5. Exposure

The fifth question is:

What should stay pinned in context, and what should be retrieved only when needed?

This is where systems often swing between two bad extremes.

Bad extreme one:

Bad extreme two:

The right answer depends on operational importance.

Pinned memory is best for things like:

Retrieved memory is better for things like:

Letta’s distinction between in-context blocks and retrievable out-of-context state is useful here because it makes visibility a first-class decision.

It should be.
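Treating visibility as a first-class decision can be as simple as a flag that context assembly respects. This is a minimal sketch, not Letta's API: the `Item` type, the `build_context` function, and the word-overlap scoring are hypothetical stand-ins for a real retrieval layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    pinned: bool = False


def build_context(items: list[Item], query: str, max_retrieved: int = 2) -> list[str]:
    """Pinned items always appear; other items appear only if they match the query."""
    pinned = [item.text for item in items if item.pinned]
    terms = set(query.lower().split())

    def overlap(item: Item) -> int:
        # Word overlap is a stand-in for a real relevance score.
        return len(terms & set(item.text.lower().split()))

    candidates = [item for item in items if not item.pinned and overlap(item) > 0]
    candidates.sort(key=overlap, reverse=True)
    return pinned + [item.text for item in candidates[:max_retrieved]]


items = [
    Item("Never push directly to main", pinned=True),
    Item("The billing service owns the invoices table"),
    Item("Last quarter the search index was rebuilt twice"),
]
context = build_context(items, "which service owns invoices")
```

The pinned rule is present regardless of the query, while the retrievable items have to earn their place and are capped by `max_retrieved`.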

The practical rule is simple:

A Concrete Example: A Coding Agent

Take a coding agent operating inside a real engineering workflow.

A bad memory system stores everything:

That is not memory architecture.

That is a junk drawer.

A better system would separate the layers.

Pinned Memory

Session Memory

Retrievable Memory

Durable Promoted Memory

That is a much more believable production design.

It preserves continuity without pretending every transient detail deserves immortality.

What Bad Memory Systems Usually Get Wrong

The failure patterns are fairly predictable.

They treat vector search as the whole design

Retrieval matters.

It is not the whole memory architecture.

They store too much raw history

Accumulation is not learning.

Too much raw state becomes noise, contradiction, and stale context.

They do not separate memory classes

Semantic, episodic, and procedural memory behave differently.

If they all get stored and exposed the same way, the system becomes harder to steer.

They let memory writes happen without enough guardrails

Writability without ownership rules is how bad runs become durable policy.

They confuse retrieval quality with memory quality

A system can fetch relevant text and still have terrible continuity.

Those are different competencies.

What Builders Should Actually Do

If I were building this today, I would do five things first.

1. Decide scope before schema

Before debating embeddings, decide who the memory belongs to and how long it should survive.

2. Separate memory classes early

Do not dump semantic, episodic, and procedural memory into one unlabeled bucket.

3. Add explicit promotion rules

Do not pretend raw history is durable memory just because it is saved somewhere.

4. Keep pinned memory small

A pinned layer should be behavior-shaping, not archival.

5. Evaluate memory writes as part of the runtime

Do not evaluate only the final answer.

Evaluate whether the system is writing the right things into memory and whether those writes improve later behavior instead of degrading it.

That is one of the places where memory design eventually meets the production disciplines behind Context Engineering: The New Core Skill.
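One hedged way to evaluate a memory write is to replay a small set of held-out tasks with and without it and keep the write only if results improve. The sketch below assumes a `run_task` callable that returns pass or fail for a task given a memory set; that callable, the function name, and the fake agent used in the example are all hypothetical.

```python
from typing import Callable, Sequence


def write_improves_behavior(
    run_task: Callable[[str, frozenset[str]], bool],
    tasks: Sequence[str],
    memory: frozenset[str],
    candidate: str,
) -> bool:
    """Replay held-out tasks with and without the candidate write and compare pass counts."""
    baseline = sum(run_task(task, memory) for task in tasks)
    with_write = sum(run_task(task, memory | {candidate}) for task in tasks)
    return with_write > baseline


def fake_run_task(task: str, memory: frozenset[str]) -> bool:
    # Stand-in for a real agent run: the task passes only if a needed fact is in memory.
    needed = {"deploy": "run migrations first", "report": "use UTC timestamps"}
    return needed[task] in memory


tasks = ["deploy", "report"]
memory = frozenset({"use UTC timestamps"})
helpful = write_improves_behavior(fake_run_task, tasks, memory, "run migrations first")
unhelpful = write_improves_behavior(
    fake_run_task, tasks, memory, "the office is closed on Fridays"
)
```

A strict improvement test like this is deliberately conservative. A real harness would also need to check for regressions on individual tasks, since a write can raise the total while breaking something that used to work.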

FAQ

Is a vector database enough to count as agent memory?

No.

A vector database can support retrieval, but it does not by itself define what should persist, what should be pinned, what should be promoted, or who can write durable state.

What is the biggest mistake teams make with memory systems?

They confuse storage with continuity.

Saving more history is easy. Deciding what should still influence later behavior is the real design problem.

What should always stay pinned in context?

Usually only the behavior-shaping state that matters on most runs:

If the pinned layer starts looking like an archive, it is too large.

What should usually be retrieved instead of pinned?

Lower-frequency but still relevant material:

That material is useful, but it does not need to crowd the current step by default.

What is procedural memory in practice?

Procedural memory is memory of how to do something, not just what happened or what is true.

In practice that can look like:

Is procedural memory already solved in production agents?

No.

Some systems are moving in that direction, but most production memory layers are still much stronger at semantic and episodic memory than at durable reusable skill formation.

Why is promotion so important?

Because raw traces are not durable memory yet.

Promotion is the step where the system decides what graduates from temporary interaction into something that should shape later runs.

Without that step, memory becomes accumulation instead of learning.

Why does ownership matter in memory design?

Because writable memory is a governance problem.

If agents can freely rewrite durable facts, rules, or procedures without enough authority checks, the memory system can quietly become a source of bad policy and unstable behavior.

What is the cleanest way to start designing agent memory?

Start with scope first.

Before choosing stores, embeddings, or schemas, decide:

That usually clarifies the rest of the design much faster.

How should this article connect to the rest of the memory sequence?

Use Memory: Why Agents Need More Than Context Windows for the foundational continuity argument, Short-Term Context, Retrieval, and Long-Term Memory for layer separation, and Agent Memory Is Growing Up - Why Agents Are Starting to Remember How, Not Just What for the more directional view of where the field is heading.

Final Thought

Good agent memory is not about how much the system can store.

It is about how well the system decides:

That is why production memory is a design problem, not a feature checkbox.

And that is why the strongest systems are converging on architecture rather than accumulation.