Most memory talk in agent systems is still too vague.
Teams say things like:
- we added retrieval, so now the agent has memory
- we store chat history, so continuity is handled
- we use a vector database, so long-term memory is solved
That language sounds fine until you try to build a system that has to survive real use.
Then the cracks show.
The problem is that memory often gets used to mean four different things at once:
- current working context
- searchable history
- durable user or system state
- reusable workflows or skills
Those are not the same layer.
And if you collapse them into one blurred bucket, you get exactly the kind of sloppy agent memory system that feels impressive in a demo and brittle in production.
This article is the practical companion to Agent Memory Is Growing Up - Why Agents Are Starting to Remember How, Not Just What.
The opinion piece makes the directional argument.
This one is the system-design version.
My view is simple:
Good agent memory in production is not one store. It is a governed system for deciding what should stay visible, what should be retrievable, what should persist, what should be compressed, and who gets to update it.
If you want the operational version, use this:
A serious memory design needs five decisions:
Scope, Class, Ownership, Promotion, and Exposure.
Why Most Memory Talk Is Still Too Loose
The site has already covered two foundational distinctions:
- Memory: Why Agents Need More Than Context Windows
- Short-Term Context, Retrieval, and Long-Term Memory
Those basics still matter.
But once you move from conceptual explanation to production design, the real problem changes.
It is no longer just:
Does the agent have memory?
It becomes:
What kind of memory is this, how long should it live, when should it be visible, and who should be allowed to change it?
That is a much better engineering question.
Because most failures in agent memory do not come from having no storage at all.
They come from bad decisions about:
- scope
- promotion
- compression
- retrieval
- write permissions
In other words, memory quality is an architecture problem.
What Current Systems Are Actually Converging On
The current serious memory surface is more useful than the hype makes it sound.
It is also narrower.
The interesting thing is not that every framework has a memory feature.
The interesting thing is that the better ones are converging on a few practical ideas.
Letta: some memory should stay pinned
Letta treats memory blocks as editable pieces of in-context state. Some of that memory is always visible rather than retrieved on demand.
That is an important production lesson.
Some memory matters because it should shape behavior continuously:
- identity
- role
- standing constraints
- operator instructions
- stable user-specific facts
If the agent has to retrieve those every time, the system is already designed incorrectly.
Mem0: memory needs layers and promotion rules
Mem0 separates conversation, session, user, and organizational memory. That is not just tidy product design. It is a real systems lesson.
Different memory belongs to different scopes and different lifetimes.
And just as important, Mem0’s public model explicitly includes promotion:
capture -> promote -> retrieve
That is the right shape.
Raw interaction is not durable memory yet.
Something has to decide what graduates.
LangGraph and Deep Agents: writes need strategy and permissions
LangGraph and LangChain's long-term memory support make two useful things explicit:
- long-term memory should be persisted in real stores and namespaces
- memory can be semantic, episodic, or procedural
The Deep Agents memory docs go one step further by emphasizing scope, retrieval mode, and whether memory is writable or read-only.
That matters because a writable memory system without a write policy is just a very expensive way to accumulate bad state.
MIA: compression and planning belong inside the memory story
The MIA paper is interesting less because it proves production readiness and more because it makes the right problem visible.
Its Manager -> Planner -> Executor architecture treats memory as experience that should improve future planning, not just retrieval.
That is the most useful lesson to borrow.
If your memory layer never improves planning, never compresses history, and never influences future strategy, then it is probably doing less than you think.
The S.C.O.P.E. Model
If I were designing an agent memory system in production, this is the framework I would use.
Call it S.C.O.P.E.:
- Scope
- Class
- Ownership
- Promotion
- Exposure
That is the minimum set of decisions a serious memory design needs to make.
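As a rough sketch, those five decisions can be made explicit in the data model itself rather than left implicit in storage. The names and enums below are hypothetical, not taken from any specific framework:

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    TURN = "turn"
    SESSION = "session"
    USER = "user"
    WORKSPACE = "workspace"
    AGENT = "agent"
    ORGANIZATION = "organization"

class MemoryClass(Enum):
    SEMANTIC = "semantic"      # durable knowledge
    EPISODIC = "episodic"      # what happened
    PROCEDURAL = "procedural"  # how to do something

class Ownership(Enum):
    SYSTEM_MANAGED = "system"
    HUMAN_APPROVED = "human"
    READ_ONLY = "read_only"
    AGENT_WRITABLE = "agent"

class Exposure(Enum):
    PINNED = "pinned"        # always visible in context
    RETRIEVED = "retrieved"  # fetched on demand
    ARCHIVED = "archived"    # available, but not surfaced by default

@dataclass
class MemoryRecord:
    content: str
    scope: Scope
    memory_class: MemoryClass
    ownership: Ownership
    exposure: Exposure
    promoted: bool = False  # has this graduated from raw trace to durable memory?
```

The point of the sketch is that every record carries all five decisions explicitly, so none of them can be skipped by accident.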
1. Scope
The first question is:
Who does this memory belong to, and how long should it last?
That sounds obvious.
It is also where a lot of systems fail first.
Memory can belong to:
- the current turn
- the current session
- a user
- a workspace or account
- an agent
- an organization
Those scopes should not be mixed casually.
A user preference is not the same thing as a task-local scratchpad.
A workspace policy is not the same thing as a personal memory.
An agent that cannot separate those scopes eventually pollutes itself with the wrong kind of continuity.
This is one reason layered systems like Mem0 are useful. They force the designer to admit that lifetime and audience are part of the architecture, not just metadata cleanup.
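One way to keep scopes from bleeding into each other is to make lifetime and namespace derive from the scope itself. This is a minimal sketch under assumed lifetimes (the TTL values and key layout are illustrative, not recommendations):

```python
# Hypothetical lifetimes per scope, in seconds (None = durable until explicitly revised).
SCOPE_TTL = {
    "turn": 0,            # discarded after the current step
    "session": 3600,      # survives the conversation, not beyond it
    "user": None,
    "workspace": None,
    "agent": None,
    "organization": None,
}

def namespace_key(scope: str, ids: dict) -> str:
    """Build a storage namespace so one scope cannot pollute another."""
    if scope == "turn":
        return f"user/{ids['user_id']}/session/{ids['session_id']}/turn/{ids['turn_id']}"
    if scope == "session":
        return f"user/{ids['user_id']}/session/{ids['session_id']}"
    if scope == "user":
        return f"user/{ids['user_id']}"
    if scope == "workspace":
        return f"workspace/{ids['workspace_id']}"
    if scope == "agent":
        return f"agent/{ids['agent_id']}"
    if scope == "organization":
        return f"org/{ids['org_id']}"
    raise ValueError(f"unscoped memory is not allowed: {scope}")
```

Refusing to store anything without a scope is the design decision; the key format is incidental.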
2. Class
The second question is:
What kind of memory is this?
The most useful production distinction is still:
- semantic memory
- episodic memory
- procedural memory
Semantic memory is durable knowledge.
Examples:
- the customer prefers weekly summaries
- this environment uses a nonstandard API host
- finance requires human approval above a threshold
Episodic memory is memory of what happened.
Examples:
- the last two runs failed after the same tool call
- this account already received a manual refund
- the previous repair attempt broke tests in a specific module
Procedural memory is memory of how to do something.
Examples:
- the repair pattern that usually works for a known failure mode
- the escalation workflow for a specific class of incidents
- the multi-step investigation sequence that avoids a common dead end
This is where memory starts touching Planning and Task Decomposition and ReAct and the Basic Reasoning Loop.
If you want agents to get better over time, procedural memory is the category that matters most.
It is also the category most systems handle worst: nobody has durable procedural memory solved in production yet.
In most real systems, procedural memory is still partial, brittle, or heavily scaffolded by prompts, rules, and human-maintained workflows. That is exactly why it deserves separate treatment instead of being blurred into user preferences or generic retrieval.
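The practical reason the three classes deserve separate treatment is that they go stale in different ways. A made-up example of each, with an invalidation condition attached (the entries and the `stale_when` field are illustrative):

```python
# Illustrative (made-up) entries showing how the three classes differ in shape.
semantic = {
    "class": "semantic",
    "fact": "finance requires human approval above a threshold",
    "stale_when": "policy changes",       # invalidated by the world changing
}
episodic = {
    "class": "episodic",
    "event": "last two runs failed after the same tool call",
    "stale_when": "time passes",          # value decays with age
}
procedural = {
    "class": "procedural",
    "skill": "repair pattern for a flaky-test failure mode",
    "steps": ["rerun in isolation", "bisect recent commits", "pin the flaky dependency"],
    "stale_when": "it stops working",     # validated by outcomes, not by age
}
```

A store that gives all three the same retrieval and expiry policy is implicitly claiming they decay the same way. They do not.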
3. Ownership
The third question is:
Who is allowed to write, approve, or override this memory?
This part is consistently under-discussed.
Not every memory store should be agent-writable.
Some memory should be:
- system-managed
- human-approved
- read-only shared policy
- agent-writable but heavily scoped
This is where memory design becomes governance rather than storage.
If an agent can freely rewrite durable business facts, policy, or long-lived procedures based on noisy runs, the problem is not “bad retrieval.”
The problem is that the memory system has no authority model.
That is also why I would treat writable memory as adjacent to Tool Use: How Agents Take Action. The moment an agent can write memory, memory itself becomes a tool with consequences.
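An authority model can be as simple as a table of which writers may touch which ownership level, checked before any write lands. This is a minimal sketch with hypothetical policy names:

```python
class MemoryWriteError(Exception):
    """Raised when a writer lacks authority over a memory store."""

# Hypothetical policy: which writers may touch which ownership level.
WRITE_POLICY = {
    "system_managed": {"system"},
    "human_approved": {"human"},
    "read_only": set(),                         # nobody writes at runtime
    "agent_writable": {"agent", "system", "human"},
}

def check_write(ownership: str, writer: str) -> None:
    """Refuse a memory write unless the writer holds authority over this store."""
    allowed = WRITE_POLICY.get(ownership, set())
    if writer not in allowed:
        raise MemoryWriteError(f"{writer!r} may not write {ownership!r} memory")
```

The interesting property is the default: an unknown ownership level allows no writers at all, so new stores are locked down until someone explicitly grants authority.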
4. Promotion
The fourth question is:
What gets upgraded from raw trace into durable memory?
This may be the most important part of the whole system.
A lot of memory architectures quietly assume that saving more history is the same thing as learning.
It is not.
Raw traces are not durable memory yet.
Something has to decide what moves from:
- chat history
- tool output
- observations
- failed attempts
- temporary plans
into a state that should still influence later runs.
That promotion step may involve:
- summarization
- deduplication
- conflict resolution
- tagging failure versus success
- converting a trace into a reusable workflow or skill
This is where MIA is directionally useful. It treats compression and memory evolution as core parts of the design rather than an afterthought.
That is exactly right.
Good memory is not what the system stores.
It is what the system decides is worth carrying forward.
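A promotion gate can start very simply: require that an observation recur before it graduates, and deduplicate against what is already stored. The threshold below is illustrative, not a recommendation:

```python
def promote(candidates, existing, min_occurrences=2):
    """Decide which raw observations graduate into durable memory.

    A candidate is promoted only if it recurred often enough to look like
    a pattern rather than noise, and is not already stored.
    """
    seen = set(existing)
    counts = {}
    for obs in candidates:
        counts[obs] = counts.get(obs, 0) + 1
    promoted = []
    for obs, n in counts.items():
        if n >= min_occurrences and obs not in seen:
            promoted.append(obs)
            seen.add(obs)
    return promoted
```

A real gate would add summarization, conflict resolution, and success/failure tagging on top, but even this toy version encodes the core stance: saving is not the same as promoting.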
5. Exposure
The fifth question is:
What should stay pinned in context, and what should be retrieved only when needed?
This is where systems often swing between two bad extremes.
Bad extreme one:
- pin too much
- bloat the prompt
- keep stale or weakly relevant state always visible
Bad extreme two:
- pin almost nothing
- force the agent to retrieve core behavior-shaping state over and over again
The right answer depends on operational importance.
Pinned memory is best for things like:
- core role and identity
- stable preferences that affect most turns
- standing rules and boundaries
- highly reusable behavioral constraints
Retrieved memory is better for things like:
- old incidents
- archived conversations
- prior task traces
- large external knowledge stores
- lower-frequency contextual detail
Letta’s distinction between in-context blocks and retrievable out-of-context state is useful here because it makes visibility a first-class decision.
It should be.
The practical rule is simple:
- pin what shapes behavior on most runs
- retrieve what is situational
- archive what should remain available but not constantly visible
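That rule can be sketched as a context-assembly step: pinned state goes in unconditionally, and retrieved detail fills whatever budget remains. The character budget and retriever interface here are assumptions for illustration:

```python
def assemble_context(pinned, retriever, query, budget_chars=2000):
    """Build the working context: pinned, behavior-shaping state first,
    then retrieved situational detail until a (hypothetical) budget is spent."""
    parts = list(pinned)  # always visible, never subject to the budget
    used = sum(len(p) for p in parts)
    for item in retriever(query):  # situational detail, fetched on demand
        if used + len(item) > budget_chars:
            break
        parts.append(item)
        used += len(item)
    return "\n".join(parts)
```

The asymmetry is the point: retrieval competes for space, pinned memory does not, which is exactly why the pinned layer has to stay small.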
A Concrete Example: A Coding Agent
Take a coding agent operating inside a real engineering workflow.
A bad memory system stores everything:
- raw command history
- every file edit
- every test output
- every thought
- every failed approach
That is not memory architecture.
That is a junk drawer.
A better system would separate the layers.
Pinned Memory
- repository-specific guardrails
- codebase conventions
- stable approval policy
- standing task objective for the active run
Session Memory
- files already inspected
- what has already failed
- intermediate hypotheses
- current repair plan
Retrievable Memory
- old bugfix traces
- prior incidents with similar failures
- previous PRs that touched the same subsystem
- architecture notes
Durable Promoted Memory
- a validated repair pattern that repeatedly works for a specific class of failures
- a known repo-specific trap worth avoiding
- a stable workflow for shipping a certain kind of change safely
That is a much more believable production design.
It preserves continuity without pretending every transient detail deserves immortality.
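The four layers above can be written down as an explicit configuration, which also forces the ownership and lifetime decisions into the open. All names and values here are hypothetical:

```python
# Hypothetical layer configuration for the coding agent described above.
CODING_AGENT_MEMORY = {
    "pinned": {
        "exposure": "always_in_context",
        "writable_by": ["human"],
        "contents": ["repo guardrails", "codebase conventions", "approval policy"],
    },
    "session": {
        "exposure": "in_context_for_run",
        "writable_by": ["agent"],
        "lifetime": "current run",
        "contents": ["files inspected", "failed attempts", "current repair plan"],
    },
    "retrievable": {
        "exposure": "on_demand",
        "writable_by": ["system"],
        "contents": ["old bugfix traces", "prior incidents", "architecture notes"],
    },
    "durable_promoted": {
        "exposure": "on_demand",
        "writable_by": ["agent_with_approval"],
        "requires": "promotion step",
        "contents": ["validated repair patterns", "known repo traps"],
    },
}
```

Note that only the session layer is freely agent-writable, and the durable layer cannot be written without going through promotion.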
What Bad Memory Systems Usually Get Wrong
The failure patterns are fairly predictable.
They treat vector search as the whole design
Retrieval matters.
It is not the whole memory architecture.
They store too much raw history
Accumulation is not learning.
Too much raw state becomes noise, contradiction, and stale context.
They do not separate memory classes
Semantic, episodic, and procedural memory behave differently.
If they all get stored and exposed the same way, the system becomes harder to steer.
They let memory writes happen without enough guardrails
Writability without ownership rules is how bad runs become durable policy.
They confuse retrieval quality with memory quality
A system can fetch relevant text and still have terrible continuity.
Those are different competencies.
What Builders Should Actually Do
If I were building this today, I would do five things first.
1. Decide scope before schema
Before debating embeddings, decide who the memory belongs to and how long it should survive.
2. Separate memory classes early
Do not dump semantic, episodic, and procedural memory into one unlabeled bucket.
3. Add explicit promotion rules
Do not pretend raw history is durable memory just because it is saved somewhere.
4. Keep pinned memory small
A pinned layer should be behavior-shaping, not archival.
5. Evaluate memory writes as part of the runtime
Do not evaluate only the final answer.
Evaluate whether the system is writing the right things into memory and whether those writes improve later behavior instead of degrading it.
That is one of the places where memory design eventually meets the production disciplines behind Context Engineering: The New Core Skill.
FAQ
Is a vector database enough to count as agent memory?
No.
A vector database can support retrieval, but it does not by itself define what should persist, what should be pinned, what should be promoted, or who can write durable state.
What is the biggest mistake teams make with memory systems?
They confuse storage with continuity.
Saving more history is easy. Deciding what should still influence later behavior is the real design problem.
What should always stay pinned in context?
Usually only the behavior-shaping state that matters on most runs:
- core role or identity
- standing constraints
- stable preferences
- active task objective
If the pinned layer starts looking like an archive, it is too large.
What should usually be retrieved instead of pinned?
Lower-frequency but still relevant material:
- prior incidents
- archived threads
- old task traces
- large knowledge sources
- historical examples
That material is useful, but it does not need to crowd the current step by default.
What is procedural memory in practice?
Procedural memory is memory of how to do something, not just what happened or what is true.
In practice that can look like:
- a repair sequence
- an investigation workflow
- a reusable escalation pattern
- a validated task strategy
Is procedural memory already solved in production agents?
No.
Some systems are moving in that direction, but most production memory layers are still much stronger at semantic and episodic memory than at durable reusable skill formation.
Why is promotion so important?
Because raw traces are not durable memory yet.
Promotion is the step where the system decides what graduates from temporary interaction into something that should shape later runs.
Without that step, memory becomes accumulation instead of learning.
Why does ownership matter in memory design?
Because writable memory is a governance problem.
If agents can freely rewrite durable facts, rules, or procedures without enough authority checks, the memory system can quietly become a source of bad policy and unstable behavior.
What is the cleanest way to start designing agent memory?
Start with scope first.
Before choosing stores, embeddings, or schemas, decide:
- who the memory belongs to
- how long it should last
- whether it should be pinned or retrieved
- who can update it
That usually clarifies the rest of the design much faster.
How should this article connect to the rest of the memory sequence?
Use Memory: Why Agents Need More Than Context Windows for the foundational continuity argument, Short-Term Context, Retrieval, and Long-Term Memory for layer separation, and Agent Memory Is Growing Up - Why Agents Are Starting to Remember How, Not Just What for the more directional view of where the field is heading.
Final Thought
Good agent memory is not about how much the system can store.
It is about how well the system decides:
- what should survive
- what should stay visible
- what should be fetched later
- what should be compressed
- what should be trusted
That is why production memory is a design problem, not a feature checkbox.
And that is why the strongest systems are converging on architecture rather than accumulation.