Memory in AI, what it is, what it is not, and why it matters

Dr Claude Delorme
Head of Research, moccet

AI memory is the persistent context that allows a system to know who a user is across sessions. The popular version of this feature in chat products is a Rolodex of stored facts. A genuine personal intelligence requires a structured model of the user. The architectural difference between the two has become the central design battleground of the next generation of AI, and the empirical work on context windows in 2026 has shown that simply enlarging working memory is not the path to deeper memory. moccet is being built around structured memory rather than around lists.

This essay explains what AI memory actually is, the three different things the term refers to, and why the distinction is consequential for the products users will choose to live with.

What is the lost in the middle problem in long-context AI?

The most striking empirical finding about AI memory in 2026 came from TokenMix, a research firm that in April 2026 published the results of testing eighteen frontier language models on what is known as the lost in the middle problem. The phenomenon was first formally described in a 2023 Stanford paper by Liu, Lin, Hewitt, and colleagues, titled Lost in the Middle: models perform better at retrieving and using information placed at the beginning or end of a long context than information placed in the middle.

The TokenMix study, run on production versions of GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, and fifteen others, found accuracy degradations of 10 to 25 percent for information placed in the middle of long contexts. Models with the largest advertised context windows, including Gemini 3 Pro at 1 million tokens and Llama 4 Scout at 10 million, showed the greatest degradation. Larger windows had more middle to get lost in. Around the same time, the database company Chroma published research showing that context rot, the firm's term for accuracy decline as context length grows, exceeded 30 percent in mid-window positions across all eighteen frontier models tested.
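
The methodology behind tests like these is easy to sketch. What follows is a minimal illustration of a positional retrieval test, not the TokenMix or Chroma harness: a known fact is planted at varying depths in a long filler context, and retrieval accuracy is measured at each depth. The call_model function is a hypothetical stand-in for whatever chat API is under test.

```python
# Minimal sketch of a lost-in-the-middle positional test. This is an
# illustration, not the TokenMix or Chroma harness; call_model is a
# hypothetical stand-in for the chat API being tested.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in the chat API being tested")

NEEDLE = "The access code for the vault is 7391."
QUESTION = "What is the access code for the vault?"
FILLER = "The weather report for the region was unremarkable. " * 2000

def accuracy_at_depth(depth: float, trials: int = 20) -> float:
    """Plant the needle at a relative depth of the filler context
    (0.0 = start, 1.0 = end) and measure retrieval accuracy."""
    cut = int(len(FILLER) * depth)
    context = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
    hits = sum("7391" in call_model(f"{context}\n\n{QUESTION}")
               for _ in range(trials))
    return hits / trials

# With a real API plugged in, the reported pattern is that accuracy
# at depth 0.5 trails accuracy at depths 0.0 and 1.0.
```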

The finding goes to the heart of a misunderstanding that has shaped the public conversation about AI memory for the past two years. Bigger context windows are not a path to AI that actually remembers things. A larger context window is a wider working memory, not a deeper long-term one. The architectural problem of giving an AI system a meaningful, durable model of a user's life is not solved by stuffing more text into a single inference call. The more you put in, the more confused, or at minimum the less reliable, the model becomes.

What are the three kinds of AI memory?

Three different things currently get called memory in the trade press. Understanding what AI memory actually is requires unpacking each.

The first kind is the context window itself. A language model, in its base form, is a stateless function. Text in, text out, no persistence. Modern chat products simulate continuity by sending the full conversation history with each new message, formatted as input to the model, so the model has the appearance of remembering what was said earlier. The bound on this trick is the context window. Current production models range from 128,000 tokens for Claude Haiku and GPT-5.4 Mini, to 1 million tokens for Gemini 3 Pro, with Llama 4 Scout claiming 10 million. The windows have grown by orders of magnitude in eighteen months, and the memory experience for the user has not transformed. The lost in the middle phenomenon is part of the reason.
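
The mechanics are worth seeing concretely. Here is a minimal sketch of the trick, with complete() standing in for a single stateless inference call; the names are illustrative, not any vendor's API.

```python
# Minimal sketch of how chat products simulate continuity over a
# stateless model. complete() is a hypothetical stand-in for one
# inference call; no vendor API is implied.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a language model")

history: list[str] = []  # the full transcript, resent on every turn

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    # The model has no state between calls; everything it appears to
    # remember must fit into this one prompt, bounded by the window.
    prompt = "\n".join(history) + "\nAssistant:"
    reply = complete(prompt)
    history.append(f"Assistant: {reply}")
    return reply
```

The bound is visible in the sketch: history grows without limit, and the appearance of memory ends where the window does.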

The context window is not memory. The context window is more like the size of the desk a model is sitting at. A larger desk lets you spread out more work at once. A larger desk does not let you remember the work you did last week.

The second kind of memory is the persistence-of-facts feature that ChatGPT introduced in early 2024 and that has since been replicated across the major chat products. This is what most users mean when they say AI memory. The system extracts facts from a conversation, stores them in a separate database, and retrieves them in future conversations by inserting them into the context window before the user's new message arrives. The user types, I prefer concise answers. The system stores, User prefers concise answers. The next time the user opens a chat, the stored fact is added to the system prompt and the model behaves accordingly.
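
The same sketch convention makes the Rolodex concrete. Both extract_facts() and complete() below are hypothetical stand-ins, and the flow is the general pattern rather than any product's implementation.

```python
# Minimal sketch of a stored-facts memory: extract, store, and inject
# into the prompt. extract_facts() and complete() are hypothetical
# stand-ins for an extraction model and a chat model.

stored_facts: set[str] = set()

def extract_facts(conversation: str) -> list[str]:
    raise NotImplementedError("an extraction model decides what to keep")

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in a chat model")

def remember(conversation: str) -> None:
    stored_facts.update(extract_facts(conversation))

def answer(user_message: str) -> str:
    # Retrieval is nothing more than prepending the stored strings.
    preamble = "Known facts about the user:\n" + "\n".join(sorted(stored_facts))
    return complete(f"{preamble}\n\nUser: {user_message}\nAssistant:")
```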

The feature is genuinely useful. The feature addresses the most basic complaint about chat AI, which is having to repeat your context every time you open a new session. The feature is also shallow in a way that becomes obvious as soon as you ask the system to do more than recall preferences. The memory is a list of explicit facts the system has decided to remember. The decision of what to remember is made by an extraction model, often imperfectly, and the stored facts are static. The facts do not update unless the system extracts a new fact that contradicts an old one. There is no reasoning across them. There is no continuous picture. The memory is a Rolodex.

A Rolodex helps with discrete recall. A Rolodex does not help with judgement that requires seeing the user as a coherent whole. If the system has stored that you have two children, that you prefer concise answers, that you are working on a book about ambient AI, and that you went to school in Cambridge, those facts sit on the Rolodex side by side. The system cannot tell that the book is the most recent project, or that the writing pattern you mentioned in conversation last week explains why you have been working at unusual hours, or that the tone you use with your editor is different from the tone you use with your collaborators. None of that follows from a list of strings.

The third kind of memory is the kind a personal intelligence requires. A genuine model of a user is not a list. A genuine model is a structured representation that supports retrieval, inference, update, and continuity. The model includes patterns, commitments, context, and relationships, organised in ways that allow the system to reason across them rather than retrieve them one at a time.

What does a structured model of a user actually contain?

A structured model of the user includes patterns. When the user works best, what their typical week looks like, who they communicate with most, how they speak to different people. The patterns are derived from continuous observation across connected sources, not stated by the user as preferences.

A structured model includes commitments. Things the user has said they will do. Things others have said they will do for the user. Deadlines. Dependencies. The commitment graph is updated as new commitments are made and old ones are completed or deferred.

A structured model includes context. What is happening this week, this month, in the longer arc of the year, that matters but the user has not yet explicitly thought about. Context is what allows the system to know that this Tuesday is different from last Tuesday in ways that affect what the user should be doing.

A structured model includes relationships. Who matters, in what way, with what history. The relationship graph is what allows the system to draft in the user's voice differently to different people, because the way the user writes to a co-founder is not the way the user writes to a board member.
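
A minimal sketch can make the contrast with the flat list concrete. Every type and field below is illustrative, assumed for the example rather than drawn from moccet's actual schema.

```python
# Illustrative skeleton of a structured user model: patterns,
# commitments, context, and relationships as typed, linked records
# rather than a flat list of strings. All names are hypothetical.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Commitment:
    description: str
    owed_by: str                   # the user, or someone who owes the user
    due: datetime | None
    depends_on: list["Commitment"] = field(default_factory=list)
    status: str = "open"           # open, done, or deferred

@dataclass
class Relationship:
    person: str
    role: str                      # co-founder, editor, board member...
    tone_notes: str                # how the user writes to this person

@dataclass
class UserModel:
    patterns: dict[str, str]               # observed, e.g. "deep work": "mornings"
    commitments: list[Commitment]
    relationships: dict[str, Relationship]
    current_context: list[str]             # what matters this week and month

    def open_commitments_for(self, person: str) -> list[Commitment]:
        # Query across records: judgement needs joins, not string recall.
        return [c for c in self.commitments
                if c.status == "open" and c.owed_by == person]
```

The point of the sketch is the links. Commitments carry dependencies, relationships carry tone, and a query like open_commitments_for cuts across records in a way no list of strings supports.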

Building this is hard, and the difficulty is why most products in the category are not delivering it. Building a Rolodex of facts is a weekend project. Building a continuously updated, structured model of a person's life is engineering on the order of years. The work involves data ingestion pipelines for every connected source, structured representations that survive ingestion errors, retrieval mechanisms that surface the right context for each task, and update logic that keeps the model coherent as new data arrives, sometimes contradicting what was there before. The work also involves choosing which representational substrate to use for which kind of information. Vector databases for semantic similarity. Graph databases for relationships among entities. Time-series structures for patterns and trends. Traditional relational structures for explicit records.
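
The update logic alone shows why this is years of engineering rather than a weekend. A hedged sketch, with contradicts() as a hypothetical model-backed check: new observations must be reconciled against what is already there, not appended beside it.

```python
# Minimal sketch of update logic that keeps one fact coherent as new,
# possibly contradictory data arrives. contradicts() is a hypothetical
# model-backed check; a Rolodex would simply keep both strings.

from datetime import datetime

def contradicts(old: str, new: str) -> bool:
    raise NotImplementedError("a model judges whether two facts conflict")

def update_fact(store: dict[str, tuple[str, datetime]],
                key: str, new_value: str, observed_at: datetime) -> None:
    if key in store:
        old_value, old_time = store[key]
        if contradicts(old_value, new_value):
            # Prefer the newer observation; keep timestamps so later
            # reasoning can weigh recency.
            if observed_at > old_time:
                store[key] = (new_value, observed_at)
            return
    store[key] = (new_value, observed_at)
```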

Why does memory architecture shape user trust?

The way a system remembers shapes the kind of trust users can place in it.

A system that remembers shallowly produces shallow trust. The user can rely on the system to remember a few preferences. The user cannot rely on the system for anything more. Important decisions still have to be made by the user, because the system does not have the context to make them. The user is doing all the orchestration and the system is providing minor convenience.

A system that remembers deeply produces a different kind of trust. The user can hand the system more of their life. The reason is that the system is no longer making decisions in the dark. The system is making decisions based on the same picture the user has, and sometimes a more complete picture, because the system can see things across domains the user has temporarily forgotten about. The trust is not given upfront. The trust is earned across thousands of small accurate decisions, and once earned, scales the user's willingness to delegate.

A system that knows this much about a person is also a system that can hurt the person if the data is mishandled. The architectural answer is that deep memory must be paired with deep privacy. Encryption in transit and at rest. No use of personal data for model training. Sandboxed action with user confirmation. The right to revoke access and delete data at any time. SOC 2 Type II, HIPAA, GDPR. The features that make a personal intelligence worth using are the features that make it safe to use, and a fuller account is in the essay on the privacy architecture of a personal intelligence. A product that has built deep memory without the corresponding privacy architecture is not actually a personal intelligence. It is a liability with an appealing front end. moccet is being built with structured memory and the privacy architecture together, treating them as a single design commitment rather than two separate features.

What is the next frontier in AI memory?

The next two years will see memory become the central design battleground of the AI category. The systems that build shallow memory will continue to feel like assistants with a notepad attached. The systems that build deep memory will become representatives of their users, capable of acting on their behalf with the context required to act well.

The empirical work on context rot tells us that the brute-force path is bounded. Bigger context windows, more text in the prompt, do not produce deeper memory. Past a certain point, larger contexts produce worse retrieval. The architectural path, structured models of the user paired with the privacy infrastructure to make depth safe, is where the next frontier of the category actually is.

For knowledge workers evaluating AI products, the practical question is which kind of memory the product has. A Rolodex of stored facts will deliver a few moments of pleasant surprise and not much else. A structured model of the user is what scales into a system worth living with.

Try moccet

moccet is a personal intelligence built around a continuous model of one person’s life. The product is in early access. The founders run a live twenty-minute session daily at 1pm Pacific that walks through how it works on a real week.

Claim your seat

Common questions

What is AI memory?

AI memory is the persistent context that allows a system to know who a user is across sessions. The term refers to three different things in current products. The context window, which is working memory within a single inference call. The persistence of facts feature, which is a list of stored facts the system has chosen to remember. And a structured model of the user, which is the architecture a personal intelligence requires.

What is the lost in the middle problem?

Lost in the middle is the phenomenon in which language models retrieve information from the beginning or end of a long context more accurately than information placed in the middle. The phenomenon was first formally described in a 2023 Stanford paper by Liu and colleagues. Testing in April 2026 found accuracy degradations of 10 to 25 percent across eighteen frontier models.

Do bigger context windows produce better memory?

No. A larger context window is a wider working memory, not a deeper long-term one. Models with the largest advertised context windows show the greatest accuracy degradation in mid-window positions, because there is more middle to get lost in. The architectural problem of giving an AI a meaningful, durable model of a user's life is not solved by enlarging context.

How is a stored-facts memory different from a structured model of the user?

A stored-facts memory is a list of explicit facts the system has chosen to remember, retrieved by being inserted into the context window. A structured model of the user is a representation that includes patterns, commitments, context, and relationships organised in ways that support inference and update. A list grows linearly; a structured model links what it holds so the system can reason across it.

Why must deep memory be paired with deep privacy?

Deep memory must be paired with deep privacy. The features that make a personal intelligence useful, including continuous observation and structured modelling of the user, are the same features that create catastrophic risk if the system is built without the right architecture. Encryption in transit and at rest, no training on personal data, sandboxed action with confirmation, and the right to revoke and delete are conditions, not optional features.