For a while, the smartest thing I worked with started every conversation already tired. I’d open a fresh session and, before I had asked it a single question, a wall of remembered context would pour in — rules, history, notes-to-self, everything it had ever learned about the work. By the time it said hello, a large part of its attention was already spent. A few exchanges later the window was full, the answers got vaguer, and I’d quietly start over. I was paying for a bigger and bigger memory and getting a smaller and smaller mind.

The thing I’d built to help it remember had become the thing stopping it from thinking.

How it happened

It happened the way these things always do: one helpful append at a time. Every decision, every “remember for next time,” every bit of shipped history went into one always-loaded document, because that was the simplest place to put it. Each addition was reasonable. The sum was not.

By the time I looked properly, that one file was around six hundred lines, and roughly seventy percent of it was history — a changelog the agent re-read, in full, at the start of every single session. Worse, because everyone and everything appended to the same block, it generated merge conflicts constantly. The memory wasn’t just heavy. It was actively in the way.

A brain is a database you query, not a document you load. A hundred percent recall is not the same as a hundred percent resident.

— the rule I kept after

The reframing that fixed it was small and a little embarrassing in hindsight: total recall and total residency are different things. I had been treating them as one. I wanted the agent to be able to remember everything, so I kept everything in front of it at all times. But you don’t need a fact in the room to be able to fetch it. You need to know it exists and where to find it.

The shape of the fix

From a loaded document to a queried brain Before: one ever-growing document fills the context window every session. After: a tiny always-on index, with the bulk kept on disk and fetched on demand by a subagent. BEFORE — LOAD EVERYTHING CLAUDE.md + brain everything, every session room left to think One ever-growing document becomes the context budget. Every session starts near the limit; a few exchanges later, it’s full. ≈ 634 lines · 70% history · always loaded AFTER — FETCH ON DEMAND Constitution + Index tiny · always on room to think a subagent reads the bulk, returns a short conclusion THE BRAIN · ON DISK · IN GIT Decisions — the why State — the now Backlog — the next Log — almost never retrieve
The same knowledge, two architectures. Left: the document is the budget. Right: a tiny index stays resident; everything else is retrieved just-in-time and read by a subagent, so the main thread stays clear.

The new arrangement keeps only two things permanently in view, and both are tiny:

  • a Constitution — the hard rules and conventions the agent must never violate, kept deliberately lean; and
  • an Index — a one-line-per-item map of everything else: a title, and a hook that says when this is worth opening.

Everything else — the decisions and their reasoning, the current state of the world, what’s planned next, the long history — lives on disk, in separate files, and is retrieved only when the task at hand actually needs it. The index is the map; the rest is territory you visit on purpose.

And there’s one more move that matters more than it looks. When a task does need to read a lot — say, “why did we decide X, and what shipped around it?” — the main thread doesn’t read it. It hands that off to a subagent, which goes off, reads the bulk, and comes back with a short conclusion. The heavy reading happens somewhere else; only the answer comes home. The room where the thinking happens stays clear.

Keeping the past without carrying it

The fear with “don’t load everything” is that you lose the past. You don’t — you just stop carrying it on your back.

Two rules make history reconstructable on demand:

  1. Append, don’t overwrite. A new decision never edits the old one. When something changes, you add a new entry and mark the old one superseded by the new. The chain stays intact.
  2. Date everything. Every entry is stamped, and the whole thing lives in version control.

Put those together and any question about the timeline has an answer that was never sitting in context:

"What was true on the 8th?"   → read the dated entries up to that date
"Why did this rule change?"    → follow the 'superseded by' chain
"What's live right now?"       → one small state file, read on demand

The history is total. The footprint is nearly nothing. That’s the whole trick.

What I’d tell myself a year ago

I reached for the obvious upgrades first — a managed memory service, a fancier store — and didn’t need any of them. Plain dated files in git gave me provenance, audit, and an honest timeline for free, and they happen to suit the way I actually work. The lesson wasn’t get a better database. It was stop confusing what you can recall with what you must hold.

The win shows up as calm. Sessions start light. The answers stay sharp deep into a problem instead of fogging over. And the memory, finally, is something the mind reaches into — not something it has to wade through to begin.