Context Window Management — Strategy for Finite Memory
Even with 1M tokens, how you fill it matters
When GPT-4 shipped with an 8K-token context window, deciding what to put in context was the most critical problem. In 2026's 1M-token era, the problem hasn't disappeared; it has changed shape.
Why Management Still Matters
Cost — more tokens mean higher API costs. Unnecessary information wastes money.
Accuracy — the "Lost in the Middle" phenomenon: models struggle to recall information buried in the middle of a long context, so important information belongs at the beginning or end.
Speed — longer context means slower responses, which is noticeable in real-time conversation.
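The cost point is easy to quantify with back-of-the-envelope arithmetic. The per-token price below is a placeholder, not any provider's real rate; substitute your own pricing:

```python
# Rough cost arithmetic for context size. The price is hypothetical:
# assume $3 per 1M input tokens for illustration only.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000

def daily_input_cost(context_tokens: int, requests_per_day: int) -> float:
    """Daily input cost of resending the same context on every request."""
    return context_tokens * PRICE_PER_INPUT_TOKEN * requests_per_day

# A bloated 500K-token context vs. a trimmed 20K-token one, at 1,000 calls/day:
print(f"bloated: ${daily_input_cost(500_000, 1000):,.2f}/day")
print(f"trimmed: ${daily_input_cost(20_000, 1000):,.2f}/day")
```

Even at a modest hypothetical rate, a 25x difference in context size is a 25x difference in input spend, every single day.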
Harness Context Management Strategies
Selective inclusion — don't dump the entire codebase. Include only files relevant to the current task. This is what Claude Code does when it explores a repo and reads only relevant files.
Summarization and compression — summarize long conversation histories to save tokens, keeping only the essence instead of the full message history.
Dynamic management — needed information changes as work progresses. Files included early may be swapped out for different ones later.
Persistent storage integration — information outside the context window is stored in files, DBs, or vector stores and retrieved when needed. This is the essence of RAG.
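The four strategies above can be sketched as a small context assembler. Everything here is a simplified stand-in: `estimate_tokens` is a crude heuristic, `summarize` is a placeholder for a real model call, and the relevance test is a naive keyword match rather than a vector-store query:

```python
# Sketch of a harness that assembles a bounded context from several sources.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def summarize(messages: list[str], max_chars: int = 200) -> str:
    # Placeholder: a real harness would call the model to compress history.
    return " | ".join(m[:40] for m in messages)[:max_chars]

def assemble_context(task: str, files: dict[str, str], history: list[str],
                     budget: int = 8000) -> str:
    parts = [f"Task: {task}"]
    used = estimate_tokens(parts[0])

    # Selective inclusion: only files that look relevant to the current task.
    for name, content in files.items():
        if task.lower() in name.lower() or task.lower() in content.lower():
            cost = estimate_tokens(content)
            if used + cost <= budget:  # respect the token budget
                parts.append(f"--- {name} ---\n{content}")
                used += cost

    # Summarization: compress old history instead of including it verbatim.
    if history:
        parts.append("History summary: " + summarize(history))

    return "\n\n".join(parts)
```

Dynamic management falls out naturally: the assembler is re-run as the task changes, so the set of included files changes with it, and anything that misses the budget stays in external storage until it becomes relevant.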
The CLAUDE.md Pattern
Claude Code's CLAUDE.md file is a good example of context management: project rules, structure, and coding patterns are stored in a file and injected into context at the start of each session, simulating the model's "long-term memory" with an external file.
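The pattern itself is simple enough to sketch in a few lines. Apart from the file name, nothing here is Claude Code's actual implementation; it is a minimal illustration of "read a memory file, prepend it as standing instructions":

```python
from pathlib import Path

def build_session_context(project_dir: str, user_message: str) -> list[dict]:
    """Inject a project memory file (CLAUDE.md) at session start."""
    memory_file = Path(project_dir) / "CLAUDE.md"
    messages = []
    if memory_file.exists():
        # Project rules and conventions become standing system instructions.
        messages.append({"role": "system",
                         "content": memory_file.read_text(encoding="utf-8")})
    messages.append({"role": "user", "content": user_message})
    return messages
```

Because the memory lives in a plain file under version control, the team can edit, review, and diff the agent's "memory" like any other project artifact.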
How It Works
Selective inclusion — only put task-relevant info in context (don't dump entire codebase)
Priority placement — important info at start/end of context, prevent Lost in the Middle
Dynamic swapping — update context contents as work progresses through stages
External storage — what doesn't fit goes in files/DB/vector stores, retrieved when needed
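Priority placement can be made concrete with a small ordering function. The interleaving scheme below is one possible heuristic, not a documented algorithm: it pushes the highest-priority items toward the two ends of the context and lets low-priority items sink toward the middle, where recall is weakest:

```python
def order_for_recall(items: list[tuple[int, str]]) -> list[str]:
    """Order (priority, text) items so high-priority text sits at the
    start and end of the context, mitigating Lost in the Middle."""
    ranked = sorted(items, key=lambda x: x[0], reverse=True)  # high first
    front, back = [], []
    for i, (_, text) in enumerate(ranked):
        # Alternate: 1st-ranked to the front, 2nd-ranked to the back, etc.
        (front if i % 2 == 0 else back).append(text)
    return front + list(reversed(back))

ordered = order_for_recall([(3, "rules"), (1, "scratch"), (2, "task")])
# Highest priority lands first, second-highest last, lowest in the middle.
```

The same idea applies at the message level: system rules and the current task description at the edges, bulk reference material in between.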