Claude Code’s token budget is finite, and CLAUDE.md loads into every session. Most guides stop at “keep it short.” This one goes further — covering lazy loading patterns, conditional sections, memory offloading architectures, and the measurement techniques that show you what is actually eating your context before you start cutting.
This guide complements our CLAUDE.md token budget optimization article, which covers the fundamentals. We are picking up where that leaves off.
See real CLAUDE.md files with different token profiles in our gallery.
Why Standard Pruning Isn’t Enough
Standard advice — “delete what you don’t use” — hits a ceiling quickly. The problem is that most of what’s in a well-maintained CLAUDE.md is genuinely used. You can’t cut it. You need a different strategy.
The core insight: CLAUDE.md is optimized for worst-case behavior (Claude doing the wrong thing without instructions), but it loads on every request, including simple ones. A 3,000-token CLAUDE.md loaded for “what does this function return?” is paying a high tax for no benefit on that request.
Advanced optimization is about matching instruction density to actual task complexity.
Step 1: Classify Your Instructions
Before optimizing, classify everything currently in your CLAUDE.md into four buckets:
Always-needed (core). Instructions Claude will get wrong without them, on virtually any task. Example: “always use TypeScript strict mode,” “never console.log in production code.” These stay in CLAUDE.md.
Frequently-needed. Instructions relevant to most (but not all) work. Example: “our API base URL is X,” “the design system tokens are in /src/tokens.” These are candidates for a MEMORY.md pattern.
Rarely-needed. Instructions that only matter for specific tasks. Example: detailed migration procedures, legacy system quirks, release process steps. These should not live in CLAUDE.md.
Dead. Instructions for patterns you no longer use, deprecated dependencies, or context about a project phase that ended. Delete these.
Most CLAUDE.md files have 30–50% dead or rarely-needed content. Removing it is the fastest win.
Lazy Loading with MEMORY.md
The standard CLAUDE.md pattern loads everything upfront. The MEMORY.md pattern splits your knowledge base:
~/.claude/CLAUDE.md (or ./CLAUDE.md)
→ Core instructions only (target: < 800 tokens)
./MEMORY.md (or .claude/MEMORY.md)
→ Extended knowledge, loaded on demand
In CLAUDE.md, add a single reference:
## Extended Knowledge
For project-specific details, conventions, and reference information,
read MEMORY.md. Load it when you need context beyond these core instructions.
This does not reduce the token cost when Claude reads MEMORY.md, but it gives you control over when it happens. For simple tasks, Claude may not read it at all. For complex tasks, Claude reads it once and the content is in context for the remainder of the session.
The effective result: you pay the full token cost only when it is needed, not on every simple request.
MEMORY.md structure
# Project Memory
## Architecture
[System design, major components, data flow]
## API and Infrastructure
[Endpoints, auth patterns, external service details]
## Development Conventions
[Code style specifics, patterns to prefer, patterns to avoid]
## Deployment
[How to build, test, and release]
## Current Work
[Active sprint, current PR, open issues — update this regularly]
## Historical Decisions
[ADRs and the reasons behind them — why you chose X over Y]
The “Current Work” section is especially valuable because it is high-churn content that would otherwise pollute CLAUDE.md. Keep it in MEMORY.md and update it rather than letting outdated context accumulate.
Conditional Section Patterns
Not every section of CLAUDE.md is relevant to every task. You can hint to Claude when sections apply:
## Frontend Work (applies when working in /src/components or /src/pages)
- Always use the design system tokens from /src/tokens/
- Use server components by default; add "use client" only when needed
- Tailwind utility classes preferred over inline styles
## Backend Work (applies when working in /api or /services)
- Use the query builder in /lib/db.ts — never raw SQL
- All endpoints must have Zod validation on input
- Log errors with our logger (/lib/logger.ts), not console.error
## Testing (applies when writing or modifying tests)
- Use vitest for unit tests, Playwright for e2e
- Test command: npm test (unit) / npm run test:e2e (e2e)
- Never snapshot test UI components
This does not automatically scope which sections Claude reads — the entire CLAUDE.md is still in context. What it does is guide prioritization. When Claude is working on a backend endpoint, it naturally weights the backend section over the frontend section when deciding how to behave.
The benefit is psychological as much as mechanical. Scoped sections prevent instructions from one domain bleeding into another.
Token-Efficient Instruction Writing
Instruction quality affects token cost. Verbose instructions are not always better instructions.
Replace explanations with rules
Before (verbose):
When writing TypeScript, we prefer to use interfaces over types when defining
object shapes because they are more extensible and produce better error messages.
However, for union types, mapped types, and conditional types, we use type aliases
because interfaces can't express those patterns.
After (rule-first):
TypeScript: prefer `interface` for object shapes, `type` for unions/mapped/conditional.
The rule-first version is ~70% shorter and equally actionable. Claude does not need the explanation — it already knows the tradeoffs. You are paying tokens to tell it something it already knows.
Bullet brevity
# Before
- Make sure that when you are creating new React components, you always use
functional components with hooks rather than class components, as our codebase
has standardized on the modern functional approach.
# After
- React: functional components only (no class components)
The “after” version conveys the same instruction in 7 words instead of 35.
Remove self-evident statements
These add tokens without changing behavior:
- “Write clean, maintainable code” (Claude already tries to do this)
- “Add helpful comments to complex code” (already default behavior)
- “Follow best practices” (not a concrete instruction)
Remove them. They provide no signal, only noise.
The Reference Pattern for Large Knowledge Bases
For large projects with extensive documentation, use CLAUDE.md as an index rather than a knowledge store:
## Reference Materials
When you need project-specific context, consult these files:
- **Architecture**: docs/ARCHITECTURE.md
- **API specification**: docs/api/README.md
- **Data models**: prisma/schema.prisma + docs/DATA_MODEL.md
- **Deployment**: docs/DEPLOYMENT.md
- **Testing standards**: docs/TESTING.md
Read only what the current task requires.
This approach has zero upfront token cost for the referenced content. Claude reads specific docs when needed rather than loading everything. For large knowledge bases (10,000+ tokens of documentation), this can reduce session token overhead by 80% or more.
The tradeoff: Claude may occasionally miss context it would have had if everything were loaded. Mitigate this with clear task framing that tells Claude what reference materials are relevant.
Measuring Real Impact
Knowing your theoretical token count is not enough. You need to measure what actually happens in a session.
What to measure
For a representative sample of your typical tasks (5–10 sessions), track:
- Tokens consumed at session start (before your first message)
- Tokens Claude spends on your CLAUDE.md-related behavior vs. actual task work
- Frequency of Claude referencing MEMORY.md or external docs vs. inline knowledge
Claude Code surfaces token usage in its interface. Use this data, not estimates.
Warning signs your CLAUDE.md is too heavy
- Claude regularly summarizes or paraphrases your CLAUDE.md back to you (it is trying to compress it)
- You see
/compactrecommendations early in a session - Simple tasks feel slower than they should
- Claude misses instructions that are buried deep in the file (cognitive load, not logic failure)
Warning signs your CLAUDE.md is too sparse
- Claude frequently asks clarifying questions about conventions you’ve established
- Recurring correction patterns (you keep saying “no, we do X not Y”)
- Inconsistent output across sessions on the same type of task
The sweet spot is usually around 600–1,200 tokens for the core CLAUDE.md, with extended knowledge in MEMORY.md or referenced docs.
Session-Level Optimization
Beyond the file itself, session management affects effective token budget.
Use /compact proactively
When context starts getting long, /compact condenses the conversation history while preserving key decisions. Use it at natural break points — between major phases of a task, after a large codebase read, before starting a new subtask.
Scope task framing
Starting a task with “I’m working on the payment processing module — focus only on the files in /src/payments/” is not just polite. It tells Claude what to load and what to ignore, which affects which parts of MEMORY.md or referenced docs it reads.
End sessions before they overfill
A session that has run for six hours, read 50 files, and executed 200 tool calls is a session with a compressed context window. Quality degrades before you hit the hard limit. Starting a fresh session for a new phase of work is often more efficient than continuing a long session.
For the foundational approach, see our CLAUDE.md token budget optimization guide. For CLAUDE.md templates organized by project type, see our use-case templates. Real CLAUDE.md examples are available in our gallery.