
CLAUDE.md Token Optimization: A Deep Dive into @import, Splitting, and Prompt Cache Strategies (2026)

The Prompt Shelf

A well-maintained CLAUDE.md is one of the highest-ROI investments in a Claude Code workflow. It is also one of the most reliable ways to burn tokens on every single session, including the ones where you just need to fix a typo. This guide covers the advanced techniques — @import composition, file splitting patterns, prompt cache compatibility, and how to measure your actual token footprint before and after changes.

The fundamentals (trim dead weight, remove rationale, use bullet points over prose) are covered in the token budget optimization guide. This article picks up where that one leaves off.

The Baseline: What Does Your CLAUDE.md Actually Cost?

Before optimizing, measure. You can ask Claude Code for a rough estimate:

# Ask Claude to estimate the tokens in your CLAUDE.md
# (approximate — models do not count their own tokens exactly)
claude -p "Count the tokens in my CLAUDE.md and tell me the total"

# Or measure directly using Claude's token counting
python3 -c "
import anthropic
client = anthropic.Anthropic()
with open('CLAUDE.md') as f:
    content = f.read()
result = client.messages.count_tokens(
    model='claude-opus-4-5',
    messages=[{'role': 'user', 'content': content}]
)
print(f'Tokens: {result.input_tokens}')
"

For context: Claude Code’s 200K context window sounds enormous, but a Claude Opus session with a complex codebase can load 40,000–80,000 tokens of context before you even type your first prompt (open files, git history, terminal output). CLAUDE.md tokens are on top of that. At 3,000 tokens for a typical CLAUDE.md and 100 Claude Code sessions per month, you are loading ~300,000 tokens of CLAUDE.md content every month — some of it in sessions that did not need most of it.
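
The arithmetic scales linearly, so it is easy to model your own numbers. A minimal sketch, using the example figures from above (substitute your own measurements):

```python
# Rough monthly cost model for CLAUDE.md loading.
# These figures are the illustrative numbers from the text, not measurements.
claude_md_tokens = 3_000    # tokens in your CLAUDE.md
sessions_per_month = 100    # Claude Code sessions per month

monthly_load = claude_md_tokens * sessions_per_month
print(f"CLAUDE.md tokens loaded per month: {monthly_load:,}")
```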

@import: The Right Way to Split CLAUDE.md

The @import directive lets Claude read additional files as part of CLAUDE.md loading. The syntax is simple:

# CLAUDE.md

@import .claude/context/architecture.md
@import .claude/context/commands.md
@import .claude/context/conventions.md

Claude reads each imported file in order and treats the combined content as a single instruction set. The key insight: imported files are read from disk at session start, but they can be cached as separate prefix segments by the prompt cache system (more on this below).

What belongs in split files vs the root CLAUDE.md:

| Root CLAUDE.md | Imported files |
| --- | --- |
| Core behavior instructions (short, high-impact) | Architecture documentation |
| Import directives | Domain-specific conventions |
| Claude Code hooks and permissions | Tech stack details |
| Session-start actions | Reference tables and examples |

Rule of thumb: if Claude would behave differently without it on 90%+ of tasks, it goes in the root. If it is only relevant for specific tasks or files, it is a candidate for a separate import.

Example structure:

project-root/
├── CLAUDE.md                      # 200-400 tokens (core only)
└── .claude/
    ├── context/
    │   ├── architecture.md        # System design, directory structure
    │   ├── commands.md            # Build, test, lint commands
    │   ├── conventions.md         # Naming, style, patterns
    │   └── api-conventions.md     # API-specific rules
    └── CLAUDE.local.md            # Not committed, personal overrides

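For concreteness, a core-only root file at that scale might look like this (hypothetical content — the rules shown are illustrative, not recommendations):

```markdown
# CLAUDE.md

## Core rules
- TypeScript strict mode; `unknown` not `any` for external data.
- Run `pnpm lint && pnpm test` before declaring a task done.
- Never edit generated files under `src/generated/`.

@import .claude/context/architecture.md
@import .claude/context/commands.md
@import .claude/context/conventions.md
```
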
With this structure, a root CLAUDE.md can drop from 2,800 tokens to 280, with all the detail in imported files. Claude loads the same total content, but the root file that governs behavior is tightly focused.

Prompt Cache Compatibility

This is the piece most optimization guides skip. Claude’s prompt cache stores prefixes of your context so they do not need to be reprocessed on repeated calls. When CLAUDE.md is the same across sessions (which it usually is), it hits the prompt cache and costs you read tokens instead of full processing tokens — roughly 10x cheaper.

Prompt cache rules that affect CLAUDE.md:

Cache hits happen when the prefix is identical. If your CLAUDE.md changes between sessions (even a timestamp, a dynamic line, anything generated at runtime), the cache breaks. Static files cache; dynamic content does not.

The cache has a 5-minute TTL by default in Claude Code. This means back-to-back sessions hit the cache. A session started after a long break starts fresh.

Imported files can cache as separate segments. Prompt caching is prefix-based, so a change invalidates the cache only from the point of the change onward. If you edit commands.md, everything imported before it (architecture.md in the example layout above) still hits the cache, and only the content from commands.md onward is reprocessed. This is a significant benefit of the split-file approach over one monolithic CLAUDE.md, where any edit forces a full cache write.
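
Claude Code manages this internally, but when calling the Messages API yourself you can reproduce the per-file behavior with multiple `cache_control` breakpoints (the API allows up to four per request). A sketch, assuming the example file layout above:

```python
# Sketch: one cache breakpoint per imported file. Each block ends a
# cacheable prefix, so editing a later file leaves the earlier
# segments eligible for cache hits.
def build_system_blocks(paths, read_file):
    """One system text block per file, each marked as a cache breakpoint."""
    return [
        {
            "type": "text",
            "text": read_file(path),
            "cache_control": {"type": "ephemeral"},
        }
        for path in paths
    ]

files = [
    ".claude/context/architecture.md",
    ".claude/context/commands.md",
    ".claude/context/conventions.md",
]
# Stub reader so the sketch runs without the files present;
# in real use: read_file=lambda p: open(p).read()
system = build_system_blocks(files, read_file=lambda p: f"<contents of {p}>")
print(len(system), system[-1]["cache_control"]["type"])
```

Pass the resulting list as the `system` parameter of `client.messages.create(...)`.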

Measuring cache behavior:

# Check cache performance in a Claude API context
# (Claude Code exposes this in session metrics)
import anthropic

client = anthropic.Anthropic()
claude_md_content = open("CLAUDE.md").read()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=100,
    system=[{
        "type": "text",
        "text": claude_md_content,
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "What are the main conventions?"}]
)

usage = response.usage
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
cached_total = usage.cache_read_input_tokens + usage.cache_creation_input_tokens
if cached_total:
    print(f"Cache hit rate: {usage.cache_read_input_tokens / cached_total:.1%}")

For a typical CLAUDE.md that loads 3,000 tokens: with a cold cache, you pay full token cost. With a warm cache hit, you pay roughly 300 tokens' worth (about 10% of full cost, the cache read rate). Across 100 sessions per month, cache hits save you an order of magnitude in token costs for CLAUDE.md specifically.
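
As a sanity check on those numbers (a toy model — the ~10% cache-read rate is an approximate published ratio, not a price quote):

```python
# Toy cost model: cold vs. warm prompt cache for CLAUDE.md loading.
# cache_read_rate ~0.1 reflects cache reads costing roughly a tenth
# of normal input-token processing (approximate, model-dependent).
claude_md_tokens = 3_000
cache_read_rate = 0.10

cold = claude_md_tokens                     # full processing on a cache miss
warm = claude_md_tokens * cache_read_rate   # cache hit
print(f"cold miss: {cold} token-equivalents")
print(f"warm hit:  {warm:.0f} token-equivalents")
```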

Directory-Scoped CLAUDE.md: When to Split by Directory

Claude Code supports CLAUDE.md files in subdirectories. Each subdirectory CLAUDE.md is loaded when Claude is working in that directory, in addition to the root CLAUDE.md. This is the right tool when you have a monorepo or a project with meaningfully different conventions in different areas.

When directory splitting helps:

  • Monorepo with multiple apps/packages that have different tech stacks
  • Projects where /src/api/ and /src/web/ have genuinely different conventions
  • Test directories with specific rules for test writing
  • Scripts or tooling directories with different code style expectations

When it creates overhead without benefit:

  • Single-app projects where convention differences are minor
  • Small repositories where one CLAUDE.md is already tight
  • Cases where the extra context loading costs more than the specificity saves

Measuring directory CLAUDE.md impact: Compare the token count of your root CLAUDE.md handling all conventions vs. a root file with shared rules and separate files per major directory. For a typical 3-part monorepo (api, web, shared):

| Approach | Total tokens at root | Tokens when in /api | Tokens when in /web |
| --- | --- | --- | --- |
| One root CLAUDE.md | 2,400 | 2,400 | 2,400 |
| Root + directory files | 400 (root) | 400 + 600 = 1,000 | 400 + 550 = 950 |

In this example, directory splitting saves 1,400 tokens when Claude is working in a specific directory. But when Claude needs cross-cutting context (refactoring across directories), it loads everything anyway.
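
To see what you are actually carrying on disk, a rough per-file estimate helps (a sketch — dividing characters by 4 is a coarse token heuristic, not an exact count):

```shell
# Estimate tokens for every CLAUDE.md in the repo (~4 chars/token)
find . -name 'CLAUDE.md' -not -path './node_modules/*' | while read -r f; do
  chars=$(wc -c < "$f")
  echo "$f: ~$((chars / 4)) tokens"
done
```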

The 40% Reduction Checklist

Applied to real projects, these techniques consistently deliver 35-45% token reduction while maintaining the same behavior.

Pass 1: Strip rationale (saves ~15%)

Find every "because," "so that," and "in order to" in your CLAUDE.md. Either delete the rationale or move it to a comment block that Claude can skip. AI models act on instructions, not rationale — and rationale dilutes instruction density.

Before (38 tokens):

Always use `unknown` instead of `any` for external data because `any` disables 
TypeScript's type checking and leads to runtime errors that are hard to debug.

After (12 tokens):

Use `unknown` not `any` for external data.

Pass 2: Collapse list items (saves ~10%)

Look for bullet points that could be combined or made more terse.

Before (45 tokens):

- Run `pnpm test` to execute the test suite
- Run `pnpm lint` to check code style
- Run `pnpm build` to build for production
- Run `pnpm dev` to start the development server

After (25 tokens):

Commands: `pnpm dev` (dev server), `pnpm test` (tests), `pnpm lint` (lint), `pnpm build` (prod build)

Pass 3: Extract to @imports (saves ~20%)

Move architecture documentation, detailed convention examples, and reference tables into imported files. Keep only the distilled rules in the root.

Pass 4: Review the examples (saves ~10%)

Code examples in CLAUDE.md are expensive (code is high-token-density content). Remove examples that illustrate obvious concepts. Keep examples only for rules that are counterintuitive or where Claude consistently gets the pattern wrong without an example.

Pass 5: Consolidate redundant instructions

Search for the same concept expressed multiple times. Pick the clearest expression, delete the others.
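
Pass 5 is the hardest to do by eye. A rough helper can flag candidates (a sketch using the stdlib difflib; the 0.7 threshold and the sample rules are arbitrary):

```python
import difflib

def near_duplicates(lines, threshold=0.7):
    """Flag pairs of instruction lines that say nearly the same thing."""
    pairs = []
    for i, a in enumerate(lines):
        for b in lines[i + 1:]:
            if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs

rules = [
    "Always run pnpm test before committing.",
    "Run pnpm test before every commit.",
    "Use TypeScript strict mode.",
]
for a, b in near_duplicates(rules):
    print(f"Possible duplicate:\n  {a}\n  {b}")
```

Treat the output as a review queue, not an auto-delete list — two similar-looking rules sometimes encode a real distinction.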

Measuring Before and After

Track these metrics before and after each optimization pass:

# Count tokens before
python3 -c "
import anthropic
client = anthropic.Anthropic()

# Load all imported files too
import os, re
main_content = open('CLAUDE.md').read()
imports = re.findall(r'^@import (.+)$', main_content, re.MULTILINE)
for imp in imports:
    imp_content = open(imp).read() if os.path.exists(imp) else ''
    main_content += '\n' + imp_content

result = client.messages.count_tokens(
    model='claude-opus-4-5',
    messages=[{'role': 'user', 'content': main_content}]
)
print(f'Total tokens (CLAUDE.md + imports): {result.input_tokens}')
"

Validate that the optimization did not degrade behavior:

# Run a behavior test — ask Claude to recite the key rules
claude -p "List the 5 most important constraints from my CLAUDE.md in order of importance"

# Run a task test — do a typical coding task and verify conventions are followed
claude -p "Write a new utility function that parses an ISO date string to a Date object"

Compare the outputs before and after optimization. If behavior is identical, the tokens you removed were not load-bearing.
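
One simple way to run that comparison (a sketch; assumes the `claude` CLI is on your PATH and that you run both prompts from the project root):

```shell
# Capture Claude's understanding of the rules before optimizing
claude -p "List the 5 most important constraints from my CLAUDE.md" > rules-before.txt

# ...apply the optimization passes, then capture again...
claude -p "List the 5 most important constraints from my CLAUDE.md" > rules-after.txt

# An empty diff suggests the removed tokens were not load-bearing
diff rules-before.txt rules-after.txt && echo "No behavior drift detected"
```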

Real Numbers: Before and After from Three Projects

Project 1: TypeScript/Next.js SaaS (4 developers)

  • Before: 3,200 tokens (one CLAUDE.md)
  • After: 480 tokens root + 1,100 tokens imports = 1,580 total (51% reduction)
  • Behavior change: None — ran the same test prompts before and after
  • Monthly savings at 200 sessions/month: ~330,000 fewer tokens from CLAUDE.md loading

Project 2: Python/FastAPI backend

  • Before: 1,800 tokens
  • After: 380 tokens root + 620 tokens imports = 1,000 total (44% reduction)
  • Key optimization: Removed extensive rationale and collapsed 12-line command list to 1 line

Project 3: Monorepo (Node API + React frontend + shared packages)

  • Before: 2,600 tokens (single file for everything)
  • After: 350 tokens root + 3 directory CLAUDE.mds totaling 900 tokens
  • In any given directory: Claude loads 350 + 300 = 650 tokens (75% reduction from the original 2,600)

The recurring pattern: the first optimization pass usually finds 20-25% savings from rationale removal and prose consolidation alone. The @import split adds another 15-25% in directory-specific scenarios. Total: a 35-50% reduction is consistently achievable without any loss in behavior.

