In late March 2026, Anthropic’s engineering team posted an unusual acknowledgment: users were hitting Claude Code usage limits “way faster than expected.” The post became one of the most-discussed threads on r/ClaudeAI almost immediately.
The frustration is understandable. You’re mid-session, the model is finally understanding your codebase, and then — context full, quality drops, session needs to restart.
Here is what is actually happening and how to fix it.
Why the Context Window Fills Faster Than You Expect
Claude Code’s context window technically supports up to 1 million tokens on Max/Team/Enterprise plans with Opus 4.6. But the practical limit for sustained, high-quality output is lower — and the gap between theoretical capacity and useful capacity is where most people get confused.
Several things are consuming tokens you might not be thinking about:
Tool outputs are verbose. Every time Claude runs a bash command, reads a file, or calls a tool, the output goes into the context. A grep that returns 500 matching lines is 500 lines of context. A test suite that prints a long stack trace on failure? All of it lands in the window.
CLAUDE.md loads on every session. If your CLAUDE.md is 2,000 tokens, that’s 2,000 tokens consumed before the first message. In a long session with multiple compactions, it reloads each time.
Files read explicitly stay in context. Unlike humans who can “set aside” information, Claude carries everything it has read until the context is compacted or the session restarts.
The conversation history itself grows. Your messages, Claude’s responses, the back-and-forth — this accumulates. A 50-message session with detailed responses can consume tens of thousands of tokens just in conversation history.
Parallel subagent results. If you run multiple subagents and they return verbose summaries, those summaries all land in your main context at once.
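The tool-output point is the easiest to see concretely. A minimal sketch (the log file is fabricated for illustration) of how capping a command's output changes what lands in context:

```shell
# Simulate a noisy tool result: 500 matching lines, like an overly broad grep.
mkdir -p /tmp/ctx-demo
seq 1 500 | sed 's/^/match: line /' > /tmp/ctx-demo/big.log

# Unfiltered, every matching line would land in Claude's context window.
grep -c 'match:' /tmp/ctx-demo/big.log

# Capped: the first 20 matches (plus the count above) is usually enough
# for Claude to act on, at a fraction of the token cost.
grep 'match:' /tmp/ctx-demo/big.log | head -n 20 | wc -l
```

Asking Claude to apply the same caps (`head`, `grep -c`, quiet flags) keeps tool results from dominating the window.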
Measuring Your Actual Usage
Before optimizing, get a baseline. Claude Code shows token usage at the bottom of each response. Enable the --verbose flag to see more detail:
```shell
claude --verbose
```
Watch where the big spikes happen. Usually it’s one of: a large file being read, a command with verbose output, or a subagent returning a wall of text.
Fix 1: .claudeignore
The single highest-impact change for most codebases. Create a .claudeignore file at your project root using the same syntax as .gitignore:
```
# Build artifacts
dist/
build/
.next/
out/

# Dependencies
node_modules/
vendor/
.venv/
__pycache__/

# Generated files
*.lock
*.min.js
*.min.css
coverage/
.nyc_output/

# Large data files
*.csv
*.sql
*.dump
data/

# Editor and OS files
.DS_Store
.idea/
.vscode/
*.log
```
With .claudeignore in place, Claude will not read these files even if they exist in your project. For a typical Node.js project, excluding node_modules/ alone can remove millions of potential tokens from Claude’s searchable scope.
Check what Claude is actually reading with:
```shell
claude --verbose 2>&1 | grep "Reading file"
```
You may find it reading files you would never expect — auto-generated client code, lock files, coverage reports.
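Large generated files are the usual offenders. A quick way to surface candidates for .claudeignore is to list anything over a size threshold; this sketch builds a throwaway demo tree (the paths are illustrative):

```shell
# Stand-in for a real project with one large generated artifact.
rm -rf /tmp/ignore-demo
mkdir -p /tmp/ignore-demo/dist
head -c 100000 /dev/zero | tr '\0' 'x' > /tmp/ignore-demo/dist/bundle.min.js
echo "console.log('hi')" > /tmp/ignore-demo/app.js

# List files over ~50 KB; any generated file that shows up here is a
# strong .claudeignore candidate.
find /tmp/ignore-demo -type f -size +50k
```

Run the same `find` at your real project root and compare the output against your .claudeignore.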
Fix 2: Lean CLAUDE.md
CLAUDE.md is reloaded frequently. Every token in it is overhead. The instinct is to make it comprehensive — every convention, every constraint, every piece of context. Resist this.
Apply these rules to your CLAUDE.md:
Include only what changes Claude’s behavior. If the instruction doesn’t make Claude do something differently than it would by default, cut it.
No examples in CLAUDE.md. Move code examples to a separate file and reference the path. Claude can read the file on demand; it doesn’t need to carry examples in the base context.
No history or changelog. CLAUDE.md is not a project log. Move historical context elsewhere.
Use subagents for domain-specific context. If you have complex backend conventions and complex frontend conventions, don’t stuff both into one CLAUDE.md. Create subagents with domain-specific system prompts.
A lean CLAUDE.md for most projects should be under 500 tokens. If yours is over 2,000, audit every section and ask: does this change Claude’s behavior, or does it just make me feel covered?
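To check the budget, a rough rule of thumb is about four characters per token for English prose; this is a heuristic, not an exact tokenizer. A sketch:

```shell
# Estimate CLAUDE.md's token cost: ~4 characters per token is a common
# approximation for English text (a heuristic, not an exact count).
f=/tmp/CLAUDE.md
printf 'Use pnpm, not npm.\nAll new code needs tests.\n' > "$f"

chars=$(wc -c < "$f")
est_tokens=$((chars / 4))
echo "approx tokens: $est_tokens"

if [ "$est_tokens" -gt 500 ]; then
  echo "over the 500-token budget -- audit every section"
fi
```

Point `f` at your real CLAUDE.md; anything well over the budget deserves the audit described above.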
Fix 3: Compaction Strategy
Compaction (/compact) summarizes the conversation history into a dense representation, freeing up significant context space. Claude Code does this automatically when the context gets near the limit, but you can trigger it manually at natural breakpoints.
When to compact:
- After completing a distinct phase of work (research done, moving to implementation)
- After a subagent returns a verbose result you’ve already digested
- When you notice Claude starting to lose track of early context (this is a sign it’s getting full)
- Before starting a new task within the same session
What compaction loses: the full verbatim history of the conversation. What it keeps: the key decisions, the current state of the code, the active instructions.
The mental model: compact at task boundaries, not in the middle of a task. Compacting mid-task risks losing the reasoning chain Claude was following.
After compacting, always verify Claude still has the context it needs by asking it to summarize the current state or repeat back the active constraints.
Fix 4: Start New Sessions More Often
This feels like a step backward — why restart when you’ve spent 30 messages building up context? But a fresh session with a focused initial prompt often outperforms a bloated session that’s been running for hours.
When to start a new session:
- Moving from one feature to a different, unrelated feature
- After finishing a complete unit of work (PR merged, bug fixed)
- When you notice response quality degrading (often a context pressure symptom)
- Anytime the task is self-contained enough that it doesn’t need the history
A focused starting prompt for a new session takes 30 seconds to write. That 30 seconds buys you a fresh 1M token budget and a model that isn’t trying to juggle 6 hours of accumulated context.
Fix 5: Control Tool Output Verbosity
Bash commands can return enormous output. Common culprits:
```shell
# This returns every file path in your project
find . -type f

# This can return thousands of lines
npm test

# This floods with info-level logs
docker logs my-container
```
Pipe verbose commands through filters, and tell Claude to do the same:
- “Run the test suite but only show me failures, not passing tests.”
- “Check the docker logs but filter to ERROR and WARN level only.”
You can also pipe bash output to a file and have Claude read just the relevant section:
```shell
npm test > /tmp/test-output.txt 2>&1
# Then: read /tmp/test-output.txt and summarize failures only
```
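The same filtering can be done ahead of time in plain shell. A sketch with fabricated log contents (the PASS/FAIL markers and log levels are stand-ins; adjust the patterns to your test runner and log format):

```shell
# Stand-in for a test run: only the failure line is worth keeping.
printf 'PASS auth.test.js\nFAIL cart.test.js\nPASS user.test.js\n' > /tmp/test.log
grep 'FAIL' /tmp/test.log

# Stand-in for docker logs: drop the INFO noise before it reaches context.
printf 'INFO starting\nWARN slow query\nERROR timeout\nINFO done\n' > /tmp/app.log
grep -E 'ERROR|WARN' /tmp/app.log
```

Either run the filter yourself and paste the result, or tell Claude to pipe through it; both keep the noise out of the window.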
Fix 6: Delegate Research to Subagents
If you’re investigating an unfamiliar library, a bug in a third-party dependency, or a large part of the codebase you don’t normally work in — that’s a subagent job.
Research tasks are the worst context consumers. Reading docs, following references, checking examples — this adds thousands of tokens that mostly aren’t needed once you have the answer.
Create a researcher subagent in AGENTS.md:
```
## researcher
description: Investigates documentation, GitHub issues, source code, and external resources to answer specific technical questions. Returns a concise summary with the answer and relevant file paths. Use when you need to understand an unfamiliar library or trace a bug through external code.
tools: read, bash, web_search
model: claude-sonnet-4-6
```
Then delegate:
“Use the researcher subagent to figure out how Prisma handles connection pooling in serverless environments. Return a 3-paragraph summary with any relevant config options.”
The research happens in a separate context. You get the answer. Your main context window is untouched.
The Read-Once Hook
One advanced pattern that appeared in the r/ClaudeAI community workarounds thread: use a SessionStart hook to load context once and prevent Claude from re-reading the same files repeatedly.
Create a hook in your settings.json:
```json
{
  "hooks": {
    "SessionStart": [
      {
        "type": "command",
        "command": "cat /path/to/project-context.md"
      }
    ]
  }
}
```
The hook output gets injected into context once at session start. Claude reads this on initialization rather than re-reading files mid-conversation. Particularly useful for architecture docs or API references that Claude would otherwise fetch repeatedly.
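One way to produce that project-context.md is to concatenate your standing docs with a hard size cap, so the hook cannot itself blow the budget it is meant to protect. A sketch with illustrative paths:

```shell
# Build a single context file for the SessionStart hook
# (the docs/ paths here are stand-ins for your real architecture docs).
rm -rf /tmp/hook-demo
mkdir -p /tmp/hook-demo/docs
echo "## Architecture: monolith, Postgres, Redis cache" > /tmp/hook-demo/docs/arch.md
echo "## API: REST, versioned under /v1" > /tmp/hook-demo/docs/api.md

# Concatenate and cap at ~8 KB so the injected context stays small.
cat /tmp/hook-demo/docs/*.md | head -c 8192 > /tmp/hook-demo/project-context.md
wc -l < /tmp/hook-demo/project-context.md
```

Regenerate the file whenever the underlying docs change; the hook then injects a fresh copy at each session start.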
Putting It Together: A Session Workflow
Here is the full workflow for a context-efficient session:
1. Before opening Claude Code: Check .claudeignore is current. Trim CLAUDE.md to under 500 tokens if you have not recently.
2. Start of session: Give a focused prompt scoping the task. Do not dump everything — Claude does not need historical context that isn’t relevant to today’s task.
3. During session: Use explicit file paths rather than asking Claude to search. Pipe verbose commands through filters. Delegate research to subagents.
4. Natural breakpoints: Compact. Verify Claude’s working state. If you’re switching to an unrelated task, start a new session.
5. When quality degrades: Don’t push through. Either compact or start a new session. Context pressure causes subtle reasoning failures that are hard to catch until you’re debugging wrong outputs.
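The pre-session checks are easy to script. A sketch (the paths and the 500-token budget are assumptions to adapt):

```shell
# Pre-session checklist: warn, don't fail, since these are guidelines.
# Uses a throwaway demo directory as a stand-in for a real project root.
project=/tmp/checklist-demo
rm -rf "$project"
mkdir -p "$project"
printf 'Use pnpm.\n' > "$project/CLAUDE.md"

if [ ! -f "$project/.claudeignore" ]; then
  echo "WARN: no .claudeignore -- Claude can read build artifacts"
fi

# Same ~4 chars/token heuristic as above (approximate, not a tokenizer).
est=$(( $(wc -c < "$project/CLAUDE.md") / 4 ))
if [ "$est" -gt 500 ]; then
  echo "WARN: CLAUDE.md is ~$est tokens (budget: 500)"
else
  echo "OK: CLAUDE.md is ~$est tokens"
fi
```

Run it from your real project root before opening Claude Code; it takes seconds and catches the two highest-impact misconfigurations.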
The Real Issue: Usage Limits vs Context Limits
It is worth distinguishing two separate problems that often get conflated:
Context window pressure is about quality degradation as the window fills. This degrades gradually and is fully within your control to manage.
Usage limits are about Anthropic’s rate limits — maximum tokens consumed per hour or per session. These are server-side and you cannot directly control them, though using context more efficiently means you hit them less often.
The techniques above primarily address context window pressure. For usage limits specifically: Max and Team plans have substantially higher limits than Pro, and Anthropic increased capacity in response to the March 2026 complaints. If you’re on Pro and doing serious development work, the limits are genuinely restrictive — this is not a configuration problem, it’s a plan mismatch.
Managing context efficiently will not eliminate usage limits, but it reduces how often you collide with them and keeps your sessions productive longer before they kick in.