Claude Code is a natural implementation-first coder. Ask it to build a feature and it will write the code, then write the tests. This is the opposite of test-driven development, and it matters: tests written after implementation tend to test what the code does rather than what it should do. They find far fewer bugs.
Making Claude Code do TDD properly — write a failing test, make it pass, refactor — requires more than asking nicely in a prompt. Prompts get forgotten across a session, especially after /compact. You need enforcement at the infrastructure level: hooks that block writes without corresponding tests, CLAUDE.md rules that describe the workflow, and optionally a skill that executes the full loop autonomously.
This guide covers the practical stack for enforcing TDD in Claude Code projects. See enforcement patterns from real open-source setups in our gallery.
## Why Prompts Alone Fail
Telling Claude to “use TDD” at the start of a session works for the first few tasks. By the fifth task, especially after context compaction, it reverts. The instruction drifts to the edge of its attention.
Three things actually enforce TDD across a full session:
- CLAUDE.md — explicit, session-persistent rules that Claude reads at every start
- Hooks — deterministic, Claude-independent enforcement (hooks run even if Claude ignores the rules)
- A TDD skill — a structured prompt Claude can invoke that makes the loop explicit
They are most effective in combination.
## Step 1: CLAUDE.md Rules
Put these in your project CLAUDE.md. Keep them at the top — Claude Code weighs content higher when it appears early in the file.
```markdown
## Development workflow

**TDD is mandatory. No exceptions.**

Order of operations for any new function, class, or behavior:

1. Write a test that fails (verify it fails with `[test command]`)
2. Write the minimum implementation to make it pass
3. Refactor — clean up without changing behavior
4. Run the full test suite before marking the task done

**Enforcement:**

- Never write implementation code before the test exists
- Never mark a task done if tests are failing
- If asked to "just implement it quickly", still write the test first
- If a test file does not exist for a new module, create it before touching the source file

**Test file naming:**

- Python: `tests/test_{module}.py`
- TypeScript: `{module}.test.ts` or `{module}.spec.ts` co-located with source
- Go: `{module}_test.go` in the same package
```
This alone improves compliance significantly. But it is not enough on its own.
## Step 2: Hooks for Automated Enforcement

Hooks run outside Claude's decision-making. They are deterministic. This is what makes them the right layer for enforcement.

### Block implementation writes when no test file exists

This PreToolUse hook intercepts Write and Edit tool calls. If Claude is trying to write to a source file but no corresponding test file exists, it blocks the action and tells Claude why. Register it in your hooks configuration (`.claude/settings.json` for the project, or `~/.claude/settings.json` globally):
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 ~/.claude/hooks/tdd_guard.py"
          }
        ]
      }
    ]
  }
}
```
The tdd_guard.py script reads the tool input from stdin and checks whether a test file exists:
```python
#!/usr/bin/env python3
"""
TDD guard hook for Claude Code.
Blocks writes to source files if no corresponding test file exists.
Place at: ~/.claude/hooks/tdd_guard.py
"""
import json
import os
import sys
from pathlib import Path

data = json.load(sys.stdin)
tool_name = data.get("tool_name", "")
tool_input = data.get("tool_input", {})

# Only check Write and Edit operations
if tool_name not in ("Write", "Edit", "MultiEdit"):
    sys.exit(0)

# Get the target file path
file_path = tool_input.get("file_path", "")
if not file_path:
    sys.exit(0)

path = Path(file_path)

# Skip test files themselves (they are allowed)
name = path.stem
if name.startswith("test_") or name.endswith(("_test", ".test", ".spec")):
    sys.exit(0)

# Skip non-source files
source_extensions = {".py", ".ts", ".tsx", ".js", ".jsx", ".go", ".rs"}
if path.suffix not in source_extensions:
    sys.exit(0)

# Skip __init__ files and config files
if name in ("__init__", "index", "types", "constants", "config"):
    sys.exit(0)

# Look for a corresponding test file
test_candidates = [
    path.parent / f"test_{name}{path.suffix}",            # test_module.py
    path.parent / f"{name}.test{path.suffix}",            # module.test.ts
    path.parent / f"{name}.spec{path.suffix}",            # module.spec.ts
    path.parent / "tests" / f"test_{name}{path.suffix}",  # tests/test_module.py
    path.parent.parent / "tests" / f"test_{name}{path.suffix}",
]

# For Go: the test file lives in the same package with a _test.go suffix
if path.suffix == ".go":
    test_candidates.append(path.parent / f"{name}_test.go")

test_exists = any(c.exists() for c in test_candidates)
if not test_exists:
    print(
        f"TDD guard: No test file found for {file_path}.\n"
        f"Write the test first. Looked for:\n"
        + "\n".join(f" - {c}" for c in test_candidates[:4]),
        file=sys.stderr,
    )
    sys.exit(2)  # Exit code 2 blocks the action and sends stderr to Claude

sys.exit(0)
```
When Claude tries to write src/auth/token_validator.py without a test file, it receives:
```
TDD guard: No test file found for src/auth/token_validator.py.
Write the test first. Looked for:
 - src/auth/test_token_validator.py
 - src/auth/token_validator.test.py
 - src/auth/token_validator.spec.py
 - src/auth/tests/test_token_validator.py
```
Claude adjusts. It creates the test file first.
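You can sanity-check the guard without a live session by piping a fabricated tool payload to it. The payload shape below matches what the script reads from stdin; the file path is just an example:

```bash
# Dry-run the guard with a fake PreToolUse payload
echo '{"tool_name": "Write", "tool_input": {"file_path": "src/auth/token_validator.py"}}' \
  | python3 ~/.claude/hooks/tdd_guard.py
echo "exit code: $?"  # expect 2 (blocked) while no test file exists
```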
### Auto-run tests after every source file edit
This PostToolUse hook runs your test suite after any source file edit. It feeds the results back to Claude so it can see immediately whether the implementation makes the tests pass.
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/run_tests.sh"
          }
        ]
      }
    ]
  }
}
```
```bash
#!/bin/bash
# ~/.claude/hooks/run_tests.sh
# Reads tool input from stdin, runs tests if a source file was edited

input=$(cat)
file_path=$(echo "$input" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('tool_input',{}).get('file_path',''))" 2>/dev/null)

# Skip if no file path was provided
if [ -z "$file_path" ]; then exit 0; fi

# Detect project type and run the appropriate test command
if [ -f "pyproject.toml" ] || [ -f "setup.py" ]; then
  result=$(python3 -m pytest --tb=short -q 2>&1 | tail -20)
elif [ -f "package.json" ]; then
  # --passWithNoTests is a Jest flag; drop it for other runners
  result=$(npm test -- --passWithNoTests 2>&1 | tail -20)
elif [ -f "go.mod" ]; then
  result=$(go test ./... 2>&1 | tail -20)
else
  exit 0
fi

echo "$result"
# Always exit 0 here — we want Claude to see the output but not block the action
exit 0
```
The hook does not block on test failure (exit 0, not exit 2). That is intentional — you want Claude to see the failure and respond to it, not be hard-stopped. The guard hook handles the “no test exists” case; this hook handles the “tests exist but are failing” feedback loop.
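If the full suite is slow, running all of it after every edit adds real latency. A scoped variant keeps the loop tight. This is a sketch for the pytest branch only, assuming the root-level `tests/` layout from the naming rules above:

```bash
# Optional: scope the run to the matching test file when one exists
name=$(basename "$file_path")
name="${name%.*}"
if [ -f "tests/test_${name}.py" ]; then
  result=$(python3 -m pytest "tests/test_${name}.py" --tb=short -q 2>&1 | tail -20)
else
  result=$(python3 -m pytest --tb=short -q 2>&1 | tail -20)
fi
```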
## Step 3: A TDD Skill

Skills are markdown prompt files that Claude can invoke explicitly. Save the one below at `.claude/commands/tdd.md`; it becomes a custom `/tdd` slash command and gives Claude a structured protocol to follow when you say "use TDD for this":
```markdown
# TDD Skill

When invoked, follow this exact sequence. Do not skip steps.

## Phase 1: Red (write a failing test)

1. Identify the smallest behavior to implement
2. Write a test for that behavior only
3. Run the test and confirm it fails
4. Show me the failure output — verify it is failing for the right reason
   (failing because code doesn't exist = correct; failing with a syntax error = fix the test)

## Phase 2: Green (make it pass)

1. Write the minimum implementation to make the test pass
2. No gold-plating. No additional features. Only what the test requires.
3. Run the test. If it fails, fix the implementation (not the test).
4. Repeat until the specific test passes.
5. Run the full test suite. Fix any regressions before continuing.

## Phase 3: Refactor

1. With tests green, clean up the implementation
2. Extract duplication, improve naming, simplify logic
3. Re-run tests after each refactor step
4. Do not change behavior — only structure

## Repeat

Return to Phase 1 for the next behavior. One behavior per loop iteration.

## Rules

- If you are tempted to write more than one test at a time, write only one
- If the implementation is obvious, write it anyway — the test documents the contract
- If a test is hard to write, that is design feedback — simplify the interface
```
Invoke this from your session with:

```
/tdd implement the password reset email flow
```
Claude runs the full red-green-refactor loop, narrating each phase.
## Handling Legacy Code
TDD guards break down when working in codebases where test coverage is sparse. The guard hook will block every edit because nothing has test files yet.
Disable the guard for specific directories with an exception list:
```python
# Add to tdd_guard.py, after the path is determined:
EXCLUDED_DIRS = {
    "migrations",
    "legacy",
    "scripts",
    "generated",
    "vendor",
    "node_modules",
}
if any(part in EXCLUDED_DIRS for part in path.parts):
    sys.exit(0)
```
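The same dry-run trick from earlier confirms the exclusion works. The migration path here is hypothetical:

```bash
# Expect exit code 0: files under migrations/ are no longer blocked
echo '{"tool_name": "Edit", "tool_input": {"file_path": "db/migrations/0042_add_token_table.py"}}' \
  | python3 ~/.claude/hooks/tdd_guard.py
echo "exit code: $?"
```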
Or disable it entirely for the session when doing a legacy refactor:
```bash
# In your shell — temporarily bypass the hook
CLAUDE_SKIP_TDD_GUARD=1 claude
```
Add a check to the hook:
```python
if os.environ.get("CLAUDE_SKIP_TDD_GUARD"):
    sys.exit(0)
```
## Real-World Setup: The Minimal Stack

If you want to implement this gradually, start here:

- **Week 1:** Add the CLAUDE.md rules only. Observe how often Claude follows them unprompted.
- **Week 2:** Add the run_tests.sh PostToolUse hook. This alone is high-value — you get immediate test feedback without blocking anything.
- **Week 3:** Add the tdd_guard.py PreToolUse hook for new projects or modules. Do not enable it retroactively on legacy code.
- **As needed:** Add the TDD skill for complex features where you want explicit loop control.
## What This Does Not Solve
The hook-based approach enforces test file existence, not test quality. Claude can write a test that trivially passes:
```python
def test_token_validator():
    assert True  # placeholder
```
This defeats the purpose. The CLAUDE.md rule “verify the test fails before writing implementation” is your defense against this — Claude should confirm red before green. You can add an optional prompt hook that asks Claude to confirm test failure, but this adds latency.
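A cheap partial defense is a PostToolUse nudge that flags obvious placeholders. The script below is a sketch, not a quality gate: the hook path and the filename patterns are assumptions, and a grep only catches the laziest cases:

```bash
#!/bin/bash
# Hypothetical ~/.claude/hooks/test_quality_nudge.sh (register as a PostToolUse hook)
# Flags placeholder assertions in freshly edited test files without blocking anything.
input=$(cat)
file_path=$(echo "$input" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('file_path',''))" 2>/dev/null)

# Only inspect test files
case "$(basename "$file_path")" in
  test_*|*.test.*|*.spec.*|*_test.go) ;;
  *) exit 0 ;;
esac

if grep -qE 'assert True|expect\(true\)' "$file_path" 2>/dev/null; then
  # stdout is fed back to Claude as feedback; exit 0 keeps it non-blocking
  echo "Test quality nudge: $file_path contains a placeholder assertion. Replace it with a real behavioral check."
fi
exit 0
```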
The deeper fix is code review. Treat AI-generated tests with the same skepticism as AI-generated implementation code. The hooks and skills improve the process; they do not replace judgment.