Claude Code Multi-Agent Orchestration: 6 Patterns (2026)

The Prompt Shelf

Multi-agent orchestration is not a buzzword problem. It’s an engineering problem with clear tradeoffs: context isolation vs. coordination overhead, parallelism vs. token cost, flexibility vs. configuration complexity.

Claude Code offers three primitives for multi-agent work: subagents (within a session), agent teams (across sessions, experimental), and background agents (long-running, monitored via the Agent View). Manual --worktree sessions are a fourth, unmanaged option for fully independent branches. The 2026 release also made it possible to reference subagent definitions as agent team members, enabling reusable role definitions that work in both contexts.

This guide presents 6 orchestration patterns, each grounded in the official Claude Code docs, with real AGENTS.md configs and an honest look at when each pattern justifies its cost.


The Primitives: What Claude Code Gives You

Before the patterns, a clear map of what’s available:

| Primitive | Scope | Communication | Best For |
| --- | --- | --- | --- |
| Subagents | Within session | Report to orchestrator only | Quick parallel workers, context isolation |
| Agent teams | Separate sessions | Direct peer messaging + shared task list | Complex parallel work requiring debate |
| Background agents | Long-running, monitored | Agent View (web/desktop) | Autonomous tasks, CI-style workflows |
| --worktree sessions | Manual, separate terminals | None (manual merge) | Fully independent parallel branches |

The patterns below use subagents and agent teams as the primary building blocks. Background agents are covered where they add value for specific use cases.


Pattern 1: Orchestrator-Worker

Description: One orchestrator agent manages multiple worker agents. Workers are specialized for specific tasks and report only to the orchestrator.

When to use: Task has clearly decomposable subtasks. Workers don’t need to know what other workers are doing. This is the baseline multi-agent pattern.

AGENTS.md Configuration

# agents

## orchestrator
description: Manages the overall task. Breaks work into subtasks, spawns specialized workers, synthesizes results, and produces the final output.
tools: read, task_create, agent_spawn
model: claude-opus-4-5

## code-writer
description: Writes implementation code based on a specification. Receives a clear spec, produces working code with tests. Does not modify existing files.
tools: read, write, bash, edit
isolation: worktree
model: claude-sonnet-4-6

## code-reviewer
description: Reviews code diffs for correctness, security, and style. Returns structured findings with severity ratings.
tools: read, bash
model: claude-haiku-4-5

## doc-writer
description: Writes technical documentation for a completed implementation. Receives the implementation and writes clear API docs.
tools: read, write
model: claude-haiku-4-5

Orchestration Prompt

You are the orchestrator.

Task: Implement a rate-limiting middleware for our Express API.

Plan:
1. Spawn code-writer to implement src/middleware/rate-limiter.ts with Redis backend
2. Spawn code-reviewer to review the implementation
3. If reviewer finds critical issues, spawn code-writer again with reviewer feedback
4. Spawn doc-writer to document the middleware API

Return a summary of: what was implemented, any issues found/fixed, and where docs were written.
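
The control flow this prompt asks for can be sketched in a few lines. This is a minimal illustration, not how Claude Code spawns subagents internally: `run_agent` is a hypothetical callable standing in for "spawn this agent with this prompt", and the convention that the reviewer marks blocking findings with the word CRITICAL is an assumption made for the example.

```python
from typing import Callable

# Hypothetical hook: (agent_name, prompt) -> the agent's text reply.
RunAgent = Callable[[str, str], str]


def orchestrate(run_agent: RunAgent, spec: str) -> dict:
    """Write, review, rework once if needed, then document."""
    code = run_agent("code-writer", spec)
    review = run_agent("code-reviewer", code)
    reworked = False
    # Assumption: the reviewer flags blocking findings with the word CRITICAL.
    if "CRITICAL" in review:
        code = run_agent("code-writer", f"{spec}\n\nReviewer feedback:\n{review}")
        reworked = True
    docs = run_agent("doc-writer", code)
    return {"code": code, "review": review, "docs": docs, "reworked": reworked}


# Stub agent for a dry run; it records the order in which roles are called.
calls: list[str] = []


def stub_agent(name: str, prompt: str) -> str:
    calls.append(name)
    return "CRITICAL: no key expiry" if name == "code-reviewer" else f"{name} output"


result = orchestrate(stub_agent, "rate-limiting middleware with Redis backend")
print(calls)
```

Workers never see each other's output directly; everything passes through the orchestrator, which is what keeps each worker's context small.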

Cost Analysis

| Role | Model | Estimated Tokens | Cost Estimate |
| --- | --- | --- | --- |
| Orchestrator | Opus | 15K in / 8K out | ~$0.35 |
| Code-writer | Sonnet | 25K in / 20K out | ~$0.45 |
| Code-reviewer | Haiku | 30K in / 5K out | ~$0.05 |
| Doc-writer | Haiku | 20K in / 8K out | ~$0.04 |
| Total | Mixed | ~131K tokens | ~$0.89 |

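For comparison, a single Opus session for the same task typically uses 40–60K tokens and costs ~$1.20 or more, with less specialized attention to each component. The multi-agent run consumes more tokens overall but costs less because most of them go to the cheaper Sonnet and Haiku workers.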
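
To redo this arithmetic for your own workload, a small calculator helps. The token counts below are the per-role figures from the cost analysis; the per-million-token prices in `PRICES` are placeholder assumptions, not numbers from this guide, so the computed total will not necessarily match the rough estimates above. Swap in current pricing before relying on the result.

```python
# Assumed USD prices per million tokens as (input, output). Placeholders only.
PRICES = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}

# (role, model, input tokens, output tokens), taken from the cost analysis.
USAGE = [
    ("orchestrator", "opus", 15_000, 8_000),
    ("code-writer", "sonnet", 25_000, 20_000),
    ("code-reviewer", "haiku", 30_000, 5_000),
    ("doc-writer", "haiku", 20_000, 8_000),
]


def cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one agent run at the assumed prices."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


total_tokens = sum(t_in + t_out for _, _, t_in, t_out in USAGE)
total_cost = sum(cost(model, t_in, t_out) for _, model, t_in, t_out in USAGE)

for role, model, t_in, t_out in USAGE:
    print(f"{role:14s} {model:7s} ${cost(model, t_in, t_out):.3f}")
print(f"total: {total_tokens} tokens, ${total_cost:.3f}")
```

Output tokens dominate the bill for the writer roles, which is why capping output length in a subagent definition tends to pay off more than trimming its input.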


Pattern 2: Parallel Reviewers

Description: Spawn multiple reviewers simultaneously, each applying a different lens to the same artifact. The orchestrator synthesizes findings.

When to use: Code review, security audits, architecture evaluations. Any task where multiple independent perspectives add value. Single reviewers tend to anchor on the first issue type they find.

AGENTS.md Configuration

# agents

## security-reviewer
description: Reviews code changes specifically for security vulnerabilities. Checks input validation, authentication, authorization, injection risks, and secrets exposure. Returns findings with OWASP category and severity.
tools: read, bash
model: claude-sonnet-4-6

## performance-reviewer
description: Reviews code changes for performance implications. Checks algorithmic complexity, database query patterns, caching opportunities, and memory usage. Returns findings with estimated impact.
tools: read, bash
model: claude-sonnet-4-6

## test-coverage-reviewer
description: Reviews test suites for coverage gaps. Identifies untested edge cases, missing error handling tests, and integration test opportunities.
tools: read, bash
model: claude-haiku-4-5

## review-synthesizer
description: Takes structured findings from multiple reviewers and produces a prioritized action list. Groups related findings, assigns overall risk rating.
tools: read, write
model: claude-opus-4-5

Orchestration Prompt

Review PR #142 which adds OAuth2 authentication to the API.

Spawn three reviewers in parallel:
- security-reviewer: focus on the OAuth2 implementation (token validation, scope enforcement, redirect URI handling)
- performance-reviewer: check token storage, Redis usage patterns, and session lookup efficiency
- test-coverage-reviewer: audit the test files added in this PR

Wait for all three to complete, then spawn review-synthesizer with all three reports.
Return the synthesizer's prioritized action list.
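
The "spawn in parallel, wait for all, then synthesize" shape is the same as a gather over concurrent tasks. The sketch below uses `asyncio` with a stub in place of a real reviewer; `run_reviewer` is hypothetical and only illustrates the coordination, not any Claude Code API.

```python
import asyncio


async def run_reviewer(name: str, focus: str) -> dict[str, str]:
    """Stand-in for spawning one reviewer subagent (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {"reviewer": name, "report": f"findings on {focus}"}


async def review_pr() -> list[dict[str, str]]:
    reviewers = {
        "security-reviewer": "OAuth2 token validation and redirect URIs",
        "performance-reviewer": "token storage and session lookup",
        "test-coverage-reviewer": "test files added in the PR",
    }
    # gather() runs all three concurrently and returns results in input order.
    return await asyncio.gather(
        *(run_reviewer(name, focus) for name, focus in reviewers.items())
    )


reports = asyncio.run(review_pr())
for report in reports:
    print(report["reviewer"], "->", report["report"])
```

The synthesizer only starts once every report is in, so the slowest reviewer sets the wall-clock time for the whole review.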

Implementation via Agent Teams

With agent teams enabled, reviewers can message each other to challenge findings:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}

Create an agent team. Spawn three reviewer teammates:
- security-reviewer (use the security-reviewer agent type)
- performance-reviewer (use the performance-reviewer agent type)
- test-coverage-reviewer (use the test-coverage-reviewer agent type)

After each reviewer completes their review, have them share findings with each other. Security and performance reviewers should check if the other's findings have implications for their area. Synthesize the final report.

The peer-messaging capability of agent teams is most valuable here: a performance finding about connection pooling might have security implications, and having the security reviewer respond to that finding produces better coverage than two isolated reports.


Pattern 3: Pipeline Chain

Description: Agents run in sequence where each agent’s output is the input for the next. Each stage transforms the artifact.

When to use: Multi-stage transformations with clear checkpoints. Research → planning → implementation → testing → documentation pipelines. Each stage benefits from a fresh context window with only the previous stage’s output.

AGENTS.md Configuration

# agents

## researcher
description: Researches a technical topic and returns structured findings. Output: markdown document with key facts, best practices, library recommendations, and links. Does not write code.
tools: read, bash
model: claude-sonnet-4-6

## architect  
description: Takes research findings and designs a technical solution. Output: architecture document with component diagram (text-based), API contracts, data models, and implementation sequence. Does not write code.
tools: read, write
model: claude-opus-4-5

## implementer
description: Takes an architecture document and implements the solution. Writes all files specified in the architecture document. Runs tests to verify the implementation.
tools: read, write, bash, edit
isolation: worktree
model: claude-sonnet-4-6

## validator
description: Takes an implementation and validates it against the original architecture document. Checks that all components were implemented, tests pass, and API contracts match. Returns pass/fail with details.
tools: read, bash
model: claude-sonnet-4-6

Orchestration Prompt

Build a webhook event processing system.

Run this pipeline:
1. researcher: research webhook patterns for Node.js. Focus on reliability, idempotency, and retry handling.
2. architect: design the system based on researcher's output. Include: event queue, processor workers, dead letter queue, admin API.
3. implementer: implement based on architect's design. Use TypeScript, BullMQ for queuing.
4. validator: validate the implementation against the architecture document.

If validator fails, spawn implementer again with validator's feedback. Maximum 2 retry passes.

Return: architecture document path, implementation summary, validator report.
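
Stripped of the retry rule, the pipeline is a fold: each stage receives only the previous stage's output. A minimal sketch, with plain functions standing in for the four agents (the stage functions here are hypothetical stubs):

```python
from typing import Callable

Stage = Callable[[str], str]  # takes the previous artifact, returns the next one


def run_chain(stages: list[tuple[str, Stage]], task: str) -> dict[str, str]:
    """Run stages in order, feeding each one the previous stage's output."""
    outputs: dict[str, str] = {}
    artifact = task
    for name, stage in stages:
        artifact = stage(artifact)
        outputs[name] = artifact  # keep every intermediate artifact for the report
    return outputs


stages = [
    ("researcher", lambda text: f"research({text})"),
    ("architect", lambda text: f"design({text})"),
    ("implementer", lambda text: f"implement({text})"),
    ("validator", lambda text: f"validate({text})"),
]

outputs = run_chain(stages, "webhook processing")
print(outputs["validator"])
```

Keeping every intermediate artifact, rather than only the last one, is what lets the final report point at the architecture document as well as the validator result.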

Pipeline State Management

For long pipelines, persist state between stages to enable recovery:

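#!/usr/bin/env bash
# pipeline-state.sh: write one stage's output to disk
set -euo pipefail

if [ "$#" -ne 2 ]; then
  echo "usage: $0 <stage> <output>" >&2
  exit 1
fi

STAGE="$1"
OUTPUT="$2"
# Fall back to the current directory when CLAUDE_PROJECT_DIR is unset,
# so state never lands under the filesystem root.
STATE_DIR="${CLAUDE_PROJECT_DIR:-$PWD}/.claude/pipeline-state"
mkdir -p "$STATE_DIR"

# printf writes the text verbatim; echo may interpret leading dashes or backslashes.
printf '%s\n' "$OUTPUT" > "$STATE_DIR/$STAGE.md"
echo "Stage $STAGE complete, output saved to $STATE_DIR/$STAGE.md" >&2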

Reference previous stage outputs by path in each agent’s prompt:

architect: design the system based on the research in .claude/pipeline-state/researcher.md
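
With one file per stage on disk, recovery reduces to finding the first stage that has no saved output. A minimal sketch, assuming the `<stage>.md` naming used by the script above; `next_stage` is a hypothetical helper, and the demo runs in a temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path
from typing import Optional

STAGES = ["researcher", "architect", "implementer", "validator"]


def next_stage(state_dir: Path) -> Optional[str]:
    """Return the first stage with no saved output, or None if all are done."""
    for stage in STAGES:
        if not (state_dir / f"{stage}.md").is_file():
            return stage
    return None


# Demo in a throwaway directory standing in for .claude/pipeline-state.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp)
    fresh = next_stage(state)  # nothing saved yet
    (state / "researcher.md").write_text("findings", encoding="utf-8")
    resumed = next_stage(state)  # research is done, resume at design

print(fresh, resumed)
```

An interrupted pipeline can then be restarted by telling the orchestrator to begin at the returned stage and read earlier outputs from disk.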

Pattern 4: Debate and Convergence

Description: Spawn agents with explicitly conflicting perspectives on the same problem. Force them to challenge each other’s conclusions. Use agent teams for peer messaging.

When to use: Architectural decisions, technology choices, debugging unknown root causes. Sequential investigation anchors on the first plausible answer. Adversarial agents are more likely to surface the correct one.

This pattern requires agent teams (experimental).

AGENTS.md Configuration

# agents

## hypothesis-a
description: Investigates the hypothesis that the performance degradation is caused by database query patterns. Gathers evidence, runs EXPLAIN ANALYZE, checks slow query logs.
tools: read, bash
model: claude-sonnet-4-6

## hypothesis-b
description: Investigates the hypothesis that the performance degradation is caused by memory leaks in the application layer. Gathers evidence, checks heap snapshots, reviews event listener patterns.
tools: read, bash
model: claude-sonnet-4-6

## hypothesis-c
description: Investigates the hypothesis that the performance degradation is caused by network I/O — specifically third-party API calls with no timeouts. Gathers evidence, checks API call patterns and timeout configurations.
tools: read, bash
model: claude-sonnet-4-6

## devil-advocate
description: Reviews the evidence from all investigators and actively tries to disprove their conclusions. Points out weak evidence, alternative explanations, and gaps in reasoning.
tools: read
model: claude-opus-4-5

Orchestration Prompt (Agent Teams)

Create an agent team to debug our API performance degradation (P95 latency 4s, was 800ms two weeks ago).

Spawn four teammates:
- hypothesis-a: database investigation
- hypothesis-b: memory leak investigation  
- hypothesis-c: network I/O investigation
- devil-advocate: challenges all three investigators

Protocol:
1. All three hypothesis agents investigate simultaneously and share findings
2. devil-advocate reviews all findings and sends challenges to each investigator
3. Investigators respond to challenges with counter-evidence
4. After one round of debate, all four agents vote on the most likely root cause
5. Report consensus and evidence strength

Do not commit any fixes yet. Investigation only.
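
Step 4 of the protocol is a plain vote count. The sketch below tallies it and reports the share of agents agreeing with the winner; treating that share as a rough proxy for consensus strength is an assumption of this example, and the votes themselves are made up.

```python
from collections import Counter
from typing import Optional


def tally(votes: dict[str, str]) -> tuple[Optional[str], float]:
    """Return (winning cause, or None on a tie) and the top cause's vote share."""
    if not votes:
        raise ValueError("no votes cast")
    ranked = Counter(votes.values()).most_common()
    top_cause, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    return (None if tied else top_cause), top_count / len(votes)


# Made-up votes for illustration.
votes = {
    "hypothesis-a": "database",
    "hypothesis-b": "memory",
    "hypothesis-c": "database",
    "devil-advocate": "database",
}

consensus, share = tally(votes)
print(consensus, share)
```

A tie returns no winner on purpose: a split vote is a signal to run another debate round, not to pick arbitrarily.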

Why This Works Better Than Sequential Investigation

A single agent investigating the same problem tends to:

  1. Find the first plausible explanation
  2. Interpret subsequent evidence to support that explanation (confirmation bias)
  3. Stop investigating once a theory is internally consistent

Three agents with separate context windows do not anchor on each other’s theories during the initial investigation. The devil’s advocate adds further pressure: each investigator must defend the quality of their evidence, not just their conclusion.


Pattern 5: Fan-Out / Fan-In

Description: Spawn N agents to process N independent work items in parallel, then merge all results into one artifact.

When to use: Large codebase migrations, batch documentation generation, multi-file refactoring where each file or module is independent. Classic map-reduce structure.

AGENTS.md Configuration

# agents

## module-migrator
description: Migrates a single module from the old API to the new API. Receives a file path and migration guide. Writes the migrated file in-place and runs module-level tests. Returns: file path, changes summary, test result.
tools: read, write, bash, edit
isolation: worktree
model: claude-sonnet-4-6

## migration-aggregator
description: Takes migration reports from multiple module-migrator agents and produces a single migration summary. Identifies common patterns, lists failures, and outputs a consolidated PR description.
tools: read, write
model: claude-haiku-4-5

Orchestration Prompt

We're migrating from axios to the native fetch API across our codebase.

Migration guide: see .claude/migration-guides/axios-to-fetch.md

Files to migrate (all independent, no shared state):
- src/api/users.ts
- src/api/products.ts
- src/api/orders.ts
- src/api/webhooks.ts
- src/api/analytics.ts

Spawn one module-migrator for each file simultaneously (5 in parallel). Each migrator should work in a worktree, migrate its file, run tests, and return a report.

After all 5 complete, spawn migration-aggregator with all reports. Return the aggregated summary.
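
Structurally this is map-reduce: map `migrate` over the files, then reduce the reports. A minimal sketch with a thread pool, where `migrate` is a hypothetical stub for one module-migrator run and the report fields are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

FILES = [
    "src/api/users.ts",
    "src/api/products.ts",
    "src/api/orders.ts",
    "src/api/webhooks.ts",
    "src/api/analytics.ts",
]


def migrate(path: str) -> dict[str, str]:
    """Stand-in for one module-migrator run (hypothetical)."""
    return {"file": path, "tests": "pass"}


def aggregate(reports: list[dict[str, str]]) -> dict:
    """Fan-in: collapse per-file reports into one summary."""
    failed = [r["file"] for r in reports if r["tests"] != "pass"]
    return {"migrated": len(reports), "failed": failed}


# Fan-out: one worker per file; map() returns results in input order.
with ThreadPoolExecutor(max_workers=len(FILES)) as pool:
    reports = list(pool.map(migrate, FILES))

summary = aggregate(reports)
print(summary)
```

The pattern only holds if the work items really are independent; a shared helper module touched by two migrators turns the fan-in into a merge conflict.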

Scaling Fan-Out

For large fan-outs (20+ files), batch the work rather than spawning one agent per file. Fewer agents means less coordination overhead:

Divide 40 files into 4 batches of 10. 
Spawn 4 module-migrator agents, each handling their batch sequentially.
This uses 4 agents instead of 40, with each agent handling 10 files in sequence.

The tradeoff: less parallelism in exchange for fewer agents to spawn and coordinate, lower total token overhead, and a run that is easier to monitor.
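
Splitting the file list is a one-liner worth getting right, since an off-by-one drops files silently. A small sketch; the file names are hypothetical placeholders:

```python
def batches(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical file names standing in for the 40 modules.
files = [f"src/module_{n:02d}.ts" for n in range(40)]
groups = batches(files, 10)

print(len(groups), [len(g) for g in groups])
```

Each group then becomes the work list for one module-migrator agent.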


Pattern 6: Self-Healing Pipeline with Retry Logic

Description: Build retry and recovery logic into the orchestration prompt itself. Validators gate progression. Failed stages retry with feedback from the failure.

When to use: Mission-critical code generation, automated test writing, data pipeline construction where correctness gates are non-negotiable.

AGENTS.md Configuration

# agents

## spec-writer
description: Takes a feature description and writes a formal specification: function signatures, input/output types, edge cases, and acceptance criteria. Does not write implementation code.
tools: read, write
model: claude-opus-4-5

## test-writer
description: Takes a specification and writes comprehensive tests BEFORE implementation. Tests should cover: happy path, edge cases, error conditions, and integration scenarios. Uses TDD approach.
tools: read, write, bash
isolation: worktree
model: claude-sonnet-4-6

## implementer
description: Takes a specification and failing tests, writes implementation code that makes all tests pass. Does not modify test files.
tools: read, write, bash, edit
isolation: worktree
model: claude-sonnet-4-6

## quality-gate
description: Runs the full test suite and static analysis. Returns structured report: test results (pass/fail counts), coverage percentage, lint errors. Does not modify code.
tools: read, bash
model: claude-haiku-4-5

Orchestration Prompt with Built-In Retry

Build a currency conversion service using this TDD pipeline.

Feature: Convert amounts between currencies using live exchange rates from exchangeratesapi.io.
Requirements: handle 170 currencies, cache rates for 1 hour, return conversion with rate used and timestamp.

PIPELINE:
1. spec-writer: write the formal spec to .claude/specs/currency-converter.md

2. test-writer: write tests to src/__tests__/currency-converter.test.ts based on the spec
   - Tests must be comprehensive (minimum 15 test cases)
   - All tests should FAIL at this stage (no implementation yet)

3. quality-gate: verify tests exist and fail as expected
   IF quality-gate reports < 15 tests or tests pass at this stage: 
     spawn test-writer again with feedback, retry max 2 times

4. implementer: write src/services/currency-converter.ts to make all tests pass
   
5. quality-gate: run full test suite
   IF any tests fail:
     spawn implementer again with failing test names and error output
     retry max 3 times
   IF coverage < 80%:
     spawn test-writer to add missing test cases, then implementer to make them pass

6. Return: spec path, implementation path, final test results, coverage report.

ABORT if: after maximum retries a stage still fails. Report which stage failed and why.
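
Every gated step in this prompt follows the same loop: run the stage, ask the gate, retry with the gate's feedback, abort when retries run out. A minimal sketch of that loop; `implementer` and `quality_gate` are hypothetical stubs, and the failing-test counts are scripted for the demo.

```python
from typing import Callable

Stage = Callable[[str], str]              # feedback in, new output out
Gate = Callable[[str], tuple[bool, str]]  # output in, (passed, feedback) out


def run_with_gate(stage: Stage, gate: Gate, max_retries: int) -> tuple[str, int]:
    """Run a stage until the gate passes; abort after max_retries retries."""
    feedback = ""
    for attempt in range(1 + max_retries):
        output = stage(feedback)
        passed, feedback = gate(output)
        if passed:
            return output, attempt  # attempt == number of retries used
    raise RuntimeError(f"stage failed after {max_retries} retries: {feedback}")


# Scripted gate results: failing-test counts on successive runs.
failing = iter([3, 1, 0])


def implementer(feedback: str) -> str:
    return f"impl (addressing: {feedback or 'spec'})"


def quality_gate(output: str) -> tuple[bool, str]:
    count = next(failing)
    return count == 0, f"{count} tests failing"


output, retries_used = run_with_gate(implementer, quality_gate, max_retries=3)
print(retries_used, output)
```

Passing the gate's feedback into the next attempt is the part that makes this self-healing rather than a blind retry.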

Hooks to Support Self-Healing

Add a TaskCompleted hook to log each stage’s outcome, enabling audit trails for retry pipelines:

{
  "hooks": {
    "TaskCompleted": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash -c 'jq -c \"{ts: \\\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\\\", task: .task_name, status: .task_status}\" >> ${CLAUDE_PROJECT_DIR}/.claude/pipeline-audit.jsonl'"
          }
        ]
      }
    ]
  }
}
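
Once the hook has been appending to pipeline-audit.jsonl, a short script can turn the log into a retry summary. This sketch assumes each line carries the `ts`, `task`, and `status` keys written by the hook above; the sample records and the status values are made up for illustration.

```python
import json
from collections import Counter


def summarize(lines: list[str]) -> Counter:
    """Count audit records per (task, status) pair, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            record = json.loads(line)
            counts[(record["task"], record["status"])] += 1
    return counts


# Made-up sample; in practice read the lines from .claude/pipeline-audit.jsonl.
SAMPLE = [
    '{"ts": "2026-01-01T10:00:00Z", "task": "test-writer", "status": "completed"}',
    '{"ts": "2026-01-01T10:05:00Z", "task": "implementer", "status": "failed"}',
    '{"ts": "2026-01-01T10:09:00Z", "task": "implementer", "status": "failed"}',
    '{"ts": "2026-01-01T10:14:00Z", "task": "implementer", "status": "completed"}',
]

summary = summarize(SAMPLE)
for (task, status), count in sorted(summary.items()):
    print(task, status, count)
```

A stage that shows several failures before each success is a candidate for a tighter spec or a stronger model.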

Native Claude Code vs. LangGraph vs. CrewAI: Decision Tree

With these 6 patterns in mind, when do you reach for a framework instead of native Claude Code?

Is the entire workflow inside Claude Code sessions?
├── YES → Use native subagents/agent teams. No framework needed.

└── NO → Do other systems participate in the workflow?
    ├── YES (external APIs, other LLMs, custom logic)
    │   └── Does your team write Python/JS workflows professionally?
    │       ├── YES → LangGraph (stateful, complex routing) 
    │       │         or CrewAI (role-based, simpler API)
    │       └── NO → Claude Code MCP tools + hooks can bridge many
    │                external integrations without a framework

    └── NO (everything is Claude Code, but needs complex state)
        └── Complexity is likely in the orchestration prompt, not
            the framework. Simplify the prompt before adding a framework.
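
The main branches of the tree fit in one function, which makes the decision easy to review with a team. This is a sketch of the tree as drawn; the parameter names are invented for the example, and the last branch (all Claude Code but complex state) is left out because its advice is to simplify the prompt rather than pick a tool.

```python
def choose_tooling(
    all_in_claude_code: bool,
    team_writes_workflow_code: bool = False,
    needs_complex_routing: bool = False,
) -> str:
    """Map the decision tree's questions to a recommendation."""
    if all_in_claude_code:
        return "native subagents / agent teams"
    if not team_writes_workflow_code:
        return "Claude Code MCP tools + hooks"
    # Both frameworks fit here; routing complexity is the tiebreaker.
    return "LangGraph" if needs_complex_routing else "CrewAI"


print(choose_tooling(all_in_claude_code=True))
print(choose_tooling(False, team_writes_workflow_code=True, needs_complex_routing=True))
```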

Native Claude Code Strengths

  • Zero setup: subagents work immediately, agent teams need one env var
  • AGENTS.md portability: definitions work across all Claude Code contexts
  • Integrated with hooks: gate quality, send notifications, audit automatically
  • Context sharing: subagents load the same CLAUDE.md, MCP servers, and skills as the main session

LangGraph Strengths

  • Complex conditional routing: if/else branching that’s hard to express in natural language
  • Persistent state across sessions: state graphs survive restarts
  • Multi-LLM workflows: mix Claude, GPT-4, local models in one graph
  • Streaming and checkpointing: built-in for long-running workflows

CrewAI Strengths

  • Declarative role definitions: clean YAML/Python API for team roles
  • Built-in sequential/parallel process types: less prompt engineering for standard patterns
  • Tool ecosystem: pre-built integrations with common services

The honest answer for most Claude Code users: native subagents cover 80% of multi-agent needs. Reach for a framework when the workflow complexity exceeds what you can express cleanly in an orchestration prompt.


Token Cost Reality Check

Multi-agent orchestration multiplies token usage. Each agent has its own context window, and coordination overhead adds up. Rough estimates for each pattern:

| Pattern | Typical Token Range | Cost Range (Sonnet) | When Worth It |
| --- | --- | --- | --- |
| Orchestrator-Worker | 80K–150K | $1.50–$3.00 | Tasks > 45 min single-session |
| Parallel Reviewers | 60K–120K | $1.00–$2.50 | Quality-critical PRs |
| Pipeline Chain | 100K–200K | $2.00–$4.00 | Multi-stage transformations |
| Debate + Convergence | 150K–300K | $3.00–$6.00 | Ambiguous root cause debugging |
| Fan-Out / Fan-In | Variable (N × 30K) | $0.60 × N | Batch processing, migrations |
| Self-Healing Pipeline | 200K–400K | $4.00–$8.00 | High-stakes code generation |

Cost reduction tactics:

  • Use Haiku for reviewers and validators (high context, concise output)
  • Use Opus only for orchestrators and architects (needs broad judgment)
  • Use Sonnet for implementers (best code/cost ratio)
  • Use isolation: worktree so parallel agents edit separate working copies and do not burn tokens redoing work after conflicting changes
  • Set explicit output length limits in subagent definitions: “Return a summary under 500 words”

AGENTS.md Scope and Reuse

Subagent definitions work across multiple contexts:

| Scope | Location | When Loaded |
| --- | --- | --- |
| Project | .claude/agents/ | All sessions in this project |
| User | ~/.claude/agents/ | All sessions for this user |
| Plugin | Plugin directory | When plugin is active |
| Frontmatter | Session/skill file | That session’s lifetime |

Reference project-scope definitions as agent team members:

Spawn a teammate using the security-reviewer agent type to audit src/auth/.

Claude honors the tools allowlist and model specified in the AGENTS.md definition. Team coordination tools (SendMessage, task management) are always available to teammates regardless of tools restrictions.



Frequently Asked Questions

Q: What is the difference between subagents and agent teams?

Subagents run within a single session and only report back to the main agent. Agent teams consist of separate Claude Code sessions that can communicate directly with each other and coordinate through a shared task list. Subagents are better for focused tasks where only results matter; agent teams are better when agents need to debate and iterate.

Q: How many subagents can run in parallel?

No documented hard limit. Anthropic recommends 3–5 for most workflows. For fan-out patterns with many items, batch work into groups of 5–10 rather than spawning N agents simultaneously.

Q: Can I use Claude Haiku for some subagents to reduce costs?

Yes. Specify model: claude-haiku-4-5 in the subagent definition. Use Haiku for validators, reviewers, and doc writers. Reserve Opus for orchestrators and architects. This model mixing reduces total cost by 40–60% compared to all-Opus setups.

Q: Do subagents have access to the same CLAUDE.md?

Yes. Subagents load the same project context: CLAUDE.md, MCP servers, skills. They do not inherit the main agent’s conversation history.

Q: When should I use LangGraph or CrewAI instead?

When the workflow involves non-Claude Code systems, needs persistent state across restarts, or mixes multiple LLM providers. For most Claude Code workflows, native subagents are sufficient without additional framework overhead.
