Getting Codex CLI running is straightforward. Getting it to do useful work reliably, at scale, without constant hand-holding — that takes deliberate workflow design.
This article picks up where the AGENTS.md setup guide left off. If you have not read that one, go there first: it covers how Codex discovers and merges instruction files, the 32 KiB budget constraint, and what to put in your AGENTS.md. Everything in this article assumes that foundation is in place.
What this article covers: seven workflow patterns that experienced Codex users have converged on, how to tune your AGENTS.md specifically for each pattern, an honest comparison of Codex CLI against Claude Code and Cursor on the axes that actually matter, practical cost optimization, and the failure modes that catch developers off guard when they try to scale up.
What Changed in 2026
Codex CLI shipped as a research preview in early 2025. By mid-year it had stabilized enough for production workflows, and the ecosystem around it — AGENTS.md tooling, CI integration, monorepo patterns — has matured considerably.
The most significant changes that affect how you design workflows:
Sandboxed execution is now the default. Under --sandbox=full, Codex runs in a network-disabled, write-controlled sandbox. In 2025 this was opt-in and noticeably slower; in 2026 the sandbox overhead has dropped to under 300 ms on typical tasks, making it practical to leave on for all sessions. This significantly changes the threat model for multi-step automation: you can now let Codex run longer task chains without worrying about unintended file modifications spreading across the repo.
The approval model gained granularity. Instead of choosing between “approve everything” (--ask-for-approval never) and “approve every shell command” (always), you can now define approval policies in ~/.codex/config.toml that allow specific command classes automatically. Teams running Codex in CI use this extensively.
Sub-agent handoffs are stable. Codex can now invoke sub-agents defined in your AGENTS.md, wait for their output, and continue. In 2025, this required workarounds. In 2026, the ## Agents section in AGENTS.md is a first-class feature with documented behavior.
Context persistence across checkpoints. Sessions can checkpoint state between steps, allowing multi-hour autonomous runs that would have timed out or drifted in 2025.
These changes made the patterns below practical. Most of them were theoretically possible before — they just were not reliable enough to use in real work.
Workflow Pattern Catalog
Pattern 1: Inline Review
What it is: You write code, Codex reviews it — not for style, but for correctness, edge cases, and logical issues. The output is a structured list of specific problems with exact file locations, not generic advice.
When to use it: After completing a feature or a significant refactor, before opening a PR. Works best on code you wrote yourself (where you have blind spots) rather than unfamiliar legacy code.
Setup:
Create a review.md prompt template:
Review the diff at [DIFF_PATH] for the following, in order of priority:
1. Logic errors — conditions that evaluate incorrectly, off-by-one errors,
incorrect null checks
2. Unhandled edge cases — inputs not covered by the existing test suite
3. Boundary violations — anything that crosses the architectural lines defined
in AGENTS.md (database access outside repositories, business logic in routes,
etc.)
4. Security issues — injection vectors, credential exposure, improper auth checks
For each issue:
- File and line number
- One-sentence description of the problem
- Minimal code change that fixes it (no rewrites unless necessary)
Do not flag style issues. Do not suggest "consider using X instead of Y" unless
the current approach causes an actual bug.
Run it:
git diff main > /tmp/current.diff
codex --ask-for-approval never \
--model o4-mini \
"$(sed 's|\[DIFF_PATH\]|/tmp/current.diff|' review.md)"
AGENTS.md tuning for this pattern:
Add an explicit review scope section:
## Review Boundaries
When asked to review code:
- Reference the architectural layers defined in [Project Structure] above
- Flag violations by layer name (e.g., "business logic in route handler violates
services/ boundary")
- Test coverage gaps: list specific scenarios not covered, not a general "add
more tests"
- Security: check for [list your most common security concerns]
Do not flag:
- Formatting (handled by eslint/prettier)
- Variable naming unless it causes ambiguity
- "Could be simplified" observations unless the current version has a bug
The key insight: most developers use Codex for generation and overlook its value as a reviewer. For catching logic errors, a structured review prompt focused on correctness rather than style produces more actionable output, in less time, than a typical human code review pass.
Pattern 2: Spec-Driven Generation
What it is: You write a specification — in structured prose, not pseudocode — and Codex generates an implementation that satisfies it. The spec becomes the source of truth. Generated code is never committed without running it against the spec as a test.
When to use it: New features where you know what the behavior should be before you know how to implement it. Particularly effective for API endpoints, data transformations, and stateful logic where behavior is complex but writable.
Setup:
Specs use a consistent format:
# Spec: Rate Limiter
## Behavior
- Accepts: userId (string), action (string), windowMs (number), limit (number)
- Returns: { allowed: boolean, remaining: number, resetAt: Date }
- Tracks attempts per (userId, action) pair within the given window
- Window slides with each new request (not fixed calendar windows)
- Concurrent requests from the same userId are handled safely (no race conditions)
## Error cases
- userId empty string: throw ValidationError("userId required")
- limit < 1: throw ValidationError("limit must be positive")
- windowMs < 100: throw ValidationError("window too narrow")
## Constraints
- No external dependencies (Redis, etc.) — must work with in-memory storage
- Thread-safe without locks (use atomic Map operations)
- Memory: evict entries older than 2x windowMs to prevent leaks
## Test cases (must pass)
- fresh limiter: returns { allowed: true, remaining: limit - 1 }
- at limit: returns { allowed: false, remaining: 0 }
- after window expires: counter resets, returns { allowed: true, remaining: limit - 1 }
- concurrent requests at limit: exactly limit requests allowed, none extra
Generate from spec:
codex --sandbox=files-only \
--model o4-mini \
"Implement the spec at specs/rate-limiter.md.
Place the implementation in src/utils/rateLimiter.ts.
Place tests in tests/rateLimiter.test.ts.
Run the tests after generation and fix any failures before finishing."
AGENTS.md tuning:
## Spec-Driven Generation Rules
When implementing from a spec file:
1. Read the spec completely before writing any code
2. Implement only what the spec describes — do not add convenience methods
or extend behavior beyond the stated requirements
3. Generate tests that cover all listed test cases plus boundary conditions
not explicitly listed
4. Run tests before reporting completion
5. If a spec requirement is ambiguous or contradictory, stop and report the
specific conflict before guessing
Spec files are in specs/. Implementations go in src/ (matching existing
structure). Tests mirror src/ structure.
Why this beats “write me a rate limiter”: The spec approach forces you to articulate the exact behavior before generation begins. This prevents the most common failure mode of AI-generated code — technically working code that does not do what you actually needed.
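For reference, here is a hand-written sketch of an implementation the spec above pins down. This is not Codex output; the class name and structure are illustrative, and the periodic 2x-window eviction sweep from the Constraints section is reduced to per-call pruning for brevity:

```typescript
type RateLimitResult = { allowed: boolean; remaining: number; resetAt: Date };

class ValidationError extends Error {}

class RateLimiter {
  // Timestamps of recent attempts per (userId, action) pair
  private attempts = new Map<string, number[]>();

  check(userId: string, action: string, windowMs: number, limit: number): RateLimitResult {
    if (userId === "") throw new ValidationError("userId required");
    if (limit < 1) throw new ValidationError("limit must be positive");
    if (windowMs < 100) throw new ValidationError("window too narrow");

    const key = `${userId}:${action}`;
    const now = Date.now();
    // Sliding window: keep only timestamps inside the window (doubles as eviction)
    const recent = (this.attempts.get(key) ?? []).filter(t => now - t < windowMs);

    if (recent.length >= limit) {
      this.attempts.set(key, recent);
      return { allowed: false, remaining: 0, resetAt: new Date(recent[0] + windowMs) };
    }
    recent.push(now);
    this.attempts.set(key, recent);
    return { allowed: true, remaining: limit - recent.length, resetAt: new Date(now + windowMs) };
  }
}
```

Note that "thread-safe without locks" is trivially satisfied in single-threaded Node.js; the spec line exists to stop Codex from introducing async gaps between read and write of the map.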
Pattern 3: Test-First Refactor
What it is: Before touching existing code, Codex generates a comprehensive test suite against the current behavior. Refactoring then proceeds against this suite as a regression harness. No refactor is complete until all tests pass and coverage has not decreased.
When to use it: Refactoring legacy code where existing test coverage is sparse. Extracting a service from a monolith. Changing data models while keeping the external API stable.
The two-phase execution:
Phase 1 — Generate characterization tests:
codex --ask-for-approval never \
--model o4-mini \
"Generate characterization tests for src/legacy/orderProcessor.ts.
Coverage requirements:
- Every public method
- Every error path (look for try/catch blocks and conditional throws)
- At least 3 edge cases per method based on the parameter types
Do not modify the source file. Write tests to tests/legacy/orderProcessor.test.ts.
Run the test suite and confirm all tests pass against the current implementation.
Report: number of tests generated, coverage percentage achieved."
Phase 2 — Refactor against the suite:
codex --sandbox=files-only \
--model o4-mini \
"Refactor src/legacy/orderProcessor.ts to:
1. Extract database queries to src/repositories/orderRepository.ts
2. Remove all console.log statements (use the logger at src/utils/logger.ts)
3. Replace callback-style async with async/await throughout
Constraints:
- tests/legacy/orderProcessor.test.ts must continue to pass without modification
- Do not change any public method signatures
- Do not change the file's external exports
Run the test suite after each significant change. Do not proceed to the
next refactor step if tests are failing."
AGENTS.md tuning:
## Refactor Rules
Before any refactor:
- Confirm existing test coverage with: npm run test:coverage -- [file]
- If coverage < 70%, generate characterization tests first
During refactor:
- Run tests after each logical step (not just at the end)
- If a test fails, fix the implementation — do not modify the test unless
the test is clearly wrong about what the function should do
- Report coverage before and after
Refactoring means behavior-preserving change. New behavior is a feature,
not a refactor, and requires a spec.
Why test-first beats test-after: Generated tests written after a refactor tend to test the refactored implementation, not the original behavior. Characterization tests written against the original code capture actual behavior — including the undocumented behaviors that callers might depend on.
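A concrete illustration of the difference, using a hypothetical legacy function: the characterization test pins down what the code does today, including a flooring quirk that may or may not be intentional.

```typescript
// Hypothetical legacy function with an undocumented quirk: it floors the
// discounted total instead of rounding it.
function legacyDiscount(total: number, pct: number): number {
  return Math.floor(total - (total * pct) / 100);
}

// Characterization tests assert what the code DOES, not what it "should" do.
// If the flooring is a bug, the test still pins it until callers are audited.
const pinned: Array<[number, number, number]> = [
  [100, 10, 90],
  [99, 10, 89], // 89.1 floors to 89 — captured, even if surprising
  [1, 50, 0],   // 0.5 floors to 0 — a caller may depend on this
];
for (const [total, pct, expected] of pinned) {
  const got = legacyDiscount(total, pct);
  if (got !== expected) throw new Error(`behavior drift: (${total}, ${pct}) → ${got}`);
}
```

A test written after a refactor to `Math.round` would happily assert the new rounding behavior and never notice the change.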
Pattern 4: Multi-Step Planning
What it is: For tasks that touch multiple files, subsystems, or require sequential decisions, Codex produces an explicit plan before executing. Each plan step is discrete and verifiable. Execution does not begin until the plan is approved.
When to use it: Adding a new entity type to a full-stack application (schema, migration, model, service, API, tests, documentation). Migrating from one library to another. Cross-cutting architectural changes.
The plan-execute protocol:
Step 1 — Generate plan only:
codex --ask-for-approval never \
--model o3 \
"Plan the implementation of [FEATURE_DESCRIPTION].
Output format:
- Numbered steps in execution order
- Each step: file(s) affected, type of change (create/modify/delete),
dependencies on previous steps, verification method
- Flag any steps that require external information (credentials, API schemas)
- Estimated step count and complexity
Do not write any code. Do not modify any files. Plan only."
Step 2 — Review and optionally edit the plan in your editor.
Step 3 — Execute with step-by-step approval:
codex --sandbox=files-only \
--model o4-mini \
"Execute the implementation plan at /tmp/feature-plan.md.
After each numbered step:
- Run relevant tests
- Confirm the step's verification method passes
- Briefly state what was done before moving to the next step
If a step fails its verification, stop and report the failure.
Do not attempt to compensate for a failed step by modifying other steps."
AGENTS.md tuning:
## Planning Protocol
When asked to plan (not implement):
- Output a numbered step list only
- Each step specifies: files, change type, dependencies, verification
- Mark steps that require human decisions with [DECISION REQUIRED]
- Do not write code or modify files during planning
When executing a plan:
- Treat the plan as immutable unless a step explicitly fails
- Run verification after each step
- On failure, report the step number, failure details, and stop — do not
try to recover autonomously
The planning model choice matters: Use o3 for planning (broader reasoning, better at identifying dependencies and edge cases) and o4-mini for execution (faster, cheaper, sufficient for discrete well-specified tasks). Switching models between phases is a meaningful optimization.
Pattern 5: Continuous Integration Gate
What it is: Codex runs as a CI check that reviews PRs automatically — not for style (your linter handles that) but for logic, security, and architectural boundary violations. Failed checks block merge.
When to use it: Teams where multiple developers use AI tools to generate code and want a second AI pass before human review.
CI configuration (GitHub Actions):
name: Codex Review Gate
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  codex-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Generate diff
        run: git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff
      - name: Run Codex review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npm install -g @openai/codex
          codex \
            --ask-for-approval never \
            --sandbox=full \
            --model o4-mini \
            "$(cat .codex/ci-review-prompt.md)" < /tmp/pr.diff > /tmp/codex-review.md
      - name: Check review output
        # Parse Codex output for severity markers and fail on CRITICAL findings
        run: |
          if grep -q "\[CRITICAL\]" /tmp/codex-review.md; then
            echo "Critical issues found. Review output:"
            cat /tmp/codex-review.md
            exit 1
          fi
The ci-review-prompt.md defines what counts as a CRITICAL finding in your codebase. Architectural violations, SQL injection vectors, missing auth checks — whatever your team cares about most. Everything else is advisory.
AGENTS.md for CI context:
## CI Review Context
You are running as a CI gate reviewer, not an interactive assistant.
Output format (strict):
[CRITICAL] <description> — <file>:<line>
[WARNING] <description> — <file>:<line>
[INFO] <description> — <file>:<line>
CRITICAL = blocks merge. Use for:
- Security vulnerabilities with clear exploit path
- Architectural boundary violations (defined in [Project Structure])
- Logic errors that will cause data corruption or incorrect results
WARNING = advisory. Use for:
- Missing test coverage for new logic
- Potential performance issues at scale
- Pattern inconsistencies
INFO = non-blocking observations.
Do not output anything except the classified finding list and a one-line summary.
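The strict output format pays off when you parse it downstream. A sketch of a parser for the finding lines above (the sample findings are invented for illustration):

```typescript
type Severity = "CRITICAL" | "WARNING" | "INFO";
interface Finding { severity: Severity; description: string; location: string }

// Matches lines of the form: [SEVERITY] <description> — <file>:<line>
const FINDING_RE = /^\[(CRITICAL|WARNING|INFO)\]\s+(.+?)\s+—\s+(\S+:\d+)$/;

function parseFindings(output: string): Finding[] {
  return output.split("\n").flatMap(line => {
    const m = FINDING_RE.exec(line.trim());
    return m
      ? [{ severity: m[1] as Severity, description: m[2], location: m[3] }]
      : []; // the one-line summary and blank lines are ignored
  });
}
```

A CI step can then block the merge with `parseFindings(out).some(f => f.severity === "CRITICAL")` instead of a raw grep, which also lets you count warnings for trend dashboards.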
Pattern 6: Incremental Documentation
What it is: Codex generates documentation for code as it changes — not as a batch retrospective, but triggered by specific file changes. The documentation stays in sync because it is generated immediately after each substantive change.
When to use it: Teams where documentation drift is a problem. Projects where the API surface changes frequently. Anywhere that “we’ll document it later” has a track record of not happening.
Implementation:
A simple git hook (post-commit) triggers documentation generation for changed files:
#!/bin/bash
# .git/hooks/post-commit
# Guard: the `git commit --amend` below re-triggers this hook; bail out on re-entry
[ -n "$CODEX_DOC_PASS" ] && exit 0
export CODEX_DOC_PASS=1

CHANGED=$(git diff HEAD~1 --name-only | grep -E '\.(ts|tsx|py|go|rs)$')
if [ -n "$CHANGED" ]; then
  codex \
    --ask-for-approval never \
    --model o4-mini \
    "Update JSDoc/docstring documentation for changed functions in:
    $CHANGED
    Rules:
    - Only update docs for functions whose implementation changed
    - Preserve existing doc blocks that are still accurate
    - For new functions: generate full docblock (params, returns, throws, example)
    - For modified functions: update the relevant sections only
    Run: git add -u && git commit --amend --no-edit"
fi
AGENTS.md tuning:
## Documentation Standards
When generating or updating documentation:
- JSDoc for TypeScript/JavaScript
- Google-style docstrings for Python
- Doc comments (///) for Go and Rust
Required sections for public functions:
- One-sentence description (what it does, not how)
- @param for each parameter with type and meaning
- @returns with type and meaning
- @throws for each error condition
- @example with the most common usage
Do not document private/internal functions unless they are complex enough
to warrant it (>30 lines, non-obvious algorithm).
Do not document getter/setter accessors.
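As a reference for what the standards above produce, here is a hypothetical public function with the required docblock sections filled in:

```typescript
/**
 * Returns the user's display name, falling back to the local part of their email.
 *
 * @param name - Preferred display name; may be an empty string
 * @param email - Account email; must contain an "@"
 * @returns The name if non-empty, otherwise the email text before the "@"
 * @throws {Error} If the email does not contain an "@"
 * @example
 *   displayName("", "ada@example.com") // → "ada"
 */
export function displayName(name: string, email: string): string {
  const at = email.indexOf("@");
  if (at < 0) throw new Error("invalid email");
  return name !== "" ? name : email.slice(0, at);
}
```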
Pattern 7: Scaffolding-Then-Implement
What it is: For large new features, Codex first generates the full file/directory structure and interface definitions — with empty implementations — and then implements each component in isolation. This prevents the common failure mode where an AI agent makes increasingly incoherent decisions as it fills in a large implementation all at once.
When to use it: New microservices. Significant new modules. Any feature touching more than 5-7 files.
Phase 1 — Scaffold:
codex --sandbox=files-only \
--model o3 \
"Scaffold the [FEATURE_NAME] module.
Create:
- Directory structure following the conventions in AGENTS.md
- TypeScript interface files for all data types
- Function signatures with return types (empty bodies with // TODO: implement)
- Test file stubs with describe blocks and it() stubs (empty)
- Export statements so modules are properly connected
Do not implement any function bodies. Do not write any logic.
The scaffold should compile with tsc --noEmit (only TODO comments in bodies)."
Phase 2 — Implement in sequence:
# Implement from dependencies inward — lowest-dependency modules first
codex --sandbox=files-only \
--model o4-mini \
"Implement src/[feature]/repository.ts.
The interfaces are defined in src/[feature]/types.ts.
The test stubs are in tests/[feature]/repository.test.ts — fill in the
tests as you implement.
Run tests after implementation and fix any failures."
Repeat for each module in dependency order.
AGENTS.md tuning:
## Scaffolding Protocol
When scaffolding (not implementing):
- Create directory structure first
- Create interface/type files before implementation files
- Use // TODO: implement as the body of all functions
- All files must compile without errors (types must be correct even if bodies are empty)
- Report the scaffold plan (files created, dependency order for implementation)
before creating any files
Implementation follows scaffolding in separate sessions.
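To make the scaffold contract concrete, here is what a Phase 1 output might look like for a hypothetical notifications module (names are invented). The types are complete; the bodies compile but fail loudly if called:

```typescript
// src/notifications/types.ts (illustrative) — interfaces come first
export interface Notification {
  id: string;
  userId: string;
  body: string;
  sentAt: Date | null;
}

export interface NotificationRepository {
  save(n: Notification): Promise<void>;
  pendingFor(userId: string): Promise<Notification[]>;
}

// src/notifications/repository.ts (illustrative) — scaffold body only.
// Throwing keeps tsc happy on return types while making unimplemented
// paths impossible to miss at runtime.
export function createRepository(): NotificationRepository {
  return {
    async save(_n: Notification): Promise<void> {
      throw new Error("TODO: implement");
    },
    async pendingFor(_userId: string): Promise<Notification[]> {
      throw new Error("TODO: implement");
    },
  };
}
```

Because the interfaces are fixed before Phase 2 begins, each module can be implemented in isolation against a stable contract.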
AGENTS.md Tuning for Codex CLI
Your AGENTS.md already covers the basics: commands, project structure, conventions. The patterns above add another layer — instructions that shape how Codex behaves in specific workflow contexts. Here is how to structure the file so it serves both purposes without becoming unmanageable.
Separate universal from workflow-specific
Universal instructions belong at the top-level of AGENTS.md. Workflow-specific instructions should be in a dedicated section that Codex can reference contextually:
# AGENTS.md
## Commands
[universal]
## Project Structure
[universal]
## Conventions
[universal]
---
## Workflow Instructions
These sections are referenced by specific prompts. If a prompt does not
reference a section, apply only the universal instructions above.
### Review Mode
[see Pattern 1 above]
### Spec-Driven Mode
[see Pattern 2 above]
### Refactor Mode
[see Pattern 3 above]
### Planning Mode
[see Pattern 4 above]
Prompts that use a specific pattern reference the relevant section explicitly: “Review this diff following the guidelines in the ‘Review Mode’ section of AGENTS.md.” This prevents Codex from mixing instructions across contexts.
The instruction budget problem
Every character in AGENTS.md is consumed before your first prompt. For a project with full workflow instructions, a well-structured AGENTS.md might run 8-12 KB. That leaves plenty of room within the 32 KiB default budget — but only if your global ~/.codex/AGENTS.md is lean.
Global AGENTS.md rule: under 3 KB. Personal preferences and defaults only. No project-specific content.
If you have a monorepo where different packages need different workflow instructions, use package-level AGENTS.md files:
repo-root/AGENTS.md # Universal: commands, structure, conventions
repo-root/packages/api/AGENTS.md # API-specific workflow instructions
repo-root/packages/ui/AGENTS.md # UI-specific workflow instructions
When Codex is in packages/api/, it merges both files — root first, package-specific second. The package-specific instructions take precedence by virtue of appearing later in the context.
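The budget arithmetic is worth checking before a session, since every merged file counts against the same 32 KiB. A sketch of a pre-flight check (the file sizes are illustrative; a real script would stat the files that apply to the working directory):

```typescript
// 32 KiB default instruction budget, per the setup guide
const BUDGET_BYTES = 32 * 1024;

function withinBudget(sizes: Record<string, number>): { total: number; ok: boolean } {
  const total = Object.values(sizes).reduce((a, b) => a + b, 0);
  return { total, ok: total <= BUDGET_BYTES };
}

// Hypothetical sizes for the monorepo layout above
const { total, ok } = withinBudget({
  "~/.codex/AGENTS.md": 2800,        // global: keep under 3 KB
  "repo-root/AGENTS.md": 9500,       // project: 8-12 KB is typical with workflow sections
  "packages/api/AGENTS.md": 4200,    // package-specific instructions
});
// 16,500 bytes total: comfortably inside the budget
```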
Measuring instruction effectiveness
The most common question is “how do I know if my AGENTS.md is working?” The direct answer is to ask:
codex --ask-for-approval never \
"Summarize the instructions you have loaded for this session, organized by
section. Then tell me: what are the top three things I should do differently
in this project based on these instructions?"
If Codex cannot summarize the key constraints accurately, the instructions are not landing. Common reasons: file discovery failure (wrong path), size limit truncation (global file too large), or instructions too vague to parse as constraints.
Codex CLI vs Claude Code vs Cursor
This comparison uses a functional axis rather than a spec-sheet axis. The question is not “which tool has the most features” but “which tool handles each workflow pattern better.”
| Capability | Codex CLI | Claude Code | Cursor |
|---|---|---|---|
| Agentic autonomy | High — designed for long autonomous runs with sandboxing | High — native terminal + tool use, multi-session | Medium — editor-integrated, not designed for long autonomous tasks |
| Context window | ~128K (model-dependent) | Up to 1M (claude-sonnet-4-5) | ~200K (model-dependent) |
| AGENTS.md support | Native (first-class) | Via CLAUDE.md (superset format) | Via .cursorrules (limited hierarchy) |
| Sandbox execution | Yes, production-grade | Yes, via Docker or native process control | No native sandbox |
| CI/CD integration | Strong — designed for non-interactive use | Possible via headless mode | Not designed for CI |
| Inline review | Excellent — structured output, no IDE dependency | Excellent — can use slash commands | Good — in-editor context, limited output structuring |
| Spec-driven generation | Excellent — works well with file-based spec input | Excellent — handles multi-file generation well | Good — better for single-file generation |
| Test-first refactor | Good — requires explicit instructions | Excellent — native test running awareness | Good — in-editor test runner integration |
| Multi-step planning | Excellent — o3 planning + o4-mini execution model swap | Good — Claude handles planning within session | Limited — session length constraints |
| Approval granularity | High — config.toml approval policies | Medium — tool-level permissions | Low — largely auto-accept |
| Cost model | Pay per API call (OpenAI pricing) | Pro/Max subscription or API | Subscription + API costs |
| IDE required | No | No | Yes |
What Codex CLI does best: Long autonomous tasks, CI integration, multi-step workflows where you want structured plan-then-execute behavior, and anything where you need fine-grained control over what gets approved automatically. The model-swap pattern (o3 for planning, o4-mini for execution) is a genuine cost and quality optimization that neither Claude Code nor Cursor supports as cleanly.
What Claude Code does best: Tasks that require broad repository context (the 1M context window is meaningful for large codebases), CLAUDE.md configurations with Claude-specific hooks and memory systems, and teams already using Anthropic’s stack. Claude Code’s sub-agent system is more flexible than Codex’s for complex delegation patterns.
What Cursor does best: Interactive editing where you want to see changes in context, teams where IDE integration matters more than automation, and junior developers who benefit from in-editor hints rather than terminal output.
The practical answer for most teams: Codex CLI and Claude Code are complementary rather than competitive. Codex CLI handles autonomous automation and CI gates. Claude Code handles exploratory, session-based work where a developer is in the loop. Using both is a reasonable choice — AGENTS.md and CLAUDE.md serve overlapping roles and can share content.
Cost Optimization
Codex CLI costs scale with API calls, not subscription tiers. That makes cost optimization more important — and more tractable — than with subscription-based tools.
Model selection per pattern
The single biggest cost lever is matching model to task:
| Pattern | Recommended model | Why |
|---|---|---|
| Inline review | o4-mini | Review is structured output; reasoning quality doesn’t require o3 |
| Spec-driven generation | o4-mini | Spec provides sufficient context; the model follows it, not reasons about it |
| Test-first refactor (Phase 1: characterization) | o4-mini | Systematic enumeration, not creative reasoning |
| Test-first refactor (Phase 2: refactor) | o4-mini | Follows established tests; minimal reasoning overhead |
| Multi-step planning | o3 | This is the pattern where reasoning quality directly affects output quality |
| CI gate | o4-mini | Pattern matching against defined criteria, not open-ended reasoning |
| Scaffold | o3 | Interface design requires broader reasoning about system shape |
| Scaffold (implementation) | o4-mini | Execution of well-defined interfaces |
Using o3 only where it adds genuine value and o4-mini for everything else typically cuts costs by 60-70% compared to using o3 uniformly.
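The math behind that claim can be sketched directly. The per-token prices below are placeholders, not real OpenAI rates; the point is the structure of the calculation, where a handful of expensive planning calls is dwarfed by the volume of execution calls:

```typescript
interface Price { inPerMTok: number; outPerMTok: number }

// Placeholder prices in dollars per million tokens — substitute current rates
const PRICES: Record<string, Price> = {
  "o3": { inPerMTok: 10, outPerMTok: 40 },
  "o4-mini": { inPerMTok: 1.1, outPerMTok: 4.4 },
};

function cost(model: string, inTok: number, outTok: number): number {
  const p = PRICES[model];
  return (inTok / 1e6) * p.inPerMTok + (outTok / 1e6) * p.outPerMTok;
}

// Hypothetical week: 2 planning calls, 40 execution calls
const mixed = 2 * cost("o3", 50_000, 5_000) + 40 * cost("o4-mini", 30_000, 8_000);
const allO3 = 2 * cost("o3", 50_000, 5_000) + 40 * cost("o3", 30_000, 8_000);
// `mixed` comes out to a small fraction of `allO3` — the exact ratio depends
// on real prices and your planning/execution call mix
```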
Prompt caching
Codex does not expose OpenAI’s prompt caching directly, but you can benefit from it by structuring your AGENTS.md and prompt templates to be consistent across calls:
- AGENTS.md content should be stable across sessions. Do not include timestamps or session-specific information.
- Prompt templates (like the review.md above) should have a static preamble that is the same every time, with only the variable portion (the diff, the spec path) changing. Cached prefixes count toward cache hits.
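A sketch of the template discipline this implies: keep the static preamble byte-identical across calls so provider-side prefix caching can hit, and append only the variable tail. The template text is illustrative:

```typescript
// Stable across every call — never interpolate timestamps or session IDs here
const PREAMBLE = [
  "Review the diff below for logic errors, unhandled edge cases,",
  "and architectural boundary violations. Output file:line findings only.",
].join("\n");

function buildPrompt(diff: string): string {
  // Variable content goes strictly after the stable prefix
  return `${PREAMBLE}\n\n--- DIFF ---\n${diff}`;
}

// Anti-pattern: a dynamic preamble defeats prefix caching on every call
// const badPreamble = `Reviewed at ${new Date().toISOString()}: ...`;
```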
The sandbox overhead tradeoff
--sandbox=full adds latency but can save money on failed runs. A misconfigured Codex session that overwrites the wrong files and then has to be rolled back can waste more tokens (and developer time) than the sandbox overhead. For automated workflows, sandbox costs are justified.
For interactive sessions where you are present to catch issues, --sandbox=none is faster. The key is being consistent so you do not accidentally run with sandbox off in a context where you assumed it was on.
Scoping context aggressively
Codex reads the files you point it at. More files means more tokens. For the patterns above:
- Inline review: feed only the diff, not the full files
- Spec-driven generation: feed the spec + the interfaces it depends on, not the whole codebase
- Test-first refactor: feed only the target file and its direct imports
- Multi-step planning: feed the file list for the affected area, not all files
Use .codexignore to exclude files that should never be in context: build artifacts, generated files, node_modules (if not gitignored), large data files.
Pitfalls and FAQ
Q: Codex is not following the instructions in my AGENTS.md.
Most common causes in order of frequency:
1. File not discovered: Run the summarization check (`codex --ask-for-approval never "Summarize your loaded instructions"`). If AGENTS.md content is not there, the file is not being found — check the path, working directory, and Codex home.
2. Size limit truncation: Your global AGENTS.md is large enough to push project-level instructions past the 32 KiB cutoff. Check `wc -c ~/.codex/AGENTS.md` and keep it under 3 KB.
3. Instructions too vague: "Follow best practices" is not an instruction. "Run `npm run lint` before reporting completion" is an instruction. Rewrite vague rules as specific, checkable actions.
Q: Multi-step tasks drift and produce inconsistent results over a long run.
This is the most common failure mode for Pattern 4 (multi-step planning) and Pattern 7 (scaffolding-then-implement). Two fixes:
First, use checkpointing. End each phase with a Codex call that writes a brief state file: "Write the current state of the implementation to .codex/checkpoint.md — what is done, what remains, what assumptions were made." Start the next phase by reading this checkpoint.
Second, front-load constraints. If a constraint matters throughout the entire run, it should be in AGENTS.md, not just in the first prompt. Constraints only in prompts can be “forgotten” as the context fills with intermediate output.
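If you want the checkpoint to be machine-checkable rather than free prose, a small helper can round-trip it as JSON. The path and field names are illustrative, not a Codex convention:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// What is done, what remains, what assumptions were made — mirroring the
// checkpoint prompt above, but in a parseable shape
interface Checkpoint {
  done: string[];
  remaining: string[];
  assumptions: string[];
}

function saveCheckpoint(path: string, c: Checkpoint): void {
  writeFileSync(path, JSON.stringify(c, null, 2));
}

function loadCheckpoint(path: string): Checkpoint | null {
  if (!existsSync(path)) return null; // first phase: nothing to resume from
  return JSON.parse(readFileSync(path, "utf8")) as Checkpoint;
}
```

The next phase's prompt can then start with the loaded checkpoint verbatim, which is more reliable than hoping the session context still contains the earlier state.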
Q: Codex sometimes modifies files I did not ask it to touch.
Three approaches, in order of invasiveness:

1. `--sandbox=files-only` restricts writes to files in the current directory. For most workflows this is sufficient.
2. Add explicit boundaries to AGENTS.md: "Do not modify files outside src/ and tests/ unless explicitly instructed." This works well when combined with the sandbox.
3. Use a review phase: run Pattern 1 (inline review) against Codex's own output before committing. This catches unintended changes before they are in git.
Q: How do I handle secrets and credentials in automated Codex workflows?
Never put credentials in AGENTS.md or prompts. The patterns above use the --sandbox=full flag specifically because it disables network access, which prevents Codex from exfiltrating data even if a prompt injection attack occurs.
For CI workflows that need service credentials (database connection, API keys), pass them as environment variables and reference them in AGENTS.md by variable name: "Database connection: $DATABASE_URL (set in environment, never hardcode)." Codex can use the variable name to inform its generated code without the actual value being in the instruction context.
Q: The test-first refactor pattern is slow. Is there a faster path?
The characterization test generation step typically takes 2-5 minutes for a mid-sized module (200-500 lines). If that is too slow, an alternative: use an existing coverage tool to find what is already tested (npm run test:coverage), then ask Codex to add tests only for uncovered lines. This is faster but produces a less complete behavioral baseline.
The longer answer is that the slowness is the point. Thorough characterization tests are the only thing preventing a refactor from changing behavior in ways neither you nor Codex noticed. If you need faster turnaround, reduce the scope of the refactor, not the tests.
Q: Should I use one AGENTS.md or multiple?
For projects up to ~50K lines: one AGENTS.md at the root is sufficient. Add a workflow instructions section as described above.
For monorepos or larger projects: root AGENTS.md for universal rules, package-level AGENTS.md for package-specific workflow instructions. Keep the root file under 5 KB so it leaves room for package files within the 32 KiB budget.
For teams with different AGENTS.md in each developer’s ~/.codex/: this is fine and expected. Personal AGENTS.md handles individual preferences; project AGENTS.md handles project specifics. Both are correct in their respective domains.
The patterns above are not exhaustive — they are the ones that have proven reliable enough to build real workflows around. As Codex CLI continues to evolve, the execution patterns will shift, but the underlying principle will not: the more precisely you define the task, the constraints, and the verification method, the more consistently Codex delivers useful output.
For more AGENTS.md examples from real repositories, browse the rules gallery. For the configuration fundamentals that underpin everything in this guide, see the AGENTS.md setup guide.