AGENTS.md has no official schema. There’s no JSON Schema file, no type definitions, no standardized section names. OpenAI’s own documentation describes the format in prose: “AGENTS.md files help Codex understand your project.” That’s it.
The absence of schema enforcement is fine for adoption — anyone can write an AGENTS.md without learning a new format. But it creates a class of silent failures that only show up when an AI agent behaves unexpectedly: missing command blocks, ambiguous instructions, references to files that don’t exist, sections that shadow each other in hierarchical setups.
This guide walks through building a practical AGENTS.md validator from scratch. The design goal is simple: catch structural and content problems early, before they reach the AI agent. All code is Node.js/TypeScript and runs without external APIs.
What “Validation” Means for a Prose Document
AGENTS.md is not structured data. You can’t parse it with a JSON parser and compare fields against a schema. What you can do is apply a set of rules to the document structure (headings, code blocks, list items) and the content within those structures.
There are three useful levels of validation:
- Structural: Does the file have the expected sections? Are code blocks properly closed? Is there a Commands section? A Code Style section?
- Semantic: Are command references internally consistent? Do referenced paths exist on disk? Are there conflicting instructions in the same file?
- Quality: Is any section suspiciously short (likely placeholder)? Are there TODOs or FIXMEs? Does the file reference tool-specific features of a tool it claims not to use?
A practical validator handles levels 1 and 2 fully, and level 3 as optional warnings.
Building the Parser
Start with a minimal Markdown AST. We only need headings, code blocks, list items, and paragraph text.
// src/parser.ts
export type NodeType = 'heading' | 'code' | 'list-item' | 'paragraph' | 'blank';
export interface MarkdownNode {
type: NodeType;
level?: number; // for headings: 1-6
lang?: string; // for code blocks: bash, typescript, etc.
content: string;
lineNumber: number;
}
export function parse(content: string): MarkdownNode[] {
const lines = content.split('\n');
const nodes: MarkdownNode[] = [];
let inCodeBlock = false;
let codeLang = '';
let codeLines: string[] = [];
let codeStart = 0;
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
if (!inCodeBlock && line.startsWith('```')) {
inCodeBlock = true;
codeLang = line.slice(3).trim();
codeLines = [];
codeStart = i + 1;
continue;
}
if (inCodeBlock && line.startsWith('```')) {
nodes.push({
type: 'code',
lang: codeLang,
content: codeLines.join('\n'),
lineNumber: codeStart,
});
inCodeBlock = false;
codeLines = [];
continue;
}
if (inCodeBlock) {
codeLines.push(line);
continue;
}
const headingMatch = line.match(/^(#{1,6})\s+(.+)$/);
if (headingMatch) {
nodes.push({
type: 'heading',
level: headingMatch[1].length,
content: headingMatch[2].trim(),
lineNumber: i + 1,
});
continue;
}
if (line.match(/^[-*+]\s+/) || line.match(/^\d+\.\s+/)) {
nodes.push({
type: 'list-item',
content: line.replace(/^[-*+\d.]+\s+/, '').trim(),
lineNumber: i + 1,
});
continue;
}
if (line.trim() === '') {
nodes.push({ type: 'blank', content: '', lineNumber: i + 1 });
} else {
nodes.push({ type: 'paragraph', content: line.trim(), lineNumber: i + 1 });
}
}
if (inCodeBlock) {
// Unclosed code block
nodes.push({
type: 'code',
lang: codeLang,
content: codeLines.join('\n'),
lineNumber: codeStart,
});
}
return nodes;
}
Defining Rules
Each rule is a function that takes the parsed nodes and returns zero or more diagnostics:
// src/rules.ts
import { MarkdownNode } from './parser';
import * as fs from 'fs';
import * as path from 'path';
export type Severity = 'error' | 'warning' | 'info';
export interface Diagnostic {
rule: string;
severity: Severity;
message: string;
lineNumber?: number;
}
export type Rule = (
nodes: MarkdownNode[],
filePath: string
) => Diagnostic[];
// Rule: Commands section must exist
export const requireCommandsSection: Rule = (nodes) => {
const hasCommands = nodes.some(
(n) => n.type === 'heading' && /commands?/i.test(n.content)
);
if (!hasCommands) {
return [{
rule: 'require-commands-section',
severity: 'error',
message: 'AGENTS.md is missing a Commands section. Add ## Commands with build/test/lint commands.',
}];
}
return [];
};
// Rule: Code blocks must be closed
export const noUnclosedCodeBlocks: Rule = (nodes, filePath) => {
const content = fs.readFileSync(filePath, 'utf-8');
const backtickMatches = content.match(/^```/gm) || [];
if (backtickMatches.length % 2 !== 0) {
return [{
rule: 'no-unclosed-code-blocks',
severity: 'error',
message: `Unclosed code block detected. Count of triple-backtick markers: ${backtickMatches.length} (expected even number).`,
}];
}
return [];
};
// Rule: Referenced file paths should exist
export const referencedPathsShouldExist: Rule = (nodes, filePath) => {
const repoRoot = path.dirname(filePath);
const diagnostics: Diagnostic[] = [];
for (const node of nodes) {
if (node.type !== 'list-item' && node.type !== 'paragraph') continue;
// Extract backtick-quoted paths that look like file paths
const pathMatches = node.content.match(/`([^`]*\/[^`]*)`/g) || [];
for (const match of pathMatches) {
const p = match.slice(1, -1);
// Skip if it looks like a command with arguments
if (p.includes('$') || p.includes('*') || p.includes(' ')) continue;
// Skip URLs
if (p.startsWith('http')) continue;
const fullPath = path.resolve(repoRoot, p);
if (!fs.existsSync(fullPath)) {
diagnostics.push({
rule: 'referenced-paths-should-exist',
severity: 'warning',
message: `Referenced path does not exist: \`${p}\``,
lineNumber: node.lineNumber,
});
}
}
}
return diagnostics;
};
// Rule: Commands section should contain at least one code-formatted command
export const commandsSectionHasCommands: Rule = (nodes) => {
let inCommandsSection = false;
let commandCount = 0;
for (const node of nodes) {
if (node.type === 'heading') {
inCommandsSection = /commands?/i.test(node.content);
}
if (inCommandsSection && node.type === 'list-item') {
if (node.content.includes('`')) commandCount++;
}
}
if (commandCount === 0) {
return [{
rule: 'commands-section-has-commands',
severity: 'error',
message: 'Commands section exists but has no backtick-formatted commands. Wrap commands in backticks: `pnpm test`',
}];
}
return [];
};
// Rule: Warn on TODO/FIXME in content
export const noTodoFixme: Rule = (nodes) => {
const diagnostics: Diagnostic[] = [];
for (const node of nodes) {
if (/\b(TODO|FIXME)\b/i.test(node.content)) {
diagnostics.push({
rule: 'no-todo-fixme',
severity: 'warning',
message: `Found TODO/FIXME on line ${node.lineNumber} — resolve or remove before agents read this file.`,
lineNumber: node.lineNumber,
});
}
}
return diagnostics;
};
// Rule: Warn if no section exceeds 3 list items (likely too thin)
export const contentDensityCheck: Rule = (nodes) => {
let currentSection = '';
let itemsInSection = 0;
let hasSubstantialSection = false;
for (const node of nodes) {
if (node.type === 'heading' && node.level && node.level <= 2) {
if (itemsInSection >= 3) hasSubstantialSection = true;
currentSection = node.content;
itemsInSection = 0;
}
if (node.type === 'list-item') itemsInSection++;
}
if (itemsInSection >= 3) hasSubstantialSection = true;
if (!hasSubstantialSection) {
return [{
rule: 'content-density-check',
severity: 'warning',
message: 'AGENTS.md appears sparse. No section has 3+ list items. Consider adding more specific instructions.',
}];
}
return [];
};
export const DEFAULT_RULES: Rule[] = [
requireCommandsSection,
noUnclosedCodeBlocks,
referencedPathsShouldExist,
commandsSectionHasCommands,
noTodoFixme,
contentDensityCheck,
];
The Validator CLI
// src/cli.ts
import { parse } from './parser';
import { DEFAULT_RULES, Diagnostic } from './rules';
import * as fs from 'fs';
import * as path from 'path';
function validate(filePath: string): Diagnostic[] {
const content = fs.readFileSync(filePath, 'utf-8');
const nodes = parse(content);
const diagnostics: Diagnostic[] = [];
for (const rule of DEFAULT_RULES) {
diagnostics.push(...rule(nodes, filePath));
}
return diagnostics;
}
function main() {
const args = process.argv.slice(2);
const targetPath = args[0] || 'AGENTS.md';
const filePath = path.resolve(process.cwd(), targetPath);
if (!fs.existsSync(filePath)) {
console.error(`File not found: ${filePath}`);
process.exit(1);
}
const diagnostics = validate(filePath);
if (diagnostics.length === 0) {
console.log(`✓ ${path.basename(filePath)}: no issues found`);
process.exit(0);
}
const errors = diagnostics.filter((d) => d.severity === 'error');
const warnings = diagnostics.filter((d) => d.severity === 'warning');
for (const d of diagnostics) {
const loc = d.lineNumber ? `:${d.lineNumber}` : '';
const prefix = d.severity === 'error' ? 'error' : 'warn';
console.log(`${filePath}${loc}: ${prefix}: [${d.rule}] ${d.message}`);
}
console.log(`\n${errors.length} error(s), ${warnings.length} warning(s)`);
// Exit with error code only if there are errors (not warnings)
process.exit(errors.length > 0 ? 1 : 0);
}
main();
Wire it into package.json:
{
"scripts": {
"validate:agents": "tsx src/cli.ts",
"validate:agents:all": "find . -name 'AGENTS.md' -not -path '*/node_modules/*' | xargs -I{} tsx src/cli.ts {}"
}
}
Extending with Custom Rules
The rule interface is intentionally simple. Add domain-specific rules without touching the core:
// Custom rule: require a Boundaries section listing protected paths
const requireBoundariesSection: Rule = (nodes) => {
const hasBoundaries = nodes.some(
(n) => n.type === 'heading' && /boundar(y|ies)/i.test(n.content)
);
if (!hasBoundaries) {
return [{
rule: 'require-boundaries-section',
severity: 'warning',
message: 'Consider adding a ## Boundaries section listing directories agents should not modify.',
}];
}
return [];
};
// Custom rule: no package manager mixing
const noPackageManagerMixing: Rule = (nodes) => {
const content = nodes.map(n => n.content).join('\n');
const usesNpm = /\bnpm (install|run|exec)\b/.test(content);
const usesPnpm = /\bpnpm\b/.test(content);
const usesYarn = /\byarn\b/.test(content);
const managers = [usesNpm && 'npm', usesPnpm && 'pnpm', usesYarn && 'yarn'].filter(Boolean);
if (managers.length > 1) {
return [{
rule: 'no-package-manager-mixing',
severity: 'warning',
message: `Multiple package managers referenced: ${managers.join(', ')}. AI agents may use the wrong one.`,
}];
}
return [];
};
Running Validation Across a Monorepo
#!/bin/bash
# scripts/validate-all-agents-md.sh
ERRORS=0
WARNINGS=0
while IFS= read -r -d '' file; do
echo "Checking: $file"
output=$(node dist/cli.js "$file" 2>&1)
exit_code=$?
if [ $exit_code -ne 0 ]; then
echo "$output"
ERRORS=$((ERRORS + 1))
elif echo "$output" | grep -q "warn:"; then
echo "$output"
WARNINGS=$((WARNINGS + 1))
fi
done < <(find . -name "AGENTS.md" -not -path "*/node_modules/*" -print0)
echo ""
echo "Summary: $ERRORS file(s) with errors, $WARNINGS file(s) with warnings"
[ $ERRORS -eq 0 ]
Real-world results from running this against a 15-package monorepo with AGENTS.md files of varying age: 3 files had unclosed code blocks from copy-paste errors, 7 had references to paths that had been moved or renamed, 2 had TODO markers from initial setup. None of these caused CI failures — they were invisible until an agent produced unexpected output.
What This Won’t Catch
The validator handles structure and mechanical correctness. It won’t tell you whether your instructions are clear, whether they conflict with each other in subtle semantic ways, or whether an AI agent will actually follow them. For that, you need to run the agent and observe behavior — which is the other half of quality assurance for AGENTS.md files.