
Inside Career-Ops: 14-Mode Skills Architecture Lessons for Claude Code Builders (2026)

The Prompt Shelf


When Santiago Fernández de Valderrama open-sourced career-ops in early 2026, it drew a surge of GitHub stars in its first week and stood at 44,554+ stars as of May 2026. The surface-level pitch is straightforward: an AI-powered job search system that evaluates job descriptions, generates ATS-optimized resumes, and tracks applications.

But that framing buries the lede.

Career-ops is one of the most carefully engineered Claude Code Skills architectures available as open source. If you’re building any kind of multi-mode agentic system on top of Claude Code, this repo is a reference implementation worth studying line by line.

This article dissects what makes the 14-mode design work, how the AGENTS.md / CLAUDE.md split is structured, and the patterns you can pull into your own Skills systems—even if you have no interest in job searching.

Most Claude Code repositories on GitHub are either single-purpose scripts or thin wrappers around the agent CLI. Career-ops sits in a different category: a production-grade multi-mode system designed for repeated, parallel, asynchronous use.

The numbers tell the operational story:

  • 631 evaluations completed
  • 354 ATS-optimized PDFs generated
  • 680 URLs deduplicated (zero re-evaluations)
  • 68 applications submitted with personalized resumes
  • 122 URLs processed in parallel batches

Santiago used the system to land his role as Head of Applied AI. The system worked—but the more interesting fact is that the same architecture generalizes far beyond job search.

The lessons inside transfer directly to:

  • Multi-mode customer-research systems
  • Document evaluation pipelines (legal, contract review, due diligence)
  • Asynchronous content production (research → outline → draft → review)
  • Investment thesis evaluation
  • Lead qualification and outbound sequencing

If you’ve been building one Claude Code skill at a time and wondering how to scale to a real system, career-ops shows you the answer.

The 14-Mode Skills Architecture

The core insight in career-ops is summarized in one line of Santiago’s own writing: “modes over one long prompt.”

Instead of a monolithic CLAUDE.md or one massive AGENTS.md trying to instruct the agent on everything, the system splits behavior into 14 narrow Skills, each with its own context, rules, and triggering condition.

The Mode Inventory

| Trigger context | Mode | Function |
|---|---|---|
| Single URL or JD pasted | auto-pipeline | End-to-end: extract → evaluate → report → PDF → tracker |
| Single offer review | oferta | Single-offer scoring (A-F across 10 dimensions) |
| Multiple offers | ofertas | Comparative analysis and ranking |
| Outreach planning | contacto | LinkedIn contact strategy and message drafting |
| Company research | deep | Detailed company intelligence (funding, team, growth) |
| Interview prep | interview-prep | Company-specific question prep |
| CV / PDF generation | pdf | ATS-optimized, keyword-injected resume per offer |
| Course / cert evaluation | training | Learning ROI assessment |
| Portfolio evaluation | project | Project impact analysis for resume content |
| Pipeline status check | tracker | Application tracker review |
| Form completion | apply | Playwright-driven form filling with cached evaluations |
| Portal scraping | scan | Zero-token Greenhouse / Ashby / Lever discovery |
| Batch processing | batch | Parallel multi-offer evaluation with fault tolerance |
| Pattern analysis | patterns | Rejection / targeting insights from history |

Each mode is a separate file under modes/. When the user invokes a mode (or the agent infers it from context), only that file’s instructions get loaded into the prompt.
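A minimal sketch of per-mode loading, assuming each mode lives at modes/<name>.md and that the shared rules (modes/_shared.md) and user overrides (modes/_profile.md) are loaded alongside it. The `load_mode_prompt` helper and the concatenation order are hypothetical illustrations, not career-ops code.

```python
from pathlib import Path

# Mode names taken from the inventory table above.
MODES = {
    "auto-pipeline", "oferta", "ofertas", "contacto", "deep",
    "interview-prep", "pdf", "training", "project", "tracker",
    "apply", "scan", "batch", "patterns",
}

# Assumption: shared rules and user overrides accompany every mode.
COMMON_FILES = ("_shared.md", "_profile.md")


def load_mode_prompt(modes_dir: Path, mode: str) -> str:
    """Return only the instructions relevant to `mode` (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    parts = []
    for name in COMMON_FILES:
        path = modes_dir / name
        if path.exists():  # optional files are skipped when absent
            parts.append(path.read_text(encoding="utf-8"))
    # The mode file itself is required; a missing file raises FileNotFoundError.
    parts.append((modes_dir / f"{mode}.md").read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

Calling `load_mode_prompt(Path("modes"), "oferta")` returns the shared rules plus the scoring instructions; nothing from apply.md enters the prompt.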

Why Modes Beat One Long Prompt

Three concrete benefits emerge from this design:

1. Precision context loading. When the agent runs apply, it doesn’t load scoring rules. When it runs oferta, it doesn’t load form-filling logic. The model receives only what’s relevant.

This matters because Claude Code’s context window, while generous, isn’t free. Loading 8,000 tokens of irrelevant instructions for every interaction degrades reasoning—the model has to filter signal from noise on every turn.

2. Isolated testability. Changing PDF rendering rules in pdf mode doesn’t risk breaking evaluation logic in oferta. Each mode evolves independently.

3. Independent feature deployment. Santiago shipped training mode three weeks after the initial launch. The mode plugs in cleanly because it doesn’t depend on other modes’ internal state.

For comparison, consider the alternative: a single 12,000-line CLAUDE.md trying to handle every possible task. Adding a new capability means risking regressions across the entire system. Skills decomposition makes scaling sustainable.

The 10-Dimensional Evaluation Framework

When oferta mode evaluates a single job posting, it doesn’t return a vague “this is a fit” or “this isn’t.” It produces a structured score across 10 dimensions with explicit weights:

Gate-pass dimensions (kill switches, binary):

  • Role Match
  • Skills Alignment

If either fails, the evaluation halts and produces an early reject. No further analysis wastes tokens.

High-weight dimensions:

  • Seniority
  • Compensation
  • Interview Likelihood

Medium-weight dimensions:

  • Geography
  • Company Stage
  • Product-Market Fit
  • Growth Trajectory

Low-weight dimensions:

  • Timeline

The output combines numeric scores (1-5) with letter grades (A-F). Santiago notes that 74% of evaluated offers scored below 4.0—meaning without the system, he would have read the full job description for hundreds of postings that turned out not to fit.
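The scoring scheme can be sketched as a gate check followed by a weighted average. The numeric weights (3/2/1), the dimension keys, and the letter-grade cutoffs below are assumptions for illustration; the article names the weight tiers but not the numbers, and career-ops performs this reasoning in a prompt rather than in code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed numeric weights; the article only says "high", "medium", and "low".
WEIGHTS = {
    "seniority": 3, "compensation": 3, "interview_likelihood": 3,
    "geography": 2, "company_stage": 2, "product_market_fit": 2,
    "growth_trajectory": 2,
    "timeline": 1,
}
GATES = ("role_match", "skills_alignment")

# Hypothetical cutoffs for mapping a 1-5 score onto A-F.
GRADE_CUTOFFS = ((4.5, "A"), (4.0, "B"), (3.5, "C"), (3.0, "D"))


@dataclass
class Evaluation:
    score: float
    grade: str
    rejected_at: Optional[str] = None  # name of the failed gate, if any


def evaluate(gates: dict, scores: dict) -> Evaluation:
    # Gate-pass dimensions short-circuit: no weighted scoring if one fails.
    for gate in GATES:
        if not gates.get(gate, False):
            return Evaluation(score=0.0, grade="F", rejected_at=gate)
    # Weighted average of the 1-5 dimension scores, so the result stays on 1-5.
    total = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
    score = round(total / sum(WEIGHTS.values()), 2)
    grade = next((g for cut, g in GRADE_CUTOFFS if score >= cut), "F")
    return Evaluation(score=score, grade=grade)
```

Because the gates run first, a failed Role Match never touches the weighted dimensions, which mirrors the early-reject behavior described above.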

What This Pattern Teaches About Evaluation Skills

The 10-dimensional framework isn’t unique to job evaluation. Any “should I do this?” decision system benefits from:

  1. Explicit gates that short-circuit obviously bad cases
  2. Weighted dimensions that capture trade-offs honestly
  3. Standardized output (numeric + letter grade) for downstream filtering and aggregation
  4. Reasoning over keyword matching so the system adapts to context

Apply this to investment evaluation, vendor selection, partnership decisions, or content approval, and you get a system that produces durable, comparable judgments instead of ad-hoc opinions.

HITL: Where Humans and Agents Draw the Line

A central design principle in career-ops is HITL (Human-In-The-Loop). Santiago is explicit:

“AI filters noise, humans provide judgment.”

The agent does the heavy lifting—reading 631 job descriptions, generating personalized resumes, submitting forms when authorized—but never makes the final commitment without human review.

This shows up in the AGENTS.md as explicit rules:

  • Never submit applications without user review.
  • Discourage applying to sub-4.0/5 offers (“quality over speed”).
  • Always verify offer liveness via Playwright (not WebSearch—too unreliable for transactional decisions).
  • Respect recruiter time—target 5 strong fits over 50 generic blasts.

The HITL philosophy is what makes the system trustworthy enough to actually use. Pure automation in a job search context would damage the user’s reputation; pure manual operation would defeat the purpose.

For your own Skills systems, the lesson is to identify the minimum viable human checkpoint—the smallest insertion of judgment that prevents catastrophic automation while preserving most of the speed benefits.
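A minimal sketch of one such checkpoint, assuming the agent has already evaluated the offer and filled the form. The `review_gate` name and the prompt wording are hypothetical; only the 4.0/5 threshold and the "never submit without review" rule come from the AGENTS.md rules listed above.

```python
from typing import Callable

MIN_SCORE = 4.0  # from the rule discouraging sub-4.0/5 offers


def review_gate(company: str, score: float, ask_human: Callable[[str], bool]) -> bool:
    """Return True only if a human explicitly approves submission.

    Everything before this call (evaluation, tailored CV, form filling) can be
    automated; the single human checkpoint is the final submit decision.
    """
    if score < MIN_SCORE:
        # Quality gate: surface the low score so the human must opt in knowingly.
        question = (f"{company} scored {score:.1f}/5, below {MIN_SCORE}. "
                    "Submit anyway?")
    else:
        question = f"Form for {company} is filled. Submit?"
    return ask_human(question)
```

In a terminal, `ask_human` could be `lambda q: input(q + " [y/N] ").strip().lower() == "y"`, which defaults to not submitting.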

Tech Stack: Pragmatic, Not Trendy

Career-ops makes some interesting tooling choices:

  • Claude Code — agentic reasoning, content generation, the orchestration brain
  • Playwright — browser automation for portal navigation and form completion
  • Puppeteer — PDF rendering (with Playwright also handling some PDF cases)
  • Go — the dashboard / TUI for tracker visualization
  • Node.js — utility scripts (merge-tracker, normalize-statuses, dedup-tracker)
  • tmux — parallel session management for batch processing

Notably absent: a database. The system uses TSV files and Markdown reports. No Postgres, no Redis, no vector store. This is intentional: the entire system runs locally with no infrastructure dependencies.

The implications:

  1. Zero hosting cost — runs on any laptop
  2. Full data sovereignty — your CV, evaluations, and tracker never leave your machine
  3. Easy forking — anyone can clone, customize, and own their copy
  4. Trivial portability — copy the directory, run npm install, you’re operational

This stack philosophy (“local, file-based, MIT licensed”) is part of why the system gained adoption so quickly. There’s no signup wall, no rate limiting, no subscription. If it works for you, it works.
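To make the "no database" choice concrete, here is a minimal sketch of a TSV tracker that refuses to re-add a URL it has already seen, the kind of check behind the deduplication figure quoted earlier. The column names, file name, and functions are hypothetical; they are not career-ops' actual tracker schema or scripts.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real tracker schema may differ.
FIELDS = ["num", "date", "company", "role", "url", "score", "status"]


def load_seen_urls(tracker: Path) -> set:
    """Read every URL already recorded in the TSV tracker."""
    if not tracker.exists():
        return set()
    with tracker.open(newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f, delimiter="\t")}


def append_rows(tracker: Path, rows: list) -> int:
    """Append rows whose URL is not already tracked; return how many were added."""
    seen = load_seen_urls(tracker)
    new_rows = []
    for row in rows:
        if row["url"] not in seen:
            seen.add(row["url"])  # also dedups duplicates within this batch
            new_rows.append(row)
    write_header = not tracker.exists()
    with tracker.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t",
                                lineterminator="\n")
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

The tracker stays a plain file you can open in a spreadsheet, diff in git, or grep from the shell.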

Inside the Repo: CLAUDE.md / AGENTS.md Setup

One of the more striking discoveries when you actually clone career-ops: the CLAUDE.md is two lines, 83 bytes. It contains essentially:

@AGENTS.md

That’s it. The CLAUDE.md is a pointer to AGENTS.md. All the actual instructions live in AGENTS.md.

Why this matters: it’s a reference implementation of how to handle the Claude Code / Codex / multi-tool reality. By writing the canonical instructions in AGENTS.md and aliasing CLAUDE.md to it, the same instructions work for:

  • Claude Code (which reads CLAUDE.md by default)
  • Codex CLI (which reads AGENTS.md by default)
  • Any future agentic CLI that adopts the AGENTS.md spec

For builders, this is a clean pattern. Treat AGENTS.md as the source of truth, point CLAUDE.md to it, and your repo works across the agentic CLI ecosystem without duplication.

What’s in AGENTS.md (the actual brain)

The AGENTS.md is structured as a layered onboarding sequence:

  1. Two-layer separation rule (critical): User Layer (CV, profile, modes/_profile.md, tracker) is never auto-updated. System Layer (shared modes, scripts, templates) can be updated. Customizations always go to modes/_profile.md or config/profile.yml—never to modes/_shared.md—so updates don’t overwrite personalization.
  2. Onboarding gates: before any evaluation, verify that cv.md, config/profile.yml, modes/_profile.md, and portals.yml exist. If missing, walk the user through setup step by step.
  3. Mode descriptions: each of the 14 modes has a one-line description and trigger condition.
  4. Quality and ethics rules: never auto-submit, discourage low-fit applications, verify liveness, respect recruiter time.
  5. Pipeline management rules: 3-digit sequential report numbering, batch tracker additions, dedup workflow, canonical state machine.
  6. Language support: default English, optional German / French / Japanese with localized vocabulary in modes/de/, modes/fr/, modes/ja/.

The structure is far more detailed than typical agent instruction files. This isn’t 200 lines of “be helpful and accurate”—it’s an operational runbook for a production system.
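The two-layer separation rule from the list above is easy to enforce mechanically in whatever update script you write. A minimal sketch, assuming the updater receives changed paths relative to the repo root; the function name is hypothetical, and so is "tracker.tsv" as the tracker's path.

```python
from pathlib import PurePosixPath

# User Layer files named in the article; "tracker.tsv" is a hypothetical path.
USER_LAYER = {
    PurePosixPath("cv.md"),
    PurePosixPath("config/profile.yml"),
    PurePosixPath("modes/_profile.md"),
    PurePosixPath("tracker.tsv"),
}


def split_update(changed: list) -> tuple:
    """Partition upstream changes into (safe to overwrite, must preserve)."""
    system, user = [], []
    for raw in changed:
        path = PurePosixPath(raw)  # normalizes "./cv.md" to "cv.md"
        (user if path in USER_LAYER else system).append(raw)
    return system, user
```

An updater would overwrite only the first list and, at most, report the second to the user.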

What This Teaches About Building Your Own Skills System

Reading career-ops as a study guide rather than as a job-search tool, several reusable patterns emerge.

1. Decompose by Trigger, Not by Subject

Most people would design a job-search system with modes like “evaluation,” “writing,” “automation.” Career-ops decomposes by trigger context: what is the user actually doing right now?

  • Pasting a URL → auto-pipeline
  • Asking for company research → deep
  • Reviewing the pipeline → tracker

Trigger-based decomposition beats subject-based decomposition because users think in actions, not categories.
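Career-ops leaves mode inference to the model, but the trigger-first idea can be illustrated with a deliberately naive router. The keyword table and the `route` function are hypothetical; they only show that the dispatch key is what the user is doing, not a topic.

```python
import re
from typing import Optional

URL_RE = re.compile(r"https?://\S+")

# Hypothetical action keywords, checked in order before falling back to URLs.
KEYWORD_MODES = [
    ("interview", "interview-prep"),
    ("research", "deep"),
    ("pipeline", "tracker"),
    ("status", "tracker"),
    ("compare", "ofertas"),
]


def route(message: str) -> Optional[str]:
    """Map a user message to a mode name, or None if the action is unclear."""
    text = message.lower()
    for keyword, mode in KEYWORD_MODES:
        if keyword in text:
            return mode
    if URL_RE.search(message):
        return "auto-pipeline"  # pasting a URL triggers the default flow
    return None  # unclear: ask the user instead of guessing
```

In practice the model does this classification far more flexibly; the point is that each mode is keyed to a user action.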

2. Make the Default Path Effortless

auto-pipeline is the default end-to-end flow. The user pastes a URL, and the agent runs the entire pipeline (extract → evaluate → report → PDF → tracker) without further prompting. The 13 other modes are escape hatches for when the user wants finer control.

Most automation systems get this backwards: they require the user to specify every step. Career-ops makes the default path the fastest, with manual override always available.
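The default-path idea reduces to chaining steps over shared state, with an early exit when a gate fails. A minimal sketch: the step functions are stubs, and the `stop` flag is a hypothetical convention, not how career-ops signals a rejection.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(url: str, steps: list) -> dict:
    """Run (name, step) pairs in order, threading one state dict through them."""
    state = {"url": url, "completed": []}
    for name, step in steps:
        state = step(state)
        state["completed"].append(name)
        if state.get("stop"):  # e.g. a gate-pass failure during evaluation
            break
    return state
```

An escape-hatch mode is then just a call with a shorter step list, for example only the `pdf` step for an offer that was already evaluated.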

3. Encode Quality Gates in the Instructions

Rather than hoping the user makes good decisions, the AGENTS.md actively discourages low-quality actions: “Discourage applying to sub-4.0/5 offers.” This isn’t a soft suggestion—it’s an instruction the agent will surface to the user.

For your own systems, identify where users tend to make low-leverage decisions and have the agent push back.

4. Use Files, Not Databases, Until You Must

Career-ops uses Markdown files and TSV tables. Reports are sequential numbered files. The tracker is a TSV. The evaluation history is a directory of reports.

This makes the entire system inspectable, version-controllable, diffable, and portable. Moving to a database gives up most of these properties, which is why the switch is worth deferring until files genuinely stop scaling.
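Sequential numbering is the one piece of "schema" a file-based report store needs. A minimal sketch, assuming reports are named NNN-slug.md in a single directory; only the 3-digit numbering comes from the article, while the slug format and directory layout are hypothetical.

```python
import re
from pathlib import Path

# Matches "001-acme.md"; \d{3,} keeps working if numbering passes 999.
REPORT_RE = re.compile(r"^(\d{3,})-.*\.md$")


def next_report_path(reports_dir: Path, slug: str) -> Path:
    """Return the path for the next sequentially numbered report."""
    numbers = [
        int(m.group(1))
        for p in reports_dir.glob("*.md")
        if (m := REPORT_RE.match(p.name))
    ]
    n = max(numbers, default=0) + 1
    return reports_dir / f"{n:03d}-{slug}.md"
```

Gaps are tolerated: the next number is always one past the highest existing prefix, and unnumbered files are ignored.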

5. Build Onboarding into the Agent

The agent verifies that required setup files exist before running any operation. If they don’t, it walks the user through creating them. This means you don’t need a separate setup script or wizard—the agent itself handles onboarding through conversational prompts.
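In career-ops this check is an instruction the agent follows, not a script. Written as code, the same gate might look like the sketch below: the file list comes from the article's description of AGENTS.md, while the function names and prompt wording are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Setup files the article says must exist before any evaluation.
REQUIRED = ["cv.md", "config/profile.yml", "modes/_profile.md", "portals.yml"]


def missing_setup(root: Path) -> list:
    """Return the setup files that still need to be created, in onboarding order."""
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


def onboarding_prompt(root: Path) -> Optional[str]:
    """Return the next onboarding question, or None when setup is complete."""
    missing = missing_setup(root)
    if not missing:
        return None  # setup complete: proceed with the requested mode
    return (f"Before we evaluate anything, let's create {missing[0]} "
            f"(step 1 of {len(missing)}).")
```

The agent asks about one missing file at a time, so onboarding happens inside the normal conversation.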

6. Layer User Customization Carefully

The two-layer separation (user files vs system files, with explicit override locations like modes/_profile.md) is what allows the system to update without breaking personal customizations. This is the pattern most needed when multiple users fork the same Skills system.

7. Document the Tech Stack as Non-Negotiable Defaults

The AGENTS.md is explicit about which tool to use for which job: Playwright for liveness verification, not WebSearch. Puppeteer for PDF rendering. tmux for parallel sessions. By specifying the canonical tool, the agent doesn’t waste tokens choosing.

7 Patterns You Can Steal Today

If you’re building Skills now and want concrete, transferable patterns:

  1. Two-layer file separation (user/system) with explicit override locations.
  2. Trigger-based mode decomposition with one mode per primary user action.
  3. Default-end-to-end auto-pipeline with single-mode escape hatches.
  4. Multi-dimensional weighted scoring with explicit gate-pass criteria.
  5. HITL checkpoints at the smallest viable insertion of human judgment.
  6. File-based persistence (TSV, Markdown) instead of databases.
  7. AGENTS.md as the source of truth, CLAUDE.md aliased via @AGENTS.md.

Each of these patterns can be lifted out of career-ops independently. You don’t need to clone the whole repo to benefit.

Limitations and Hidden Costs

The system isn’t free of trade-offs.

Setup complexity is real. Despite the 5-step quick start, fully customizing the archetypes, scoring weights, and tracker schemas to your situation takes meaningful time. Plan on 2-4 hours of upfront customization before you get production-quality output.

Playwright dependencies are heavy. Cloning the repo and running npx playwright install chromium adds ~300MB. If you don’t actually need browser automation, the modes that depend on Playwright are dead weight.

Mode authoring is its own discipline. Adding new modes that integrate cleanly with existing infrastructure requires understanding the file conventions, the shared rules, the personalization layer, and the tracker schema. It’s not as simple as “write another markdown file.”

Localization is partial. The system supports German, French, and Japanese, but the depth of localization varies. Don’t assume non-English versions match English in capability.

The pipeline assumes job-search semantics. Generalizing the system to non-job-search domains requires rewriting most of oferta, pdf, apply, and the scoring framework. The architecture transfers; the prompts don’t.

These aren’t reasons to avoid the system—they’re reasons to be honest about the work involved.

