Career-ops is an open-source, AI-powered job search system built on Claude Code by Santiago Fernández de Valderrama. It includes 14 specialized Skills modes for evaluating job offers, generating ATS-optimized PDFs, automating form completion, and tracking applications. The repository reached 44,554+ GitHub stars within a week of release.

Do I need to be looking for a job to learn from career-ops?

No. Career-ops is also a reference implementation for production Claude Code Skills architecture. The patterns it demonstrates—14-mode trigger-based decomposition, two-layer file separation, file-based persistence, HITL checkpoints, and the AGENTS.md / CLAUDE.md alias setup—transfer to any multi-mode agentic system.

Why does career-ops use 14 modes instead of one prompt?

Three reasons: precision context loading (only relevant instructions get loaded per task), isolated testability (changing PDF logic does not break evaluation logic), and independent feature deployment (new modes ship without touching existing ones). Santiago summarizes this as 'modes over one long prompt'.

What is HITL in the career-ops design?

HITL stands for Human-In-The-Loop. The system automates analysis (evaluating job descriptions, drafting resumes, completing forms) but never submits applications without explicit user review. The principle is 'AI filters noise, humans provide judgment'. This is encoded as instruction-level rules in AGENTS.md.

Why is CLAUDE.md just a pointer?

Career-ops treats AGENTS.md as the source of truth and aliases CLAUDE.md to it via '@AGENTS.md'. This pattern lets the same instructions work for Claude Code (which reads CLAUDE.md by default), Codex CLI (which reads AGENTS.md by default), and any future agentic CLI that adopts the AGENTS.md spec, without duplication.

What does the 10-dimensional evaluation framework measure?

Career-ops scores each offer across 10 weighted dimensions: Role Match and Skills Alignment as gate-pass kill switches; Seniority, Compensation, and Interview Likelihood as high-weight; Geography, Company Stage, Product-Market Fit, and Growth Trajectory as medium-weight; and Timeline as low-weight. Output combines numeric scores (1-5) with letter grades (A-F).

Can career-ops be adapted for non-job-search use cases?

The architecture transfers, but the prompts do not. The 14-mode decomposition pattern, two-layer separation, file-based persistence, HITL checkpoints, and AGENTS.md alias structure all generalize to multi-mode agentic systems for document evaluation, customer research, content production, and similar workflows. Mode-level prompts (oferta, pdf, apply) are job-search specific and require rewriting.

Inside Career-Ops: 14-Mode Skills Architecture Lessons for Claude Code Builders (2026)

import PostCardLink from '../../components/PostCardLink.astro';

New to Career-Ops? Start with “What Is Career-Ops?” for the 30-second definition. This article is the deep-dive into the architecture.

When Santiago Fernández de Valderrama open-sourced career-ops in early 2026, it went from 0 to 44,554+ GitHub stars within a week (as of May 2026). The surface-level pitch is straightforward: an AI-powered job search system that evaluates job descriptions, generates ATS-optimized resumes, and tracks applications.

But that framing buries the lede.

Career-ops is one of the most carefully engineered Claude Code Skills architectures available as open source. If you’re building any kind of multi-mode agentic system on top of Claude Code, this repo is a reference implementation worth studying line by line.

This article dissects what makes the 14-mode design work, how the AGENTS.md / CLAUDE.md split is structured, and the patterns you can pull into your own Skills systems—even if you have no interest in job searching.

Why Career-Ops Matters Beyond Job Search

Most Claude Code repositories on GitHub are either single-purpose scripts or thin wrappers around the agent CLI. Career-ops sits in a different category: a production-grade multi-mode system designed for repeated, parallel, asynchronous use.

The numbers tell the operational story:

631 evaluations completed
354 ATS-optimized PDFs generated
680 URLs deduplicated (zero re-evaluations)
68 applications submitted with personalized resumes
122 URLs processed in parallel batches

Santiago used the system to land his role as Head of Applied AI. The system worked—but the more interesting fact is that the same architecture generalizes far beyond job search.

The lessons inside transfer directly to:

Multi-mode customer-research systems
Document evaluation pipelines (legal, contract review, due diligence)
Asynchronous content production (research → outline → draft → review)
Investment thesis evaluation
Lead qualification and outbound sequencing

If you’ve been building one Claude Code skill at a time and wondering how to scale to a real system, career-ops shows you the answer.

The 14-Mode Skills Architecture

The core insight in career-ops is summarized in one line of Santiago’s own writing: “modes over one long prompt.”

Instead of a monolithic CLAUDE.md or one massive AGENTS.md trying to instruct the agent on everything, the system splits behavior into 14 narrow Skills, each with its own context, rules, and triggering condition.

The Mode Inventory

| Trigger Context | Mode | Function |
| --- | --- | --- |
| Single URL or JD pasted | `auto-pipeline` | End-to-end: extract → evaluate → report → PDF → tracker |
| Single offer review | `oferta` | Single-offer scoring (A-F across 10 dimensions) |
| Multiple offers | `ofertas` | Comparative analysis and ranking |
| Outreach planning | `contacto` | LinkedIn contact strategy and message drafting |
| Company research | `deep` | Detailed company intelligence (funding, team, growth) |
| Interview prep | `interview-prep` | Company-specific question prep |
| CV / PDF generation | `pdf` | ATS-optimized, keyword-injected resume per offer |
| Course / cert evaluation | `training` | Learning ROI assessment |
| Portfolio evaluation | `project` | Project impact analysis for resume content |
| Pipeline status check | `tracker` | Application tracker review |
| Form completion | `apply` | Playwright-driven form filling with cached evaluations |
| Portal scraping | `scan` | Zero-token Greenhouse / Ashby / Lever discovery |
| Batch processing | `batch` | Parallel multi-offer evaluation with fault tolerance |
| Pattern analysis | `patterns` | Rejection / targeting insights from history |

Each mode is a separate file under modes/. When the user invokes a mode (or the agent infers it from context), only that file’s instructions get loaded into the prompt.
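To make the loading rule concrete, here is a minimal Python sketch of "resolve one mode, read one file." The mode names come from the inventory above; the `<mode>.md` naming, the `mode_file` and `load_mode_context` helpers, and the idea of doing this in Python at all are assumptions for illustration, not how career-ops itself wires things up.

```python
from pathlib import Path

# Assumption: one Markdown file per mode under modes/, named "<mode>.md".
MODES_DIR = Path("modes")

# The 14 mode names listed in the inventory.
KNOWN_MODES = frozenset({
    "auto-pipeline", "oferta", "ofertas", "contacto", "deep",
    "interview-prep", "pdf", "training", "project", "tracker",
    "apply", "scan", "batch", "patterns",
})


def mode_file(mode: str, modes_dir: Path = MODES_DIR) -> Path:
    """Return the single instruction file that belongs to a mode."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return modes_dir / f"{mode}.md"


def load_mode_context(mode: str, modes_dir: Path = MODES_DIR) -> str:
    """Read only the selected mode's instructions, nothing else."""
    return mode_file(mode, modes_dir).read_text(encoding="utf-8")
```

The point of the sketch is the shape: the lookup is a pure function of the mode name, so adding a fifteenth mode means adding one file and one name, with no change to how the other fourteen load.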

Why Modes Beat One Long Prompt

Three concrete benefits emerge from this design:

1. Precision context loading. When the agent runs apply, it doesn’t load scoring rules. When it runs oferta, it doesn’t load form-filling logic. The model receives only what’s relevant.

This matters because Claude Code’s context window, while generous, isn’t free. Loading 8,000 tokens of irrelevant instructions on every interaction can degrade reasoning, because the model has to separate signal from noise on every turn.

2. Isolated testability. Changing PDF rendering rules in pdf mode doesn’t risk breaking evaluation logic in oferta. Each mode evolves independently.

3. Independent feature deployment. Santiago shipped training mode three weeks after the initial launch. The mode plugs in cleanly because it doesn’t depend on other modes’ internal state.

For comparison, consider the alternative: a single 12,000-line CLAUDE.md trying to handle every possible task. Adding a new capability means risking regressions across the entire system. Skills decomposition makes scaling sustainable.

The 10-Dimensional Evaluation Framework

When oferta mode evaluates a single job posting, it doesn’t return a vague “this is a fit” or “this isn’t.” It produces a structured score across 10 dimensions with explicit weights:

Gate-pass dimensions (kill switches, binary):

Role Match
Skills Alignment

If either fails, the evaluation halts with an early reject, so no tokens are spent on further analysis.

High-weight dimensions:

Seniority
Compensation
Interview Likelihood

Medium-weight dimensions:

Geography
Company Stage
Product-Market Fit
Growth Trajectory

Low-weight dimensions:

Timeline

The output combines numeric scores (1-5) with letter grades (A-F). Santiago notes that 74% of evaluated offers scored below 4.0—meaning without the system, he would have read the full job description for hundreds of postings that turned out not to fit.
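The gate-then-weight logic can be sketched in a few lines of Python. The dimension names and tiers are the ones listed above; the numeric weights (3, 2, 1), the grade cut-offs, and the choice to grade an early reject as "F" are assumptions made up for this example, since the article only names the tiers.

```python
from dataclasses import dataclass
from typing import Optional

# Gate-pass dimensions: binary, checked before anything else.
GATES = ("role_match", "skills_alignment")

# Hypothetical weights: high = 3, medium = 2, low = 1.
WEIGHTS = {
    "seniority": 3.0, "compensation": 3.0, "interview_likelihood": 3.0,
    "geography": 2.0, "company_stage": 2.0,
    "product_market_fit": 2.0, "growth_trajectory": 2.0,
    "timeline": 1.0,
}


@dataclass
class Evaluation:
    rejected_early: bool
    score: Optional[float]
    grade: str


def letter_grade(score: float) -> str:
    """Map a 1-5 score to a letter; the cut-offs are illustrative only."""
    for cutoff, grade in ((4.5, "A"), (4.0, "B"), (3.0, "C"), (2.0, "D")):
        if score >= cutoff:
            return grade
    return "F"


def evaluate(gates: dict, scores: dict) -> Evaluation:
    """Short-circuit on a failed gate, otherwise return the weighted mean."""
    if not all(gates.get(name, False) for name in GATES):
        return Evaluation(rejected_early=True, score=None, grade="F")
    total = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / total
    return Evaluation(False, round(weighted, 2), letter_grade(weighted))
```

Because a failed gate returns before any weighted dimension is read, the expensive part of the evaluation never runs for an obvious mismatch.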

What This Pattern Teaches About Evaluation Skills

The 10-dimensional framework isn’t unique to job evaluation. Any “should I do this?” decision system benefits from:

Explicit gates that short-circuit obviously bad cases
Weighted dimensions that capture trade-offs honestly
Standardized output (numeric + letter grade) for downstream filtering and aggregation
Reasoning over keyword matching so the system adapts to context

Apply this to investment evaluation, vendor selection, partnership decisions, or content approval, and you get a system that produces durable, comparable judgments instead of ad-hoc opinions.

HITL: Where Humans and Agents Draw the Line

A central design principle in career-ops is HITL (Human-In-The-Loop). Santiago is explicit:

“AI filters noise, humans provide judgment.”

The agent does the heavy lifting (reading 631 job descriptions, generating personalized resumes, filling in forms when authorized) but never makes the final commitment without human review.

This shows up in the AGENTS.md as explicit rules:

Never submit applications without user review.
Discourage applying to sub-4.0/5 offers (“quality over speed”).
Always verify offer liveness via Playwright (not WebSearch—too unreliable for transactional decisions).
Respect recruiter time—target 5 strong fits over 50 generic blasts.

The HITL philosophy is what makes the system trustworthy enough to actually use. Pure automation in a job search context would damage the user’s reputation; pure manual operation would defeat the purpose.

For your own Skills systems, the lesson is to identify the minimum viable human checkpoint—the smallest insertion of judgment that prevents catastrophic automation while preserving most of the speed benefits.
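Career-ops expresses its checkpoint as instruction-level rules in AGENTS.md rather than as code. If your own system needs a hard gate instead, one way to sketch it in Python is below. `Draft`, `ReviewRequired`, `warnings_for`, and `submit` are hypothetical names; only the 4.0 threshold and the "never submit without review" rule come from the article.

```python
from dataclasses import dataclass
from typing import Callable, List

MIN_RECOMMENDED_SCORE = 4.0  # threshold quoted in the rules above


@dataclass
class Draft:
    url: str
    score: float
    approved_by_user: bool = False  # flipped only after explicit human review


class ReviewRequired(Exception):
    """Raised when something tries to submit an unreviewed draft."""


def warnings_for(draft: Draft) -> List[str]:
    """Soft gate: discourage low-fit applications without blocking them."""
    if draft.score < MIN_RECOMMENDED_SCORE:
        return [f"Score {draft.score:.1f} is below {MIN_RECOMMENDED_SCORE:.1f}; "
                "consider skipping this offer."]
    return []


def submit(draft: Draft, send: Callable[[Draft], None]) -> bool:
    """Hard gate: refuse to call `send` until the user has approved."""
    if not draft.approved_by_user:
        raise ReviewRequired(f"{draft.url} has not been reviewed")
    send(draft)
    return True
```

The two gates are deliberately different: the score warning informs the human, while the approval flag is the only thing that can unblock the irreversible step.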

Tech Stack: Pragmatic, Not Trendy

Career-ops makes some interesting tooling choices:

Claude Code — agentic reasoning, content generation, the orchestration brain
Playwright — browser automation for portal navigation and form completion
Puppeteer — PDF rendering (with Playwright also handling some PDF cases)
Go — the dashboard / TUI for tracker visualization
Node.js — utility scripts (merge-tracker, normalize-statuses, dedup-tracker)
tmux — parallel session management for batch processing

Notably absent: a database. The system uses TSV files and Markdown reports. No Postgres, no Redis, no vector store. This is intentional: the entire system runs locally with no infrastructure dependencies.

The implications:

Zero hosting cost — runs on any laptop
Full data sovereignty — your CV, evaluations, and tracker never leave your machine
Easy forking — anyone can clone, customize, and own their copy
Trivial portability — copy the directory, run npm install, you’re operational

This stack philosophy (“local, file-based, MIT licensed”) is part of why the system gained adoption so quickly. There’s no signup wall, no rate limiting, no subscription. If it works for you, it works.

Inside the Repo: CLAUDE.md / AGENTS.md Setup

One of the more striking discoveries when you actually clone career-ops: the CLAUDE.md is two lines, 83 bytes. It contains essentially:

@AGENTS.md

That’s it. The CLAUDE.md is a pointer to AGENTS.md. All the actual instructions live in AGENTS.md.

Why this matters: it’s a reference implementation of how to handle the Claude Code / Codex / multi-tool reality. By writing the canonical instructions in AGENTS.md and aliasing CLAUDE.md to it, the same instructions work for:

Claude Code (which reads CLAUDE.md by default)
Codex CLI (which reads AGENTS.md by default)
Any future agentic CLI that adopts the AGENTS.md spec

For builders, this is a clean pattern. Treat AGENTS.md as the source of truth, point CLAUDE.md to it, and your repo works across the agentic CLI ecosystem without duplication.
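Setting the alias up in your own repo is a one-file job. The Python sketch below writes a CLAUDE.md that contains only the `@AGENTS.md` import line. That is an assumption about the minimal form of the pattern, not a byte-for-byte copy of the career-ops file, and `ensure_alias` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

ALIAS_LINE = "@AGENTS.md"  # Claude Code import syntax: "@" plus a relative path


def ensure_alias(repo_root: Path) -> Path:
    """Point CLAUDE.md at AGENTS.md so both CLIs read the same instructions.

    Note: this sketch overwrites any existing CLAUDE.md.
    """
    agents = repo_root / "AGENTS.md"
    if not agents.is_file():
        raise FileNotFoundError(f"{agents} must exist before aliasing to it")
    claude = repo_root / "CLAUDE.md"
    claude.write_text(ALIAS_LINE + "\n", encoding="utf-8")
    return claude


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "AGENTS.md").write_text("# Agent instructions\n", encoding="utf-8")
        print(ensure_alias(root).read_text(encoding="utf-8"), end="")
```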

What’s in AGENTS.md (the actual brain)

The AGENTS.md is structured as a layered onboarding sequence:

Two-layer separation rule (critical): User Layer (CV, profile, modes/_profile.md, tracker) is never auto-updated. System Layer (shared modes, scripts, templates) can be updated. Customizations always go to modes/_profile.md or config/profile.yml—never to modes/_shared.md—so updates don’t overwrite personalization.
Onboarding gates: before any evaluation, verify that cv.md, config/profile.yml, modes/_profile.md, and portals.yml exist. If missing, walk the user through setup step by step.
Mode descriptions: each of the 14 modes has a one-line description and trigger condition.
Quality and ethics rules: never auto-submit, discourage low-fit applications, verify liveness, respect recruiter time.
Pipeline management rules: 3-digit sequential report numbering, batch tracker additions, dedup workflow, canonical state machine.
Language support: default English, optional German / French / Japanese with localized vocabulary in modes/de/, modes/fr/, modes/ja/.

The structure is far more detailed than typical agent instruction files. This isn’t 200 lines of “be helpful and accurate”—it’s an operational runbook for a production system.

What This Teaches About Building Your Own Skills System

Reading career-ops as a study guide rather than as a job-search tool, several reusable patterns emerge.

1. Decompose by Trigger, Not by Subject

Most people would design a job-search system with modes like “evaluation,” “writing,” “automation.” Career-ops decomposes by trigger context: what is the user actually doing right now?

Pasting a URL → auto-pipeline
Asking for company research → deep
Reviewing the pipeline → tracker

Trigger-based decomposition beats subject-based decomposition because users think in actions, not categories.

2. Make the Default Path Effortless

auto-pipeline is the default end-to-end flow. The user pastes a URL, and the agent runs the entire pipeline (extract → evaluate → report → PDF → tracker) without further prompting. The 13 other modes are escape hatches for when the user wants finer control.

Most automation systems get this backwards: they require the user to specify every step. Career-ops makes the default path the fastest, with manual override always available.

3. Encode Quality Gates in the Instructions

Rather than hoping the user makes good decisions, the AGENTS.md actively discourages low-quality actions: “Discourage applying to sub-4.0/5 offers.” This isn’t a soft suggestion—it’s an instruction the agent will surface to the user.

For your own systems, identify where users tend to make low-leverage decisions and have the agent push back.

4. Use Files, Not Databases, Until You Must

Career-ops uses Markdown files and TSV tables. Reports are sequential numbered files. The tracker is a TSV. The evaluation history is a directory of reports.

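This makes the entire system inspectable, version-controllable, diffable, and portable. Move the same state into a database and you give up most of that: you can no longer open it in an editor, diff it in a commit, or copy it along with the directory.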
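As a rough illustration of how little code a file-based tracker needs, here is a Python sketch of an append-with-dedup operation on a TSV file. The column names, the `tracker.tsv` file name, the 3-digit `id`, and the URL normalization are all assumptions for this example; the career-ops repo ships its own Node.js scripts for this job, which are not reproduced here.

```python
import csv
import tempfile
from pathlib import Path
from urllib.parse import urlsplit

FIELDS = ["id", "url", "company", "score", "status"]  # hypothetical columns


def normalize_url(url: str) -> str:
    """Naive dedup key: lowercase host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def load_rows(path: Path) -> list:
    """Read the tracker; a missing file is simply an empty tracker."""
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def add_offer(path: Path, url: str, company: str, score: float) -> bool:
    """Append a row unless the URL is already tracked. Returns True if added."""
    rows = load_rows(path)
    key = normalize_url(url)
    if any(normalize_url(row["url"]) == key for row in rows):
        return False
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if new_file:
            writer.writeheader()
        writer.writerow({"id": f"{len(rows) + 1:03d}", "url": url,
                         "company": company, "score": f"{score:.1f}",
                         "status": "evaluated"})
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tracker = Path(tmp) / "tracker.tsv"
        print(add_offer(tracker, "https://jobs.example.com/role/123", "Example", 4.2))
        print(add_offer(tracker, "https://jobs.example.com/role/123/", "Example", 4.2))
```

Every state change here is an appended line in a text file, which is what keeps the history reviewable with ordinary tools such as `git diff`.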

5. Build Onboarding into the Agent

The agent verifies that required setup files exist before running any operation. If they don’t, it walks the user through creating them. This means you don’t need a separate setup script or wizard—the agent itself handles onboarding through conversational prompts.
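In career-ops this check is an instruction the agent follows, but the same gate is easy to express as code if you want a deterministic pre-flight step. The sketch below uses the four setup files named in the onboarding rule (cv.md, config/profile.yml, modes/_profile.md, portals.yml); `missing_setup_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# The four files the onboarding gate looks for.
REQUIRED_FILES = ("cv.md", "config/profile.yml", "modes/_profile.md", "portals.yml")


def missing_setup_files(root: Path) -> list:
    """Return the required files that do not exist yet, in a stable order."""
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cv.md").write_text("# CV\n", encoding="utf-8")
        for rel in missing_setup_files(root):
            print(f"Setup needed: {rel}")
```

An empty list means the agent can proceed; a non-empty one gives it an ordered checklist to walk the user through.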

6. Layer User Customization Carefully

The two-layer separation (user files vs system files, with explicit override locations like modes/_profile.md) is what allows the system to update without breaking personal customizations. This is the pattern most needed when multiple users fork the same Skills system.
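A minimal sketch of the same rule as an update guard, in Python: incoming files are split into "safe to apply" and "skip, user-owned." The user-layer entries cv.md, config/profile.yml, and modes/_profile.md come from the article; the tracker path `data/tracker.tsv` and the `plan_update` helper are assumptions for illustration.

```python
from pathlib import PurePosixPath

# User layer: never overwritten by an update.
# "data/tracker.tsv" is a hypothetical path standing in for the tracker.
USER_LAYER = frozenset({
    "cv.md",
    "config/profile.yml",
    "modes/_profile.md",
    "data/tracker.tsv",
})


def is_user_layer(rel_path: str) -> bool:
    """True if the path belongs to the user and must be left alone."""
    return PurePosixPath(rel_path).as_posix() in USER_LAYER


def plan_update(incoming: list) -> tuple:
    """Split incoming file paths into (apply, skip), preserving order."""
    apply = [p for p in incoming if not is_user_layer(p)]
    skip = [p for p in incoming if is_user_layer(p)]
    return apply, skip
```

For example, `plan_update(["modes/_shared.md", "modes/_profile.md"])` applies the shared file and skips the profile, which is the behavior that lets a fork pull upstream changes without losing personalization.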

7. Document the Tech Stack as Non-Negotiable Defaults

The AGENTS.md is explicit about which tool to use for which job: Playwright for liveness verification, not WebSearch. Puppeteer for PDF rendering. tmux for parallel sessions. By specifying the canonical tool, the agent doesn’t waste tokens choosing.

7 Patterns You Can Steal Today

If you’re building Skills now and want concrete, transferable patterns:

Two-layer file separation (user/system) with explicit override locations.
Trigger-based mode decomposition with one mode per primary user action.
Default-end-to-end auto-pipeline with single-mode escape hatches.
Multi-dimensional weighted scoring with explicit gate-pass criteria.
HITL checkpoints at the smallest viable insertion of human judgment.
File-based persistence (TSV, Markdown) instead of databases.
AGENTS.md as the source of truth, CLAUDE.md aliased via @AGENTS.md.

Each of these patterns can be lifted out of career-ops independently. You don’t need to clone the whole repo to benefit.

Limitations and Hidden Costs

The system isn’t free of trade-offs.

Setup complexity is real. Despite the 5-step quick start, fully customizing the archetypes, scoring weights, and tracker schemas to your situation takes meaningful time. Plan on 2-4 hours of upfront customization before you get production-quality output.

Playwright dependencies are heavy. Running playwright install chromium after cloning the repo adds ~300MB. If you don’t actually need browser automation, the modes that depend on Playwright are dead weight.

Mode authoring is its own discipline. Adding new modes that integrate cleanly with existing infrastructure requires understanding the file conventions, the shared rules, the personalization layer, and the tracker schema. It’s not as simple as “write another markdown file.”

Localization is partial. The system supports German, French, and Japanese, but the depth of localization varies. Don’t assume non-English versions match English in capability.

The pipeline assumes job-search semantics. Generalizing the system to non-job-search domains requires rewriting most of oferta, pdf, apply, and the scoring framework. The architecture transfers; the prompts don’t.

These aren’t reasons to avoid the system—they’re reasons to be honest about the work involved.