Adaptive Subagent Routing
FIRST: Decompose Before Acting
Do NOT start working on the task directly. Before reading files or making edits, decompose the task into subtasks and delegate each one to the cheapest sufficient model via the Agent tool. The only exception is if the entire task is under ~100 tokens of output (a one-liner fix).Quick Decision (apply to EVERY delegation)
Default: Haiku. Escalate only with reason.1. Is this a mechanical lookup? (find files, grep for a string, list structure, check if something exists) → Haiku 2. Does it require comprehension or writing? (implement, refactor, test, debug, fix, analyze code, explain logic, summarize complex systems) → Sonnet 3. Is it architectural, security-sensitive, or ambiguous? → Opus
Haiku boundary: Haiku can *find* things and *format* things. It cannot *understand* or *reason about* code. If the task requires reading code and making a judgment, that's Sonnet.If unsure between two tiers, pick the cheaper one. Escalate after failure, not before.
Before each delegation, output `Routing to {Model}: {brief reason}` on its own line. When handling work inline, output `Staying on {Model}: {brief reason}` instead.
When to Delegate vs Handle Inline
Always delegate to a cheaper tier — the 3x–15x cost savings always outweigh subagent overhead. Skip delegation only when:Routing Rules
| Haiku (1x) | Sonnet (3x) | Opus (15x) | |---|---|---| | File search, grep, glob | Implementation, bug fix | Architecture, migration | | Format/lint existing text | Refactor (< 10 files) | Security review | | Check if file/string exists | Test writing | Ambiguous/conflicting scope | | List files, directory structure | Debugging with stack traces | Multi-system design | | Literal string replacements | Code generation (< 500 lines) | Performance analysis | | | Summarize/analyze code logic | | | | Rename across multiple files | | | | Code review, explain code | |
Escalate +1 tier: ambiguous requirements, public-facing output, security-sensitive, production-critical, or 2 consecutive failures. Downgrade: after planning completes, or remaining work is formatting/summarization.Cost Ratios
| Model | Relative Cost | Pricing | |---|---|---| | Haiku (1x) | Baseline | ~$1/M input, $5/M output | | Sonnet (3x) | 3x Haiku | ~$3/M input, $15/M output | | Opus (15x) | 15x Haiku | ~$15/M input, $75/M output |
A 10-delegation task routed to Haiku instead of Opus saves ~93% on those calls.
Per-Subtask Routing
When a request involves multiple steps:
1. Decompose first. Break the request into a todo list of subtasks. 2. Route each subtask independently. Most items are cheaper than the overall task. 3. Downgrade after planning. If Opus produces a plan, implementation goes to Sonnet. Lookups and cleanup go to Haiku.
Example — "design and implement a caching layer":
Delegation
Choose agent type by task and model by complexity independently.
| Agent Type | Use When | Tools Available | |---|---|---| | `Explore` | Search, grep, read-only exploration | Read, Glob, Grep (no Edit/Write) | | `general-purpose` | Implementation, edits, multi-step work | All tools | | `Plan` | Architecture design, implementation planning | Read, Glob, Grep (no Edit/Write) |
Combine freely — `Explore` + `haiku` for file lookups, `general-purpose` + `sonnet` for implementation, `Explore` + `sonnet` for complex investigation, `Plan` + `sonnet` for straightforward design.
``` Agent(subagent_type: "Explore", model: "haiku", description: "...", prompt: "...") Agent(subagent_type: "general-purpose", model: "sonnet", description: "...", prompt: "...") Agent(subagent_type: "Explore", model: "sonnet", description: "...", prompt: "...") Agent(subagent_type: "Plan", model: "sonnet", description: "...", prompt: "...") ```
Keep prompts scoped: relevant files, constraints, expected output format. The `description` parameter is required — use a short 3-5 word summary.
Expected Output Formats
| Tier | Expected Output | |---|---| | Haiku | File paths, grep results, existence checks, formatted text | | Sonnet | Code diffs, implementation files, test files, error analysis | | Opus | Numbered plan steps, architecture decisions with rationale, risk assessment |
Parallel Fan-Out
When a request contains independent subtasks, launch multiple subagents in a single message.
Before splitting, build a dependency graph. Does any subtask need another's output, or do they modify the same file? If yes, run them sequentially.| Pattern | Parallel? | Why | |---|---|---| | Separate files, separate concerns | Yes | No shared state | | Research + unrelated implementation | Yes | Read-only doesn't conflict with writes | | Implementation + its tests | No | Tests depend on the implementation | | Feature + docs describing that feature | No | Docs need to reflect what was built | | Two features touching different modules | Yes | Independent code paths | | Refactor + anything in same files | No | Refactor changes the baseline |
After all phases return, review outputs for conflicts before applying.
User Overrides
If the user says any of the following, respect it for the current request:
Log the override with `Override: {what the user requested}` and resume normal routing on the next request.
Routing Log
After each routing decision, append a line to `routing.log` in the project root using the Bash tool:
``` echo "{model} | {agent_type or inline} | {estimated savings vs Opus, e.g. ~93%} | {brief task description}" >> routing.log ```
Log every `Routing to` and `Staying on` decision.