Coding Agent Swarms
The two-tier architecture for one-person engineering teams: an AI orchestrator with business context managing a fleet of specialized coding agents.
A single coding agent — Claude Code, Codex, or any CLI-based AI — sees code. It does not see your business. It does not know what the customer said in yesterday’s call, which feature requests matter most, or why a previous approach failed. It operates in a narrow context window filled with source files, and it does that well. But the gap between “write code” and “build the right thing” is bridged by something else entirely: an orchestration layer that holds the broader picture and translates business intent into precise coding tasks.
The coding agent swarm pattern separates these concerns into two tiers. The orchestrator holds business context — customer data, meeting notes, past decisions, domain knowledge — and translates that into well-scoped prompts. The coding agents receive those prompts and execute in isolated environments, focused entirely on implementation. The result is a system where a single person can drive the output of a small engineering team.
Fill the context window with code and you lose room for business context. Fill it with customer history and you lose the codebase. The two-tier architecture solves this by giving each agent exactly the context it needs — nothing more.
The Two-Tier Architecture
┌─────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR                                                │
│                                                             │
│ Business Context:                                           │
│ ├── Customer data & meeting notes                           │
│ ├── Product roadmap & priorities                            │
│ ├── Past decisions & what failed                            │
│ └── Domain knowledge & compliance rules                     │
│                                                             │
│ Responsibilities:                                           │
│ ├── Scope tasks from high-level requests                    │
│ ├── Write detailed prompts with full context                │
│ ├── Pick the right model/agent for each task                │
│ ├── Monitor progress and intervene when stuck               │
│ └── Notify humans when PRs are ready                        │
└──────────┬──────────────┬──────────────┬────────────────────┘
           │              │              │
           ▼              ▼              ▼
     ┌────────────┐ ┌────────────┐ ┌────────────┐
     │  AGENT 1   │ │  AGENT 2   │ │  AGENT 3   │
     │  Backend   │ │  Frontend  │ │  Bug fix   │
     │  feature   │ │  component │ │ Sentry #42 │
     │            │ │            │ │            │
     │ Worktree A │ │ Worktree B │ │ Worktree C │
     │ Branch A   │ │ Branch B   │ │ Branch C   │
     └────────────┘ └────────────┘ └────────────┘
           │              │              │
           ▼              ▼              ▼
        PR #101        PR #102        PR #103

The orchestrator never writes code. The coding agents never make product decisions. This separation is the key insight — specialization through context, not through different model capabilities.
Isolation: One Agent, One Worktree
Each coding agent runs in its own isolated environment. In practice, this means a separate git worktree with its own branch, its own installed dependencies, and its own terminal session. Isolation prevents agents from stepping on each other — no merge conflicts mid-task, no shared state corruption, no one agent’s build breaking another’s tests.
The setup for each agent follows a consistent pattern:
- Create a worktree branching from main
- Install dependencies in the isolated directory
- Launch the agent in a dedicated terminal session with its prompt
- Register the task in a tracking file for monitoring
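The setup steps above can be sketched as a small helper. The repo layout, branch naming, tmux usage, and registry schema here are illustrative assumptions, not a prescribed layout:

```python
import json
from pathlib import Path

def worktree_setup_commands(repo: str, task_id: str, prompt_file: str) -> list[str]:
    """Shell commands to isolate one agent in its own worktree (illustrative)."""
    worktree = f"{repo}-worktrees/{task_id}"
    branch = f"agent/{task_id}"
    return [
        # 1. Create a worktree branching from main
        f"git -C {repo} worktree add ../{worktree} -b {branch} main",
        # 2. Install dependencies in the isolated directory
        f"cd {worktree} && npm install",
        # 3. Launch the agent in a dedicated terminal session with its prompt
        f"tmux new-session -d -s {task_id} 'claude -p \"$(cat {prompt_file})\"'",
    ]

def register_task(registry_path: Path, task_id: str, branch: str) -> None:
    """Append the task to the JSON registry the monitor loop reads."""
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else []
    registry.append({"id": task_id, "branch": branch, "status": "running", "retries": 0})
    registry_path.write_text(json.dumps(registry, indent=2))
```

Keeping the registry as a plain JSON file means the monitor loop can read it without any API calls.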
| Strategy | Isolation Level | Trade-off |
|---|---|---|
| Git worktrees | Branch-level | Light on disk (shares .git), heavy on RAM (separate node_modules) |
| Docker containers | Process-level | Full isolation but higher setup overhead per agent |
| Cloud sandboxes | Machine-level | Unlimited parallelism but network latency and cost |
| Shared workspace | None | Simple but agents collide on file writes and git state |
Each agent with its own worktree needs its own dependency install, its own build process, and its own test runner. Five parallel agents means five TypeScript compilers, five test suites, five sets of dependencies in memory. Budget 3-4 GB per agent on a typical Node.js project. A machine with 16 GB tops out at 4-5 concurrent agents before swapping.
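The capacity arithmetic reduces to one line; the per-agent and headroom figures are rough assumptions taken from the estimates above:

```python
def max_concurrent_agents(total_ram_gb: float,
                          per_agent_gb: float = 3.5,
                          headroom_gb: float = 2.0) -> int:
    """How many agents fit before the machine starts swapping.

    per_agent_gb ~3.5 GB matches the 3-4 GB budget for a typical
    Node.js project; headroom_gb reserves RAM for the OS and editor.
    """
    return max(0, int((total_ram_gb - headroom_gb) // per_agent_gb))
```

A 16 GB machine yields 4 concurrent agents under these assumptions, matching the 4-5 figure above.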
The Monitoring Loop
Agents run autonomously, but they need supervision. A monitoring loop — running on a cron schedule or as a background process — checks on all active agents without expensive LLM calls. The monitor is deterministic: it reads the task registry, checks terminal sessions, inspects CI status, and only escalates to the orchestrator (or the human) when something needs attention.
Every N minutes:

┌─────────────────────────────────────────────────┐
│ READ task registry (JSON file)                  │
│ │                                               │
│ ├── For each active task:                       │
│ │   ├── Is the terminal session alive?          │
│ │   ├── Has a PR been opened?                   │
│ │   ├── Is CI passing?                          │
│ │   └── Have code reviews completed?            │
│ │                                               │
│ ├── Agent idle / crashed?                       │
│ │   └── Respawn with adjusted prompt            │
│ │       (max 3 retries)                         │
│ │                                               │
│ ├── CI failing?                                 │
│ │   └── Respawn agent with error context        │
│ │                                               │
│ ├── All checks passed?                          │
│ │   └── Notify human: "PR ready for review"     │
│ │                                               │
│ └── Max retries exceeded?                       │
│     └── Notify human: "Needs manual help"       │
└─────────────────────────────────────────────────┘
The critical detail: the monitor script itself uses zero LLM tokens. It checks terminal status, git branch state, and CI APIs — all deterministic operations. The orchestrator LLM only gets involved when a retry requires rewriting the prompt with new context.
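The monitor's decision logic can be a pure function over one task's registry state — a sketch with hypothetical field names, no LLM call anywhere:

```python
def next_action(task: dict, max_retries: int = 3) -> str:
    """Deterministic decision for one task; zero LLM tokens spent here."""
    if task["retries"] >= max_retries:
        return "notify_human_needs_help"
    if not task["session_alive"]:
        # Agent idle or crashed: respawn with an adjusted prompt
        return "respawn_with_adjusted_prompt"
    if task["pr_opened"] and task["ci_status"] == "failing":
        return "respawn_with_error_context"
    if task["pr_opened"] and task["ci_status"] == "passing" and task["reviews_done"]:
        return "notify_human_pr_ready"
    return "wait"
```

Populating the input fields is equally deterministic — terminal session status, `git` branch state, and the CI API — so the loop is cheap to run on a tight schedule.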
Definition of Done
A PR is not “done” when the agent creates it. An unsupervised agent will happily open a PR with failing tests and move on. The system needs a clear, automated definition of done before the human gets notified.
| Check | How It’s Verified | Automated? |
|---|---|---|
| PR created | GitHub API / gh CLI | Yes |
| Branch synced to main | No merge conflicts detected | Yes |
| CI passing | Lint, types, unit tests, E2E | Yes |
| AI code review | One or more model-based reviewers approve | Yes |
| Screenshots included | Required for UI changes | Yes (CI rule) |
| Human review | Final merge approval | No — intentionally manual |
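The automated rows of the table reduce to a simple gate before the human is notified; the check names here are hypothetical registry keys:

```python
# Checks that apply to every PR (human review is intentionally excluded)
REQUIRED_CHECKS = ["pr_created", "synced_to_main", "ci_passing", "ai_reviews_approved"]

def is_done(status: dict, ui_change: bool) -> bool:
    """True only when every automated check has passed.

    Screenshots are required only for UI changes, mirroring the CI rule.
    """
    checks = REQUIRED_CHECKS + (["screenshots_attached"] if ui_change else [])
    return all(status.get(check, False) for check in checks)
```

A missing key counts as a failed check, so a partially populated status record never notifies the human prematurely.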
Using multiple AI models for code review catches more issues than a single reviewer. Different models have different strengths — one may excel at logic errors and edge cases, another at security issues, a third at architecture concerns. The reviews post as PR comments, so the human reviewer sees a pre-analyzed summary.
Context-Aware Retries
When an agent fails, the naive approach is to respawn it with the same prompt. This works for transient failures but wastes tokens on systematic ones. The orchestrator should diagnose the failure and adjust the prompt accordingly, because it has context the coding agent does not.
Common failure modes and orchestrator responses:
- Agent ran out of context window: Narrow the scope. “Focus only on these three files.”
- Agent went in the wrong direction: Redirect with business context. “The customer wanted X, not Y. Here is what they said.”
- Agent needs information it does not have: Inject missing context. “The schema is defined in src/types/template.ts. Use that as the source of truth.”
- CI failed on the agent’s PR: Include the error output in the retry prompt. “The E2E test for the billing flow is failing with this error. Fix the root cause.”
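A retry amendment can be a lookup keyed by the diagnosed failure mode — a minimal sketch, with made-up mode names and templates:

```python
def adjust_prompt(original: str, failure_mode: str, context: str) -> str:
    """Append a failure-specific amendment to the original prompt (illustrative)."""
    amendments = {
        "context_overflow": "Narrow the scope. Focus only on these files:\n",
        "wrong_direction":  "Redirect: the customer asked for something else. "
                            "Here is what they said:\n",
        "missing_info":     "Use the following as the source of truth:\n",
        "ci_failure":       "The previous attempt failed CI. Fix the root cause of:\n",
    }
    return f"{original}\n\n{amendments[failure_mode]}{context}"
```

The `context` argument is where the orchestrator's advantage lives: it can paste in customer quotes, file paths, or CI logs that the coding agent had no way to know about.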
This is what makes the orchestrator more than a task dispatcher — it is a context-aware supervisor that improves the prompt on each retry based on what went wrong and why.
Track which prompt structures and context patterns lead to successful outcomes (CI passing, reviews approved, human merge). Over time, the orchestrator learns what works: “always include type definitions upfront for billing tasks,” “Codex needs the test file paths explicitly,” “frontend tasks ship faster with a screenshot spec.”
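A minimal outcome tracker might look like this; the pattern labels are whatever taxonomy you adopt for your prompts:

```python
from collections import defaultdict

class PromptStats:
    """Track which prompt patterns correlate with merged PRs (sketch)."""

    def __init__(self):
        self.outcomes = defaultdict(lambda: {"merged": 0, "failed": 0})

    def record(self, pattern: str, merged: bool) -> None:
        self.outcomes[pattern]["merged" if merged else "failed"] += 1

    def success_rate(self, pattern: str) -> float:
        counts = self.outcomes[pattern]
        total = counts["merged"] + counts["failed"]
        return counts["merged"] / total if total else 0.0
```

Even a crude success rate per pattern is enough to decide, say, that billing tasks should always get type definitions upfront.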
Agent Routing
Not all tasks suit the same model or tool. An effective orchestrator routes tasks based on the nature of the work:
| Task Type | Best Agent Profile | Why |
|---|---|---|
| Backend logic, multi-file refactors | Strong reasoning model | Requires cross-file understanding and careful logic |
| Frontend components, styling | Fast model with good code generation | More pattern-matching than deep reasoning |
| UI/UX design specs | Multimodal model | Can generate visual specs that coding agents implement |
| Bug investigation | Strong reasoning model with error context | Needs to trace through call stacks and reproduce issues |
| Documentation updates | Fast, inexpensive model | Low complexity, high volume |
| Security-sensitive changes | Most capable model + stricter review | Higher stakes justify higher cost |
The routing does not need to be perfect. Even approximate routing — sending backend tasks to stronger reasoners and frontend tasks to faster generators — meaningfully improves throughput and cost efficiency compared to using one model for everything.
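Approximate routing can be a static lookup with a safe default; the model labels here are placeholders, not real model names:

```python
# Task type -> agent profile; labels are illustrative placeholders
ROUTES = {
    "backend":  "strong-reasoning-model",
    "frontend": "fast-codegen-model",
    "design":   "multimodal-model",
    "bugfix":   "strong-reasoning-model",
    "docs":     "cheap-fast-model",
    "security": "most-capable-model",
}

def route(task_type: str) -> str:
    # Unknown categories fall back to the strongest reasoner:
    # overpaying on a misrouted task beats underpowering it
    return ROUTES.get(task_type, "strong-reasoning-model")
```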
Proactive Task Discovery
The most advanced orchestrators do not wait for human instructions. They scan available signals and generate tasks autonomously:
- Error monitoring: Scan Sentry or equivalent for new exceptions, spawn agents to investigate and fix
- Meeting notes: Parse recent meeting transcripts for feature requests and bug reports, create tasks
- Git activity: After merges, update changelogs, regenerate documentation, run regression tests
- Customer feedback: Scan support tickets for patterns, prioritize fixes
Proactive task discovery is powerful but risky. An overeager orchestrator can burn through API credits on low-priority work or make changes nobody asked for. Set clear boundaries: which signal sources the orchestrator can act on, spending limits per time window, and which task categories require human approval before spawning agents.
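Those boundaries are easy to express as a gate the orchestrator consults before spawning; the field names, sources, and categories here are illustrative:

```python
def may_spawn(task: dict, spent_usd: float, budget_usd: float,
              allowed_sources=("sentry", "meeting_notes"),
              needs_approval=("security", "billing")) -> str:
    """Gate autonomous task creation behind source, budget, and approval rules."""
    if task["source"] not in allowed_sources:
        return "reject"          # not a signal source we act on autonomously
    if spent_usd >= budget_usd:
        return "defer"           # out of budget for this time window
    if task["category"] in needs_approval:
        return "ask_human"       # high-stakes categories need sign-off
    return "spawn"
```

The key property is that "do nothing" outcomes (`reject`, `defer`) cost zero tokens, so an overeager discovery scan cannot translate directly into spend.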
The Full Workflow
Putting it all together, a typical end-to-end workflow looks like this:
- Input arrives — a customer request, a bug report, a feature idea
- Orchestrator scopes the work — using business context to translate the request into a specific, implementable task
- Orchestrator spawns agent(s) — each in an isolated worktree with a tailored prompt containing exactly the context needed
- Agents execute autonomously — writing code, running tests, creating PRs
- Monitor loop supervises — checking health, CI status, and review state on a schedule
- Failed agents get retried — with adjusted prompts based on the failure mode
- Automated reviews run — multiple AI models review the PR and post findings
- Human gets notified — only when everything passes: CI green, reviews approved, screenshots attached
- Human reviews and merges — a 5-10 minute review of a pre-validated PR
- Cleanup — orphaned worktrees and task records are pruned on a schedule
The human’s role shifts from writing code to reviewing PRs and making product decisions. The orchestrator handles the translation from intent to implementation. The coding agents handle the implementation itself.
Practical Limits
Memory constraints. Each parallel agent consumes significant RAM — its own dependency tree, build tools, and test runners. Plan for 3-4 GB per agent on a typical JavaScript/TypeScript project. Machines with 16 GB support 4-5 concurrent agents; scaling beyond that requires 64-128 GB or cloud-based sandboxes.
Coordination overhead. If two agents touch overlapping files, their PRs will conflict at merge time. The orchestrator should partition work to minimize overlap, or sequence dependent tasks rather than parallelizing them.
Cost management. Each agent consumes API tokens. With 5+ agents running simultaneously, costs can escalate quickly. Route simple tasks to cheaper models, set per-task token budgets, and track cost-per-merged-PR as an efficiency metric.
Diminishing returns. Not every task benefits from this architecture. A one-line bug fix does not need an orchestrator, a worktree, and three AI reviewers. Reserve the full swarm workflow for medium-to-large tasks where the setup overhead is amortized across meaningful implementation work.
When to Use This Pattern
The coding agent swarm pattern delivers the most value when:
- You are a solo developer or small team with more work than hands
- Tasks are parallelizable — multiple features, bug fixes, or refactors that do not depend on each other
- You have strong CI/CD — automated tests and linting that catch regressions without manual testing
- The codebase is well-structured — clear module boundaries make it safe for independent agents to work in different areas
- You can afford the infrastructure cost — RAM, API tokens, and the time to set up the orchestration layer
Start simple: one orchestrator, one coding agent, one worktree. Get the monitoring loop and definition of done right. Then scale the number of parallel agents as you build confidence in the system’s reliability.