Coding Agent Swarms
The two-tier architecture for one-person engineering teams: an AI orchestrator with business context managing a fleet of specialized coding agents.
A single coding agent — Claude Code, Codex, or any CLI-based AI — sees code. It does not see your business. It does not know what the customer said in yesterday’s call, which feature requests matter most, or why a previous approach failed. It operates in a narrow context window filled with source files, and it does that well. But the gap between “write code” and “build the right thing” is bridged by something else entirely: an orchestration layer that holds the broader picture and translates business intent into precise coding tasks.
The coding agent swarm pattern separates these concerns into two tiers. The orchestrator holds business context — customer data, meeting notes, past decisions, domain knowledge — and translates that into well-scoped prompts. The coding agents receive those prompts and execute in isolated environments, focused entirely on implementation. The result is a system where a single person can drive the output of a small engineering team.
Fill the context window with code and you lose room for business context. Fill it with customer history and you lose the codebase. The two-tier architecture solves this by giving each agent exactly the context it needs — nothing more.
The Two-Tier Architecture
┌─────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR                                                │
│                                                             │
│ Business Context:                                           │
│ ├── Customer data & meeting notes                           │
│ ├── Product roadmap & priorities                            │
│ ├── Past decisions & what failed                            │
│ └── Domain knowledge & compliance rules                     │
│                                                             │
│ Responsibilities:                                           │
│ ├── Scope tasks from high-level requests                    │
│ ├── Write detailed prompts with full context                │
│ ├── Pick the right model/agent for each task                │
│ ├── Monitor progress and intervene when stuck               │
│ └── Notify humans when PRs are ready                        │
└──────────┬──────────────┬──────────────┬────────────────────┘
           │              │              │
           ▼              ▼              ▼
     ┌────────────┐ ┌────────────┐ ┌────────────┐
     │  AGENT 1   │ │  AGENT 2   │ │  AGENT 3   │
     │  Backend   │ │  Frontend  │ │  Bug fix   │
     │  feature   │ │  component │ │ Sentry #42 │
     │            │ │            │ │            │
     │ Worktree A │ │ Worktree B │ │ Worktree C │
     │ Branch A   │ │ Branch B   │ │ Branch C   │
     └────────────┘ └────────────┘ └────────────┘
           │              │              │
           ▼              ▼              ▼
        PR #101        PR #102        PR #103

The orchestrator never writes code. The coding agents never make product decisions. This separation is the key insight — specialization through context, not through different model capabilities.
Isolation: One Agent, One Worktree
Each coding agent runs in its own isolated environment. In practice, this means a separate git worktree with its own branch, its own installed dependencies, and its own terminal session. Isolation prevents agents from stepping on each other — no merge conflicts mid-task, no shared state corruption, no one agent’s build breaking another’s tests.
The setup for each agent follows a consistent pattern:
- Create a worktree branching from main
- Install dependencies in the isolated directory
- Launch the agent in a dedicated terminal session with its prompt
- Register the task in a tracking file for monitoring
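The setup steps above can be sketched as a small helper. The repo layout, branch naming, tmux usage, and registry schema here are illustrative assumptions, not a prescribed layout:

```python
import json
from pathlib import Path

def worktree_setup_commands(repo: str, task_id: str, prompt_file: str) -> list[str]:
    """Shell commands to isolate one agent in its own worktree (illustrative)."""
    worktree = f"{repo}-worktrees/{task_id}"
    branch = f"agent/{task_id}"
    return [
        # 1. Create a worktree branching from main
        f"git -C {repo} worktree add ../{worktree} -b {branch} main",
        # 2. Install dependencies in the isolated directory
        f"cd {worktree} && npm install",
        # 3. Launch the agent in a dedicated terminal session with its prompt
        f"tmux new-session -d -s {task_id} 'claude -p \"$(cat {prompt_file})\"'",
    ]

def register_task(registry_path: Path, task_id: str, branch: str) -> None:
    """Append the task to the JSON registry the monitor loop reads."""
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else []
    registry.append({"id": task_id, "branch": branch, "status": "running", "retries": 0})
    registry_path.write_text(json.dumps(registry, indent=2))
```

Keeping the registry as a plain JSON file means the monitor loop can read it without any API calls.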
| Strategy | Isolation Level | Trade-off |
|---|---|---|
| Git worktrees | Branch-level | Light on disk (shares .git), heavy on RAM (separate node_modules) |
| Docker containers | Process-level | Full isolation but higher setup overhead per agent |
| Cloud sandboxes | Machine-level | Unlimited parallelism but network latency and cost |
| Shared workspace | None | Simple but agents collide on file writes and git state |
Each agent with its own worktree needs its own dependency install, its own build process, and its own test runner. Five parallel agents means five TypeScript compilers, five test suites, five sets of dependencies in memory. Budget 3-4 GB per agent on a typical Node.js project. A machine with 16 GB tops out at 4-5 concurrent agents before swapping.
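The capacity arithmetic reduces to one line; the per-agent and headroom figures are rough assumptions taken from the estimates above:

```python
def max_concurrent_agents(total_ram_gb: float,
                          per_agent_gb: float = 3.5,
                          headroom_gb: float = 2.0) -> int:
    """How many agents fit before the machine starts swapping.

    per_agent_gb ~3.5 GB matches the 3-4 GB budget for a typical
    Node.js project; headroom_gb reserves RAM for the OS and editor.
    """
    return max(0, int((total_ram_gb - headroom_gb) // per_agent_gb))
```

A 16 GB machine yields 4 concurrent agents under these assumptions, matching the 4-5 figure above.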
The Monitoring Loop
Agents run autonomously, but they need supervision. A monitoring loop — running on a cron schedule or as a background process — checks on all active agents without expensive LLM calls. The monitor is deterministic: it reads the task registry, checks terminal sessions, inspects CI status, and only escalates to the orchestrator (or the human) when something needs attention.
Every N minutes:

┌─────────────────────────────────────────────────┐
│ READ task registry (JSON file)                  │
│ │                                               │
│ ├── For each active task:                       │
│ │   ├── Is the terminal session alive?          │
│ │   ├── Has a PR been opened?                   │
│ │   ├── Is CI passing?                          │
│ │   └── Have code reviews completed?            │
│ │                                               │
│ ├── Agent idle / crashed?                       │
│ │   └── Respawn with adjusted prompt            │
│ │       (max 3 retries)                         │
│ │                                               │
│ ├── CI failing?                                 │
│ │   └── Respawn agent with error context        │
│ │                                               │
│ ├── All checks passed?                          │
│ │   └── Notify human: "PR ready for review"     │
│ │                                               │
│ └── Max retries exceeded?                       │
│     └── Notify human: "Needs manual help"       │
└─────────────────────────────────────────────────┘
The critical detail: the monitor script itself uses zero LLM tokens. It checks terminal status, git branch state, and CI APIs — all deterministic operations. The orchestrator LLM only gets involved when a retry requires rewriting the prompt with new context.
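The monitor's decision logic can be a pure function over one task's registry state — a sketch with hypothetical field names, no LLM call anywhere:

```python
def next_action(task: dict, max_retries: int = 3) -> str:
    """Deterministic decision for one task; zero LLM tokens spent here."""
    if task["retries"] >= max_retries:
        return "notify_human_needs_help"
    if not task["session_alive"]:
        # Agent idle or crashed: respawn with an adjusted prompt
        return "respawn_with_adjusted_prompt"
    if task["pr_opened"] and task["ci_status"] == "failing":
        return "respawn_with_error_context"
    if task["pr_opened"] and task["ci_status"] == "passing" and task["reviews_done"]:
        return "notify_human_pr_ready"
    return "wait"
```

Populating the input fields is equally deterministic — terminal session status, `git` branch state, and the CI API — so the loop is cheap to run on a tight schedule.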
Definition of Done
A PR is not “done” when the agent creates it. An unsupervised agent will happily open a PR with failing tests and move on. The system needs a clear, automated definition of done before the human gets notified.
| Check | How It’s Verified | Automated? |
|---|---|---|
| PR created | GitHub API / gh CLI | Yes |
| Branch synced to main | No merge conflicts detected | Yes |
| CI passing | Lint, types, unit tests, E2E | Yes |
| AI code review | One or more model-based reviewers approve | Yes |
| Screenshots included | Required for UI changes | Yes (CI rule) |
| Human review | Final merge approval | No — intentionally manual |
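The automated rows of the table reduce to a simple gate before the human is notified; the check names here are hypothetical registry keys:

```python
# Checks that apply to every PR (human review is intentionally excluded)
REQUIRED_CHECKS = ["pr_created", "synced_to_main", "ci_passing", "ai_reviews_approved"]

def is_done(status: dict, ui_change: bool) -> bool:
    """True only when every automated check has passed.

    Screenshots are required only for UI changes, mirroring the CI rule.
    """
    checks = REQUIRED_CHECKS + (["screenshots_attached"] if ui_change else [])
    return all(status.get(check, False) for check in checks)
```

A missing key counts as a failed check, so a partially populated status record never notifies the human prematurely.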
Using multiple AI models for code review catches more issues than a single reviewer. Different models have different strengths — one may excel at logic errors and edge cases, another at security issues, a third at architecture concerns. The reviews post as PR comments, so the human reviewer sees a pre-analyzed summary.
Context-Aware Retries
When an agent fails, the naive approach is to respawn it with the same prompt. This works for transient failures but wastes tokens on systematic ones. The orchestrator should diagnose the failure and adjust the prompt accordingly, because it has context the coding agent does not.
Common failure modes and orchestrator responses:
- Agent ran out of context window: Narrow the scope. “Focus only on these three files.”
- Agent went in the wrong direction: Redirect with business context. “The customer wanted X, not Y. Here is what they said.”
- Agent needs information it does not have: Inject missing context. “The schema is defined in src/types/template.ts. Use that as the source of truth.”
- CI failed on the agent’s PR: Include the error output in the retry prompt. “The E2E test for the billing flow is failing with this error. Fix the root cause.”
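A retry amendment can be a lookup keyed by the diagnosed failure mode — a minimal sketch, with made-up mode names and templates:

```python
def adjust_prompt(original: str, failure_mode: str, context: str) -> str:
    """Append a failure-specific amendment to the original prompt (illustrative)."""
    amendments = {
        "context_overflow": "Narrow the scope. Focus only on these files:\n",
        "wrong_direction":  "Redirect: the customer asked for something else. "
                            "Here is what they said:\n",
        "missing_info":     "Use the following as the source of truth:\n",
        "ci_failure":       "The previous attempt failed CI. Fix the root cause of:\n",
    }
    return f"{original}\n\n{amendments[failure_mode]}{context}"
```

The `context` argument is where the orchestrator's advantage lives: it can paste in customer quotes, file paths, or CI logs that the coding agent had no way to know about.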
This is what makes the orchestrator more than a task dispatcher — it is a context-aware supervisor that improves the prompt on each retry based on what went wrong and why.
Track which prompt structures and context patterns lead to successful outcomes (CI passing, reviews approved, human merge). Over time, the orchestrator learns what works: “always include type definitions upfront for billing tasks,” “Codex needs the test file paths explicitly,” “frontend tasks ship faster with a screenshot spec.”
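A minimal outcome tracker might look like this; the pattern labels are whatever taxonomy you adopt for your prompts:

```python
from collections import defaultdict

class PromptStats:
    """Track which prompt patterns correlate with merged PRs (sketch)."""

    def __init__(self):
        self.outcomes = defaultdict(lambda: {"merged": 0, "failed": 0})

    def record(self, pattern: str, merged: bool) -> None:
        self.outcomes[pattern]["merged" if merged else "failed"] += 1

    def success_rate(self, pattern: str) -> float:
        counts = self.outcomes[pattern]
        total = counts["merged"] + counts["failed"]
        return counts["merged"] / total if total else 0.0
```

Even a crude success rate per pattern is enough to decide, say, that billing tasks should always get type definitions upfront.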
Agent Routing
Not all tasks suit the same model or tool. An effective orchestrator routes tasks based on the nature of the work:
| Task Type | Best Agent Profile | Why |
|---|---|---|
| Backend logic, multi-file refactors | Strong reasoning model | Requires cross-file understanding and careful logic |
| Frontend components, styling | Fast model with good code generation | More pattern-matching than deep reasoning |
| UI/UX design specs | Multimodal model | Can generate visual specs that coding agents implement |
| Bug investigation | Strong reasoning model with error context | Needs to trace through call stacks and reproduce issues |
| Documentation updates | Fast, inexpensive model | Low complexity, high volume |
| Security-sensitive changes | Most capable model + stricter review | Higher stakes justify higher cost |
The routing does not need to be perfect. Even approximate routing — sending backend tasks to stronger reasoners and frontend tasks to faster generators — meaningfully improves throughput and cost efficiency compared to using one model for everything.
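Approximate routing can be a static lookup with a safe default; the model labels here are placeholders, not real model names:

```python
# Task type -> agent profile; labels are illustrative placeholders
ROUTES = {
    "backend":  "strong-reasoning-model",
    "frontend": "fast-codegen-model",
    "design":   "multimodal-model",
    "bugfix":   "strong-reasoning-model",
    "docs":     "cheap-fast-model",
    "security": "most-capable-model",
}

def route(task_type: str) -> str:
    # Unknown categories fall back to the strongest reasoner:
    # overpaying on a misrouted task beats underpowering it
    return ROUTES.get(task_type, "strong-reasoning-model")
```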
Proactive Task Discovery
The most advanced orchestrators do not wait for human instructions. They scan available signals and generate tasks autonomously:
- Error monitoring: Scan Sentry or equivalent for new exceptions, spawn agents to investigate and fix
- Meeting notes: Parse recent meeting transcripts for feature requests and bug reports, create tasks
- Git activity: After merges, update changelogs, regenerate documentation, run regression tests
- Customer feedback: Scan support tickets for patterns, prioritize fixes
Proactive task discovery is powerful but risky. An overeager orchestrator can burn through API credits on low-priority work or make changes nobody asked for. Set clear boundaries: which signal sources the orchestrator can act on, spending limits per time window, and which task categories require human approval before spawning agents.
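Those boundaries are easy to express as a gate the orchestrator consults before spawning; the field names, sources, and categories here are illustrative:

```python
def may_spawn(task: dict, spent_usd: float, budget_usd: float,
              allowed_sources=("sentry", "meeting_notes"),
              needs_approval=("security", "billing")) -> str:
    """Gate autonomous task creation behind source, budget, and approval rules."""
    if task["source"] not in allowed_sources:
        return "reject"          # not a signal source we act on autonomously
    if spent_usd >= budget_usd:
        return "defer"           # out of budget for this time window
    if task["category"] in needs_approval:
        return "ask_human"       # high-stakes categories need sign-off
    return "spawn"
```

The key property is that "do nothing" outcomes (`reject`, `defer`) cost zero tokens, so an overeager discovery scan cannot translate directly into spend.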
The Full Workflow
Putting it all together, a typical end-to-end workflow looks like this:
- Input arrives — a customer request, a bug report, a feature idea
- Orchestrator scopes the work — using business context to translate the request into a specific, implementable task
- Orchestrator spawns agent(s) — each in an isolated worktree with a tailored prompt containing exactly the context needed
- Agents execute autonomously — writing code, running tests, creating PRs
- Monitor loop supervises — checking health, CI status, and review state on a schedule
- Failed agents get retried — with adjusted prompts based on the failure mode
- Automated reviews run — multiple AI models review the PR and post findings
- Human gets notified — only when everything passes: CI green, reviews approved, screenshots attached
- Human reviews and merges — a 5-10 minute review of a pre-validated PR
- Cleanup — orphaned worktrees and task records are pruned on a schedule
The human’s role shifts from writing code to reviewing PRs and making product decisions. The orchestrator handles the translation from intent to implementation. The coding agents handle the implementation itself.
Practical Limits
Memory constraints. Each parallel agent consumes significant RAM — its own dependency tree, build tools, and test runners. Plan for 3-4 GB per agent on a typical JavaScript/TypeScript project. Machines with 16 GB support 4-5 concurrent agents; scaling beyond that requires 64-128 GB or cloud-based sandboxes.
Coordination overhead. If two agents touch overlapping files, their PRs will conflict at merge time. The orchestrator should partition work to minimize overlap, or sequence dependent tasks rather than parallelizing them.
Cost management. Each agent consumes API tokens. With 5+ agents running simultaneously, costs can escalate quickly. Route simple tasks to cheaper models, set per-task token budgets, and track cost-per-merged-PR as an efficiency metric.
Diminishing returns. Not every task benefits from this architecture. A one-line bug fix does not need an orchestrator, a worktree, and three AI reviewers. Reserve the full swarm workflow for medium-to-large tasks where the setup overhead is amortized across meaningful implementation work.
When to Use This Pattern
The coding agent swarm pattern delivers the most value when:
- You are a solo developer or small team with more work than hands
- Tasks are parallelizable — multiple features, bug fixes, or refactors that do not depend on each other
- You have strong CI/CD — automated tests and linting that catch regressions without manual testing
- The codebase is well-structured — clear module boundaries make it safe for independent agents to work in different areas
- You can afford the infrastructure cost — RAM, API tokens, and the time to set up the orchestration layer
Start simple: one orchestrator, one coding agent, one worktree. Get the monitoring loop and definition of done right. Then scale the number of parallel agents as you build confidence in the system’s reliability.