danielhuber.dev@proton.me

Coding Agent Swarms

The two-tier architecture for one-person engineering teams: an AI orchestrator with business context managing a fleet of specialized coding agents.


February 24, 2026

A single coding agent — Claude Code, Codex, or any CLI-based AI — sees code. It does not see your business. It does not know what the customer said in yesterday’s call, which feature requests matter most, or why a previous approach failed. It operates in a narrow context window filled with source files, and it does that well. But the gap between “write code” and “build the right thing” is bridged by something else entirely: an orchestration layer that holds the broader picture and translates business intent into precise coding tasks.

The coding agent swarm pattern separates these concerns into two tiers. The orchestrator holds business context — customer data, meeting notes, past decisions, domain knowledge — and translates that into well-scoped prompts. The coding agents receive those prompts and execute in isolated environments, focused entirely on implementation. The result is a system where a single person can drive the output of a small engineering team.

Context Windows Are Zero-Sum

Fill the context window with code and you lose room for business context. Fill it with customer history and you lose the codebase. The two-tier architecture solves this by giving each agent exactly the context it needs — nothing more.

The Two-Tier Architecture

Orchestrator + Agent Swarm
┌─────────────────────────────────────────────────────────────┐
│                      ORCHESTRATOR                            │
│                                                              │
│  Business Context:                                           │
│  ├── Customer data & meeting notes                           │
│  ├── Product roadmap & priorities                            │
│  ├── Past decisions & what failed                            │
│  └── Domain knowledge & compliance rules                     │
│                                                              │
│  Responsibilities:                                           │
│  ├── Scope tasks from high-level requests                    │
│  ├── Write detailed prompts with full context                │
│  ├── Pick the right model/agent for each task                │
│  ├── Monitor progress and intervene when stuck               │
│  └── Notify humans when PRs are ready                        │
└──────────┬──────────────┬──────────────┬────────────────────┘
         │              │              │
         ▼              ▼              ▼
  ┌────────────┐ ┌────────────┐ ┌────────────┐
  │  AGENT 1   │ │  AGENT 2   │ │  AGENT 3   │
  │  Backend   │ │  Frontend  │ │  Bug fix   │
  │  feature   │ │  component │ │ Sentry #42 │
  │            │ │            │ │            │
  │ Worktree A │ │ Worktree B │ │ Worktree C │
  │ Branch A   │ │ Branch B   │ │ Branch C   │
  └────────────┘ └────────────┘ └────────────┘
         │              │              │
         ▼              ▼              ▼
      PR #101        PR #102        PR #103

The orchestrator never writes code. The coding agents never make product decisions. This separation is the key insight — specialization through context, not through different model capabilities.

Isolation: One Agent, One Worktree

Each coding agent runs in its own isolated environment. In practice, this means a separate git worktree with its own branch, its own installed dependencies, and its own terminal session. Isolation prevents agents from stepping on each other — no merge conflicts mid-task, no shared state corruption, no one agent’s build breaking another’s tests.

The setup for each agent follows a consistent pattern:

  1. Create a worktree branching from main
  2. Install dependencies in the isolated directory
  3. Launch the agent in a dedicated terminal session with its prompt
  4. Register the task in a tracking file for monitoring
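
As a sketch, those four steps can be expressed as a small helper that builds the shell commands to provision one agent. The paths, the `tasks.jsonl` registry name, and the tmux session naming are all illustrative assumptions, not a fixed convention:

```typescript
// Build the shell commands that provision one isolated agent.
// Worktree location, registry file, and tmux usage are assumptions.
interface AgentTask {
  id: string;      // e.g. "task-42"
  branch: string;  // e.g. "agent/fix-sentry-42"
  prompt: string;  // the full prompt written by the orchestrator
}

function provisionCommands(task: AgentTask, repoRoot = "."): string[] {
  const worktree = `${repoRoot}/.worktrees/${task.id}`;
  return [
    // 1. Create a worktree branching from main
    `git worktree add ${worktree} -b ${task.branch} main`,
    // 2. Install dependencies in the isolated directory
    `cd ${worktree} && npm install`,
    // 3. Launch the agent in a dedicated terminal session with its prompt
    `tmux new-session -d -s ${task.id} 'claude -p "${task.prompt}"'`,
    // 4. Register the task in a tracking file for monitoring
    `echo '${JSON.stringify({ ...task, worktree, status: "running" })}' >> tasks.jsonl`,
  ];
}
```

Running the commands in order gives the monitor everything it needs later: a branch to watch, a session to health-check, and a registry entry to read.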

Isolation Strategies

  Strategy          | Isolation Level | Trade-off
  Git worktrees     | Branch-level    | Light on disk (shares .git), heavy on RAM (separate node_modules)
  Docker containers | Process-level   | Full isolation but higher setup overhead per agent
  Cloud sandboxes   | Machine-level   | Unlimited parallelism but network latency and cost
  Shared workspace  | None            | Simple but agents collide on file writes and git state

The Monitoring Loop

Agents run autonomously, but they need supervision. A monitoring loop — running on a cron schedule or as a background process — checks on all active agents without expensive LLM calls. The monitor is deterministic: it reads the task registry, checks terminal sessions, inspects CI status, and only escalates to the orchestrator (or the human) when something needs attention.

Agent Monitoring Loop
Every N minutes:
┌─────────────────────────────────────────────────┐
│ READ task registry (JSON file)                   │
│    │                                             │
│    ├── For each active task:                     │
│    │   ├── Is the terminal session alive?        │
│    │   ├── Has a PR been opened?                 │
│    │   ├── Is CI passing?                        │
│    │   └── Have code reviews completed?          │
│    │                                             │
│    ├── Agent idle / crashed?                     │
│    │   └── Respawn with adjusted prompt          │
│    │       (max 3 retries)                       │
│    │                                             │
│    ├── CI failing?                               │
│    │   └── Respawn agent with error context      │
│    │                                             │
│    ├── All checks passed?                        │
│    │   └── Notify human: "PR ready for review"   │
│    │                                             │
│    └── Max retries exceeded?                     │
│        └── Notify human: "Needs manual help"     │
└─────────────────────────────────────────────────┘

The critical detail: the monitor script itself uses zero LLM tokens. It checks terminal status, git branch state, and CI APIs — all deterministic operations. The orchestrator LLM only gets involved when a retry requires rewriting the prompt with new context.
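
The decision step of that loop can be sketched as a pure function. The field names, the retry cap, and the action shapes are illustrative assumptions, not a fixed schema:

```typescript
// Deterministic monitor decision: given one task's observed state,
// pick the next action without any LLM call. Schema is illustrative.
interface TaskState {
  sessionAlive: boolean;
  prOpened: boolean;
  ciPassing: boolean | null; // null = CI still running or not started
  reviewsApproved: boolean;
  retries: number;
}

type Action =
  | { kind: "none" }
  | { kind: "respawn"; reason: string }
  | { kind: "notify"; message: string };

const MAX_RETRIES = 3; // assumed cap, matching the diagram above

function decide(t: TaskState): Action {
  // All checks passed: hand off to the human.
  if (t.prOpened && t.ciPassing === true && t.reviewsApproved)
    return { kind: "notify", message: "PR ready for review" };
  // Diagnose whether a respawn is warranted.
  const reason =
    !t.sessionAlive && !t.prOpened ? "agent idle or crashed"
    : t.prOpened && t.ciPassing === false ? "CI failing"
    : null;
  if (reason !== null)
    return t.retries >= MAX_RETRIES
      ? { kind: "notify", message: "Needs manual help" }
      : { kind: "respawn", reason };
  return { kind: "none" }; // still working or CI pending
}
```

Because `decide` only reads observed state, the monitor can run every few minutes at zero token cost; only a `respawn` action pulls the orchestrator LLM back in.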

Definition of Done

A PR is not “done” when the agent creates it. An unsupervised agent will happily open a PR with failing tests and move on. The system needs a clear, automated definition of done before the human gets notified.

Definition of Done Checklist

  Check                 | How It’s Verified                         | Automated?
  PR created            | GitHub API / gh CLI                       | Yes
  Branch synced to main | No merge conflicts detected               | Yes
  CI passing            | Lint, types, unit tests, E2E              | Yes
  AI code review        | One or more model-based reviewers approve | Yes
  Screenshots included  | Required for UI changes                   | Yes (CI rule)
  Human review          | Final merge approval                      | No — intentionally manual

Multi-Model Code Review

Using multiple AI models for code review catches more issues than a single reviewer. Different models have different strengths — one may excel at logic errors and edge cases, another at security issues, a third at architecture concerns. The reviews post as PR comments, so the human reviewer sees a pre-analyzed summary.

Context-Aware Retries

When an agent fails, the naive approach is to respawn it with the same prompt. This works for transient failures but wastes tokens on systematic ones. The orchestrator should diagnose the failure and adjust the prompt accordingly, because it has context the coding agent does not.

Common failure modes and orchestrator responses:

  • Agent ran out of context window: Narrow the scope. “Focus only on these three files.”
  • Agent went in the wrong direction: Redirect with business context. “The customer wanted X, not Y. Here is what they said.”
  • Agent needs information it does not have: Inject missing context. “The schema is defined in src/types/template.ts. Use that as the source of truth.”
  • CI failed on the agent’s PR: Include the error output in the retry prompt. “The E2E test for the billing flow is failing with this error. Fix the root cause.”

This is what makes the orchestrator more than a task dispatcher — it is a context-aware supervisor that improves the prompt on each retry based on what went wrong and why.
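
As a sketch, the mapping from diagnosed failure mode to adjusted prompt might look like this. The failure taxonomy and the wording of the injected context are illustrative assumptions:

```typescript
// Map a diagnosed failure mode to an adjusted retry prompt.
// The taxonomy and phrasing are illustrative, not a fixed protocol.
type FailureMode =
  | { kind: "context-overflow"; files: string[] }
  | { kind: "wrong-direction"; customerNote: string }
  | { kind: "missing-info"; hint: string }
  | { kind: "ci-failure"; errorOutput: string };

function adjustPrompt(original: string, failure: FailureMode): string {
  switch (failure.kind) {
    case "context-overflow":
      // Narrow the scope so the agent fits in its window.
      return `${original}\n\nFocus only on these files: ${failure.files.join(", ")}.`;
    case "wrong-direction":
      // Redirect with business context the agent never had.
      return `${original}\n\nCorrection: the customer asked for something different. Their words: "${failure.customerNote}".`;
    case "missing-info":
      // Inject the missing source of truth.
      return `${original}\n\nMissing context: ${failure.hint}`;
    case "ci-failure":
      // Feed the error output back into the retry.
      return `${original}\n\nCI failed with:\n${failure.errorOutput}\nFix the root cause, not just the test.`;
  }
}
```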

Reward Signals

Track which prompt structures and context patterns lead to successful outcomes (CI passing, reviews approved, human merge). Over time, the orchestrator learns what works: “always include type definitions upfront for billing tasks,” “Codex needs the test file paths explicitly,” “frontend tasks ship faster with a screenshot spec.”
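
A minimal sketch of that tracking, assuming each prompt is tagged with a pattern label and the outcome is "merged or not":

```typescript
// Per-pattern outcome tally so the orchestrator can prefer prompt
// structures that historically led to merged PRs. Illustrative sketch.
const stats = new Map<string, { attempts: number; merged: number }>();

function record(pattern: string, merged: boolean): void {
  const s = stats.get(pattern) ?? { attempts: 0, merged: 0 };
  s.attempts += 1;
  if (merged) s.merged += 1;
  stats.set(pattern, s);
}

function successRate(pattern: string): number {
  const s = stats.get(pattern);
  return s && s.attempts > 0 ? s.merged / s.attempts : 0;
}
```

Even a flat tally like this is enough to surface rules such as "types-upfront prompts merge twice as often on billing tasks."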

Agent Routing

Not all tasks suit the same model or tool. An effective orchestrator routes tasks based on the nature of the work:

Task Routing Heuristics

  Task Type                           | Best Agent Profile                        | Why
  Backend logic, multi-file refactors | Strong reasoning model                    | Requires cross-file understanding and careful logic
  Frontend components, styling        | Fast model with good code generation      | More pattern-matching than deep reasoning
  UI/UX design specs                  | Multimodal model                          | Can generate visual specs that coding agents implement
  Bug investigation                   | Strong reasoning model with error context | Needs to trace through call stacks and reproduce issues
  Documentation updates               | Fast, inexpensive model                   | Low complexity, high volume
  Security-sensitive changes          | Most capable model + stricter review      | Higher stakes justify higher cost

The routing does not need to be perfect. Even approximate routing — sending backend tasks to stronger reasoners and frontend tasks to faster generators — meaningfully improves throughput and cost efficiency compared to using one model for everything.
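
A sketch of that approximate routing, mirroring the heuristics table. The task categories and profile names are illustrative labels, not real model identifiers:

```typescript
// Approximate task routing: category in, agent profile out.
// Categories and profile names are illustrative assumptions.
type TaskType = "backend" | "frontend" | "design" | "bug" | "docs" | "security";

function route(task: TaskType): string {
  switch (task) {
    case "backend":
    case "bug":
      return "strong-reasoner";          // cross-file logic, stack tracing
    case "frontend":
      return "fast-generator";           // pattern-heavy component work
    case "design":
      return "multimodal";               // visual specs for coding agents
    case "docs":
      return "fast-cheap";               // low complexity, high volume
    case "security":
      return "most-capable-strict-review"; // higher stakes, higher cost
  }
}
```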

Proactive Task Discovery

The most advanced orchestrators do not wait for human instructions. They scan available signals and generate tasks autonomously:

  • Error monitoring: Scan Sentry or equivalent for new exceptions, spawn agents to investigate and fix
  • Meeting notes: Parse recent meeting transcripts for feature requests and bug reports, create tasks
  • Git activity: After merges, update changelogs, regenerate documentation, run regression tests
  • Customer feedback: Scan support tickets for patterns, prioritize fixes
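
As one example of signal-to-task conversion, an error-monitoring event might be turned into a candidate task like this. The signal shape and the event-count threshold are illustrative assumptions, not Sentry's actual API schema:

```typescript
// Convert an error-monitoring signal into a candidate task.
// Field names and the noise threshold are illustrative assumptions.
interface ErrorSignal {
  id: string;
  title: string;
  count: number;     // occurrences since firstSeen
  firstSeen: string; // ISO date
}

function toTask(e: ErrorSignal): { id: string; prompt: string } | null {
  if (e.count < 5) return null; // skip one-off noise; threshold is arbitrary
  return {
    id: `bug-${e.id}`,
    prompt:
      `Investigate and fix the recurring exception "${e.title}" ` +
      `(${e.count} events since ${e.firstSeen}). ` +
      `Reproduce it, fix the root cause, and add a regression test.`,
  };
}
```

The orchestrator can run a conversion like this on a schedule and feed the resulting tasks straight into the spawn pipeline.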

The Full Workflow

Putting it all together, a typical end-to-end workflow looks like this:

  1. Input arrives — a customer request, a bug report, a feature idea
  2. Orchestrator scopes the work — using business context to translate the request into a specific, implementable task
  3. Orchestrator spawns agent(s) — each in an isolated worktree with a tailored prompt containing exactly the context needed
  4. Agents execute autonomously — writing code, running tests, creating PRs
  5. Monitor loop supervises — checking health, CI status, and review state on a schedule
  6. Failed agents get retried — with adjusted prompts based on the failure mode
  7. Automated reviews run — multiple AI models review the PR and post findings
  8. Human gets notified — only when everything passes: CI green, reviews approved, screenshots attached
  9. Human reviews and merges — a 5-10 minute review of a pre-validated PR
  10. Cleanup — orphaned worktrees and task records are pruned on a schedule

The human’s role shifts from writing code to reviewing PRs and making product decisions. The orchestrator handles the translation from intent to implementation. The coding agents handle the implementation itself.

Practical Limits

Memory constraints. Each parallel agent consumes significant RAM — its own dependency tree, build tools, and test runners. Plan for 3-4 GB per agent on a typical JavaScript/TypeScript project. Machines with 16 GB support 4-5 concurrent agents; scaling beyond that requires 64-128 GB or cloud-based sandboxes.

Coordination overhead. If two agents touch overlapping files, their PRs will conflict at merge time. The orchestrator should partition work to minimize overlap, or sequence dependent tasks rather than parallelizing them.

Cost management. Each agent consumes API tokens. With 5+ agents running simultaneously, costs can escalate quickly. Route simple tasks to cheaper models, set per-task token budgets, and track cost-per-merged-PR as an efficiency metric.

Diminishing returns. Not every task benefits from this architecture. A one-line bug fix does not need an orchestrator, a worktree, and three AI reviewers. Reserve the full swarm workflow for medium-to-large tasks where the setup overhead is amortized across meaningful implementation work.

When to Use This Pattern

The coding agent swarm pattern delivers the most value when:

  • You are a solo developer or small team with more work than hands
  • Tasks are parallelizable — multiple features, bug fixes, or refactors that do not depend on each other
  • You have strong CI/CD — automated tests and linting that catch regressions without manual testing
  • The codebase is well-structured — clear module boundaries make it safe for independent agents to work in different areas
  • You can afford the infrastructure cost — RAM, API tokens, and the time to set up the orchestration layer

Start simple: one orchestrator, one coding agent, one worktree. Get the monitoring loop and definition of done right. Then scale the number of parallel agents as you build confidence in the system’s reliability.

Tags: multi-agent, orchestration, coding-agents, automation