
Coding Agent Runtimes Go Full-Stack

In a single week, sandboxes, subagents, deployment CLIs, and control planes all shipped across major platforms — tracing the shape of a full managed runtime.


March 21, 2026

Six months ago, a coding agent was an LLM with access to a shell. This week, in the span of five days, sandboxed execution environments, first-class subagent orchestration, deployment CLIs, local control planes, and enterprise fleet management all shipped — converging on the same architectural shape. The coding agent is becoming a managed runtime, and that shift has practical implications for how you build on top of it.

The stack that materialized

LangChain shipped Deep Agents (an opinionated harness with planning, filesystem access, and sub-agents baked in), Open SWE (an internal coding agent framework built on that harness), LangSmith Sandboxes (isolated code execution spun up in a single line), a deployment CLI (langgraph deploy), and Fleet (an enterprise workspace for managing teams of agents). That’s five infrastructure layers in one week from a single vendor: harness, sandbox, orchestration, deployment, and fleet management.

They weren’t alone. OpenAI promoted Codex subagents to general availability with default agent roles — explorer, worker, default — that mirror the same decomposition pattern. Anthropic shipped Claude Code channels, letting external events from Telegram and Discord push into running sessions. An independent project, LACP, delivered a local control plane for Claude Code and Codex with quality gates, session provenance, and sandbox policies. Even Stirrup, a new lightweight framework, ships with built-in code execution, web browsing, and MCP support as table stakes.

No one coordinated this. Everyone arrived at the same architecture independently.

The shape of the runtime

What’s converging isn’t a framework — it’s a runtime contract. The coding agent runtime that emerged this week has five layers:

  • Harness: The outer loop that manages planning, tool dispatch, and context. Deep Agents and Stirrup both treat this as an opinionated shell around the model, not a library you compose.
  • Sandbox: Isolated execution for code the agent writes. LangSmith Sandboxes and LACP both treat this as a first-class primitive, not an afterthought. The agent doesn’t run code in your environment; it runs code in its own.
  • Subagents: Task decomposition through spawning focused child agents. Codex ships three default roles. Open SWE orchestrates subagents for different phases of a coding task. Simon Willison’s guide frames subagents as the primary mechanism for working within context limits.
  • Control plane: Session management, quality gates, provenance tracking. LACP does this locally; Fleet does it at enterprise scale. Either way, someone needs to manage agent sessions the way someone manages container lifecycles.
  • Deployment: A way to push an agent from development to production. langgraph deploy is the most explicit signal — treating agent deployment as a CLI workflow analogous to docker push or serverless deploy.
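The five-layer contract can be sketched as a set of interfaces. These names and signatures are illustrative only — they are not any vendor's actual API, just one plausible shape for the contract the layers above describe:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class ExecutionResult:
    stdout: str
    exit_code: int


@runtime_checkable
class Sandbox(Protocol):
    """Isolated execution: the agent runs code in its own environment, not yours."""
    def run(self, code: str, timeout_s: float) -> ExecutionResult: ...


@runtime_checkable
class Subagent(Protocol):
    """A focused child agent invoked with an explicit context handoff."""
    def invoke(self, task: str, context: dict) -> str: ...


@runtime_checkable
class Harness(Protocol):
    """Outer loop: planning, tool dispatch, context management."""
    def step(self, goal: str) -> str: ...


@runtime_checkable
class ControlPlane(Protocol):
    """Session lifecycle: open sessions, enforce gates, record provenance."""
    def open_session(self, agent_id: str) -> str: ...
    def record(self, session_id: str, event: dict) -> None: ...
```

The point of writing it down as interfaces is the swap-ability argument below: if each layer is a contract rather than a framework, a spec layer that lets you exchange one vendor's sandbox for another's becomes possible.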

This is a runtime stack in the same sense that containers have a runtime stack. The OCI runtime spec didn’t emerge because someone designed it top-down; it emerged because everyone building containers needed the same five things. We’re watching the same convergence happen for coding agents.

Note

The parallel to containers is precise: harness is the runtime (containerd), sandbox is the isolation boundary (runc), subagents are multi-container orchestration (compose), control plane is lifecycle management (kubelet), and deployment is the push-to-production workflow (kubectl apply). The analogy isn’t decorative — it predicts what comes next: a spec layer that lets you swap components.

Why subagents are the inflection point

Subagents crossing into general availability at both OpenAI and LangChain in the same week isn’t a coincidence — it’s the point where coding agents stop being single-process programs and become distributed systems. And this changes what reliability means.

A single agent failing is a retry problem. A tree of subagents failing is a coordination problem. When Codex spawns an explorer to understand a codebase, a worker to implement a change, and another worker to write tests, you now have partial failure modes: the explorer succeeds, but a worker hallucinates a function signature the explorer never surfaced. The context is correct in one branch and stale in another.

Open SWE’s architecture handles this by building on LangGraph’s state machine orchestration — subagents share a graph with typed state transitions. LACP handles it with session provenance and quality gates. But neither approach has been battle-tested at the scale that enterprise Fleet deployments will demand. The teams that invest in subagent observability now — tracing parent-child relationships, detecting context divergence between branches, establishing rollback semantics — will be the ones that ship reliable coding agents in six months.

The model layer is adapting to the runtime

The runtime convergence is happening from both directions. OpenAI’s GPT-5.4 mini and nano are explicitly optimized for sub-agent tasks and high-volume API workloads — smaller, faster models designed to be the workers in a subagent tree, not the orchestrator. NVIDIA’s Nemotron-3-Super ships with configurable reasoning modes and 1M token context, purpose-built for agentic workflows that need to switch between deep analysis and fast execution. Mistral Small 4 unifies reasoning, multimodal, and coding capabilities into one model specifically because agents need all three without model-switching overhead.

This is the feedback loop that signals a runtime is real: the compute layer starts optimizing for the runtime’s execution patterns. When model providers ship variants tuned for subagent workloads, they’re telling you the subagent pattern is now an assumed deployment topology, not an experimental one.

What to do differently on Monday

First, stop treating sandboxing as a security add-on. It’s a core execution primitive. If your agent runs code in the host environment, you’re one hallucinated rm -rf from a production incident. LangSmith and LACP both treat sandbox-per-execution as the default, and so should you.
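A minimal version of sandbox-per-execution — a throwaway working directory and a subprocess with a timeout, nowhere near the hardened isolation real sandboxes provide, but enough to show the shape — looks like this:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Execute agent-written Python in a fresh scratch directory, never the host cwd.

    Sketch only: production sandboxes add filesystem and network isolation
    (containers, gVisor, seccomp), not just a temp dir and a timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "agent_code.py"
        script.write_text(code)
        return subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,            # the agent's rm -rf hits its own scratch space
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
```

The design point is that the sandbox is the default call path for all agent-generated code, not a flag you opt into for untrusted inputs.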

Second, design for subagent trees, not single agents. This means typed interfaces between parent and child agents, explicit context handoff (not shared mutable state), and tracing that captures the full tree. The debugging experience for a four-deep subagent tree with no structured tracing is indistinguishable from debugging distributed microservices with no request IDs.

Third, treat agent deployment as a first-class workflow. The langgraph deploy CLI is a signal that agents are moving from notebooks to CI/CD pipelines. If you don’t have a repeatable deployment process for your agents today, start building one. The gap between “works in development” and “runs in production” for agents is at least as wide as it was for web services, and the tooling to close it just started shipping.

The coding agent runtime stack had been sketched in outline before. What happened this week is that every layer got a concrete, shippable implementation from multiple vendors simultaneously. The architecture is no longer theoretical. Build to it.

Tags: perspectives, coding-agents, agent-harness, sandboxes, subagents, runtime-infrastructure