The Sandbox Becomes a Runtime Primitive
Isolated code execution environments are emerging as a distinct layer of the agent stack, separable from both the harness and the model — with implications for security, portability, and cost.
Most Read
Requirements as Code: Git-Native Business Documents for Agentic Workflows
Exploring the idea of putting business requirements, architecture diagrams, and domain models in Git — and how this could enable agentic pipelines from requirement change to deployed code.
February 26, 2026
What's Happening
Context as a Deployable Artifact: The Third Layer of the Agent Stack
Agent context files are being pulled out of repos and into versioned, governed runtime stores — creating a third deployment surface alongside harness code and model weights.
May 16, 2026
Alignment Is Splitting Into Two Layers: Midtraining and Runtime
Recent work from Anthropic, OpenAI, and Mozilla suggests alignment is no longer a single fine-tuning step — it's becoming a layered system spanning training stages and execution infrastructure.
May 9, 2026
The Trace Becomes the Primary Artifact of Agent Engineering
Across evals, debugging, failure attribution, and self-improvement, the execution trace is consolidating as the central object practitioners build around — with consequences for tooling, storage, and team workflow.
May 2, 2026
Each Role Owns a Contract: A Team Operating Model for Agentic Delivery
AI is making code faster to write, but it is not making teams faster to deliver. The bottleneck moves to handoffs: sign-offs, requirement changes, and validation. This article explores a model in which each role owns a versioned contract, and asks honestly where it helps and where it merely relocates the problem.
April 29, 2026
The Harness Is Now a Managed Surface — and a Managed Liability
Claude Code's quality regression, Gemini's Enterprise Agent Platform, and Anthropic's memory stores all point to the same shift: the harness is moving from something you build to something you consume — with consequences for debugging, eval reporting, and vendor lock-in.
April 25, 2026
Why Memory Ownership Is Becoming a Harness Decision
As harnesses absorb session management, context compaction, and persistent memory, the choice of harness is increasingly a choice about who owns your agent's memory.
April 18, 2026
From the Archive
The Self-Improving Harness: When Agent Infrastructure Learns to Optimize Itself
Agent harnesses are evolving from static scaffolding into self-modifying systems that mine their own failures, generate evals, and hill-climb their own performance — reshaping what it means to build and maintain agents in production.
April 11, 2026
The Harness-Model Training Loop: Why the Boundary Between Agent Infrastructure and Model Weights Is Collapsing
Open models reaching agent parity, task-specific harness engineering, and trace-driven fine-tuning are merging what used to be separate concerns into a single iterative loop — with major implications for how teams build and operate agents.
April 4, 2026
Sequenced Pipelines: How Structured Handoffs Improve Multi-Agent Systems
How sequenced specialist agents with defined handoff contracts and backward feedback loops produce more reliable results than flat swarms or orchestrator/worker splits.
April 2, 2026