danielhuber.dev@proton.me Sunday, May 24, 2026
Agent Engineering
Practical ideas and concepts around agent engineering

The Sandbox Becomes a Runtime Primitive

Isolated code execution environments are emerging as a distinct layer of the agent stack, separable from both the harness and the model — with implications for security, portability, and cost.

May 23, 2026

Most Read

Requirements as Code: Git-Native Business Documents for Agentic Workflows

Exploring the idea of putting business requirements, architecture diagrams, and domain models in Git — and how this could enable agentic pipelines from requirement change to deployed code.

February 26, 2026

What's Happening

Analysis

Context as a Deployable Artifact: The Third Layer of the Agent Stack

Agent context files are being pulled out of repos and into versioned, governed runtime stores — creating a third deployment surface alongside harness code and model weights.

May 16, 2026

Alignment Is Splitting Into Two Layers: Midtraining and Runtime

Recent work from Anthropic, OpenAI, and Mozilla suggests alignment is no longer a single fine-tuning step — it's becoming a layered system spanning training stages and execution infrastructure.

May 9, 2026

The Trace Becomes the Primary Artifact of Agent Engineering

Across evals, debugging, failure attribution, and self-improvement, the execution trace is consolidating as the central object practitioners build around — with consequences for tooling, storage, and team workflow.

May 2, 2026

Each Role Owns a Contract: A Team Operating Model for Agentic Delivery

AI is making code faster to write but not teams faster to deliver. The bottleneck moves to handoffs — sign-offs, requirement changes, validation. This article explores a model where each role owns a versioned contract, and asks honestly where it helps and where it just relocates the problem.

April 29, 2026

The Harness Is Now a Managed Surface — and a Managed Liability

Claude Code's quality regression, Gemini's Enterprise Agent Platform, and Anthropic's memory stores all point to the same shift: the harness is moving from something you build to something you consume — with consequences for debugging, eval reporting, and vendor lock-in.

April 25, 2026

Why Memory Ownership Is Becoming a Harness Decision

As harnesses absorb session management, context compaction, and persistent memory, the choice of harness is increasingly a choice about who owns your agent's memory.

April 18, 2026

From the Archive

The Self-Improving Harness: When Agent Infrastructure Learns to Optimize Itself

Agent harnesses are evolving from static scaffolding into self-modifying systems that mine their own failures, generate evals, and hill-climb their own performance — reshaping what it means to build and maintain agents in production.

April 11, 2026

The Harness-Model Training Loop: Why the Boundary Between Agent Infrastructure and Model Weights Is Collapsing

Open models reaching agent parity, task-specific harness engineering, and trace-driven fine-tuning are merging what used to be separate concerns into a single iterative loop — with major implications for how teams build and operate agents.

April 4, 2026

Sequenced Pipelines: How Structured Handoffs Improve Multi-Agent Systems

How sequenced specialist agents with defined handoff contracts and backward feedback loops produce more reliable results than flat swarms or orchestrator/worker splits.

April 2, 2026