Agent Engineering

Foundational

The ReAct Pattern

Reasoning plus Acting — the foundational loop that enables AI agents to think through problems and take targeted action in the world.

By the Editors, February 18, 2026

Evaluation

Context-Bench: Benchmarking Agentic Context Engineering

A look at Context-Bench, Letta's benchmark for measuring how well language models perform context engineering tasks including filesystem traversal and dynamic skill loading.

February 20, 2026

Advanced

Dynamic Filtering for Web Search Agents

How agents use code execution to filter retrieved web content before it enters the context window, improving accuracy and reducing token costs.

February 20, 2026

Advanced

Programmatic Tool Calling

How agents can execute tool calls inside a sandboxed code environment to reduce round-trip latency and token overhead in multi-step workflows.

February 20, 2026

Protocols

Agent2Agent Protocol (A2A)

Google's open protocol enabling AI agents to discover, communicate, and collaborate across organizational boundaries using standardized task exchange.

February 18, 2026

Advanced

Agent-Assisted Fine-Tuning

How coding agents automate the entire LLM fine-tuning workflow from GPU selection to model deployment using natural language instructions.

February 18, 2026

Advanced

Agentic RAG

Beyond simple retrieve-then-generate: intelligent agents that decide when, what, and how to retrieve, then critique and correct their own retrieval.

February 18, 2026

Recent Dispatches

Context Layer

Context Bloat & Context Rot

How performance degrades within supported context limits, and practical strategies to detect, measure, and mitigate both failure modes.

February 18, 2026

Context Layer

Context Engineering

The discipline of optimizing what enters the context window: a key skill, alongside prompt engineering, for practitioners building reliable agents.

February 18, 2026

Evaluation

Evaluation & Metrics

Measuring agent performance across component accuracy, task completion, trajectory quality, and system-level metrics with benchmarks and LLM-as-judge.

February 18, 2026


Latest from the Field

How we built Agent Builder's memory system

LangChain describes its implementation of a memory system for Agent Builder, covering the technical architecture and the rationale for prioritizing persistent memory in agent workflows.

LangChain Blog

Agent Observability Powers Agent Evaluation

LangChain emphasizes that reliable agent development requires understanding agent reasoning through observability and systematic evaluation approaches.

LangChain Blog

0-Days: Evaluating and mitigating the growing risk of LLM-discovered vulnerabilities

Claude Opus 4.6 demonstrates significant capability in finding high-severity vulnerabilities in well-tested codebases by reading and reasoning about code like human researchers. Anthropic has found over 500 high-severity vulnerabilities in open source software using Claude.

@trq212 on X

Context-Bench: A benchmark for agentic context engineering

Letta Research introduces Context-Bench, a benchmark measuring agents' ability to perform filesystem operations, entity relationship tracing, and skill discovery/loading from libraries.

@Letta_AI on X