The ReAct Pattern
Reasoning plus Acting — the foundational loop that enables AI agents to think through problems and take targeted action in the world.
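To make the loop concrete, here is a minimal, framework-agnostic sketch. The `llm` callable, the `tools` registry, and the Thought/Action/Observation text format are illustrative assumptions, not any particular library's API.

```python
# Minimal ReAct loop sketch: the model alternates Thought/Action steps with tool
# Observations until it emits a final answer. `llm` and `tools` are hypothetical
# stand-ins for a model call and a tool registry.
from typing import Callable, Dict

def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 8) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")            # model reasons, then names an action
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            # Expected format: "Action: tool_name[argument]"
            action = step.split("Action:", 1)[1].strip()
            name, _, arg = action.partition("[")
            tool = tools.get(name.strip(), lambda a: "unknown tool")
            observation = tool(arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"  # feed the result back in
    return "No final answer within the step budget."
```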
Context-Bench: Benchmarking Agentic Context Engineering
A look at Context-Bench, Letta's benchmark for measuring how well language models perform context engineering tasks including filesystem traversal and dynamic skill loading.
February 20, 2026
Dynamic Filtering for Web Search Agents
How agents use code execution to filter retrieved web content before it enters the context window, improving accuracy and reducing token costs.
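A rough sketch of the idea, assuming the agent has already fetched a list of pages and can run Python to score and trim them before anything reaches the prompt. The `pages` structure and the keyword-counting heuristic are hypothetical; a real agent would generate and run similar filtering code in a sandbox.

```python
# Filter retrieved web content in code before it enters the context window,
# keeping only the most relevant snippets within a fixed character budget.
def filter_for_context(pages: list, query: str, max_chars: int = 4000) -> str:
    terms = set(query.lower().split())
    scored = []
    for page in pages:                        # each page: {"url": ..., "text": ...}
        text = page["text"]
        score = sum(text.lower().count(t) for t in terms)   # crude relevance score
        if score > 0:
            scored.append((score, page["url"], text[:1000]))  # keep only a snippet
    scored.sort(reverse=True)
    # Concatenate the best snippets until the budget is exhausted.
    out, used = [], 0
    for score, url, snippet in scored:
        if used + len(snippet) > max_chars:
            break
        out.append(f"[{url}]\n{snippet}")
        used += len(snippet)
    return "\n\n".join(out)
```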
February 20, 2026
Programmatic Tool Calling
How agents can execute tool calls inside a sandboxed code environment to reduce round-trip latency and token overhead in multi-step workflows.
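A minimal sketch of the pattern: the model emits one code snippet that composes several tool calls, and the harness runs it once instead of paying a round trip per call. The tool stubs and the bare `exec` harness below are illustrative only; a real system would execute the snippet inside a restricted sandbox.

```python
# Programmatic tool calling sketch: one model-written snippet, one execution,
# several dependent tool calls. Tool implementations are hypothetical stubs.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"          # stub standing in for a real API call

def send_email(to: str, body: str) -> str:
    return f"queued email to {to}"     # stub standing in for a real API call

def run_model_snippet(snippet: str) -> dict:
    # In practice this would run inside a restricted sandbox, not bare exec().
    scope = {"get_weather": get_weather, "send_email": send_email, "results": {}}
    exec(snippet, scope)
    return scope["results"]

# Example of code a model might emit: chained tool calls in a single round trip.
snippet = """
weather = get_weather("Berlin")
results["status"] = send_email("team@example.com", f"Forecast: {weather}")
"""
print(run_model_snippet(snippet))
```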
February 20, 2026
Agent2Agent Protocol (A2A)
Google's open protocol enabling AI agents to discover, communicate, and collaborate across organizational boundaries using standardized task exchange.
February 18, 2026
Agent-Assisted Fine-Tuning
How coding agents automate the entire LLM fine-tuning workflow from GPU selection to model deployment using natural language instructions.
February 18, 2026
Agentic RAG
Beyond simple retrieve-then-generate: intelligent agents that decide when, what, and how to retrieve, then critique and correct their own retrieval.
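A small sketch of that decide-retrieve-critique loop, with `llm` and `retrieve` as hypothetical stand-ins for a model call and a search backend; the prompts are illustrative, not a prescribed format.

```python
# Agentic RAG sketch: decide whether to retrieve, critique what came back,
# and reformulate the query if the passages were insufficient.
from typing import Callable, List

def agentic_rag(question: str,
                llm: Callable[[str], str],
                retrieve: Callable[[str], List[str]],
                max_rounds: int = 3) -> str:
    # Decide *when* to retrieve: skip retrieval if the model says it can answer alone.
    decide = llm(f"Can you answer without external documents? yes/no\n{question}")
    if decide.strip().lower().startswith("yes"):
        return llm(question)

    query, docs = question, []
    for _ in range(max_rounds):
        docs = retrieve(query)                               # decide *what* to fetch
        verdict = llm("Do these passages answer the question? "
                      "Reply 'good' or propose a better query.\n"
                      f"Question: {question}\nPassages: {docs}")
        if verdict.strip().lower().startswith("good"):
            break
        query = verdict                                      # self-correct the query
    return llm(f"Answer using only these passages:\n{docs}\n\nQuestion: {question}")
```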
February 18, 2026
Recent Dispatches
Context Bloat & Context Rot
How performance degrades within supported context limits, and practical strategies to detect, measure, and mitigate both failure modes.
February 18, 2026
Context Engineering
The discipline of optimizing what enters the context window: a key skill, alongside prompt engineering, for practitioners building reliable agents.
February 18, 2026
Evaluation & Metrics
Measuring agent performance across component accuracy, task completion, trajectory quality, and system-level metrics with benchmarks and LLM-as-judge.
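As one possible shape for such a harness, the sketch below pairs a simple task-completion check with an LLM-as-judge score for trajectory quality. The `judge_llm` callable and the rubric prompt are illustrative assumptions, not a specific benchmark's implementation.

```python
# Evaluation sketch: programmatic task-completion check plus an LLM-as-judge
# grade for trajectory quality. `judge_llm` is a hypothetical model call.
import json
from typing import Callable, Dict, List

def evaluate_run(task: str,
                 trajectory: List[str],
                 final_answer: str,
                 expected: str,
                 judge_llm: Callable[[str], str]) -> Dict[str, float]:
    # Task completion: exact match is the simplest possible metric.
    task_completed = float(final_answer.strip().lower() == expected.strip().lower())

    # Trajectory quality: ask a judge model to grade the steps against a rubric.
    prompt = ("Rate this agent trajectory from 0 to 1 for efficiency and correctness.\n"
              f"Task: {task}\nSteps:\n" + "\n".join(trajectory) +
              '\nReply with JSON like {"score": 0.8, "reason": "..."}')
    try:
        trajectory_score = float(json.loads(judge_llm(prompt))["score"])
    except (ValueError, KeyError, TypeError):
        trajectory_score = 0.0   # treat unparseable judge output as a failed grade

    return {"task_completion": task_completed, "trajectory_quality": trajectory_score}
```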
February 18, 2026
Latest from the Field
How we built Agent Builder's memory system
LangChain describes its implementation of a memory system for Agent Builder, covering the technical architecture and the rationale for prioritizing persistent memory in agent workflows.
LangChain Blog
Agent Observability Powers Agent Evaluation
LangChain emphasizes that reliable agent development requires understanding how agents reason, pairing observability with systematic evaluation.
LangChain Blog
0-Days: Evaluating and mitigating the growing risk of LLM-discovered vulnerabilities
Claude Opus 4.6 demonstrates significant capability in finding high-severity vulnerabilities in well-tested codebases by reading and reasoning about code much as human researchers do. Anthropic has found over 500 high-severity vulnerabilities in open source software using Claude.
@trq212 on X
Context-Bench: A benchmark for agentic context engineering
Letta Research introduces Context-Bench, a benchmark measuring agents' ability to perform filesystem operations, entity relationship tracing, and skill discovery/loading from libraries.
@Letta_AI on X