Resources
Curated research papers, frameworks, protocols, and tools for the practitioner.
Latest Dispatches
Updated Feb 22, 2026
How we built Agent Builder's memory system
LangChain describes their implementation of a memory system for Agent Builder, covering the technical architecture and rationale for prioritizing persistent memory in agent workflows.
LangChain Blog frameworks
Agent Observability Powers Agent Evaluation
LangChain emphasizes that reliable agent development requires understanding agent reasoning through observability and systematic evaluation approaches.
LangChain Blog evaluation
0-Days: Evaluating and mitigating the growing risk of LLM-discovered vulnerabilities
Claude Opus 4.6 demonstrates significant capability in finding high-severity vulnerabilities in well-tested codebases by reading and reasoning about code like human researchers. Anthropic has found over 500 high-severity vulnerabilities in open source software using Claude.
@trq212 on X models
Context-Bench: A benchmark for agentic context engineering
Letta Research introduces Context-Bench, a benchmark measuring agents' ability to perform filesystem operations, entity relationship tracing, and skill discovery/loading from libraries.
@Letta_AI on X evaluation
zeitzeuge — AI-Powered Performance Analysis for Web & Tests
A performance analysis tool that uses a LangChain Deep Agent to autonomously analyze V8 heap snapshots, Chrome runtime traces, and CPU profiles to suggest code-level fixes.
@bromann on X frameworks
Programmatic tool calling
Claude API introduces programmatic tool calling, allowing Claude to write Python code that calls tools within a code execution container, reducing latency and token consumption for multi-tool workflows.
@RLanceMartin on X protocols
Improved Web Search with Dynamic Filtering
Claude's web search and web fetch tools now automatically write and execute code to filter search results before they reach the context window, improving accuracy by 11% and reducing token usage by 24%.
@RLanceMartin on X tools
How to Use Memory in Agent Builder
LangChain's Agent Builder incorporates memory that retains user feedback, corrections, and preferences to improve agent performance over time.
LangChain Blog frameworks
IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST
IBM and UC Berkeley developed IT-Bench and MAST to diagnose why enterprise agents fail.
Hugging Face Blog evaluation
Research Papers
Foundational
Introduces the ReAct paradigm combining reasoning traces with actions.
Demonstrates self-supervised tool use learning in LLMs.
Foundational work on prompting LLMs for step-by-step reasoning.
Extends CoT with exploration of multiple reasoning paths.
Memory
Agents with memory for believable social simulation.
Hierarchical memory management for unbounded context.
Agents that learn from self-reflection and memory.
Multi-Agent
Framework for multi-agent conversation and collaboration.
Role-based multi-agent system for software development.
Multiple agents debate to improve reasoning quality.
RAG
Original RAG paper combining retrieval with generation.
Agents that decide when and what to retrieve.
Self-correcting retrieval with web search fallback.
Knowledge graph-based RAG for complex queries.
Safety
Self-supervision for safe AI behavior.
Automated red teaming for safety evaluation.
Frameworks
Framework for LLM-powered applications. Large ecosystem of integrations.
Stateful, multi-actor applications with LLMs. Graph-based control flow.
Unifying AutoGen and Semantic Kernel for multi-agent workflows.
Protocols
Anthropic's open protocol for connecting AI with tools and data sources.
Google's protocol for agent-to-agent communication and discovery.
Open standard for agentic commerce from discovery to purchase.
Evaluation Tools
Open-source evaluation framework for LLMs. Agent-specific metrics.
Evaluation framework for RAG applications. Component-level metrics.
CLI tool for testing and evaluating prompts. CI/CD integration.
Platform for debugging, testing, and monitoring LLM applications.
Enterprise platform for AI product development. Evals and logging.