
Resources

Curated research papers, frameworks, protocols, and tools for the practitioner.


Updated Feb 22, 2026

How we built Agent Builder's memory system

LangChain describes its implementation of the memory system behind Agent Builder, covering the technical architecture and the rationale for prioritizing persistent memory in agent workflows.

LangChain Blog frameworks

Agent Observability Powers Agent Evaluation

LangChain argues that building reliable agents requires observability into agent reasoning, paired with systematic evaluation.

LangChain Blog evaluation

0-Days: Evaluating and mitigating the growing risk of LLM-discovered vulnerabilities

Claude Opus 4.6 demonstrates significant capability in finding high-severity vulnerabilities in well-tested codebases by reading and reasoning about code the way human researchers do. Anthropic has found over 500 high-severity vulnerabilities in open-source software using Claude.

@trq212 on X models

Context-Bench: A benchmark for agentic context engineering

Letta Research introduces Context-Bench, a benchmark measuring agents' ability to perform filesystem operations, trace entity relationships, and discover and load skills from libraries.

@Letta_AI on X evaluation

zeitzeuge — AI-Powered Performance Analysis for Web & Tests

A performance analysis tool that uses a LangChain Deep Agent to autonomously analyze V8 heap snapshots, Chrome runtime traces, and CPU profiles, then suggest code-level fixes.

@bromann on X frameworks

Programmatic tool calling

The Claude API introduces programmatic tool calling, allowing Claude to write Python code that calls tools inside a code execution container, reducing latency and token consumption for multi-tool workflows.

@RLanceMartin on X protocols
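
To make the mechanism concrete, here is a minimal conceptual sketch in plain Python, not the Claude API itself; the tool names and sandbox mechanics are illustrative assumptions. The point is that intermediate tool results stay inside the execution container, so only the final value re-enters the context window.

```python
def get_order(order_id: str) -> dict:
    # Hypothetical tool stub; in production this would proxy a real tool call.
    return {"id": order_id, "total": 42.0, "region": "EU"}

def get_refund_policy(region: str) -> str:
    # Second hypothetical tool.
    return f"Refunds in {region} are accepted within 30 days."

# A script as the model might write it: one round-trip instead of three.
model_script = """
order = get_order("A-1001")
policy = get_refund_policy(order["region"])
result = {"eligible": order["total"] < 100, "policy": policy}
"""

sandbox = {"get_order": get_order, "get_refund_policy": get_refund_policy}
exec(model_script, sandbox)   # intermediate tool outputs never reach the model
print(sandbox["result"])      # only this summary goes back into context
```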

Improved Web Search with Dynamic Filtering

Claude's web search and web fetch tools now automatically write and execute code to filter search results before they reach the context window, improving accuracy by 11% and reducing token usage by 24%.

@RLanceMartin on X tools
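
A rough sketch of what that filtering step amounts to (the field names and scoring here are assumptions, not Anthropic's implementation): cheap code runs over the raw result list so only plausibly relevant snippets are handed to the model.

```python
raw_results = [
    {"title": "LangGraph docs", "snippet": "Graph-based agent control flow..."},
    {"title": "Unrelated listicle", "snippet": "Top 10 vacation spots..."},
    {"title": "Agent memory post", "snippet": "Persisting agent preferences..."},
]

query_terms = {"agent", "graph", "memory"}

def score(result: dict) -> int:
    # Toy relevance score: count query terms appearing in title + snippet.
    text = (result["title"] + " " + result["snippet"]).lower()
    return sum(term in text for term in query_terms)

# Drop non-matches and rank the rest; only this subset enters the context window.
filtered = sorted((r for r in raw_results if score(r) > 0), key=score, reverse=True)
print(filtered)
```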

How to Use Memory in Agent Builder

LangChain's Agent Builder incorporates memory that retains user feedback, corrections, and preferences to improve agent performance over time.

LangChain Blog frameworks
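
The underlying pattern is a namespaced store of preferences and corrections that later runs consult. A minimal sketch using LangGraph's in-memory store; whether Agent Builder uses this exact class is an assumption:

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()
namespace = ("memories", "user-123")  # hypothetical per-user scope

# Persist corrections the user made during an earlier session.
store.put(namespace, "tone", {"preference": "Answer in terse bullet points."})
store.put(namespace, "timezone", {"preference": "Use Europe/Zurich for dates."})

# On the next run, load the namespace and fold it into the system prompt.
memories = [item.value["preference"] for item in store.search(namespace)]
system_prompt = "Known user preferences:\n" + "\n".join(f"- {m}" for m in memories)
print(system_prompt)
```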

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM's IT-Bench benchmarks agents on realistic IT automation tasks, while UC Berkeley's MAST provides a taxonomy of multi-agent failure modes; together they give teams a way to diagnose why enterprise agents fail.

Hugging Face Blog evaluation

Research Papers

ReAct: Synergizing Reasoning and Acting in Language Models
Yao et al. · ICLR 2023

Introduces the ReAct paradigm combining reasoning traces with actions.

Toolformer: Language Models Can Teach Themselves to Use Tools
Schick et al. · NeurIPS 2023

Demonstrates self-supervised tool use learning in LLMs.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei et al. · NeurIPS 2022

Foundational work on prompting LLMs for step-by-step reasoning.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Yao et al. · NeurIPS 2023

Extends CoT with exploration of multiple reasoning paths.

Generative Agents: Interactive Simulacra of Human Behavior
Park et al. · UIST 2023

Agents with memory for believable social simulation.

MemGPT: Towards LLMs as Operating Systems
Packer et al. · arXiv 2023

Hierarchical memory management for unbounded context.

Reflexion: Language Agents with Verbal Reinforcement Learning
Shinn et al. · NeurIPS 2023

Agents that learn from self-reflection and memory.

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Wu et al. · arXiv 2023

Framework for multi-agent conversation and collaboration.

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Hong et al. · arXiv 2023

Role-based multi-agent system for software development.

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Liang et al. · arXiv 2023

Multiple agents debate to improve reasoning quality.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al. · NeurIPS 2020

Original RAG paper combining retrieval with generation.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Asai et al. · arXiv 2023

Agents that decide when and what to retrieve.

Corrective Retrieval Augmented Generation
Yan et al. · arXiv 2024

Self-correcting retrieval with web search fallback.

From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Edge et al. · arXiv 2024

Knowledge graph-based RAG for complex queries.

Constitutional AI: Harmlessness from AI Feedback
Bai et al. · arXiv 2022

Self-supervision for safe AI behavior.

Red Teaming Language Models with Language Models
Perez et al. · EMNLP 2022

Automated red teaming for safety evaluation.

Frameworks

LangChain Python/JS

Framework for LLM-powered applications. Large ecosystem of integrations.

LangGraph Python

Stateful, multi-actor applications with LLMs. Graph-based control flow; a minimal example follows this list.

Microsoft Agent Framework Python/C#

Unifying AutoGen and Semantic Kernel for multi-agent workflows.
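
Since graph-based control flow is LangGraph's defining feature, here is a minimal sketch of it, with stand-in node functions rather than real LLM calls:

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    question: str
    answer: str

def draft(state: State) -> dict:
    # Stand-in for an LLM call; nodes return partial state updates.
    return {"answer": f"Draft answer to: {state['question']}"}

def polish(state: State) -> dict:
    return {"answer": state["answer"].upper()}

builder = StateGraph(State)
builder.add_node("draft", draft)
builder.add_node("polish", polish)
builder.add_edge(START, "draft")
builder.add_edge("draft", "polish")
builder.add_edge("polish", END)

graph = builder.compile()
print(graph.invoke({"question": "What is MCP?"}))
```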


Protocols

Model Context Protocol (MCP)

Anthropic's open protocol for connecting AI with tools and data sources; a minimal server sketch follows this list.

Anthropic
Agent2Agent Protocol (A2A)

Google's protocol for agent-to-agent communication and discovery.

Google
Universal Commerce Protocol (UCP)

Open standard for agentic commerce from discovery to purchase.

Google + Shopify
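
For MCP above, the official Python SDK's FastMCP helper makes a server small enough to sketch here; the tool itself is a made-up example:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio, so clients like Claude Desktop can connect
```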

Evaluation Tools

DeepEval

Open-source evaluation framework for LLMs. Agent-specific metrics.

RAGAS

Evaluation framework for RAG applications. Component-level metrics.

Promptfoo

CLI tool for testing and evaluating prompts. CI/CD integration.

LangSmith

Platform for debugging, testing, and monitoring LLM applications.

Braintrust

Enterprise platform for AI product development. Evals and logging.