danielhuber.dev@proton.me Wednesday, April 8, 2026

Resources

Curated research papers, frameworks, protocols, and evaluation tools for practitioners.


Research Papers

ReAct: Synergizing Reasoning and Acting in Language Models
Yao et al. · ICLR 2023

Introduces the ReAct paradigm combining reasoning traces with actions.

2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Schick et al. · NeurIPS 2023

Shows that LLMs can learn to call external tools through self-supervised training.

2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei et al. · NeurIPS 2022

Foundational work on prompting LLMs for step-by-step reasoning.

2022
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Yao et al. · NeurIPS 2023

Extends chain-of-thought (CoT) with search over multiple reasoning paths.

2023
Generative Agents: Interactive Simulacra of Human Behavior
Park et al. · UIST 2023

Agents with memory for believable social simulation.

2023
MemGPT: Towards LLMs as Operating Systems
Packer et al. · arXiv

Hierarchical memory management for unbounded context.

2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Shinn et al. · NeurIPS 2023

Agents that learn from self-reflection and memory.

2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Wu et al. · arXiv

Framework for multi-agent conversation and collaboration.

2023
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Hong et al. · arXiv

Role-based multi-agent system for software development.

2023
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Liang et al. · arXiv

Multiple agents debate to improve reasoning quality.

2023
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al. · NeurIPS 2020

Original RAG paper combining retrieval with generation.

2020
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Asai et al. · arXiv

A model that learns when to retrieve and critiques its own generations.

2023
Corrective Retrieval Augmented Generation
Yan et al. · arXiv

Self-correcting retrieval with web search fallback.

2024
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Edge et al. · arXiv

Knowledge-graph-based RAG for corpus-wide summarization queries.

2024
Constitutional AI: Harmlessness from AI Feedback
Bai et al. · arXiv

Training harmless assistants with AI feedback guided by a set of written principles.

2022
Red Teaming Language Models with Language Models
Perez et al. · EMNLP 2022

Automated red teaming for safety evaluation.

2022

Frameworks

LangChain Python/JS

Framework for LLM-powered applications. Large ecosystem of integrations.

LangGraph Python

Stateful, multi-actor applications with LLMs. Graph-based control flow.

Microsoft Agent Framework Python/C#

Unifies AutoGen and Semantic Kernel for multi-agent workflows.


Protocols

Model Context Protocol (MCP)

Anthropic's open protocol for connecting AI applications to external tools and data sources.

Anthropic
Agent2Agent Protocol (A2A)

Google's protocol for agent-to-agent communication and discovery.

Google
Universal Commerce Protocol (UCP)

Open standard for agentic commerce from discovery to purchase.

Google + Shopify

Evaluation Tools

DeepEval

Open-source evaluation framework for LLMs. Agent-specific metrics.

RAGAS

Evaluation framework for RAG applications. Component-level metrics.

Promptfoo

CLI tool for testing and evaluating prompts. CI/CD integration.

LangSmith

Platform for debugging, testing, and monitoring LLM applications.

Braintrust

Enterprise platform for AI product development. Evals and logging.