danielhuber.dev@proton.me Saturday, April 4, 2026

Glossary

A reference of terms and concepts in AI agent engineering.


A2A
Agent2Agent Protocol; Google's open protocol for agent-to-agent communication and service discovery. — See: A2A Protocol
Action Space
The set of all possible actions an agent can take at any given step, shaped by which tools, APIs, and operations are available to it.
Agent
An AI system that perceives its environment, reasons about tasks, and takes actions autonomously using language models as its core reasoning engine.
Agent Fine-Tuning
Training or adapting a language model specifically for agentic tasks — tool selection, multi-step reasoning, and action planning — beyond general instruction following. — See: Agent Fine-Tuning
Agentic Loop
The core cycle of observe, think, and act that drives agent behavior, repeated until a task is complete or a stopping condition is met.
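The observe-think-act cycle can be sketched in a few lines. This is a toy illustration, not a real framework: `call_model` is a scripted stand-in for the reasoning engine, and the tool set is hypothetical.

```python
# Minimal sketch of an agentic loop with a stubbed model and one toy tool.
# `call_model` and TOOLS are hypothetical stand-ins, not a real API.

def call_model(history):
    # Stubbed "reasoning engine": decide the next action from the history.
    if not any(step[0] == "act" for step in history):
        return {"action": "lookup", "input": "capital of France"}
    return {"action": "finish", "input": "Paris"}

TOOLS = {"lookup": lambda q: "Paris is the capital of France."}

def run_agent(task, max_steps=5):
    history = [("observe", task)]
    for _ in range(max_steps):                  # stopping condition: step budget
        decision = call_model(history)          # think
        if decision["action"] == "finish":
            return decision["input"]
        result = TOOLS[decision["action"]](decision["input"])  # act
        history.append(("act", decision))
        history.append(("observe", result))     # observe the tool result
    return None

answer = run_agent("What is the capital of France?")
```

The `max_steps` budget is one common stopping condition; production loops usually also watch for token limits and repeated failures.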
Agentic RAG
Retrieval-Augmented Generation where an agent decides when and what to retrieve, rather than always retrieving on every query. — See: Agentic RAG
Assembly Line Pattern
A multi-agent architecture where tasks flow through a sequence of specialized agents, each performing one stage of processing before passing results to the next.
BDI (Belief-Desire-Intention)
A cognitive architecture for agents where behavior is driven by beliefs about the world, desires (goals), and intentions (committed plans), originating from philosophy of mind.
Chain-of-Thought (CoT)
A prompting technique where the model reasons step-by-step before producing a final answer, improving accuracy on complex tasks.
Context Bloat
The problem of an agent's context window filling up with irrelevant, redundant, or low-value information, degrading performance.
Context Engineering
The practice of designing what information enters and exits an agent's context window to maximize relevance and minimize noise. — See: Context Engineering
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single call.
Deep Research Agent
An agent architecture that iteratively searches, reads, and synthesizes information from multiple sources to answer complex questions requiring extended investigation.
Delegation
The act of an orchestrator or supervisory agent assigning a subtask to a specialized agent, including passing context and receiving results.
Embedding
A dense vector representation of text used for semantic search and similarity comparison, typically produced by a specialized model.
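Similarity between embeddings is typically measured with cosine similarity. A minimal sketch, using made-up 3-dimensional vectors in place of real model-produced embeddings:

```python
# Toy illustration of comparing embeddings with cosine similarity.
# Real embeddings come from an embedding model; these 3-d vectors are made up.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat    = [0.9, 0.1, 0.0]  # hypothetical embedding of "cat"
kitten = [0.8, 0.2, 0.1]  # hypothetical embedding of "kitten"
car    = [0.1, 0.9, 0.3]  # hypothetical embedding of "car"

# Semantically related texts should score higher than unrelated ones.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```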
Error Cascade
A failure mode where an early mistake by one agent propagates through downstream agents or steps, compounding into larger errors.
Evaluation (Evals)
Systematic measurement of agent performance across tasks, scenarios, and metrics to ensure reliability and track regressions. — See: Evaluation
Few-Shot Learning
Providing a small number of input-output examples in the prompt to guide the model's behavior on a task, without updating model weights.
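A few-shot prompt is just the examples inlined before the actual query. A sketch with two invented sentiment examples (no model call is made here):

```python
# Sketch of a few-shot prompt: two labeled examples steer the model toward
# the desired task and output format. The examples are illustrative.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]

def build_few_shot_prompt(examples, query):
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")  # model completes this line
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "Best purchase I ever made.")
```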
Fine-Tuning
Further training a pre-trained language model on a specific dataset to improve performance on targeted tasks or domains.
Function Calling
The model's ability to output structured requests that invoke external functions or APIs, enabling interaction with the outside world. — See: Tool Use
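The dispatch side can be sketched as follows. The JSON shape shown is generic; real providers each have their own function-calling schema, and `get_weather` is a hypothetical tool:

```python
# Sketch of dispatching a model's function-call output to a local tool.
# The JSON shape is generic, not any specific provider's schema.
import json

def get_weather(city: str) -> str:
    # Hypothetical tool implementation; a real one would call a weather API.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# What a model might emit when asked "What's the weather in Oslo?"
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
```

The tool's return value is normally appended to the conversation so the model can use it in its next turn.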
Goal Drift
The tendency for an agent to gradually deviate from its original objective over the course of a long task, often due to accumulated context or intermediate distractions.
Grounding
Connecting model outputs to verified external data sources to reduce hallucination and improve factual accuracy.
Guardrails
Constraints and checks placed on agent behavior to ensure safety, compliance, and adherence to intended boundaries. — See: Guardrails
Hallucination
When a model generates plausible-sounding but factually incorrect or fabricated information.
Harness
The surrounding infrastructure — runtime, tools, memory, and orchestration logic — that enables an agent to operate.
Human-in-the-Loop (HITL)
A design pattern where human review or approval is required at critical decision points before an agent can proceed, balancing autonomy with oversight.
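An approval gate is the simplest form of this pattern. A minimal sketch, where the approver callback stands in for a real review UI and the risky-action list is invented:

```python
# Sketch of a human-in-the-loop gate: risky actions require approval
# before execution. `approver` stands in for a real human review step.
RISKY_ACTIONS = {"delete_file", "send_email"}

def execute(action, approver):
    if action in RISKY_ACTIONS and not approver(action):
        return f"{action}: blocked pending human approval"
    return f"{action}: executed"

# Simulated reviewer that rejects everything.
result = execute("delete_file", approver=lambda a: False)
```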
Inference
The process of running a trained model to generate outputs from inputs, as opposed to training. Each agent step involves one or more inference calls.
Learning & Adaptation
Mechanisms that allow agents to improve their behavior over time through experience, feedback, or self-evaluation without full model retraining. — See: Learning & Adaptation
MCP
Model Context Protocol; Anthropic's open protocol for connecting AI models with external tools and data sources in a standardized way. — See: MCP
MCP Apps
Applications built on the Model Context Protocol that expose tools, resources, or prompts as composable services for agents to consume. — See: MCP Apps
Memory System
Mechanisms for agents to store, organize, and recall information across interactions, including short-term (context) and long-term (persistent) memory. — See: Memory Systems
Multi-Agent Orchestration
Coordinating multiple specialized agents to collaborate on complex tasks through defined communication patterns and delegation strategies. — See: Multi-Agent Orchestration
Observability
The ability to understand an agent's internal state and behavior through logs, traces, metrics, and other instrumentation, enabling debugging and performance analysis.
Orchestrator
A supervisory agent or component that coordinates task decomposition, delegates subtasks to specialized agents, and aggregates their results.
Planner-Observer Pattern
An architecture where a planning agent generates a step-by-step strategy while a separate observer agent monitors execution, detects deviations, and triggers re-planning.
Planning
An agent's ability to decompose a complex goal into an ordered sequence of subtasks, anticipate dependencies, and reason about execution order before acting.
Prompt Caching
A provider feature that caches repeated prompt prefixes to reduce latency and cost on subsequent calls with the same prefix. — See: Prompt Caching
Proposer-Safety Oracle
A dual-agent pattern where one agent proposes actions and a second agent independently evaluates them for safety, compliance, or correctness before execution.
RAG
Retrieval-Augmented Generation; a pattern that combines document retrieval with LLM generation so the model can reference external knowledge.
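The retrieve-then-generate flow can be sketched with naive keyword retrieval standing in for semantic search, and prompt assembly standing in for the LLM call; the documents are made up:

```python
# Minimal RAG sketch: naive keyword retrieval plus prompt assembly.
# A real system would use embeddings and an LLM call; both are stubbed out.
documents = [
    "The Eiffel Tower is in Paris.",
    "The Colosseum is in Rome.",
]

def retrieve(query, docs):
    # Score by word overlap with the query (stand-in for semantic search).
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query):
    context = retrieve(query, documents)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Where is the Eiffel Tower?")
```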
ReAct Pattern
A framework combining reasoning traces with action steps in an interleaved loop: the model thinks, acts, observes, and repeats. — See: ReAct Pattern
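The interleaving can be sketched with a scripted model that first emits an action and then, once an observation is present, a final answer. Both the "model" and the search tool are stand-ins with canned responses:

```python
# Sketch of a ReAct-style loop: the "model" emits a thought plus either an
# action or a final answer. Model and tool are scripted stand-ins.
def scripted_model(transcript):
    if "Observation:" not in transcript:
        return ("Thought: I should look this up.",
                ("search", "Mont Blanc height"))
    return ("Thought: I have the answer.", ("finish", "4,808 m"))

def search(query):
    return "Mont Blanc is 4,808 m tall."  # canned tool result

def react(question, max_steps=4):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        thought, (action, arg) = scripted_model(transcript)  # think
        transcript += f"\n{thought}"
        if action == "finish":
            return arg, transcript
        observation = search(arg)                            # act
        transcript += f"\nAction: {action}[{arg}]\nObservation: {observation}"
    return None, transcript

answer, trace = react("How tall is Mont Blanc?")
```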
Red Teaming
Adversarial testing to discover vulnerabilities, failure modes, and safety issues in AI systems before deployment.
Reflection
An agent's ability to evaluate its own outputs, reasoning, or past actions and use that self-assessment to improve subsequent steps.
RLHF
Reinforcement Learning from Human Feedback; a training technique where human preferences are used to fine-tune model behavior, aligning outputs with human intent.
Safety
The discipline of ensuring AI agents operate within intended boundaries, avoid harmful actions, and remain aligned with human values and policies. — See: Safety
Sandbox
An isolated execution environment that limits what actions an agent can take, preventing unintended side effects on the host system.
Scaffolding
The external code, prompts, and infrastructure wrapped around a language model to turn it into a functioning agent — distinct from the model itself.
Skills Pattern
Organizing agent capabilities as discrete, composable skill modules that can be selected and combined for different tasks.
Streaming
Delivering model outputs incrementally as they are generated, rather than waiting for the complete response, enabling real-time feedback and progressive UI updates.
Structured Output
Constraining a model's response to follow a specific format — such as JSON, XML, or a schema — ensuring machine-readable and predictable outputs.
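On the consuming side, structured output is typically parsed and validated before use. A sketch with an invented two-field schema, failing fast on violations:

```python
# Sketch of validating structured output: parse the model's reply as JSON
# and check required fields. The schema here is an invented example.
import json

REQUIRED_FIELDS = {"title": str, "priority": int}

def parse_structured(reply: str) -> dict:
    data = json.loads(reply)  # raises ValueError on malformed JSON
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or wrong type")
    return data

ticket = parse_structured('{"title": "Fix login bug", "priority": 2}')
```

Many providers also support constrained decoding, which enforces the schema at generation time rather than after the fact.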
System Prompt
The initial instruction text provided to a language model that sets its persona, behavior rules, available tools, and task context before any user input.
Temperature
A parameter controlling the randomness of model output; lower values produce more deterministic responses, higher values increase creativity and variability.
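The effect can be illustrated by dividing logits by the temperature before the softmax: low temperature sharpens the distribution, high temperature flattens it. The logits are made up:

```python
# Illustration of how temperature reshapes the token probability
# distribution: lower T sharpens it, higher T flattens it. Logits made up.
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax(logits, temperature=0.2)   # near-deterministic
hot  = softmax(logits, temperature=2.0)   # closer to uniform
```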
Token
The basic unit of text processing for language models — typically a word, subword, or character — used to measure input length, output length, and cost.
Tool Use
The capability of an LLM to invoke external tools — APIs, databases, code execution, file systems — to accomplish tasks beyond text generation. — See: Tool Use
Trace
A recorded log of an agent's reasoning steps, tool calls, and outputs used for debugging, evaluation, and observability.
UCP
Universal Commerce Protocol; an open standard for agent-driven commercial transactions from discovery to purchase. — See: UCP
Vector Store
A database optimized for storing and querying high-dimensional embedding vectors, enabling fast semantic similarity search.
Worktree
A Git feature providing an additional, isolated working copy of a repository, allowing parallel work on different branches without switching the main checkout.
Zero-Shot
Prompting a model to perform a task without providing any examples, relying entirely on the model's pre-trained knowledge and instruction following.