danielhuber.dev@proton.me Saturday, April 4, 2026

Glossary

A reference of terms and concepts in AI agent engineering.


A2A
Agent2Agent Protocol; Google's open protocol for agent-to-agent communication and service discovery. — See: A2A Protocol
Action Space
The set of all possible actions an agent can take at any given step, shaped by which tools, APIs, and operations are available to it.
Agent
An AI system that perceives its environment, reasons about tasks, and takes actions autonomously using language models as its core reasoning engine.
Agent Fine-Tuning
Training or adapting a language model specifically for agentic tasks — tool selection, multi-step reasoning, and action planning — beyond general instruction following. — See: Agent Fine-Tuning
Agentic Loop
The core cycle of observe, think, and act that drives agent behavior, repeated until a task is complete or a stopping condition is met.
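The observe-think-act cycle can be sketched in a few lines. This is a toy illustration, not a real framework: `call_model` is a scripted stand-in for the reasoning engine, and the tool set is hypothetical.

```python
# Minimal sketch of an agentic loop with a stubbed model and one toy tool.
# `call_model` and TOOLS are hypothetical stand-ins, not a real API.

def call_model(history):
    # Stubbed "reasoning engine": decide the next action from the history.
    if not any(step[0] == "act" for step in history):
        return {"action": "lookup", "input": "capital of France"}
    return {"action": "finish", "input": "Paris"}

TOOLS = {"lookup": lambda q: "Paris is the capital of France."}

def run_agent(task, max_steps=5):
    history = [("observe", task)]
    for _ in range(max_steps):                  # stopping condition: step budget
        decision = call_model(history)          # think
        if decision["action"] == "finish":
            return decision["input"]
        result = TOOLS[decision["action"]](decision["input"])  # act
        history.append(("act", decision))
        history.append(("observe", result))     # observe the tool result
    return None

answer = run_agent("What is the capital of France?")
```

The `max_steps` budget is one common stopping condition; production loops usually also watch for token limits and repeated failures.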
Agentic RAG
Retrieval-Augmented Generation where an agent decides when and what to retrieve, rather than always retrieving on every query. — See: Agentic RAG
Assembly Line Pattern
A multi-agent architecture where tasks flow through a sequence of specialized agents, each performing one stage of processing before passing results to the next.
BDI (Belief-Desire-Intention)
A cognitive architecture for agents where behavior is driven by beliefs about the world, desires (goals), and intentions (committed plans), originating from philosophy of mind.
Chain-of-Thought (CoT)
A prompting technique where the model reasons step-by-step before producing a final answer, improving accuracy on complex tasks.
Context Bloat
The problem of an agent's context window filling up with irrelevant, redundant, or low-value information, degrading performance.
Context Engineering
The practice of designing what information enters and exits an agent's context window to maximize relevance and minimize noise. — See: Context Engineering
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single call.
Deep Research Agent
An agent architecture that iteratively searches, reads, and synthesizes information from multiple sources to answer complex questions requiring extended investigation.
Delegation
The act of an orchestrator or supervisory agent assigning a subtask to a specialized agent, including passing context and receiving results.
Embedding
A dense vector representation of text used for semantic search and similarity comparison, typically produced by a specialized model.
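Similarity between embeddings is typically measured with cosine similarity. A minimal sketch, using made-up 3-dimensional vectors in place of real model-produced embeddings:

```python
# Toy illustration of comparing embeddings with cosine similarity.
# Real embeddings come from an embedding model; these 3-d vectors are made up.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat    = [0.9, 0.1, 0.0]  # hypothetical embedding of "cat"
kitten = [0.8, 0.2, 0.1]  # hypothetical embedding of "kitten"
car    = [0.1, 0.9, 0.3]  # hypothetical embedding of "car"

# Semantically related texts should score higher than unrelated ones.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```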
Error Cascade
A failure mode where an early mistake by one agent propagates through downstream agents or steps, compounding into larger errors.
Evaluation (Evals)
Systematic measurement of agent performance across tasks, scenarios, and metrics to ensure reliability and track regressions. — See: Evaluation
Few-Shot Learning
Providing a small number of input-output examples in the prompt to guide the model's behavior on a task, without updating model weights.
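A few-shot prompt is just the examples inlined before the actual query. A sketch with two invented sentiment examples (no model call is made here):

```python
# Sketch of a few-shot prompt: two labeled examples steer the model toward
# the desired task and output format. The examples are illustrative.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]

def build_few_shot_prompt(examples, query):
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")  # model completes this line
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "Best purchase I ever made.")
```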
Fine-Tuning
Further training a pre-trained language model on a specific dataset to improve performance on targeted tasks or domains.
Function Calling
The model's ability to output structured requests that invoke external functions or APIs, enabling interaction with the outside world. — See: Tool Use
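The dispatch side can be sketched as follows. The JSON shape shown is generic; real providers each have their own function-calling schema, and `get_weather` is a hypothetical tool:

```python
# Sketch of dispatching a model's function-call output to a local tool.
# The JSON shape is generic, not any specific provider's schema.
import json

def get_weather(city: str) -> str:
    # Hypothetical tool implementation; a real one would call a weather API.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# What a model might emit when asked "What's the weather in Oslo?"
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
```

The tool's return value is normally appended to the conversation so the model can use it in its next turn.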
Goal Drift
The tendency for an agent to gradually deviate from its original objective over the course of a long task, often due to accumulated context or intermediate distractions.
Grounding
Connecting model outputs to verified external data sources to reduce hallucination and improve factual accuracy.
Guardrails
Constraints and checks placed on agent behavior to ensure safety, compliance, and adherence to intended boundaries. — See: Guardrails
Hallucination
When a model generates plausible-sounding but factually incorrect or fabricated information.
Harness
The surrounding infrastructure — runtime, tools, memory, and orchestration logic — that enables an agent to operate.
Human-in-the-Loop (HITL)
A design pattern where human review or approval is required at critical decision points before an agent can proceed, balancing autonomy with oversight.
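An approval gate is the simplest form of this pattern. A minimal sketch, where the approver callback stands in for a real review UI and the risky-action list is invented:

```python
# Sketch of a human-in-the-loop gate: risky actions require approval
# before execution. `approver` stands in for a real human review step.
RISKY_ACTIONS = {"delete_file", "send_email"}

def execute(action, approver):
    if action in RISKY_ACTIONS and not approver(action):
        return f"{action}: blocked pending human approval"
    return f"{action}: executed"

# Simulated reviewer that rejects everything.
result = execute("delete_file", approver=lambda a: False)
```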
Inference
The process of running a trained model to generate outputs from inputs, as opposed to training. Each agent step involves one or more inference calls.
Learning & Adaptation
Mechanisms that allow agents to improve their behavior over time through experience, feedback, or self-evaluation without full model retraining. — See: Learning & Adaptation
MCP
Model Context Protocol; Anthropic's open protocol for connecting AI models with external tools and data sources in a standardized way. — See: MCP
MCP Apps
Applications built on the Model Context Protocol that expose tools, resources, or prompts as composable services for agents to consume. — See: MCP Apps
Memory System
Mechanisms for agents to store, organize, and recall information across interactions, including short-term (context) and long-term (persistent) memory. — See: Memory Systems
Multi-Agent Orchestration
Coordinating multiple specialized agents to collaborate on complex tasks through defined communication patterns and delegation strategies. — See: Multi-Agent Orchestration
Observability
The ability to understand an agent's internal state and behavior through logs, traces, metrics, and other instrumentation, enabling debugging and performance analysis.
Orchestrator
A supervisory agent or component that coordinates task decomposition, delegates subtasks to specialized agents, and aggregates their results.
Planner-Observer Pattern
An architecture where a planning agent generates a step-by-step strategy while a separate observer agent monitors execution, detects deviations, and triggers re-planning.
Planning
An agent's ability to decompose a complex goal into an ordered sequence of subtasks, anticipate dependencies, and reason about execution order before acting.
Prompt Caching
A provider feature that caches repeated prompt prefixes to reduce latency and cost on subsequent calls with the same prefix. — See: Prompt Caching
Proposer-Safety Oracle
A dual-agent pattern where one agent proposes actions and a second agent independently evaluates them for safety, compliance, or correctness before execution.
RAG
Retrieval-Augmented Generation; a pattern that combines document retrieval with LLM generation so the model can reference external knowledge.
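The retrieve-then-generate flow can be sketched with naive keyword retrieval standing in for semantic search, and prompt assembly standing in for the LLM call; the documents are made up:

```python
# Minimal RAG sketch: naive keyword retrieval plus prompt assembly.
# A real system would use embeddings and an LLM call; both are stubbed out.
documents = [
    "The Eiffel Tower is in Paris.",
    "The Colosseum is in Rome.",
]

def retrieve(query, docs):
    # Score by word overlap with the query (stand-in for semantic search).
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query):
    context = retrieve(query, documents)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Where is the Eiffel Tower?")
```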
ReAct Pattern
A framework combining reasoning traces with action steps in an interleaved loop: the model thinks, acts, observes, and repeats. — See: ReAct Pattern
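The interleaving can be sketched with a scripted model that first emits an action and then, once an observation is present, a final answer. Both the "model" and the search tool are stand-ins with canned responses:

```python
# Sketch of a ReAct-style loop: the "model" emits a thought plus either an
# action or a final answer. Model and tool are scripted stand-ins.
def scripted_model(transcript):
    if "Observation:" not in transcript:
        return ("Thought: I should look this up.",
                ("search", "Mont Blanc height"))
    return ("Thought: I have the answer.", ("finish", "4,808 m"))

def search(query):
    return "Mont Blanc is 4,808 m tall."  # canned tool result

def react(question, max_steps=4):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        thought, (action, arg) = scripted_model(transcript)  # think
        transcript += f"\n{thought}"
        if action == "finish":
            return arg, transcript
        observation = search(arg)                            # act
        transcript += f"\nAction: {action}[{arg}]\nObservation: {observation}"
    return None, transcript

answer, trace = react("How tall is Mont Blanc?")
```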
Red Teaming
Adversarial testing to discover vulnerabilities, failure modes, and safety issues in AI systems before deployment.
Reflection
An agent's ability to evaluate its own outputs, reasoning, or past actions and use that self-assessment to improve subsequent steps.
RLHF
Reinforcement Learning from Human Feedback; a training technique where human preferences are used to fine-tune model behavior, aligning outputs with human intent.
Safety
The discipline of ensuring AI agents operate within intended boundaries, avoid harmful actions, and remain aligned with human values and policies. — See: Safety
Sandbox
An isolated execution environment that limits what actions an agent can take, preventing unintended side effects on the host system.
Scaffolding
The external code, prompts, and infrastructure wrapped around a language model to turn it into a functioning agent — distinct from the model itself.
Skills Pattern
Organizing agent capabilities as discrete, composable skill modules that can be selected and combined for different tasks.
Streaming
Delivering model outputs incrementally as they are generated, rather than waiting for the complete response, enabling real-time feedback and progressive UI updates.
Structured Output
Constraining a model's response to follow a specific format — such as JSON, XML, or a schema — ensuring machine-readable and predictable outputs.
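On the consuming side, structured output is typically parsed and validated before use. A sketch with an invented two-field schema, failing fast on violations:

```python
# Sketch of validating structured output: parse the model's reply as JSON
# and check required fields. The schema here is an invented example.
import json

REQUIRED_FIELDS = {"title": str, "priority": int}

def parse_structured(reply: str) -> dict:
    data = json.loads(reply)  # raises ValueError on malformed JSON
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or wrong type")
    return data

ticket = parse_structured('{"title": "Fix login bug", "priority": 2}')
```

Many providers also support constrained decoding, which enforces the schema at generation time rather than after the fact.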
System Prompt
The initial instruction text provided to a language model that sets its persona, behavior rules, available tools, and task context before any user input.
Temperature
A parameter controlling the randomness of model output; lower values produce more deterministic responses, higher values increase creativity and variability.
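The effect can be illustrated by dividing logits by the temperature before the softmax: low temperature sharpens the distribution, high temperature flattens it. The logits are made up:

```python
# Illustration of how temperature reshapes the token probability
# distribution: lower T sharpens it, higher T flattens it. Logits made up.
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax(logits, temperature=0.2)   # near-deterministic
hot  = softmax(logits, temperature=2.0)   # closer to uniform
```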
Token
The basic unit of text processing for language models — typically a word, subword, or character — used to measure input length, output length, and cost.
Tool Use
The capability of an LLM to invoke external tools — APIs, databases, code execution, file systems — to accomplish tasks beyond text generation. — See: Tool Use
Trace
A recorded log of an agent's reasoning steps, tool calls, and outputs used for debugging, evaluation, and observability.
UCP
Universal Commerce Protocol; an open standard for agent-driven commercial transactions from discovery to purchase. — See: UCP
Vector Store
A database optimized for storing and querying high-dimensional embedding vectors, enabling fast semantic similarity search.
Worktree
A Git feature providing an additional, isolated working copy of a repository, allowing parallel work on different branches without switching the main checkout.
Zero-Shot
Prompting a model to perform a task without providing any examples, relying entirely on the model's pre-trained knowledge and instruction following.