
Streaming Patterns for AI Agent Graphs

A practical guide to the streaming modes available in agent graph frameworks, covering state updates, LLM token streams, tool lifecycle events, and subgraph outputs.


February 28, 2026

Latency is the enemy of perceived intelligence. When an agent is mid-execution — calling tools, reasoning through steps, generating a response — a silent UI feels broken. Streaming lets you surface real-time progress at multiple granularities: raw LLM tokens, graph state transitions, tool lifecycle events, and custom signals from within nodes. Understanding which streaming mode to use, and when, is a core infrastructure decision for production agent systems.

Why Streaming Is More Complex in Agent Graphs

In a simple chat completion, streaming means forwarding token deltas from the model to the client. In an agent graph, the picture is richer. A single user request may fan out through multiple nodes, invoke several tools, traverse nested subgraphs, and accumulate state across many steps. Each of those layers can produce meaningful real-time information — and your users or downstream systems may care about different layers for different reasons.

This complexity motivates a taxonomy of stream modes rather than a single stream flag. You need to decide: do you want the full graph state after every step, just the delta, individual LLM tokens, tool invocation events, or arbitrary custom signals? Each choice carries different bandwidth, latency, and parsing overhead.

Core Stream Modes and Their Trade-offs

Most graph-based agent frameworks expose a small set of canonical streaming modes:

values emits the entire state object after every node completes. It is the easiest to reason about — any consumer always has a consistent snapshot — but produces the most data. For large state schemas this can be wasteful.

updates emits only the delta returned by each node. Output is keyed by node name, so consumers know which node produced which change. This is the right default for most applications: low bandwidth, easy to attribute progress to a specific step.

messages emits (token, metadata) tuples as LLMs generate them, from anywhere in the graph — inside nodes, inside tools called by nodes, inside nested subgraphs. This is what powers character-by-character streaming in chat UIs.

tools emits lifecycle events for every tool call: on_tool_start, on_tool_event (for mid-execution progress), on_tool_end, and on_tool_error. This is distinct from messages — it operates at the tool invocation level, not the token level.

custom lets graph nodes emit arbitrary structured data via a writer handle, independent of state updates or LLM output. Useful for progress bars, status messages, or intermediate computation results.

debug is a catch-all that emits everything: node entry/exit, LLM calls, tool calls, state transitions. Treat it as a development-time diagnostic, not a production stream mode.
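To make the custom mode concrete, here is a minimal sketch of a node emitting progress events through a writer handle. The writer type, config shape, and node signature are illustrative assumptions, not a specific framework's API; in a real graph the writer would be supplied by the runtime and the events would arrive on the stream as ["custom", event] tuples.

```typescript
// Illustrative types: real frameworks inject the writer via their own config object.
type CustomEvent = { status: string; progress: number };
type NodeConfig = { writer?: (event: CustomEvent) => void };

function summarizeNode(state: { docs: string[] }, config: NodeConfig) {
  // A real node would await LLM calls here; kept synchronous for brevity.
  const total = state.docs.length;
  const summaries: string[] = [];
  for (let i = 0; i < total; i++) {
    // Emit structured progress independent of state updates or token output.
    config.writer?.({
      status: `Summarizing doc ${i + 1}/${total}`,
      progress: (i + 1) / total,
    });
    summaries.push(state.docs[i].slice(0, 40));
  }
  return { summaries };
}

// Consumer side: capture events directly for illustration.
const received: CustomEvent[] = [];
const result = summarizeNode(
  { docs: ["first document", "second document"] },
  { writer: (e) => received.push(e) },
);
```

The key property is that custom events carry whatever schema the node chooses, so a UI can render a progress bar without parsing state deltas or token streams.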

Note

Most frameworks let you combine stream modes in a single call by passing an array. The stream then emits [mode, chunk] tuples so consumers can route each event to the right handler. This is especially useful when you need both updates (for progress indicators) and messages (for token streaming) simultaneously.

// Combining updates + messages in a single stream
for await (const [mode, chunk] of await graph.stream(inputs, {
  streamMode: ["updates", "messages"],
})) {
  if (mode === "messages") {
    process.stdout.write(chunk[0].content); // stream tokens
  } else if (mode === "updates") {
    console.log("Node completed:", Object.keys(chunk));
  }
}

Tool Progress Streaming

Tool calls are often the longest-running operations in an agent graph — a web search, a code execution sandbox, a database query. The tools stream mode surfaces their lifecycle so your UI can show meaningful feedback during that wait.

The pattern requires two things: tools that emit progress events during execution, and a consumer that listens for on_tool_event between on_tool_start and on_tool_end.

// A tool that emits mid-execution progress
const searchTool = tool(
  async (input, config) => {
    const writer = config?.configurable?.streamWriter;
    writer?.({ status: "Fetching search results...", progress: 0.1 });
    const results = await fetchPage(input.query);
    writer?.({ status: "Parsing results", progress: 0.6 });
    const ranked = rankResults(results);
    writer?.({ status: "Done", progress: 1.0 });
    return ranked;
  },
  { name: "search", schema: searchInputSchema }
);

// Consumer side
for await (const [mode, chunk] of await graph.stream(inputs, {
  streamMode: ["tools", "updates"],
})) {
  if (mode === "tools") {
    if (chunk.type === "on_tool_start") {
      console.log(`Tool started: ${chunk.name}`);
    } else if (chunk.type === "on_tool_event") {
      console.log(`Progress: ${chunk.data.progress * 100}%`);
    } else if (chunk.type === "on_tool_end") {
      console.log(`Tool finished: ${chunk.name}`);
    }
  }
}

Tip

For React frontends, many frameworks provide a useStream hook that maps tool lifecycle events directly to component state. The tools stream mode is the right backend primitive to pair with these hooks — it gives you named events with structured payloads rather than raw token strings.
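The mapping such hooks perform can be sketched as a plain reducer over tool lifecycle events. The event names follow this article's taxonomy; the UI state shape is an assumption chosen for illustration, not any particular hook's internal representation.

```typescript
// Event shapes follow the article's tools-mode taxonomy (illustrative payloads).
type ToolEvent =
  | { type: "on_tool_start"; name: string }
  | { type: "on_tool_event"; name: string; data: { status?: string; progress?: number } }
  | { type: "on_tool_end"; name: string }
  | { type: "on_tool_error"; name: string; error: string };

// Per-tool UI state keyed by tool name (assumed shape).
type ToolUiState = Record<string, { running: boolean; progress: number; error?: string }>;

function reduceToolEvent(state: ToolUiState, event: ToolEvent): ToolUiState {
  const prev = state[event.name] ?? { running: false, progress: 0 };
  switch (event.type) {
    case "on_tool_start":
      return { ...state, [event.name]: { running: true, progress: 0 } };
    case "on_tool_event":
      // Mid-execution progress updates the bar without changing running status.
      return { ...state, [event.name]: { ...prev, progress: event.data.progress ?? prev.progress } };
    case "on_tool_end":
      return { ...state, [event.name]: { ...prev, running: false, progress: 1 } };
    case "on_tool_error":
      return { ...state, [event.name]: { ...prev, running: false, error: event.error } };
  }
}
```

Because the reducer is pure, it plugs directly into React's useReducer or any state store, with the stream consumer dispatching each tools-mode chunk as an action.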

Subgraph Streaming and Nested Architectures

Multi-agent systems often compose graphs hierarchically: an orchestrator graph calls specialist subgraphs. By default, streaming only surfaces events from the top-level graph. To receive events from nested subgraphs, you need to opt in explicitly.

Orchestrator Graph

       ├─── Node A ──► SubgraphX
       │                  ├─── Node X1  (tokens, tools emitted here)
       │                  └─── Node X2  (tokens, tools emitted here)

       └─── Node B ──► SubgraphY
                          └─── Node Y1  (tokens, tools emitted here)

With subgraphs: true → all events surface to parent stream
Without         → only top-level node updates visible

When you enable subgraph streaming, event chunks carry a namespace field indicating which graph they originated from. This lets consumers apply different rendering logic — for example, displaying orchestrator state updates in a step tracker while piping subgraph tokens directly into a chat bubble.

for await (const chunk of await parentGraph.stream(inputs, {
  streamMode: "updates",
  subgraphs: true,
})) {
  const [namespace, data] = chunk;
  if (namespace.length === 0) {
    // top-level graph event
  } else {
    // event from subgraph identified by namespace array
    console.log(`From subgraph [${namespace.join(" > ")}]:`, data);
  }
}

Engineering Considerations for Production

A few practical decisions matter when wiring streaming into a real system:

Backpressure and buffering. Streaming over HTTP requires either server-sent events (SSE) or chunked transfer encoding. Make sure your server framework doesn't buffer stream chunks before flushing: a single large buffer defeats the purpose entirely. Set explicit flush intervals or use native streaming response APIs.
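For the SSE route, one workable convention is to serialize each [mode, chunk] tuple as an SSE frame, using the stream mode as the event name. The field names below are standard SSE; the [mode, chunk] payload convention is this article's, and the helper name is made up for the sketch.

```typescript
// Serialize one [mode, chunk] tuple as a server-sent event frame.
// Using the mode as the `event:` field lets EventSource clients attach
// per-mode listeners. SSE frames end with a blank line, and data must not
// contain raw newlines, so the payload is JSON-encoded.
function toSseEvent(mode: string, chunk: unknown): string {
  return `event: ${mode}\ndata: ${JSON.stringify(chunk)}\n\n`;
}
```

On the server, you would call something like res.write(toSseEvent(mode, chunk)) inside the for-await loop, with Content-Type: text/event-stream and response buffering disabled, so each frame reaches the client as soon as the graph emits it.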

Selective streaming per model. Not every LLM in your graph needs to stream tokens. For nodes that call a model whose output feeds only into internal state (not the UI), disabling streaming can reduce overhead. Most frameworks let you configure streaming per model instance.
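The decision can be made per node rather than per graph. The sketch below assumes a minimal model interface with separate invoke and stream methods (an assumption for illustration, not a specific framework's API): nodes whose output only feeds internal state take the single round trip, while user-facing nodes forward tokens.

```typescript
// Assumed minimal model interface for the sketch.
interface Model {
  invoke(prompt: string): Promise<string>;
  stream(prompt: string): AsyncIterable<string>;
}

async function runNode(
  model: Model,
  prompt: string,
  streamTokens: boolean,
  onToken?: (t: string) => void,
): Promise<string> {
  if (!streamTokens) {
    // Internal-only output: one round trip, no per-token event overhead.
    return model.invoke(prompt);
  }
  let out = "";
  for await (const token of model.stream(prompt)) {
    out += token;
    onToken?.(token); // forward to the messages stream / UI
  }
  return out;
}
```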

Error propagation. The on_tool_error event in tools mode is not a replacement for exception handling in your graph. It signals that a tool call failed, but your graph’s error recovery logic still needs to handle the resulting state. Treat tool error events as observability signals, not control flow.

Mode selection by consumer. Different consumers of the same agent graph often need different stream modes: a chat UI needs messages, an admin dashboard needs updates, a logging pipeline needs debug. Design your streaming endpoint to accept a streamMode parameter rather than hardcoding one mode server-side.
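A sketch of the validation such an endpoint needs, assuming the mode names from this article's taxonomy (the helper name and comma-separated parameter format are illustrative choices):

```typescript
// Modes a production endpoint will serve; "debug" is deliberately excluded,
// since the article treats it as a development-time diagnostic.
const ALLOWED_MODES: readonly string[] = [
  "values",
  "updates",
  "messages",
  "tools",
  "custom",
];

// Parse a client-supplied comma-separated streamMode parameter,
// defaulting to "updates" when none is given.
function parseStreamModes(raw: string | undefined): string[] {
  const requested = (raw ?? "updates").split(",").map((m) => m.trim());
  const invalid = requested.filter((m) => !ALLOWED_MODES.includes(m));
  if (invalid.length > 0) {
    throw new Error(`Unsupported stream mode(s): ${invalid.join(", ")}`);
  }
  return requested;
}
```

The parsed list can be passed straight through as the streamMode array, so the same endpoint serves chat UIs, dashboards, and logging pipelines without server-side changes.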

Streaming in agent graphs is not an afterthought — it is the mechanism by which a system that may take 30 seconds to complete a task feels interactive rather than frozen. Choosing the right granularity of events, composing modes thoughtfully, and propagating them cleanly through nested subgraphs are the engineering decisions that separate a polished agent experience from one that feels opaque.

Tags: streaming, tool use, multi-agent, real-time, LLM, graph execution

This article is an AI-generated summary. Read the original documentation: Streaming - Docs by LangChain.