Tool Use & Function Calling
The bridge between language models and real-world actions, enabling agents to query APIs, execute code, and interact with external systems.
Tool use — also called function calling — is the capability that transforms a language model from a text generator into an agent capable of acting in the world. When an agent can call a weather API, query a database, execute code, or interact with a file system, it becomes genuinely useful for tasks that require real-time information or side effects. The model decides when to invoke a tool, which tool to use, and what arguments to pass; the host application then executes that tool and returns the result for the model to incorporate into its reasoning.
The mechanics are straightforward: the application provides the model with a set of tool definitions — typically as JSON Schema — alongside the user’s request. If the model determines that a tool call is appropriate, it emits a structured response containing the tool name and arguments rather than a plain-text reply. The application executes that call, captures the result, and sends it back as a new message in the conversation. The model then generates its final answer informed by the real-world data it just obtained.
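To make the cycle concrete, here is roughly what one round trip looks like in OpenAI-style wire format (a sketch; field names follow the OpenAI chat completions API, and the get_weather tool is a running example):

# 1. Given tool definitions, the model replies with a structured call
#    instead of plain text:
assistant_msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "NYC"}'},
    }],
}

# 2. The application executes get_weather("NYC") and appends the result
#    as a tool message, keyed to the call id:
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": '{"temp_c": 18, "conditions": "partly cloudy"}',
}

# 3. The model is invoked again with both messages appended and produces
#    its final natural-language answer.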
How Tool Calling Works
The flow is a well-defined cycle: the model receives a user request with tool definitions, optionally emits one or more tool calls, the application executes them and feeds results back, and the model generates its final response. This cycle can repeat multiple times for complex tasks.
User Request
│
▼
┌─────────────────────────────────────┐
│ Language Model │
│ (with tool definitions loaded) │
└─────────────────────────────────────┘
│
│ Model decides to call tool
▼
┌─────────────────────────────────────┐
│ Tool Call Response │
│ { name: "get_weather", │
│ arguments: { "location": "NYC" } │
│ } │
└─────────────────────────────────────┘
│
│ Application executes tool
▼
┌─────────────────────────────────────┐
│ Tool Execution │
│ get_weather("NYC") → result │
└─────────────────────────────────────┘
│
│ Result sent back to model
▼
┌─────────────────────────────────────┐
│ Language Model │
│ (generates final response) │
└─────────────────────────────────────┘
│
▼
Final Response to User
Basic Tool Execution
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

# Define tools using the @tool decorator
@tool
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get current weather for a location.

    Args:
        location: City name, e.g. 'San Francisco'
        unit: Temperature unit (celsius or fahrenheit)
    """
    # weather_api stands in for an external weather client
    return weather_api.get_current(location, unit)

# Create LLM with tools bound
llm = ChatOpenAI(model="gpt-4")
llm_with_tools = llm.bind_tools([get_weather])

def execute_with_tools(user_message: str) -> str:
    messages = [HumanMessage(content=user_message)]
    # Get response (may include tool calls)
    response = llm_with_tools.invoke(messages)
    if response.tool_calls:
        messages.append(response)
        for tool_call in response.tool_calls:
            result = get_weather.invoke(tool_call["args"])
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
        return llm_with_tools.invoke(messages).content
    return response.content
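Called end to end, the function runs one full cycle (output hypothetical):

answer = execute_with_tools("What's the weather in NYC right now?")
print(answer)  # e.g. "It's currently 18°C and partly cloudy in NYC."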
Tool Definition Formats
Different LLM providers use slightly different formats for defining tools, though all share common elements: a name, a description, and a parameter schema. Writing detailed, accurate tool descriptions is one of the highest-leverage improvements you can make to an agent.
| Provider | Format | Key Features |
|---|---|---|
| OpenAI | JSON Schema | Parallel calls, strict mode, function descriptions |
| Anthropic | JSON Schema | Tool use blocks, detailed descriptions encouraged |
| Google | OpenAPI-style | Function declarations with protobuf types |
| Open Models | Varies | Often use Hermes or ChatML format |
Write detailed tool descriptions. Models use these descriptions to decide when to call a tool. Include examples of valid inputs and explain edge cases.
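For example, the same get_weather tool in OpenAI's and Anthropic's request formats (a sketch of the two shapes; both are JSON Schema, but the nesting and field names differ):

# OpenAI: schema nested under "function", parameters in "parameters"
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location, e.g. 'NYC'.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

# Anthropic: flat tool object, schema under "input_schema"
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather for a location, e.g. 'NYC'.",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}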
Parallel Tool Execution
Modern LLMs can request multiple tool calls simultaneously, dramatically reducing latency for tasks that need several independent pieces of information. When the model emits several tool calls in a single turn, the application can execute all of them concurrently and return all results together.
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

@tool
async def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return await weather_api.get_async(location)

@tool
async def search_web(query: str) -> str:
    """Search the web for information."""
    return await search_api.search_async(query)

llm = ChatOpenAI(model="gpt-4")
tools = [get_weather, search_web]
llm_with_tools = llm.bind_tools(tools)

async def agent_with_parallel_tools(query: str) -> str:
    messages = [HumanMessage(content=query)]
    response = await llm_with_tools.ainvoke(messages)
    if response.tool_calls:
        messages.append(response)
        # Dispatch all requested calls concurrently
        tool_tasks = []
        for tool_call in response.tool_calls:
            tool_fn = next(t for t in tools if t.name == tool_call["name"])
            tool_tasks.append(tool_fn.ainvoke(tool_call["args"]))
        results = await asyncio.gather(*tool_tasks)
        for tool_call, result in zip(response.tool_calls, results):
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
        return (await llm_with_tools.ainvoke(messages)).content
    return response.content
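A query that needs both tools then resolves in a single concurrent batch:

answer = asyncio.run(
    agent_with_parallel_tools("What's the weather in NYC, and what's in the news there?")
)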
Not all tool calls should be parallelized. If tools have dependencies — for example, creating a record and then updating it — they must be executed sequentially.
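A minimal sketch of the sequential alternative, reusing the setup above: the loop gives the model one turn at a time, so each new call can depend on results the model has already seen (max_turns is a hypothetical safety bound):

async def agent_with_sequential_tools(query: str, max_turns: int = 5) -> str:
    """Execute tool calls turn by turn so later calls can depend on
    the results of earlier ones."""
    messages = [HumanMessage(content=query)]
    for _ in range(max_turns):
        response = await llm_with_tools.ainvoke(messages)
        if not response.tool_calls:
            return response.content  # no more tools needed; final answer
        messages.append(response)
        for tool_call in response.tool_calls:
            tool_fn = next(t for t in tools if t.name == tool_call["name"])
            result = await tool_fn.ainvoke(tool_call["args"])
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
    raise RuntimeError("Exceeded max_turns without a final answer")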
Error Handling & Retries
Robust tool execution requires handling failures gracefully. When a tool call fails, the model should receive the error message so it can reason about alternative approaches. Without error feedback, the agent has no basis for adapting its strategy.
from langchain_core.tools import tool, ToolException
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry transient failures before giving up
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
def _search_with_retry(query: str) -> str:
    # db and DatabaseError are placeholders for your database client
    return db.search(query)

@tool
def search_database(query: str) -> str:
    """Search the database with automatic error handling."""
    try:
        return _search_with_retry(query)
    except DatabaseError as e:
        # ToolException surfaces the failure to the model instead of
        # crashing the run
        raise ToolException(f"Database error: {e}")

# Return tool errors to the model as messages rather than raising
search_database.handle_tool_error = True

# Using LangGraph for an agent with error recovery: ToolNode catches
# tool errors and feeds them back to the model as tool messages
from langgraph.prebuilt import create_react_agent, ToolNode

agent = create_react_agent(
    llm,
    ToolNode([search_database], handle_tool_errors=True),
)
result = agent.invoke({"messages": [("user", query)]})
Trade-offs & Approaches
| Approach | Pros | Cons |
|---|---|---|
| Static tool list | Simple, predictable, easy to test | Context bloat with many tools |
| Dynamic discovery | Scales to many tools | Additional latency, complexity |
| Tool clustering | Balance of both approaches | Routing logic complexity |
| Skills pattern | Massive token savings (98%+) | Requires filesystem access |
For large tool libraries (50+ tools), see the Skills Pattern which can reduce context usage from 150K to 2K tokens.
Common Pitfalls
Vague tool descriptions are among the most common and costly mistakes: the model uses these descriptions to decide when and how to call tools, so ambiguous wording leads to incorrect selection or malformed arguments. Always include examples of valid inputs and describe edge cases explicitly. When a tool fails, always return the error details to the model rather than swallowing exceptions; without this feedback, the model has no way to reason about what went wrong or try an alternative approach. Finally, always set a maximum iteration limit to prevent agents from entering infinite tool-calling loops, and treat all tool arguments as untrusted input — validate and sanitize to prevent injection attacks.
Never pass tool arguments directly to system commands. Validate and sanitize all inputs to prevent injection attacks.
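A minimal sketch of that principle, assuming a hypothetical run_shell tool: the argument is checked against an allowlist and passed as an argument vector, never interpolated into a shell string:

import shlex
import subprocess
from langchain_core.tools import tool, ToolException

ALLOWED_COMMANDS = {"ls", "cat", "grep"}  # hypothetical allowlist

@tool
def run_shell(command: str) -> str:
    """Run an allowlisted shell command and return its output."""
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        raise ToolException(f"Command not permitted: {command!r}")
    # shell=False (the default) with an argument vector avoids shell injection
    result = subprocess.run(parts, capture_output=True, text=True, timeout=10)
    return result.stdout or result.stderr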