From Chatbots to Agents: A Practical Guide to Function Calling with OpenAI and Claude

AI tutorial - IT technology blog

The Knowledge Cutoff: Why Your LLM Needs a Window

Think of a vanilla LLM as a brilliant researcher locked in a room without internet. They possess massive knowledge up to their training cutoff, but they are blind to the present. Ask GPT-4o or Claude 3.5 Sonnet about your company’s current stock in the Singapore warehouse, and it will likely hallucinate a confident, yet entirely wrong, number. It isn’t lying; it just doesn’t have the tools to look outside.

Developers often mistake AI for a simple text generator. In reality, Function Calling (or ‘Tool Use’) transforms these models into active agents. Instead of just chatting, the model decides when to ‘reach out’ and trigger a backend script, query a database, or ping a third-party API. In my own production environments, adding a single ‘search_order’ tool reduced hallucination rates by over 70% for customer support bots.

The Core Logic: AI Doesn’t Run Your Code

There is a persistent myth that the AI model executes your Python script or SQL queries. It doesn’t. Function Calling is actually a structured negotiation between your application and the LLM. You are essentially giving the AI a menu of capabilities.

When a user asks a question, the model scans your ‘tools’ list to see if anything fits the intent. If it finds a match, the AI pauses the conversation and returns a structured JSON object. This object contains the specific function name and arguments needed. Your application then runs that code locally, fetches the result, and feeds it back to the AI. Only then does the model formulate its final response. It is a hand-off, not a takeover.

Precision Matters: The JSON Schema

The model lives and dies by your descriptions. Both OpenAI and Claude use a JSON Schema variant to understand your functions. If you describe a parameter vaguely, expect the model to send junk data that crashes your backend. Be surgical: if a function requires a `user_id`, tell the model exactly what that string looks like, for example ‘the prefix US- followed by ten digits’.
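To make that concrete, here is the same parameter described vaguely and surgically, as plain JSON Schema dicts. The `pattern` keyword is standard JSON Schema; how strictly each provider enforces it varies, so treat the regex as a hint for the model as much as a validator. The field format itself is illustrative.

```python
import re

# The same `user_id` parameter, described two ways. The vague version
# invites junk arguments; the precise one tells the model exactly what
# string to produce.
vague = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string", "description": "The user id."}
    },
    "required": ["user_id"],
}

precise = {
    "type": "object",
    "properties": {
        "user_id": {
            "type": "string",
            "description": (
                "Internal customer identifier: the prefix 'US-' followed "
                "by ten digits, e.g. 'US-0042917733'."
            ),
            "pattern": "^US-\\d{10}$",  # standard JSON Schema constraint
        }
    },
    "required": ["user_id"],
}

# Quick sanity check of the regex the schema advertises:
assert re.fullmatch(precise["properties"]["user_id"]["pattern"], "US-0042917733")
```

A description that states the exact format, plus a machine-checkable `pattern`, gives the model far fewer ways to guess wrong.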

The Implementation Loop: Define, Detect, Execute

To build a reliable agent, follow this three-step architectural pattern:

  1. Instruction: Send the user prompt along with your tool definitions to the API.
  2. Signal: The model detects the need for a tool and returns a ‘tool_call’ request instead of text.
  3. Closing the Loop: Your code executes the logic, gets the data, and sends a second request back so the AI can interpret the findings.

Implementing OpenAI Function Calling

OpenAI currently sets the pace for tool integration. Their Chat Completions API uses a tools array to define capabilities. Below is a practical example for checking real-time order status.

import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Define the tool with strict descriptions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Retrieves real-time shipping status for a specific order ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique order identifier, formatted like ORD-555."
                    }
                },
                "required": ["order_id"]
            }
        }
    }
]

# 2. Initial Request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Where is my order ORD-999?"}],
    tools=tools,
    tool_choice="auto"
)

# 3. Process the AI's intent
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        # Mock database call: status = db.fetch(args['order_id'])
        print(f"The model is requesting: {tool_call.function.name} with {args}")

GPT-4o is remarkably good at extracting variables from messy human input. Even if a user says, ‘Can you check ORD-999 for me real quick?’, the model correctly maps that string to your order_id parameter without breaking a sweat.
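Note that the example above stops at step 2: it prints the model's intent but never closes the loop. Here is a minimal sketch of step 3, using plain dicts and a mock database. The helper names and the mock data are my own, not part of the SDK, and the tool calls are shown as dicts for testability; the real SDK returns objects with the same fields as attributes.

```python
import json

# Hypothetical local implementation of the tool; a dict stands in for the
# real order database.
def get_order_status(order_id: str) -> dict:
    mock_db = {"ORD-999": {"status": "in_transit", "eta": "2024-07-02"}}
    return mock_db.get(order_id, {"error": "order not found"})

def build_followup_messages(messages, assistant_message, tool_calls):
    """Append the assistant's tool request plus one 'tool' result per call,
    producing the message list for the second API request."""
    followup = list(messages) + [assistant_message]
    for tc in tool_calls:
        args = json.loads(tc["function"]["arguments"])
        result = get_order_status(**args)
        followup.append({
            "role": "tool",
            "tool_call_id": tc["id"],       # must echo the ID the model sent
            "content": json.dumps(result),  # tool results are passed as text
        })
    return followup
```

You would pass the returned list to a second `client.chat.completions.create` call; the model's next reply is the natural-language answer, now grounded in your data.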

Working with Claude’s Tool Use

Anthropic’s Claude 3.5 Sonnet handles tools with a slightly more rigid, yet reliable, structure. While the logic remains the same, the response handling requires you to track tool IDs carefully. Claude is specifically known for following complex, multi-step instructions more accurately than most other models.

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Fetch current weather for a specific city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state/country"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the temp in Tokyo?"}]
)

print(response.content)

When Claude triggers a tool, the `stop_reason` will be `tool_use`. You must then return the result using a specific `tool_result` block. This explicit tracking makes debugging complex agent chains much easier because every action is tied to a unique ID.
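A sketch of that round trip, with Claude's content blocks shown as plain dicts for clarity (the real SDK returns objects whose fields, `type`, `id`, `name`, and `input`, match these keys as attributes):

```python
def extract_tool_uses(content_blocks):
    """Pull (id, name, input) out of a Claude response's content blocks,
    skipping any plain text blocks that precede the tool request."""
    return [
        (b["id"], b["name"], b["input"])
        for b in content_blocks
        if b["type"] == "tool_use"
    ]

def build_tool_result_message(tool_use_id, result):
    # The result goes back in a *user* message as a tool_result block,
    # keyed to the ID of the tool_use block it answers.
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }],
    }
```

You append this message to the conversation and call `client.messages.create` again; Claude then writes its final answer from the result you supplied.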

Production Hardening: Tips for the Real World

After deploying dozens of these integrations, I’ve found that three rules separate a toy from a tool. First, fail gracefully. If your API times out, pass that error back to the AI. A good model can apologize to the user or suggest an alternative instead of just crashing.
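In practice, failing gracefully boils down to a small wrapper around every tool execution (`run_tool_safely` is my own helper name, not an SDK function):

```python
import json

def run_tool_safely(fn, **kwargs) -> str:
    """Execute a tool and always return a string the model can reason about.
    On failure, the error text becomes the tool result instead of crashing
    the agent loop."""
    try:
        return json.dumps(fn(**kwargs))
    except Exception as exc:  # deliberately broad: the model should see any failure
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
```

The serialized error goes back as the tool result, so the model can tell the user the order system timed out and suggest trying again, rather than the whole loop dying with a stack trace.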

Second, context is everything. Don’t just name a tool ‘get_data’. Call it ‘fetch_nyse_stock_prices_in_usd’. The more descriptive your metadata, the less likely the AI is to pick the wrong tool. It is the difference between a guess and a calculated decision.

Finally, never skip security. Giving an LLM the ability to delete database records or transfer funds is dangerous. Always implement a ‘Human-in-the-Loop’ confirmation for any high-stakes action. You provide the brain, but you must also provide the leash.
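A minimal human-in-the-loop gate might look like this, assuming a registry of callables (`TOOL_REGISTRY`, `HIGH_STAKES`, and the tool implementations are all illustrative):

```python
# Illustrative registry: map tool names to local implementations.
TOOL_REGISTRY = {
    "get_order_status": lambda order_id: {"status": "in_transit"},
    "delete_record": lambda record_id: {"deleted": record_id},
}
# Anything in this set requires a human 'yes' before it runs.
HIGH_STAKES = {"delete_record", "transfer_funds"}

def execute_tool_call(name, args, confirm=input):
    """Run low-risk tools directly; destructive ones need explicit approval."""
    if name in HIGH_STAKES:
        answer = confirm(f"Model wants to run {name}({args}). Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return {"error": "action rejected by human operator"}
    return TOOL_REGISTRY[name](**args)
```

Injecting `confirm` as a parameter keeps the gate testable; in production it could be a Slack approval, a dashboard button, or a plain terminal prompt.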

Summary: Building the Agentic Future

Function calling is the bedrock of the ‘Agentic’ shift in AI. By connecting OpenAI and Claude to your existing stack, you move past simple chat bubbles toward systems that actually do the work. Start small. Connect one internal API, watch how the model interacts with it, and expand from there. The ability to act is what turns a chatbot into a valuable colleague.
