大模型工具调用function call原理及实现 (English)

Generated: 2026-06-24 07:15:41

---

Hey, have you ever wondered—

When you type "Order a latte for me" into a chat box, how does that big model behind it actually know to call the takeout API instead of making up a guide to getting coffee?

Honestly, it took me almost two years to fully figure out this mechanism.

In the early days, I was so naive—just brute‑forcing it with prompts and regex. I’d write "Please output strictly in JSON format" a hundred times, and the model would reply, "Sure! Let me check the weather for Hangzhou for you ☀️." I was about to lose it!

It wasn’t until 2023, when OpenAI added Function Calling to GPT‑4, that I felt this was finally on the right track. Later, when I started building Agent projects myself, I stepped into quite a few pitfalls. Today, let’s break it down inside and out, focusing on the mistakes I personally made. Remember these—they’ll save you a lot of trouble down the road.

---

1. First, Let’s Get What It Actually Is

Think of it this way: a large language model is basically a text generator.

It’s like that super smart friend who has absolutely no common sense.

If you ask, "What’s the temperature in Beijing right now?" and it tries to answer from memory, it either makes something up or says, "My data only goes up to 2021"—so frustrating, right?

Function Calling is like giving it a toolbox.

You tell it: "You’ve got a wrench called get_weather. Just pass in a city parameter, turn it, and you get the real-time temperature."

When the model sees the user’s question actually needs real‑world data, it stops bluffing and instead outputs a "manual"—basically a JSON that says, "Hey, use this tool!"


{
 "tool_calls": [{
 "name": "get_weather",
 "arguments": {"city": "Beijing"}
 }]
}

You might think the model runs the function itself? No. It only makes the decision; the actual execution is done by your code. And this division of labor makes perfect sense: the LLM is good at understanding intent, but it should never have direct permission to manipulate system resources. You do the execution—adding parameter validation, access control, even running it in a sandbox—safe and flexible.

There are just four steps:

User asks → Model outputs a function call instruction → You run the code and get the result → The result is fed back to the model to generate the final answer.

Simple, right? But the pitfalls are hidden in those steps.

---

2. The Complete Runtime Process (I Fell into Every Step)

Step 1: Stuff the Tool Manual into the Model

Every time you call the model, you need to package descriptions of all available tools into a JSON Schema and send them over. For example, using the OpenAI API:


functions = [
 {
 "name": "get_weather",
 "description": "Get the weather for a given city on a given date",
 "parameters": {
 "type": "object",
 "properties": {
 "location": {
 "type": "string",
 "description": "City name, e.g., Beijing, Shanghai"
 },
 "date": {
 "type": "string",
 "description": "Date in YYYY-MM-DD format"
 }
 },
 "required": ["location"]
 }
 }
]

It’s like putting a wrench in the toolbox with a label on it.

At first I only wrote "get weather" as the description. The model would call it even when unnecessary, and sometimes pass random parameters like "North Atlantic."

Later I changed it to plain English:

"Get real‑time weather data for a specified city on a specified date. Only use this when you need to check the weather for today or the next seven days. If the user is asking about historical weather or climate, do not call this function."

Guess what? The model instantly got smarter! Sometimes it would even ask back, "Which date’s weather would you like?"

Here’s a big pitfall: the description must be detailed and specific—ideally telling the model when to use it and when not to.

The more it reads like a work guide you’d write for a new colleague, the less the model will mess up.

---

Step 2: The Model Decides Whether to Call

So the model receives your question and the list of tools. It judges: "The user wants the weather, I have this tool, I should call it."

Then it outputs that tool_calls JSON.

The key point is—this output is the model thinking it needs to call the function, but that doesn’t mean it’s always correct. If the description is vague, it could interpret "See if this weekend is good for a hike" as a reason to call get_weather, when what you really need is air quality or rain probability. It’ll just throw a weather function at it.

So you need to spell out all boundary conditions in the description upfront. This is like a tiny rule engine—do it well and the model behaves; do it poorly and you’ll be tearing your hair out.

---

Step 3: You Execute the Function and Get the Real Result

The model only says, "Call get_weather." The actual execution is your code.

I’ve seen many beginners think the model fetches the weather online by itself—wake up! It can’t even do simple arithmetic reliably, and you expect it to call external APIs accurately?

Here’s another pitfall: the function’s result should be clean and structured.

For example, the weather API might return:


{"status": "ok", "data": {"temperature": 26, "condition": "cloudy"}}

Can you paste that raw JSON back to the model? Yes, but it’s better to add a human‑readable summary like "Query successful: Beijing is currently 26°C and cloudy." That way the model understands it faster and answers more smoothly.

---

Step 4: Feed the Result Back to the Model for the Final Answer

This step is the easiest to overlook.

You append the function return to the conversation history, usually as a new tool role message. The model reads this result, combines it with the user’s original question, and crafts a natural‑language reply:

"Beijing is currently 26°C and cloudy—feels pretty comfortable, perfect weather to go out for a latte!"

See? Now it’s no longer that text generator that just makes things up.

It has data, and it speaks with confidence.

---

3. The Pitfalls That Made Me Want to Pull My Hair Out

I’ve stepped into more traps than the number of bridges I’ve crossed. Here are a few typical ones:

Pitfall 1: Conflicting parameters.

If you have two tools—one getweather and one getair_quality—and both have a location parameter, and you don’t clearly distinguish them in the description, the model might pass the weather parameters to the air quality tool, and everything breaks.

Pitfall 2: Not calling when it should.

The user asks, "Check my flight for tomorrow," and you have a search_flights tool, but the model just makes something up. Why? Because your description is too generic: "Search flight information." It didn’t remind the model to "use this tool whenever the user mentions anything flight‑related." After I changed it to "Use this tool to query the database when the user asks about flight schedules, prices, cancellations, or any flight information," the model instantly got it.

Pitfall 3: Wrong order of multiple function calls.

In some scenarios you need to call A first to get an ID, then call B. The model sometimes fires both calls at once, before the second one’s parameter is ready. You need to manually control the flow after getting the result, step by step in a multi‑turn conversation.

---

4. Wrapping Up

Function Calling isn’t magic.

It’s a bridge between the large model and the outside world.

If you build guardrails properly, it can run safely.

Remember: The model’s job is to be smart; your job is to make the results reliable.

From making things up to using tools—it’s only a Function Calling away. I’ve sorted out these pitfalls for you

大模型工具调用function call原理及实现 (English)

大模型工具调用function call原理及实现 (English)

1. First, Let’s Get What It Actually Is

2. The Complete Runtime Process (I Fell into Every Step)

Step 1: Stuff the Tool Manual into the Model

Step 2: The Model Decides Whether to Call

Step 3: You Execute the Function and Get the Real Result

Step 4: Feed the Result Back to the Model for the Final Answer

3. The Pitfalls That Made Me Want to Pull My Hair Out

4. Wrapping Up

Cael Lee

Ready to get started?