Is the Prompt Still the Core of a Large-Model Agent?

By CaelLee | 5 min read

Generated: 2026-06-24 14:24:30

---

Let me tell you a story.

A while back, a friend came to me and asked: "Bro, I want to build an Agent project. How do I write the prompt to make it awesome?" I looked at his eager eyes, took a deep breath, and said something that left him speechless—

"The core of a large-model Agent isn't the prompt at all."

He was stunned. And I know a lot of people reading this are probably thinking the same thing. Ever since GPT-3 took off, the whole internet has been teaching you how to write prompts: role-setting, chain-of-thought, few-shot, format constraints… all treated like writing code, full of formulas. You've definitely learned it too. I was the same a couple of years ago: I thought I was a prompt-tuning wizard, with demos running like a dream. Then I hit a real business scenario, and it fell apart so badly I barely recognized my own work.

So what did I do? I spent three days re-testing everything on my own project.

Three models side by side: GPT-4o, Claude 3.5 Sonnet, and Alibaba's Qwen2.5-Max. The scenario wasn't super complex—a small CLI assistant for debugging and auto-fixing code repos. I started with my old habits: writing huge system prompts for each subtask, cramming in personas, rules, examples—I felt invincible. Guess what? As soon as tasks piled up and steps got longer, the model either went off track, got stuck looping on one tool call, or flat-out forgot the result from the previous step. Dumb? No, it seemed pretty smart. But it just couldn't reliably finish one thing.

Where was the problem? It wasn't about how good the prompt was.

Think about it: an LLM call is basically "generate text based on current input." But what a real task demands is "consistently complete a chain of actions"—understand the project structure, run test commands, read error logs, modify code, re-run tests. Step by step, moving forward. The prompt has no control over that.

The model lives inside its own context. The task happens in an ever-changing external world. That gap—the prompt can't fill it.

That's when I changed my approach. Completely.

I stopped obsessing over the system prompt's little territory, and shifted my focus to the Agent's runtime design—basically, how to build a framework that can reliably execute multi-step tasks.
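To make "runtime design" concrete, here is a minimal sketch of the kind of loop I mean. Everything in it is an assumption for illustration: `scripted_model` stands in for a real LLM call, and the `TOOLS` registry, the step budget, and the repeat guard are hypothetical names, not any framework's API. The point is that the loop, not the prompt, decides when to stop.

```python
from typing import Callable

# Hypothetical tool registry: plain functions standing in for real tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "run_tests": lambda arg: "1 failed: test_add",
    "read_log": lambda arg: "AssertionError in add()",
}

def scripted_model(history: list[dict]) -> dict:
    """Stand-in for an LLM call: picks the next action from the step count."""
    script = [
        {"tool": "run_tests", "arg": ""},
        {"tool": "read_log", "arg": "test_add"},
        {"tool": "done", "arg": "root cause: add()"},
    ]
    return script[min(len(history), len(script) - 1)]

def run_agent(model: Callable[[list[dict]], dict],
              max_steps: int = 10, max_repeats: int = 2) -> dict:
    """Drive the model step by step with a step budget and a loop guard."""
    history: list[dict] = []
    for _ in range(max_steps):
        action = model(history)
        if action["tool"] == "done":
            return {"status": "done", "answer": action["arg"], "history": history}
        # Loop guard: stop if the last `max_repeats` steps were this same call.
        repeats = sum(1 for h in history[-max_repeats:] if h["action"] == action)
        if repeats >= max_repeats:
            return {"status": "stuck", "answer": None, "history": history}
        observation = TOOLS[action["tool"]](action["arg"])
        history.append({"action": action, "observation": observation})
    return {"status": "budget_exhausted", "answer": None, "history": history}

result = run_agent(scripted_model)
```

A model that keeps requesting the same tool call gets cut off with status `"stuck"` instead of spinning forever, which is exactly the failure I kept hitting with prompt-only setups.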

You've heard of the "seven core technical modules," right? Agent Loop, Prompt, Planning, Memory, Tools, Workflow, Environment. Looking back, I now spend 80% of my energy on engineering things like Loop, Memory, and Workflow. Prompt has become the lightest layer: fixed underlying instructions, dynamic content split into separate Markdown files (like SKILL.md, USER.md), loaded on demand during task execution. I call it "static-dynamic separation." Way easier to maintain than one giant blob of prompt per Agent—it's clean.
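Here is a rough sketch of what static-dynamic separation can look like. The file names `SKILL.md` and `USER.md` come from my setup above; the `build_prompt` function, the `BASE_INSTRUCTIONS` text, and the section layout are assumptions made up for this example.

```python
from pathlib import Path

# Fixed layer: placeholder text, written once and rarely changed.
BASE_INSTRUCTIONS = "You are a CLI assistant that debugs code repositories."

def build_prompt(context_dir: Path, needed: list[str]) -> str:
    """Join the fixed instructions with only the Markdown files this step needs."""
    sections = [BASE_INSTRUCTIONS]
    for name in needed:
        path = context_dir / name
        if path.is_file():  # a missing file is skipped instead of failing the step
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {path.stem}\n{body}")
    return "\n\n".join(sections)
```

A step that only needs skill notes would call something like `build_prompt(Path("agent_context"), ["SKILL.md"])`, so the user profile never enters that step's context. The directory name here is hypothetical too.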

Have you seen Andrew Ng's Agentic AI course? Catch up on it now! He clearly shifted the focus from prompt optimization to multi-step process design, proposing four design patterns: Reflection, Tool Use, Planning, and Multi-Agent Collaboration. Once you watch it, you'll see that building an Agent today feels more like putting together backend microservices—just with a probabilistic black box in the middle. That old saying—"how to build reliable systems from unreliable components"—holds true in Agent architecture too.

Let me share a trap I fell into, so you can feel it yourself.

Once I built an Agent for a SaaS tool that needed to call three APIs: query a database, send an email, and update a status. I was thrilled when it worked the first time. Then the next day, it failed in production. Why? The LLM's unpredictability led to different execution paths for the same input. One time the email was sent but the status wasn't updated, so users got a notification, clicked in, and saw old data. See the problem? No matter how fancy your prompt is, you can't make a probabilistic model guarantee identical behavior twice. The solution? Record the Agent's execution sequence in an event log at the system layer, so that every step is idempotent (running it again has no extra effect) and a failed run can be replayed. That's pure backend data work, nothing to do with prompts.
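A minimal sketch of that idea, under loud assumptions: the `EventLog` class, the key format, and the fake `send_email` and `update_status` functions are all invented for this example, and the log lives in memory, whereas a real one would sit in durable storage. It reproduces my incident: the status update fails once, the run is replayed, and the email is not sent a second time.

```python
from typing import Any, Callable

class EventLog:
    """In-memory stand-in for a durable log of completed steps."""
    def __init__(self) -> None:
        self._done: dict[str, Any] = {}            # idempotency key -> result
        self.entries: list[tuple[str, Any]] = []   # ordered, for replay and audit

    def run_once(self, key: str, action: Callable[[], Any]) -> Any:
        if key in self._done:        # already completed: return the stored result
            return self._done[key]
        result = action()            # may raise; nothing is recorded on failure
        self._done[key] = result
        self.entries.append((key, result))
        return result

outbox: list[str] = []
status = {"order-1": "pending"}
fail_once = {"armed": True}

def send_email() -> str:
    outbox.append("order-1 shipped")
    return "sent"

def update_status() -> str:
    if fail_once["armed"]:           # simulate one transient failure
        fail_once["armed"] = False
        raise RuntimeError("status service timeout")
    status["order-1"] = "shipped"
    return "updated"

def run_workflow(log: EventLog, run_id: str) -> None:
    log.run_once(f"{run_id}:send_email", send_email)
    log.run_once(f"{run_id}:update_status", update_status)

log = EventLog()
try:
    run_workflow(log, "run-42")
except RuntimeError:
    run_workflow(log, "run-42")      # replay: the email step is skipped
```

One caveat: a crash between the side effect and the log write can still cause a duplicate, so in practice you would also want the downstream API to accept the same idempotency key.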

Another example: an Agent calls three tools, crashes on the third step. If you rely only on the prompt, it has to start over from the beginning after restart. But with external state persistence—like writing execution progress into a database—you can retry from the failed step. Think about it: prompts really can't do that. You need architecture design.
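Sketched with Python's built-in `sqlite3`, it could look like this. The `progress` table, the `resume_run` function, and the three dummy steps are assumptions for illustration; the demo uses an in-memory database, and a real Agent would point `open_store` at a file so the checkpoint survives a restart.

```python
import sqlite3
from typing import Callable

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress "
        "(run_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    return conn

def resume_run(conn: sqlite3.Connection, run_id: str,
               steps: list[Callable[[], None]]) -> None:
    """Run steps in order, starting from the last saved checkpoint."""
    row = conn.execute(
        "SELECT next_step FROM progress WHERE run_id = ?", (run_id,)
    ).fetchone()
    start = row[0] if row else 0
    for i in range(start, len(steps)):
        steps[i]()  # if this raises, the checkpoint still points at step i
        conn.execute(
            "INSERT OR REPLACE INTO progress (run_id, next_step) VALUES (?, ?)",
            (run_id, i + 1),
        )
        conn.commit()

calls: list[str] = []
flaky = {"fail": True}

def step_a() -> None:
    calls.append("a")

def step_b() -> None:
    calls.append("b")

def step_c() -> None:
    if flaky["fail"]:                # the third tool crashes on the first attempt
        flaky["fail"] = False
        raise RuntimeError("tool crashed")
    calls.append("c")

conn = open_store()
try:
    resume_run(conn, "run-1", [step_a, step_b, step_c])
except RuntimeError:
    pass
resume_run(conn, "run-1", [step_a, step_b, step_c])  # picks up at step_c
```

After the second call, steps `a` and `b` have each run exactly once; only the failed step was retried.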

So, is the core of an Agent the prompt?

I'll be clear: the prompt is the entry point, the skin—but definitely not the core.

A year or two ago, models were weaker, and you really had to chant incantations, writing step-by-step prompts to coax them into reasoning. But models like GPT-4o and Claude 3.5 Sonnet already reason well without that, and newer models that offer a dedicated thinking mode do multi-step reasoning internally, often with clearer logic than you could write out by hand. You don't even need to say "Let's think step by step"; just talk normally.

But here's the thing: no matter how smart the model is, it can only process the information you feed it. It doesn't know how far along the current task is, which tools have already been called, or what the external system state looks like. This information has to be managed at the system level. Keep it all in the prompt? That doesn't work. The context window, however large, is still finite, and if you leave it to the model to remember what happened earlier, it can forget or misremember at any point.
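One way to picture system-level state, as a sketch only: the runtime keeps the authoritative record and hands the model a bounded summary on each call. The `TaskState` class and its fields are hypothetical names I made up for this example.

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    """Authoritative task record, owned by the runtime rather than the model."""
    goal: str
    tool_calls: list[tuple[str, str]] = field(default_factory=list)

    def record(self, tool: str, summary: str) -> None:
        self.tool_calls.append((tool, summary))

    def render(self, last_n: int = 3) -> str:
        """Build a bounded summary: full history stays here, not in the context."""
        recent = self.tool_calls[-last_n:] if last_n > 0 else []
        hidden = len(self.tool_calls) - len(recent)
        lines = [f"Goal: {self.goal}", f"Tool calls so far: {len(self.tool_calls)}"]
        if hidden > 0:
            lines.append(f"({hidden} earlier tool calls omitted)")
        lines.extend(f"- {tool}: {summary}" for tool, summary in recent)
        return "\n".join(lines)

state = TaskState(goal="fix failing test")
for n in range(5):
    state.record("run_tests", f"attempt {n + 1}")
summary = state.render(last_n=2)
```

However long the task runs, the text the model sees stays the same size, while the complete record lives where it cannot be forgotten.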

Real Agent development has shifted from "how to ask the model" to "how to build the framework."

Your energy should go into workflow orchestration, data loops, state management, tool encapsulation, error handling—those engineering things. Of course, you still need to write prompts, but they're just a small piece of the whole system—and they'll get lighter over time.

If you're still spending most of your time fussing over prompt tricks, and even think tools like LangGraph or CrewAI are unnecessary, then I sincerely suggest you try a slightly more complex business scenario. You'll soon discover the truth that took me two years to learn:

You can make a model say beautiful things, but you can't make a task run steadily for ten minutes.

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
