Don't Leave Your LLM a General Without an Army: The Agent Is Its Hands, Feet, and Senses

Generated: 2026-06-23 13:01:10

---


My Blood and Tears with LLM Agents: Enough to Save You Three Years of Wrong Turns!

The first time you used GPT, were you like this too?

All excited, you asked it to "look up the latest Tesla earnings report," and it fabricated a string of numbers that looked professional but were complete nonsense. At that moment, weren't you so furious you wanted to smash your keyboard? You thought this thing was nothing but a "machine that talks bullshit with a straight face," right?

Hold on—you're not wrong. I've been there too.

When I first jumped in, all I could think about was an automated ticketing system. I wanted ChatGPT to handle everything by itself. But every time I asked it to query the database, it would invent a SQL query it had no way of running and then make up a result to go with it.

See the problem now? It's not that AI isn't trying—it's that you asked it to do something it simply can't.

It's like a genius scholar who's never left his room. If you ask him "what's the weather like outside," all he can do is fabricate an answer based on the "Principles of Meteorology" textbook he studied back in freshman year. He's not trying to deceive you—he just has no hands, no feet, no nose, no eyes, and no way to access the real world.

So why the hell do we need an Agent?

Because an LLM is like a top scholar who can only memorize books, but an Agent gives that scholar a secretary, a driver, and a scout!

Whatever tools you don't hand it, it doesn't have. If you want it to get things done, you have to equip it.

Just think—can GPT, a lone general with no army, really get the job done? Absolutely not!

Let me walk you through the three brutal traps I fell into, and you'll see what I mean:

Knowledge is dead! An LLM's knowledge is like an old textbook—printed in 2023 and never updated. You want it to know what happened yesterday? No way! But an Agent can call APIs, scrape the latest info from the internet itself, and even read a PDF report. It's alive.

Memory is terrible! You've definitely had this experience: you're chatting with AI, and suddenly it forgets a key detail you mentioned five minutes ago. This 200K super-long context window? Doesn't matter—once the conversation really goes on, it's "in one ear and out the other." So what's the fix? Agents have external memory! I don't need to cram the entire history into it—it knows which part of its "diary" to look up. It's got a brain.

Multiple personality disorder! I once tried to make the same model act as a product manager gathering requirements one day and a programmer writing code the next. What happened? It started arguing with itself! One minute it complained the requirements were too vague, the next it said the code implementation was too simple. So I split it into two Agents: one played the annoying PM, the other played the grumpy Dev. Guess what? Once the two personalities were cleanly separated, they actually corrected each other and delivered great results!

Have you seen that paper "Unleashing Cognitive Synergy"? It uses "multi-persona self-collaboration": the same brain plays multiple roles that go back and forth to break down and solve a task. The effect is outstanding!

So don't let a genius do a laborer's job. An Agent means giving that genius hands, feet, and senses!
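
Here's roughly what "equipping it" can look like in code. This is only a minimal sketch, not the ticketing system from the story: `query_ticket_status`, the `TOOLS` table, and the fake ticket data are all made-up names for illustration. The point is that the answer comes from running a real function, and when no tool exists the agent says so instead of inventing one.

```python
from typing import Callable, Dict


def query_ticket_status(ticket_id: str) -> str:
    """Hypothetical tool: look up a ticket in a stand-in 'database'."""
    fake_db = {"T-1001": "open", "T-1002": "closed"}  # assumption: illustrative data only
    return fake_db.get(ticket_id, "not found")


# Registry of the tools the agent is actually allowed to use.
TOOLS: Dict[str, Callable[[str], str]] = {
    "ticket_status": query_ticket_status,
}


def run_tool(name: str, argument: str) -> str:
    """Run a registered tool; report an error rather than guess when it is missing."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: no tool named '{name}'"
    return tool(argument)


if __name__ == "__main__":
    print(run_tool("ticket_status", "T-1001"))
    print(run_tool("weather", "Shanghai"))
```

In a real agent the model would choose `name` and `argument`, and the string returned by `run_tool` would be fed back to it as the observation.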

---

(II) What Does an Agent's "Skeleton" Actually Look Like? Don't Panic, I'll Break It Down for You

At this point, you might be thinking, "Agents sound impressive, but what's the internal structure? Is there a ready-made 'Lego instruction manual' I can copy?"

To be honest, I was completely clueless at first. I thought an Agent was just an LLM wrapped in a loop—"think, act, observe"—on repeat. But all I got was a bunch of infinite loops and nothing accomplished.
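
The naive loop isn't wrong so much as unguarded. Below is a minimal sketch of "think, act, observe" with a hard step limit, which is one common way to keep it from spinning forever. The `think` and `act` callables are hypothetical stand-ins for the LLM call and the tool call; nothing here comes from a specific framework.

```python
from typing import Callable, List, Tuple


def run_agent(
    think: Callable[[List[str]], Tuple[str, str]],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> str:
    """Think-act-observe loop that gives up after max_steps iterations.

    think: sees the observations so far and returns ("finish", answer)
           or ("act", tool_input).
    act:   runs the tool on tool_input and returns an observation string.
    """
    observations: List[str] = []
    for _ in range(max_steps):
        kind, payload = think(observations)   # think
        if kind == "finish":
            return payload
        observations.append(act(payload))     # act, then record the observation
    return "stopped: step limit reached"


# Stub "model": asks for one lookup, then answers with what it observed.
def stub_think(observations: List[str]) -> Tuple[str, str]:
    if not observations:
        return ("act", "T-1001")
    return ("finish", f"ticket is {observations[-1]}")


# Stub "model" that never decides it is done (the infinite-loop case).
def never_finishes(observations: List[str]) -> Tuple[str, str]:
    return ("act", "T-1001")


def stub_act(ticket_id: str) -> str:
    return {"T-1001": "open"}.get(ticket_id, "not found")
```

With `stub_think` the loop ends as soon as the model says "finish"; with `never_finishes` it stops at the step limit instead of running forever.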

Then I slogged through Fudan University's 86-page paper, "The Rise and Potential of Large Language Model Based Agents," and I finally got a clear picture.

It turns out the core consists of just three modules, really simple: Brain, Senses, Limbs (the paper's own terms are brain, perception, and action).

Brain Module: This is where the LLM lives, but you also need to add memory and planning abilities.
Two types of memory: Short-term memory is what was just discussed; long-term memory remembers what the user likes to drink or what mistakes were made before.
My clever trick: I use ChromaDB to store users' long-term preferences. Before each conversation, it automatically retrieves the 5 most relevant records and feeds them to the LLM. This works way better than cramming the entire chat history in—I never have to worry about it having a goldfish memory again!
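
The retrieve-then-prompt idea can be sketched without any database at all. The version below is a stand-in built on stated assumptions: instead of ChromaDB and real embeddings, it ranks records by plain word overlap (bag-of-words cosine similarity), and `PreferenceMemory` and `build_prompt` are hypothetical names. What it keeps from the trick above is the shape: store preferences, pull the top 5 for the current message, and feed only those to the LLM.

```python
import math
import re
from collections import Counter
from typing import List


def _vectorize(text: str) -> Counter:
    """Turn text into lowercase word counts (a crude substitute for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class PreferenceMemory:
    """Hypothetical in-memory long-term store; a vector database would replace this."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def add(self, record: str) -> None:
        self.records.append(record)

    def top_k(self, query: str, k: int = 5) -> List[str]:
        """Return up to k stored records, most similar to the query first."""
        q = _vectorize(query)
        ranked = sorted(
            self.records, key=lambda r: _cosine(q, _vectorize(r)), reverse=True
        )
        return ranked[:k]


def build_prompt(memory: PreferenceMemory, user_message: str, k: int = 5) -> str:
    """Prepend only the k most relevant memories instead of the whole history."""
    recalled = memory.top_k(user_message, k)
    lines = ["Known user preferences:"] + [f"- {r}" for r in recalled]
    return "\n".join(lines + ["", f"User: {user_message}"])
```

Word overlap misses synonyms that embedding search would catch, so treat this as a way to see the data flow, not as a replacement for a real vector store.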

Senses Module: This is a "converter." It turns text, images, sounds, even sensor


Cael Lee
