Rereading ReAct: One Extra Line of Thinking Took the Success Rate from 45% to 71%
Generated: 2026-06-23 01:59:28
---
Rereading the ReAct Paper: I Cracked the Secret No Agent Can Escape
Not gonna lie—I've been sick of hearing the same question for the past six months: "Where did Agents even come from? How did they suddenly get so popular?"
Every time I hear that, I want to shout: The answer is right there in that 2022 ReAct paper!
But honestly—most people swipe away after that sentence. It's too dry. Who has time to dig through an academic paper?
Then last week I couldn't sleep, so I pulled that paper off my hard drive again and chewed through it line by line. And guess what? I hadn't really understood it the first time around.
Today I want to share the revelations that hit me like a bolt of lightning when I reread it. And along the way, I'll tell you about the traps I fell into myself—some of them still hurt when I think about them.
Why ReAct? What Makes It the Ancestor of Every Agent?
You've probably seen a ton of Agent frameworks by now—AutoGen, CrewAI, LangGraph… flashier names every time.
But underneath all of them, the secret sauce is still ReAct's "Thought-Action-Observation" loop, repeated until the task is done.
It all comes back to one experiment in the paper that made me sit bolt upright.
The paper compared two approaches: Act-only and ReAct.
Act-only means the model just acts, without recording any intermediate thoughts. If you tell it "go get the key, then unlock the door," it executes an action, checks the environment, and executes the next one.
Sounds fine, right?
Well, Act-only reached just 45% success on the ALFWorld benchmark. ReAct? Straight to 71%, a 26-point gap!
Why such a big gap? Let me tell you a personal story from the trenches.
I was using a small model to fill out web forms. To save time, I made it output actions only, no thinking. The model got stuck on step three—it filled in an address, then forgot whether it had just entered the "shipping address" or "billing address." And once it chose the wrong one, it spun in circles on that mistake forever.
Later I added ReAct's Thought output—basically letting the model mutter to itself at each step:
"Step 3 right now, I just filled in the address field. Now I need to confirm whether this is the shipping or billing address, because the instruction says to deliver to the office…"
Just that one line of "inner monologue" doubled my success rate!
Want to know the key? An Act-only model is essentially walking with amnesia. Its context still holds the actions it took and the feedback it got back, but nothing about why it took them or which subgoal it is working on. ReAct's Thought acts like footprints left in the context, so the model can look back and see where it came from and where it's going.
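To make the "footprints" idea concrete, here is a minimal sketch of the loop in Python. It is not the paper's code and not my production setup: `run_episode`, `scripted_policy`, and `form_env` are hypothetical names, and the "model" is just a scripted list of outputs so the example runs without any API. The only thing it tries to show is that a Thought never touches the environment, yet it stays in the context the model reads at the next step.

```python
from typing import Callable, List

# Hypothetical stand-ins: a Policy plays the role of the language model,
# an EnvStep plays the role of the environment (a web form, ALFWorld, ...).
Policy = Callable[[str], str]
EnvStep = Callable[[str], str]


def run_episode(task: str, policy: Policy, env_step: EnvStep,
                max_steps: int = 10) -> str:
    """Run a Thought-Action-Observation loop and return the full context."""
    context = f"Task: {task}\n"
    for _ in range(max_steps):
        output = policy(context)  # either "Thought: ..." or "Action: ..."
        context += output + "\n"
        if output.startswith("Thought:"):
            continue  # a thought only changes the context, not the world
        action = output[len("Action:"):].strip()
        if action == "finish":
            break
        context += f"Observation: {env_step(action)}\n"
    return context


def scripted_policy(lines: List[str]) -> Policy:
    """Fake model that replays fixed outputs (an assumption for this demo)."""
    remaining = iter(lines)
    return lambda _context: next(remaining)


def form_env(action: str) -> str:
    """Toy web form that accepts every action."""
    return f"'{action}' accepted"


TASK = "Ship the order to the office"
REACT_SCRIPT = [
    "Thought: The instruction says deliver to the office, so shipping comes first.",
    "Action: fill shipping_address",
    "Thought: Shipping is filled in; billing is still empty.",
    "Action: fill billing_address",
    "Action: finish",
]
# Act-only is the same run with the thoughts stripped out.
ACT_ONLY_SCRIPT = [line for line in REACT_SCRIPT if line.startswith("Action:")]

react_transcript = run_episode(TASK, scripted_policy(REACT_SCRIPT), form_env)
act_transcript = run_episode(TASK, scripted_policy(ACT_ONLY_SCRIPT), form_env)

print(react_transcript)
print(act_transcript)
```

Both transcripts contain the same actions and observations. The difference is that the ReAct transcript also records which address was handled and why, which is exactly the note a real model would get to reread before choosing its next action.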
Why Is ReAct Really Better Than CoT? You Might Not Believe This
You've probably heard of Chain-of-Thought (CoT)—making the model reason step by step. ReAct looks like CoT with an action shell wrapped around it, right?
The difference is huge!
Let me give you a real project example. Ask a CoT-prompted model "What Bluetooth earphones are good for outdoor running on Amazon?" and it will reason out a bunch of stuff (waterproof rating, battery life, comfort), all "guessed" by the model based on training data.
How does ReAct handle it? First it reasons: "User wants outdoor running, so it needs to be waterproof, sweatproof, and fit securely." Then it executes a search, throws those criteria into the Amazon search box, gets real-time results, looks at which ones match, reasons
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.