Chen Wei: The Essential Guide to ChatGPT Large Models (English)

Generated: 2026-06-20 23:02:51

---


Hey, friend! Have you ever read a tech article that hit you like a bolt of lightning?

Around this time last year, news about GPT-4 hit the AI community like a bomb. Back then, I was leading a team working on smart customer service. I’d be scrolling through Zhihu until two in the morning, my eyes practically blurring over. It wasn’t until I stumbled across Dr. Chen Wei’s article, The Essential Guide to ChatGPT Large Models — Development History, Principles, Technical Architecture, and Industry Future, that the tangled mess in my head suddenly—snap—fell into place.

Let me pause here for a second. I’m sure you’ve heard the soul-searching question: Why can’t China build something like ChatGPT? Most people’s first reaction is “the algorithms aren’t good enough,” or “we don’t have enough computing power.” Wrong. Chen Wei touched on it: it’s a systemic problem. I mulled it over for months, and the more I thought about it, the more it clicked—it’s not one weak link; it’s the whole system holding things back. The research evaluation system pushes you to publish papers and move on. The engineering culture says, “if it works, it’s fine.” The data ecosystem is fragmented. And before people even talk tech, they’re talking salary structures. It’s like a soccer team: no matter how fast your forward is, if the midfield won’t pass and the defense is leaking, how can you win?

Think about it—doesn’t that ring true?

From GPT-1 to ChatGPT: I Didn’t Just Watch It Grow Up—I Fell Flat on My Face Watching It

The first time I encountered the GPT series was in 2020. GPT-3 had just been released—175 billion parameters—and I excitedly tried to run it on my own junk server. The VRAM maxed out immediately: the screen went black, and I broke out in a cold sweat. That’s when I had this gut feeling: something was off. The parameter scale was exploding way too fast, like a rocket taking off.

Chen Wei’s article laid out that trajectory perfectly. GPT-1: 117 million parameters. GPT-2: 1.5 billion. GPT-3: 175 billion. With each leap, you might think it’s just stacking more parameters—but that would be completely wrong. Behind it, the architecture and training methods were undergoing a qualitative transformation. It’s like going from a bicycle to a motorcycle, then to a supercar—the engine is completely different.
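To put numbers on that jump, here is a quick back-of-the-envelope sketch. It uses the parameter counts quoted above and assumes the weights are stored as 16-bit floats (2 bytes each). That storage format is my assumption, not something from Chen Wei's article, and the estimate covers the weights only, not activations or optimizer state.

```python
# Parameter counts as quoted in the text above.
PARAMS = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

# Assumption: weights stored as 16-bit floats, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def growth_factors(params: dict) -> dict:
    """Ratio of each model's parameter count to the one before it."""
    names = list(params)
    return {f"{a} -> {b}": params[b] / params[a] for a, b in zip(names, names[1:])}


if __name__ == "__main__":
    for name, count in PARAMS.items():
        print(f"{name}: {weight_memory_gb(count):.2f} GB of weights in fp16")
    for step, ratio in growth_factors(PARAMS).items():
        print(f"{step}: about {ratio:.0f}x more parameters")
```

Under that assumption, 175 billion parameters come to roughly 350 GB for the weights alone, which is far more than a single GPU holds. No wonder my server gave up.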

The Transformer was the real turning point. Full stop. When Google published Attention Is All You Need in 2017, I was still using traditional Seq2Seq models for translation. Honestly, I didn’t think much of it at first and dismissed it as more academic hype. Then in 2018, GPT-1 used the Transformer for unsupervised pre-training, and I was completely stunned. The direction had shifted, and it had shifted for good.

Chen Wei’s article was a wake-up call for me: GPT-1 proved that “unsupervised pre-training + supervised fine-tuning” worked. But what really knocked me out was GPT-2. 1.5 billion parameters. Zero-shot learning. It could handle tasks it had never been fine-tuned for! Think about it: it’s like you teach a kid “this is a cat,” and not only does he recognize cats, but he also learns to recognize dogs on his own. Back then, I used it for text generation, and the results blew me away. I was convinced traditional methods were doomed.

GPT-3’s In-Context Learning: I’ve Seen Both Its Magic and Its Absurdity

When GPT-3 came out in 2020, I was struggling with a customer service dialogue system. The traditional approach? Annotate tens of thousands of data points, train an intent classification model, and redo the labeling every time requirements change—just thinking about it was overwhelming. But GPT-3’s few-shot learning was a revelation: give it a few examples, and it could automatically grasp the task, like a human “learning from examples.”

Chen Wei called this “in-context learning.” I think the term is spot-on: the model isn’t really “learning” in the training sense, since none of its weights change. It’s picking up the task from the context you hand it. It’s like when you tell a friend “see you at the usual place,” and they automatically know where that is.
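If you've never built a few-shot prompt, here is roughly what it looks like for the intent classification case. This is a minimal sketch: the function name, the "Message:/Intent:" layout, and the intent labels are all made up for illustration, and the actual call to a model is left out on purpose.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the new query.

    No model weights are updated anywhere; the examples exist only in the prompt.
    """
    lines = [instruction, ""]
    for user_text, intent in examples:
        lines.append(f"Message: {user_text}")
        lines.append(f"Intent: {intent}")
        lines.append("")
    # Leave the last label blank so the model fills it in.
    lines.append(f"Message: {query}")
    lines.append("Intent:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical intents for a customer service bot.
    demo_examples = [
        ("Where is my package?", "track_order"),
        ("I want my money back.", "refund"),
        ("How do I change my password?", "account_help"),
    ]
    prompt = build_few_shot_prompt(
        "Classify the intent of each customer message.",
        demo_examples,
        "My parcel still has not arrived.",
    )
    print(prompt)
```

When requirements change, you edit the examples instead of relabeling tens of thousands of rows. That was the whole appeal for me.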

But to be honest, GPT-3 had one fatal flaw: its output was completely uncontrollable. You’d ask, “What’s the weather like today?” and it might ramble about philosophy for half an hour. I tried it once and almost got complaints from a client. It wasn’t until GPT-3.5/ChatGPT introduced instruction fine-tuning that large models truly became “usable.” See, the key step wasn’t increasing the parameter count—it was teaching the model to “listen to humans.”

RLHF Isn’t a Miracle Cure: The Potholes I’ve Hit Are Deeper Than You Think

Chen Wei devoted a lot of space to RLHF. I’ve seen too many articles hype RLHF as a silver bullet, as if adding it automatically makes a model both smart and polite. Hah, as if!

Last year, I tried to replicate ChatGPT’s training pipeline in a project. What a disaster.

Step one: collect human feedback data. Just defining what counts as a “good response” had the team arguing for two weeks. Some wanted “detailed,” others insisted on “concise,” and a few demanded “a sense of humor.” In the end, everyone was frustrated.

Step two: train the reward model. And guess what? It came out biased. It learned that “positive” meant good, so the model started blindly flattering users. Even when a user said, “I just got dumped,” it replied with “That’s great! It’s a new beginning!” Would you put up with that?
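For anyone wondering how a reward model picks up that kind of bias, it helps to look at what it is trained on. Here is a minimal sketch of the pairwise ranking loss commonly used for reward models (the form described in the InstructGPT paper), in plain Python with made-up scores. I'm not claiming this is exactly what my project or OpenAI ran.

```python
import math
from typing import List, Tuple


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: List[Tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three annotated comparisons.
    scores = [(1.2, -0.3), (0.1, 0.4), (2.0, 1.9)]
    print(f"mean loss: {mean_pairwise_loss(scores):.4f}")
```

The loss only knows which response the annotators preferred. If they consistently pick the upbeat answer, "upbeat" is exactly what gets rewarded, whether or not it fits the situation.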

Step three: PPO reinforcement learning, where you tune hyperparameters until you start doubting your own existence. Learning rate, clipping range, advantage function window… every single one was a minefield. I spent three days straight tuning, and not only did the model fail to improve, it started spewing nonsense.

Chen Wei’s article cites papers like InstructGPT, PPO, and TAMER. I’d seriously recommend you dig into these if you plan to get your hands dirty. Especially PPO—don’t let the name “Proximal Policy Optimization” fool you. The clipping mechanism and advantage function design are as intricate as a Swiss watch; one misaligned gear and the whole thing falls apart.
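To make the clipping mechanism concrete, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. The default clip range of 0.2 is the value used in the PPO paper's experiments, and the sample numbers are invented. A real RLHF setup adds pieces like a value loss and a KL penalty that this leaves out.

```python
import math
from typing import Sequence


def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,
) -> float:
    """Mean PPO clipped objective (to be maximized) over a batch of samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Keep the more pessimistic of the unclipped and clipped estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical batch: the new policy doubles the probability of one action.
    value = clipped_surrogate([math.log(2.0)], [0.0], [1.0])
    print(f"objective: {value:.2f}")
```

With a positive advantage, the gain is capped once the ratio passes 1 + clip_range, so the policy gets no extra credit for drifting further. With a negative advantage and a large ratio, nothing is capped and the full penalty goes through. Set the clip range too wide or feed it noisy advantages, and that's where the gears start slipping.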

There’s a detail Chen Wei didn’t elaborate on, but I have to shout it from the rooftops: the cost of RLHF is terrifying. To train a ChatGPT-level model, just the expense of human annotators alone is enough to make you weep. I heard that OpenAI—

Cael Lee
