A Summary of Three Fine-Tuning Techniques for Pretrained Large Language Models: An Introduction to and Comparison of Fine-Tuning, Parameter-Efficient Fine-Tuning, and Prompt Tuning

Generated: 2026-06-20 09:15:22


You know what? Taming a top-tier AI might now be easier than learning a new phone app!

Think about it: over the years, we’ve all been bombarded with news about “artificial intelligence.” It’s always “a model with hundreds of billions of parameters” or “trillions of parameters,” and it all sounds intimidating. Everyone assumes this thing is a resource-hungry beast that needs piles of GPUs and sky-high electricity bills just to make it listen to you.

But today I’m going to tell you something counterintuitive: The biggest, most expensive models, in the hands of the most skilled users, can be completely transformed just by tweaking a few “prompts”—at a cost so low it’ll blow your mind!

Ever since the groundbreaking BERT model arrived in 2018, the way we “tame” AI has gone through a three-stage evolution. Each step has been like going from “moving mountains with sheer manpower” to “using a tiny lever to lift a huge weight.”

Speaking of which, let me first tell you the story of the very first version.

1. That “Clumsy and Temperamental” Primitive Era: Full Fine-Tuning

Back in 2018, BERT burst onto the scene, like a top student fresh out of a prestigious university—brimming with talent but just waiting to be “molded.” How do you mold it? The traditional method is full fine-tuning.

You see, it feels like asking a Michelin-star chef to learn how to make your hometown’s cold noodles. Normally, you’d have to swap out his entire kitchen—knives, pots, everything—and make him relearn heat control, knife skills, seasoning… Isn’t that clumsy and troublesome? He’d be tearing the kitchen apart!

That’s exactly how it was! Take BERT-Large, for example. It has 340 million parameters. To teach it just a simple “sentiment analysis” task, you’d need 4 to 8 V100 GPUs (each costing tens of thousands of dollars), and the memory usage would shoot straight past 16GB!

And the result? Sure, BERT set new records on all 11 NLP tasks it was tested on. But every new task meant training and storing a complete, separate copy of the model (all 340 million parameters!). With dozens of tasks, the cost grew linearly. Think about it: what company could afford to burn cash like that? The industry was in despair.
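To put that linear growth in numbers, here’s a quick back-of-the-envelope sketch. It assumes the weights are stored as 32-bit floats (4 bytes each), which is my assumption rather than anything stated above, and the function name is made up for illustration.

```python
# Back-of-the-envelope storage cost of full fine-tuning.
# Assumption: every weight is stored as a 32-bit float (4 bytes).

BERT_LARGE_PARAMS = 340_000_000  # parameter count quoted in the text
BYTES_PER_PARAM = 4              # fp32, an assumption for this sketch


def full_finetune_storage_gb(num_tasks: int,
                             params: int = BERT_LARGE_PARAMS,
                             bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Each task keeps its own full copy of the model, so cost grows linearly."""
    return num_tasks * params * bytes_per_param / 1e9


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"{n:>2} tasks -> {full_finetune_storage_gb(n):.2f} GB")
```

Under these assumptions one task costs about 1.36 GB of weights, and fifty tasks cost fifty times that. Nothing is shared between the copies.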

So people started wondering: Isn’t there a way to not touch the chef’s core skills, and just give him a better knife?

2. A Smarter Approach than “Franken-modding”: Parameter Efficient Fine-Tuning (PEFT)

Fast forward a few years (adapters appeared in 2019; LoRA and Prefix Tuning followed in 2021), and the tech world had an epiphany: Why mess with everything? Just freeze the model and plug in a few “small add-ons”!

That’s parameter efficient fine-tuning, or PEFT. Its principle is simple and brutal: keep the model body frozen and train only a small set of extra parameters, typically around 0.1% to 1% of the model’s size (a few percent for some methods). The result? Compared to full fine-tuning, the performance gap is only about 0.5% to 5%!

Let me tell you about three of the coolest examples, so you can feel it:

Adapter is like adding a “plug-in box” inside each layer of the Transformer. On BERT-Base, adding just 3.6% of the “box” parameters achieved over 95% of the performance of full fine-tuning! Pretty impressive, right?
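If you’re curious what that “plug-in box” might look like, here is a minimal sketch in plain Python. It assumes the usual bottleneck design (project down, apply a nonlinearity, project back up, add the result to the original input), leaves out biases for brevity, and starts the up-projection at zero so the box does nothing until it is trained. The class name, sizes, and initialization are illustrative assumptions, not the original Adapter code.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, x) for x in vec]


class Adapter:
    """Toy bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, hidden, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: bottleneck x hidden, small random values.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
                     for _ in range(bottleneck)]
        # Up-projection: hidden x bottleneck, zero at the start (an assumption),
        # so the adapter initially passes its input through unchanged.
        self.up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h):
        z = relu(matvec(self.down, h))      # squeeze into the bottleneck
        delta = matvec(self.up, z)          # expand back to the hidden size
        return [a + b for a, b in zip(h, delta)]  # residual connection

    def num_params(self):
        """Trainable values in this adapter (biases omitted in this sketch)."""
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))


if __name__ == "__main__":
    adapter = Adapter(hidden=8, bottleneck=2)
    h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
    print(adapter(h) == h)          # identity before any training
    print(adapter.num_params())     # 2 * 8 * 2 values
```

The frozen Transformer layers never change; only the `down` and `up` matrices would receive gradient updates.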

LoRA is even more amazing. Its insight is that a weight update doesn’t need a full-size matrix: you apply a “low-rank decomposition” (simply put, a “dimensionality reduction attack”) to the weight update matrix and train only the two small factors. On that monster GPT-3 with 175 billion parameters, it trained only about 80 million parameters (roughly 0.05% of the total), slashing memory from 1.2 TB down to 350 GB, and cutting training time by 70%!

Guess what? After ChatGPT took off, LoRA became the shining star. Some developer tried to use it to train a model with a “Taoist philosophy” style. Running 138 rounds using OpenAI’s API, the total cost was only $0.09! You heard that right—nine cents! The efficiency is just mind-blowing.
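The low-rank trick is easy to sketch. Assuming a frozen weight matrix W, LoRA learns two thin matrices B and A and computes W·x + (alpha / r)·B·(A·x). The toy layer below is plain Python for illustration only; the class and function names are hypothetical, and the 12288 hidden size and rank 4 in the usage lines are assumptions chosen to show the scale of the savings for a single matrix.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def lora_param_counts(d_out, d_in, rank):
    """Return (values in a full update matrix, values in the LoRA factors)."""
    return d_out * d_in, rank * (d_out + d_in)


class LoRALinear:
    """Toy linear layer: frozen weight plus a trainable low-rank update."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A (rank x d_in) starts random, B (d_out x rank) starts at zero,
        # so B @ A is zero and the layer initially behaves like the original.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)                 # W x
        update = matvec(self.B, matvec(self.A, x))    # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    full, lora = lora_param_counts(12288, 12288, 4)
    print(f"full update: {full:,} values, LoRA update: {lora:,} values")
```

Because only `A` and `B` are trained, a task-specific checkpoint needs to store just those two factors per adapted matrix rather than the matrix itself.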

Prefix Tuning is even more “sneaky.” Instead of touching weights, it “fabricates” a learnable prefix at the beginning of the input sequence (like putting a “mental imprint” on the model). With GPT-2, using only 1.2% of the parameters of full fine-tuning, it actually performed better—on the E2E dataset, the BLEU score was even 0.8% higher!
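Here’s a rough sketch of the prefix idea, under one assumption worth flagging: in common implementations the learnable prefix shows up as extra key and value vectors that the attention layers can look at, while the model’s own weights stay frozen. The single-head toy below is plain Python with made-up names and numbers, not the Prefix Tuning authors’ code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Same attention, but learned prefix vectors are placed before the
    real keys and values. Only the prefix would be trained."""
    return attend(query, prefix_keys + keys, prefix_values + values)


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    print(attend(q, keys, values)[0])
    print(attend_with_prefix(q, keys, values, [[2.0, 0.0]], [[5.0, 5.0]])[0])
```

The two printed outputs differ even though nothing about the attention computation itself changed; the prefix alone steers where the attention goes.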

See? It’s like training a gifted athlete: instead of making him relearn how to run, you just give him a perfectly fitted pair of sports glasses or a specific pair of running shoes. The change is tiny, but the effect is astonishing!

3. The Ultimate Lightweight: I Don’t Even Enter the Model (Prompt Tuning)

Friends, if you think PEFT is already amazing, what comes next is like “gods playing chess”—Prompt Tuning.

At least PEFT goes inside the model to plug in add-ons. Prompt Tuning goes further: I don’t even step through your door! I don’t modify a single weight of your model. I just play with the “prompts” that I feed into the model.

What it learns is not the model but a handful of extra vectors: the embeddings of a few dozen “virtual” prompt tokens. How small are those parameters? Using BERT as an example, about 38 KB, smaller than a single photo on your phone. Set that against GPT-3’s 175 billion parameters and it’s less than a drop in the bucket.
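To see why the learned prompt is so tiny, count the numbers it stores: one vector per virtual token. The sketch below assumes 20 virtual tokens, a hidden size of 768 (BERT-Base), and 32-bit floats; all three are illustrative assumptions of mine, and the helper names are hypothetical. It also shows the only “surgery” involved: the learned vectors are simply placed in front of the ordinary token embeddings.

```python
def soft_prompt_bytes(num_tokens, hidden_dim, bytes_per_value=4):
    """Storage for a soft prompt: one hidden_dim-sized vector per virtual token."""
    return num_tokens * hidden_dim * bytes_per_value


def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Place the learned prompt vectors in front of the input embeddings.

    Both arguments are lists of vectors; the frozen model then processes
    the longer sequence as if it were ordinary input.
    """
    return soft_prompt + token_embeddings


if __name__ == "__main__":
    size = soft_prompt_bytes(num_tokens=20, hidden_dim=768)
    print(f"soft prompt: {size} bytes ({size / 1024:.0f} KB)")

    prompt = [[0.1, 0.2], [0.3, 0.4]]   # 2 learned vectors (toy size)
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 real token embeddings
    print(len(prepend_soft_prompt(prompt, tokens)))  # sequence length grows by 2
```

With these assumed settings the prompt comes to tens of kilobytes, the same ballpark as the figure quoted above.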

Feels a bit counterintuitive, doesn’t it? Everyone thinks you have to mess with the machinery to get work done, but it turns out that just polishing the “slogan” you shout can make it work better!

For instance, researchers created a method called P-Tuning. In few-shot scenarios, this kind of prompt tuning actually beat traditional full fine-tuning! On the tougher SuperGLUE benchmark, it used only 0.1% of the storage space of full fine-tuning, with the performance gap kept within 3%! That’s like someone spending a hundred million on ads while you spend a hundred thousand on a better press release, and the results are almost the same!

The logic behind this is sharp: If the model is powerful enough, it’s already a treasure mountain. You don’t need to dig. You just need to know how to shout the most precise “Open Sesame” command at that mountain.

4. Three Pillars, Which One Is Your “Destiny Technique”?

At this point, you might be wondering: So which one should I choose? Let me give you a quick comparison table, crystal clear:

Dimension	Full Fine-Tuning	Parameter Efficient Fine-Tuning	Prompt Tuning
How many parameters changed?	All of them! 100%	A drizzle! 0.1%-1%	Not a single one! 0% (only input)
How big is each stored model?	Several buildings! 300 MB - 1.5 TB	A USB stick! 1-10 MB	A sticky note! <1 MB
Training power consumption?	Full throttle craziness	Energy-saving mode	Almost none extra
Can it perform?	Baseline (perfect score)	95% - 99%, near perfect	90% - 95%, enough but slightly behind
Best suited for?	Deep pockets, huge datasets, a demand for 100% performance	Industrial workhorse, multi-task, limited resources	Creative work, prompt engineering, fast prototyping

So you see, the whole story is an evolution of “cost reduction and efficiency improvement”:

Full fine-tuning: The pioneer and still the standard in academia, but too expensive.
PEFT (especially LoRA): The de facto standard in industry, because it’s cheap and effective. Much of the fine-tuning in the wave of models that followed ChatGPT is a variant of it.
Prompt Tuning: The “ultimate form” of the future, especially in the Model as a Service (MaaS) era, where you can unlock maximum model capability at minimal cost without touching the model itself.

5. Where Is the Future?

Now, models have grown to trillions of parameters (like GPT-4, PaLM 2). Making these be


Cael Lee
