
Three Fine-Tuning Techniques for Pretrained Large Language Models: An Introduction to and Comparison of Fine-Tuning, Parameter-Efficient Fine-Tuning, and Prompt Tuning

By CaelLee | 6 min read


Generated: 2026-06-20 09:15:22


You know what? Taming a top-tier AI might now be easier than learning a new phone app!

Think about it: for years we’ve all been bombarded with news about “artificial intelligence.” It’s always “a model with hundreds of billions of parameters” or “trillions of parameters,” which sounds intimidating. Everyone assumes this thing is a resource-hungry beast that needs piles of GPUs and sky-high electricity bills just to get it to listen to you.

But today I’m going to tell you something counterintuitive: in skilled hands, the biggest, most expensive models can be adapted to a new task just by tuning a few “prompts,” at a cost so low it’ll blow your mind!

Ever since the groundbreaking BERT model arrived in 2018, the way we “tame” AI has gone through a three-stage evolution. Each step has moved us further from “moving mountains with sheer manpower” toward “using a tiny lever to lift a huge weight.”

Speaking of which, let me first tell you the story of the very first version.

1. That “Clumsy and Temperamental” Primitive Era: Full Fine-Tuning

Back in 2018, BERT burst onto the scene, like a top student fresh out of a prestigious university—brimming with talent but just waiting to be “molded.” How do you mold it? The traditional method is full fine-tuning.

You see, it feels like asking a Michelin-star chef to learn how to make your hometown’s cold noodles. Normally, you’d have to swap out his entire kitchen—knives, pots, everything—and make him relearn heat control, knife skills, seasoning… Isn’t that clumsy and troublesome? He’d be tearing the kitchen apart!

That’s exactly how it was! Take BERT-Large, for example. It has 340 million parameters. To teach it even a simple sentiment-analysis task, you’d need 4 to 8 V100 GPUs (each costing tens of thousands of dollars), and GPU memory usage would shoot straight past 16 GB!

And the result? Sure, it set new records on all 11 NLP tasks it was tested on. But every new task meant storing a separate full copy of the model (all 340 million parameters!). With dozens of tasks, the cost grew linearly with the number of tasks. Think about it: what company could afford to burn cash like that? The industry was in a state of despair.
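To put a number on that linear growth, here is a back-of-the-envelope sketch. It assumes weights are stored as 32-bit floats, which the article does not specify, and the function name is made up for illustration.

```python
# Back-of-the-envelope storage cost of full fine-tuning.
# Assumption: weights are stored as 32-bit floats (4 bytes each).
BERT_LARGE_PARAMS = 340_000_000  # parameter count quoted in the text
BYTES_PER_PARAM = 4              # float32, an assumption


def full_finetune_storage_gb(num_tasks: int) -> float:
    """Each task keeps its own full copy, so storage is linear in the task count."""
    total_bytes = num_tasks * BERT_LARGE_PARAMS * BYTES_PER_PARAM
    return total_bytes / 1e9


for tasks in (1, 10, 50):
    print(f"{tasks:>2} tasks -> {full_finetune_storage_gb(tasks):.2f} GB")
```

Under these assumptions, one task costs a bit over a gigabyte of storage, and fifty tasks cost fifty times that. Nothing is shared between tasks.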

So people started wondering: Isn’t there a way to not touch the chef’s core skills, and just give him a better knife?

2. A Smarter Approach than “Franken-modding”: Parameter-Efficient Fine-Tuning (PEFT)

Fast forward to 2021, and the tech world suddenly had an epiphany: Why mess with everything? Just freeze the model and plug in a few “small add-ons”!

That’s parameter-efficient fine-tuning, or PEFT. Its principle is simple and brutal: keep the model body frozen, and train only a small set of extra parameters, roughly 0.1% to 1% of the model’s size. The result? Compared to full fine-tuning, the performance gap is only 0.5% to 5%!
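Where can a figure like 0.1% to 1% come from? Here is a rough count for one kind of add-on: low-rank adapters attached to a BERT-Large-sized model. The rank, the choice of two adapted matrices per layer, and the function name are all assumptions for illustration, not the article’s setup.

```python
# Rough count of trainable parameters for low-rank add-ons on a
# BERT-Large-sized model.
# Assumptions (not from the article): adapters on two hidden x hidden
# projection matrices per layer, rank 8. BERT-Large has 24 layers and
# hidden size 1024.
TOTAL_PARAMS = 340_000_000  # frozen base model, figure quoted in the text


def low_rank_adapter_params(hidden: int, layers: int, matrices_per_layer: int, rank: int) -> int:
    """Each adapted matrix gains two small factors: hidden x rank and rank x hidden."""
    per_matrix = hidden * rank + rank * hidden
    return layers * matrices_per_layer * per_matrix


trainable = low_rank_adapter_params(hidden=1024, layers=24, matrices_per_layer=2, rank=8)
fraction = trainable / TOTAL_PARAMS
print(f"trainable parameters: {trainable:,}")
print(f"share of the base model: {fraction:.2%}")
```

With these assumed settings, the trainable share falls inside the 0.1% to 1% range. A different rank or more adapted matrices would move it up or down.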

Let me tell you about three of the coolest examples, so you can feel it:

Guess what? After ChatGPT took off, LoRA became the shining star. One developer reportedly used it to train a model with a “Taoist philosophy” style: 138 rounds through OpenAI’s API, for a total cost of only $0.09! You heard that right, nine cents! The efficiency is just mind-blowing.

See? It’s like training a gifted athlete: instead of making him relearn how to run, you just give him a perfectly fitted pair of sports glasses or a specific pair of running shoes. The change is tiny, but the effect is astonishing!
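The “running shoes” idea can be sketched in a few lines of plain Python. This is a toy LoRA-style layer, not any library’s implementation. The class name, the tiny 4 x 4 weight, and the initialization values are assumptions chosen to keep the example readable.

```python
import random

# Minimal LoRA-style layer in plain Python (no framework), for illustration only.
# The frozen weight W is never updated; only the small factors A and B would be trained.


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        # A starts with small random values and B starts at zero, so the
        # add-on contributes nothing until training changes B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1.0, 2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 3.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 2.0, 1.0]]
layer = LoRALinear(W, rank=1)
x = [1.0, 1.0, 1.0, 1.0]
print(layer.forward(x))          # equals W x while B is still zero
print(layer.trainable_params())  # 8 add-on values next to 16 frozen ones
```

Starting B at zero means the layer behaves exactly like the original model on day one. Training then only has to learn the small correction.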

3. The Ultimate Lightweight: I Don’t Even Enter the Model (Prompt Tuning)

Friends, if you think PEFT is already amazing, what comes next is like “gods playing chess”—Prompt Tuning.

At least PEFT goes inside the model to plug in add-ons. Prompt Tuning goes further: I don’t even step through your door! I don’t modify a single weight of your model. I just play with the “prompts” that I feed into the model.

What gets trained is not the model but a few dozen prompt token vectors. How small is that? With BERT as the example, it comes to about 38 KB, smaller than a single photo on your phone. Next to GPT-3’s 175 billion parameters, that’s less than a drop in the bucket.
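The size is easy to sanity-check. The sketch below assumes 12 prompt tokens, a hidden size of 768 (the width of BERT-Base), and 32-bit floats. The article does not say which settings produced its 38 KB figure, so treat this as a same-ballpark estimate rather than a reconstruction.

```python
# Size of a learned soft prompt next to a full model.
# Assumptions (not from the article): 12 prompt tokens, hidden size 768
# (BERT-Base width), 32-bit floats.
BYTES_PER_VALUE = 4


def soft_prompt_bytes(num_tokens: int, hidden: int) -> int:
    """A soft prompt is just num_tokens vectors of length hidden."""
    return num_tokens * hidden * BYTES_PER_VALUE


prompt_kib = soft_prompt_bytes(num_tokens=12, hidden=768) / 1024
gpt3_gb = 175_000_000_000 * BYTES_PER_VALUE / 1e9
print(f"soft prompt: {prompt_kib:.0f} KiB")
print(f"GPT-3 weights at float32: {gpt3_gb:.0f} GB")
```

A few dozen kilobytes on one side, hundreds of gigabytes on the other. That is the gap the “drop in the bucket” line is pointing at.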

Feels a bit counterintuitive, doesn’t it? Everyone thinks you have to mess with the machinery to get work done, but it turns out that just polishing the “slogan” you shout can make it work better!

For instance, researchers created a method called P-Tuning. In few-shot scenarios, tuning only the prompt actually beat traditional full fine-tuning! On the tougher SuperGLUE benchmark, it needed only 0.1% of the storage space of full fine-tuning, while keeping the performance gap within 3%! That’s like someone spending a hundred million on ads while you spend a hundred thousand on a better press release, and the results are almost the same!

The logic behind this is sharp: If the model is powerful enough, it’s already a treasure mountain. You don’t need to dig. You just need to know how to shout the most precise “Open Sesame” command at that mountain.
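Here is the “Open Sesame” trick in miniature: learned vectors are placed in front of the frozen input embeddings, and those vectors are the only thing training would touch. The embedding table, its values, and the function names are toy stand-ins invented for this sketch, not a real model.

```python
# Prompt tuning in miniature: learned vectors are prepended to the frozen
# input embeddings. Everything here is a toy stand-in, not a real model.

FROZEN_EMBEDDINGS = {  # hypothetical frozen embedding table
    "great": [0.9, 0.1],
    "movie": [0.2, 0.7],
}


def embed(tokens):
    """Look up frozen embeddings; these values are never trained."""
    return [FROZEN_EMBEDDINGS[t] for t in tokens]


def with_soft_prompt(soft_prompt, tokens):
    """Return the sequence the frozen model would actually see."""
    return [list(v) for v in soft_prompt] + embed(tokens)


# The only trainable values: two prompt vectors (4 numbers in total).
soft_prompt = [[0.0, 0.0], [0.0, 0.0]]
sequence = with_soft_prompt(soft_prompt, ["great", "movie"])
print(len(sequence))  # 2 prompt vectors + 2 token embeddings
print(sequence)
```

In a real setup, gradients would flow back through the frozen model into `soft_prompt` only, which is why the saved result is so small.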

4. Three Pillars, Which One Is Your “Destiny Technique”?

At this point, you might be wondering: So which one should I choose? Let me give you a quick comparison table, crystal clear:

Dimension | Full Fine-Tuning | Parameter-Efficient Fine-Tuning | Prompt Tuning
How many parameters changed? | All of them! 100% | A drizzle! 0.1%-1% | Not a single one! 0% (only input)
How big is storing a model? | Several buildings! 300 MB - 1.5 TB | A USB stick! 1-10 MB | A sticky note! <1 MB
Training power consumption? | Full throttle craziness | Energy-saving mode | Almost none extra
Can it perform? | Baseline (perfect score) | 95% - 99%, near perfect | 90% - 95%, enough but slightly behind
Best suited for? | Rich folks only, huge data, demand 100% | Industrial workhorse, multi-task, limited resources | Creative work, prompt engineering, fast prototyping

So you see, the whole story is an evolution of “cost reduction and efficiency improvement”:

5. Where Is the Future?

Now, models have grown to trillions of parameters (like GPT-4, PaLM 2). Making these be

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
