OpenAI Publishes a Rare Paper: We Found the Culprit Behind AI Hallucinations
Generated: 2026-06-23 10:15:37
---
I Got Fooled by AI for an Entire Afternoon, Then It Forced Me to Read OpenAI's 37-Page Paper
Have you ever been fooled by AI?
I learned a painful lesson last week.
I was writing an article on the history of an obscure technology, and casually asked ChatGPT: "What speech did XXX give at GTC 2019?"
It answered instantly.
The kind of answer that's super confident, super smooth: time, place, person, all there. It even came with a title for the talk, which turned out to be fabricated.
I copied it straight into my draft, feeling pretty pleased with myself. Efficiency, baby. It just hits different.
Then, when I went to verify the sources, I was completely stunned.
That speech? It. Never. Existed.
It had spliced together content from 2020 with a different speaker, making it sound more real than the actual thing.
At that moment, I seriously wanted to smash my keyboard.
So when OpenAI's paper Why Language Models Hallucinate started trending, I dropped everything, downloaded it, and read it cover to cover.
What was my first reaction?
Wow. So you guys knew all along.
---
Hallucination? It's the Exam System's Fault
There was one line in the paper I read three times, slapping my thigh each time.
**"Standard training and evaluation procedures tend to reward guessing over acknowledging uncertainty."**
Got it?
In plain English: every time we score AI on accuracy, we're essentially forcing it to make stuff up.
Think about our own exams.
Multiple choice all the way. You're not sure. Leave it blank or take a guess?
Leave it blank? Zero points.
Take a guess? Maybe you get lucky!
AI learns fast. You evaluate it on benchmarks where right answers get a point, wrong answers get zero, and saying "I don't know" also gets zero.
What does it choose?
Of course it gambles.
Author Adam Tauman Kalai gave an example that completely sold me:
You ask the model, "What day is someone's birthday?" It doesn't know.
What does it do?
It just randomly guesses September 10.
Why? Because there's a 1 in 365 chance it's right.
But if it honestly says "I'm not sure"?
Straight zero.
Fair? I don't think so.
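To put numbers on that, here is a minimal sketch of the incentive. It assumes a simple grading scheme of 1 point for a correct answer and 0 for "I don't know"; the function name `expected_score` and the `wrong_penalty` parameter are my own illustration, not something taken from the paper.

```python
from fractions import Fraction

ABSTAIN_SCORE = 0  # assumed: "I don't know" earns nothing


def expected_score(p_correct, wrong_penalty=0):
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    p = Fraction(p_correct)
    return p - (1 - p) * wrong_penalty


# Birthday question: a blind guess is right 1 time in 365.
birthday_guess = expected_score(Fraction(1, 365))

# Four-option multiple choice: a blind guess is right 1 time in 4.
multiple_choice_guess = expected_score(Fraction(1, 4))

# With no penalty for wrong answers, guessing always beats abstaining.
print(birthday_guess, birthday_guess > ABSTAIN_SCORE)
print(multiple_choice_guess, multiple_choice_guess > ABSTAIN_SCORE)

# Hypothetical alternative: a wrong answer costs 1 point.
penalized_guess = expected_score(Fraction(1, 365), wrong_penalty=1)
print(penalized_guess, penalized_guess > ABSTAIN_SCORE)
```

With no penalty for being wrong, even a 1-in-365 shot has a positive expected score, so guessing wins. Only under the hypothetical penalty does "I'm not sure" become the better move.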
---
I Ran an Experiment Right Then and There
I tested this on a few models.
Guess what happened?
| I asked | GPT-4o | Claude 3.5 Sonnet | Truth |
|---|---|---|---|
| What is the title of Adam Tauman Kalai's PhD thesis? | Learning Hierarchical Representations | Efficient Algorithms for... | All wrong! The actual title is A Theoretical Study of... |
| What is his birthday? | March 22, 1984 | November 14, 1983 | None correct! |
| How many D's are in DEEPSEEK? | 2 | 3 | Both wrong! Should be 1 (D-E-E-P-S-E-E-K) |
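The letter-count question is the one row anyone can check mechanically. A quick sketch using Python's standard library (the variable names are just for illustration):

```python
from collections import Counter

word = "DEEPSEEK"

# Tally every letter in the word.
letter_counts = Counter(word)

# Number of D's specifically.
d_count = letter_counts["D"]

print(d_count)
print(dict(letter_counts))
```

Spelled out letter by letter, the word has exactly one D, so neither 2 nor 3 is right.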
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.