OpenAI Publishes a Rare Paper: We've Found the Culprit Behind AI Hallucinations

Generated: 2026-06-23 10:15:37

---

I Got Fooled by AI for an Entire Afternoon, Then It Forced Me to Read OpenAI's 37-Page Paper

Have you ever been fooled by AI?

I learned that the hard way just last week.

I was writing a historical article about some obscure technology, and casually asked ChatGPT: "What speech did XXX give at GTC 2019?"

It answered instantly.

It was the kind of answer that sounds supremely confident and polished: time, place, speaker, all there. It had even made up a title.

I copied it straight into my draft, feeling pretty pleased with myself. Efficiency, baby. It just hits different.

Then, when I went to verify the sources, I was completely stunned.

That speech? It. Never. Existed.

It had spliced together content from 2020 and a different speaker, and the result sounded more real than the actual thing.

At that moment, I seriously wanted to smash my keyboard.

So when OpenAI's paper *Why Language Models Hallucinate* started trending, I dropped everything, downloaded it, and read it cover to cover.

What was my first reaction?

Wow. So you guys knew all along.

---

Hallucination? It's the Exam System's Fault

There was one line in the paper I read three times, slapping my thigh each time.

**"Standard training and evaluation procedures tend to reward guessing over acknowledging uncertainty."**

Got it?

In plain English: every time we score AI on accuracy, we're essentially forcing it to make stuff up.

Think about our own exams.

Multiple choice all the way. You're not sure. Leave it blank or take a guess?

Leave it blank? Zero points.

Take a guess? Maybe you get lucky!

AI has learned to play the same game. When you evaluate it on benchmarks, a right answer earns a point, while a wrong answer and "I don't know" both score zero.

What does it choose?

Of course it gambles.

Author Adam Tauman Kalai gave an example that completely sold me:

You ask the model, "What day is someone's birthday?" It doesn't know.

What does it do?

It just randomly guesses September 10.

Why? Because there's a 1 in 365 chance it's right.

But if it honestly says "I'm not sure"?

Straight zero.

Fair? I don't think so.
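To make the exam arithmetic concrete, here is a minimal Python sketch of binary (0/1) grading as described above: a correct answer earns 1 point, while a wrong answer and "I don't know" both earn 0. The function name is my own, and the 1-in-365 odds simply come from Kalai's birthday example; this is an illustration, not code from the paper.

```python
from fractions import Fraction

def expected_score_binary(p_correct: Fraction, abstain: bool) -> Fraction:
    """Expected score under 0/1 grading: 1 if right, 0 if wrong, 0 for "I don't know"."""
    if abstain:
        return Fraction(0)
    # Guessing: right with probability p_correct (1 point), wrong otherwise (0 points).
    return p_correct * 1 + (1 - p_correct) * 0

# Kalai's birthday example: a blind guess is right about 1 time in 365.
p = Fraction(1, 365)
guess = expected_score_binary(p, abstain=False)
idk = expected_score_binary(p, abstain=True)

print(f"guess: {guess} ({float(guess):.4f}), I don't know: {idk}")
print("guessing wins" if guess > idk else "abstaining wins")
```

However tiny the chance of being right, it is still greater than zero, so under this scoring rule a guess always beats an honest "I don't know" on average.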

---

I Ran an Experiment Right Then and There

I tested this on a few models.

Guess what happened?

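| Question | GPT-4o | Claude 3.5 Sonnet | Truth |
|---|---|---|---|
| What is the title of Adam Tauman Kalai's PhD thesis? | Learning Hierarchical Representations | Efficient Algorithms for... | All wrong! The actual title is A Theoretical Study of... |
| What is his birthday? | March 22, 1984 | November 14, 1983 | None correct! |
| How many D's are in DEEPSEEK? | 2 | 3 | Both wrong! There's only one D (D-E-E-P-S-E-E-K) |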

See the pattern?

Every answer came across as totally confident. The tone was as certain as can be.

If you don't verify it yourself, you'll be led down the garden path in no time.

Confident. Certain. Internally consistent.

But—

Completely wrong.

---

Here's the Kicker: Even Clean Training Data Doesn't Fix It

The paper also revealed a fact that left me speechless:

**Even if the training data is completely clean, the model will still hallucinate.**

They reduced the generation task to a binary classification problem called IIV (Is-It-Valid).

Basically, judging whether a sentence is correct.
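For intuition, here is a toy Python sketch of the IIV idea: take a generator's probabilities over a small, made-up set of outputs, turn them into a valid/invalid classifier by thresholding, and compare the classifier's mistakes with how often the generator produces errors. The outputs, the probabilities, and the threshold of 1 divided by the number of error outputs are all assumptions chosen for illustration; the paper's formal construction and bounds are more careful than this.

```python
from typing import Dict, Set

def generation_error(p_hat: Dict[str, float], errors: Set[str]) -> float:
    """Probability mass the generator puts on invalid outputs, i.e. its hallucination rate."""
    return sum(p for x, p in p_hat.items() if x in errors)

def iiv_says_valid(p_hat: Dict[str, float], x: str, threshold: float) -> bool:
    """Toy IIV classifier: label x 'valid' if the generator finds it likely enough."""
    return p_hat.get(x, 0.0) > threshold

def iiv_error_rate(p_hat: Dict[str, float], valid: Set[str],
                   errors: Set[str], threshold: float) -> float:
    """Fraction of candidate outputs that the toy classifier mislabels."""
    universe = valid | errors
    mistakes = sum(1 for x in universe
                   if iiv_says_valid(p_hat, x, threshold) != (x in valid))
    return mistakes / len(universe)

# Hypothetical toy world: 3 valid outputs and 4 plausible-looking errors.
valid = {"A", "B", "C"}
errors = {"W", "X", "Y", "Z"}
p_hat = {"A": 0.35, "B": 0.30, "C": 0.05, "X": 0.20, "Y": 0.05, "Z": 0.03, "W": 0.02}

threshold = 1 / len(errors)  # illustrative choice, an assumption
print("IIV misclassification rate:", round(iiv_error_rate(p_hat, valid, errors, threshold), 3))
print("generation error rate:", round(generation_error(p_hat, errors), 3))
```

In this toy, the classifier misjudges only the rare-but-valid output C, yet the generator still spends 30% of its probability on errors, mostly on the plausible-looking X. The paper's argument runs in this direction: if even the valid/invalid judgment is error-prone, generation errors follow.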

And what did they find?

When the model can't capture the pattern behind a problem (say, data that can't be separated by a straight line, or a task like counting how many times a letter appears in a word, which requires exact counting), it develops systematic errors during pre-training.

Think about it.

What does the model learn?

It learns probability distributions.

It sees the word "DEEPSEEK" appear millions of times in the corpus, but it's never asked to count the D's.

Now you ask it to count. It can only guess.

Because statistically, "3" appears in similar contexts about as often as "2."

It picks one at random and goes for it.

It's like...

Asking someone who's never seen a math problem to guess what 1 + 1 is.

They might say 2, or they might say 3.

Because they know "1" and "2" often appear together, but they have no idea what arithmetic actually is.

It's the same with the model.

It "reasons" based on correlations between word vectors—it doesn't actually have the ability to compute.

I took the "count the D's in DEEPSEEK" example from the paper and asked DeepSeek V3.

Holy cow. It gave two different answers: once 2, once 3. And neither is right: D-E-E-P-S-E-E-K contains exactly one D.

The. Same. Model.

Temperature set to zero, and the answers still drifted.
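The gap between counting and pattern-matching is easy to show in a few lines of Python. This is a toy illustration, not how DeepSeek V3 or any real model works internally: `exact_count` actually looks at the letters, while the hypothetical `statistical_guess` never does and just samples from answers it has "seen" after similar-looking questions. The answer weights are invented for illustration.

```python
import random
from collections import Counter

def exact_count(word: str, letter: str) -> int:
    """Character-level count: what the question actually asks for."""
    return word.upper().count(letter.upper())

# Hypothetical answer frequencies a pattern-matcher might have absorbed from
# similar-looking questions. Invented for illustration, not measured from any model.
SEEN_ANSWERS = {"2": 0.45, "3": 0.45, "1": 0.10}

def statistical_guess(rng: random.Random) -> str:
    """Answer without looking at the word at all: sample a 'typical' answer."""
    answers, weights = zip(*SEEN_ANSWERS.items())
    return rng.choices(answers, weights=weights, k=1)[0]

print("exact count of D in DEEPSEEK:", exact_count("DEEPSEEK", "D"))

rng = random.Random(42)
print("10 statistical guesses:", Counter(statistical_guess(rng) for _ in range(10)))
```

Change the seed and the guesser's answers shift around, while `exact_count` returns 1 every time, because it is the only one of the two that actually inspects the word.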

---

Guess What OpenAI Thinks About It?

Here's where it gets interesting.

Around the time the paper came out, OpenAI did an internal reorganization.

The Model Behavior team, which was responsible for defining the model's "personality," was entirely merged into the Post Training group.

The former head, Joanne Jang, was moved to a new department called OAI Labs—focused on building new kinds of human-AI interfaces.

She had just been named to the TIME100 AI list of top thinkers, ranking ahead of Turing Award winner Yoshua Bengio.

And the same day, she was reassigned.

Her own words: "Surreal."

Why this shift?

Look at what the Model Behavior team was doing before—essentially "tuning model behavior."

But—

If your fundamental evaluation mechanism is designed to reward guessing,

then no matter how much post-training you do, the model will still be pushed toward hallucination.

Fixing the individual isn't enough. You have to change the whole exam system.

So what does OAI Labs want to do?

Design new ways of interacting.

Give the model room to say "I don't know."

Instead of always being forced to choose between "guess" and "lose points."

---

So, Is OpenAI's Solution Any Good?

Honestly? This time OpenAI didn't overpromise.

Their core suggestion was very concrete, very practical:

Introduce confidence thresholds into the scoring rules of mainstream benchmarks.

Here's how it works—

Tell the model: "Only answer when your confidence is above a certain threshold. Say 'I don't know' otherwise."

Then set the scoring rules clearly: getting it wrong costs more points than saying you don't know.

That way, the model's optimal strategy shifts from "gambling" to "being honest."
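Here is a minimal Python sketch of that scoring rule. It assumes the penalty form the paper proposes, as I understand it: with confidence threshold t, a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs t/(1 − t) points. Under that rule, answering has a positive expected score only when the model's confidence p is above t. The function names are illustrative.

```python
from fractions import Fraction

def wrong_answer_penalty(t: Fraction) -> Fraction:
    """Points lost for a wrong answer when the stated confidence threshold is t."""
    return t / (1 - t)

def expected_score(p: Fraction, t: Fraction, answer: bool) -> Fraction:
    """Expected score for a model that is right with probability p.
    Correct = +1, wrong = -t/(1-t), "I don't know" = 0."""
    if not answer:
        return Fraction(0)
    return p * 1 - (1 - p) * wrong_answer_penalty(t)

def best_action(p: Fraction, t: Fraction) -> str:
    """Answer only if it beats abstaining in expectation (equivalent to p > t)."""
    return "answer" if expected_score(p, t, answer=True) > 0 else "I don't know"

t = Fraction(3, 4)  # e.g. "answer only if you are more than 75% confident"
for p in (Fraction(1, 365), Fraction(1, 2), Fraction(3, 4), Fraction(9, 10)):
    print(f"p={float(p):.3f}  expected={float(expected_score(p, t, True)):+.3f}  -> {best_action(p, t)}")
```

At t = 3/4, a wrong answer costs 3 points, so the 1-in-365 birthday guess now has a sharply negative expected score and "I don't know" becomes the rational reply. At exactly p = t the two options tie; above it, answering wins.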

Sounds about right, doesn't it?

But—

Changing benchmark standards isn't something OpenAI can decide alone.

The academic community and industry have been using accuracy-based evaluations for decades. You can't just switch overnight.

Plus, some evaluation tasks don't even allow a "don't know."

Like multiple-choice sets where options A, B, C, D are fixed. How does it express uncertainty?

It's forced to pick.

A few other points I found especially interesting:

First, the paper specifically notes that RAG (Retrieval-Augmented Generation) can't completely eliminate hallucination. Why? Because if the search fails to find an answer, the current scoring system still rewards the model for making one up. And for intrinsic hallucinations like counting letters or doing math, search doesn't help either.

Second, they suggest that in LM-as-judge scenarios, you should also score "reasonable expressions of uncertainty" higher than "confidently wrong long-winded answers." This directly challenges many current automated evaluation practices—most judges tend to reward information-rich responses, even if they contain errors.
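A toy Python sketch of the judge problem: two hypothetical rubrics grade the same pair of answers. The "length-biased" judge rewards detail regardless of accuracy; the "uncertainty-aware" judge, in the spirit of the paper's suggestion, penalizes confident errors and gives honest uncertainty a neutral score. Both rubrics and their numbers are invented for illustration, not taken from any real evaluation framework.

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    is_correct: bool
    abstains: bool  # True if the response honestly expresses uncertainty

def length_biased_judge(r: Response) -> float:
    """Hypothetical judge that rewards information-rich answers, right or wrong."""
    return min(len(r.text.split()) / 20, 1.0)  # more words -> higher score, capped at 1

def uncertainty_aware_judge(r: Response) -> float:
    """Hypothetical judge: honest uncertainty = 0, correct = 1, confident error = -1."""
    if r.abstains:
        return 0.0
    return 1.0 if r.is_correct else -1.0

confident_wrong = Response(
    text="The talk was delivered on stage at GTC 2019 and covered three main themes "
         "in detail, including a roadmap, a live demo and a closing keynote quote.",
    is_correct=False, abstains=False)
honest = Response(text="I'm not sure; I couldn't find a record of that talk.",
                  is_correct=False, abstains=True)

for judge in (length_biased_judge, uncertainty_aware_judge):
    preferred = "confident_wrong" if judge(confident_wrong) > judge(honest) else "honest"
    print(f"{judge.__name__}: prefers {preferred}")
```

The length-biased judge ranks the fabricated, detailed answer first; the uncertainty-aware one ranks the honest "I'm not sure" first, which is exactly the flip the paper asks graders to make.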

Third—and most importantly—the paper does not claim to have solved hallucination.

It only points out the root cause and possible mitigation directions.

Some Chinese media outlets ran headlines like "OpenAI Cracks the Hallucination Problem," but the original paper says nothing of the sort.

I went and read the original paper and blog post.

The tone is extremely restrained.

It even acknowledges:

> "Hallucination remains a fundamental challenge for all large language models."

---

So, What Do I Do Now?

This paper reinforced my own thinking.

Don't treat AI like a knowledge base.

**Treat it like an intern who talks a great game but …**



Cael Lee
