How should we evaluate OpenAI's newly released GPT-5.5 model?
Generated: 2026-06-20 08:36:17
Have you ever read the story of Tian Ji's horse race?
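During the Warring States period, Tian Ji raced his weakest horse against the king's best, then his best against the king's middle horse and his middle horse against the king's weakest, and won two of the three heats.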
Today, OpenAI is using the same trick.
1. A Seeming Sweep
The official scorecard: coding 82.7%, finance 88.5%, math jumped from 65.4% to 81.2%.
Doctoral-level reasoning 85.6%, scientific charts also improved.
Any way you look at it, it's a big win.
But here's the thing: OpenAI chose every one of these tests itself.
Claude never took the same exams.
"Tian Ji's horse race" says it all: you win by picking the matchups.
2. The Real Progress Hides Behind the Scores
Speaking of which, here's something counterintuitive.
You think the biggest upgrade is the scores? Actually, it's not.
It's "reliability."
The variation across repeated runs is only 3.2%.
In other words, using the model is no longer a lottery draw.
Ask it ten times, and the answers are basically the same.
That's crucial.
Capability is continuous, but trust is not.
Once the error probability drops to a certain point, your behavior changes:
from "let it help me think" to "let it run first, and I'll review the output at the end."
That's what a real qualitative change looks like.
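To make the reliability point concrete, here is a minimal sketch of one way to measure run-to-run variation. It is not OpenAI's method: the post does not say how the 3.2% figure was computed, so the spread definition (best run minus worst run, in percentage points), the function name `pass_rate_spread`, and the five pass rates below are all hypothetical and chosen purely for illustration.

```python
from statistics import mean


def pass_rate_spread(run_scores):
    """Return (mean, spread) for repeated runs of the same benchmark.

    Spread is defined here as max - min, in percentage points.
    This definition is an assumption; the post does not specify one.
    """
    if not run_scores:
        raise ValueError("need at least one run")
    return mean(run_scores), max(run_scores) - min(run_scores)


# Hypothetical pass rates (percent) from five runs; not real benchmark data.
runs = [81.0, 82.5, 80.4, 83.6, 82.0]
avg, spread = pass_rate_spread(runs)
print(f"mean={avg:.1f}% spread={spread:.1f} points")
```

A small spread is what lets you move from checking every answer to reviewing only the final output: the worst run is close enough to the average that one review at the end covers it.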
3. Cheaper,
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.