“Evaluation as Science”: The First Survey of Large Language Model Evaluation, One Article to Help You Understand Large… (English)

By CaelLee | 1 min read

Generated: 2026-06-23 06:53:13

---

Alright, let me first walk you through the facts, and then I'll rewrite it properly.

A few things need to be corrected:

  1. Calling "A Survey on Evaluation of Large Language Models" (Chang et al., 2023) the first survey in large model evaluation is too absolute. A priority claim like that is hard to verify, and Chang et al.'s 2023 survey is this paper itself, so it cannot serve as an example of an earlier one. Better to say "one of the earlier systematic surveys in this field."
  2. "Our open-source project maintains a bunch of evaluation benchmarks: AlpacaEval, HELM, Big-Bench, and our own PromptBench" — you don't actually maintain those benchmarks in your project; you cited and compiled them in your survey. It's more accurate to say "we have curated many evaluation benchmarks in our open-source project."
  3. "Our team's PromptBench" — PromptBench originally came from Microsoft and Peking University. If you weren't a core author but contributed, you could say "I've also worked on PromptBench" or keep it vague as "the PromptBench benchmark

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.