国内AI大模型已近80个,哪个最有前途? (English)
国内AI大模型已近80个,哪个最有前途? (English)
Generated: 2026-06-20 15:06:39
---
80 AI Models? Don't Let Them Dazzle You. The One I'm Betting On Can Even Run on Your Phone.
Last month, at a small tech gathering, someone threw out a question: "There are almost 80 domestic AI models now—which one has the most potential?"
The room exploded.
Someone whipped out their phone and demoed Doubao writing a weekly report. Another guy swore DeepSeek was the best for coding. Someone else whispered mysteriously, "You have to look at the SuperCLUE rankings—that's the real authority."
I sat there watching them argue, faces red, and thought one thing:
You're not even arguing about the same thing.
The person who asked meant "which has the most future potential," but everyone was busy comparing "who has the highest score right now." I think the question itself was off from the start.
So I said something that made the whole room go quiet:
"My answer is Qwen (Tongyi Qianwen). Not because it's the best at everything, but because it's the least like a 'model.' It's more like a foundation that's quietly spreading."
Everyone froze. Not convinced? Fine—here are four things I've seen with my own eyes.
---
First: Qwen Plays with Size Like Nobody Else
Everyone else is pushing their flagship models, bragging about how great they are.
What's Qwen doing?
It's released everything from 0.5B, 1.5B, 32B to MoE models—including a production version that can actually run on your phone!
I tried it myself: I quantized the Qwen2.5 0.5B model and stuffed it into a five-year-old Android phone. Casual chat, summarization—ran smooth as butter!
As for the 1.5B version? Ran on a Raspberry Pi.
You might think: so what? Wrong. This isn't about technology—it's about strategy.
Full coverage from edge to cloud? Domestically, Qwen is the only one pulling it off this thoroughly. For developers, if you need offline capabilities on device, you don't have to switch tech stacks. And when you go to the cloud for complex tasks, it's the same model family. Huge time saver.
DeepSeek is indeed strong at reasoning. But have you ever seen DeepSeek release a 0.5B mini model? No. They're betting on sharpness, not coverage.
One plays Go. The other throws knives. Completely different games.
---
Second: Open Source Isn't Just a Gimmick—You Can Actually Use It
Last year, a friend of mine was building a customer service system for a small- to medium-sized enterprise. The client hit him with a deal-breaker right off the bat:
"The model has to be deployed locally. Data absolutely cannot leak."
Fine. Open-source model it is.
He tried a few: GLM-4 from Zhipu had decent agent capabilities, but the documentation and community support... let's just say it was like walking through a door and finding a minefield behind it.
Then he found Qwen.
From downloading Qwen2.5 weights to deploying with vLLM—just a few lines of code. And when he asked questions in the group, he'd usually get a reply the same day!
He later told me: "Open-source models aren't just about the tech anymore—it's also about community operations."
Think about it: a model with only a paper and no ecosystem means developers have to fill in the gaps themselves. It's like buying a car and finding out there's only one dealership in the whole world, two thousand kilometers away.
But Qwen has Alibaba Cloud behind it. The ecosystem iteration is way more sustainable than what a small team can offer.
Open source isn't just about throwing your code out there. It's about making every developer feel like "someone's in the trenches with me."
---
Third: This "Specialist" Is Actually a Well-Rounded All-Rounder
I ran a silly experiment—took the same 500-word technical document in English and had several models translate it into Chinese.
Here's what I got:
- Qwen2.5-72B: The most natural translation. Not a single technical term wrong, and even the sentence structures catered to how Chinese is actually written.
- DeepSeek-V3: Also decent, but some parts were too literal—had a bit of that "translationese" feel.
- GPT-4o: Correct, but not down-to-earth.
- Claude: Pretty similar.
My sample wasn't huge, but it matched some online evaluations for Chinese translation—Qwen is definitely top-tier for Chinese translation.
How about coding?
Well, I'll admit, Qwen can't beat DeepSeek there. In my test of implementing an "LRU cache with expiration time," DeepSeek wrote the most complete solution.
But here's the thing: for everyday development tasks, Qwen is good enough. The gap isn't large enough to be a dealbreaker.
Creative writing? Qwen is a bit weaker. But for daily Q&A, logical reasoning... the gap is practically negligible.
Qwen is like a well-balanced student scoring 85-90 in every subject. DeepSeek is the one who scores 99 in math but 80 in Chinese.
If all you do is write code, go with DeepSeek. But if you need to handle all sorts of random tasks—write a report today, translate a document tomorrow, build a customer service system the day after—who do you pick?
---
Fourth: Qwen Has a Natural Advantage for the "Last Mile" of Enterprise Deployment
Benchmarks are useless if the model can't actually be deployed.
A lot of companies are finding out that their biggest headache isn't that the model isn't smart enough—it's that they don't know how to plug it into their existing systems.
Writing 800 lines of code just to connect to ERP? Getting up at midnight to fix API docs? Just thinking about it is exhausting.
But Qwen has Alibaba Cloud and DingTalk backing it.
DingTalk's AI assistant is already deeply integrated with Tongyi Qianwen—organizational structures, approval workflows, schedules—all connected out of the box. If a company wants to build an internal knowledge base, it takes just a few minutes to set up.
DeepSeek and Zhipu can't do this nearly as smoothly. Why? Because they don't have their own office ecosystem or cloud platform.
I know someone will bring up ByteDance's Doubao: great product capabilities, massive user base on Douyin.
True. Doubao does rank among the top domestically for application capabilities, with strong precise instruction following and hallucination control.
But look closely at the evaluations—Doubao's reasoning ability is a bit weaker than DeepSeek and Qwen.
In the short term, good product experience can attract users. But in the long run? If the model's intelligence isn't deep enough, it'll eventually hit a ceiling.
ByteDance is great at engineering, but in the end, AI is about intelligence, not user numbers.
---
Let's Talk About the Opposition
But hold on, I know what you're thinking.
"DeepSeek-V3 has a crushing lead in reasoning! Who would pick anything else?"
That data is indeed from the latest SuperCLUE results—I'll give them that.
But think about it: how long will that high score last? DeepSeek's strength is reasoning, but once GPT-4o—or even stronger models in the future—catch up on reasoning, the gap shrinks. Also, DeepSeek doesn't have a complete ecosystem like Alibaba Cloud to support it. Its commercialization path is relatively narrow. On the C-end app front, its influence is far behind Doubao and Kimi. On the B-end, it's not as strong as Qwen or Wenxin with their established customer bases.
If you only look at scores without considering the survival environment, you're only looking at tech, not business.
"But isn't Kimi's long-text capability unique? Two million characters of input!"
Kimi's long-text advantage is real—I occasionally use it for reading papers and contracts.
But here's the problem: a single feature is too easily replicated. Now Qwen, DeepSeek, and others all support million-token contexts. Kimi's user peak was in the second half of 2024, and its share has been slowly eroded since then.
Relying on just one feature—long text—can't
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.