你天天在填的那个 /v1/chat/completion (English)
你天天在填的那个 /v1/chat/completion (English)
Generated: 2026-06-23 14:32:58
---
Translate to English, keep the storytelling style:
This /v1/chat/completions you fill in every day — what makes it so special, really?
Hey, I've been building AI apps for over three years, and I've seen way too many people treat this endpoint like a magic spell. You configure it in your project, in Dify, in Cursor, and when you spin up a local vLLM — you configure it all over again. Over and over, it's the same fields: model, messages, temperature, max_tokens. But if you actually ask, "Where did this even come from?" most people — including me back then — can only scratch their heads.
When I first started using the OpenAI API, I was the same. I thought it was simple: just send a request and get a response. But after getting burned three times, I finally realized — there's a whole design philosophy hidden behind this endpoint. Once you understand it, you can seamlessly switch between 90% of the AI APIs out there.
What did the earliest LLM endpoint look like?
Take a guess. It was /v1/completions.
No chat in the name. The logic was so simple it was almost brutal — you feed it a chunk of text, and the model just keeps writing from there. That's it. No roles, no system messages, no conversation history. The model only sees a string of characters, and its only job is to guess the next token.
Want it to act like an assistant and chat with you? Fine. You had to manually piece all this together in the prompt:
System: You are a helpful assistant.
User: What's the weather like today?
Assistant: Let me check the weather data.
User: Will it rain in Beijing today?
It worked, but the problems were obvious. The model had to guess which sentence was the system instruction, which was the user, and which was its own previous reply. If it guessed right, fine. If it guessed wrong, the whole conversation logic collapsed. And forget about sending an image or having it call a tool — there was no clean way to express any of that in a plain text prompt.
So /v1/completions gradually took a back seat.
I remember testing this endpoint myself with an early version of GPT-3. I had to handcraft a table in the prompt telling the model to respond in JSON, and then pray it didn't mistake the system instruction for user input. Honestly, it felt like writing code in Notepad — technically possible, but every step made you want to swear.
Then came the real turning point.
OpenAI did something that had a massive impact on the entire industry — they made conversation structured. What structure? From one messy block of prompt text to an array of messages. Each message has a clear role: system for instructions, user for input, assistant for the model's previous responses, and tool for results from external calls.
This change? It looks like just adding one word: chat.
But think about it — nearly every approach to building AI apps shifted because of this.
Look at the request body now — so clean:
{
"model": "gpt-4",
"messages": [
{"role": "system", "content": "You are a programming assistant."},
{"role": "user", "content": "Write a Python function for me."}
]
}
Each role has its place. The system instruction won't be mistaken for user input. The model knows that what it says is the "assistant speaking." And the user doesn't need to mark up their text with special tokens.
Once the context became a structured message array, you gained precise control — what the model sees, what role it takes, what order the information comes in — all up to you. Multi-turn conversations, persona settings, tool calls all have a clear foundation.
I honestly feel that when I first used this endpoint in early 2023 to build a chatbot, the biggest relief was: finally, no more manually stuffing System:, User:, Assistant: placeholders into the prompt! The code got cleaner, not just a little — I felt so much lighter.
Why did it become the de facto standard?
You know what? This protocol was eventually adopted by the whole industry.
OpenAI uses /v1/chat/completions. Mistral uses it. xAI uses it. DeepSeek uses /v1/chat/completions with full format compatibility. Groq uses /openai/v1/chat/completions. OpenRouter uses /api/v1/chat/completions.
The path prefix might differ, but the core protocol is the same.
This brings a hugely practical benefit: if you write your code with the OpenAI SDK and want to switch to another platform, often you only need to change three variables: baseurl, apikey, model_name. The rest of your SDK call code can stay untouched.
Last month, I was working on a project using OpenAI, but then I temporarily switched to Groq for testing. I changed the baseurl from https://api.openai.com/v1 to Groq's address, swapped the apikey, set the model to mixtral-8x7b-32768, ran three lines of test code — and everything passed! When I saw the test results, I paused for a second, then smiled.
Honestly, the reason /v1/chat/completions became the de facto standard is that it came early enough and the ecosystem spread fast enough. SDKs, IDEs, chat UIs, RAG frameworks — everything revolves around it. If someone wants to push a new protocol? They'd have to convince every toolchain developer to change their code. How hard is that? About as hard as getting everyone to switch from WeChat to another messaging app.
But there's something that must be said clearly.
Compatible doesn't mean perfectly compatible.
Over the last two years, I've tested APIs from over a dozen platforms and found a pattern: everyone claims they're OpenAI-compatible, but the level of compatibility varies a lot. Basic text chat works almost everywhere. Streaming output works for most. But when it comes to tools (function calling), response_format for structured output, vision (image input), audio input — these capabilities are all over the place.
I've run into cases where, on Platform A, the tool call parameters look exactly like OpenAI's; on Platform B, there's an extra field in the tool call response; on Platform C, image input throws an error straight away.
The worst kind? Some platforms write flowery documentation, but when you actually run it, it's a different story. I got stung badly by
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.