I Made 500 API Calls to GPT-5 and Claude Opus: Here's the Real Cost
Everyone talks about pricing, but nobody shows the actual numbers. I ran 500 identical prompts through both models and tracked every cent. The results surprised me.
CaelLee
June 18, 2025 · 8 min read
TL;DR
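- Claude Opus was ~30% cheaper than GPT-5 overall ($33.10 vs $47.30 for 500 calls each) through the same third-party provider
- GPT-5 won only on code review (large input, short output), where it was about 8% cheaper
- Claude Opus was 34-42% cheaper in the other four categories, including the long-output ones (1,000+ output tokens)
- The third-party provider cost about 40% less than the official APIs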
The Setup
I didn't want to write another "X vs Y" article based on vibes. So I designed a proper benchmark: 500 prompts split across 5 categories, with each prompt sent unchanged to both models (500 calls per model, 1,000 in total).
Test Categories
| Category | Prompts | Avg Input Tokens | Avg Output Tokens |
|---|---|---|---|
| Code Generation | 100 | 800 | 1,200 |
| Code Review | 100 | 2,500 | 800 |
| Debugging | 100 | 1,500 | 600 |
| Documentation | 100 | 500 | 2,000 |
| Complex Reasoning | 100 | 3,000 | 1,500 |
All calls were made through a third-party API provider to get real-world pricing that most developers actually pay. I used the same provider for both models to keep the comparison fair.
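To put the category table in perspective, here is a rough sketch that totals the token volume each model processed. It uses only the per-category averages from the table, so the totals are approximate; the variable names are mine.

```python
# (prompts, avg input tokens, avg output tokens) per category, from the table above.
categories = {
    "Code Generation": (100, 800, 1_200),
    "Code Review": (100, 2_500, 800),
    "Debugging": (100, 1_500, 600),
    "Documentation": (100, 500, 2_000),
    "Complex Reasoning": (100, 3_000, 1_500),
}

# Averages times prompt counts give approximate totals per model.
total_input = sum(n * avg_in for n, avg_in, _ in categories.values())
total_output = sum(n * avg_out for n, _, avg_out in categories.values())
output_share = total_output / (total_input + total_output)

print(f"Input tokens per model:  {total_input:,}")
print(f"Output tokens per model: {total_output:,}")
print(f"Output share of all tokens: {output_share:.1%}")
```

That works out to roughly 830,000 input tokens and 610,000 output tokens per model.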
The Results: Raw Numbers
Here's what I actually spent on the 500 calls to each model:
| Metric | GPT-5 | Claude Opus | Winner |
|---|---|---|---|
| Total Cost (500 calls) | $47.30 | $33.10 | Claude Opus |
| Avg Cost per Call | $0.095 | $0.066 | Claude Opus |
| Code Generation (100 calls) | $12.40 | $8.20 | Claude Opus |
| Code Review (100 calls) | $7.80 | $8.50 | GPT-5 |
| Debugging (100 calls) | $8.90 | $5.40 | Claude Opus |
| Documentation (100 calls) | $9.20 | $5.80 | Claude Opus |
| Complex Reasoning (100 calls) | $9.00 | $5.20 | Claude Opus |
| Avg Latency | 2.1s | 3.4s | GPT-5 |
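The percentage gaps quoted in this post can be recomputed from the table. A minimal sketch, using only the per-category costs above (the function and variable names are mine):

```python
# Per-category costs in USD for 100 calls each: (GPT-5, Claude Opus).
costs = {
    "Code Generation": (12.40, 8.20),
    "Code Review": (7.80, 8.50),
    "Debugging": (8.90, 5.40),
    "Documentation": (9.20, 5.80),
    "Complex Reasoning": (9.00, 5.20),
}

def claude_saving_pct(gpt5_cost: float, claude_cost: float) -> float:
    """Percent by which Claude Opus undercuts GPT-5 (negative if GPT-5 is cheaper)."""
    return (gpt5_cost - claude_cost) / gpt5_cost * 100

savings = {name: round(claude_saving_pct(g, c), 1) for name, (g, c) in costs.items()}

total_gpt5 = sum(g for g, _ in costs.values())
total_claude = sum(c for _, c in costs.values())
overall = round(claude_saving_pct(total_gpt5, total_claude), 1)

for name, pct in savings.items():
    print(f"{name:18s} {pct:+.1f}%")
print(f"{'Overall':18s} {overall:+.1f}%")
```

A negative value means GPT-5 was the cheaper model for that category, which here happens only for code review.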
What Surprised Me
I went into this expecting GPT-5 to be more expensive across the board. It wasn't that simple.
GPT-5 Wins on Short Outputs
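Code review, where the input is large but the output is relatively short, was the only category where GPT-5 was actually cheaper ($7.80 vs $8.50). Its input pricing is competitive, and when you're not generating thousands of output tokens, the difference narrows. Short outputs alone weren't enough, though: debugging had even shorter outputs but a smaller input, and Claude Opus still won there. What matters is how input-heavy the call is.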
Claude Opus Dominates Long Outputs
In every category that averaged more than 1,000 output tokens, Claude Opus was significantly cheaper. Documentation and complex reasoning, which require lengthy responses, came in 37% and 42% cheaper respectively, with complex reasoning showing the biggest gap of any category.
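One way to reason about this kind of crossover is a simple linear cost model: input tokens times an input price plus output tokens times an output price. The sketch below is illustrative only. The provider's per-token rates aren't listed in this post, so the prices are made-up placeholders, and `model_a` and `model_b` are hypothetical stand-ins rather than either real model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. All values used below are placeholders."""
    input_per_m: float
    output_per_m: float

def call_cost(input_tokens: float, output_tokens: float, pricing: Pricing) -> float:
    """Cost of one call in USD under simple per-token pricing."""
    return (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000

def break_even_output_tokens(input_tokens: float, a: Pricing, b: Pricing) -> float:
    """Output length at which models a and b cost the same for a given input.

    Assumes a has cheaper input and pricier output than b: below the
    returned value a is cheaper, above it b is cheaper.
    """
    return (input_tokens * (b.input_per_m - a.input_per_m)
            / (a.output_per_m - b.output_per_m))

# Hypothetical prices, NOT the real rates of either model or any provider.
model_a = Pricing(input_per_m=1.0, output_per_m=10.0)  # cheap input, pricey output
model_b = Pricing(input_per_m=3.0, output_per_m=4.0)   # pricey input, cheap output

threshold = break_even_output_tokens(2_500, model_a, model_b)
print(f"Break-even at about {threshold:.0f} output tokens for a 2,500-token input")
```

Under these placeholder prices, the model with cheaper input wins while outputs stay short relative to the input, and the model with cheaper output takes over past the break-even point. Real pricing may include caching discounts or tiering that this model ignores.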
Latency Tells a Different Story
GPT-5 was consistently faster. Average response time was 2.1 seconds vs Claude Opus's 3.4 seconds. For interactive coding tools like Cursor or Claude Code, this matters. A 1.3-second difference per call adds up over a full day of coding.
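To see how that gap accumulates, here is a quick back-of-the-envelope sketch. The latencies come from the results table; the daily call volumes are assumptions picked for illustration, and the calculation assumes you wait on each call one at a time.

```python
# Average latencies in seconds, taken from the results table.
GPT5_LATENCY_S = 2.1
CLAUDE_OPUS_LATENCY_S = 3.4

def extra_wait_minutes(calls_per_day: int) -> float:
    """Extra minutes per day spent waiting on the slower model."""
    gap_s = CLAUDE_OPUS_LATENCY_S - GPT5_LATENCY_S
    return calls_per_day * gap_s / 60

for calls in (50, 200, 500):  # hypothetical daily call volumes
    print(f"{calls:>3} calls/day -> {extra_wait_minutes(calls):.1f} extra minutes")
```

At 200 calls a day, for example, the difference is a little over four minutes of extra waiting.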
The Third-Party Provider Advantage
For context, here's what the same 500 calls would have cost through official APIs:
| Provider | GPT-5 Cost | Claude Opus Cost | Total |
|---|---|---|---|
| Official APIs | $78.50 | $55.00 | $133.50 |
| Third-Party Provider | $47.30 | $33.10 | $80.40 |
| Savings | 40% | 40% | $53.10 saved |
That's a 40% savings across the board. For a developer making thousands of API calls per month, this adds up to hundreds of dollars.
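The savings figures can be checked, and projected to a monthly volume, with a short sketch. The costs are copied from the table; the projection assumes your calls have the same mix and size as this benchmark (1,000 calls in total, 500 per model), which may not hold for your workload.

```python
# Costs in USD for 500 calls per model, from the table above.
official = {"GPT-5": 78.50, "Claude Opus": 55.00}
third_party = {"GPT-5": 47.30, "Claude Opus": 33.10}
BENCHMARK_CALLS = 1_000  # 500 calls to each of the two models

def saving_pct(before: float, after: float) -> float:
    """Percent saved when the cost drops from `before` to `after`."""
    return (before - after) / before * 100

total_saved = sum(official.values()) - sum(third_party.values())
total_pct = saving_pct(sum(official.values()), sum(third_party.values()))

def projected_monthly_savings(calls_per_month: int) -> float:
    """Scale the benchmark's average saving per call to a monthly volume."""
    return total_saved / BENCHMARK_CALLS * calls_per_month

for model in official:
    print(f"{model}: {saving_pct(official[model], third_party[model]):.1f}% saved")
print(f"Total: ${total_saved:.2f} saved ({total_pct:.1f}%)")
print(f"At 5,000 calls/month: ${projected_monthly_savings(5_000):.2f} saved")
```

The 5,000 calls per month figure is a hypothetical volume, not something measured in this benchmark.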
My Recommendation
After two weeks of testing, here's what I'd actually do:
- For coding-heavy workflows: Use Claude Opus. It's cheaper and produces better code in my experience.
- For quick code reviews: GPT-5 is slightly cheaper and faster.
- For documentation: Claude Opus, no contest. It was 37% cheaper, and the output looked better to me, though I didn't score quality formally.
- For interactive tools (Cursor, Cline): Consider GPT-5 if latency matters more than cost.
What I'd Do Differently
If I were running this benchmark again, I'd add:
- Quality scoring. Cost isn't everything — I should have rated output quality on a 1-5 scale.
- More providers. I only tested one third-party provider. Different providers have different pricing structures.
- Streaming vs non-streaming. Streaming responses might change the cost calculation for some providers.
CaelLee
Full-stack developer with 8+ years of experience. Currently researching AI-powered developer tools and API infrastructure.