I Made 500 API Calls to GPT-5 and Claude Opus: Here's the Real Cost

The Setup

I didn't want to write another "X vs Y" article based on vibes. So I designed a proper benchmark: 500 prompts, split evenly across 5 categories, with each prompt sent to both models (500 calls per model).

Test Categories

Category	Prompts	Avg Input Tokens	Avg Output Tokens
Code Generation	100	800	1,200
Code Review	100	2,500	800
Debugging	100	1,500	600
Documentation	100	500	2,000
Complex Reasoning	100	3,000	1,500

All calls were made through a third-party API provider to reflect the real-world prices many developers actually pay. I used the same provider for both models to keep the comparison fair.
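
The setup above could be wired together roughly like this. It is a minimal sketch, not the harness actually used: `call_model` is a hypothetical stand-in for whatever client the provider exposes, and it is assumed to return the token counts and billed cost for one request.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical provider hook: (model, prompt) -> (input_tokens, output_tokens, cost_usd).
# Swap in a real client here; nothing in this sketch reflects an actual provider API.
ModelCaller = Callable[[str, str], Tuple[int, int, float]]


@dataclass
class CallResult:
    model: str
    category: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_s: float


def run_benchmark(
    prompts: Iterable[Tuple[str, str]],  # (category, prompt) pairs
    models: List[str],
    call_model: ModelCaller,
) -> List[CallResult]:
    """Send every prompt to every model and record cost and wall-clock latency."""
    results = []
    for category, prompt in prompts:
        for model in models:
            start = time.perf_counter()
            input_tokens, output_tokens, cost_usd = call_model(model, prompt)
            latency_s = time.perf_counter() - start
            results.append(
                CallResult(model, category, input_tokens, output_tokens, cost_usd, latency_s)
            )
    return results


def total_cost(results: List[CallResult], model: str) -> float:
    """Sum the billed cost of all calls made to one model."""
    return sum(r.cost_usd for r in results if r.model == model)
```

Keeping the provider call behind a single function means the same loop can be pointed at a different provider or model without touching the bookkeeping.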

The Results: Raw Numbers

Here's what I actually spent across all 500 calls:

Metric	GPT-5	Claude Opus	Winner
Total Cost (500 calls)	$47.30	$33.10	Claude Opus
Avg Cost per Call	$0.095	$0.066	Claude Opus
Code Generation (100 calls)	$12.40	$8.20	Claude Opus
Code Review (100 calls)	$7.80	$8.50	GPT-5
Debugging (100 calls)	$8.90	$5.40	Claude Opus
Documentation (100 calls)	$9.20	$5.80	Claude Opus
Complex Reasoning (100 calls)	$9.00	$5.20	Claude Opus
Avg Latency	2.1s	3.4s	GPT-5
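
To check the table's arithmetic, the per-category figures can be run through a few lines of Python. The numbers are copied from the table above. The `pct_cheaper` helper and its sign convention (positive means Claude Opus was cheaper) are my own choices for illustration.

```python
# Per-category spend from the results table: (GPT-5, Claude Opus), USD per 100 calls.
COSTS = {
    "Code Generation": (12.40, 8.20),
    "Code Review": (7.80, 8.50),
    "Debugging": (8.90, 5.40),
    "Documentation": (9.20, 5.80),
    "Complex Reasoning": (9.00, 5.20),
}


def pct_cheaper(gpt5: float, opus: float) -> float:
    """Percent by which Claude Opus undercut GPT-5 (negative: GPT-5 was cheaper)."""
    return (gpt5 - opus) / gpt5 * 100


def totals(costs: dict) -> tuple:
    """Return (GPT-5 total, Claude Opus total) across all categories."""
    gpt5_total = sum(g for g, _ in costs.values())
    opus_total = sum(o for _, o in costs.values())
    return gpt5_total, opus_total


if __name__ == "__main__":
    for category, (gpt5, opus) in COSTS.items():
        print(f"{category:<18} {pct_cheaper(gpt5, opus):+6.1f}%")
    gpt5_total, opus_total = totals(COSTS)
    print(f"Totals: GPT-5 ${gpt5_total:.2f}, Claude Opus ${opus_total:.2f}")
```

Run as a script, this shows Claude Opus roughly 34%, 39%, 37%, and 42% cheaper on code generation, debugging, documentation, and complex reasoning, with GPT-5 ahead only on code review. The totals match the $47.30 and $33.10 in the table.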

What Surprised Me

I went into this expecting GPT-5 to be more expensive across the board. It wasn't that simple.

GPT-5 Wins on Short Outputs

For code review tasks, where the input is large but the output is relatively short, GPT-5 was actually cheaper: $7.80 vs $8.50 per 100 calls, about 8% less. Its input pricing is competitive, and when you're not generating thousands of output tokens, Claude Opus's cost advantage disappears.

Claude Opus Dominates Long Outputs

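Once outputs run past roughly 1,000 tokens, Claude Opus becomes significantly cheaper. Documentation and complex reasoning tasks, which require lengthy responses, came in 37% and 42% cheaper respectively. Debugging was also 39% cheaper despite averaging only about 600 output tokens, so output length isn't the whole story.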
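
The input-heavy vs output-heavy pattern follows from how per-token billing usually works: cost is input tokens times the input price plus output tokens times the output price. Here is a small sketch of that model with a break-even calculation. The two price points are made-up placeholders chosen to illustrate the shape of the trade-off, not the actual rates for either model or provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


# Placeholder prices for illustration only (NOT real GPT-5 or Claude Opus rates).
MODEL_A = Pricing(input_per_mtok=2.0, output_per_mtok=20.0)  # cheap input, pricey output
MODEL_B = Pricing(input_per_mtok=5.0, output_per_mtok=12.0)  # pricey input, cheap output


def call_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call under simple per-token billing."""
    return (input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok) / 1_000_000


def break_even_output_tokens(a: Pricing, b: Pricing, input_tokens: int) -> Optional[float]:
    """Output length at which a and b cost the same for a given input size.

    Returns None when no positive break-even exists (one model is always cheaper).
    """
    d_out = a.output_per_mtok - b.output_per_mtok
    if d_out == 0:
        return None
    tokens = input_tokens * (b.input_per_mtok - a.input_per_mtok) / d_out
    return tokens if tokens > 0 else None
```

With these placeholder prices and a 2,500-token input, the break-even sits at 937.5 output tokens: below that the cheap-input model wins, above it the cheap-output model does. Plugging in a provider's real rates would show where the crossover falls for a given workload.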

Latency Tells a Different Story

GPT-5 was consistently faster. Average response time was 2.1 seconds vs Claude Opus's 3.4 seconds. For interactive coding tools like Cursor or Claude Code, this matters. A 1.3-second difference per call adds up over a full day of coding.
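
To put "adds up" in numbers, here is a quick back-of-the-envelope helper. The latencies are the measured averages above. The figure of 200 calls per day is purely an assumption for illustration.

```python
def extra_wait_seconds(calls: int, fast_s: float = 2.1, slow_s: float = 3.4) -> float:
    """Total extra time spent waiting when using the slower model for `calls` requests."""
    return calls * (slow_s - fast_s)


if __name__ == "__main__":
    calls_per_day = 200  # assumed workload, not measured
    extra = extra_wait_seconds(calls_per_day)
    print(f"{extra:.0f} s extra per day ({extra / 60:.1f} min)")
```

At 200 calls a day the gap works out to about 260 seconds, a little over four minutes of extra waiting. That is small in total, but it arrives in 1.3-second pauses in the middle of interactive work.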

The Third-Party Provider Advantage

For context, here's what the same 500 calls would have cost through official APIs:

Provider	GPT-5 Cost	Claude Opus Cost	Total
Official APIs	$78.50	$55.00	$133.50
Third-Party Provider	$47.30	$33.10	$80.40
Savings	40%	40%	$53.10 saved

That's roughly 40% savings on both models, or about $0.05 per call averaged across the two. For a developer making thousands of API calls per month, this adds up to hundreds of dollars.
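
The savings figures can be reproduced, and projected to a monthly volume, with a short script. The dollar amounts come from the table above. The projection assumes your workload has the same mix of tasks as this benchmark, which may not hold.

```python
# Totals for 500 calls per model, taken from the comparison table (USD).
OFFICIAL = {"GPT-5": 78.50, "Claude Opus": 55.00}
THIRD_PARTY = {"GPT-5": 47.30, "Claude Opus": 33.10}
CALLS_PER_MODEL = 500


def savings_pct(official: float, discounted: float) -> float:
    """Percent saved relative to the official price."""
    return (official - discounted) / official * 100


def projected_monthly_savings(model: str, calls_per_month: int) -> float:
    """Scale the benchmark's per-call savings to a monthly call volume.

    Assumes the same task mix as the benchmark.
    """
    per_call = (OFFICIAL[model] - THIRD_PARTY[model]) / CALLS_PER_MODEL
    return per_call * calls_per_month


if __name__ == "__main__":
    for model in OFFICIAL:
        pct = savings_pct(OFFICIAL[model], THIRD_PARTY[model])
        monthly = projected_monthly_savings(model, 5000)
        print(f"{model}: {pct:.1f}% saved, ${monthly:.2f} per 5,000 calls")
```

At 5,000 calls a month, the projection comes to $312 for GPT-5 and $219 for Claude Opus, in line with the "hundreds of dollars" estimate.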

My Recommendation

After two weeks of testing, here's what I'd actually do:

For coding-heavy workflows: Use Claude Opus. It's cheaper and produces better code in my experience.
For quick code reviews: GPT-5 is slightly cheaper (about 8%) and faster.
For documentation: Claude Opus, no contest. It was 37% cheaper, and the output quality looked better to me, though I didn't score it formally.
For interactive tools (Cursor, Cline): Consider GPT-5 if latency matters more than cost.

What I'd Do Differently

If I were running this benchmark again, I'd add:

Quality scoring. Cost isn't everything — I should have rated output quality on a 1-5 scale.
More providers. I only tested one third-party provider. Different providers have different pricing structures.
Streaming vs non-streaming. Streaming responses might change the cost calculation for some providers.
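
For the quality-scoring idea, one simple way to fold ratings into the comparison is cost per quality point: average cost per call divided by the mean 1-5 score. This is a sketch of one possible metric, not something I measured. The scores in the example are invented placeholders.

```python
from statistics import mean
from typing import Sequence


def cost_per_quality_point(avg_cost_usd: float, scores: Sequence[int]) -> float:
    """Average cost per call divided by mean quality score (1-5). Lower is better."""
    if not scores:
        raise ValueError("need at least one score")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must be on a 1-5 scale")
    return avg_cost_usd / mean(scores)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; no real ratings were collected.
    example_scores = [4, 5, 3, 4]
    print(f"${cost_per_quality_point(0.08, example_scores):.4f} per quality point")
```

A metric like this would let a pricier model come out ahead when its outputs are rated consistently higher, which raw cost alone cannot show.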