If you’re building with the Claude API, pricing will be one of your first questions. The good news: it’s entirely token-based, predictable, and — with the right techniques — very affordable even at scale. This guide breaks down exactly what you’ll pay and how to minimize it.
How Claude API Pricing Works
Claude charges per token — roughly 4 characters of English text. Every API call has two costs:
- Input tokens — your prompt, system message, and conversation history
- Output tokens — the model’s response (priced at 5× the input rate on every Claude tier)
Prices are listed per 1 million tokens (MTok).
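For quick back-of-envelope math, the 4-characters-per-token heuristic above can be turned into a tiny helper. This is only a rough approximation, not a real tokenizer:

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters of English per token."""
    return max(1, len(text) // 4)

prompt = "Explain the difference between a list and a tuple in Python."
print(approx_tokens(prompt))
# → 15
```

For billing-accurate counts, read `usage.input_tokens` and `usage.output_tokens` from actual API responses instead.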
Claude Model Pricing (May 2025)
| Model | Input (per MTok) | Output (per MTok) | Best for |
|---|---|---|---|
| Claude Haiku 4.5 | $0.80 | $4.00 | High-volume, simple tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Balanced — most projects |
| Claude Opus 4.7 | $15.00 | $75.00 | Complex reasoning, low volume |
Rule of thumb: Start with Sonnet. Switch to Haiku for repetitive tasks (classification, extraction). Use Opus only when you need the extra reasoning power.
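That rule of thumb can be encoded as a simple lookup. The model ID strings here are illustrative assumptions; substitute whatever IDs your account actually exposes:

```python
# Hypothetical task-to-model routing based on the rule of thumb above
MODEL_FOR_TASK = {
    "classification": "claude-haiku-4-5",
    "extraction": "claude-haiku-4-5",
    "chat": "claude-sonnet-4-6",
    "summarization": "claude-sonnet-4-6",
    "complex_reasoning": "claude-opus-4-7",
}

def pick_model(task: str) -> str:
    # Default to Sonnet, per the "start with Sonnet" advice
    return MODEL_FOR_TASK.get(task, "claude-sonnet-4-6")

print(pick_model("classification"))
# → claude-haiku-4-5
```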
Real-World Cost Estimates
Let’s make this concrete. Here’s a Python helper to estimate monthly costs:
```python
def estimate_cost(input_tokens: int, output_tokens: int, model: str = "sonnet") -> float:
    prices = {  # per 1M tokens: (input, output)
        "haiku": (0.80, 4.00),
        "sonnet": (3.00, 15.00),
        "opus": (15.00, 75.00),
    }
    inp, out = prices[model]
    return (input_tokens / 1_000_000 * inp) + (output_tokens / 1_000_000 * out)

# Example: 1,000 API calls, avg 500 input + 300 output tokens each
calls = 1000
cost = estimate_cost(calls * 500, calls * 300, model="sonnet")
print(f"Monthly estimate: ${cost:.2f}")
# → Monthly estimate: $6.00
```

Some typical scenarios on Sonnet:
| Use Case | Calls/month | Tokens/call | Monthly cost |
|---|---|---|---|
| Simple chatbot | 1,000 | 500 in / 300 out | ~$6 |
| Document summarizer | 200 | 3,000 in / 500 out | ~$3 |
| Code review tool | 100 | 2,000 in / 1,000 out | ~$2 |
| Classification pipeline | 10,000 | 200 in / 50 out | ~$14 |
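Reading the call counts as monthly totals, each row follows directly from the per-token prices, as this quick check shows:

```python
PRICE_IN, PRICE_OUT = 3.00, 15.00  # Sonnet, $ per MTok

def monthly_cost(calls: int, tok_in: int, tok_out: int) -> float:
    """Total monthly cost for `calls` requests at the given tokens per call."""
    return calls * (tok_in * PRICE_IN + tok_out * PRICE_OUT) / 1_000_000

scenarios = {
    "Simple chatbot": (1_000, 500, 300),
    "Document summarizer": (200, 3_000, 500),
    "Code review tool": (100, 2_000, 1_000),
    "Classification pipeline": (10_000, 200, 50),
}

for name, args in scenarios.items():
    print(f"{name}: ~${monthly_cost(*args):.2f}/month")
```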
Cut Costs with Prompt Caching (Up to 90% Off)
Prompt caching is the single biggest lever for reducing API costs. When you mark a portion of your prompt with cache_control, Claude stores it server-side for 5 minutes. Subsequent requests that reuse that prefix pay only 10% of normal input cost for the cached portion.
Cache writes cost 25% more than normal input, but you break even on the very first cache hit (1.25× + 0.10× = 1.35× versus 2× for two uncached sends), and savings grow with every hit after that. Perfect for:
- Long system prompts repeated across many calls
- Large document contexts (PDFs, codebases) queried multiple times
- Few-shot example sets used in a session
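The break-even arithmetic is easy to check. A minimal sketch, assuming the 1.25× write and 0.10× read multipliers apply to the cached prefix only:

```python
WRITE_MULT = 1.25  # cache writes cost 25% more than normal input
READ_MULT = 0.10   # cache hits pay 10% of normal input

def input_cost(n_requests: int, prefix_tokens: int, price_per_mtok: float,
               cached: bool) -> float:
    """Input-side cost of sending the same prompt prefix n_requests times."""
    per_send = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return n_requests * per_send
    # First request writes the cache; the rest read it
    return per_send * (WRITE_MULT + (n_requests - 1) * READ_MULT)

# 10,000-token system prompt on Sonnet ($3.00/MTok input)
for n in (1, 2, 5, 100):
    plain = input_cost(n, 10_000, 3.00, cached=False)
    hot = input_cost(n, 10_000, 3.00, cached=True)
    print(f"{n:>3} requests: ${plain:.4f} uncached vs ${hot:.4f} cached")
```

By the second request the cached path is already cheaper, and at 100 requests it costs roughly a tenth as much.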
```python
import anthropic

client = anthropic.Anthropic()

# System prompt cached after the first request (5-minute TTL)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful coding assistant with deep Python expertise.",
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Explain list comprehensions."}],
)

# Check cache performance via the usage object on the response
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
```

Batch API: 50% Discount for Async Workloads
The Batch API processes requests asynchronously (within 24 hours) at 50% off standard pricing. Ideal for:
- Overnight data processing pipelines
- Bulk content generation or classification
- Offline analysis that doesn’t need real-time responses
For Sonnet, that works out to $1.50/MTok input and $7.50/MTok output, half the real-time rate for work that can wait.
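A batch submission might be sketched like this. The request shape follows the Anthropic Python SDK's batch interface, but treat the exact field names as assumptions to verify against the current SDK:

```python
def build_requests(docs: list[str]) -> list[dict]:
    """One summarization request per document, each tagged with a custom_id
    so its result can be matched up when the batch completes."""
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": f"Summarize:\n\n{doc}"}],
            },
        }
        for i, doc in enumerate(docs)
    ]

def submit_batch(docs: list[str]) -> str:
    """Submit the batch; results arrive asynchronously within 24 hours."""
    import anthropic  # requires an API key in the environment
    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=build_requests(docs))
    return batch.id

reqs = build_requests(["First report text...", "Second report text..."])
print(len(reqs), reqs[0]["custom_id"])
# → 2 doc-0
```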
Practical Tips to Keep Costs Low
- Log token usage — every response includes `usage.input_tokens` and `usage.output_tokens`. Track them from day one.
- Set `max_tokens` — always set a reasonable limit to prevent runaway outputs.
- Cache aggressively — anything over 1,024 tokens that repeats is a caching candidate.
- Right-size your model — Haiku at $0.80/MTok handles most classification, extraction, and short Q&A tasks perfectly well.
- Use Batch API for anything that doesn’t need real-time responses.
- Compress history in long conversations — summarize older turns instead of passing them verbatim.
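The history-compression tip can be sketched as a small helper. The summary stub below is a hypothetical placeholder; in practice you would generate it with a cheap model such as Haiku:

```python
def compact_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the most recent turns verbatim; stand in a summary for the rest."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    stub = {
        "role": "user",
        "content": f"[Summary of the first {len(older)} messages of this conversation]",
    }
    return [stub] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
print(len(compact_history(history)))  # 6 recent turns + 1 summary stub
# → 7
```

Every turn you drop from the prompt is input tokens you stop paying for on all subsequent calls.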
Bottom Line
Claude API pricing is straightforward and highly controllable. For most developer projects, you’re looking at $5–30/month at moderate scale. With prompt caching and the right model tier, costs stay low even as usage grows. The key is to measure from the start — token counts are always in the response, so there’s no excuse not to track them.