AI Model Cost Comparator
Compare API costs across Claude, GPT-4o, Gemini, Llama, and Mistral. Enter your usage pattern and see which model gives you the best value for your specific workload.
—
Send feedback
💡 Share your idea or report a problem
✓ Thanks! We'll take a look.
Learn more
How It Works
The formula, explained simply
This tool calculates the monthly API cost of a given usage pattern across the major LLM providers. It uses the official published per-token pricing for each model and converts your word-count estimates to tokens using the standard approximation of 0.75 words per token.
The key variables are: call volume (how many API requests), input length (your prompt + context + system instructions), and output length (the model's response). All three multiply together — doubling any one doubles your bill.
When To Use This
Right tool, right situation
Use this comparator when deciding which LLM to use for a new project, evaluating whether to switch providers as your volume grows, or understanding the cost delta between tiers before committing to a model in production.
Do not use this as your only decision criterion. Latency, rate limits, quality on your specific task, and data residency requirements all matter. Always run evals on your actual task before optimising purely for cost.
Common Mistakes
Why results sometimes look wrong
Forgetting the system prompt. A 500-word system prompt is 667 input tokens on every single call. At 10,000 calls/day, that is 6.7M extra input tokens daily.
Assuming cheaper = worse. Gemini 1.5 Flash, GPT-4o Mini, and Claude Haiku outperform GPT-4 (2023) on many standard benchmarks. The model tier that was "frontier" two years ago is now in the "budget" bracket.
Not accounting for retries and failures. Production systems retry failed calls. A 5% retry rate adds 5% to your token bill. Cache successful responses where possible.
The Math
Worked examples and deeper derivation
Monthly cost = (calls/day × 30) × [(input_tokens / 1M × input_rate) + (output_tokens / 1M × output_rate)]
Token conversion: words × (1 / 0.75) = tokens. So 300 words ≈ 400 tokens.
Output tokens cost more than input because they require the model to do more compute — each output token is generated auto-regressively, attending to all prior context on every step. Input tokens are processed in a single forward pass through the network.
Common questions
Need something this doesn't cover?
Suggest a tool — we'll build it →