AI API Budget Calculator

Set a monthly API budget and see how many calls you can make. Or enter your usage and see the bill. Works for GPT-4o, Claude, Gemini, and any LLM with per-token pricing.

Updated June 2026

Calculate

Model

Monthly budget ($)

Avg. prompt length (words)

Avg. output length (words)

How It Works

The formula, explained simply

This calculator works in two modes. Budget mode starts with your spending allowance and tells you how many API calls that budget permits given your typical prompt and output length. Usage mode starts with your call volume and shows you the projected monthly bill.

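Both modes use the same core formula: cost per call = (input tokens / 1M × input rate) + (output tokens / 1M × output rate), where each rate is the price per million tokens. Token counts are estimated from word counts at 0.75 words per token, so tokens ≈ words / 0.75. A 300-word prompt, for example, is about 400 tokens.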
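As a minimal sketch, the core formula might look like this in Python. The function names and the two rates are illustrative assumptions, not the calculator's actual code or any provider's real pricing.

```python
WORDS_PER_TOKEN = 0.75  # the estimate stated above


def words_to_tokens(words: float) -> float:
    """Estimate a token count from a word count."""
    return words / WORDS_PER_TOKEN


def cost_per_call(input_tokens: float, output_tokens: float,
                  input_rate: float, output_rate: float) -> float:
    """Dollar cost of one call; rates are dollars per 1M tokens."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)


# Assumed, illustrative rates (not real pricing for any model).
INPUT_RATE = 1.00   # $ per 1M input tokens
OUTPUT_RATE = 4.00  # $ per 1M output tokens

tokens_in = words_to_tokens(300)   # 300-word prompt
tokens_out = words_to_tokens(150)  # 150-word output
call_cost = cost_per_call(tokens_in, tokens_out, INPUT_RATE, OUTPUT_RATE)
print(f"{tokens_in:.0f} in, {tokens_out:.0f} out, ${call_cost:.6f} per call")
```

Swap in the published per-million-token rates for your model to get a real figure.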

When To Use This

Right tool, right situation

Use this before starting a new AI feature to establish your budget envelope. Use it when your bill unexpectedly spikes to diagnose which variable changed. Use it when evaluating whether to migrate to a cheaper model as volume scales.

Set real spending limits in your provider dashboard in addition to using this calculator. Without a hard cap, a runaway loop or a prompt injection can keep generating billable tokens with no upper bound.

Common Mistakes

Why results sometimes look wrong

Not budgeting for system prompts separately. The system prompt is resent on every call, so its full length adds to the input of every single call. Include it in your average prompt length.

Using "average" prompt length without variance. If some calls have 100-word prompts and others have 5,000-word RAG contexts, your average is misleading. Track p50 and p95 separately.
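To see how far an average can drift from typical traffic, here is a small sketch using Python's standard library. The sample data is hypothetical: 90 short prompts and 10 long RAG contexts.

```python
import statistics

# Hypothetical prompt lengths in words (assumed data, for illustration only).
prompt_words = [100] * 90 + [5000] * 10

mean_words = statistics.mean(prompt_words)

# quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
cuts = statistics.quantiles(prompt_words, n=100)
p50, p95 = cuts[49], cuts[94]

print(f"mean={mean_words}, p50={p50}, p95={p95}")
```

Here the mean sits far above the typical prompt and far below the expensive tail, so budgeting on the mean alone describes neither group.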

Forgetting that the context window fills up in multi-turn chats. Each turn resends the prior conversation, so in a 20-turn conversation with 100 words per turn, turn 20 alone sends 1,900 words of history as context.
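The multi-turn growth can be sketched as follows. This assumes every turn resends the full prior history with no truncation or caching, and that every turn is the same length; the function names are illustrative.

```python
def history_words(turn: int, words_per_turn: int) -> int:
    """Words of prior conversation resent as context on a given turn (1-indexed)."""
    return (turn - 1) * words_per_turn


def total_input_words(turns: int, words_per_turn: int) -> int:
    """Total input words across the conversation: history plus the new turn, every turn."""
    return sum(history_words(t, words_per_turn) + words_per_turn
               for t in range(1, turns + 1))


on_turn_20 = history_words(20, 100)       # history resent on turn 20
whole_chat = total_input_words(20, 100)   # input words billed over all 20 turns
naive = 20 * 100                          # what "100 words per call" would suggest

print(on_turn_20, whole_chat, naive)
```

Under these assumptions the conversation bills 21,000 input words in total, against the 2,000 a flat per-call average would suggest.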

The Math

Worked examples and deeper derivation

Cost per call = [(input_words / 0.75) / 1,000,000 × input_price_per_MTok] + [(output_words / 0.75) / 1,000,000 × output_price_per_MTok]

Calls affordable = monthly_budget / cost_per_call

Monthly bill = cost_per_call × calls_per_day × 30
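A minimal sketch of both modes built on these three formulas. The function names and the example prices are assumptions chosen for illustration, not the calculator's own code or any model's real pricing.

```python
WORDS_PER_TOKEN = 0.75


def cost_per_call(input_words: float, output_words: float,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Dollar cost of one call, estimated from word counts."""
    input_tokens = input_words / WORDS_PER_TOKEN
    output_tokens = output_words / WORDS_PER_TOKEN
    return (input_tokens / 1_000_000 * input_price_per_mtok
            + output_tokens / 1_000_000 * output_price_per_mtok)


def calls_affordable(monthly_budget: float, per_call: float) -> int:
    """Budget mode: whole calls that fit in the monthly budget."""
    return int(monthly_budget // per_call)


def monthly_bill(per_call: float, calls_per_day: float, days: int = 30) -> float:
    """Usage mode: projected bill for a 30-day month."""
    return per_call * calls_per_day * days


# Assumed prices: $1.00 input and $4.00 output per 1M tokens (illustrative only).
per_call = cost_per_call(300, 150, 1.00, 4.00)
budget_calls = calls_affordable(50, per_call)
bill = monthly_bill(per_call, 1000)

print(f"{budget_calls} calls on $50; ${bill:.2f}/month at 1,000 calls/day")
```

Because both modes share cost_per_call, a change in prompt length or price shifts the affordable call count and the projected bill consistently.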

Output tokens are typically priced at 4 to 5 times the rate of input tokens (an input-to-output price ratio of 1:4 to 1:5). If your use case generates long outputs (essays, code), output cost dominates. If outputs are short (classifications, yes/no), input cost dominates.
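To check which side dominates for a given workload, a rough sketch can compute the share of each call's cost that comes from output. The 1:4 price ratio and both workloads below are assumed for illustration.

```python
def output_cost_share(input_words: float, output_words: float,
                      input_rate: float, output_rate: float) -> float:
    """Fraction of per-call cost due to output.

    The words-to-tokens factor and the per-million scaling cancel out,
    so word counts and rates can be used directly.
    """
    input_cost = input_words * input_rate
    output_cost = output_words * output_rate
    return output_cost / (input_cost + output_cost)


# Assumed 1:4 input-to-output price ratio.
essay_share = output_cost_share(200, 800, 1.0, 4.0)      # short prompt, long output
classifier_share = output_cost_share(400, 5, 1.0, 4.0)   # long prompt, tiny output

print(f"essay: {essay_share:.0%} output, classifier: {classifier_share:.0%} output")
```

At a 1:4 ratio, output starts to dominate once outputs are longer than a quarter of the input length.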

MVP on $50/month (Claude Haiku)

Mode: Budget → Calls · Model: Claude Haiku 3.5 · Budget: $50 · Prompt: 300 words · Output: 150 words

≈71,000 calls/month (2,380/day)

Production app (GPT-4o, 5k calls/day)

Mode: Usage → Cost · Model: GPT-4o · Daily calls: 5,000 · Prompt: 400 words · Output: 200 words

≈$477/month

Enterprise on Claude Opus

Mode: Budget → Calls · Model: Claude 3 Opus · Budget: $500 · Prompt: 1,000 words · Output: 500 words

≈4,200 calls/month (140/day)

Common questions

How do I set an API spending limit?

OpenAI, Anthropic, and Google all allow you to set monthly hard limits and email alerts in your account billing settings. Set a hard limit at your maximum budget and a soft limit (alert) at 80% of that. This calculator helps you figure out what those limits should be given your usage pattern.

What is a realistic AI API budget for a startup MVP?

For a lightweight MVP with a few hundred daily users and short prompts, $10–50/month is typical on mid-tier models. If you are doing document processing, RAG, or long conversations, budget $100–500/month for an early-stage product. Most founders are surprised by how cheap LLM APIs are at pre-scale volumes.

How can I reduce my LLM API bill?

The four main levers are: (1) cache responses for repeated prompts, so identical inputs are not paid for twice; (2) trim your system prompt, since every token in it is billed on every call; (3) use a smaller model for simpler subtasks; (4) limit output length with max_tokens to prevent runaway responses.
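Levers (1) and (4) can be sketched together as a small wrapper. Here call_model is a hypothetical stand-in for whatever client function you use, assumed to take a prompt and a max_tokens limit; it is not a real library API.

```python
import hashlib
from typing import Callable, Dict, List


def make_cached_caller(call_model: Callable[[str, int], str],
                       max_tokens: int = 256) -> Callable[[str], str]:
    """Wrap a model call with an in-memory cache and a fixed output cap."""
    cache: Dict[str, str] = {}

    def cached_call(prompt: str) -> str:
        # Identical prompts hash to the same key, so they are only paid for once.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = call_model(prompt, max_tokens)
        return cache[key]

    return cached_call


# Fake model used only to count how many billable calls would be made.
calls_made: List[str] = []


def fake_model(prompt: str, max_tokens: int) -> str:
    calls_made.append(prompt)
    return f"response to: {prompt}"


ask = make_cached_caller(fake_model, max_tokens=100)
ask("Classify: great product")
ask("Classify: great product")   # served from cache, no second call
ask("Classify: slow shipping")
print(len(calls_made))
```

An in-memory dict only helps within one process and only for exact matches; a shared store would be needed across servers, and cached answers may need an expiry if the underlying data changes.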