AI Token Cost Calculator

How much will your next AI API call actually cost?

Enter your input and output token counts alongside your provider's per-million pricing to get the exact cost of any AI API call. Works with any model that charges separately for input and output tokens.

Updated July 2026



How It Works

The formula, explained simply

Think of an AI API call like a two-part taxi ride: you pay one rate to travel to the destination (your prompt going into the model) and a different, usually higher rate for the return trip (the model's generated response coming back). The meter runs on tokens for both legs, but the fares are different. That is the entire pricing model — two rates, two token counts, one bill.

The reason providers split the pricing is computational: reading and processing your input tokens can be handled in parallel across many hardware cores, while generating each output token has to happen sequentially — the model produces one token, uses it to decide the next, and so on. Sequential generation is more expensive to run, so it costs more to buy. When you see a model listed as $5 in / $15 out, that ratio is not arbitrary — it roughly reflects the processing cost asymmetry.

Because the two costs are completely independent, the bill for the same number of total tokens varies dramatically depending on the split. A 2,000-token exchange that is mostly prompt with a short answer costs far less than the same 2,000 tokens split evenly, because more of them are charged at the cheaper input rate. Controlling how much the model is allowed to say is one of the most direct levers you have on cost at scale.
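To make the effect of the split concrete, here is a minimal sketch that prices the same 2,000 tokens two ways. The $5 / $15 per-million rates and the 1,800 / 200 split are assumptions chosen for illustration, not figures from any particular provider.

```python
INPUT_PRICE = 5.00    # assumed dollars per million input tokens
OUTPUT_PRICE = 15.00  # assumed dollars per million output tokens

def cost(input_tokens, output_tokens):
    """Dollar cost of one request at the assumed rates above."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

prompt_heavy = cost(1_800, 200)   # mostly prompt, short answer
even_split = cost(1_000, 1_000)   # same 2,000 tokens, split evenly

print(f"prompt-heavy: ${prompt_heavy:.4f}")
print(f"even split:   ${even_split:.4f}")
```

Under these assumed rates the even split costs noticeably more, even though the total token count is identical.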

When To Use This

Right tool, right situation

Use this calculator any time you are deciding whether an AI API integration is financially viable, choosing between two models for the same task, or building a cost model for a product that makes API calls on behalf of users. It gives you an exact number from known inputs, so it is most useful when you already have real token counts from test runs rather than guesses.

It is also the right tool when you receive an unexpected API bill and need to reconstruct where the cost came from. Plug in the token counts from your usage dashboard and the pricing from your provider's page, and you can verify or dispute the charge without relying on the provider's own cost breakdown.

This calculator is not the right tool when your provider applies tiered volume discounts, context caching credits, or free-tier allowances that change your effective rate mid-month. In those cases, the formula still tells you the undiscounted cost — treat that as your ceiling and negotiate from there. It also does not account for latency, rate limits, or the cost of retries, which matter operationally but do not appear in a per-token pricing model.

Common Mistakes

Why results sometimes look wrong

Mistake 1: Entering per-thousand pricing as if it were per-million. Some older documentation or third-party summaries quote rates per 1,000 tokens instead of per million. If you enter a per-thousand rate into this calculator unconverted, your result will be 1,000 times too low, because the calculator divides your token count by 1,000,000 rather than 1,000. Multiply a per-thousand rate by 1,000 before entering it. Always confirm the unit on the pricing page: look for the word million in the column header, not thousand. When in doubt, check that your result looks plausible: at rates of a few dollars per million, a single short API call should cost fractions of a cent, not a few millionths of a dollar.
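The unit mix-up is easy to check numerically. The sketch below assumes hypothetical rates quoted per 1,000 tokens ($0.005 in, $0.015 out) and compares the result with and without converting them to per-million rates. The function names are illustrative.

```python
def per_thousand_to_per_million(price_per_thousand):
    """Convert a rate quoted per 1,000 tokens to a rate per 1,000,000 tokens."""
    return price_per_thousand * 1_000

def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request when both prices are in dollars per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical rates as quoted per 1,000 tokens.
quoted_in, quoted_out = 0.005, 0.015

unconverted = request_cost(1_000, 500, quoted_in, quoted_out)
converted = request_cost(
    1_000, 500,
    per_thousand_to_per_million(quoted_in),
    per_thousand_to_per_million(quoted_out),
)

# The unconverted figure is smaller than the converted one by a factor of 1,000.
print(f"unconverted: ${unconverted:.7f}")
print(f"converted:   ${converted:.4f}")
```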

Mistake 2: Ignoring output token volume when comparing models. A model priced at a lower headline rate looks cheaper until you account for how many output tokens it takes to complete your task. If a cheaper model writes verbose answers and a more expensive model writes tight, precise ones, the cheaper model's lower per-token rate may not offset the higher token count. The right comparison is total cost per task, not cost per token in isolation.
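A per-task comparison might look like the sketch below. Both models, their rates, and their output lengths are hypothetical; in practice the token counts would come from running the same task through each model.

```python
def task_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of completing one task, with prices in dollars per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical measurements for the same 1,000-token prompt.
cheap_verbose = task_cost(1_000, 1_200, 3.00, 9.00)    # lower rate, long answer
pricey_concise = task_cost(1_000, 400, 5.00, 15.00)    # higher rate, short answer

print(f"cheap but verbose:  ${cheap_verbose:.4f} per task")
print(f"pricey but concise: ${pricey_concise:.4f} per task")
```

With these assumed numbers the model with the higher headline rate is the cheaper one per task, which is the situation the paragraph above warns about.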

Mistake 3: Forgetting that system prompts are charged on every request. A 500-token system prompt that sets the model's behavior sounds small. Across 100,000 requests, that is 50 million additional input tokens charged at your full input rate. System prompt length is a fixed cost that compounds with volume in a way that one-off testing never reveals. Audit your recurring prompt components the same way you would audit a fixed monthly subscription.
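The recurring cost of a system prompt can be sketched in a few lines. The $5.00 per-million input rate is an assumption for illustration; the token count and request volume are the ones used above.

```python
def recurring_prompt_cost(prompt_tokens, requests, input_price_per_m):
    """Total tokens and dollars spent resending a fixed prompt on every request."""
    total_tokens = prompt_tokens * requests
    return total_tokens, total_tokens / 1_000_000 * input_price_per_m

# 500-token system prompt, 100,000 requests, assumed $5.00 per million input tokens.
extra_tokens, extra_cost = recurring_prompt_cost(500, 100_000, 5.00)

print(f"{extra_tokens:,} extra input tokens, ${extra_cost:,.2f}")
```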

The Math

Worked examples and deeper derivation

The formula has two terms, one for each token type. For input tokens: take your input token count, divide by 1,000,000, then multiply by the input price per million. For the example inputs (1,000 input tokens at $5.00 per million), that gives $0.005. For output tokens the structure is the same: divide by 1,000,000 and multiply by the output price per million, giving $0.0075 for 500 output tokens at $15.00 per million. Add both terms together to get the total request cost of $0.0125.

Written compactly: Total Cost = (Input Tokens ÷ 1,000,000) × Input Price + (Output Tokens ÷ 1,000,000) × Output Price. The division by 1,000,000 converts the raw token count into the same unit the provider quotes: millions of tokens. Everything else is multiplication and addition.
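The same formula as a small function, shown as a sketch rather than the calculator's actual implementation. The function and parameter names are illustrative.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the dollar cost of one request under per-million pricing."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# 1,000 input tokens and 500 output tokens at $5.00 / $15.00 per million.
total = request_cost(1_000, 500, 5.00, 15.00)
print(f"${total:.4f}")
```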

For bulk projections, multiply the per-request cost by your expected request volume. Because each request is independent, the costs scale linearly — double the requests, double the bill. There are no compounding effects, no diminishing returns from the formula's perspective. Volume discounts, if your provider offers them, operate outside this formula and would lower your effective per-million rate below what you enter here.

Typical chatbot API call with GPT-style pricing

1,000 input tokens, 500 output tokens, $5.00 per million input, $15.00 per million output

The input cost is $0.005 and the output cost is $0.0075, giving a total of $0.0125 per request. Running 1,000 such requests costs $12.50. Notice that output tokens account for 60% of the bill despite being only a third of the tokens: the output rate is three times the input rate in this scenario, so longer responses drive costs up fast.

Document-processing job with long prompts and moderately long answers

50,000 input tokens, 25,000 output tokens, $10.00 per million input, $30.00 per million output

Input costs come to $0.50 and output costs to $0.75, for a total of $1.25 per request. Scaling to 1,000 identical requests lands at $1,250.00. This kind of document-processing workload, with long prompts and moderately long answers, is where output pricing dominates: 60% of the cost comes from the generated text even though it is only a third of the tokens.

Developer stress-testing a cheap model at volume

2,000 input tokens, 0 output tokens, $3.00 per million input, $9.00 per million output

With zero output tokens the output cost is $0, so the entire bill is the input cost of $0.006 per request. At 1,000 requests that is $6.00. This is a useful lower bound when budgeting embedding or classification jobs, where the model returns at most a short label or score rather than a full text response.
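The three worked examples can be reproduced and scaled to 1,000 requests with a short script. This sketch uses Python's decimal module so the dollar amounts are exact; passing prices as strings is a design choice to avoid binary floating-point rounding, and the function name is illustrative.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Exact per-request cost; prices are dollars per million tokens, given as strings."""
    return (Decimal(input_tokens) / MILLION * Decimal(input_price)
            + Decimal(output_tokens) / MILLION * Decimal(output_price))

# (input tokens, output tokens, input price, output price) for each example above.
scenarios = [
    (1_000, 500, "5.00", "15.00"),
    (50_000, 25_000, "10.00", "30.00"),
    (2_000, 0, "3.00", "9.00"),
]

for inp, out, p_in, p_out in scenarios:
    per_request = request_cost(inp, out, p_in, p_out)
    # Costs scale linearly, so 1,000 requests is a plain multiplication.
    print(f"per request ${per_request:.4f}, per 1,000 requests ${per_request * 1000:,.2f}")
```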

Expert Unlock

The thing most explanations skip

The per-million pricing model assumes a flat rate, but many production workloads have heavy-tailed token distributions: a small percentage of requests generate very long outputs that dwarf the typical one. If you budget using your median token count, your actual bill will exceed the estimate because the mean is pulled up by outliers. Total spend tracks the mean, so use the mean for your expected bill and the 95th-percentile request size for a conservative ceiling, and set a max-token cap in your API call to bound your worst-case per-request cost. The formula is exact for what you enter; the uncertainty lives entirely in the token count inputs.
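A small sketch of how the choice of statistic changes a budget. The 20 output-token counts, the $15.00 output rate, the 100,000-request volume, and the 1,000-token cap are all hypothetical; real figures would come from your own usage logs and provider settings.

```python
import statistics

# Hypothetical output-token counts from 20 logged requests (two long outliers).
output_tokens = [150, 180, 200, 210, 220, 230, 240, 250, 260, 270,
                 280, 290, 300, 320, 340, 360, 400, 500, 3000, 6000]

OUTPUT_PRICE = 15.00  # assumed dollars per million output tokens
REQUESTS = 100_000    # assumed request volume

def output_budget(tokens_per_request):
    """Projected output spend if every request used this many output tokens."""
    return tokens_per_request * REQUESTS / 1_000_000 * OUTPUT_PRICE

median = statistics.median(output_tokens)
mean = statistics.mean(output_tokens)
# n=20 yields 19 cut points; the last one is the 95th percentile.
p95 = statistics.quantiles(output_tokens, n=20)[-1]

for label, value in [("median", median), ("mean", mean), ("p95", p95)]:
    print(f"{label:>6}: {value:>6.0f} tokens -> ${output_budget(value):,.2f}")

# A cap on output length bounds the output cost of any single request.
MAX_OUTPUT_TOKENS = 1_000  # hypothetical cap
worst_case_output_cost = MAX_OUTPUT_TOKENS / 1_000_000 * OUTPUT_PRICE
print(f"worst-case output cost per request: ${worst_case_output_cost:.4f}")
```

In this sample the median sits well below the mean, so a median-based budget would undershoot the bill, while the 95th percentile gives a deliberately pessimistic figure.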

Why does changing output tokens change my bill so much more than input tokens?

How do I find the input and output price per million tokens for my AI model?

Every major provider publishes a pricing page listing rates per million tokens for each model — search for your provider name plus API pricing and look for a table with input and output columns. Rates are quoted per million tokens as a convention because per-token prices would be too small to read at a glance. If you see per-thousand pricing instead, multiply that figure by 1,000 to get the per-million number this calculator uses.

Prices change frequently as models update and competition increases, so check the live pricing page rather than relying on a cached search result. Some providers also offer batch or cached-input discounts that lower the effective rate below the headline figure — this calculator uses the standard on-demand rate you enter, with no discounts applied.

Why are output tokens priced higher than input tokens?

Generating each output token requires a full forward pass through the model, while processing input tokens can be parallelized more efficiently — the computational cost per token is genuinely higher for generation than for ingestion. Providers pass that asymmetry directly into their pricing tiers, which is why output rates are often two to four times the input rate for the same model.

This pricing structure has a practical implication: if you can shorten your prompts to get the same quality response, you save a small amount on input costs, but if you can instruct the model to be more concise in its reply, you save proportionally more. Long system prompts that repeat on every request add up quickly at scale precisely because they are charged as input on every single call.

How do I estimate how many tokens my text will use before sending it?

A practical rule of thumb is that one token corresponds to roughly three to four characters of English text, or about 0.75 words per token, so 1,000 tokens is approximately 750 words. Many providers also offer tokenizer libraries or token-counting tools (tiktoken for OpenAI models, for example) that count tokens exactly using the same rules the model applies, which is more reliable than the word-count estimate for non-English text or code.

For budgeting purposes, start with the rough estimate to get your ballpark, then run the exact tokenizer on a representative sample of your actual prompts. Output token counts are harder to predict in advance because they depend on the model's response length — for consistent workloads, measure a set of real responses and average the output token count to build your cost model.
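For the ballpark step, the rule of thumb can be turned into a rough estimator. This sketch uses only the heuristics above (about four characters or 0.75 words per token) and takes the larger of the two; it is not a tokenizer, and the function name is illustrative. An exact count still requires your provider's own tokenizer.

```python
import math

CHARS_PER_TOKEN = 4      # upper end of the three-to-four character rule of thumb
WORDS_PER_TOKEN = 0.75   # about 750 words per 1,000 tokens

def estimate_tokens(text):
    """Rough token estimate for English text: the larger of two heuristics."""
    by_chars = math.ceil(len(text) / CHARS_PER_TOKEN)
    by_words = math.ceil(len(text.split()) / WORDS_PER_TOKEN)
    return max(by_chars, by_words)

sample = "Summarize the following support ticket in two sentences."
print(estimate_tokens(sample))
```

Taking the larger estimate errs on the side of over-budgeting, which is usually the safer direction for a cost model.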
