AI Cost Calculator

What does each AI API call actually cost you?

Enter your token counts and the model's pricing to see exactly what each AI API call costs. Works with any provider that charges per token — OpenAI, Anthropic, Google, and others.

Updated July 2026

Input tokens

Output tokens

Input price per 1K tokens (USD)

Output price per 1K tokens (USD)


How It Works

The formula, explained simply

Think of token pricing like a photocopier that charges differently for the pages you feed in versus the copies it prints out. Every AI provider that sells API access separates the cost of reading your prompt from the cost of writing the response. Those two numbers — input price and output price — multiplied by how many tokens you use, determine the exact bill for every call.

A token is not a word and not a character. It is the smallest chunk the model processes: roughly four characters of English text, which works out to about 750 words per 1,000 tokens. Code, JSON, and non-Latin scripts tokenize differently, often producing more tokens per character. The token count you see in an API response is the authoritative figure; estimates derived from word counts are useful for budgeting but should not substitute for real usage data in production monitoring.
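A minimal sketch of the four-characters-per-token rule of thumb, for budgeting only. The function name estimate_tokens and the default ratio are illustrative assumptions; a real tokenizer or the usage figures in an API response will give different numbers, especially for code and non-Latin text.

```python
import math

# Rough heuristic for English prose (assumption taken from the rule of thumb above).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate; not a substitute for measured usage."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))
```

Rounding up keeps the estimate slightly conservative, which is usually the safer direction for a budget.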

The separation of input and output pricing matters because the model treats them asymmetrically. Input tokens are processed in one parallelized pass; output tokens are generated one at a time in sequence. Providers price them separately to reflect that operational reality. When you are evaluating models or designing workflows, the ratio of input tokens to output tokens in your typical request is therefore a first-class variable — not just the per-token price.

When To Use This

Right tool, right situation

Use this calculator any time you need to predict, verify, or compare the cost of an AI API call. It is directly useful when evaluating whether a new workflow is financially viable, comparing two models with different pricing tiers, setting per-request cost budgets for a production application, or auditing a bill that came in higher than expected.

It is also well suited for back-of-envelope projections. If you know your average token counts from a few test requests, multiply the result by expected daily or monthly call volume to get a cost estimate you can put in front of a finance team or include in a project proposal.
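As a sketch of that back-of-envelope projection: the sample token counts, the prices, and the call volume below are all hypothetical placeholders, and the helper names are invented for illustration. Substitute measurements from your own test requests and your provider's published rates.

```python
from statistics import mean

# Hypothetical measurements from a few test requests: (input_tokens, output_tokens)
samples = [(1400, 750), (1600, 820), (1500, 830)]

INPUT_PRICE_PER_1K = 0.003   # assumed list price, USD
OUTPUT_PRICE_PER_1K = 0.006  # assumed list price, USD


def per_call_cost(input_tokens: float, output_tokens: float) -> float:
    """Cost of one call at the assumed per-1K prices."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (
        output_tokens / 1000
    ) * OUTPUT_PRICE_PER_1K


def project_spend(samples, calls_per_day: int, days: int = 30) -> float:
    """Average the sampled token counts, then scale by call volume."""
    avg_in = mean(s[0] for s in samples)
    avg_out = mean(s[1] for s in samples)
    return per_call_cost(avg_in, avg_out) * calls_per_day * days


print(f"${project_spend(samples, calls_per_day=2000):.2f} per 30 days")
```

A handful of samples gives only a rough average; widening the sample to cover your longest and shortest typical requests makes the projection more trustworthy.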

This calculator is not appropriate for workloads that involve volume discounts, prepaid credit packs, enterprise agreements, or caching credits — these change the effective per-token price and require a model that accounts for those adjustments. Batch API pricing, which some providers offer at a steep discount for asynchronous jobs, also falls outside what a straight per-token calculation captures. For those cases, treat this tool's output as the list-price ceiling and adjust downward based on your actual contract terms.

Common Mistakes

Why results sometimes look wrong

Confusing tokens with words or characters. The most common budgeting error is estimating token counts from word counts alone. Code files, JSON payloads, and structured data can tokenize at two to three times the rate of plain English prose. A developer who budgets for a ~2,000-token code review and gets a ~5,000-token bill has made this mistake. Always measure real token counts from a few representative requests before projecting costs at scale.

Forgetting that conversation history is re-sent on every turn. In a multi-turn chat application, the full message history is included in each API request. A session that feels like a short conversation can accumulate thousands of input tokens because every prior turn is re-tokenized and priced on each call. Building a chatbot without accounting for context accumulation is the fastest path to a surprise invoice.

Using input pricing for both input and output. Some developers find a single headline price on a provider's landing page and apply it uniformly. Output tokens are almost always priced higher. Using the wrong rate understates output-heavy workloads — like creative writing, code generation, or long-form analysis — by a factor of two or more. Always look up both rates on the provider's dedicated pricing page before calculating.
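To make the last mistake concrete, here is a small sketch comparing the correct two-rate calculation with a single headline rate applied to both sides. The token counts and prices are illustrative values for an output-heavy request, not any provider's actual rates.

```python
def call_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Per-call cost with separate input and output rates (USD)."""
    return (input_tokens / 1000) * input_price_per_1k + (
        output_tokens / 1000
    ) * output_price_per_1k


# Output-heavy request: short instruction, long generated answer.
input_tokens, output_tokens = 200, 2000
input_price, output_price = 0.001, 0.003  # assumed prices per 1K tokens

correct = call_cost(input_tokens, output_tokens, input_price, output_price)
# Mistake: the input rate is applied to output tokens as well.
mistaken = call_cost(input_tokens, output_tokens, input_price, input_price)

print(f"correct:  ${correct:.4f}")
print(f"mistaken: ${mistaken:.4f}")
print(f"understated by a factor of {correct / mistaken:.1f}")
```

With these numbers the mistaken figure is less than half of the real cost, and the gap widens as the output share of the request grows.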


The Math

Worked examples and deeper derivation

The calculation involves two multiplications and one addition. Start with input cost: divide your input token count by 1000, then multiply by the input price per 1000 tokens. That gives the dollar cost of sending your prompt. Do the same for output: divide output tokens by 1000, multiply by output price per 1000 tokens. Add the two components together for total cost.

Using example values of 1,500 input tokens at $0.003 per 1K and 800 output tokens at $0.006 per 1K: input cost = (1,500 ÷ 1,000) × $0.003 = $0.0045. Output cost = (800 ÷ 1,000) × $0.006 = $0.0048. Total = $0.0093. The division by 1,000 converts the token count into the same unit as the pricing (which is stated per thousand), making the multiplication produce a dollar figure directly.
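The two multiplications and one addition can be sketched as a single function. This is an illustrative implementation, not the calculator's actual source; the name call_cost is an assumption. Decimal is used so the small dollar amounts come out exact rather than carrying binary floating-point noise.

```python
from decimal import Decimal


def call_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Total cost in USD for one call, with prices quoted per 1,000 tokens."""
    input_cost = Decimal(input_tokens) / 1000 * Decimal(input_price_per_1k)
    output_cost = Decimal(output_tokens) / 1000 * Decimal(output_price_per_1k)
    return input_cost + output_cost


# Prices are passed as strings so Decimal reads them exactly.
total = call_cost(1500, 800, "0.003", "0.006")
print(total)  # 0.0093
```

Passing prices as strings is a deliberate choice: Decimal(0.003) built from a float would inherit the float's rounding error.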

Because the formula is linear, cost scales exactly with token count. Doubling input tokens doubles input cost with no compounding or threshold effects. This linearity also means you can estimate monthly spend by multiplying per-call cost by expected call volume — a straightforward unit analysis that many practitioners find useful when setting API budget alerts.

Typical chatbot response — mid-tier model

1,500 input tokens, 800 output tokens, $0.003 input price per 1K, $0.006 output price per 1K

Input cost: (1,500 ÷ 1,000) × $0.003 = $0.0045. Output cost: (800 ÷ 1,000) × $0.006 = $0.0048. Total: $0.0093. At this rate, you could run roughly 100 similar calls for under a dollar of API spend, and 10,000 calls would cost about $93, which is useful context when evaluating a new workflow.

High-volume document summarization — large input, short output

10,000 input tokens, 300 output tokens, $0.008 input price per 1K, $0.024 output price per 1K

Input cost: (10,000 ÷ 1,000) × $0.008 = $0.08. Output cost: (300 ÷ 1,000) × $0.024 = $0.0072. Total: $0.0872. This illustrates how large-context models with expensive input pricing can make document-heavy tasks significantly pricier than chat-style interactions, even when the output is brief.

Code generation assistant — small prompt, long generated output

200 input tokens, 2,000 output tokens, $0.001 input price per 1K, $0.003 output price per 1K

Input cost: (200 ÷ 1,000) × $0.001 = $0.0002. Output cost: (2,000 ÷ 1,000) × $0.003 = $0.006. Total: $0.0062. When a short instruction produces a long code block, nearly all the cost sits in output tokens. This is why code-generation and long-form writing tasks are disproportionately sensitive to output pricing changes.
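The three worked examples can be compared side by side with a short sketch that also reports how much of each total comes from output tokens. The scenario labels and the function name cost_breakdown are illustrative; the token counts and prices are the ones used in the examples above.

```python
# (input_tokens, output_tokens, input_price_per_1k, output_price_per_1k)
scenarios = {
    "chatbot": (1500, 800, 0.003, 0.006),
    "summarization": (10000, 300, 0.008, 0.024),
    "code generation": (200, 2000, 0.001, 0.003),
}


def cost_breakdown(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Return (input_cost, output_cost, total, output_share) for one call."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    total = input_cost + output_cost
    return input_cost, output_cost, total, output_cost / total


for name, args in scenarios.items():
    _, _, total, share = cost_breakdown(*args)
    print(f"{name}: total ${total:.4f}, output share {share:.0%}")
```

The output share comes out at roughly 52% for the chatbot case, 8% for summarization, and 97% for code generation, which is why the same price change affects these workloads so differently.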

Expert Unlock

The thing most explanations skip

The linear pricing model this calculator uses assumes every token costs the same regardless of position in the context window. In practice, several providers have introduced tiered or positional pricing for very long contexts — tokens beyond a certain position may cost more or less than tokens at the start of a prompt. If your prompts routinely approach or exceed ~32,000 tokens, verify whether your provider uses flat or positional pricing before treating this calculator's output as authoritative.
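A sketch of how one form of positional pricing would change the input side of the calculation. The threshold and both rates below are hypothetical, and this models only one possible scheme, where tokens past the threshold are billed at a different rate. Some providers may instead reprice the whole request once it crosses a threshold, so check the actual pricing terms before relying on either model.

```python
def flat_input_cost(input_tokens, price_per_1k):
    """Flat pricing: every input token costs the same."""
    return (input_tokens / 1000) * price_per_1k


def tiered_input_cost(input_tokens, threshold, base_price_per_1k, long_price_per_1k):
    """Hypothetical scheme: tokens beyond `threshold` use a different rate."""
    base_tokens = min(input_tokens, threshold)
    extra_tokens = max(input_tokens - threshold, 0)
    return (base_tokens / 1000) * base_price_per_1k + (
        extra_tokens / 1000
    ) * long_price_per_1k


# Assumed values for illustration only.
THRESHOLD = 32_000
BASE_PRICE = 0.003
LONG_PRICE = 0.006

print(f"flat:   ${flat_input_cost(40_000, BASE_PRICE):.3f}")
print(f"tiered: ${tiered_input_cost(40_000, THRESHOLD, BASE_PRICE, LONG_PRICE):.3f}")
```

Below the threshold the two functions agree, so the flat calculation remains a fair estimate for shorter prompts.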

The formula also assumes input and output tokens are billed identically whether they appear in a single call or across batched calls. Batch APIs, when available, typically cut costs significantly by deferring execution to off-peak capacity. The per-token math is identical but the rates are not — always use the batch-specific pricing when modeling asynchronous pipelines.

What drives my AI API costs higher than expected?

Why is output priced higher than input for most AI models?

Generating tokens requires more compute than reading them. During inference, the model runs a full forward pass for every output token it produces, while input tokens are processed in a single parallel pass. Most providers charge roughly two to four times more per output token than per input token to reflect this asymmetry.

How do I find the token counts for my actual API calls?

Every API response from major providers includes a usage object. For example, OpenAI returns prompt_tokens and completion_tokens directly in the JSON response body. You can also estimate counts before calling by running your text through a tokenizer. As a rule of thumb, English prose under byte-pair encoding averages about 750 words per 1,000 tokens, though code and non-English text can vary significantly.
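A minimal sketch of pricing a call from its reported usage. The sample body below is a hand-written stand-in that contains only the usage fields named above, not a real API response, and the prices are assumed. Other providers use different field names, so adjust the keys to match the response you actually receive.

```python
import json

# Hand-written sample body (assumption), trimmed to the fields used here.
sample_response = """
{
  "id": "example",
  "usage": {"prompt_tokens": 1500, "completion_tokens": 800, "total_tokens": 2300}
}
"""


def cost_from_usage(response_body, input_price_per_1k, output_price_per_1k):
    """Read token counts from a JSON response body and price the call."""
    usage = json.loads(response_body)["usage"]
    input_cost = (usage["prompt_tokens"] / 1000) * input_price_per_1k
    output_cost = (usage["completion_tokens"] / 1000) * output_price_per_1k
    return input_cost + output_cost


print(f"${cost_from_usage(sample_response, 0.003, 0.006):.4f}")
```

Logging this figure per request is a simple way to compare projected spend against what production traffic actually costs.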

Does this calculator handle context window costs for long conversations?

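Yes, as long as you are clear about what you total. In a multi-turn chat, the entire message history is re-sent with each request, so earlier turns accumulate in your input token bill. To price a single request, enter the full history sent with that request as the input token count: in a 10-turn conversation that adds ~500 tokens per turn, the final request sends ~5,000 input tokens, not ~500. To price the whole conversation, add up the input tokens of every request (500 + 1,000 + ... + 5,000 = 27,500 in this example) and enter that sum.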
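The accumulation can be sketched with a simplified model in which every turn adds a fixed number of tokens and the full history is re-sent as input on each request. Real conversations vary turn by turn, and the function name is illustrative, so treat this as an approximation.

```python
def input_tokens_per_request(tokens_per_turn: int, turns: int) -> list[int]:
    """Input tokens sent on each request when the full history is re-sent."""
    return [tokens_per_turn * n for n in range(1, turns + 1)]


per_request = input_tokens_per_request(500, 10)
print(per_request[-1])   # 5000 input tokens on the final request
print(sum(per_request))  # 27500 input tokens billed across the conversation
```

Under this model the billed input grows roughly with the square of the number of turns, which is why trimming or summarizing history can pay off in long sessions.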
