Context Window Calculator

Check if your text fits in a model's context window. Paste your prompt, documents, and conversation history to see total token count and whether it fits in GPT-4o, Claude, or Gemini.

Updated June 2026

How It Works
The formula, explained simply

A context window is the maximum amount of text an LLM can process in a single API call — including your system instructions, conversation history, any documents you attach, and your actual question. Think of it as the model's working memory.

This calculator approximates the token count with a common rule of thumb: about 4 characters per token. That holds roughly for English prose, while code, numbers, and non-English text usually produce more tokens per character. It then checks your content against the published context limits of the major models, showing a visual fill indicator and the remaining headroom for each.
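A minimal Python sketch of the check described above, assuming the 4-characters-per-token heuristic. The CONTEXT_LIMITS table and the function names are illustrative, not this calculator's actual code. The limits are the round figures used on this page; exact values differ slightly (GPT-3.5 Turbo's window, for example, is 16,385 tokens).

```python
import math

# Illustrative round-number limits in tokens; verify against provider docs.
CONTEXT_LIMITS = {
    "GPT-3.5 Turbo": 16_000,
    "GPT-4o": 128_000,
    "Claude 3.5 Sonnet": 200_000,
    "Gemini 1.5 Flash": 1_000_000,
}

CHARS_PER_TOKEN = 4  # rough heuristic for English prose


def estimate_tokens(text: str) -> int:
    """Approximate tokens as characters / 4, rounded up to err on the safe side."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def check_fit(text: str) -> dict:
    """Return fit status, fill percentage, and headroom for each model."""
    tokens = estimate_tokens(text)
    return {
        model: {
            "fits": tokens <= limit,
            "fill_pct": 100 * tokens / limit,
            "headroom": limit - tokens,
        }
        for model, limit in CONTEXT_LIMITS.items()
    }


report = check_fit("x" * 40_000)  # 40,000 characters, about 10,000 tokens
for model, r in report.items():
    print(f"{model}: fits={r['fits']}, {r['fill_pct']:.1f}% used, {r['headroom']:,} left")
```

Rounding up rather than to the nearest integer is a design choice: near a limit, overestimating is the safer mistake.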

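Note: context window ≠ output limit. Even if your input fits, each model also caps how many tokens it will generate in a single response, and that cap is much smaller than the context window (roughly 4K–16K tokens for the models listed here).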

When To Use This
Right tool, right situation

Use this when building a RAG system (to check document chunk sizes), when debugging context overflow errors, when designing a multi-turn chatbot (to plan the conversation history budget), or when deciding which model to use for long-document tasks.

For production systems, use an exact tokenizer rather than the character approximation, for example OpenAI's tiktoken library or the token-counting API in Anthropic's SDK. Accuracy matters when you are near the limit.
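A sketch of exact counting for OpenAI models, assuming the tiktoken package is installed (pip install tiktoken). It falls back to the characters / 4 estimate when tiktoken or its encoding files are unavailable. count_tokens is a hypothetical helper name. The count covers raw text only: chat requests add a few tokens of per-message formatting, and Claude and Gemini use their own tokenizers, so this count is not exact for them.

```python
import math


def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens with tiktoken if possible, else estimate as characters / 4."""
    try:
        import tiktoken  # OpenAI's open-source tokenizer (pip install tiktoken)

        enc = tiktoken.encoding_for_model(model)  # raises KeyError for unknown names
        # disallowed_special=() encodes strings like "<|endoftext|>" as plain text
        return len(enc.encode(text, disallowed_special=()))
    except Exception:
        # tiktoken not installed, model unknown to this tiktoken version, or the
        # encoding files could not be downloaded (tiktoken fetches them on first use)
        return math.ceil(len(text) / 4)


sample = "Context windows are measured in tokens, not characters."
print(count_tokens(sample), "tokens (exact if tiktoken is available)")
```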

Common Mistakes
Why results sometimes look wrong

Assuming larger context = better performance. Models do not attend equally to every part of the context. Filling a 100K context window does not guarantee the model will use all of it effectively: retrieval accuracy tends to degrade as the context grows, especially for information placed in the middle.

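Ignoring output headroom. The context window is shared between input and output. If you send 120K input tokens to GPT-4o (128K limit), only 8K tokens remain for the response. A longer response is cut off at that point, and the only signal is usually the stop reason (for example "length" in OpenAI's API), which is easy to miss.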
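The usable output budget can be sketched as the smaller of two numbers: what the input leaves free in the window, and the model's own output cap. output_budget is a hypothetical helper, and the 16,384-token cap in the example is an assumption based on recent GPT-4o snapshots; check the provider's model documentation.

```python
def output_budget(input_tokens: int, context_limit: int, max_output_cap: int) -> int:
    """Maximum tokens the model can generate for one request."""
    remaining = max(context_limit - input_tokens, 0)  # the window is shared with input
    return min(remaining, max_output_cap)  # output also has its own separate cap


# GPT-4o example from the text: 120K input tokens in a 128K window
print(output_budget(120_000, 128_000, 16_384))  # 8000
```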

Counting only your message, not the full request. System prompt, all prior conversation turns, tool definitions, and any injected documents all count as input tokens.
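A sketch of budgeting the whole request rather than only the latest message, using the characters / 4 heuristic for each component. request_breakdown and its parameter names are illustrative. Counting tool definitions as their serialized JSON text is an assumption, and real APIs also add per-message formatting tokens that this ignores.

```python
import json
import math


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)


def request_breakdown(system_prompt: str, history: list[str], tools: list[dict],
                      documents: list[str], question: str) -> dict[str, int]:
    """Estimate input tokens per component of one API request."""
    breakdown = {
        "system_prompt": estimate_tokens(system_prompt),
        "history": sum(estimate_tokens(turn) for turn in history),
        # assumption: tool schemas cost about as much as their JSON text
        "tools": sum(estimate_tokens(json.dumps(tool)) for tool in tools),
        "documents": sum(estimate_tokens(doc) for doc in documents),
        "question": estimate_tokens(question),
    }
    breakdown["total"] = sum(breakdown.values())  # computed before "total" is added
    return breakdown


print(request_breakdown(
    system_prompt="You are a support assistant. " * 20,
    history=["Earlier user message", "Earlier assistant reply"],
    tools=[{"name": "search", "description": "Search the knowledge base"}],
    documents=["(retrieved document text) " * 50],
    question="How do I reset my password?",
))
```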

The Math
Worked examples and deeper derivation

Token estimate: characters / 4. So 40,000 characters ≈ 10,000 tokens.

Fill percentage: (your_tokens / context_limit) × 100.

Remaining headroom: context_limit − your_tokens.
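Applied to the 40,000-character example with GPT-4o's 128K window, writing C for characters, T for estimated tokens, and L for the context limit (symbols introduced here for brevity, with 128K taken as 128,000 tokens):

```latex
\begin{aligned}
T &\approx \frac{C}{4} = \frac{40{,}000}{4} = 10{,}000 \text{ tokens} \\
\text{fill} &= \frac{T}{L} \times 100 = \frac{10{,}000}{128{,}000} \times 100 = 7.8125\% \\
\text{headroom} &= L - T = 128{,}000 - 10{,}000 = 118{,}000 \text{ tokens}
\end{aligned}
```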

Context window sizes as of May 2026: GPT-3.5 Turbo = 16K, GPT-4o = 128K, Claude 3.5 Sonnet = 200K, Gemini 1.5 Flash = 1M, Gemini 1.5 Pro = 2M tokens.

Important: models degrade on tasks requiring retrieval from very full context windows. The "lost in the middle" phenomenon means information in the middle of a long context is recalled less reliably than information at the beginning or end.
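One mitigation sometimes used in RAG pipelines is to order retrieved chunks so the most relevant sit at the beginning and end of the context, leaving the least relevant in the middle. A minimal sketch, assuming the chunks arrive sorted most-relevant first; reorder_for_long_context is a hypothetical name, and this may reduce the effect rather than eliminate it.

```python
def reorder_for_long_context(chunks: list[str]) -> list[str]:
    """Put the most relevant chunks at the edges and the least relevant in the middle.

    `chunks` must be sorted most-relevant first. Chunks at positions 1, 3, 5, ...
    fill the front in order; chunks at positions 2, 4, ... fill the back in
    reverse, so the second-most-relevant chunk ends up last.
    """
    front = chunks[0::2]
    back = chunks[1::2]
    return front + back[::-1]


print(reorder_for_long_context(["A", "B", "C", "D", "E"]))  # ['A', 'C', 'E', 'D', 'B']
```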

These examples convert words to tokens with the rule of thumb of ≈0.75 words per token (tokens ≈ words × 4/3).
Single-turn Q&A
Content: system prompt (200 words) + one user question (50 words) = 250 words total
≈333 tokens: fits every model, using about 2% of GPT-3.5 Turbo's 16K window and well under 1% of the others
Book chapter analysis
Content: one chapter of a novel, ≈5,000 words
≈6,667 tokens: fits every model, including GPT-3.5 Turbo (16K limit, ≈42% used)
Full novel in context
Content: an 80,000-word novel
≈106,667 tokens: fits GPT-4o (128K, ≈83% used), Claude 3.5 Sonnet (200K), and both Gemini 1.5 models; too large for GPT-3.5 Turbo (16K)
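The worked examples can be reproduced with the ≈0.75 words-per-token rule of thumb. A sketch only: the constant is an English-prose approximation, not a tokenizer, and the helper names are illustrative.

```python
TOKENS_PER_WORD = 4 / 3  # about 0.75 words per token (English prose rule of thumb)


def words_to_tokens(words: int) -> int:
    return round(words * TOKENS_PER_WORD)


def percent_used(tokens: int, context_limit: int) -> float:
    return 100 * tokens / context_limit


examples = [("Single-turn Q&A", 250), ("Book chapter", 5_000), ("Full novel", 80_000)]
for label, words in examples:
    tokens = words_to_tokens(words)
    print(f"{label}: ~{tokens:,} tokens, {percent_used(tokens, 16_000):.0f}% of a 16K window")
```

Anything above 100% of a window does not fit, which is why the full novel rules out GPT-3.5 Turbo.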

Common questions

What happens when you exceed a context window?
The API typically returns an error and does not process the request. You must truncate or summarise your content to fit within the model's limit before retrying. Some frameworks handle this automatically with a sliding window: they keep the most recent turns and summarise older ones.
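A minimal sketch of the sliding-window idea, assuming the characters / 4 estimate and a token budget you choose (for example, the context limit minus the system prompt and the room reserved for output). sliding_window is a hypothetical helper, not a specific framework's API, and it only drops older turns; summarising them is left out.

```python
import math


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)


def sliding_window(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose combined token estimate fits in budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 400, "c" * 400]  # three turns of ~100 tokens each
print(len(sliding_window(history, budget=250)))  # 2: only the two newest turns fit
```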
Does a larger context window cost more?
Only to the extent you use it. Every token you send is billed as an input token, so cost scales with how much of the window you fill, not with the window's size. Filling a 200K context window with 150,000 tokens costs 150,000 × (input price per token). On Claude 3.5 Sonnet at $3/MTok (per million tokens), that is $0.45 in input costs per call, which becomes significant at scale.
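The same arithmetic as a small function. The $3 per million input tokens matches the Claude 3.5 Sonnet figure above; the 10,000 calls per day in the second line is a hypothetical volume chosen only to show how per-call costs add up.

```python
def input_cost_usd(input_tokens: int, price_per_mtok_usd: float, calls: int = 1) -> float:
    """Input cost in US dollars: tokens x (price per million tokens) / 1,000,000 x calls."""
    return input_tokens * price_per_mtok_usd / 1_000_000 * calls


print(f"${input_cost_usd(150_000, 3.0):.2f} per call")
print(f"${input_cost_usd(150_000, 3.0, calls=10_000):,.2f} per day at 10,000 calls")
```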
Which model has the largest context window?
As of mid-2026, among the models this calculator covers, Gemini 1.5 Pro has the largest window: up to 2 million tokens (≈1.5 million words), suitable for multi-book analysis or very long codebases. Claude 3.5 Sonnet offers 200K tokens (≈150,000 words), and GPT-4o offers 128K tokens (≈96,000 words).
