Context Window Calculator
Check if your text fits in a model's context window. Paste your prompt, documents, and conversation history to see total token count and whether it fits in GPT-4o, Claude, or Gemini.
How It Works
The formula, explained simply
A context window is the maximum number of tokens an LLM can process in a single API call. It covers your system instructions, conversation history, any documents you attach, your actual question, and the response the model generates. Think of it as the model's working memory.
This calculator approximates token count using the common rule of thumb of 4 characters per token. The rule holds reasonably well for English prose but tends to undercount for code and for many non-English languages. The calculator then checks your content against the published context limits of several major models, showing a visual fill indicator and remaining headroom for each.
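The estimate-then-compare step can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, and the limits are the figures quoted in the Math section of this page, treated as round thousands.

```python
import math

# Context limits in tokens, copied from the figures quoted on this page.
# Treating "128K" as 128,000 (not 131,072) is an assumption.
CONTEXT_LIMITS = {
    "GPT-3.5 Turbo": 16_000,
    "GPT-4o": 128_000,
    "Claude 3.5 Sonnet": 200_000,
    "Gemini 1.5 Flash": 1_000_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count with the 4-characters-per-token rule."""
    return math.ceil(len(text) / chars_per_token)


def models_that_fit(text: str) -> dict[str, bool]:
    """Map each model name to whether the estimated tokens fit its limit."""
    tokens = estimate_tokens(text)
    return {name: tokens <= limit for name, limit in CONTEXT_LIMITS.items()}


if __name__ == "__main__":
    prompt = "word " * 20_000  # 100,000 characters, roughly 25,000 tokens
    for name, ok in models_that_fit(prompt).items():
        print(f"{name}: {'fits' if ok else 'too large'}")
```

Rounding up with `math.ceil` keeps the estimate slightly pessimistic, which is the safer direction when the goal is to avoid overflow.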
Note: context window ≠ output limit. Even if your input fits, many models cap output far below the context size, often in the 4K–16K token range. Check the model's documentation for its exact output cap.
When To Use This
Right tool, right situation
Use this when building a RAG system (to check document chunk sizes), when debugging context overflow errors, when designing a multi-turn chatbot (to plan the conversation history budget), or when deciding which model to use for long-document tasks.
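For the multi-turn chatbot case, one common way to plan the history budget is to keep only the most recent turns that fit. The sketch below assumes that strategy and uses the 4-characters-per-token estimate; `trim_history` and its parameters are hypothetical names chosen for illustration.

```python
import math


def estimate_tokens(text: str) -> int:
    """Approximate token count: 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def trim_history(turns: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent turns whose estimated total fits the budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping whole turns from the oldest end is only one policy. Summarizing old turns or pinning the first message are alternatives, and which one works best depends on the application.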
For production systems, use a real tokenizer rather than the character approximation: tiktoken for OpenAI models, or the token-counting method in Anthropic's SDK for Claude. Accuracy matters when you are near the limit.
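As a rough sketch of what an exact count looks like, the snippet below uses tiktoken when it is installed and falls back to the character rule otherwise. The choice of the `o200k_base` encoding (the one tiktoken associates with GPT-4o) is an assumption; other providers use different tokenizers, so this count would not carry over to Claude or Gemini.

```python
import math


def count_tokens(text: str, encoding_name: str = "o200k_base") -> tuple[int, bool]:
    """Return (token_count, exact).

    exact is True when tiktoken produced the count and False when the
    4-characters-per-token fallback was used instead.
    """
    try:
        import tiktoken  # third-party: pip install tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text)), True
    except Exception:
        # tiktoken is missing, the encoding name is unknown to this
        # version, or the vocabulary file could not be loaded.
        return math.ceil(len(text) / 4), False


if __name__ == "__main__":
    count, exact = count_tokens("How many tokens is this sentence?")
    print(count, "(exact)" if exact else "(approximate)")
```

Returning the `exact` flag alongside the count lets the caller decide how much safety margin to leave.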
Common Mistakes
Why results sometimes look wrong
Assuming larger context = better performance. Models are not equally attentive across the full context. Cramming a 100K context window does not guarantee the model will use all of it effectively — retrieval accuracy degrades, especially in the middle.
Ignoring output headroom. The context window includes both input and output. If you send 120K input tokens to GPT-4o (128K limit), you have only 8K tokens left for output. Any response longer than that is truncated, often silently.
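The headroom arithmetic in that example can be written out as a small helper. This is an illustrative sketch: `output_budget` is a hypothetical name, and the optional `max_output_cap` parameter stands in for whatever separate output limit a given model documents.

```python
from typing import Optional


def output_budget(
    input_tokens: int,
    context_limit: int,
    max_output_cap: Optional[int] = None,
) -> int:
    """Tokens available for the response once the input is accounted for."""
    remaining = max(context_limit - input_tokens, 0)
    if max_output_cap is not None:
        # A model's own output cap applies even when the window has room.
        remaining = min(remaining, max_output_cap)
    return remaining


if __name__ == "__main__":
    # The example from the text: 120K input against a 128K window.
    print(output_budget(120_000, 128_000))  # 8000
```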
Counting only your message, not the full request. System prompt, all prior conversation turns, tool definitions, and any injected documents all count as input tokens.
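To make the full-request count concrete, here is a minimal sketch that adds up every part of a request using the 4-characters-per-token estimate. The function and field names are hypothetical, and serializing tool definitions with `json.dumps` is only an approximation of how a provider actually encodes them.

```python
import json
import math


def estimate_tokens(text: str) -> int:
    """Approximate token count: 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def request_tokens(
    system_prompt: str,
    history: list[str],
    documents: list[str],
    tools: list[dict],
    user_message: str,
) -> dict[str, int]:
    """Estimate tokens for each part of a request, plus the total."""
    parts = {
        "system": estimate_tokens(system_prompt),
        "history": sum(estimate_tokens(turn) for turn in history),
        "documents": sum(estimate_tokens(doc) for doc in documents),
        # Tool definitions are sent as structured data; estimating from
        # their JSON text is an assumption.
        "tools": estimate_tokens(json.dumps(tools)) if tools else 0,
        "message": estimate_tokens(user_message),
    }
    parts["total"] = sum(parts.values())
    return parts
```

A per-part breakdown like this makes it easy to see which component is consuming the budget. In long conversations it is usually the history or the injected documents, not the latest message.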
The Math
Worked examples and deeper derivation
Token estimate: characters / 4. So 40,000 characters ≈ 10,000 tokens.
Fill percentage: (your_tokens / context_limit) × 100.
Remaining headroom: context_limit − your_tokens.
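The three formulas above can be checked with a short worked example. The sketch below takes the 40,000-character case and compares it with a 128K window; the function name is hypothetical and 128K is treated as 128,000 tokens.

```python
import math


def context_report(char_count: int, context_limit: int) -> dict[str, float]:
    """Apply the three formulas: token estimate, fill percentage, headroom."""
    tokens = math.ceil(char_count / 4)            # characters / 4
    fill_pct = tokens / context_limit * 100       # (tokens / limit) x 100
    headroom = context_limit - tokens             # limit - tokens
    return {"tokens": tokens, "fill_pct": fill_pct, "headroom": headroom}


if __name__ == "__main__":
    report = context_report(char_count=40_000, context_limit=128_000)
    print(report["tokens"])              # 10000
    print(round(report["fill_pct"], 2))  # 7.81
    print(report["headroom"])            # 118000
```

A negative headroom means the content does not fit, and a fill percentage above 100 says the same thing.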
Context window sizes as of May 2026: GPT-3.5 Turbo = 16K, GPT-4o = 128K, Claude 3.5 Sonnet = 200K, Gemini 1.5 Flash = 1M, Gemini 1.5 Pro = up to 2M tokens.
Important: models degrade on tasks requiring retrieval from very full context windows. The "lost in the middle" phenomenon means information in the middle of a long context is recalled less reliably than information at the beginning or end.