What is a Context Window?
A context window (or context length) is the maximum number of tokens a language model can process in a single interaction. It encompasses both the input prompt and the model's response, defining the "working memory" of the model.
Context Window Sizes
Historical
- GPT-2: 1,024 tokens
- GPT-3: 2,048 tokens (4,096 in later GPT-3.5 models)
Current Generation
- GPT-4: 8K-128K tokens
- Claude: 100K-200K tokens
- Gemini: Up to 1M tokens
Token Basics
What is a Token?
- One token ≈ 4 characters of English text
- ~750 words ≈ 1,000 tokens
- Token counts vary by language; non-English text often needs more tokens per word
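To see the mapping concretely, here is a minimal sketch using the tiktoken library; cl100k_base is the encoding used by GPT-4-era OpenAI models, and other model families use different tokenizers, so exact counts will differ.

```python
# Requires: pip install tiktoken
import tiktoken

# cl100k_base is the encoding for GPT-4-era OpenAI models;
# other model families tokenize differently.
enc = tiktoken.get_encoding("cl100k_base")

text = "A context window limits how much a model can read at once."
tokens = enc.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
print(f"~{len(text) / len(tokens):.1f} characters per token")
```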
Token Counting
- Input tokens (prompt)
- Output tokens (response)
- Input and output together must fit within the window (see the sketch below)
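The arithmetic is a simple budget check. A minimal sketch, assuming a hypothetical 8,192-token window; real limits depend on the model:

```python
def fits_in_window(input_tokens: int, max_output_tokens: int,
                   context_window: int = 8192) -> bool:
    """Check whether a prompt plus the reserved output budget
    fits in the model's context window."""
    return input_tokens + max_output_tokens <= context_window

# A 6,000-token prompt leaves at most 2,192 tokens for the
# response in an 8,192-token window.
print(fits_in_window(6000, 2000))   # True
print(fits_in_window(6000, 3000))   # False
```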
Implications
Capabilities
- Longer documents can be processed
- More context for better responses
- Extended conversations
Limitations
- Cost increases with tokens
- Processing time increases
- "Lost in the middle" phenomenon
Strategies for Long Content
- Chunking: break content into smaller pieces and process each piece separately (see the sketch after this list)
- Summarization: compress earlier or less critical information into shorter summaries
- RAG (retrieval-augmented generation): retrieve only the passages relevant to the current query and include just those in the prompt
- Sliding window: process the text sequentially in overlapping segments so context carries across boundaries
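As one illustration, chunking and the sliding window can share the same mechanics: split a token sequence into fixed-size pieces that overlap slightly. This is a minimal sketch with illustrative chunk_size and overlap values; in practice the tokens would come from a tokenizer like the one shown earlier.

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=100):
    """Split a token list into overlapping chunks (sliding window).
    The overlap preserves some context across chunk boundaries."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# With 2,500 tokens, chunks start at positions 0, 900, and 1800,
# each sharing 100 tokens with its neighbor.
chunks = chunk_tokens(list(range(2500)))
print([len(c) for c in chunks])  # [1000, 1000, 700]
```

The overlap trades a little extra cost for continuity: a sentence that straddles a chunk boundary appears whole in at least one chunk.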
Best Practices
- Use context efficiently
- Prioritize relevant information
- Consider cost implications
- Test with various lengths