The maximum amount of text an LLM can process in a single interaction.
The Context Window refers to the maximum number of tokens (roughly word pieces) that a language model can process in a single interaction. This includes both the input prompt and the generated output. It's a fundamental constraint that affects how much information you can provide to and receive from an LLM.
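Because both the prompt and the reserved output budget count against the same limit, a common first step is to measure prompt length before sending a request. Below is a minimal sketch using OpenAI's tiktoken library with the cl100k_base encoding; the 8K window and the specific function name are illustrative assumptions, not a prescribed API.

```python
# Minimal sketch: check whether a prompt plus a reserved output budget
# fits inside a context window. Assumes tiktoken and cl100k_base;
# the 8192-token window is an illustrative default.
import tiktoken

def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 8192) -> bool:
    """Return True if prompt tokens + reserved output tokens fit the window."""
    enc = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(enc.encode(prompt))
    return prompt_tokens + max_output_tokens <= context_window

print(fits_in_context("Summarize the attached report.", max_output_tokens=512))
```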
Context windows have grown significantly over time: the original GPT-3 shipped with a 2K-token window, GPT-4 introduced 8K and 32K variants, and models like Claude now support up to 200K tokens. Larger context windows enable processing of longer documents, maintaining longer conversations, and handling more complex multi-step tasks.
For AI engineers, managing context effectively is crucial. Strategies include chunking documents for RAG, summarizing previous conversation turns, prioritizing relevant information, and understanding when to use long-context models versus retrieval-based approaches. Token counting and context management are everyday concerns when building production AI applications.
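As one concrete example of the chunking strategy mentioned above, documents can be split into overlapping token windows before indexing for retrieval. The sketch below again assumes tiktoken; the chunk size and overlap values are illustrative defaults, and real pipelines often chunk along semantic boundaries such as paragraphs instead.

```python
# Hedged sketch: token-based document chunking for RAG.
# Assumes tiktoken with cl100k_base; chunk_size and overlap
# are illustrative defaults, not recommended values.
import tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 500,
                    overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_size]
        chunks.append(enc.decode(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of indexing some tokens twice.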