
Context window and token limits in Prompt Engineering / GenAI - Full Explanation

Introduction
Imagine trying to have a long conversation but only being able to remember a few sentences at a time. This is the challenge that AI language models face with their context window and token limits. Understanding these limits helps us know how much information the AI can handle at once.
Explanation
Context Window
The context window is the amount of text the AI model can look at and understand in one go. It includes both the input you give and the AI's own responses. If the conversation or text is longer than this window, the AI might forget earlier parts.
The context window sets the maximum text length the AI can process at once.
Tokens
Tokens are small pieces of text, like words or parts of words, that the AI uses to read and write. Instead of counting letters or words, the AI counts tokens to measure text length. Different words can be one or more tokens.
Tokens are the units the AI counts to manage text length.
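To make the idea concrete, here is a toy illustration of subword splitting. This is not a real tokenizer (production models use learned vocabularies such as BPE); the hypothetical `toy_tokenize` function simply shows that a short word can be one token while a long word becomes several.

```python
def toy_tokenize(text: str) -> list[str]:
    """Split text into word pieces of at most 4 characters.

    A crude stand-in for real subword tokenization, for illustration only.
    """
    tokens = []
    for word in text.split():
        # Break long words into fixed-size chunks, mimicking subword splits.
        for i in range(0, len(word), 4):
            tokens.append(word[i:i + 4])
    return tokens

print(toy_tokenize("cat"))           # a short word stays one token
print(toy_tokenize("unbelievable"))  # a long word splits into several tokens
```

Real tokenizers split on learned statistical boundaries rather than fixed lengths, but the takeaway is the same: token count, not word count, is what the model measures.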
Token Limits
Token limits define the maximum number of tokens the AI can handle in its context window. If you exceed this limit, the AI will only consider the most recent tokens within the limit, dropping older ones. This affects how much history the AI remembers.
Token limits restrict how much text the AI can keep in memory during a conversation.
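The "keep only the most recent tokens" behavior can be sketched in a few lines. The `TOKEN_LIMIT` value and the token list below are made-up values for illustration, not from any specific model.

```python
TOKEN_LIMIT = 5  # hypothetical window size, real models allow thousands

def fit_to_window(tokens: list[str], limit: int = TOKEN_LIMIT) -> list[str]:
    """Keep only the most recent `limit` tokens, discarding older ones."""
    return tokens[-limit:]

history = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
print(fit_to_window(history))  # the oldest tokens t1 and t2 are dropped
```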
Impact on Conversations
Because of token limits, very long conversations or documents may lose earlier details as the AI forgets older tokens. This means the AI might not recall all previous information, affecting the quality of responses in long chats.
Token limits can cause the AI to forget earlier parts of long conversations.
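In practice, applications often trim whole messages from the start of a chat so the history fits the budget. The sketch below uses word count as a crude stand-in for token count; real systems would use the model's own tokenizer, and the budget value is an assumption for illustration.

```python
def count_tokens(message: str) -> int:
    """Crude stand-in for a real tokenizer: one token per word."""
    return len(message.split())

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the total fits the token budget."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > budget:
            break                    # the next-oldest message no longer fits
        kept.append(msg)
        total += cost
    return list(reversed(kept))      # restore chronological order

chat = [
    "Hello there",                    # oldest message
    "Tell me about context windows",
    "Why does the model forget",      # newest message
]
print(trim_history(chat, budget=10))  # the oldest message is dropped
```

This is exactly why early details fade from long conversations: once the budget is full, the oldest messages are the first to go.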
Real World Analogy

Imagine you have a small whiteboard to write notes during a meeting. You can only fit a limited number of notes on it. When the board is full, you erase the oldest notes to make space for new ones. This way, you only remember the most recent points.

Context Window → The size of the whiteboard where notes are written
Tokens → Each individual note or bullet point written on the whiteboard
Token Limits → The maximum number of notes the whiteboard can hold before erasing old ones
Impact on Conversations → Forgetting older notes when the whiteboard is full, so only recent points are visible
Diagram
┌───────────────────────────────┐
│        Context Window         │
│ ┌───────────────┐             │
│ │ Token 1       │             │
│ │ Token 2       │             │
│ │ Token 3       │             │
│ │ ...           │             │
│ │ Token N       │             │
│ └───────────────┘             │
│ (Max tokens allowed)          │
└───────────────────────────────┘

Older tokens → dropped when limit exceeded → only recent tokens kept
This diagram shows the context window as a container holding tokens up to a maximum limit, with older tokens removed when the limit is exceeded.
Key Facts
Context Window: The maximum amount of text the AI model can process at once.
Token: A small piece of text, like a word or part of a word, used by the AI to count text length.
Token Limit: The maximum number of tokens the AI can handle in its context window.
Token Overflow: When the input exceeds the token limit, causing older tokens to be dropped.
Memory Loss in AI: The AI forgetting earlier parts of a conversation due to token limits.
Common Confusions
Thinking tokens are the same as words. In fact, tokens can be whole words or parts of words; some words split into multiple tokens.
Believing the AI remembers everything from the start of a conversation. The AI only remembers tokens within its context window; older parts are forgotten if limits are exceeded.
Assuming token limits only apply to user input. Token limits include both user input and AI-generated responses combined.
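The last confusion is worth a quick sketch: because the window covers both the prompt and the reply, a longer prompt leaves fewer tokens for the response. The `CONTEXT_LIMIT` value here is hypothetical, not taken from any particular model.

```python
CONTEXT_LIMIT = 4096  # hypothetical total token budget for input + output

def max_response_tokens(prompt_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left for the model's reply after the prompt is counted."""
    return max(limit - prompt_tokens, 0)

print(max_response_tokens(3000))  # 1096 tokens remain for the response
print(max_response_tokens(5000))  # 0: the prompt alone exceeds the window
```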
Summary
The context window limits how much text the AI can process at once, measured in tokens.
Tokens are small text pieces that the AI counts to manage input and output length.
When token limits are exceeded, the AI forgets older text, affecting long conversations.