Context window and token limits in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Context window and token limits

Problem: You are using a language model that can only process a limited number of tokens at once, called the context window. When you input text longer than this limit, the model cannot see all of it, which can reduce the quality of its answers.

Current Metrics: Input text length: 1500 tokens; Model context window: 1024 tokens; Model output relevance score: 60%

Issue: The model's context window is too small for the input text, causing it to miss important information and produce less relevant answers.
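
To make the numbers above concrete, here is a minimal sketch of what happens when the input overruns the window. It assumes the excess is simply cut from the end of the input (some setups cut from the start or raise an error instead), and the token IDs are placeholders rather than real tokenizer output.

```python
# Placeholder token IDs standing in for a 1500-token input (not real tokenizer output)
input_tokens = list(range(1500))
context_window = 1024

# Assumption: the excess is cut from the end of the input
visible = input_tokens[:context_window]
dropped = input_tokens[context_window:]

print(len(visible))  # tokens the model can see: 1024
print(len(dropped))  # tokens that never reach the model: 476
print(f"{len(dropped) / len(input_tokens):.0%} of the input is lost")
```

Under that assumption, roughly a third of the input never reaches the model, which is consistent with the low relevance score.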

Your Task

Adjust the input text or model usage to improve the output relevance score from 60% to at least 80%, without changing the model architecture.

Do not change the model's internal architecture or increase its context window size.

You can only preprocess or split the input text before feeding it to the model.

Solution

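from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

# Example long input text
input_text = """Your very long input text that exceeds 1024 tokens..."""

# Tokenize input (the tokenizer may warn that the sequence exceeds 1024 tokens; that is expected here)
input_tokens = tokenizer.encode(input_text)

# Define context window size; it covers the prompt and the generated tokens together
context_window = 1024
max_new_tokens = 50

# Leave room in each chunk for the tokens the model will generate
chunk_size = context_window - max_new_tokens

# Split input tokens into chunks
chunks = [input_tokens[i:i+chunk_size] for i in range(0, len(input_tokens), chunk_size)]

outputs = []
for chunk in chunks:
    input_ids = torch.tensor([chunk])
    with torch.no_grad():
        # GPT-2 has no pad token, so reuse the end-of-sequence token
        output = model.generate(input_ids, max_new_tokens=max_new_tokens, pad_token_id=tokenizer.eos_token_id)
    # Keep only the newly generated tokens, not the echoed prompt
    decoded_output = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    outputs.append(decoded_output)

# Combine outputs
final_output = ' '.join(outputs)
print(final_output)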

Split the input text into smaller chunks that fit within the model's 1024-token context window.

Process each chunk separately through the model.

Combine the outputs from each chunk to form a complete response.
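
One variation on the steps above, not part of the original solution: let neighbouring chunks share a few tokens so that a sentence cut at a boundary still appears whole in one of them. The sketch below works on a plain list of token IDs. `chunk_with_overlap`, `chunk_size`, and `overlap` are illustrative names, and the sizes are assumptions chosen to leave room for 50 generated tokens inside a 1024-token window.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split tokens into chunks of at most chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Placeholder token IDs standing in for a 1500-token input
tokens = list(range(1500))
chunks = chunk_with_overlap(tokens, chunk_size=974, overlap=100)
print([len(c) for c in chunks])  # [974, 626]
```

The overlap costs some repeated computation, so it is a trade-off: a larger overlap preserves more context across boundaries but produces more tokens to process overall.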

Results Interpretation

Before: The model received 1500 tokens at once, exceeding its 1024-token limit, resulting in a 60% relevance score.

After: By splitting the input into chunks within the 1024-token limit and processing them separately, the relevance score improved to 85%.

This shows that respecting the model's context window, by splitting or summarizing the input so that no part of it is cut off, helps the model produce more relevant outputs.

Bonus Experiment

Try using a summarization model to shorten the input text before feeding it to the language model, aiming to keep the most important information within the context window.

💡 Hint

Use a pretrained summarization model like T5 or BART to reduce input length while preserving meaning.
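
A minimal sketch of how such a preprocessing step could be wired in. The `summarize` and `count_tokens` arguments are hypothetical callables: in practice `summarize` would wrap a pretrained model such as T5 or BART, and `count_tokens` would wrap the target model's tokenizer. The stand-ins used here (whitespace word counts and a keep-the-first-half "summarizer") are assumptions that exist only so the control flow can be run without downloading a model.

```python
def fit_to_window(text, summarize, count_tokens, limit, max_rounds=5):
    """Summarize text repeatedly until it fits within `limit` tokens."""
    for _ in range(max_rounds):
        if count_tokens(text) <= limit:
            return text
        shorter = summarize(text)
        # Give up if the summarizer stops making progress
        if count_tokens(shorter) >= count_tokens(text):
            break
        text = shorter
    if count_tokens(text) > limit:
        raise ValueError("could not shrink text to fit the context window")
    return text

# Stand-ins for illustration only (assumptions, not real models)
def count_words(text):
    return len(text.split())

def keep_first_half(text):
    words = text.split()
    return " ".join(words[: max(1, len(words) // 2)])

long_text = " ".join(f"word{i}" for i in range(1500))
short_text = fit_to_window(long_text, keep_first_half, count_words, limit=1024)
print(count_words(short_text))  # 750
```

Passing the summarizer and the token counter in as arguments keeps the length check tied to the tokenizer of the model that will finally read the text, which may count tokens differently from the summarization model.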

Practice

(1/5)

1. What does the context window in a language model refer to?

easy

A. The speed at which the model generates text

B. The maximum amount of text the model can process at once

C. The number of layers in the model

D. The size of the model's vocabulary

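Solution

Step 1: Understand the term 'context window'. The context window is the span of tokens a language model can take in at one time.

Step 2: Relate to model processing limits. Text beyond that span is not seen by the model, so the window caps how much text it can process at once.

Final Answer: B. The maximum amount of text the model can process at once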