Challenge - 5 Problems
Text Chunking Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
Difficulty: intermediate
Why use text chunking in language models?
Which of the following best explains why text chunking is important when processing long documents with language models?
💡 Hint
Think about model input size limits and how chunking helps manage them.
✗ Incorrect
Language models have limits on how many words or tokens they can process at once. Chunking splits long texts into smaller pieces so the model can handle them effectively without losing important information.
❓ Predict Output
Difficulty: intermediate
Output of text chunking code
What is the output of this Python code that chunks text into parts of 5 words each?
Prompt Engineering / GenAI
text = 'Machine learning helps computers learn from data and improve over time'
words = text.split()
chunks = [' '.join(words[i:i+5]) for i in range(0, len(words), 5)]
print(chunks)
💡 Hint
Look at how the range and slicing work with step 5.
✗ Incorrect
The code splits the text into words, then groups every 5 words into a chunk. The text has 11 words, so the first chunk holds words 0-4, the second words 5-9, and the last chunk holds the single remaining word ('time').
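To check the reasoning, the snippet from the question can be run as written; the list below is the output it produces:

```python
text = 'Machine learning helps computers learn from data and improve over time'
words = text.split()  # 11 words in total
chunks = [' '.join(words[i:i+5]) for i in range(0, len(words), 5)]
print(chunks)
# ['Machine learning helps computers learn', 'from data and improve over', 'time']
```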
❓ Model Choice
Difficulty: advanced
Choosing chunk size for a transformer model
You want to chunk a large document for a transformer model with a maximum input length of 512 tokens. Which chunk size is best to avoid losing context and stay within limits?
💡 Hint
Think about balancing chunk size and context overlap.
✗ Incorrect
Using a chunk size equal to the maximum input length leaves no room for overlap, which can cause loss of context at chunk boundaries. Slightly smaller chunks with overlap preserve context across chunks while staying within the model's limit.
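As a minimal sketch of this trade-off (the 2000-token document and 64-token overlap are made-up numbers, not part of the question): advancing by `chunk_size - overlap` keeps every chunk within the 512-token limit while consecutive chunks share tokens at the boundary.

```python
# Hypothetical example: a 2000-token document, 512-token model limit, 64-token overlap.
doc_tokens = list(range(2000))   # stand-in for real token IDs
chunk_size = 512
overlap = 64
stride = chunk_size - overlap    # each chunk introduces 448 new tokens

chunks = [doc_tokens[i:i + chunk_size] for i in range(0, len(doc_tokens), stride)]

print(len(chunks))                        # 5 chunks cover the document
print(all(len(c) <= 512 for c in chunks)) # True: every chunk fits the model limit
```

Note that the last 64 tokens of each chunk reappear at the start of the next one, so information near a boundary is always seen in full context by at least one chunk.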
❓ Metrics
Difficulty: advanced
Evaluating chunking impact on model accuracy
After chunking text differently, you get these model accuracies on a classification task:
- Chunk size 128 tokens: 85%
- Chunk size 256 tokens: 88%
- Chunk size 512 tokens: 86%
Which chunk size likely balances context and input size best?
💡 Hint
Look at which chunk size yields the best accuracy.
✗ Incorrect
A 256-token chunk size yields the highest accuracy (88%), indicating it provides enough context per chunk while keeping the input size manageable for the model.
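The selection step can be sketched directly from the accuracies in the question:

```python
# Accuracies from the question, keyed by chunk size in tokens
accuracies = {128: 0.85, 256: 0.88, 512: 0.86}

# Pick the chunk size with the highest measured accuracy
best_chunk_size = max(accuracies, key=accuracies.get)
print(best_chunk_size)  # 256
```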
🔧 Debug
Difficulty: expert
Debugging chunk overlap code
What error or output does this code produce?
text = 'AI models need context to understand text better'
words = text.split()
overlap = 2
chunk_size = 5
chunks = []
for i in range(0, len(words), chunk_size - overlap):
chunk = words[i:i+chunk_size]
chunks.append(' '.join(chunk))
print(chunks)
💡 Hint
Check how the loop increments and slicing work with overlap.
✗ Incorrect
The loop advances by chunk_size - overlap = 3 words, so consecutive chunks share 2 words. The slices therefore produce the overlapping chunks shown in option A.
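Running the code from the question confirms the overlap; the 8-word text yields three chunks, each sharing its first two words with the end of the previous one:

```python
text = 'AI models need context to understand text better'
words = text.split()          # 8 words
overlap = 2
chunk_size = 5
step = chunk_size - overlap   # the loop advances 3 words at a time

chunks = [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
print(chunks)
# ['AI models need context to', 'context to understand text better', 'text better']
```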