
Text chunking strategies in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why use text chunking in language models?
Which of the following best explains why text chunking is important when processing long documents with language models?
A. It helps break down long texts into smaller parts so the model can process them without losing context due to input length limits.
B. It increases the total number of words in the text to improve model accuracy.
C. It removes all punctuation to simplify the text for the model.
D. It translates the text into multiple languages before processing.
💡 Hint
Think about model input size limits and how chunking helps manage them.
Predict Output
intermediate
Output of text chunking code
What is the output of this Python code that chunks text into parts of 5 words each?
text = 'Machine learning helps computers learn from data and improve over time'
words = text.split()
chunks = [' '.join(words[i:i+5]) for i in range(0, len(words), 5)]
print(chunks)
A. ['Machine learning helps computers', 'learn from data and improve', 'over time']
B. ['Machine learning helps computers learn', 'from data and improve over', 'time']
C. ['Machine learning helps computers learn from', 'data and improve over time']
D. ['Machine learning helps', 'computers learn from data', 'and improve over time']
💡 Hint
Look at how the range and slicing work with step 5.
Model Choice
advanced
Choosing chunk size for a transformer model
You want to chunk a large document for a transformer model with a maximum input length of 512 tokens. Which chunk size is best to avoid losing context and stay within limits?
A. Chunk size of 50 tokens to keep chunks very small.
B. Chunk size of 512 tokens to exactly match the model's max input length.
C. Chunk size of 1000 tokens to reduce the number of chunks.
D. Chunk size of 256 tokens to allow some overlap and context between chunks.
💡 Hint
Think about balancing chunk size and context overlap.
Metrics
advanced
Evaluating chunking impact on model accuracy
After chunking text differently, you get these model accuracies on a classification task:
- Chunk size 128 tokens: 85%
- Chunk size 256 tokens: 88%
- Chunk size 512 tokens: 86%
Which chunk size likely balances context and input size best?
A. 128 tokens, because smaller chunks always improve accuracy.
B. 512 tokens, because larger chunks contain more information.
C. 256 tokens, because it gives the highest accuracy by balancing chunk size and context.
D. All chunk sizes perform the same, so chunking does not matter.
💡 Hint
Look at which chunk size yields the best accuracy.
🔧 Debug
expert
Debugging chunk overlap code
What error or output does this code produce?
text = 'AI models need context to understand text better'
words = text.split()
overlap = 2
chunk_size = 5
chunks = []
for i in range(0, len(words), chunk_size - overlap):
    chunk = words[i:i+chunk_size]
    chunks.append(' '.join(chunk))
print(chunks)
A. ['AI models need context to', 'context to understand text better', 'text better']
B. ['AI models need context to', 'need context to understand', 'to understand text better']
C. ['AI models need context to', 'models need context to understand', 'context to understand text better']
D. IndexError because the loop steps cause out-of-range slicing.
💡 Hint
Check how the loop increments and slicing work with overlap.

Practice

1. What is the main purpose of text chunking in AI models?
easy
A. To generate new text from scratch
B. To split long text into smaller, manageable pieces
C. To remove stop words from text
D. To translate text into different languages

Solution

  1. Step 1: Understand the concept of text chunking

    Text chunking means breaking a long text into smaller parts so it is easier to handle.
  2. Step 2: Identify the main goal in AI context

    This helps AI models process and understand large texts better by working on smaller pieces.
  3. Final Answer:

    To split long text into smaller, manageable pieces -> Option B
  4. Quick Check:

    Text chunking = splitting text [OK]
Hint: Chunking means breaking text into smaller parts [OK]
Common Mistakes:
  • Confusing chunking with translation
  • Thinking chunking removes words
  • Believing chunking generates new text
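The idea in the solution above can be sketched in a few lines. This is a minimal illustration that assumes chunks are measured in whitespace-separated words; the helper name `chunk_words` and the sample sentence are hypothetical and not part of the original problem.

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    # Step through the word list chunk_size words at a time.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


# Hypothetical sample input (9 words).
sample = "Text chunking breaks a long document into smaller pieces"
chunks = chunk_words(sample, 4)
print(chunks)
# ['Text chunking breaks a', 'long document into smaller', 'pieces']
```

Joining the chunks back with spaces reproduces the sample sentence, which shows that chunking only splits the text and does not drop or add words.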
2. Which of the following is a correct way to create overlapping text chunks in Python?
easy
A. chunks = [text[i:i+chunk_size] for i in range(0, len(text), overlap)]
B. chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
C. chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size - overlap)]
D. chunks = [text[i:i+chunk_size] for i in range(overlap, len(text), chunk_size)]

Solution

  1. Step 1: Understand overlapping chunk logic

    To create overlapping chunks, the step size must be smaller than chunk size by the overlap amount.
  2. Step 2: Check the range step in options

    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size - overlap)] uses chunk_size - overlap as step, correctly creating overlaps.
  3. Final Answer:

    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size - overlap)] -> Option C
  4. Quick Check:

    Overlap step = chunk_size - overlap [OK]
Hint: Overlap step = chunk size minus overlap length [OK]
Common Mistakes:
  • Using chunk_size as step (no overlap)
  • Using overlap as step (too small steps)
  • Starting range at overlap instead of zero
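The comprehension in Option C can be wrapped in a small function that also guards the step size. This is a sketch under assumptions: the function name `chunk_with_overlap` and the input string are made up for illustration, and chunks are measured in characters as in the options above.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Return character chunks where each chunk starts chunk_size - overlap after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        # overlap == chunk_size would give a step of 0, which range() rejects;
        # overlap > chunk_size would give a negative step and silently yield no chunks.
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_with_overlap("chunking", chunk_size=4, overlap=1)
print(chunks)
# ['chun', 'nkin', 'ng']
```

Each chunk begins with the last character of the one before it, which is the one-character overlap requested.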
3. Given text = 'abcdefghij', chunk_size = 4, and overlap = 2, what is the output of this code?
chunks = [text[i:i+chunk_size] for i in range(0, len(text)-overlap, chunk_size - overlap)]
print(chunks)
medium
A. ['abcd', 'cdef', 'efgh', 'ghij']
B. ['abcd', 'efgh', 'ij']
C. ['abcd', 'bcde', 'cdef', 'defg']
D. ['abcd', 'bcdf', 'cdeg', 'defh']

Solution

  1. Step 1: Calculate step size

    Step = chunk_size - overlap = 4 - 2 = 2.
  2. Step 2: Generate chunks using step 2

    Chunks are:
    i=0: text[0:4] = 'abcd'
    i=2: text[2:6] = 'cdef'
    i=4: text[4:8] = 'efgh'
    i=6: text[6:10] = 'ghij'
  3. Final Answer:

    ['abcd', 'cdef', 'efgh', 'ghij'] -> Option A
  4. Quick Check:

    Chunks overlap by 2 chars = ['abcd', 'cdef', 'efgh', 'ghij'] [OK]
Hint: Step = chunk size minus overlap; slice text accordingly [OK]
Common Mistakes:
  • Ignoring overlap and stepping by chunk size
  • Wrong slicing indices
  • Confusing overlap with chunk size
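The stop bound `len(text)-overlap` in the question's code is easy to overlook. The sketch below reuses the question's values and compares it with the plain `len(text)` bound; the variable names `with_trim` and `without_trim` are introduced here only for illustration.

```python
text = "abcdefghij"
chunk_size, overlap = 4, 2
step = chunk_size - overlap  # 2

# Stop bound from the question: the last start index is 6.
with_trim = [text[i:i + chunk_size] for i in range(0, len(text) - overlap, step)]

# Plain len(text) as the stop bound: start index 8 is also visited.
without_trim = [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(with_trim)     # ['abcd', 'cdef', 'efgh', 'ghij']
print(without_trim)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']

# Every pair of neighbouring chunks shares exactly `overlap` characters.
shared = [a[-overlap:] == b[:overlap] for a, b in zip(with_trim, with_trim[1:])]
print(shared)        # [True, True, True]
```

With the plain bound, the extra trailing chunk 'ij' only repeats characters already contained in 'ghij', so trimming the stop bound avoids a redundant final chunk in this example.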
4. This code aims to chunk text with overlap but has a bug:
chunk_size = 5
overlap = 2
chunks = []
for i in range(0, len(text), chunk_size + overlap):
    chunks.append(text[i:i+chunk_size])
print(chunks)

What is the error?
medium
A. Step size should be chunk_size - overlap, not chunk_size + overlap
B. Chunk size should be increased by overlap
C. Overlap should be zero for chunking
D. The loop should start at overlap, not zero

Solution

  1. Step 1: Understand step size for overlapping chunks

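    To create overlap, the step size must be smaller than the chunk size by the overlap amount.
  2. Step 2: Identify incorrect step in code

    The code steps by chunk_size + overlap, so each chunk starts overlap characters past the end of the previous one. This leaves gaps between chunks instead of overlap.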
  3. Final Answer:

    Step size should be chunk_size - overlap, not chunk_size + overlap -> Option A
  4. Quick Check:

    Overlap step = chunk_size - overlap [OK]
Hint: Overlap step = chunk size minus overlap, not plus [OK]
Common Mistakes:
  • Adding overlap instead of subtracting
  • Setting overlap to zero incorrectly
  • Changing loop start index wrongly
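The gaps caused by the buggy step can be made visible by listing which characters never land in any chunk. This is an illustrative sketch: the helper `skipped_chars` and the 20-character sample string are assumptions, since the original snippet does not define `text`.

```python
def skipped_chars(text: str, chunk_size: int, step: int) -> str:
    """Return the characters of text that are not covered by any chunk."""
    covered = set()
    for i in range(0, len(text), step):
        # Record every index that falls inside the chunk starting at i.
        covered.update(range(i, min(i + chunk_size, len(text))))
    return "".join(ch for idx, ch in enumerate(text) if idx not in covered)


text = "abcdefghijklmnopqrst"  # hypothetical sample, 20 characters
chunk_size, overlap = 5, 2

buggy = skipped_chars(text, chunk_size, chunk_size + overlap)  # step 7
fixed = skipped_chars(text, chunk_size, chunk_size - overlap)  # step 3

print(repr(buggy))  # 'fgmnt'
print(repr(fixed))  # ''
```

With the step of 7, characters between chunks are lost entirely, while the corrected step of 3 covers every character and repeats two of them in each neighbouring pair.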
5. You have a very long document and want to chunk it for an AI model. You want each chunk to have 100 words and overlap by 20 words to keep context. Which strategy balances chunk size and context best?
hard
A. Use chunk size 80 and step size 100 to create non-overlapping chunks
B. Use chunk size 100 and step size 100 to create overlapping chunks
C. Use chunk size 120 and step size 100 to create overlapping chunks
D. Use chunk size 100 and step size 80 (100 - 20) to create overlapping chunks

Solution

  1. Step 1: Define chunk and step sizes for overlap

    Chunk size is 100 words, overlap is 20 words, so step size = 100 - 20 = 80.
  2. Step 2: Choose correct step size to maintain overlap

    A step size of 80 means each chunk starts 80 words after the previous one, so consecutive chunks share 20 words.
  3. Final Answer:

    Use chunk size 100 and step size 80 (100 - 20) to create overlapping chunks -> Option D
  4. Quick Check:

    Step = chunk size - overlap = 80 [OK]
Hint: Step size = chunk size minus overlap for best context [OK]
Common Mistakes:
  • Using step size larger than chunk size
  • Setting overlap to zero accidentally
  • Confusing chunk size with step size
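The 100-word chunk with an 80-word step can be checked on a synthetic document. In this sketch the function name `chunk_word_list`, the 260-word placeholder document, and the trimmed stop bound (borrowed from problem 3) are illustrative assumptions rather than part of the original answer.

```python
def chunk_word_list(words: list[str], chunk_size: int = 100, overlap: int = 20) -> list[list[str]]:
    """Chunk a list of words so that consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # 100 - 20 = 80
    # Stop early so the final chunk is not just a repeat of the previous chunk's tail.
    stop = max(len(words) - overlap, 1)
    return [words[i:i + chunk_size] for i in range(0, stop, step)]


# Synthetic 260-word document: w0, w1, ..., w259.
words = [f"w{i}" for i in range(260)]
chunks = chunk_word_list(words)

print([len(c) for c in chunks])            # [100, 100, 100]
print(chunks[0][-20:] == chunks[1][:20])   # True
print(chunks[-1][-1])                      # w259
```

The chunks start at words 0, 80, and 160, each neighbouring pair shares 20 words, and the last chunk reaches the end of the document.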