
Text splitters in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding the purpose of text splitters

Why do we use text splitters when preparing data for language models?

A. To break long text into smaller chunks that fit model input limits
B. To translate text into different languages automatically
C. To remove all punctuation from the text
D. To convert text into numerical vectors directly
💡 Hint

Think about the input size limits of language models.

Predict Output
intermediate
Output of a simple character-based text splitter

What is the output of this code that splits text into chunks of 5 characters?

text = 'HelloWorld!'
chunks = [text[i:i+5] for i in range(0, len(text), 5)]
print(chunks)
A. ['Hello', 'World', '']
B. ['Hello', 'World!']
C. ['Hello', 'World', '!']
D. ['HelloW', 'orld!']
💡 Hint

Look at how range() steps by 5 and what a slice returns when fewer than 5 characters remain.

Model Choice
advanced
Choosing the best text splitter for semantic search

You want to split a long document into chunks that keep sentences intact for semantic search. Which splitter is best?

A. Random splitter that splits text at random positions
B. Character splitter that cuts every 100 characters
C. Word splitter that splits text every 10 words regardless of sentence boundaries
D. Sentence splitter that splits text at sentence boundaries
💡 Hint

Think about preserving meaning in chunks.
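
To make "preserving meaning" concrete, here is a minimal sketch of a sentence-aware chunker. It is an illustration only: the function names `split_sentences` and `chunk_by_sentences`, the `max_chars` parameter, and the naive rule that a sentence ends at '.', '!' or '?' followed by whitespace are all assumptions, not part of any particular library.

```python
import re

def split_sentences(text):
    # Assumed naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def chunk_by_sentences(text, max_chars):
    # Pack whole sentences into chunks of at most max_chars characters.
    # A single sentence longer than max_chars is kept whole rather than cut.
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = "Text splitters cut documents. Each chunk is embedded. Search runs on chunks."
print(chunk_by_sentences(doc, max_chars=60))
# ['Text splitters cut documents. Each chunk is embedded.', 'Search runs on chunks.']
```

No chunk ends in the middle of a sentence, which a fixed character or word count cannot guarantee.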

Hyperparameter
advanced
Effect of chunk overlap in text splitting

What is the main effect of increasing chunk overlap when splitting text?

A. It increases redundancy but helps preserve context across chunks
B. It reduces the total number of chunks but loses context between chunks
C. It speeds up processing by reducing chunk size drastically
D. It removes stopwords automatically from each chunk
💡 Hint

Overlap means repeating some text in adjacent chunks.
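
The repetition described in the hint can be sketched with a character splitter that takes an overlap setting. This is a hedged illustration: `split_with_overlap`, `chunk_size`, and `overlap` are hypothetical names chosen for this example, and real splitters usually count tokens or respect separators rather than raw characters.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Split text into character chunks; adjacent chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    # Each new chunk starts (chunk_size - overlap) characters after the previous one.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(split_with_overlap("HelloWorld!", chunk_size=5, overlap=0))
# ['Hello', 'World', '!']
print(split_with_overlap("HelloWorld!", chunk_size=5, overlap=2))
# ['Hello', 'loWor', 'orld!', 'd!']
```

With an overlap of 2, the same text yields four chunks instead of three, and each chunk begins with the last two characters of the one before it.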

🔧 Debug
expert
Identifying error in recursive text splitter code

What error does this recursive text splitter code raise?

def recursive_split(text, max_len):
    if len(text) <= max_len:
        return [text]
    else:
        split_point = text.rfind('.', 0, max_len)
        if split_point == -1:
            split_point = max_len
        return [text[:split_point]] + recursive_split(text[split_point:], max_len)

chunks = recursive_split('This is a sentence. This is another sentence.', 10)
print(chunks)
A. ValueError because rfind returns -1
B. RecursionError due to infinite recursion
C. TypeError because of wrong data type
D. No error, outputs correct chunks
💡 Hint

Check what happens when the remaining text starts with '.', so rfind returns 0 and text[split_point:] returns the same text again.
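
For comparison, here is a sketch of one possible variant of the splitter above that always consumes at least one character per call. The name `recursive_split_safe` and the choice to keep each period with the text before it are assumptions made for this example; other repairs are equally valid.

```python
def recursive_split_safe(text, max_len):
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(text) <= max_len:
        return [text]
    # Look for the last period within the first max_len characters.
    split_point = text.rfind('.', 0, max_len)
    # Cut just after the period so it stays with its sentence;
    # fall back to a hard cut at max_len when no period is found.
    # Either way split_point is at least 1, so the remainder always shrinks.
    split_point = split_point + 1 if split_point != -1 else max_len
    return [text[:split_point]] + recursive_split_safe(text[split_point:], max_len)

chunks = recursive_split_safe('This is a sentence. This is another sentence.', 10)
print(chunks)
# ['This is a ', 'sentence.', ' This is a', 'nother sen', 'tence.']
```

The chunks still break inside words because `max_len` is only 10, but every call makes progress and joining the chunks reproduces the original text.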