Text splitters break long text into smaller parts. The key metric is chunk quality: how well the text is split without losing meaning or context. Good splits keep sentences whole and keep related ideas together, which helps models make sense of each chunk on its own.
Text splitters in Prompt Engineering / GenAI - Model Metrics & Evaluation
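A minimal sketch of a sentence-aware splitter (the name `split_text` and the naive regex sentence detector are my own illustration, not a library API): it packs whole sentences into chunks so no sentence is ever cut mid-way.

```python
import re

def split_text(text, max_chunk_chars=300):
    """Greedily pack whole sentences into chunks of at most max_chunk_chars.

    Sentences are detected with a naive end-of-sentence regex; a single
    sentence longer than the limit becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chunk_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Because chunks only ever end at sentence boundaries, the "sentence breaks inside chunks" count for this splitter is zero by construction.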
Example of text splitter evaluation:
- Original text length: 960 characters (the chunk sizes below sum to 1,000 because each of the two 20-character overlaps is counted in both neighboring chunks)
- Split into chunks:
  - Chunk 1: 300 chars
  - Chunk 2: 350 chars
  - Chunk 3: 350 chars
Evaluation:
- Overlap between consecutive chunks: 20 chars (good for context)
- Sentence breaks inside chunks: 0 (ideal)
- Meaning preserved: 95% (human score)
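The size and overlap figures above can be measured directly from the chunks. A small sketch (function names are my own, chosen for illustration):

```python
def measure_overlap(prev_chunk, next_chunk, max_check=100):
    """Length of the longest suffix of prev_chunk that is also a prefix of next_chunk."""
    limit = min(len(prev_chunk), len(next_chunk), max_check)
    for k in range(limit, 0, -1):
        if prev_chunk[-k:] == next_chunk[:k]:
            return k
    return 0

def chunk_report(chunks):
    """Chunk sizes plus the measured overlap between each consecutive pair."""
    overlaps = [measure_overlap(a, b) for a, b in zip(chunks, chunks[1:])]
    return {"sizes": [len(c) for c in chunks], "overlaps": overlaps}
```

Sentence-boundary and meaning-preservation scores, by contrast, need a reference segmentation or a human rating; they cannot be read off the chunks alone.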
No confusion matrix applies directly to text splitting, but chunk overlap and sentence boundary accuracy play a similar diagnostic role.
For text splitters, think of precision as the fraction of splits that land in the right place (not breaking sentences), and recall as the fraction of important split points (like paragraph ends) that the splitter actually finds.
High precision, low recall: Splits only at perfect points but misses some natural breaks. Result: chunks may be too big.
High recall, low precision: Splits at many points, including bad ones. Result: chunks may be too small or cut sentences.
Good text splitters balance both to keep chunks meaningful and manageable.
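Treating split positions as character offsets, the precision/recall analogy above can be computed directly. A sketch, assuming a human-annotated set of "natural break" offsets to compare against (the function name is my own):

```python
def split_point_metrics(predicted, reference):
    """Precision and recall over split positions (character offsets).

    predicted: offsets where the splitter cut the text.
    reference: offsets a human marked as natural breaks (e.g. paragraph ends).
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

In practice an exact-offset match is strict; a tolerance window (e.g. within a few characters of a reference break) is a common relaxation.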
- Good: Sentence boundary accuracy > 95%, chunk overlap 10-30 chars, chunk size consistent, meaning preserved > 90%
- Bad: Sentence breaks inside chunks > 20%, chunk overlap 0 or very large (losing context), chunks too uneven or too small, meaning preserved < 70%
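A minimal helper (my own naming, not from any library) that turns the checklist above into a pass/fail check; accuracy and meaning scores are taken as fractions between 0 and 1:

```python
def evaluate_splitter(boundary_accuracy, overlap_chars, meaning_score):
    """Flag a splitter as 'good' or 'needs work' using the rules of thumb above.

    Thresholds mirror the checklist: boundary accuracy > 0.95,
    overlap of 10-30 characters, meaning preserved > 0.90.
    """
    checks = [
        boundary_accuracy > 0.95,
        10 <= overlap_chars <= 30,
        meaning_score > 0.90,
    ]
    return "good" if all(checks) else "needs work"
```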
Common pitfalls:
- Ignoring sentence boundaries causes chunks that confuse models.
- Too little overlap loses context between chunks.
- Too much overlap wastes space and slows processing.
- Evaluating only chunk size without meaning can mislead.
- Using only automatic metrics without human checks misses quality issues.
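The overlap trade-off is easiest to see in a fixed-size sliding-window splitter, a minimal sketch (my own naming): each chunk repeats the tail of the previous one so context carries across the boundary, and the larger the overlap, the more characters are processed twice.

```python
def sliding_chunks(text, size=300, overlap=20):
    """Fixed-size chunks where each repeats the last `overlap` chars of the previous.

    overlap must be smaller than size, or the window would never advance.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

With `size=300` and `overlap=20`, each window advances 280 characters, so roughly 7% of the text is duplicated; at `overlap=150` that cost rises to half the text.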
Your text splitter creates chunks with 98% sentence boundary accuracy but only 10 characters overlap between chunks. Is this good?
Answer: Mostly, yes. Sentence boundaries are respected, which keeps each chunk's meaning clear. However, 10 characters of overlap is probably too little to carry context between chunks; increasing the overlap slightly helps models connect ideas across adjacent chunks.