Prompt Engineering / GenAI | ~10 mins

Text splitters in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to split the text into words using spaces.

words = text[1](" ")
Options:
A) .replace
B) .split
C) .join
D) .strip
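Common Mistakes:
  • Using join instead of split
  • Forgetting to specify the separator (split() with no argument splits on any run of whitespace, which differs from split(" ") when spaces repeat)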
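A quick check in plain Python shows what the blank does. str.split(" ") cuts the string at every single space, and join does the reverse by rebuilding one string from a list. The sample sentence is made up for illustration.

```python
text = "Text splitters break long text"

# Blank [1] is .split: cut the string at every single space.
words = text.split(" ")
print(words)  # ['Text', 'splitters', 'break', 'long', 'text']

# join is the inverse: it glues a list back into one string.
print(" ".join(words))  # Text splitters break long text

# With a double space, split(" ") keeps an empty string; split() does not.
print("long  text".split(" "))  # ['long', '', 'text']
print("long  text".split())     # ['long', 'text']
```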
2. Fill in the blank
medium

Complete the code to split the text into sentences using the period character.

sentences = text[1](".")
Options:
A) .join
B) .replace
C) .strip
D) .split
Common Mistakes:
  • Using join instead of split
  • Using replace, which changes the text but does not split it
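Splitting on "." does produce sentences, but the raw result usually needs a cleanup step. This minimal sketch uses a made-up three-sentence string to show the trailing empty piece and the leading spaces that split(".") leaves behind.

```python
text = "Split me. Into parts. Done."

# Blank [1] is .split: cut at every period.
sentences = text.split(".")
print(sentences)  # ['Split me', ' Into parts', ' Done', '']

# The final period leaves an empty string, and later pieces keep their
# leading space, so strip whitespace and drop the empty pieces.
cleaned = [s.strip() for s in sentences if s.strip()]
print(cleaned)  # ['Split me', 'Into parts', 'Done']
```

The same approach also splits abbreviations and decimals such as "3.14", so it works best as a rough first pass.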
3. Fill in the blank
hard

Fix the code by filling the blank so that it splits the text at each comma.

parts = text[1](",")
Options:
A) .split
B) .join
C) .replace
D) .strip
Common Mistakes:
  • Using join, which combines strings instead of splitting them
  • Using replace, which changes the text but does not split it
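The three distractors behave quite differently, and a short comparison makes that visible. In this sketch (the fruit list is illustrative), split returns a list, replace returns one rewritten string, and join turns a list back into a single string.

```python
text = "apples,pears,plums"

# Blank [1] is .split: returns a list of the comma-separated pieces.
parts = text.split(",")
print(parts)  # ['apples', 'pears', 'plums']

# replace only rewrites characters; the result is still a single string.
print(text.replace(",", " "))  # apples pears plums

# join goes the other way: it combines a list into one string.
print(",".join(parts) == text)  # True
```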
4. Fill in the blank
hard

Fill both blanks to create a dictionary with words as keys and their lengths as values, only for words longer than 3 letters.

lengths = {word[1] for word in words if len(word) [2] 3}
Options:
A) : len(word)
B) >
C) <
D) ==
Common Mistakes:
  • Using < instead of >
  • Forgetting the colon between key and value
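Here is the completed comprehension run on a sample word list, which is illustrative only. The key: value pair comes from blank [1], and the length filter comes from blank [2].

```python
words = ["the", "text", "splitter", "is", "handy"]

# Blank [1] is ": len(word)" (key: value) and blank [2] is ">".
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # {'text': 4, 'splitter': 8, 'handy': 5}
```

Without the colon, {word for ...} would build a set of words instead of a dictionary.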
5. Fill in the blank
hard

Fill all three blanks to create a dictionary with uppercase words as keys, their counts as values, only for words appearing more than once.

result = {[1]: [2] for [3], count in word_counts.items() if count > 1}
Options:
A) word.upper()
B) count
C) word
D) count.upper()
Common Mistakes:
  • Using count.upper(), which fails because count is an int and has no upper() method
  • Swapping key and value positions
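The exercise does not show where word_counts comes from. This sketch assumes it is a collections.Counter built from a made-up sentence, then fills the three blanks.

```python
from collections import Counter

# Assumed input: Counter maps each word to how many times it appears.
word_counts = Counter("the cat and the dog and the bird".split())

# Blanks: [1] = word.upper(), [2] = count, [3] = word.
result = {word.upper(): count for word, count in word_counts.items() if count > 1}
print(result)  # {'THE': 3, 'AND': 2}
```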

Practice - 5 Questions with Solutions

1. What is the main purpose of a text splitter in AI applications?
easy
A. To translate text into different languages
B. To generate new text from a prompt
C. To break long text into smaller, manageable pieces
D. To summarize text into a single sentence

Solution

  1. Step 1: Understand the role of text splitters

    Text splitters are designed to divide long text into smaller parts for easier processing.
  2. Step 2: Compare options to the definition

    Only Option C describes breaking text into smaller pieces, which matches the purpose of text splitters.
  3. Final Answer:

    To break long text into smaller, manageable pieces -> Option C
  4. Quick Check:

    Text splitter purpose = break text [OK]
Hint: Text splitters cut text into chunks for easier handling [OK]
Common Mistakes:
  • Confusing splitting with translation
  • Thinking splitters summarize text
  • Assuming splitters generate new text
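To make the definition concrete, here is a minimal sketch of the simplest kind of splitter: fixed-size character chunks with no overlap. chunk_text is a hypothetical helper written for illustration, not a library function.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


print(chunk_text("abcdefghij", 4))  # ['abcd', 'efgh', 'ij']
```

No text is created or lost: joining the chunks gives back the original string. That is the key difference from translation, summarization, or generation.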
2. Which of the following is the correct way to set chunk size and overlap in a text splitter?
easy
A. chunk_size='100', overlap=20
B. chunkSize=100, overlap=20
C. chunk_size=100, overlap=twenty
D. chunk_size=100, overlap=20

Solution

  1. Step 1: Identify correct parameter names and types

    Parameters should be named with underscores and numeric values for size and overlap.
  2. Step 2: Check each option for syntax and type correctness

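    Option D uses snake_case parameter names with integer values. Option A passes chunk_size as the string '100', Option B uses camelCase (chunkSize), and Option C passes the bare name twenty instead of the number 20 (a NameError unless such a variable happens to exist).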
  3. Final Answer:

    chunk_size=100, overlap=20 -> Option D
  4. Quick Check:

    Correct param names and numeric values = chunk_size=100, overlap=20 [OK]
Hint: Use underscores and numbers for chunk size and overlap [OK]
Common Mistakes:
  • Using camelCase instead of snake_case
  • Passing string instead of number for overlap
  • Misspelling parameter names
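The rules behind Option D (snake_case names and integer values) can also be enforced in code. The sketch below uses a hypothetical SplitterConfig dataclass; it is not any particular library's API. Real libraries choose their own parameter names. LangChain's text splitters, for example, call the overlap parameter chunk_overlap rather than overlap.

```python
from dataclasses import dataclass


@dataclass
class SplitterConfig:
    """Hypothetical settings object for a text splitter."""
    chunk_size: int
    overlap: int

    def __post_init__(self) -> None:
        # Dataclasses do not enforce type hints, so validate explicitly.
        for name in ("chunk_size", "overlap"):
            value = getattr(self, name)
            if not isinstance(value, int) or isinstance(value, bool):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.overlap < self.chunk_size:
            raise ValueError("overlap must be >= 0 and smaller than chunk_size")


print(SplitterConfig(chunk_size=100, overlap=20))  # Option D is accepted

try:
    SplitterConfig(chunk_size="100", overlap=20)  # Option A: string value
except TypeError as err:
    print(err)  # chunk_size must be an int, got str

try:
    SplitterConfig(chunkSize=100, overlap=20)  # Option B: camelCase name
except TypeError as err:
    print(type(err).__name__)  # TypeError (unexpected keyword argument)
```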
3. Given the following Python code using a text splitter:
text = "Hello world! This is a test of text splitting."
chunk_size = 12
overlap = 4
splitter = TextSplitter(chunk_size=chunk_size, overlap=overlap)
chunks = splitter.split(text)
print(chunks)

What is the expected output?
medium
A. ["Hello world!", "world! This is", "This is a test", "a test of text", "of text splitting."]
B. ["Hello world! This", "This is a test of", "test of text splitting."]
C. ["Hello world! This is a test of text splitting."]
D. ["Hello", "world!", "This", "is", "a", "test", "of", "text", "splitting."]

Solution

  1. Step 1: Understand chunk size and overlap effect

    Chunk size 12 means each piece holds up to 12 characters; overlap 4 means the next chunk starts 4 characters before the previous one ends.
  2. Step 2: Apply splitting logic to the text

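    Each new chunk should start 12 - 4 = 8 characters after the previous one. The first chunk is "Hello world!" (exactly 12 characters), and position 8 falls in the middle of "world!" (at the "r"). The expected output instead starts the second chunk at "world!", which suggests that this TextSplitter keeps whole words, so consecutive chunks repeat one or two words. Option A is the only choice whose chunks stay close to 12 characters while sharing text with the previous chunk. Several of its chunks do run past 12 characters (for example, "world! This is" has 14), so the limit here appears to act as a target rather than a hard cap.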
  3. Final Answer:

    ["Hello world!", "world! This is", "This is a test", "a test of text", "of text splitting."] -> Option A
  4. Quick Check:

    Chunk size 12 + overlap 4 = overlapping chunks [OK]
Hint: Chunk size limits length; overlap repeats last part [OK]
Common Mistakes:
  • Ignoring overlap and making chunks non-overlapping
  • Using wrong chunk sizes
  • Returning entire text as one chunk
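For comparison, here is a minimal sketch of a strict character window with the same settings. sliding_window is a hypothetical helper, not the TextSplitter from the question. It never exceeds chunk_size and always overlaps by exactly overlap characters, but it cuts through words. That is why many splitters prefer to place chunk edges at separators such as spaces.

```python
def sliding_window(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-width character windows; each starts chunk_size - overlap after the last."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks


text = "Hello world! This is a test of text splitting."
for chunk in sliding_window(text, chunk_size=12, overlap=4):
    print(repr(chunk))
# 'Hello world!'
# 'rld! This is'
# 's is a test '
# 'est of text '
# 'ext splittin'
# 'tting.'
```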
4. Consider this code snippet that tries to split text but raises an error:
text = "Sample text for splitting."
splitter = TextSplitter(chunk_size='10', overlap=3)
chunks = splitter.split(text)

What is the most likely cause of the error?
medium
A. chunk_size should be an integer, not a string
B. overlap cannot be less than 5
C. TextSplitter requires a minimum chunk_size of 20
D. The text variable is too short to split

Solution

  1. Step 1: Check parameter types

    chunk_size is given as the string '10' instead of the integer 10, so any arithmetic or comparison the splitter performs with it (such as '10' - 3) raises a TypeError.
  2. Step 2: Validate other options

    Overlap 3 is valid, no minimum chunk_size of 20 is required, and a short text simply yields fewer chunks rather than an error.
  3. Final Answer:

    chunk_size should be an integer, not a string -> Option A
  4. Quick Check:

    Parameter type mismatch = chunk_size should be an integer, not a string [OK]
Hint: Use numbers, not strings, for chunk size and overlap [OK]
Common Mistakes:
  • Passing chunk_size as string
  • Assuming overlap minimum is 5
  • Thinking text length causes error
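Why does a string chunk_size fail? Sooner or later a splitter has to do arithmetic or comparisons with it. The sketch below uses a hypothetical window_starts helper (not the TextSplitter API) to show the TypeError that Python raises. It also shows the fix: convert the value with int() before it reaches the splitter.

```python
def window_starts(text_length: int, chunk_size: int, overlap: int) -> list[int]:
    """Hypothetical helper: start offsets of overlapping windows."""
    # Type hints are not enforced at runtime, so a str slips in until this line.
    step = chunk_size - overlap
    return list(range(0, text_length, step))


text = "Sample text for splitting."

try:
    window_starts(len(text), chunk_size="10", overlap=3)
except TypeError as err:
    print(err)  # unsupported operand type(s) for -: 'str' and 'int'

# Fix: pass an int (convert once if the value arrives as text, e.g. from a config file).
print(window_starts(len(text), chunk_size=int("10"), overlap=3))  # [0, 7, 14, 21]
```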
5. You have a very long document and want to split it for an AI model that can only process 500 tokens at a time. You want some context overlap to keep meaning. Which approach best balances chunk size and overlap?
hard
A. Set chunk_size to 600 tokens and overlap to 0 tokens
B. Set chunk_size to 500 tokens and overlap to 100 tokens
C. Set chunk_size to 400 tokens and overlap to 200 tokens
D. Set chunk_size to 100 tokens and overlap to 50 tokens

Solution

  1. Step 1: Understand model token limit and overlap purpose

    The model can process 500 tokens max; overlap adds repeated context to help understanding.
  2. Step 2: Evaluate options for chunk size and overlap

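    Option B uses a chunk size of 500 (the model's maximum) with a 100-token overlap, so each chunk repeats the last 100 tokens of the previous one and adds 400 new tokens. Option A exceeds the 500-token limit and has no overlap. Option C repeats half of every chunk (200 of 400 tokens) and therefore needs many more chunks. Option D makes chunks so small that each one carries little context.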
  3. Final Answer:

    Set chunk_size to 500 tokens and overlap to 100 tokens -> Option B
  4. Quick Check:

    Chunk size ≤ 500 with overlap for context = Set chunk_size to 500 tokens and overlap to 100 tokens [OK]
Hint: Keep chunk size at max limit, add moderate overlap [OK]
Common Mistakes:
  • Exceeding model token limit
  • Setting overlap too large or zero
  • Using very small chunk sizes unnecessarily
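The trade-off between Options B and C can be checked by counting chunks. The sketch below works on token positions and assumes the document has already been tokenized by the model's tokenizer (not shown), so tokens[start:end] would give each chunk. token_windows is a hypothetical helper. For a 1,300-token document, chunk_size=500 with overlap=100 advances 400 tokens per chunk and needs 3 chunks. By contrast, chunk_size=400 with overlap=200 advances only 200 tokens per chunk and needs 6.

```python
def token_windows(n_tokens: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) token ranges; end is exclusive."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    windows = []
    for start in range(0, n_tokens, step):
        end = min(start + chunk_size, n_tokens)
        windows.append((start, end))
        if end == n_tokens:  # the last window reaches the end of the document
            break
    return windows


print(token_windows(1300, chunk_size=500, overlap=100))
# [(0, 500), (400, 900), (800, 1300)]
print(len(token_windows(1300, chunk_size=400, overlap=200)))  # 6
```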