
Document loading and chunking strategies in Agentic AI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why chunk documents before processing?

Imagine you have a very long document to analyze with an AI model. Why is it useful to split this document into smaller chunks before processing?

A. Because chunking removes important information from the document.
B. Because chunking increases the total size of the document for better accuracy.
C. Because smaller chunks reduce memory use and help the model focus on manageable parts.
D. Because chunking merges multiple documents into one large file.
Hint: Think about how computers handle large amounts of data and model input limits.

Predict Output
intermediate
Output of chunking code snippet

What is the output of this Python code that chunks a text into pieces of 5 words?

text = 'Machine learning helps computers learn from data and improve over time'
words = text.split()
chunks = [' '.join(words[i:i+5]) for i in range(0, len(words), 5)]
print(chunks)
A. ['Machine learning helps computers learn from', 'data and improve over time']
B. ['Machine learning helps computers learn', 'from data and improve over', 'time']
C. ['Machine learning helps computers', 'learn from data and improve', 'over time']
D. ['Machine learning helps', 'computers learn from', 'data and improve', 'over time']
Hint: Look at how the range steps by 5 and how words are joined.

Model Choice
advanced
Choosing chunk size for embedding models

You want to create vector embeddings from a large document for a search system. Which chunk size strategy is best to balance context and model limits?

A. Use moderate chunk sizes (100-300 words) to balance context and model input limits.
B. Use very small chunks (1-5 words) to get detailed embeddings.
C. Use very large chunks (thousands of words) to keep full context.
D. Do not chunk; embed the entire document at once.
Hint: Consider model input size limits and the need for meaningful context.

Metrics
advanced
Evaluating chunking impact on retrieval accuracy

You test two chunking strategies for document search: small chunks (50 words) and large chunks (500 words). Which metric would best show whether chunk size affects search accuracy?

A. Recall@k, measuring the fraction of relevant documents retrieved in the top k results.
B. Number of chunks created from the document.
C. Training loss of the embedding model.
D. Mean Squared Error (MSE) between chunk lengths.
Hint: Think about how to measure search quality and relevance.

🔧 Debug
expert
Debugging chunk overlap code error

What error does this code raise when trying to create overlapping chunks of size 4 with step 2?

text = 'AI models learn patterns from data to make predictions'
words = text.split()
chunks = [words[i:i+4] for i in range(0, len(words), 2)]
print(chunks[10])
A. TypeError because words is not iterable.
B. No error; prints the 11th chunk correctly.
C. SyntaxError due to missing colon in list comprehension.
D. IndexError because chunks[10] does not exist.
Hint: Check how many chunks are created and whether index 10 is valid.

Practice

1. What is the main purpose of chunking in document loading for AI?
easy
A. To translate documents into different languages
B. To combine multiple documents into one large file
C. To break large documents into smaller, manageable pieces
D. To remove all punctuation from the text

Solution

  1. Step 1: Understand chunking concept

    Chunking means splitting big documents into smaller parts so AI can handle them easily.
  2. Step 2: Identify the main goal

    The goal is to make documents manageable, not to combine or translate them.
  3. Final Answer:

    To break large documents into smaller, manageable pieces -> Option C
  4. Quick Check:

    Chunking = breaking big documents [OK]
Hint: Chunking means splitting big text into small parts [OK]
Common Mistakes:
  • Thinking chunking combines documents
  • Confusing chunking with translation
  • Assuming chunking removes punctuation
2. Which of the following is the correct way to specify chunk size and overlap in a document loader?
easy
A. loader.load(size=500, overlap=50)
B. loader.load(chunk_size=500, overlap=50)
C. loader.load(chunk=500, overlap=50)
D. loader.load(chunk_size=50, overlap=500)

Solution

  1. Step 1: Check parameter names

    This question's loader names its parameters chunk_size and overlap; exact names vary between libraries.
  2. Step 2: Verify values make sense

    Chunk size should be larger than overlap, so chunk_size=500 with overlap=50 makes sense (Option D swaps them).
  3. Final Answer:

    <code>loader.load(chunk_size=500, overlap=50)</code> -> Option B
  4. Quick Check:

    Correct params = chunk_size and overlap [OK]
Hint: Chunk size param is chunk_size, overlap param is overlap [OK]
Common Mistakes:
  • Using wrong parameter names like size or chunk
  • Swapping chunk size and overlap values
  • Using overlap larger than chunk size
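The quiz's `loader` is not tied to a specific library. The sketch below uses a hypothetical `HypotheticalLoader` class (not a real API) to show why the parameter names matter. When `chunk_size` and `overlap` are keyword-only parameters, a misspelled name such as `size=` is rejected with a `TypeError` before any chunking happens. It assumes both values are counted in characters.

```python
class HypotheticalLoader:
    """Stand-in for the quiz's `loader` object; not a real library class."""

    def __init__(self, text: str) -> None:
        self.text = text

    def load(self, *, chunk_size: int, overlap: int) -> list[str]:
        # Keyword-only parameters: unknown names such as size= or chunk=
        # make Python raise TypeError before this body runs.
        if not 0 <= overlap < chunk_size:
            raise ValueError("need 0 <= overlap < chunk_size")
        step = chunk_size - overlap  # each chunk starts this many characters after the previous one
        return [self.text[i:i + chunk_size] for i in range(0, len(self.text), step)]


loader = HypotheticalLoader("x" * 1200)
chunks = loader.load(chunk_size=500, overlap=50)  # Option B: accepted
print([len(c) for c in chunks])  # [500, 500, 300]

try:
    loader.load(size=500, overlap=50)  # Option A: wrong keyword name
except TypeError as err:
    print("TypeError:", err)
```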
3. Given this code snippet:
chunks = loader.load(chunk_size=100, overlap=20)
print(len(chunks))

If the original document has 250 characters, what will be the output?
medium
A. 4
B. 3
C. 2
D. 5

Solution

  1. Step 1: Calculate chunk positions

    Chunks start every (chunk_size - overlap) = 100 - 20 = 80 characters. Assuming the loader keeps starting chunks until the start position reaches the end of the document, the starts are 0, 80, 160, and 240.
  2. Step 2: Count chunks covering 250 characters

    Chunks at 0, 80, 160, and 240 cover the document. The last chunk is cut off at the document's end, so it spans only characters 240-249.
  3. Final Answer:

    4 -> Option A
  4. Quick Check:

    Chunks = ceil(document length / step) = ceil(250 / 80) = ceil(3.125) = 4 [OK]
Hint: Chunks start every chunk_size - overlap characters [OK]
Common Mistakes:
  • Ignoring overlap when counting chunks
  • Assuming chunks equal document length divided by chunk size
  • Not counting last partial chunk
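Here is a minimal sketch of the counting rule. Like the solution, it assumes that chunk_size and overlap are measured in characters and that a new chunk starts every chunk_size - overlap characters until the start position passes the end of the document. The helper name `chunk_spans` is hypothetical.

```python
import math


def chunk_spans(doc_len: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) character spans, with end clipped to the document length."""
    step = chunk_size - overlap  # distance between consecutive chunk starts
    return [(start, min(start + chunk_size, doc_len)) for start in range(0, doc_len, step)]


spans = chunk_spans(250, chunk_size=100, overlap=20)
print(spans)                        # [(0, 100), (80, 180), (160, 250), (240, 250)]
print(len(spans))                   # 4
print(math.ceil(250 / (100 - 20)))  # 4, the same count from the formula
```

Under this rule the last span, (240, 250), holds only 10 characters and sits entirely inside the previous one. A splitter that stops as soon as a chunk reaches the end of the document would return 3 chunks instead, so the count depends on how a particular loader handles the tail.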
4. You wrote this code but get an error:
chunks = loader.load(chunk_size=100, overlap=150)

What is the likely cause?
medium
A. Chunk size must be zero or negative
B. Chunk size and overlap must be equal
C. Missing import statement for loader
D. Overlap is larger than chunk size, causing invalid chunking

Solution

  1. Step 1: Check parameter relationship

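    Each new chunk starts chunk_size - overlap characters after the previous one. With chunk_size=100 and overlap=150, that step is -50, so the window would move backward instead of forward.
  2. Step 2: Identify error cause

    Setting overlap=150 with chunk_size=100 is therefore invalid, and the loader raises an error.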
  3. Final Answer:

    Overlap is larger than chunk size, causing invalid chunking -> Option D
  4. Quick Check:

    Overlap < chunk size [OK]
Hint: Overlap must be smaller than chunk size [OK]
Common Mistakes:
  • Setting overlap larger than chunk size
  • Assuming chunk size can be zero
  • Ignoring parameter constraints
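The question does not specify the loader, so its exact error message is unknown. The sketch below uses a hypothetical `validate_chunking` helper to show the arithmetic behind the rule. The step between chunk starts is chunk_size - overlap, so overlap=150 with chunk_size=100 gives a step of -50. Without an explicit check, Python's `range` with a negative step silently yields nothing, and a step of 0 makes `range` itself raise `ValueError`. A loader should therefore validate the values up front.

```python
def validate_chunking(chunk_size: int, overlap: int) -> int:
    """Hypothetical check: return the step between chunk starts or raise ValueError."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if overlap < 0:
        raise ValueError("overlap must not be negative")
    step = chunk_size - overlap
    if step <= 0:
        # overlap >= chunk_size: the next chunk would start at or before the current one
        raise ValueError(f"overlap ({overlap}) must be smaller than chunk_size ({chunk_size})")
    return step


print(validate_chunking(100, 20))  # 80

# Without a check, a naive loop over range(0, doc_len, step) yields no starts
# at all for a negative step (100 - 150 = -50).
print(list(range(0, 250, 100 - 150)))  # []

try:
    validate_chunking(100, 150)
except ValueError as err:
    print("ValueError:", err)
```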
5. You want to load a very long document for an AI model that understands context well but has a token limit of 512. Which chunking strategy is best?
hard
A. Use chunk size 256 with overlap 128 to keep context between chunks
B. Use chunk size 100 with overlap 0 to create many small chunks
C. Use chunk size 512 with zero overlap to maximize chunk length
D. Use chunk size 600 with overlap 100 to exceed token limit

Solution

  1. Step 1: Consider model token limit

    Model can handle max 512 tokens, so chunk size must be ≤512.
  2. Step 2: Choose overlap for context

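    Overlap carries context across chunk boundaries. A 256-token chunk with a 128-token overlap stays well within the 512-token limit, and each chunk shares half of its tokens with the next.
  3. Step 3: Evaluate other options

    Zero overlap (Option C) loses context at chunk boundaries. A 600-token chunk (Option D) exceeds the limit. 100-token chunks with no overlap (Option B) create many small fragments, which adds overhead and loses context between them.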
  4. Final Answer:

    Use chunk size 256 with overlap 128 to keep context between chunks -> Option A
  5. Quick Check:

    Chunk size ≤ token limit, with overlap to preserve context [OK]
Hint: Balance chunk size and overlap to fit token limit and context [OK]
Common Mistakes:
  • Ignoring token limit and using too large chunks
  • Using zero overlap losing context
  • Choosing too small chunks causing inefficiency
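Here is a sketch of option A. It assumes chunk_size and overlap are counted in tokens. The helper `token_windows` is hypothetical, and a list of placeholder strings stands in for real tokenizer output. In practice, lengths should be measured with the model's own tokenizer.

```python
def token_windows(tokens: list[str], chunk_size: int = 256, overlap: int = 128,
                  token_limit: int = 512) -> list[list[str]]:
    """Hypothetical helper: overlapping token windows that each fit the model's limit."""
    if chunk_size > token_limit:
        raise ValueError("chunk_size must not exceed the model's token limit")
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # 128 new tokens per window with the defaults
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]


# Placeholder strings stand in for real tokenizer output.
tokens = [f"tok{i}" for i in range(1000)]
windows = token_windows(tokens)

print(len(windows), max(len(w) for w in windows))  # 8 256
print(windows[0][-128:] == windows[1][:128])        # True: consecutive windows share 128 tokens
```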