
Document loading and chunking strategies in Agentic AI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Why chunk documents before processing?

Imagine you have a very long document to analyze with an AI model. Why is it useful to split this document into smaller chunks before processing?

A. Because chunking removes important information from the document.
B. Because chunking increases the total size of the document for better accuracy.
C. Because smaller chunks reduce memory use and help the model focus on manageable parts.
D. Because chunking merges multiple documents into one large file.
💡 Hint

Think about how computers handle large amounts of data and model input limits.
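To make this concrete, here is a minimal sketch of fixed-size word chunking; the 200-word budget and the synthetic document are made-up values for illustration, not limits from any particular model:

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into chunks of at most `size` words each."""
    words = text.split()
    return [' '.join(words[i:i + size]) for i in range(0, len(words), size)]

doc = 'word ' * 1000                # stand-in for a long document (1000 words)
chunks = chunk_words(doc, 200)      # 200 is a hypothetical per-call input budget
print(len(chunks))                           # → 5
print(max(len(c.split()) for c in chunks))   # → 200
```

Each call to the model then receives one bounded chunk instead of the whole document.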

Predict Output (intermediate)
Output of chunking code snippet

What is the output of this Python code that chunks a text into pieces of 5 words?

text = 'Machine learning helps computers learn from data and improve over time'
words = text.split()
chunks = [' '.join(words[i:i+5]) for i in range(0, len(words), 5)]
print(chunks)
A. ['Machine learning helps computers learn from', 'data and improve over time']
B. ['Machine learning helps computers learn', 'from data and improve over', 'time']
C. ['Machine learning helps computers', 'learn from data and improve', 'over time']
D. ['Machine learning helps', 'computers learn from', 'data and improve', 'over time']
💡 Hint

Look at how the range steps by 5 and how words are joined.

Model Choice (advanced)
Choosing chunk size for embedding models

You want to create vector embeddings from a large document for a search system. Which chunk size strategy is best to balance context and model limits?

A. Use moderate chunk sizes (100-300 words) to balance context and model input limits.
B. Use very small chunks (1-5 words) to get detailed embeddings.
C. Use very large chunks (thousands of words) to keep full context.
D. Do not chunk; embed the entire document at once.
💡 Hint

Consider model input size limits and the need for meaningful context.
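As one illustration (not the only valid strategy), a chunker can pack whole sentences into chunks up to a moderate word budget, keeping coherent context in each chunk while staying under a model's input limit; the 200-word default here is an assumption, not a fixed rule:

```python
import re

def pack_sentences(text: str, max_words: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_words` words."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(' '.join(current))  # flush the full chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(' '.join(current))
    return chunks

# Five 5-word sentences with a 12-word budget → two sentences per chunk.
text = 'a b c d e. ' * 5
print(pack_sentences(text, max_words=12))
```

Splitting on sentence boundaries avoids cutting a thought in half, which tends to produce more meaningful embeddings than hard word cutoffs.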

Metrics (advanced)
Evaluating chunking impact on retrieval accuracy

You test two chunking strategies for document search: small chunks (50 words) and large chunks (500 words). Which metric would best show if chunking size affects search accuracy?

A. Recall@k, measuring how many relevant documents are retrieved in the top k results.
B. Number of chunks created from the document.
C. Training loss of the embedding model.
D. Mean Squared Error (MSE) between chunk lengths.
💡 Hint

Think about how to measure search quality and relevance.
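For reference, Recall@k is straightforward to compute; the document IDs and rankings below are invented purely to illustrate comparing two strategies on the same query:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k retrieved results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Hypothetical rankings from two chunking strategies on one query:
relevant = {'d1', 'd2'}
small_chunk_results = ['d1', 'd9', 'd2', 'd4']   # made-up ranking
large_chunk_results = ['d7', 'd1', 'd8', 'd9']   # made-up ranking
print(recall_at_k(small_chunk_results, relevant, 3))  # → 1.0
print(recall_at_k(large_chunk_results, relevant, 3))  # → 0.5
```

Averaging this over many queries shows directly whether a chunking strategy helps or hurts retrieval.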

🔧 Debug (expert)
Debugging chunk overlap code error

What happens when this code tries to create overlapping chunks of size 4 with step 2?

text = 'AI models learn patterns from data to make predictions'
words = text.split()
chunks = [words[i:i+4] for i in range(0, len(words), 2)]
print(chunks[10])
A. TypeError, because words is not iterable.
B. No error; it prints the 11th chunk correctly.
C. SyntaxError, due to a missing colon in the list comprehension.
D. IndexError, because chunks[10] does not exist.
💡 Hint

Check how many chunks are created and if index 10 is valid.
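Whatever the answer, a useful debugging habit is to print how many chunks were actually produced and to guard any index before using it. This sketch wraps the snippet's logic in a function with the same size-4, step-2 parameters:

```python
def overlapping_chunks(words: list[str], size: int, step: int) -> list[list[str]]:
    """Chunks of `size` words starting every `step` words (overlap = size - step)."""
    return [words[i:i + size] for i in range(0, len(words), step)]

words = 'AI models learn patterns from data to make predictions'.split()
chunks = overlapping_chunks(words, size=4, step=2)
print(len(chunks))               # see how many chunks actually exist
index = 10
if index < len(chunks):          # guard before indexing
    print(chunks[index])
else:
    print(f'index {index} is out of range')
```

Note that slicing past the end of a list is safe in Python (it just returns a shorter chunk), but indexing past the end is not.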