Practice


1. What is the main purpose of chunking in document loading for AI?

easy

A. To translate documents into different languages

B. To combine multiple documents into one large file

C. To break large documents into smaller, manageable pieces

D. To remove all punctuation from the text

Solution

Step 1: Understand chunking concept
Chunking means splitting big documents into smaller parts, so that each piece fits within the model's input limit and can be processed or retrieved on its own.
Step 2: Identify the main goal
The goal is to make documents manageable, not to combine or translate them.
Final Answer:
To break large documents into smaller, manageable pieces -> Option C
Quick Check:
Chunking = breaking big documents [OK]

Hint: Chunking means splitting big text into small parts [OK]

Common Mistakes:

Thinking chunking combines documents
Confusing chunking with translation
Assuming chunking removes punctuation
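
As a minimal sketch of the idea, fixed-size chunking simply slices the text at regular intervals. This is plain Python with no loader library; `chunk_text` is a hypothetical helper written for illustration, and it uses no overlap.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Slice at every multiple of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


document = "Chunking splits a long document into smaller pieces. " * 3
pieces = chunk_text(document, chunk_size=40)
for piece in pieces:
    print(repr(piece))

# Nothing is translated, merged, or stripped: joining the pieces restores the text.
assert "".join(pieces) == document
```

The final assertion is the point of Question 1: chunking only splits the text. It does not translate it, combine it with other documents, or remove punctuation.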

2. Which of the following is the correct way to specify chunk size and overlap in a document loader?

easy

A. loader.load(size=500, overlap=50)

B. loader.load(chunk_size=500, overlap=50)

C. loader.load(chunk=500, overlap=50)

D. loader.load(chunk_size=50, overlap=500)

Solution

Step 1: Check parameter names
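The quiz's loader names these parameters chunk_size and overlap; options A and C use other names (size, chunk). Real libraries vary: LangChain's text splitters, for example, call the second parameter chunk_overlap.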
Step 2: Verify values make sense
Chunk size should be larger than overlap, so chunk_size=500 with overlap=50 makes sense; option D uses the right names but swaps the two values.
Final Answer:
loader.load(chunk_size=500, overlap=50) -> Option B
Quick Check:
Correct params = chunk_size and overlap [OK]

Hint: Chunk size param is chunk_size, overlap param is overlap [OK]

Common Mistakes:

Using wrong parameter names like size or chunk
Swapping chunk size and overlap values
Using overlap larger than chunk size
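
To see why only option B works, here is a sketch of a loader with that call signature. `TextLoader` and its `load` method are hypothetical stand-ins that mirror the quiz's `loader.load(chunk_size=..., overlap=...)`; they are not the API of any particular library.

```python
class TextLoader:
    """Hypothetical loader mirroring the quiz's loader.load(chunk_size=..., overlap=...) call."""

    def __init__(self, text: str) -> None:
        self.text = text

    def load(self, *, chunk_size: int, overlap: int) -> list[str]:
        # Keyword-only parameters: an unknown name such as size= or chunk=
        # makes Python raise TypeError before the method body runs.
        if overlap >= chunk_size:
            # Catches option D, where the two values are swapped.
            raise ValueError("overlap must be smaller than chunk_size")
        step = chunk_size - overlap
        return [self.text[i:i + chunk_size] for i in range(0, len(self.text), step)]


# 1,200 characters of repeating digits, so overlapping regions are easy to compare.
loader = TextLoader("".join(str(i % 10) for i in range(1200)))

chunks = loader.load(chunk_size=500, overlap=50)  # option B: accepted
print(len(chunks), [len(c) for c in chunks])

for bad_call in (
    lambda: loader.load(size=500, overlap=50),        # option A: unknown name
    lambda: loader.load(chunk=500, overlap=50),       # option C: unknown name
    lambda: loader.load(chunk_size=50, overlap=500),  # option D: swapped values
):
    try:
        bad_call()
    except (TypeError, ValueError) as err:
        print(type(err).__name__, err)
```

Making the parameters keyword-only means a misspelled name fails loudly instead of being silently ignored, and the explicit check turns swapped values into a clear error.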

3. Given this code snippet:

chunks = loader.load(chunk_size=100, overlap=20)
print(len(chunks))

If the original document has 250 characters, what will be the output?

medium

A. 4

B. 3

C. 2

D. 5

Solution

Step 1: Calculate chunk positions
Chunks start every (chunk_size - overlap) = 80 characters: positions 0, 80, 160, 240.
Step 2: Count chunks covering 250 characters
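With a new chunk at every step position before the end of the text, chunks start at 0, 80, 160, and 240; the next start, 320, is past the end. The chunk at 240 holds only the last 10 characters (240-249), because the document ends at 250.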
Final Answer:
4 -> Option A
Quick Check:
Chunks = ceil(250 / (chunk_size - overlap)) = ceil(250 / 80) = ceil(3.125) = 4 [OK]
(This assumes a chunk starts at every multiple of 80 below 250. A splitter that stops once a chunk reaches the end of the text would return 3 chunks, starting at 0, 80, and 160.)

Hint: Chunks start every chunk_size - overlap characters [OK]

Common Mistakes:

Ignoring overlap when counting chunks
Assuming chunks equal document length divided by chunk size
Not counting last partial chunk
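
The count can be checked directly. This sketch assumes a plain sliding window over characters; `chunk_starts` and `chunk_starts_stop_at_end` are hypothetical helpers showing the two stopping policies mentioned in the Quick Check.

```python
import math


def chunk_starts(length: int, chunk_size: int, overlap: int) -> list[int]:
    """Start offsets when a new chunk begins at every step before the end of the text."""
    step = chunk_size - overlap
    return list(range(0, length, step))


def chunk_starts_stop_at_end(length: int, chunk_size: int, overlap: int) -> list[int]:
    """Alternative policy: stop as soon as a chunk already reaches the end of the text."""
    step = chunk_size - overlap
    starts = [0]
    while starts[-1] + chunk_size < length:
        starts.append(starts[-1] + step)
    return starts


text = "a" * 250
starts = chunk_starts(len(text), chunk_size=100, overlap=20)
chunks = [text[s:s + 100] for s in starts]

print(starts)                                   # [0, 80, 160, 240]
print([len(c) for c in chunks])                 # [100, 100, 90, 10]
print(len(chunks), math.ceil(250 / 80))         # 4 4
print(chunk_starts_stop_at_end(250, 100, 20))   # [0, 80, 160]
```

The first policy yields the quiz's answer of 4, including a short 10-character tail chunk; the second avoids that tail because the chunk at 160 already reaches the end.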

4. You wrote this code but get an error:

chunks = loader.load(chunk_size=100, overlap=150)

What is the likely cause?

medium

A. Chunk size must be zero or negative

B. Chunk size and overlap must be equal

C. Missing import statement for loader

D. Overlap is larger than chunk size, causing invalid chunking

Solution

Step 1: Check parameter relationship
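Overlap must be smaller than chunk size. Each new chunk starts chunk_size - overlap characters after the previous one; with overlap=150 and chunk_size=100 that step would be -50, so the window could never move forward through the document.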
Step 2: Identify error cause
Setting overlap=150 with chunk_size=100 is invalid, so the loader raises an error.
Final Answer:
Overlap is larger than chunk size, causing invalid chunking -> Option D
Quick Check:
Overlap < chunk size (so chunk_size - overlap > 0) [OK]

Hint: Overlap must be strictly smaller than chunk size [OK]

Common Mistakes:

Setting overlap larger than chunk size
Assuming chunk size can be zero
Ignoring parameter constraints
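
The quiz does not say which loader raises the error. As a sketch of the kind of check a fixed-step splitter might perform (`validate_chunk_params` is a hypothetical helper), the rule is that the step chunk_size - overlap must be positive. The example also shows what Python's `range` does without such a check.

```python
def validate_chunk_params(chunk_size: int, overlap: int) -> int:
    """Return the step between chunk starts, rejecting settings that cannot advance."""
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}")
    if overlap < 0:
        raise ValueError(f"overlap must be non-negative, got {overlap}")
    if overlap >= chunk_size:
        raise ValueError(
            f"overlap ({overlap}) must be smaller than chunk_size ({chunk_size})"
        )
    return chunk_size - overlap


# Without a check, a negative step makes range() silently produce no start
# positions at all, and a zero step makes range() itself raise ValueError.
print(list(range(0, 250, 100 - 150)))  # []

for size, ov in [(100, 20), (100, 100), (100, 150)]:
    try:
        print(size, ov, "-> step", validate_chunk_params(size, ov))
    except ValueError as err:
        print(size, ov, "-> error:", err)
```

Failing early with a clear message is preferable to the silent empty result a negative step would otherwise produce.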

5. You want to load a very long document for an AI model that understands context well but has a token limit of 512. Which chunking strategy is best?

hard

A. Use chunk size 256 with overlap 128 to keep context between chunks

B. Use chunk size 100 with overlap 0 to create many small chunks

C. Use chunk size 512 with zero overlap to maximize chunk length

D. Use chunk size 600 with overlap 100 to exceed token limit

Solution

Step 1: Consider model token limit
The model can handle at most 512 tokens, so each chunk must be at most 512 tokens.
Step 2: Choose overlap for context
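Overlap carries context across chunk boundaries. With 256-token chunks and a 128-token overlap, each chunk repeats the last half of the previous one, so any passage up to 128 tokens long appears whole in at least one chunk, and every chunk stays well under the 512-token limit. The cost is roughly twice as many chunks as 256-token chunks with no overlap.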
Step 3: Evaluate other options
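Option C's zero overlap can cut a sentence or idea across two chunks with no shared context; option D's 600-token chunks exceed the 512-token limit; option B's 100-token chunks with no overlap fragment the text and produce many more chunks to process.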
Final Answer:
Use chunk size 256 with overlap 128 to keep context between chunks -> Option A
Quick Check:
Chunk size ≤ token limit, with some overlap for context [OK]

Hint: Balance chunk size and overlap to fit token limit and context [OK]

Common Mistakes:

Ignoring token limit and using too large chunks
Using zero overlap losing context
Choosing too small chunks causing inefficiency
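
To compare the four options concretely, this sketch treats each list element as one token (an assumption; real models count subword tokens with their own tokenizer) and uses a synthetic 2,000-token document. For each option it checks whether every chunk fits the 512-token limit and whether an 80-token passage straddling position 512 survives whole in at least one chunk. `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Fixed-size token windows; consecutive windows share `overlap` tokens."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]


TOKEN_LIMIT = 512
# Synthetic document of 2,000 unique stand-in tokens.
tokens = [f"w{i}" for i in range(2000)]
# An 80-token passage that crosses position 512.
passage = set(tokens[480:560])

options = {"A": (256, 128), "B": (100, 0), "C": (512, 0), "D": (600, 100)}
results = {}
for name, (size, overlap) in options.items():
    chunks = sliding_windows(tokens, size, overlap)
    fits = max(len(c) for c in chunks) <= TOKEN_LIMIT
    intact = any(passage <= set(c) for c in chunks)
    results[name] = (len(chunks), fits, intact)
    print(f"{name}: size={size} overlap={overlap} chunks={len(chunks)} "
          f"fits_limit={fits} passage_intact={intact}")
```

Under these assumptions option A is the only setting that both stays under the limit and keeps the straddling passage whole (D keeps it whole only because its chunks are too large). The price is more chunks: 16 for A versus 4 for C.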

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.45	0.6	Initial training with raw chunks, model starts learning basic patterns.
2	0.3	0.75	Loss decreases as model better understands chunked text.
3	0.2	0.85	Model accuracy improves with clearer chunk boundaries.
4	0.15	0.9	Training converges, model effectively uses chunked data.

Document loading and chunking strategies in Agentic AI - Model Pipeline Trace
