LangChain framework, ~10 mins

Token-based splitting in LangChain - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to import the TokenTextSplitter from langchain.text_splitter.

LangChain
from langchain.text_splitter import [1]
Options:
A. TokenTextSplitter
B. CharacterTextSplitter
C. RecursiveCharacterTextSplitter
D. SentenceSplitter
Common Mistakes
Importing CharacterTextSplitter instead of TokenTextSplitter
Using incorrect class names
Forgetting to import from langchain.text_splitter
2. Fill in the blank (medium)

Complete the code to create a TokenTextSplitter instance with chunk size 1000.

LangChain
splitter = TokenTextSplitter(chunk_size=[1])
Options:
A. 2000
B. 1000
C. 500
D. 50
Common Mistakes
Using too small a chunk size, such as 50
Using a chunk size of 2000 or more without a reason
Confusing chunk size with chunk overlap
3. Fill in the blank (hard)

Complete the code to split text using the TokenTextSplitter instance.

LangChain
chunks = splitter.[1](text)
Options:
A. tokenize
B. split
C. split_text
D. chunk_text
Common Mistakes
Using 'split' instead of 'split_text'
Using 'tokenize', which is not the method that returns text chunks
Using non-existent methods like 'chunk_text'
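To make the idea behind split_text concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: the function name split_on_whitespace_tokens is hypothetical, and whitespace-separated words stand in for the tokens a real tokenizer would produce.

```python
from typing import List


def split_on_whitespace_tokens(text: str, chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size tokens.

    Assumption: a "token" here is a whitespace-separated word. A real
    token splitter would use a model tokenizer instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()  # stand-in for the tokenizer's encode step
    chunks = []
    for start in range(0, len(tokens), chunk_size):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))  # stand-in for the decode step
    return chunks


chunks = split_on_whitespace_tokens("one two three four five", chunk_size=2)
print(chunks)
```

The point of the sketch is that the chunk size is counted in tokens, not characters, and that the result is a list of strings.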
4. Fill in the blank (hard)

Fill both blanks to create a TokenTextSplitter with chunk size 500 and chunk overlap 50.

LangChain
splitter = TokenTextSplitter(chunk_size=[1], chunk_overlap=[2])
Options:
A. 500
B. 100
C. 50
D. 0
Common Mistakes
Setting overlap larger than chunk size
Using zero overlap when some overlap is needed
Confusing chunk size and overlap values
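The relationship between chunk size and overlap can be sketched as a sliding window whose step is chunk_size minus chunk_overlap. The function below is a hypothetical illustration in plain Python, not LangChain code, and it rejects an overlap that is not smaller than the chunk size as its own design choice, since such a window would never advance.

```python
from typing import List


def split_with_overlap(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Return windows of chunk_size tokens that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new window advances
    windows = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        windows.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the input
        start += stride
    return windows


# Single characters stand in for tokens to keep the output readable.
windows = split_with_overlap(list("abcdefgh"), chunk_size=4, chunk_overlap=1)
print(windows)
```

With a chunk size of 4 and an overlap of 1, each window starts 3 tokens after the previous one, so the last token of one window is repeated as the first token of the next.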
5. Fill in the blank (hard)

Fill all three blanks to create a TokenTextSplitter with chunk size 800, chunk overlap 100, and the 'gpt2' encoding.

LangChain
splitter = TokenTextSplitter(chunk_size=[1], chunk_overlap=[2], encoding_name=[3])
Options:
A. 1000
B. 100
C. 'gpt2'
D. 800
Common Mistakes
Using encoding name without quotes
Mixing up chunk size and overlap values
Using unsupported encoding names
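Why the encoding name matters can be shown with a toy lookup table: the same text yields a different token count under each encoding, and an unknown name cannot be resolved. Everything in this sketch is hypothetical, including the names toy_words and toy_chars, which are not real encoding names; actual names such as 'gpt2' are resolved by the tokenizer library, not by a dictionary like this one.

```python
from typing import Callable, Dict, List

# Toy stand-ins for real encodings (hypothetical names for illustration only).
TOY_ENCODINGS: Dict[str, Callable[[str], List[str]]] = {
    "toy_words": lambda text: text.split(),  # one token per word
    "toy_chars": lambda text: list(text),    # one token per character
}


def count_tokens(text: str, encoding_name: str) -> int:
    """Count tokens in text under the named toy encoding."""
    try:
        encode = TOY_ENCODINGS[encoding_name]
    except KeyError:
        raise ValueError(f"Unknown encoding: {encoding_name!r}") from None
    return len(encode(text))


sample = "token based splitting"
word_count = count_tokens(sample, "toy_words")
char_count = count_tokens(sample, "toy_chars")
print(word_count, char_count)
```

Because chunk size and overlap are measured in tokens, changing the encoding changes where the chunk boundaries fall, even when the numeric settings stay the same.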