Prompt Engineering / GenAI · ml · ~10 mins

Tokenization and vocabulary in Prompt Engineering / GenAI - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to split the sentence into words using whitespace.

tokens = sentence.[1]()
Options:
A. strip
B. join
C. split
D. replace
Common Mistakes
Using join() which combines words instead of splitting.
Using replace() which changes characters but does not split.
Using strip() which removes spaces only at the ends.
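The difference between the four string methods is easiest to see side by side. The sketch below uses a made-up sample sentence (an assumption for illustration, not part of the exercise) and shows what each method returns.

```python
# Hypothetical sample sentence, chosen only for illustration.
sentence = "  the cat sat on the mat  "

# split() with no arguments breaks the string on runs of whitespace.
tokens = sentence.split()

# strip() only trims leading and trailing whitespace; the result is still one string.
stripped = sentence.strip()

# join() is the inverse of split(): it combines a list of strings into one string.
joined = " ".join(tokens)

# replace() swaps characters but never produces a list.
replaced = sentence.replace(" ", "_")

print(tokens)    # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(stripped)  # the cat sat on the mat
```

Only `split()` returns a list of words, which is what later steps such as building a vocabulary expect.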
2. Fill in the blank (medium)

Complete the code to create a vocabulary set from the list of tokens.

vocab = set([1])
Options:
A. sentence
B. tokens
C. vocab
D. words
Common Mistakes
Passing the original sentence string instead of the token list.
Passing the vocabulary variable itself which is not defined yet.
Passing an undefined variable like words.
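To see why the argument to `set()` matters, here is a minimal sketch using a made-up sentence (an assumption for illustration). Passing a string yields its unique characters, while passing a token list yields unique words.

```python
# Hypothetical sample sentence, chosen only for illustration.
sentence = "the cat sat on the mat"
tokens = sentence.split()

# A set built from the token list holds each distinct word once.
vocab = set(tokens)

# A set built from the raw string iterates character by character,
# so it holds letters and the space, not words.
char_set = set(sentence)

print(sorted(vocab))            # ['cat', 'mat', 'on', 'sat', 'the']
print(len(tokens), len(vocab))  # 6 5
print("the" in char_set)        # False
```

The vocabulary is smaller than the token list because the repeated token `the` is stored only once.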
3. Fill in the blank (hard)

Complete the code to count the frequency of each token.

freq = {}
for token in tokens:
    freq[token] = freq.get([1], 0) + 1
Options:
A. token
B. tokens
C. freq
D. count
Common Mistakes
Using the whole list tokens as the key.
Using the dictionary freq as the key.
Using an undefined variable like count.
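The counting pattern can be run end to end as a short sketch. The sample tokens are an assumption for illustration, and `collections.Counter` is shown only as a standard-library cross-check, not as part of the exercise.

```python
from collections import Counter

# Hypothetical sample tokens, chosen only for illustration.
tokens = "the cat sat on the mat".split()

freq = {}
for token in tokens:
    # get() returns the current count, or 0 the first time a token is seen.
    freq[token] = freq.get(token, 0) + 1

print(freq)  # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}

# Counter from the standard library produces the same counts.
same_as_counter = freq == dict(Counter(tokens))

# Using the whole list as the key fails, because lists are unhashable.
raised = False
try:
    freq.get(tokens, 0)
except TypeError:
    raised = True
```

The lookup key has to be the single `token` from the loop; a list or a dictionary cannot be used as a dictionary key at all.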
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary of token lengths for tokens longer than 3 characters.

lengths = {token: [1] for token in tokens if len(token) [2] 3}
Options:
A. len(token)
B. >
C. <
D. token
Common Mistakes
Using the token itself as the value instead of its length.
Using less than (<) instead of greater than (>) in the condition.
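A dictionary comprehension with a filter can be checked on a small input. This sketch assumes a made-up sentence and maps each token longer than 3 characters to its length.

```python
# Hypothetical sample tokens, chosen only for illustration.
tokens = "a tokenizer maps text to ids".split()

# Key: the token. Value: its length. Filter: keep tokens longer than 3 characters.
lengths = {token: len(token) for token in tokens if len(token) > 3}

print(lengths)  # {'tokenizer': 9, 'maps': 4, 'text': 4}

# Flipping the comparison keeps the short tokens instead.
short = {token: len(token) for token in tokens if len(token) < 3}
print(short)    # {'a': 1, 'to': 2}
```

Note that `ids` has exactly 3 characters, so it appears in neither dictionary: both comparisons are strict.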
5. Fill in the blanks (hard)

Fill all three blanks to create a frequency dictionary for tokens longer than 2 characters.

freq_filtered = {token: [1] for token in tokens if len(token) [2] 2 and token in [3]}
Options:
A. freq[token]
B. >
C. vocab
D. tokens
Common Mistakes
Using the list of tokens instead of the vocabulary set for membership check.
Using less than (<) instead of greater than (>) in the length condition.
Using the token itself instead of its frequency as the value.
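Putting the earlier pieces together, the filtered frequency dictionary can be sketched as follows. The sample sentence is an assumption for illustration; `tokens`, `vocab`, and `freq` are built the same way as in the preceding tasks.

```python
# Hypothetical sample tokens, chosen only for illustration.
tokens = "the cat sat on the mat".split()
vocab = set(tokens)

# Count each token, as in the frequency task.
freq = {}
for token in tokens:
    freq[token] = freq.get(token, 0) + 1

# Keep tokens longer than 2 characters that are in the vocabulary,
# and store each token's frequency as the value.
freq_filtered = {
    token: freq[token]
    for token in tokens
    if len(token) > 2 and token in vocab
}

print(freq_filtered)  # {'the': 2, 'cat': 1, 'sat': 1, 'mat': 1}
```

Here `on` is dropped by the length filter. Checking membership against the `vocab` set rather than the `tokens` list gives the same result in this example, but a set lookup takes roughly constant time on average while a list lookup scans the whole list.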