Challenge - 5 Problems
Tokenization Master
❓ Predict Output
intermediate · 2:00 remaining
Output of simple whitespace tokenization
What is the output of this Python code that splits a sentence into tokens using whitespace?
Data Analysis Python
sentence = "Data science is fun and exciting"
tokens = sentence.split()
print(tokens)
💡 Hint
Remember that split() without arguments splits on any run of whitespace.
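Explanation
Called without arguments, split() splits on runs of consecutive whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace, so the output is ['Data', 'science', 'is', 'fun', 'and', 'exciting'].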
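To see why the no-argument form matters, this sketch compares split() with split(" ") on a made-up string containing doubled spaces, a tab, and a trailing newline (the `messy` string is an illustrative assumption, not part of the quiz):

```python
sentence = "Data science is fun and exciting"
quiz_tokens = sentence.split()  # the quiz case: single spaces only

# Made-up input with doubled spaces, a tab, and a trailing newline
messy = "  Data  science\tis fun\n"
default_tokens = messy.split()   # splits on runs of any whitespace, no empty strings
space_tokens = messy.split(" ")  # splits on every single space; keeps empty strings, tabs, newlines

print(quiz_tokens)
print(default_tokens)
print(space_tokens)
```

With a single-space separator, empty strings appear wherever spaces repeat, and the tab and newline stay inside the tokens.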
❓ Data Output
intermediate · 2:00 remaining
Tokenizing with punctuation removal
Given the code below, what is the resulting list of tokens after removing punctuation and splitting?
Data Analysis Python
import string
text = "Hello, world! Let's learn tokenization."
tokens = text.translate(str.maketrans('', '', string.punctuation)).split()
print(tokens)
💡 Hint
Removing punctuation means no commas or apostrophes remain.
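Explanation
str.maketrans('', '', string.punctuation) builds a table that deletes every ASCII punctuation character, including the apostrophe, so "Let's" becomes "Lets" before splitting. The result is ['Hello', 'world', 'Lets', 'learn', 'tokenization'].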
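A minimal sketch of the same deletion table, plus one possible variant (our own assumption, not part of the quiz) that deletes every punctuation character except the apostrophe so contractions stay intact:

```python
import string

text = "Hello, world! Let's learn tokenization."

# Delete every character in string.punctuation, the apostrophe included
delete_all = str.maketrans('', '', string.punctuation)
tokens = text.translate(delete_all).split()

# Variant: delete all punctuation except the apostrophe
keep_apostrophe = str.maketrans('', '', string.punctuation.replace("'", ""))
tokens_keep = text.translate(keep_apostrophe).split()

print(tokens)
print(tokens_keep)
```

The variant keeps "Let's" as one token, at the cost of leaving apostrophes used as quotation marks in the text.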
❓ Visualization
advanced · 3:00 remaining
Visualizing token frequency distribution
Which option shows the correct bar chart of token frequencies from the given text?
Data Analysis Python
from collections import Counter
import matplotlib.pyplot as plt

text = "apple banana apple orange banana apple"
tokens = text.split()
counter = Counter(tokens)
plt.bar(list(counter.keys()), list(counter.values()))
plt.show()
💡 Hint
Count how many times each word appears in the text.
Explanation
'apple' appears 3 times, 'banana' 2 times, and 'orange' once. Counter keeps the order in which words first appear, so the bars are apple, banana, and orange, with heights 3, 2, and 1.
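The counts behind the chart can be checked without plotting. This sketch prints the Counter in first-seen order (the order plt.bar receives) and, for comparison, in descending-count order via most_common():

```python
from collections import Counter

text = "apple banana apple orange banana apple"
counter = Counter(text.split())

# Counter is a dict subclass, so it remembers insertion order (Python 3.7+)
first_seen = list(counter.items())
# most_common() sorts by count, highest first
by_count = counter.most_common()

print(first_seen)
print(by_count)
```

Here both orders happen to coincide; for text where a later word is more frequent, passing `dict(counter.most_common())` to the plot would order the bars by height instead.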
🧠 Conceptual
advanced · 2:00 remaining
Understanding tokenization challenges
Which option best describes a common challenge in tokenization for natural language processing?
💡 Hint
Think about how words like "don't" or "it's" are split.
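Explanation
Contractions and punctuation are hard to handle consistently. Splitting on whitespace leaves punctuation attached to words ("fine."), while stripping all punctuation turns "it's" into "its", which changes the meaning.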
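The sketch below compares three approaches on a made-up sentence. The regular expression `\w+(?:'\w+)?` is one possible rule we chose for illustration, not a standard tokenizer: it matches a run of word characters optionally followed by an apostrophe and more word characters.

```python
import re
import string

text = "Don't stop; it's fine."

# 1. Whitespace only: punctuation stays glued to words
naive = text.split()

# 2. Strip all punctuation first: "it's" collapses into "its"
stripped = text.translate(str.maketrans('', '', string.punctuation)).split()

# 3. Hypothetical regex rule that keeps simple contractions as one token
regex_tokens = re.findall(r"\w+(?:'\w+)?", text)

print(naive)
print(stripped)
print(regex_tokens)
```

Libraries such as NLTK instead split contractions into pieces like "do" and "n't"; which choice is right depends on the downstream task.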
🔧 Debug
expert · 2:00 remaining
Identify the error in tokenization code
What error does this code produce when trying to tokenize text by splitting on spaces?
Data Analysis Python
text = None
tokens = text.split()
print(tokens)
💡 Hint
Check the type of the variable before calling split.
Explanation
text is None, and None has no split method, so the call raises AttributeError: 'NoneType' object has no attribute 'split'.
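One way to guard against missing input is to check for None before splitting. `safe_tokenize` below is a hypothetical helper for illustration; returning an empty list is an assumption, and raising a clearer error may suit some pipelines better.

```python
def safe_tokenize(text):
    """Hypothetical helper: return [] for None instead of raising AttributeError."""
    if text is None:
        return []
    return text.split()


# Reproduce the quiz error and capture its message
error_message = None
text = None
try:
    text.split()
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(safe_tokenize(None))
print(safe_tokenize("Data science is fun"))
```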