
Tokenization basics in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of simple whitespace tokenization
What is the output of this Python code that splits a sentence into tokens using whitespace?
sentence = "Data science is fun and exciting"
tokens = sentence.split()
print(tokens)
A. ['Data', 'science', 'is', 'fun', 'and', 'exciting']
B. ['Data,', 'science,', 'is,', 'fun,', 'and,', 'exciting']
C. ['Data science is fun and exciting']
D. ['Data', 'science', 'is', 'fun', 'and']
💡 Hint
Remember that split() without arguments splits on any run of whitespace (spaces, tabs, newlines) and never returns empty strings.
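To make the hint concrete, here is a minimal sketch comparing split() with split(" "). The sample string is a hypothetical one chosen for illustration, not the sentence from the challenge.

```python
# Hypothetical sample string with irregular whitespace (not from the challenge).
sample = "  tokens\tsplit  on\nany whitespace "

# No argument: split on runs of any whitespace and drop empty strings.
default_tokens = sample.split()

# Explicit single-space separator: every space is a boundary, so empty
# strings appear and tabs/newlines stay inside the tokens.
space_tokens = sample.split(" ")

print(default_tokens)  # ['tokens', 'split', 'on', 'any', 'whitespace']
print(space_tokens)    # ['', '', 'tokens\tsplit', '', 'on\nany', 'whitespace', '']
```

For simple whitespace tokenization, the no-argument form is usually the safer choice because it tolerates leading, trailing, and repeated whitespace.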
Data Output
intermediate
Tokenizing with punctuation removal
Given the code below, what is the resulting list of tokens after removing punctuation and splitting?
import string
text = "Hello, world! Let's learn tokenization."
tokens = text.translate(str.maketrans('', '', string.punctuation)).split()
print(tokens)
A. ['Hello', 'world', 'Lets', 'learn', 'tokenization']
B. ['Hello,', 'world!', "Let's", 'learn', 'tokenization.']
C. ['Hello', 'world!', "Let's", 'learn', 'tokenization']
D. ['Hello', 'world', "Let's", 'learn', 'tokenization']
💡 Hint
string.punctuation includes the apostrophe, so after translate() no commas, periods, exclamation marks, or apostrophes remain.
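The translate() pattern can be sketched on a different, hypothetical sentence. The second table, which keeps apostrophes, is a variant added here for comparison and is not part of the challenge code.

```python
import string

# Hypothetical sample sentence (not the one from the challenge).
sample = "Don't stop, it's working!"

# Delete every character in string.punctuation, apostrophe included.
strip_all = str.maketrans("", "", string.punctuation)
all_removed = sample.translate(strip_all).split()

# Variant (an assumption for illustration): keep apostrophes so that
# contractions survive as single readable tokens.
keep_apostrophes = str.maketrans("", "", string.punctuation.replace("'", ""))
contractions_kept = sample.translate(keep_apostrophes).split()

print(all_removed)        # ['Dont', 'stop', 'its', 'working']
print(contractions_kept)  # ["Don't", 'stop', "it's", 'working']
```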
Visualization
advanced
Visualizing token frequency distribution
Which option shows the correct bar chart of token frequencies from the given text?
from collections import Counter
import matplotlib.pyplot as plt
text = "apple banana apple orange banana apple"
tokens = text.split()
counter = Counter(tokens)
plt.bar(counter.keys(), counter.values())
plt.show()
A. Bar chart with all tokens equal height
B. Bar chart with 'banana' highest, then 'apple', then 'orange'
C. Bar chart with 'orange' highest, then 'banana', then 'apple'
D. Bar chart with 'apple' highest, then 'banana', then 'orange'
💡 Hint
Count how many times each word appears in the text.
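One way to check the bar heights without rendering a plot is to inspect the Counter directly. The sketch below uses a hypothetical sample text rather than the one in the challenge.

```python
from collections import Counter

# Hypothetical sample text (not the one used in the challenge).
sample = "red blue red green red blue"

counts = Counter(sample.split())

# most_common() returns (token, count) pairs sorted from most to least frequent.
ranked = counts.most_common()
print(ranked)  # [('red', 3), ('blue', 2), ('green', 1)]

# The same pairs unpack into the two sequences a bar chart call expects:
# one of labels and one of heights.
labels, heights = zip(*ranked)
```

Passing labels and heights to plt.bar() would draw the bars in descending order of frequency.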
🧠 Conceptual
advanced
Understanding tokenization challenges
Which option best describes a common challenge in tokenization for natural language processing?
A. Removing all stopwords before tokenization
B. Handling contractions and punctuation correctly to avoid losing meaning
C. Converting all tokens to uppercase before analysis
D. Sorting tokens alphabetically after splitting
💡 Hint
Think about how words like "don't" or "it's" are split.
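As an illustration of the hint, the sketch below tokenizes a hypothetical sentence with two regular expressions. Both patterns are assumptions chosen for this example; they handle only ASCII letters and a straight apostrophe, and real text often needs a more careful tokenizer.

```python
import re

# Hypothetical sample sentence.
sample = "I don't think it's broken."

# Naive pattern: \w+ treats the apostrophe as a boundary and splits contractions.
naive_tokens = re.findall(r"\w+", sample)

# Pattern that allows one internal apostrophe, so contractions stay whole.
contraction_tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sample)

print(naive_tokens)        # ['I', 'don', 't', 'think', 'it', 's', 'broken']
print(contraction_tokens)  # ['I', "don't", 'think', "it's", 'broken']
```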
🔧 Debug
expert
Identify the error in tokenization code
What error does this code produce when trying to tokenize text by splitting on spaces?
text = None
tokens = text.split()
print(tokens)
A. SyntaxError: invalid syntax
B. TypeError: split() missing 1 required positional argument
C. AttributeError: 'NoneType' object has no attribute 'split'
D. ValueError: empty string cannot be split
💡 Hint
Check the type of text before calling split(); the method exists only on strings.
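A defensive version of the tokenizer could look like the sketch below. The helper name safe_tokenize and the choice to return an empty list for non-string input are assumptions made for illustration; raising a clear error is an equally reasonable design.

```python
def safe_tokenize(text):
    """Hypothetical helper: whitespace-tokenize text, tolerating non-strings."""
    # Anything that is not a str (for example None) yields no tokens
    # instead of raising an exception.
    if not isinstance(text, str):
        return []
    return text.split()


print(safe_tokenize(None))           # []
print(safe_tokenize("split safely")) # ['split', 'safely']
```

Returning an empty list keeps downstream code such as counting or plotting simple, at the cost of silently hiding missing data.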