
One-hot encoding for text in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of one-hot encoding a small text corpus
What is the output of the following code that one-hot encodes a list of words?
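import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Each row is one sample with a single categorical feature (the word).
words = np.array([['cat'], ['dog'], ['cat'], ['bird']])
# scikit-learn 1.2 renamed `sparse` to `sparse_output`; `sparse` was removed in 1.4.
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(words)
print(encoded)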
A
[[0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
B
[[1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]
C
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
D
[[1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]]
💡 Hint
Remember that OneHotEncoder sorts the unique words alphabetically and assigns one output column to each, in that order.
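The column-ordering rule in the hint can be sketched in plain Python. This is only an illustration under stated assumptions, not scikit-learn's implementation: `one_hot_sorted` is a hypothetical helper, and the color words are made-up data chosen so the example does not answer the question above.

```python
def one_hot_sorted(values):
    """One-hot encode strings, assigning columns in sorted order of the unique values."""
    categories = sorted(set(values))  # column order follows this sorted list
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # exactly one position is set per row
        rows.append(row)
    return categories, rows


colors = ["red", "green", "red", "blue"]  # hypothetical example data
categories, encoded = one_hot_sorted(colors)
print(categories)
for row in encoded:
    print(row)
```

Note that the first word in the input does not get the first column: the column position depends on where the word falls in the sorted list of unique words.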
🧠 Conceptual
intermediate
Understanding one-hot encoding vocabulary size
If you one-hot encode a text corpus with 10,000 unique words, what will be the size of each one-hot vector?
A. A vector of length 1 with the index of the word
B. 10,000 elements with multiple elements set to 1 depending on word frequency
C. A vector of length equal to the number of words in the sentence
D. 10,000 elements with exactly one element set to 1 and the rest 0
💡 Hint
One-hot encoding creates a vector with one position for each unique word.
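The relationship between vocabulary size and vector length can be checked with a small plain-Python sketch. The vocabulary here is synthetic (`word0` through `word9999`) and the `one_hot` helper is hypothetical; both are assumptions made for illustration only.

```python
# Synthetic vocabulary of 10,000 unique placeholder words (an assumption for this sketch).
vocab = [f"word{i}" for i in range(10_000)]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a list with one slot per vocabulary word and a single 1."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector


vec = one_hot("word42")
print(len(vec), sum(vec), vec.index(1))  # prints: 10000 1 42
```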
Hyperparameter
advanced
Choosing one-hot encoding parameters for text data
Which parameter of sklearn's OneHotEncoder controls whether the output is a sparse matrix or a dense array?
A. sparse_output (named sparse before scikit-learn 1.2)
B. handle_unknown
C. categories
D. drop
💡 Hint
This parameter decides whether the encoder returns a memory-saving sparse matrix or a regular dense array.
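Why the output format matters for memory can be illustrated without any library. The sketch below is a simplification, not how scikit-learn or SciPy store sparse matrices internally: it just compares a dense list of rows against a "sparse-style" list that records only the column of each 1. The `to_dense` helper and the sample indices are hypothetical.

```python
def to_dense(column_indices, n_columns):
    """Expand one column index per row into full one-hot rows."""
    return [[1 if j == col else 0 for j in range(n_columns)] for col in column_indices]


n_columns = 10_000               # vocabulary size
column_indices = [3, 0, 3, 7]    # sparse-style storage: one number per row

dense = to_dense(column_indices, n_columns)
dense_cells = sum(len(row) for row in dense)

# Dense storage keeps every zero; the sparse-style list keeps only the 1 positions.
print(dense_cells, len(column_indices))  # prints: 40000 4
```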
Metrics
advanced
Evaluating one-hot encoded text input for a classification model
You trained a classifier on one-hot encoded text data. Which metric best measures how well the model predicts the correct class labels?
A. Perplexity
B. Accuracy
C. Mean Squared Error
D. Silhouette Score
💡 Hint
Think about classification performance metrics.
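As a reference point for classification metrics, the fraction of correctly predicted labels can be computed in a few lines. This is a minimal sketch with made-up labels; the `fraction_correct` helper and the sample data are assumptions for illustration.

```python
def fraction_correct(y_true, y_pred):
    """Share of predictions that exactly match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


y_true = ["pos", "neg", "pos", "neg", "pos"]  # hypothetical true labels
y_pred = ["pos", "neg", "neg", "neg", "pos"]  # hypothetical model predictions

score = fraction_correct(y_true, y_pred)
print(score)  # prints: 0.8
```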
🔧 Debug
expert
Debugging one-hot encoding with unseen words during inference
You trained a OneHotEncoder on a training set and saved it. At inference, you try to transform new text containing words not seen during training. What error will sklearn's OneHotEncoder raise by default?
A. KeyError: word not found in vocabulary
B. TypeError: unsupported operand type(s)
C. ValueError: Found unknown categories during transform
D. IndexError: index out of range
💡 Hint
Check how OneHotEncoder handles unknown categories by default.
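The two common ways of treating unseen words can be sketched with a toy encoder. `TinyOneHot` is a hypothetical class written for this example, not scikit-learn code; it loosely mirrors the `handle_unknown` option of `OneHotEncoder`, where `"error"` is the default and `"ignore"` encodes an unseen category as an all-zero row.

```python
class TinyOneHot:
    """Toy one-hot encoder with a handle_unknown switch (illustrative only)."""

    def __init__(self, handle_unknown="error"):
        if handle_unknown not in ("error", "ignore"):
            raise ValueError("handle_unknown must be 'error' or 'ignore'")
        self.handle_unknown = handle_unknown

    def fit(self, words):
        # Learn the vocabulary; columns follow the sorted unique words.
        self.categories_ = sorted(set(words))
        self._index = {word: i for i, word in enumerate(self.categories_)}
        return self

    def transform(self, words):
        rows = []
        for word in words:
            row = [0] * len(self.categories_)
            if word in self._index:
                row[self._index[word]] = 1
            elif self.handle_unknown == "error":
                raise ValueError(f"unknown category {word!r} during transform")
            # With "ignore", an unseen word stays an all-zero row.
            rows.append(row)
        return rows


train = ["cat", "dog", "cat"]

lenient = TinyOneHot(handle_unknown="ignore").fit(train)
print(lenient.transform(["dog", "fish"]))  # prints: [[0, 1], [0, 0]]

strict = TinyOneHot().fit(train)
try:
    strict.transform(["fish"])
except ValueError as exc:
    print("ValueError:", exc)
```

With the lenient setting, the model downstream sees an all-zero vector for an unseen word, so it is worth deciding explicitly whether that is acceptable for the task before saving the encoder.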