Recall & Review
beginner
What is one-hot encoding in the context of text data?
One-hot encoding is a way to turn words into numbers: each word in the vocabulary is assigned a unique position and represented by a vector that is all zeros except for a single 1 at that position.
beginner
Why do we use one-hot encoding for text in machine learning?
We use one-hot encoding to convert text into a format that computers can understand and work with, turning words into numbers without implying any order or similarity between them.
intermediate
What is a limitation of one-hot encoding for text?
One limitation is that one-hot encoding creates very large and sparse vectors when the vocabulary is big, which can be inefficient and does not capture the meaning or relationships between words.
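The sparsity problem can be made concrete with a short sketch (the 50,000-word vocabulary size is an illustrative assumption, not from the question above):

```python
# Illustrate sparsity: with a 50,000-word vocabulary, every one-hot
# vector is 49,999 zeros and a single 1, no matter which word it encodes.
vocab_size = 50_000
vector = [0] * vocab_size
vector[123] = 1  # hypothetical index of some word

zeros = vector.count(0)
print(f"{zeros} of {vocab_size} entries are zero")  # -> 49999 of 50000 entries are zero
```

Storing and multiplying such vectors densely wastes memory and compute, which is why real systems use sparse representations or learned embeddings instead.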
intermediate
How does one-hot encoding handle new words not seen during training?
New words not in the original vocabulary cannot be represented directly by one-hot encoding, so they are often ignored or replaced with a special 'unknown' token vector.
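The unknown-token strategy described above can be sketched as follows (the `<unk>` token name and helper function are our conventions, not a fixed standard):

```python
# Reserve a special '<unk>' slot for words never seen during training.
vocabulary = ['<unk>', 'cat', 'dog', 'bird']

def one_hot_with_unk(word, vocab):
    """One-hot encode a word, falling back to the '<unk>' slot if unseen."""
    index = vocab.index(word) if word in vocab else vocab.index('<unk>')
    vector = [0] * len(vocab)
    vector[index] = 1
    return vector

print(one_hot_with_unk('cat', vocabulary))   # -> [0, 1, 0, 0]
print(one_hot_with_unk('fish', vocabulary))  # -> [1, 0, 0, 0]  (unknown word)
```

Ignoring unknown words entirely is the other common choice; the fallback token simply preserves the fact that *some* word was present.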
beginner
Example: If the vocabulary is ['cat', 'dog', 'bird'], what is the one-hot vector for 'dog'?
The one-hot vector for 'dog' is [0, 1, 0] because 'dog' is the second word in the vocabulary list, so the second position is 1 and others are 0.
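The worked example above can be sketched in a few lines of Python (a minimal illustration; the helper name is ours):

```python
# One-hot encode a word against a small fixed vocabulary.
vocabulary = ['cat', 'dog', 'bird']

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector

print(one_hot('dog', vocabulary))  # -> [0, 1, 0]
```

Because 'dog' is at index 1 in the vocabulary, only the second position of the result is 1.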
What does one-hot encoding do to a word in text data?
One-hot encoding creates a vector where only one position is 1, representing the word uniquely.
Which problem can happen with one-hot encoding when vocabulary is very large?
One-hot vectors grow in size with vocabulary and mostly contain zeros, making them sparse and large.
How does one-hot encoding treat the relationship between words?
One-hot encoding treats each word independently without showing any relationship or similarity.
If the vocabulary is ['apple', 'banana', 'cherry'], what is the one-hot vector for 'cherry'?
'cherry' is the third word, so the third position is 1 and others are 0.
What happens if a new word not in the vocabulary appears during testing?
New words not in the vocabulary are usually replaced with a special 'unknown' token vector.
Explain what one-hot encoding is and why it is used for text data in machine learning.
Think about how computers need numbers instead of words.
Describe one limitation of one-hot encoding and how it affects text processing.
Consider what happens when you have many words.