
One-hot encoding for text in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to create a one-hot encoded vector for the word 'cat' using the vocabulary.

vocab = ['cat', 'dog', 'bird']
word = 'cat'
one_hot = [1 if w == [1] else 0 for w in vocab]
A. word
B. 'cat'
C. 'dog'
D. 'bird'
Common Mistakes:
  • Assuming the variable word and the string 'cat' behave differently here; word holds 'cat', so comparing against either one yields [1, 0, 0].
  • Comparing against a different vocabulary entry such as 'dog' or 'bird', which puts the 1 in the wrong position.
2. Fill in the blank
medium

Complete the code to build a one-hot encoding dictionary for all words in the vocabulary.

vocab = ['apple', 'banana', 'cherry']
one_hot_dict = {word: [1 if w == [1] else 0 for w in vocab] for word in vocab}
A. word
B. 'apple'
C. 'banana'
D. 'cherry'
Common Mistakes:
  • Using a fixed string like 'apple' instead of the variable word, which gives every key the same vector.
  • Confusing the roles of word (the dictionary key being encoded) and w (the vocabulary entry being compared).
3. Fill in the blank
hard

Fix the error in the code to correctly create a one-hot vector for the word 'dog'.

vocab = ['cat', 'dog', 'fish']
word = 'dog'
one_hot = [1 if w == [1] else 0 for w in vocab]
A. 'dog'
B. word
C. dog
D. 'cat'
Common Mistakes:
  • Writing dog without quotes, which raises a NameError because no variable named dog is defined.
  • Quoting the variable name as 'word', which compares against the literal string 'word' and yields all zeros; the unquoted variable word holds 'dog' and works.
4. Fill in the blank
hard

Fill both blanks to create a one-hot encoding dictionary for the vocabulary.

vocab = ['red', 'green', 'blue']
one_hot_dict = {word: [1 if w == [1] else 0 for w in [2]] for word in vocab}
A. word
B. vocab
C. words
D. colors
Common Mistakes:
  • Using undefined variables like words or colors instead of vocab.
  • Swapping the blanks by putting vocab in the first blank.
5. Fill in the blank
hard

Fill all three blanks to create a one-hot encoding dictionary and print the vector for 'blue'.

vocab = ['red', 'green', 'blue']
one_hot_dict = { [1]: [1 if w == [2] else 0 for w in [3] ] for word in vocab }
print(one_hot_dict['blue'])
A. word
C. vocab
D. w
Common Mistakes:
  • Using w as the dictionary key instead of word.
  • Using w in the comparison instead of word.
  • Iterating over an undefined variable instead of vocab.

Practice

1. What does one-hot encoding do to words in text processing?
easy
A. Converts each word into a vector with one 1 and rest 0s
B. Replaces words with their synonyms
C. Counts the number of letters in each word
D. Sorts words alphabetically

Solution

  1. Step 1: Understand one-hot encoding concept

    One-hot encoding creates a vector for each word, with one position per vocabulary entry, where only the position of that word is 1 and all others are 0.
  2. Step 2: Compare options with definition

    Only option A ("Converts each word into a vector with one 1 and rest 0s") matches this definition exactly.
  3. Final Answer:

    Converts each word into a vector with one 1 and rest 0s -> Option A
  4. Quick Check:

    One-hot encoding = vector with single 1 [OK]
Hint: One-hot means one 1 in vector, rest zeros [OK]
Common Mistakes:
  • Thinking it replaces words with synonyms
  • Confusing with counting letters
  • Assuming it sorts words
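The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the helper name one_hot and the sample vocabulary are assumptions chosen for the example.

```python
def one_hot(word, vocab):
    """Hypothetical helper: 1 at the position of `word` in `vocab`, 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]


vocab = ['cat', 'dog', 'bird']   # sample vocabulary (assumption)
vector = one_hot('dog', vocab)

print(vector)                      # [0, 1, 0]
print(len(vector) == len(vocab))   # True: one slot per vocabulary entry
print(sum(vector))                 # 1: exactly one position is "hot"
```

With this comparison-based approach, a word that is not in the vocabulary produces a vector of all zeros rather than an error.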
2. Which of the following assignments gives the correct one-hot vector for the word 'cat' from the vocabulary ['cat', 'dog', 'bird']?
easy
A. one_hot = [0, 0, 1]
B. one_hot = [0, 1, 0]
C. one_hot = [1, 1, 0]
D. one_hot = [1, 0, 0]

Solution

  1. Step 1: Identify the index of 'cat' in vocabulary

    'cat' is at index 0 in ['cat', 'dog', 'bird'].
  2. Step 2: Create one-hot vector with 1 at index 0

    The vector should have 1 at position 0 and 0 elsewhere: [1, 0, 0].
  3. Final Answer:

    [1, 0, 0] -> Option D
  4. Quick Check:

    Index 0 gets 1 in one-hot vector [OK]
Hint: Index of word = position of 1 in vector [OK]
Common Mistakes:
  • Putting 1 in wrong index
  • Using multiple 1s in vector
  • Confusing word order in vocabulary
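The two solution steps (find the index, then set that position to 1) can also be written out directly. This is one possible sketch using the built-in list.index method; the variable names are illustrative.

```python
vocab = ['cat', 'dog', 'bird']
word = 'cat'

# Step 1: find the position of the word in the vocabulary.
# list.index raises ValueError if the word is not present.
index = vocab.index(word)

# Step 2: start from all zeros and set the matching position to 1.
one_hot = [0] * len(vocab)
one_hot[index] = 1

print(index)    # 0
print(one_hot)  # [1, 0, 0]
```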
3. What will be the output of this Python code?
vocab = ['apple', 'banana', 'cherry']
word = 'banana'
one_hot = [1 if w == word else 0 for w in vocab]
print(one_hot)
medium
A. [1, 0, 0]
B. [0, 1, 0]
C. [0, 0, 1]
D. [1, 1, 0]

Solution

  1. Step 1: Understand list comprehension logic

    For each entry w in vocab, put 1 if it equals word ('banana'), else 0.
  2. Step 2: Apply to vocab list

    'apple' != 'banana' -> 0, 'banana' == 'banana' -> 1, 'cherry' != 'banana' -> 0, so [0, 1, 0].
  3. Final Answer:

    [0, 1, 0] -> Option B
  4. Quick Check:

    Only 'banana' gets 1 in vector [OK]
Hint: Check which vocab word equals target word [OK]
Common Mistakes:
  • Mixing up word positions
  • Using 1 for all words
  • Misreading list comprehension
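To see how the list comprehension arrives at its result, the comparisons can be traced one entry at a time. The trace variable below is an illustrative addition, not part of the original question.

```python
vocab = ['apple', 'banana', 'cherry']
word = 'banana'

# Record each comparison alongside the value it contributes to the vector.
trace = [(w, w == word, 1 if w == word else 0) for w in vocab]
for entry in trace:
    print(entry)

# Keeping only the contributed values reproduces the one-hot vector.
one_hot = [value for _, _, value in trace]
print(one_hot)  # [0, 1, 0]
```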
4. Identify the error in this one-hot encoding code snippet:
vocab = ['red', 'green', 'blue']
word = 'green'
one_hot = [0 if w == word else 1 for w in vocab]
print(one_hot)
medium
A. The list comprehension syntax is invalid
B. The vocabulary list is missing a word
C. The condition is reversed; it should assign 1 when words match
D. The print statement syntax is incorrect

Solution

  1. Step 1: Analyze the list comprehension condition

    It assigns 0 when w matches word and 1 otherwise, which is the opposite of one-hot logic.
  2. Step 2: Correct logic for one-hot encoding

    One-hot should assign 1 when words match and 0 otherwise.
  3. Final Answer:

    The condition is reversed; it should assign 1 when words match -> Option C
  4. Quick Check:

    Match word -> 1, else 0 [OK]
Hint: One-hot sets 1 for match, not 0 [OK]
Common Mistakes:
  • Reversing 0 and 1 in condition
  • Assuming syntax error instead of logic error
  • Ignoring correct vocabulary
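Running the reversed and corrected conditions side by side makes the logic error visible. The is_one_hot check is a hypothetical helper added here for illustration; it only tests that a vector contains 0s and 1s with exactly one 1.

```python
vocab = ['red', 'green', 'blue']
word = 'green'

buggy = [0 if w == word else 1 for w in vocab]   # reversed condition
fixed = [1 if w == word else 0 for w in vocab]   # 1 on match, 0 otherwise

print(buggy)  # [1, 0, 1]
print(fixed)  # [0, 1, 0]


def is_one_hot(vector):
    """Hypothetical sanity check: only 0s and 1s, with exactly one 1."""
    return all(v in (0, 1) for v in vector) and sum(vector) == 1


print(is_one_hot(buggy), is_one_hot(fixed))  # False True
```

Both versions are valid Python, so the bug never raises an exception; a check like this is one way to catch it.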
5. Given a vocabulary ['sun', 'moon', 'star'] and a sentence 'moon star sun star', which one-hot encoded matrix correctly represents the sentence?
hard
A. [[0,1,0],[0,0,1],[1,0,0],[0,0,1]]
B. [[1,0,0],[0,1,0],[0,0,1],[0,1,0]]
C. [[0,0,1],[1,0,0],[0,1,0],[1,0,0]]
D. [[1,1,0],[0,0,1],[1,0,0],[0,0,1]]

Solution

  1. Step 1: Map each word to its one-hot vector

    Vocabulary indices: 'sun'->0, 'moon'->1, 'star'->2. So 'moon'=[0,1,0], 'star'=[0,0,1], 'sun'=[1,0,0].
  2. Step 2: Encode sentence words in order

    Sentence words: 'moon' -> [0,1,0], 'star' -> [0,0,1], 'sun' -> [1,0,0], 'star' -> [0,0,1].
  3. Final Answer:

    [[0,1,0],[0,0,1],[1,0,0],[0,0,1]] -> Option A
  4. Quick Check:

    Each word vector matches vocab index [OK]
Hint: Match word order and vocab index for vectors [OK]
Common Mistakes:
  • Mixing word order in sentence
  • Swapping indices of words
  • Using vectors with multiple 1s
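The sentence encoding can be sketched as a loop that builds one row per word, in sentence order. This assumes whitespace tokenization and that every word appears in the vocabulary; the names index_of and matrix are illustrative.

```python
vocab = ['sun', 'moon', 'star']
sentence = 'moon star sun star'

# Map each vocabulary word to its index: 'sun' -> 0, 'moon' -> 1, 'star' -> 2.
index_of = {w: i for i, w in enumerate(vocab)}

matrix = []
for token in sentence.split():       # whitespace tokenization (assumption)
    row = [0] * len(vocab)
    row[index_of[token]] = 1         # KeyError if token is not in vocab
    matrix.append(row)

print(matrix)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
```

The matrix has one row per word in the sentence and one column per vocabulary entry, so a repeated word such as 'star' produces identical rows.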