Recall & Review

beginner

What is one-hot encoding in the context of text data?

One-hot encoding turns words into numbers by representing each word in the vocabulary as a vector of all zeros except for a single 1 in the position unique to that word.

beginner

Why do we use one-hot encoding for text in machine learning?

We use one-hot encoding to convert text into a format that computers can understand and work with, turning words into numbers without implying any order or similarity between them.
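As a rough illustration of the "no order or similarity" point, the sketch below compares plain integer IDs with one-hot vectors. The vocabulary and the helper name `one_hot` are assumptions made for this example; they are not part of any library.

```python
import math

vocab = ["cat", "dog", "bird"]  # hypothetical vocabulary for illustration

def one_hot(word, vocab):
    """Return a list with 1 at the word's vocabulary index and 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

# Integer IDs: the gaps between the numbers suggest distances that mean nothing.
ids = {w: i for i, w in enumerate(vocab)}
id_gap_cat_dog = abs(ids["cat"] - ids["dog"])
id_gap_cat_bird = abs(ids["cat"] - ids["bird"])

# One-hot vectors: every pair of distinct words is the same distance apart.
dist_cat_dog = math.dist(one_hot("cat", vocab), one_hot("dog", vocab))
dist_cat_bird = math.dist(one_hot("cat", vocab), one_hot("bird", vocab))

print(id_gap_cat_dog, id_gap_cat_bird)
print(dist_cat_dog == dist_cat_bird)
```

With integer IDs, 'bird' looks twice as far from 'cat' as 'dog' does, purely because of list position. With one-hot vectors, no word is closer to any other.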

intermediate

What is a limitation of one-hot encoding for text?

One limitation is that each vector is as long as the vocabulary, so a large vocabulary produces very large, sparse vectors that are inefficient to store and process. One-hot vectors also do not capture the meaning of words or the relationships between them.
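A minimal sketch of both problems, assuming a vocabulary of 10,000 words (a size chosen only for illustration) and a hypothetical helper `one_hot` that takes a position rather than a word:

```python
def one_hot(index, size):
    """Return a list of length `size` with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

vocab_size = 10_000  # assumed vocabulary size, for illustration only
vec_a = one_hot(3, vocab_size)
vec_b = one_hot(7, vocab_size)

# Each vector is as long as the vocabulary, yet holds a single non-zero entry.
nonzero_fraction = sum(vec_a) / len(vec_a)

# The dot product of two different one-hot vectors is always 0,
# so the encoding says nothing about how related two words are.
dot = sum(x * y for x, y in zip(vec_a, vec_b))

print(len(vec_a), nonzero_fraction, dot)
```

Every word costs 10,000 numbers to store, and any two different words look equally unrelated.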

intermediate

How does one-hot encoding handle new words not seen during training?

New words not in the original vocabulary cannot be represented directly by one-hot encoding, so they are often ignored or replaced with a special 'unknown' token vector.
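One common way to implement the 'unknown' token is to reserve an extra slot in the vocabulary. The sketch below assumes the token is spelled `<UNK>` and uses a hypothetical `encode` helper; real libraries name and handle this differently.

```python
UNK = "<UNK>"  # hypothetical name for the special unknown token
vocab = ["cat", "dog", "bird", UNK]
index = {word: i for i, word in enumerate(vocab)}

def encode(word):
    """One-hot encode `word`, falling back to the <UNK> slot if it is unseen."""
    position = index.get(word, index[UNK])
    return [1 if i == position else 0 for i in range(len(vocab))]

print(encode("dog"))    # known word
print(encode("zebra"))  # not in the vocabulary, so it maps to <UNK>
```

All unseen words share the same vector here, so the model can still process them but cannot tell them apart.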

beginner

Example: If the vocabulary is ['cat', 'dog', 'bird'], what is the one-hot vector for 'dog'?

The one-hot vector for 'dog' is [0, 1, 0] because 'dog' is the second word in the vocabulary list, so the second position is 1 and others are 0.
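The same example can be sketched in a few lines of Python. The function name `one_hot` is an assumption for illustration; it looks up the word's position and places the 1 there.

```python
vocab = ["cat", "dog", "bird"]

def one_hot(word, vocab):
    """Return the one-hot vector for `word` (raises ValueError if absent)."""
    position = vocab.index(word)
    return [1 if i == position else 0 for i in range(len(vocab))]

print(one_hot("dog", vocab))  # [0, 1, 0]

# A sentence becomes a list of vectors, one per word, in order.
sentence = ["bird", "cat"]
encoded = [one_hot(word, vocab) for word in sentence]
print(encoded)  # [[0, 0, 1], [1, 0, 0]]
```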

What does one-hot encoding do to a word in text data?

A. Turns it into a vector with one 1 and rest 0s

B. Assigns a random number to the word

C. Replaces the word with its length

D. Groups similar words together

Which problem can occur with one-hot encoding when the vocabulary is very large?

A. Words become similar

B. Vectors become very small

C. Words lose their order

D. Vectors become sparse and large

How does one-hot encoding treat the relationship between words?

A. Shows similarity between words

B. Does not show any relationship

C. Shows order of words

D. Groups synonyms together

If the vocabulary is ['apple', 'banana', 'cherry'], what is the one-hot vector for 'cherry'?

A. [1, 0, 0]

B. [0, 1, 0]

C. [0, 0, 1]

D. [1, 1, 1]

What happens if a new word not in the vocabulary appears during testing?

A. It is ignored or replaced with an 'unknown' token

B. It is assigned the vector of the closest word

C. It gets a new one-hot vector automatically

D. It causes an error

Explain what one-hot encoding is and why it is used for text data in machine learning.

Describe one limitation of one-hot encoding and how it affects text processing.

Practice

(1/5)

1. What does one-hot encoding do to words in text processing?

easy

A. Converts each word into a vector with one 1 and rest 0s

B. Replaces words with their synonyms

C. Counts the number of letters in each word

D. Sorts words alphabetically

Solution

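Step 1: Understand one-hot encoding concept

One-hot encoding represents each word as a vector of all zeros except for a single 1 in the position unique to that word.

Step 2: Compare options with definition

Only option A matches this definition; replacing words with synonyms, counting letters, and sorting alphabetically are not one-hot encoding.

Final Answer:

A. Converts each word into a vector with one 1 and rest 0s

Quick Check:

One-hot means a single 1 with 0s everywhere else.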

Solution

Step 1: Identify the index of 'cat' in vocabulary

Step 2: Create one-hot vector with 1 at index 0

Final Answer:

Quick Check:

Solution

Step 1: Understand list comprehension logic

Step 2: Apply to vocab list

Final Answer:

Quick Check:

Solution

Step 1: Analyze the list comprehension condition

Step 2: Correct logic for one-hot encoding

Final Answer:

Quick Check:

Solution

Step 1: Map each word to its one-hot vector

Step 2: Encode sentence words in order

Final Answer:

Quick Check: