Why machines need numerical text representation in NLP - Explained with Examples

Practice

1. Why do machines need text to be converted into numbers before learning?

easy

A. Because words are too short to process

B. Because numbers are easier to read for humans

C. Because machines only understand numbers, not words

D. Because text is always incorrect

Solution

Step 1: Understand machine input requirements
Machines process data as numbers, not as text or words.
Step 2: Recognize the need for conversion
Text must be converted into numbers so machines can analyze and learn from it.
Final Answer:
Because machines only understand numbers, not words -> Option C
Quick Check:
Text to numbers = machines understand [OK]

Hint: Machines need numbers, not words, to learn [OK]

Common Mistakes:

Thinking machines understand words directly
Confusing human readability with machine input
Assuming text length matters more than format
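
To make the point concrete, the short sketch below shows that a string is already stored as integers. The variable names are illustrative only, and this storage encoding is not yet a useful feature representation for learning.

```python
# A string is stored as a sequence of integer code points.
text = "hi"

# ord() returns the Unicode code point of a single character.
code_points = [ord(ch) for ch in text]

# Encoding to UTF-8 gives the raw byte values kept in memory or on disk.
utf8_bytes = list(text.encode("utf-8"))

print(code_points)  # [104, 105]
print(utf8_bytes)   # [104, 105]
```

For plain ASCII text the two lists match. NLP pipelines usually go one step further and map whole words or tokens to numbers.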

2. Which of the following is a correct way to represent text numerically in Python?

easy

A. text_vector = {'word': 1, 'machine': 2}

B. text_vector = ['word', 'machine']

C. text_vector = 'word machine'

D. text_vector = 12345

Solution

Step 1: Identify numerical representation
text_vector = {'word': 1, 'machine': 2} shows a dictionary mapping words to numbers, which is a common numerical representation.
Step 2: Check other options
Option B is a list of words and option C is a plain string, so neither is numerical; option D is just a number with no relation to the text.
Final Answer:
text_vector = {'word': 1, 'machine': 2} -> Option A
Quick Check:
Mapping words to numbers = correct representation [OK]

Hint: Look for word-to-number mapping in code [OK]

Common Mistakes:

Choosing plain text or a list of words as a numerical representation
Mistaking a number unrelated to any word for a text representation
Ignoring dictionary or vector formats
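
As a rough illustration of how the mapping in option A could be used, the sketch below encodes a sentence as a list of integer IDs. The encode helper and the choice of 0 for unknown words are assumptions made for this example, not part of the question.

```python
# Word-to-number mapping taken from option A.
text_vector = {'word': 1, 'machine': 2}


def encode(text, mapping, unknown_id=0):
    """Replace each whitespace-separated token with its integer ID.

    Tokens missing from the mapping get unknown_id (an assumed convention).
    """
    return [mapping.get(token, unknown_id) for token in text.lower().split()]


encoded = encode("word machine word", text_vector)
print(encoded)  # [1, 2, 1]
```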

3. What will be the output of this Python code snippet?

from sklearn.feature_extraction.text import CountVectorizer
texts = ['hello world', 'hello machine']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
print(X.toarray())
print(vectorizer.get_feature_names_out())

medium

A. [[1 0 1] [1 1 0]] and ['hello' 'machine' 'world']

B. [[1 1] [1 1]] and ['hello' 'machine' 'world']

C. [[1 1] [1 0]] and ['hello' 'world']

D. [[1 0] [0 1]] and ['machine' 'world']

Solution

Step 1: Understand CountVectorizer output
CountVectorizer creates a vocabulary sorted alphabetically: ['hello', 'machine', 'world'].
Step 2: Map texts to vectors
'hello world' maps to [1, 0, 1], 'hello machine' maps to [1, 1, 0].
Final Answer:
[[1 0 1] [1 1 0]] and ['hello' 'machine' 'world'] -> Option A
Quick Check:
Text to count vectors and vocabulary = [[1 0 1] [1 1 0]] and ['hello' 'machine' 'world'] [OK]

Hint: Vocabulary is alphabetical; counts match word presence [OK]

Common Mistakes:

Mixing order of vocabulary words
Confusing counts with binary presence
Misreading array shapes
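
The two steps above can be sketched by hand in plain Python. This is a simplified stand-in, not scikit-learn's implementation: it assumes lowercase whitespace tokenization, whereas CountVectorizer uses its own token pattern by default. The function name count_vectors is hypothetical.

```python
from collections import Counter


def count_vectors(texts):
    """Build an alphabetically sorted vocabulary and one count row per text."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return rows, vocabulary


rows, vocabulary = count_vectors(['hello world', 'hello machine'])
print(vocabulary)  # ['hello', 'machine', 'world']
print(rows)        # [[1, 0, 1], [1, 1, 0]]
```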

4. Identify the error in this code that tries to convert text to numbers:

from sklearn.feature_extraction.text import CountVectorizer
texts = ['cat dog', 'dog mouse']
vectorizer = CountVectorizer()
X = vectorizer.transform(texts)
print(X.toarray())

medium

A. texts should be a single string, not a list

B. CountVectorizer must be fitted before transform

C. toarray() is not a valid method

D. CountVectorizer cannot handle multiple texts

Solution

Step 1: Check CountVectorizer usage
CountVectorizer requires calling fit() or fit_transform() before transform() to build vocabulary.
Step 2: Identify missing step
The code calls transform() on a vectorizer that was never fitted, so no vocabulary exists and scikit-learn raises a NotFittedError.
Final Answer:
CountVectorizer must be fitted before transform -> Option B
Quick Check:
fit() before transform() = correct usage [OK]

Hint: Always fit before transform with CountVectorizer [OK]

Common Mistakes:

Skipping fit() step
Thinking a list of strings is invalid input (a list is the expected input)
Misunderstanding toarray() method

5. You want to prepare text data for a machine learning model. Which approach best explains why you should convert text into numbers first?

hard

A. Because text data is too large to store in memory

B. Because converting text to numbers removes spelling errors

C. Because numbers are easier for humans to read than text

D. Because numerical data allows models to calculate patterns and relationships

Solution

Step 1: Understand model data needs
Machine learning models work by finding patterns in numbers, not raw text.
Step 2: Explain importance of numerical conversion
Converting text to numbers lets models calculate similarities and differences to learn effectively.
Final Answer:
Because numerical data allows models to calculate patterns and relationships -> Option D
Quick Check:
Numbers enable pattern learning in models [OK]

Hint: Models learn patterns from numbers, not raw text [OK]

Common Mistakes:

Thinking conversion is for memory saving
Believing numbers are for human reading
Assuming conversion fixes spelling
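
To illustrate what "calculating similarities" can look like once text is numeric, the sketch below compares two count vectors with cosine similarity. Cosine similarity is one common choice picked for this example; the question itself does not prescribe a measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Count vectors over the vocabulary ['hello', 'machine', 'world'].
hello_world = [1, 0, 1]
hello_machine = [1, 1, 0]

sim = cosine_similarity(hello_world, hello_machine)
print(round(sim, 2))  # 0.5
```

The two texts share one of their two words, which shows up as a similarity of about 0.5. No such calculation is possible on the raw strings themselves.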
