
Why machines need numerical text representation in NLP - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why can't machines understand raw text directly?

Machines process numbers, not words. Why is it necessary to convert text into numbers before feeding it to a machine learning model?

A. Because machines only understand numerical data and cannot process raw text directly.
B. Because numerical text representation removes all grammar and meaning from the text.
C. Because converting text to numbers makes the text shorter and easier to read for humans.
D. Because raw text contains too many spelling mistakes that machines cannot fix.
💡 Hint

Think about how computers store and process information internally.
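
To make the hint concrete, here is a minimal sketch of two ways a sentence can become numbers: the byte values a computer already stores for each character, and a word-to-index lookup. The sample sentence and the `vocab` mapping are assumptions invented for this illustration, not part of the challenge.

```python
text = "cats chase mice"

# Characters are already stored as numbers: here, their UTF-8 byte values.
byte_values = list(text.encode("utf-8"))

# Hypothetical vocabulary built from the sentence itself (illustration only).
vocab = {word: idx for idx, word in enumerate(sorted(set(text.split())))}

# Replace each word with its integer ID so a model can consume it.
token_ids = [vocab[word] for word in text.split()]

print(byte_values[:4])  # byte values for "cats"
print(token_ids)
```

A model never sees the strings themselves, only sequences of integers like `token_ids` or vectors derived from them.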

Predict Output
intermediate
Output of simple text to number mapping

What is the output of this Python code, which maps each word to its length?

words = ['cat', 'dog', 'elephant']
lengths = [len(word) for word in words]
print(lengths)
A. [3, 3, 8]
B. ['cat', 'dog', 'elephant']
C. [5, 5, 5]
D. [0, 0, 0]
💡 Hint

len(word) returns the number of characters in each word.

Model Choice
advanced
Choosing the right text representation for sentiment analysis

You want to build a model to detect positive or negative feelings in movie reviews. Which numerical text representation is best to capture word meanings and context?

A. One-hot encoding of each word, ignoring similarity
B. Pretrained word embeddings like Word2Vec or GloVe, capturing semantic meaning
C. Bag-of-Words vector counting word frequencies without order
D. Random numbers assigned to each word
💡 Hint

Consider which method captures word meanings and relationships best.
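
One way to see what "relationships" means here is to compare vectors with cosine similarity. The sketch below uses hand-picked toy vectors, not real Word2Vec or GloVe values; the words, the numbers, and the `cosine` helper are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# One-hot vectors: every word gets its own axis, so no two words overlap.
one_hot = {"good": [1, 0, 0], "great": [0, 1, 0], "awful": [0, 0, 1]}

# Toy dense vectors, invented for this example (NOT real pretrained embeddings).
dense = {"good": [0.9, 0.1], "great": [0.8, 0.2], "awful": [-0.9, 0.1]}

one_hot_sim = cosine(one_hot["good"], one_hot["great"])
dense_sim = cosine(dense["good"], dense["great"])
dense_opposite = cosine(dense["good"], dense["awful"])

print(one_hot_sim, round(dense_sim, 3), round(dense_opposite, 3))
```

With one-hot vectors, "good" and "great" are exactly as unrelated as "good" and "awful". Dense vectors can place similar words close together, which is the property a sentiment model benefits from.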

Metrics
advanced
Evaluating text classification model performance

After converting text to numbers and training a classifier, which metric best tells you how reliable the model's positive predictions are?

A. Recall - proportion of actual positive reviews correctly identified
B. Accuracy - overall correct predictions divided by total predictions
C. Precision - proportion of predicted positive reviews that are actually positive
D. Loss - the error value during training
💡 Hint

Think about the metric that measures correctness of positive predictions.
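
The definitions in the options can be checked by hand on a small example. The labels and predictions below are made up for illustration (1 = positive review, 0 = negative); the sketch counts the four outcome types and derives each metric from them.

```python
# Hypothetical ground truth and model predictions (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)        # of predicted positives, how many are right
recall = tp / (tp + fn)           # of actual positives, how many were found
accuracy = (tp + tn) / len(pairs) # of all predictions, how many are right

print(precision, recall, accuracy)
```

On this toy data the three metrics differ, which shows why the choice depends on what the question asks about: predicted positives, actual positives, or all reviews.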

🔧 Debug
expert
Debugging numerical text representation error

What happens when this snippet, which converts text to numbers, is run?

text = 'hello world'
word_to_index = {'hello': 1, 'world': 2}
numbers = [word_to_index[word] for word in text.split() + ['!']]
print(numbers)
A. No error, output will be [1, 2, 0]
B. SyntaxError due to invalid list concatenation
C. TypeError because split() returns a string not a list
D. KeyError because '!' is not in word_to_index dictionary
💡 Hint

Check if all words exist in the dictionary before accessing.
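
Following the hint, one common pattern is to check for unseen tokens and map them to a reserved "unknown" index. The sketch below is a modified variant, not the snippet from the question: `UNK_INDEX` and the choice of 0 as its value are assumptions made for this example.

```python
text = 'hello world'
word_to_index = {'hello': 1, 'world': 2}

# Assumption: index 0 is reserved for tokens missing from the vocabulary.
UNK_INDEX = 0

tokens = text.split() + ['!']

# Check which tokens are absent from the dictionary before looking them up.
missing = [token for token in tokens if token not in word_to_index]

# dict.get returns the fallback instead of raising on an unknown key.
numbers = [word_to_index.get(token, UNK_INDEX) for token in tokens]

print(missing)
print(numbers)
```

Direct indexing with `word_to_index[token]` has no such fallback, so the lookup behaves differently in the original snippet.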