In natural language processing, what is the main purpose of tokenization?
Think about how you break a sentence into parts before understanding it.
Tokenization splits text into smaller units called tokens, typically words, subwords, or sentences, which are easier to analyze and process.
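A minimal sketch of word-level tokenization using only Python's standard library (a real pipeline would typically use a tokenizer from a library such as NLTK or spaCy; the regex here is an illustrative simplification):

```python
import re

def tokenize(text):
    # Toy word-level tokenizer: lowercase, then pull out runs of
    # letters, digits, and apostrophes, dropping punctuation
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Tokenization breaks text into smaller units."))
# ['tokenization', 'breaks', 'text', 'into', 'smaller', 'units']
```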
What is the output of this Python code that counts word frequencies?
from collections import Counter

text = 'apple banana apple orange banana apple'
word_counts = Counter(text.split())
print(word_counts)
Count how many times each word appears in the text.
Counter tallies each word's frequency, so the code prints Counter({'apple': 3, 'banana': 2, 'orange': 1}): 'apple' appears 3 times, 'banana' twice, and 'orange' once.
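Beyond printing the whole tally, a Counter also supports direct lookup and ranking; a short sketch using the same text:

```python
from collections import Counter

text = 'apple banana apple orange banana apple'
word_counts = Counter(text.split())

# Direct lookup of a single word's count
print(word_counts['apple'])          # 3

# most_common(n) returns the n highest-frequency (word, count) pairs
print(word_counts.most_common(1))    # [('apple', 3)]
```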
You want to build a model to classify movie reviews as positive or negative. Which model type is best suited for this task?
Think about models that understand sequences of words.
RNNs and Transformers are designed to process sequences like text, making them suitable for sentiment analysis.
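Before an RNN or Transformer can process a review, the text must be converted into a sequence of token IDs. A hedged, stdlib-only sketch of that preprocessing step (the vocabulary, the padding convention, and the `encode` helper are illustrative assumptions, not any framework's API):

```python
# Toy vocabulary; index 0 reserved for padding (an illustrative convention)
vocab = {'<pad>': 0, 'the': 1, 'movie': 2, 'was': 3, 'great': 4, 'terrible': 5}

def encode(review, max_len=6):
    # Map each known word to its ID, drop out-of-vocabulary words,
    # then pad to a fixed length so sequences can be batched
    ids = [vocab[w] for w in review.lower().split() if w in vocab]
    return ids + [vocab['<pad>']] * (max_len - len(ids))

print(encode('The movie was great'))  # [1, 2, 3, 4, 0, 0]
```

A sequence model then consumes these ID sequences, learning order-sensitive patterns that a bag-of-words model would miss.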
You trained a spam detection model where spam messages are rare. Which metric is best to evaluate your model's performance?
Think about what matters when one class is much smaller than the other.
Precision and recall focus on the rare spam class: precision measures how many flagged messages are actually spam, and recall measures how many actual spam messages are caught. Accuracy is misleading here, because a model that labels everything as 'not spam' can still score very high.
Consider this Python code snippet for lowercasing and removing punctuation from text. Why does it raise an error in Python 3?
import string

text = 'Hello, World!'
clean_text = text.lower().translate(None, string.punctuation)
print(clean_text)
Check what str.translate() accepts as arguments in Python 3.
In Python 3, str.translate() takes a single argument: a translation table built with str.maketrans(). The two-argument form translate(None, string.punctuation) is the old Python 2 API, so calling it here raises a TypeError.
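A corrected version builds the translation table with str.maketrans() and passes it as the single argument:

```python
import string

text = 'Hello, World!'
# str.maketrans('', '', string.punctuation) maps every punctuation
# character to None, so translate() deletes them all
clean_text = text.lower().translate(str.maketrans('', '', string.punctuation))
print(clean_text)  # hello world
```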