Practice


1. Why is language processing challenging for computers?

easy

A. Because computers do not have enough memory

B. Because computers cannot store large amounts of data

C. Because language has only one fixed meaning per word

D. Because words can have multiple meanings depending on context

Solution

Step 1: Understand word ambiguity in language
Words often have several meanings, which depend on the context they appear in.
Step 2: Relate ambiguity to computer difficulty
Computers struggle to pick the correct meaning without understanding context, making language processing hard.
Final Answer:
Because words can have multiple meanings depending on context -> Option D
Quick Check:
Word ambiguity = D [OK]

Hint: Remember: words change meaning with context [OK]

Common Mistakes:

Thinking each word has only one meaning
Assuming computers lack memory causes difficulty
Confusing data storage with language understanding
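
To make the ambiguity point concrete, here is a minimal sketch of choosing a word sense from context by counting overlapping cue words. The SENSES table and the guess_sense() helper are hypothetical and exist only for illustration; real systems use far richer context models.

```python
# Hypothetical mini sense inventory: each sense of a word lists a few cue words.
SENSES = {
    "bank": {
        "financial institution": {"money", "loan", "deposit", "account"},
        "river edge": {"river", "water", "fishing", "shore"},
    }
}

def guess_sense(word, sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))

print(guess_sense("bank", "She opened an account at the bank."))
print(guess_sense("bank", "They sat on the bank of the river fishing."))
```

The same word resolves to a different sense in each sentence, and only the surrounding words tell the two apart.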

2. Which of the following is the correct way to tokenize a sentence into words in Python using NLTK?

easy

A. tokens = nltk.word_tokenize(sentence)

B. tokens = nltk.sentence_tokenize(sentence)

C. tokens = nltk.tokenize_words(sentence)

D. tokens = nltk.split(sentence)

Solution

Step 1: Recall NLTK tokenization functions
NLTK uses word_tokenize() to split sentences into words (tokens).
Step 2: Identify correct function for word tokenization
word_tokenize() is the correct function; sentence_tokenize() does not exist (NLTK's sentence splitter is sent_tokenize()), and tokenize_words() and split() are not NLTK functions.
Final Answer:
tokens = nltk.word_tokenize(sentence) -> Option A
Quick Check:
NLTK word tokenization = A [OK]

Hint: Use word_tokenize() for splitting sentence into words [OK]

Common Mistakes:

Using sentence_tokenize(), which is not a valid function
Confusing word_tokenize() with tokenize_words()
Calling nltk.split(), which does not exist
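
As a rough illustration of the call in Option A, the sketch below tries nltk.word_tokenize() and falls back to a small regex when NLTK or its tokenizer data is not installed. The simple_word_tokenize() helper is a hypothetical stand-in, not part of NLTK, and it only approximates NLTK's behavior on simple sentences.

```python
import re

def simple_word_tokenize(sentence):
    """Hypothetical fallback: words and punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

sentence = "I saw her duck."

try:
    import nltk
    # Needs NLTK's tokenizer data; a LookupError is raised if it is missing.
    tokens = nltk.word_tokenize(sentence)
except (ImportError, LookupError):
    tokens = simple_word_tokenize(sentence)

print(tokens)  # ['I', 'saw', 'her', 'duck', '.']
```

Note that the period comes out as its own token, which plain whitespace splitting does not do.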

3. Given the code below, what will be the output?

sentence = "I saw her duck." 
tokens = sentence.split()
print(tokens)

medium

A. ['I', 'saw', 'her', 'duck.']

B. ['I', 'saw', 'her', 'duck']

C. ['I', 'saw', 'her', 'duck', '.']

D. ['I saw her duck']

Solution

Step 1: Understand split() behavior on string
split() with no arguments divides the string on runs of whitespace and leaves punctuation attached to the neighboring word.
Step 2: Apply split() to the sentence
Splitting "I saw her duck." by spaces results in ['I', 'saw', 'her', 'duck.'] with the period attached to 'duck.'
Final Answer:
['I', 'saw', 'her', 'duck.'] -> Option A
Quick Check:
split() keeps punctuation attached = A [OK]

Hint: split() keeps punctuation with words [OK]

Common Mistakes:

Assuming split() removes punctuation
Expecting punctuation as separate token
Confusing split() with word_tokenize()
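
The behavior described in this solution can be checked directly. The sketch below uses only the standard library; stripping punctuation with string.punctuation is one possible cleanup step, shown here as an assumption rather than something the question requires.

```python
import string

sentence = "I saw her duck."

# split() with no arguments breaks on whitespace only.
tokens = sentence.split()
print(tokens)  # ['I', 'saw', 'her', 'duck.']

# One way to drop leading and trailing punctuation from each token.
stripped = [token.strip(string.punctuation) for token in tokens]
print(stripped)  # ['I', 'saw', 'her', 'duck']

# split() collapses repeated spaces, while split(" ") does not.
print("I  saw".split())     # ['I', 'saw']
print("I  saw".split(" "))  # ['I', '', 'saw']
```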

4. The following code tries to remove stopwords from a list of tokens but raises an exception when run. What is the error?

stopwords = ['the', 'is', 'at']
tokens = ['the', 'cat', 'is', 'on', 'the', 'mat']
filtered = [word for word in tokens if word not in stopwords()]
print(filtered)

medium

A. tokens should be converted to a set before filtering

B. The list comprehension syntax is incorrect

C. stopwords is a list, not a function; should not use parentheses

D. The print statement is missing parentheses

Solution

Step 1: Identify the error in stopwords usage
stopwords is a list, but the code uses stopwords() as if it were a function.
Step 2: Correct the usage of stopwords
Remove parentheses to use stopwords as a list: use 'word not in stopwords' instead of 'stopwords()'.
Final Answer:
stopwords is a list, not a function; should not use parentheses -> Option C
Quick Check:
stopwords list misuse = C [OK]

Hint: Lists are not functions; avoid parentheses [OK]

Common Mistakes:

Using parentheses after list variable
Thinking tokens must be sets to filter
Misreading list comprehension syntax
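
For reference, here is a small sketch that first reproduces the failure and then applies the fix from the solution. The error_message variable is introduced only for illustration.

```python
stopwords = ['the', 'is', 'at']
tokens = ['the', 'cat', 'is', 'on', 'the', 'mat']

# Calling the list as if it were a function raises a TypeError.
try:
    [word for word in tokens if word not in stopwords()]
except TypeError as err:
    error_message = str(err)
    print("Error:", error_message)

# Fixed version: test membership against the list itself.
filtered = [word for word in tokens if word not in stopwords]
print(filtered)  # ['cat', 'on', 'mat']
```

'on' stays in the result because it is not in this particular stopword list.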

5. Which challenge best explains why idioms like "kick the bucket" are hard for AI to understand?

hard

A. Idioms are always spelled incorrectly

B. Idioms have meanings different from the literal words

C. Idioms contain rare words not in dictionaries

D. Idioms are too long for AI to process

Solution

Step 1: Understand idioms in language
Idioms are phrases whose meaning is not the sum of their individual words.
Step 2: Relate idioms to AI language challenges
AI struggles because it cannot infer the non-literal meaning from the literal words alone.
Final Answer:
Idioms have meanings different from the literal words -> Option B
Quick Check:
Idioms = non-literal meaning = B [OK]

Hint: Idioms mean something other than their literal words [OK]

Common Mistakes:

Thinking idioms are misspelled
Assuming idioms use rare words
Believing idioms are too long to process
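
The gap between literal and idiomatic meaning can be sketched with two toy lookup tables. IDIOMS, WORD_GLOSSES, and both helper functions are hypothetical and exist only to illustrate the point; they do not reflect how any particular AI system handles idioms.

```python
# Hypothetical phrase-level and word-level glossaries.
IDIOMS = {"kick the bucket": "die"}
WORD_GLOSSES = {
    "kick": "strike with the foot",
    "the": "(article)",
    "bucket": "a pail",
}

def literal_reading(phrase):
    """Gloss each word on its own, ignoring the phrase as a whole."""
    return " + ".join(WORD_GLOSSES.get(word, word) for word in phrase.lower().split())

def interpret(phrase):
    """Prefer a known idiom meaning; otherwise fall back to the literal reading."""
    return IDIOMS.get(phrase.lower(), literal_reading(phrase))

print(literal_reading("kick the bucket"))  # strike with the foot + (article) + a pail
print(interpret("kick the bucket"))        # die
```

Nothing in the word-by-word glosses points to "die"; the meaning is only available when the phrase is treated as a single unit.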

Challenges in language processing in NLP - Model Pipeline Trace

Epoch	Loss ↓	Accuracy ↑	Observation
1	1.2	0.45	Model starts learning basic word patterns
2	0.9	0.60	Model improves understanding of context
3	0.7	0.72	Model handles ambiguity better
4	0.6	0.78	Model learns common phrases and syntax
5	0.55	0.82	Model shows good generalization on training data
