Challenges in language processing in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Ambiguity in Language Processing

Which of the following best describes syntactic ambiguity in natural language processing?

A. Errors caused by misspelled words in the input text
B. Words that have multiple meanings depending on context
C. A sentence that can be interpreted in more than one way due to its structure
D. Difficulty in understanding slang or informal language
💡 Hint

Think about how sentence structure can change meaning.
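To make syntactic ambiguity concrete, here is a minimal sketch that counts parse trees with the CYK algorithm. It assumes a tiny hand-written grammar; the rules and the count_parses helper are illustrative, not taken from any library. The classic sentence "I saw the man with the telescope" gets two trees because the prepositional phrase can attach either to the verb or to the noun phrase.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an illustrative assumption, not a real
# English grammar). Binary rules map (left child, right child) -> parent labels.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],  # PP attaches to the verb: saw ... using the telescope
    ("NP", "PP"): ["NP"],  # PP attaches to the noun: the man who has the telescope
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start_symbol="S"):
    """Count distinct parse trees for `words` using CYK dynamic programming."""
    n = len(words)
    # chart[(i, j)][label] = number of ways `label` can cover words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in chart[(i, k)].items():
                    for right, right_count in chart[(k, j)].items():
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][start_symbol]


print(count_parses("I saw the man with the telescope".split()))  # 2
print(count_parses("I saw the man".split()))                     # 1
```

The grammar itself is unambiguous rule by rule; the second tree only appears once the two PP-attachment rules can both apply, which is exactly what "ambiguous due to its structure" means.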

Model Choice
intermediate
Choosing a Model for Named Entity Recognition

You want to build a system that identifies names of people, places, and organizations in text. Which model type is most suitable?

A. Recurrent Neural Network (RNN) or Transformer-based model for sequence labeling
B. Convolutional Neural Network (CNN) for image classification
C. K-Means clustering for grouping similar words
D. Linear Regression for predicting numerical values
💡 Hint

Think about models good at understanding sequences of words.
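Sequence labeling means predicting one tag per token. A common encoding for entity tags is BIO (Begin, Inside, Outside). The sketch below uses a hypothetical spans_to_bio helper and made-up example spans to show the per-token targets that an RNN or Transformer tagger would be trained to predict.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans to per-token BIO tags.

    spans: list of (start, end, label) using token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)  # every token starts as Outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the entity
    return tags


tokens = ["Ada", "Lovelace", "worked", "in", "London", "."]
spans = [(0, 2, "PER"), (4, 5, "LOC")]
for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(token, tag)
```

Because each tag depends on its neighbours (an I-PER must follow a B-PER or I-PER), models that read the whole sequence suit this task better than models that look at words in isolation.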

Metrics
advanced
Evaluating Language Model Performance

Which metric is most appropriate to evaluate a language model's ability to predict the next word in a sentence?

A. Perplexity measuring how well the model predicts a sample
B. Accuracy of predicted sentiment labels
C. Mean Squared Error between predicted and actual word embeddings
D. F1 score of named entity recognition tags
💡 Hint

Consider a metric that measures uncertainty in predictions.
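For reference, perplexity is the exponential of the average negative log-probability that a model assigns to the tokens that actually occur: PP = exp(-(1/N) Σ log p_i). A minimal sketch, assuming you already have those per-token probabilities (the values below are invented for illustration):

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model gave each actual next token.

    PP = exp(-(1/N) * sum(log p_i)); lower is better, and 1.0 is perfect.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-avg_log_prob)


print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0, as uncertain as a uniform pick among 4 words
print(perplexity([0.9, 0.8, 0.95]))          # close to 1, confident and correct predictions
```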

🔧 Debug
advanced
Identifying the Cause of Poor Translation Quality

A neural machine translation model produces fluent but incorrect translations. Which issue is most likely causing this?

A. The optimizer learning rate is set to zero
B. The training data is too small or not diverse enough
C. The input sentences are too short
D. The model uses too many layers, causing overfitting
💡 Hint

Think about what affects the model's knowledge of language pairs.
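One rough way to check whether training data covers what a model will see is the out-of-vocabulary (OOV) rate: the share of test tokens never seen in training. The sketch below is illustrative only. It assumes whitespace tokenization, whereas production translation systems usually work with subword units, and the sentences are made up.

```python
def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens that never appear in the training sentences."""
    # Simplifying assumption: lowercase + whitespace split stands in for real tokenization.
    train_vocab = {tok for sent in train_sentences for tok in sent.lower().split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.lower().split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in train_vocab)
    return unseen / len(test_tokens)


train = ["the cat sat on the mat", "the dog sat on the rug"]
test = ["the cat chased the mouse"]
print(oov_rate(train, test))  # 0.4: "chased" and "mouse" were never seen in training
```

A high rate is one warning sign that the training data is too small or too narrow; on its own it does not prove that data is the cause of poor translations.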

Hyperparameter
expert
Optimizing Transformer Model Training

When training a Transformer model for language tasks, which hyperparameter adjustment is most effective to reduce overfitting?

A. Increase the number of attention heads without changing dropout
B. Remove layer normalization to speed up training
C. Decrease the batch size to make training more stable
D. Increase the dropout rate to randomly ignore some neurons during training
💡 Hint

Think about techniques that prevent the model from memorizing training data.
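Dropout counters overfitting by randomly zeroing activations during training, so the network cannot rely on any single unit. Below is a minimal sketch of the common "inverted dropout" formulation in plain Python; the dropout function here is illustrative, and frameworks provide their own (for example torch.nn.Dropout in PyTorch).

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each value with probability p during training and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # dropout is a no-op at inference time
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * keep_scale for a in activations]


x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, p=0.5, rng=random.Random(0)))  # each entry is either 0.0 or doubled
print(dropout(x, p=0.5, training=False))        # [1.0, 2.0, 3.0, 4.0]
```

Scaling survivors by 1 / (1 - p) keeps the expected activation the same in training and inference, so nothing needs to change when dropout is switched off.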

Practice

1. Why is language processing challenging for computers?
easy
A. Because computers do not have enough memory
B. Because computers cannot store large amounts of data
C. Because language has only one fixed meaning per word
D. Because words can have multiple meanings depending on context

Solution

  1. Step 1: Understand word ambiguity in language

    Words often have several meanings, which depend on the context they appear in.
  2. Step 2: Relate ambiguity to computer difficulty

    Computers struggle to pick the correct meaning without understanding context, making language processing hard.
  3. Final Answer:

    Because words can have multiple meanings depending on context -> Option D
  4. Quick Check:

    Word ambiguity = D [OK]
Hint: Remember that words change meaning with context [OK]
Common Mistakes:
  • Thinking each word has only one meaning
  • Assuming computers lack memory causes difficulty
  • Confusing data storage with language understanding
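One classic and very simple way to use context for word-sense disambiguation is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding text. The sense inventory, glosses, and stopword list below are hypothetical, written only for this sketch; a real system would draw glosses from a lexical resource such as WordNet.

```python
# Hypothetical sense inventory for the ambiguous word "bank".
SENSES = {
    "bank (finance)": "a financial institution that accepts deposits and lends money",
    "bank (river)": "sloping land beside a body of water such as a river",
}
STOPWORDS = {"a", "an", "the", "of", "on", "at", "as", "such", "that", "and", "i", "she"}


def content_words(text):
    """Lowercased words of `text`, minus the small illustrative stopword list."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = content_words(context)
    return max(senses, key=lambda s: len(content_words(senses[s]) & context_words))


print(simplified_lesk("I deposited money at the bank"))                         # bank (finance)
print(simplified_lesk("she sat on the bank of the river watching the water"))  # bank (river)
```

Plain word overlap is brittle (without stemming it does not match "deposited" with "deposits"), which is part of why using context correctly remains hard for computers.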
2. Which of the following is the correct way to tokenize a sentence into words in Python using NLTK?
easy
A. tokens = nltk.word_tokenize(sentence)
B. tokens = nltk.sentence_tokenize(sentence)
C. tokens = nltk.tokenize_words(sentence)
D. tokens = nltk.split(sentence)

Solution

  1. Step 1: Recall NLTK tokenization functions

    NLTK uses word_tokenize() to split sentences into words (tokens).
  2. Step 2: Identify correct function for word tokenization

    word_tokenize() is the correct function. NLTK's sentence splitter is named sent_tokenize(), not sentence_tokenize(), and tokenize_words() and split() are not NLTK functions.
  3. Final Answer:

    tokens = nltk.word_tokenize(sentence) -> Option A
  4. Quick Check:

    NLTK word tokenization = A [OK]
Hint: Use word_tokenize() for splitting sentence into words [OK]
Common Mistakes:
  • Using sentence_tokenize(), which does not exist (the sentence splitter is sent_tokenize())
  • Confusing word_tokenize() with tokenize_words()
  • Assuming nltk.split() exists; split() is a Python string method, not an NLTK function
3. Given the code below, what will be the output?
sentence = "I saw her duck." 
tokens = sentence.split()
print(tokens)
medium
A. ['I', 'saw', 'her', 'duck.']
B. ['I', 'saw', 'her', 'duck']
C. ['I', 'saw', 'her', 'duck', '.']
D. ['I saw her duck']

Solution

  1. Step 1: Understand split() behavior on string

    Called with no arguments, split() breaks the string on runs of whitespace and does not separate punctuation, so punctuation stays attached to the neighbouring word.
  2. Step 2: Apply split() to the sentence

    Splitting "I saw her duck." on whitespace gives ['I', 'saw', 'her', 'duck.'], with the period still attached to 'duck'.
  3. Final Answer:

    ['I', 'saw', 'her', 'duck.'] -> Option A
  4. Quick Check:

    split() keeps punctuation attached = A [OK]
Hint: split() keeps punctuation with words [OK]
Common Mistakes:
  • Assuming split() removes punctuation
  • Expecting the period to become a separate token
  • Confusing split() with word_tokenize()
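The difference between split() and a tokenizer that separates punctuation can be seen directly. The simple_tokenize helper below is a rough regex approximation written for this example, not NLTK's word_tokenize() algorithm, although for this sentence both split off the final period.

```python
import re


def simple_tokenize(text):
    """Rough tokenizer: runs of word characters, or any single character that is
    neither a word character nor whitespace (so punctuation becomes its own token)."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "I saw her duck."
print(sentence.split())           # ['I', 'saw', 'her', 'duck.']  (period stays attached)
print(simple_tokenize(sentence))  # ['I', 'saw', 'her', 'duck', '.']
```

Neither approach resolves the deeper ambiguity in this sentence: "duck" can be a noun (her pet duck) or a verb (she ducked).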
4. The following code tries to remove stopwords from a list of tokens but does not work as expected. What is the error?
stopwords = ['the', 'is', 'at']
tokens = ['the', 'cat', 'is', 'on', 'the', 'mat']
filtered = [word for word in tokens if word not in stopwords()]
print(filtered)
medium
A. tokens should be converted to a set before filtering
B. The list comprehension syntax is incorrect
C. stopwords is a list, not a function; should not use parentheses
D. The print statement is missing parentheses

Solution

  1. Step 1: Identify the error in stopwords usage

    stopwords is a list, but the code calls stopwords() as if it were a function, which raises TypeError: 'list' object is not callable.
  2. Step 2: Correct the usage of stopwords

    Remove the parentheses and test membership against the list directly: write 'word not in stopwords' instead of 'word not in stopwords()'.
  3. Final Answer:

    stopwords is a list, not a function; should not use parentheses -> Option C
  4. Quick Check:

    stopwords list misuse = C [OK]
Hint: Lists are not functions; avoid parentheses [OK]
Common Mistakes:
  • Using parentheses after list variable
  • Thinking tokens must be sets to filter
  • Misreading list comprehension syntax
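A corrected, runnable version of the snippet. It first shows the original error, then filters with a membership test on the stopwords themselves. Converting them to a set is an optional speed-up for long stopword lists, not a requirement of the fix.

```python
stopwords = ['the', 'is', 'at']
tokens = ['the', 'cat', 'is', 'on', 'the', 'mat']

# The original code calls the list, which raises TypeError: 'list' object is not callable.
try:
    [word for word in tokens if word not in stopwords()]
except TypeError as err:
    print("Original bug:", err)

# Fix: no parentheses. A set gives average O(1) membership checks.
stopword_set = set(stopwords)
filtered = [word for word in tokens if word not in stopword_set]
print(filtered)  # ['cat', 'on', 'mat']
```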
5. Which challenge best explains why idioms like "kick the bucket" are hard for AI to understand?
hard
A. Idioms are always spelled incorrectly
B. Idioms have meanings different from the literal words
C. Idioms contain rare words not in dictionaries
D. Idioms are too long for AI to process

Solution

  1. Step 1: Understand idioms in language

    Idioms are phrases whose meaning is not the sum of their individual words.
  2. Step 2: Relate idioms to AI language challenges

    AI struggles because it cannot infer the non-literal meaning from the literal words alone.
  3. Final Answer:

    Idioms have meanings different from the literal words -> Option B
  4. Quick Check:

    Idioms = non-literal meaning = B [OK]
Hint: Idioms mean more than their words [OK]
Common Mistakes:
  • Thinking idioms are misspelled
  • Assuming idioms use rare words
  • Believing idioms are too long to process
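One simple, partial mitigation is to look idioms up as whole units before interpreting words literally. The sketch below assumes a tiny hypothetical idiom lexicon and greedy longest-match merging; it illustrates the idea rather than offering a complete solution.

```python
# Hypothetical idiom lexicon: multiword expression -> non-literal meaning.
IDIOMS = {
    ("kick", "the", "bucket"): "die",
    ("spill", "the", "beans"): "reveal a secret",
}
MAX_LEN = max(len(key) for key in IDIOMS)


def merge_idioms(tokens):
    """Greedy longest match: replace known idioms with one unit carrying their meaning."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in IDIOMS:
                out.append(f"<{'_'.join(phrase)}={IDIOMS[phrase]}>")
                i += n
                break
        else:  # no idiom starts here; keep the word as-is
            out.append(tokens[i])
            i += 1
    return out


print(merge_idioms("the old farmer may kick the bucket soon".split()))
# ['the', 'old', 'farmer', 'may', '<kick_the_bucket=die>', 'soon']
print(merge_idioms("he tried to kick the red bucket".split()))
# ['he', 'tried', 'to', 'kick', 'the', 'red', 'bucket']
```

The lookup also shows the limits of this approach: it misses inflected forms such as "kicked the bucket" and cannot tell when a listed phrase is meant literally, which is why idioms remain a hard case.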