~20 min read

N-grams in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ N-grams Master badge: get all five challenges correct to earn it. Test your skills under time pressure!
Predict Output · intermediate · time limit 2:00
What is the output of this code generating bigrams?
Given the code below that generates bigrams from a sentence, what is the output?
Python
sentence = "I love machine learning"
words = sentence.split()
bigrams = [(words[i], words[i+1]) for i in range(len(words)-1)]
print(bigrams)
A. [('I', 'love'), ('love', 'machine'), ('learning', 'machine')]
B. [('I', 'love'), ('love', 'machine'), ('machine', 'learning')]
C. [('love', 'I'), ('machine', 'love'), ('learning', 'machine')]
D. [('I', 'machine'), ('love', 'learning')]
Attempts allowed: 2
💡 Hint
Remember bigrams are pairs of consecutive words.
🧠 Conceptual · intermediate · time limit 1:30
Which statement best describes the purpose of n-grams in text processing?
Choose the best description of why n-grams are used in natural language processing.
A. They capture sequences of words to understand context and word order.
B. They remove stop words to reduce noise in text data.
C. They translate text from one language to another.
D. They count the total number of characters in a text.
Attempts allowed: 2
💡 Hint
Think about how n-grams help capture word relationships.
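Worked example (separate from the graded questions): a minimal sketch of how bigrams preserve the word order that unordered unigram counts throw away. The helper function and sentences below are illustrative only.

```python
def ngrams(words, n):
    """Return the list of n-grams (as tuples) from a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

a = "dog bites man".split()
b = "man bites dog".split()

# As unordered unigram multisets the two sentences look identical...
print(sorted(a) == sorted(b))   # True
# ...but their bigrams differ, capturing word order and local context.
print(ngrams(a, 2))             # [('dog', 'bites'), ('bites', 'man')]
print(ngrams(b, 2))             # [('man', 'bites'), ('bites', 'dog')]
```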
Hyperparameter · advanced · time limit 2:00
Choosing the right n for n-grams
If you want to capture longer phrases but avoid very sparse data, which n-gram size is usually the best choice?
A. Unigrams (n=1) because they are simple and cover all words.
B. Four-grams (n=4) because they capture the most detailed phrases.
C. Bigrams (n=2) because they balance context and data sparsity.
D. Trigrams (n=3) because longer sequences always improve accuracy.
Attempts allowed: 2
💡 Hint
Longer n-grams capture more context but can cause data sparsity.
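To see the trade-off the hint describes, here is a small illustrative count on a toy sentence (the sentence and helper are assumptions for the sketch, not part of any question): as n grows, almost every n-gram becomes unique, so frequency estimates get sparse and unreliable.

```python
from collections import Counter

def ngrams(words, n):
    """Return the list of n-grams (as tuples) from a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "the cat sat on the mat and the dog sat on the rug".split()
for n in (1, 2, 4):
    counts = Counter(ngrams(words, n))
    repeated = sum(1 for c in counts.values() if c > 1)
    # n, number of distinct n-grams, how many occur more than once
    print(n, len(counts), repeated)   # 1 8 3 / 2 10 2 / 4 10 0
```

By n=4 every n-gram in the toy corpus occurs exactly once: plenty of "context", but no repeated evidence to estimate probabilities from.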
Metrics · advanced · time limit 1:30
Evaluating n-gram language models
Which metric is commonly used to evaluate the quality of an n-gram language model?
A. Perplexity, which measures how well the model predicts a sample.
B. Accuracy, which counts correct word predictions only.
C. Mean Squared Error, used for regression tasks.
D. F1 Score, used for classification balance.
Attempts allowed: 2
💡 Hint
This metric measures uncertainty in predicting text sequences.
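As background for this question, here is a minimal sketch of the metric in question computed for a unigram model (toy data, maximum-likelihood probabilities, no smoothing; all names here are assumptions for the sketch). Lower values mean the model is less "surprised" by the test text.

```python
import math
from collections import Counter

train = "the cat sat on the mat".split()
# Maximum-likelihood unigram probabilities from the toy training text.
probs = {w: c / len(train) for w, c in Counter(train).items()}

test = "the cat sat".split()
# Perplexity = exp of the average negative log-probability per word.
log_prob = sum(math.log(probs[w]) for w in test)
perplexity = math.exp(-log_prob / len(test))
print(round(perplexity, 3))   # 4.762, i.e. the cube root of 3 * 6 * 6
```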
🔧 Debug · expert · time limit 2:30
Why does this n-gram code raise an error?
Consider this code snippet to generate trigrams. Why does it raise an IndexError?
Python
sentence = "Data science is fun"
words = sentence.split()
trigrams = [(words[i], words[i+1], words[i+2]) for i in range(len(words))]
print(trigrams)
A. The split method does not create a list of words.
B. The print statement is missing parentheses.
C. Tuples cannot have three elements in Python.
D. The range goes too far, causing words[i+2] to exceed the list length.
Attempts allowed: 2
💡 Hint
Check the range limit for accessing words[i+2].
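For reference after you have attempted the problem: the trigram loop is safe when it stops two positions before the end of the list, so that words[i+2] always stays in range. A sketch of that pattern:

```python
sentence = "Data science is fun"
words = sentence.split()
# Stopping at len(words) - 2 keeps words[i+2] within the list bounds.
trigrams = [(words[i], words[i+1], words[i+2]) for i in range(len(words) - 2)]
print(trigrams)   # [('Data', 'science', 'is'), ('science', 'is', 'fun')]
```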