
Python NLP ecosystem (NLTK, spaCy, Hugging Face) - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Tokenization Differences

Which statement best describes the difference between tokenization in NLTK and spaCy?

A. spaCy tokenizes text into sentences only, while NLTK tokenizes into words only.
B. NLTK uses rule-based tokenization, while spaCy uses statistical models for tokenization.
C. NLTK tokenization is slower because it uses deep learning models; spaCy uses simple regex.
D. Both NLTK and spaCy use the exact same tokenization algorithms internally.
💡 Hint

Think about how each library approaches breaking text into pieces.
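To see what "rule-based" tokenization means in practice, here is a toy sketch of a regex tokenizer in the spirit of NLTK's hand-written splitting rules. The pattern below is illustrative only, not NLTK's actual implementation: it splits off punctuation and a few English clitics like 's.

```python
import re

def rule_based_tokenize(text):
    """Toy rule-based tokenizer: words, a few clitics ('s, 're, ...),
    and single punctuation marks, matched by hand-written regex rules."""
    pattern = r"\w+|'(?:s|re|ve|ll|d|m)|[^\w\s]"
    return re.findall(pattern, text)

print(rule_based_tokenize("Let's test it, now!"))
# -> ['Let', "'s", 'test', 'it', ',', 'now', '!']
```

Note how punctuation and the clitic 's come out as separate tokens; real tokenizers add many more rules (abbreviations, URLs, hyphenation), but the rule-based idea is the same.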

Predict Output
intermediate
Output of spaCy Named Entity Recognition

What is the output of this code snippet?

```python
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('Apple is looking at buying U.K. startup for $1 billion')
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```
A. [('Apple', 'ORG'), ('U.K.', 'LOC'), ('$1 billion', 'MONEY')]
B. [('Apple', 'PERSON'), ('U.K.', 'ORG'), ('$1 billion', 'QUANTITY')]
C. [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
D. [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'QUANTITY')]
💡 Hint

Check spaCy's entity labels for organizations, geopolitical entities, and money.
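A handy way to check what each entity label means is `spacy.explain`, which reads from spaCy's built-in glossary, so no model download is required (this sketch assumes only that spaCy itself is installed):

```python
import spacy

# spacy.explain maps a label to its glossary description;
# it returns None for labels spaCy does not know about.
label_meanings = {
    label: spacy.explain(label)
    for label in ["ORG", "GPE", "LOC", "MONEY", "QUANTITY"]
}
for label, meaning in label_meanings.items():
    print(f"{label}: {meaning}")
```

Comparing these descriptions against the answer options is usually enough to eliminate the wrong label assignments.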

Model Choice
advanced
Choosing a Hugging Face Model for Sentiment Analysis

You want to perform sentiment analysis on movie reviews using Hugging Face transformers. Which model is the best choice?

A. bert-base-uncased
B. gpt2
C. roberta-base
D. distilbert-base-uncased-finetuned-sst-2-english
💡 Hint

Look for a model fine-tuned specifically for sentiment tasks.
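As a sketch of what using a task-specific checkpoint looks like (assuming the transformers library is installed and the model weights can be downloaded or are cached locally; the example sentence and printed score are illustrative):

```python
from transformers import pipeline

# A checkpoint fine-tuned on SST-2 ships with a sentiment classification
# head; general-purpose checkpoints like bert-base-uncased do not.
clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
result = clf("A moving, beautifully shot film.")[0]
print(result["label"], round(result["score"], 3))
```

Calling `pipeline("sentiment-analysis")` without the `model` argument also works, but pinning the checkpoint explicitly keeps results reproducible.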

Hyperparameter
advanced
Effect of Changing Learning Rate in Fine-Tuning Transformers

When fine-tuning a Hugging Face transformer model, what is the most likely effect of setting the learning rate too high?

A. The model training becomes unstable and the loss may not decrease properly.
B. The model converges faster and achieves higher accuracy.
C. The model ignores the training data and uses the pre-trained weights only.
D. The model will underfit and have very low training loss.
💡 Hint

Think about what happens when updates are too large during training.
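The intuition carries over from plain gradient descent and can be demonstrated without any transformer library. Minimizing f(x) = x² with update x ← x − lr·2x: a small learning rate shrinks the loss each step, while a too-large one makes every update overshoot the minimum, so the loss grows instead.

```python
def final_loss(lr, steps=20, x=1.0):
    """Run gradient descent on f(x) = x^2 (gradient 2x), return final loss."""
    for _ in range(steps):
        x = x - lr * 2 * x  # each step multiplies x by (1 - 2*lr)
    return x * x

print(final_loss(0.1))  # |1 - 2*lr| < 1: loss shrinks toward 0
print(final_loss(1.1))  # |1 - 2*lr| > 1: updates overshoot, loss blows up
```

Fine-tuning a transformer is far noisier than this toy problem, but the same mechanism is why a too-high learning rate makes the training loss oscillate or diverge.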

🔧 Debug
expert
Debugging Tokenization Output in NLTK

What error or unexpected output will this code produce?

```python
from nltk.tokenize import word_tokenize

text = "Hello, world! Let's test tokenization."
tokens = word_tokenize(text)
print(tokens[10])
```
A. IndexError: list index out of range
B. 'tokenization'
C. 'test'
D. SyntaxError
💡 Hint

Count how many tokens are produced and check the index accessed.
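One way to reason it out: NLTK's default (Treebank-style) tokenizer splits off punctuation and the clitic 's, so the sentence should yield nine tokens with valid indices 0 through 8. The list below is that expected output written out by hand (run the original snippet with NLTK installed to confirm):

```python
# Expected word_tokenize output for "Hello, world! Let's test tokenization."
tokens = ['Hello', ',', 'world', '!', 'Let', "'s", 'test', 'tokenization', '.']

print(len(tokens))  # 9 tokens, so tokens[10] is out of range
try:
    print(tokens[10])
except IndexError as err:
    print("IndexError:", err)
```

Since the highest valid index is 8, accessing `tokens[10]` raises an IndexError rather than returning any token.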