Practice

1. Which Python library is best known for providing pre-trained models for advanced NLP tasks?

easy

A. NLTK

B. Hugging Face

C. spaCy

D. Scikit-learn

Solution

Step 1: Understand the role of each library
NLTK is mainly used for teaching and classic NLP tasks, spaCy targets fast production pipelines, Scikit-learn is a general machine learning library rather than an NLP toolkit, and Hugging Face offers a large hub of pre-trained models.
Step 2: Identify the library specialized in pre-trained models
Hugging Face, through its transformers library and model hub, is known for its large collection of pre-trained transformer models for advanced NLP.
Final Answer:
Hugging Face -> Option B
Quick Check:
Pre-trained models = Hugging Face [OK]

Hint: Remember: Hugging Face = pre-trained models [OK]

Common Mistakes:

Confusing NLTK as the source of pre-trained models
Thinking spaCy provides many pre-trained transformer models
Choosing Scikit-learn which is not specialized for NLP

2. Which of the following is the correct way to import the English language model in spaCy?

easy

A. import spacy; nlp = spacy.load('en_core_web_sm')

B. import spacy; nlp = spacy.load('english')

C. from spacy import English; nlp = English()

D. import spacy; nlp = spacy.load('en')

Solution

Step 1: Recall spaCy's model loading syntax
spaCy loads models using spacy.load() with the model name like 'en_core_web_sm'.
Step 2: Check each option's syntax
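import spacy; nlp = spacy.load('en_core_web_sm') uses the correct package name for the small English pipeline. 'english' is not a valid model name. 'en' was a shortcut link in spaCy v2; spaCy v3 no longer supports shortcut links, so spacy.load('en') raises an error there (spacy.blank('en') is the way to get a blank pipeline). The English class lives in spacy.lang.en (from spacy.lang.en import English), so the import in option C is expected to fail, and even with the correct import English() provides only a tokenizer with no trained components.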
Final Answer:
import spacy; nlp = spacy.load('en_core_web_sm') -> Option A
Quick Check:
spaCy model load = spacy.load('en_core_web_sm') [OK]

Hint: Use spacy.load('en_core_web_sm') to load English model [OK]

Common Mistakes:

Using 'english' or 'en' instead of 'en_core_web_sm'
Trying to import English class instead of loading model
Forgetting to install the model before loading
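The third mistake above, loading a model that was never installed, can be caught before calling spacy.load. The sketch below is one possible way to do it, assuming the model was installed as a regular Python package (which is how python -m spacy download en_core_web_sm installs it). The helper names is_installed and load_english_pipeline are hypothetical and not part of spaCy.

```python
import importlib.util

MODEL_NAME = "en_core_web_sm"


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None when no top-level package has this name.
    return importlib.util.find_spec(package_name) is not None


def load_english_pipeline(model_name: str = MODEL_NAME):
    """Load a spaCy pipeline, failing early with an actionable message."""
    if not is_installed("spacy"):
        raise RuntimeError("spaCy is not installed; try: pip install spacy")
    if not is_installed(model_name):
        raise RuntimeError(
            f"Model {model_name!r} is not installed; "
            f"try: python -m spacy download {model_name}"
        )
    # Imported lazily so the check above can run without spaCy present.
    import spacy

    return spacy.load(model_name)
```

Calling nlp = load_english_pipeline() then behaves like option A, but a missing model produces a message that names the install command instead of spaCy's own lookup error.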

3. What will be the output of this NLTK code snippet?

import nltk
from nltk.tokenize import word_tokenize
text = "Hello world!"
tokens = word_tokenize(text)
print(tokens)

medium

A. ['Hello world!']

B. ['Hello', 'world']

C. ['Hello', 'world!']

D. ['Hello', 'world', '!']

Solution

Step 1: Understand word_tokenize behavior
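NLTK's word_tokenize splits text into words and emits punctuation marks as separate tokens. It relies on NLTK's Punkt tokenizer data ('punkt', or 'punkt_tab' in recent NLTK releases), which must be downloaded once with nltk.download before the snippet runs.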
Step 2: Apply tokenization to 'Hello world!'
The text splits into three tokens: 'Hello', 'world', and '!'.
Final Answer:
['Hello', 'world', '!'] -> Option D
Quick Check:
word_tokenize splits punctuation separately [OK]

Hint: word_tokenize splits punctuation as separate tokens [OK]

Common Mistakes:

Expecting punctuation to stay attached to words
Confusing tokenization with simple split()
Ignoring that '!' is a separate token
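The difference between str.split() and a tokenizer that separates punctuation can be seen without installing NLTK. The sketch below uses a small regular expression as a rough stand-in; it is not NLTK's algorithm (word_tokenize applies Treebank-style rules and, for example, splits contractions differently), and the name rough_tokenize is made up for this illustration.

```python
import re


def rough_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks.

    A simplified approximation of punctuation-aware tokenization,
    not a replacement for nltk.tokenize.word_tokenize.
    """
    return re.findall(r"\w+|[^\w\s]", text)


text = "Hello world!"

# Plain whitespace splitting keeps '!' attached to the word (option C).
print(text.split())          # ['Hello', 'world!']

# Punctuation-aware splitting yields '!' as its own token (option D).
print(rough_tokenize(text))  # ['Hello', 'world', '!']
```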

4. Identify the error in this Hugging Face transformers code snippet:

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier('I love NLP!')
print(result[0])

medium

A. Missing model download before pipeline creation

B. Incorrect pipeline task name

C. No error, code runs correctly

D. Result indexing should be result[1]

Solution

Step 1: Check pipeline usage
The pipeline function with 'sentiment-analysis' is correct and downloads the default model automatically if needed.
Step 2: Verify result usage
The classifier returns a list of dicts, one per input text, each with 'label' and 'score' keys; accessing result[0] is the correct way to get the first (and here only) prediction.
Final Answer:
No error, code runs correctly -> Option C
Quick Check:
Hugging Face pipeline auto-downloads models [OK]

Hint: Hugging Face pipelines auto-download models [OK]

Common Mistakes:

Thinking model must be downloaded manually first
Using wrong pipeline task name
Accessing wrong index of result list
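To make the result shape concrete, the sketch below separates the indexing logic from the model call. The helper names top_prediction and classify are hypothetical; the transformers import is done lazily inside classify so the indexing helper can be tried without the library or a model download.

```python
def top_prediction(result):
    """Return (label, score) for the first prediction in a pipeline result.

    `result` is expected to be a list of dicts with 'label' and 'score'
    keys, which is the shape described in the solution above.
    """
    first = result[0]  # index 0 is the prediction for the first input text
    return first["label"], first["score"]


def classify(text: str):
    """Run the default sentiment pipeline on one text (downloads a model)."""
    # Lazy import: defining this function does not require transformers.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    return top_prediction(classifier(text))
```

With a single input string the list has exactly one element, so result[1] (option D) would raise an IndexError rather than return a second prediction.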

5. You want to extract named entities from a text quickly and accurately. Which combination of tools and steps is best?

hard

A. Use spaCy's pre-trained model with nlp = spacy.load('en_core_web_sm') and then nlp(text).ents

B. Use NLTK's word_tokenize and then manually match entity patterns

C. Use Hugging Face pipeline('ner') without loading any model

D. Use spaCy's tokenizer only and ignore entity recognition

Solution

Step 1: Identify fast and accurate named entity extraction
spaCy provides pre-trained models that include named entity recognition (NER) ready to use.
Step 2: Evaluate options for NER
Option A loads spaCy's pre-trained pipeline and reads the entities from nlp(text).ents, which is fast and accurate enough for most uses. Option B requires manual pattern matching on NLTK tokens, which is slow to build and error-prone. Option C does run, because pipeline('ner') downloads a default model automatically, but that default is a large transformer that is typically much slower than spaCy's small pipeline, and the library itself warns against relying on an unspecified default model. Option D skips entity recognition entirely.
Final Answer:
Use spaCy's pre-trained model with nlp = spacy.load('en_core_web_sm') and then nlp(text).ents -> Option A
Quick Check:
spaCy pre-trained models = fast NER [OK]

Hint: spaCy pre-trained models provide fast named entity recognition [OK]

Common Mistakes:

Trying to do NER manually with NLTK tokens
Relying on the heavy default model of pipeline('ner') when speed matters
Ignoring entity extraction step
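The approach in option A can be sketched as follows. This is a minimal illustration, assuming spaCy and the en_core_web_sm model are installed; the function names extract_entities and ner_with_spacy are hypothetical. The spaCy import is kept inside ner_with_spacy so the small helper can be read and tried on its own.

```python
def extract_entities(doc):
    """Return (text, label) pairs for every entity in a processed document.

    Works on any object whose .ents items expose .text and .label_,
    which is how spaCy's Doc and Span objects are accessed here.
    """
    return [(ent.text, ent.label_) for ent in doc.ents]


def ner_with_spacy(text: str, model_name: str = "en_core_web_sm"):
    """Load a pre-trained spaCy pipeline and extract named entities."""
    # Lazy import: requires spaCy and the named model to be installed.
    import spacy

    nlp = spacy.load(model_name)
    return extract_entities(nlp(text))
```

When many texts are processed, loading the pipeline once and reusing nlp avoids paying the model load time on every call.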

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.65	0.60	Model starts learning basic patterns from embeddings
2	0.48	0.75	Loss decreases and accuracy improves as model learns
3	0.35	0.85	Model converges with good accuracy on training data
4	0.30	0.88	Slight improvement, model stabilizes
5	0.28	0.90	Final epoch with best performance

Python NLP ecosystem (NLTK, spaCy, Hugging Face) - Model Pipeline Trace
