You want to build a spam detection system that classifies emails as spam or not spam. Which model is best suited for this binary text classification task?
Think about models that work well with text features and binary classification.
SVM with TF-IDF features is a strong, simple model for text classification. K-Means is unsupervised and not for classification. CNNs for images won't work well on raw text. PCA is for reducing features, not classification.
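The recommended combination can be sketched as a scikit-learn pipeline. The toy emails and labels below are made up for illustration; a real system would train on a large labeled corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (1 = spam, 0 = not spam); purely illustrative.
texts = [
    "Win free money now",
    "Meeting at noon tomorrow",
    "Claim your free prize",
    "Lunch with the team",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a linear SVM -- the pairing the explanation recommends.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["free prize money"]))
```

A linear kernel is the usual choice for TF-IDF vectors, since the feature space is already high-dimensional and sparse.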
Given this Python code that preprocesses email text, what is the output?
import re

text = "Hello!!! This is spam??? Visit http://spam.com now."
cleaned = re.sub(r'http\S+', '', text)
cleaned = re.sub(r'[^a-zA-Z ]', '', cleaned).lower().split()
print(cleaned)
Look at how URLs and punctuation are removed, and text is lowercased and split.
The first regex removes the URL, the second strips all non-letter characters, then the text is lowercased and split into words. The output is ['hello', 'this', 'is', 'spam', 'visit', 'now'].
In a spam detection pipeline, you use a TF-IDF vectorizer. Which max_features value is best to balance performance and speed on a large email dataset?
Too few features may miss important words; too many may slow training.
10 features is too small to capture text variety. 1,000,000 is too large and slow. None uses all features, which can be very large. 10,000 is a good balance.
Your spam detection model has these results on test data: 90% accuracy, 70% precision, 95% recall. What does this mean?
Recall is about catching spam; precision is about avoiding false alarms.
High recall (95%) means most spam is caught. Lower precision (70%) means some non-spam is wrongly flagged. Accuracy is high but less informative if classes are imbalanced.
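The trade-off can be reproduced with a small made-up confusion matrix (not the exact 90/70/95 figures from the question): catching most spam while wrongly flagging some legitimate mail yields high recall but lower precision.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical test set (1 = spam): 10 spam and 10 legitimate emails.
y_true = [1] * 10 + [0] * 10
# The classifier catches 9 of 10 spam but also flags 4 legitimate emails.
y_pred = [1] * 9 + [0] + [1] * 4 + [0] * 6

print(recall_score(y_true, y_pred))     # 9/10 = 0.9  -> most spam caught
print(precision_score(y_true, y_pred))  # 9/13 ~ 0.69 -> some false alarms
print(accuracy_score(y_true, y_pred))   # 15/20 = 0.75
```

With heavily imbalanced classes (far more ham than spam), accuracy alone can look high even when the classifier misses most spam, which is why precision and recall are reported separately.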
Why does this spam detection training code raise a ValueError?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["Free money now", "Hello friend", "Win a prize"]
labels = [1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(X, labels)
Check if the number of labels matches the number of input samples.
The labels list has 2 items but texts has 3. The model expects one label per text sample.
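The fix is to supply one label per text. The third label below (1 for "Win a prize") is an assumption for illustration, since the original snippet omits it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["Free money now", "Hello friend", "Win a prize"]
labels = [1, 0, 1]  # one label per sample; the third value is assumed

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # X.shape[0] == len(labels) == 3

model = LogisticRegression()
model.fit(X, labels)  # no ValueError now that the counts match

print(model.predict(vectorizer.transform(["Free prize"])))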