
SVM for text classification in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
How does SVM handle text data?

Support Vector Machines (SVMs) are widely used for text classification. How is text data prepared before an SVM is trained on it?

A. SVM directly uses raw text strings as input features without any transformation.
B. SVM requires text to be translated into another language before training.
C. SVM converts text into numerical vectors using techniques like TF-IDF or word embeddings before training.
D. SVM uses the length of the text only as the feature for classification.
💡 Hint

Think about how computers understand text data for machine learning.
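
As a rough illustration of what "numerical representation" can mean here, the sketch below turns two sentences into TF-IDF vectors with scikit-learn. The two-sentence corpus is hypothetical and chosen only to keep the output small; it is not part of the challenge.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus, used only for illustration.
docs = ["the cat sat", "the dog barked"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Each column of X corresponds to one term of the learned vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(X.shape)      # (number of documents, vocabulary size)
print(X.toarray())  # dense view of the TF-IDF weights
```

Every document becomes a fixed-length row of numbers, which is the kind of input a classifier such as an SVM can be trained on.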

Predict Output
intermediate
Output of SVM prediction on sample text

Given the following Python code using sklearn's SVM for text classification, what is the printed output?

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

texts = ['I love apples', 'I hate bananas']
labels = [1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = SVC(kernel='linear')
model.fit(X, labels)

new_text = ['I love bananas']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction[0])

A. 0
B. 1
C. Error due to unseen words in new_text
D. Array with multiple predictions
💡 Hint

Consider how SVM predicts based on learned features and similarity.
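
One way to reason about this kind of prediction is to inspect what the fitted vectorizer actually passes to the model. The sketch below reuses the two training sentences from the code above but swaps in a hypothetical new sentence containing a word ("mangoes") that never appeared in training. It is meant as an exploration aid, not as the answer to the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

texts = ["I love apples", "I hate bananas"]
labels = [1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = SVC(kernel="linear")
model.fit(X, labels)

# The default tokenizer drops one-letter tokens such as "I".
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)

# Hypothetical new sentence: "mangoes" was never seen during fit.
X_new = vectorizer.transform(["I love mangoes"])
print(X_new.toarray())  # unseen words simply contribute no feature

# Signed score relative to the separating hyperplane, then the label.
print(model.decision_function(X_new))
prediction = model.predict(X_new)
print(prediction[0])
```

Words outside the fitted vocabulary are ignored by `transform`, so the model only sees the terms it learned weights for.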

Hyperparameter
advanced
Choosing the SVM kernel for text classification

Which kernel is generally best suited for SVM when classifying text data represented by TF-IDF vectors?

A. Sigmoid kernel, because it mimics neural networks.
B. Polynomial kernel, because text data requires complex curved boundaries.
C. RBF kernel, because it handles non-linear data better than linear kernel.
D. Linear kernel, because text data is often linearly separable in high-dimensional space.
💡 Hint

Think about the nature of TF-IDF vectors and their dimensionality.
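
To make the dimensionality point concrete, the sketch below vectorizes a tiny hypothetical review corpus and compares the number of documents with the number of TF-IDF features. The sentences and labels are invented for illustration, and a corpus this small says nothing about real-world accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical mini-corpus: 1 = positive review, 0 = negative review.
docs = [
    "great plot and acting",
    "wonderful heartfelt story",
    "boring slow script",
    "terrible dull ending",
]
labels = [1, 1, 0, 0]

X = TfidfVectorizer().fit_transform(docs)
n_samples, n_features = X.shape
print(n_samples, n_features)  # far more features than documents

# With so many dimensions, a flat (linear) boundary is often enough.
model = SVC(kernel="linear")
model.fit(X, labels)
train_accuracy = model.score(X, labels)
print(train_accuracy)
```

Even four short sentences yield more features than samples, and real vocabularies often reach tens of thousands of terms.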

Metrics
advanced
Evaluating SVM model performance on imbalanced text data

You trained an SVM classifier on imbalanced text data. Which metric is most reliable to evaluate the model's performance?

A. F1-score, because it balances precision and recall.
B. Precision, because it measures how many predicted positives are correct.
C. Recall, because it measures how many actual positives are found.
D. Accuracy, because it shows overall correct predictions.
💡 Hint

Consider what happens when classes are imbalanced.
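
The effect of imbalance can be worked through by hand. The sketch below computes the four metrics in plain Python for a hypothetical classifier that always predicts the majority class on an invented 95/5 split. The helper `binary_metrics` is illustrative, not a library function, and it treats an undefined precision or recall as 0.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Convention assumed here: an undefined ratio counts as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical imbalanced test set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that never predicts the positive class.
y_pred = [0] * 100

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

The model never finds a single positive example, yet one of the four numbers still looks excellent, which is worth keeping in mind when comparing the options.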

🔧 Debug
expert
Why does this SVM training code raise an error?

Examine the code below. Why does it raise an error during training?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

texts = ['good movie', 'bad movie', 'great film']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = SVC(kernel='linear')
model.fit(X, labels)

A. The labels list length does not match the number of text samples.
B. The kernel parameter 'linear' is invalid.
C. SVC requires labels to be strings, not integers.
D. CountVectorizer cannot be used with SVM.
💡 Hint

Check the size of inputs and labels carefully.
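
A quick sanity check before calling `fit` is to compare the number of rows in the feature matrix with the number of labels. The sketch below uses a hypothetical dataset that is deliberately one label short, and it catches the resulting exception so the message can be read.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

texts = ["good movie", "bad movie", "great film"]
labels = [1, 0]  # deliberately one label short, for illustration only

X = CountVectorizer().fit_transform(texts)

# One row per text sample: these two numbers should be equal.
print(X.shape[0], len(labels))

model = SVC(kernel="linear")
fit_error = None
try:
    model.fit(X, labels)
except ValueError as exc:
    # scikit-learn rejects inputs with inconsistent sample counts.
    fit_error = exc
    print("fit failed:", exc)
```

Comparing `X.shape[0]` with `len(labels)` up front usually makes this class of error obvious before training starts.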