SVM for text classification in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
How does SVM handle text data?

Support Vector Machines (SVMs) are widely used for text classification. How is text data handled before an SVM is trained on it?

A. SVM directly uses raw text strings as input features without any transformation.
B. SVM requires text to be translated into another language before training.
C. SVM converts text into numerical vectors using techniques like TF-IDF or word embeddings before training.
D. SVM uses the length of the text only as the feature for classification.
💡 Hint

Think about how computers understand text data for machine learning.

Predict Output
intermediate
Output of SVM prediction on sample text

Given the following Python code using sklearn's SVM for text classification, what is the printed output?

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

texts = ['I love apples', 'I hate bananas']
labels = [1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = SVC(kernel='linear')
model.fit(X, labels)

new_text = ['I love bananas']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction[0])
A. 0
B. 1
C. Error due to unseen words in new_text
D. Array with multiple predictions
💡 Hint

Consider how SVM predicts based on learned features and similarity.

Hyperparameter
advanced
Choosing the SVM kernel for text classification

Which kernel is generally best suited for SVM when classifying text data represented by TF-IDF vectors?

A. Sigmoid kernel, because it mimics neural networks.
B. Polynomial kernel, because text data requires complex curved boundaries.
C. RBF kernel, because it handles non-linear data better than linear kernel.
D. Linear kernel, because text data is often linearly separable in high-dimensional space.
💡 Hint

Think about the nature of TF-IDF vectors and their dimensionality.

Metrics
advanced
Evaluating SVM model performance on imbalanced text data

You trained an SVM classifier on imbalanced text data. Which metric is most reliable to evaluate the model's performance?

A. F1-score, because it balances precision and recall.
B. Precision, because it measures how many predicted positives are correct.
C. Recall, because it measures how many actual positives are found.
D. Accuracy, because it shows overall correct predictions.
💡 Hint

Consider what happens when classes are imbalanced.

🔧 Debug
expert
Why does this SVM training code raise an error?

Examine the code below. Why does it raise an error during training?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

texts = ['good movie', 'bad movie', 'great film']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = SVC(kernel='linear')
model.fit(X, labels)
A. The labels list length does not match the number of text samples.
B. The kernel parameter 'linear' is invalid.
C. SVC requires labels to be strings, not integers.
D. CountVectorizer cannot be used with SVM.
💡 Hint

Check the size of inputs and labels carefully.

Practice

1. What is the main purpose of using an SVM (Support Vector Machine) in text classification?
easy
A. To find the best line that separates different text categories
B. To count the number of words in the text
C. To translate text into another language
D. To generate random text samples

Solution

  1. Step 1: Understand SVM's role in classification

    An SVM looks for the boundary (a line in two dimensions, a hyperplane in general) that separates the classes with the widest margin.
  2. Step 2: Apply this to text classification

    In text classification, SVM finds the best line to separate categories like spam vs. not spam.
  3. Final Answer:

    To find the best line that separates different text categories -> Option A
  4. Quick Check:

    SVM separates classes = A [OK]
Hint: SVM separates classes by finding the best boundary line [OK]
Common Mistakes:
  • Thinking SVM counts words directly
  • Confusing SVM with translation tools
  • Assuming SVM generates text
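To make the idea of a separating boundary concrete, here is a minimal sketch of how a trained linear SVM scores a text: a weighted sum of word counts plus a bias, with the sign of the result choosing the class. The vocabulary, weights, and bias below are made-up values for illustration, not the result of a real training run.

```python
# Hypothetical weights for a tiny spam vs. not-spam vocabulary (illustrative only).
weights = {"free": 1.2, "winner": 0.9, "meeting": -1.1, "report": -0.8}
bias = -0.1


def decision_value(text):
    """Return w . x + b, where x holds the raw word counts of the text."""
    score = bias
    for word in text.lower().split():
        # Words outside the vocabulary contribute nothing to the score.
        score += weights.get(word, 0.0)
    return score


def classify(text):
    """Positive side of the boundary is 'spam', the other side is 'not spam'."""
    return "spam" if decision_value(text) > 0 else "not spam"


print(classify("free winner"))
print(classify("meeting report"))
```

Training an SVM amounts to choosing the weights and bias so that this boundary leaves the widest possible margin between the two classes.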
2. Which of the following is the correct way to convert text data before applying an SVM model in Python?
easy
A. Use CountVectorizer() or TfidfVectorizer() to transform text into numbers
B. Directly feed raw text strings into the SVM model
C. Use OneHotEncoder() on raw text strings
D. Apply StandardScaler() on raw text strings

Solution

  1. Step 1: Identify text preprocessing for SVM

    SVM requires numeric input, so text must be converted to numbers using vectorizers like CountVectorizer or TfidfVectorizer.
  2. Step 2: Check other options

    Raw text cannot be fed directly; OneHotEncoder and StandardScaler are not suitable for raw text strings.
  3. Final Answer:

    Use CountVectorizer() or TfidfVectorizer() to transform text into numbers -> Option A
  4. Quick Check:

    Text to numbers = Vectorizer = A [OK]
Hint: Always vectorize text before SVM, never raw strings [OK]
Common Mistakes:
  • Feeding raw text directly to SVM
  • Using OneHotEncoder on text strings
  • Applying scalers on text without vectorizing
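One way to make sure text is always vectorized before it reaches the SVM is to chain both steps in a scikit-learn pipeline. The sketch below assumes scikit-learn is installed; the four example texts and their labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for this sketch: 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "free prize inside",
    "meeting at noon",
    "project report due at noon",
]
labels = [1, 1, 0, 0]

# The pipeline vectorizes the raw strings first, then hands the numbers to the SVM.
pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
pipeline.fit(texts, labels)

# Inspect the numeric matrix that the SVM actually sees.
X = pipeline.named_steps["tfidfvectorizer"].transform(texts)
print(X.shape)  # (number of texts, vocabulary size)

# New raw strings go through the same vectorizer automatically.
predictions = pipeline.predict(["free prize at noon"])
print(predictions)
```

Because the vectorizer lives inside the pipeline, raw strings can never reach the SVM by accident, and the same vocabulary is reused at prediction time.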
3. Given the following Python code snippet, what will be the output of print(predicted_labels)?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["I love cats", "Dogs are great", "Cats are cute", "I hate dogs"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = LinearSVC()
model.fit(X, labels)

new_texts = ["I love dogs", "Cats are great"]
X_new = vectorizer.transform(new_texts)
predicted_labels = model.predict(X_new)
print(predicted_labels)
medium
A. [1, 0]
B. [0, 1]
C. [1, 1]
D. [0, 0]

Solution

  1. Step 1: Understand training labels and texts

    Texts labeled 1 are about cats, 0 about dogs. Model learns cats=1, dogs=0.
  2. Step 2: Predict new texts

    "I love dogs" is most likely labeled 0 (dog) and "Cats are great" 1 (cat). NumPy prints the resulting array without commas, as [0 1].
  3. Final Answer:

    [0, 1] -> Option B
  4. Quick Check:

    Dog text=0, Cat text=1 = B [OK]
Hint: Match new text topics to training labels for quick guess [OK]
Common Mistakes:
  • Mixing label meanings
  • Assuming model predicts opposite labels
  • Ignoring vectorizer effect
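One vectorizer effect that is easy to overlook: with its documented defaults, TfidfVectorizer lowercases text and keeps only tokens of two or more word characters, so the word "I" never becomes a feature. The sketch below mimics that default tokenization with the standard library only. The helper name default_tokens is invented here, and this is an approximation of the behavior, not the library's own code path.

```python
import re

# Same regular expression as the documented default token_pattern
# of scikit-learn's text vectorizers.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def default_tokens(text):
    """Lowercase the text, then keep tokens with at least two word characters."""
    return TOKEN_PATTERN.findall(text.lower())


for text in ["I love dogs", "Cats are great"]:
    print(text, "->", default_tokens(text))
```

So the prediction for "I love dogs" depends only on the learned weights for "love" and "dogs".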
4. You trained an SVM model for text classification but got an error: ValueError: could not convert string to float. What is the most likely cause?
medium
A. You set the wrong kernel parameter in SVM
B. You used too many training samples
C. You forgot to convert text data into numeric vectors before training
D. You used a linear kernel instead of RBF kernel

Solution

  1. Step 1: Analyze the error message

    The error means the model received raw text strings instead of numbers.
  2. Step 2: Identify cause in text classification

    Text must be vectorized (converted to numbers) before training SVM.
  3. Final Answer:

    You forgot to convert text data into numeric vectors before training -> Option C
  4. Quick Check:

    Raw text input causes conversion error = C [OK]
Hint: Check if text is vectorized before training SVM [OK]
Common Mistakes:
  • Ignoring need for vectorization
  • Blaming kernel choice for conversion errors
  • Assuming data size causes this error
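The message comes from the numeric conversion itself. As a rough illustration using only built-in Python (scikit-learn performs a comparable conversion through NumPy when it validates input), converting a raw sentence to a float fails with the same wording, while a count-based representation converts cleanly. The sample sentence and the three-word vocabulary are made up for this sketch.

```python
raw_text = "good movie"

# Attempting to treat a sentence as a number reproduces the error wording.
try:
    float(raw_text)
except ValueError as error:
    message = str(error)
    print(message)

# A hand-rolled bag-of-words vector over a tiny, hypothetical vocabulary.
vocabulary = ["bad", "good", "movie"]
vector = [float(raw_text.split().count(word)) for word in vocabulary]
print(vector)
```

Vectorizers such as CountVectorizer or TfidfVectorizer do this conversion at scale, which is why the fix is to vectorize before calling fit.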
5. You want to improve your SVM text classifier's performance on a dataset with many common words like "the", "and", "is". Which approach is best to try?
hard
A. Switch to a polynomial kernel without changing text preprocessing
B. Increase the SVM regularization parameter without changing vectorization
C. Use raw word counts without removing stop words
D. Use a TF-IDF vectorizer to reduce the impact of common words

Solution

  1. Step 1: Understand the problem with common words

    Common words appear everywhere and do not help distinguish classes well.
  2. Step 2: Choose vectorization method to reduce common word impact

    TF-IDF scales each word's frequency by its inverse document frequency, so words that appear in most documents get low weights and the model can focus on distinctive words.
  3. Step 3: Evaluate other options

    Changing regularization or kernel without addressing common words won't help much.
  4. Final Answer:

    Use a TF-IDF vectorizer to reduce the impact of common words -> Option D
  5. Quick Check:

    TF-IDF reduces common word weight = D [OK]
Hint: TF-IDF downweights common words, improving text classification [OK]
Common Mistakes:
  • Ignoring stop words effect
  • Changing SVM parameters without vectorizing
  • Using raw counts with many common words
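The down-weighting can be checked by hand. The sketch below computes the textbook form of TF-IDF (term frequency times log(N / document frequency)) on three invented sentences. Note that scikit-learn's TfidfVectorizer uses a smoothed variant with normalization by default, so its exact numbers differ, but the ranking effect is the same.

```python
import math
from collections import Counter

# Three made-up documents; "the" and "is" occur in every one of them.
docs = [
    "the movie is great",
    "the movie is boring",
    "the plot is great and the cast is great",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))


def idf(word):
    """Textbook inverse document frequency: log(N / df)."""
    return math.log(n_docs / doc_freq[word])


def tfidf(tokens):
    """Relative term frequency multiplied by the inverse document frequency."""
    counts = Counter(tokens)
    return {word: (count / len(tokens)) * idf(word) for word, count in counts.items()}


weights = tfidf(tokenized[1])
for word, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{word}: {weight:.3f}")
```

Words that occur in every document end up with weight zero here, while the rare word "boring" receives the largest weight in its document.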