Challenge - 5 Problems
Logistic Regression Text Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Difficulty: intermediate
What is the output of this logistic regression prediction code?
Given a trained logistic regression model and a text vectorizer, what will be the predicted class for the input text?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

new_text = ['good film']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction[0])
💡 Hint
Think about how the model was trained and what the new text contains.
📝 Explanation
The model was trained with label 1 for positive texts and 0 for negative. 'good film' combines 'good' (seen only in a positive example) with 'film' (seen in both classes), so the model predicts 1.
❓ Model Choice
Difficulty: intermediate
Which model is best suited for binary text classification with logistic regression?
You want to classify movie reviews as positive or negative using logistic regression. Which preprocessing and model pipeline is most appropriate?
💡 Hint
Consider which vectorization method captures word importance better for text.
📝 Explanation
TfidfVectorizer weights words by how informative they are (down-weighting terms that appear in every document), and LogisticRegression with L2 regularization helps prevent overfitting, making option A the best choice.
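A minimal sketch of that pipeline, using illustrative data rather than a real review corpus (the `make_pipeline` helper is one convenient way to chain the two steps):

```python
# Sketch: TF-IDF features feeding L2-regularized logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]

pipeline = make_pipeline(
    TfidfVectorizer(),                  # weights words by importance
    LogisticRegression(penalty='l2'),   # L2 is scikit-learn's default
)
pipeline.fit(texts, labels)
print(pipeline.predict(['great movie'])[0])
```

The pipeline keeps vectorizer and classifier fitted together, so `predict` can be called directly on raw strings.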
❓ Hyperparameter
Difficulty: advanced
Which hyperparameter change will most reduce overfitting in logistic regression for text?
You trained a logistic regression model on text data and see high training accuracy but low test accuracy. Which hyperparameter adjustment is best to reduce overfitting?
💡 Hint
Think about how regularization controls model complexity.
📝 Explanation
Decreasing C increases regularization strength (in scikit-learn, C is the inverse of the regularization strength), which penalizes large coefficients and reduces overfitting.
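A short sketch makes the inverse relationship concrete: fitting the same toy data with a large and a small C and comparing coefficient magnitudes (the values of C here are arbitrary examples):

```python
# Sketch: smaller C → stronger regularization → smaller coefficients.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['good movie', 'bad movie', 'great film', 'terrible film']
labels = [1, 0, 1, 0]
X = CountVectorizer().fit_transform(texts)

weak = LogisticRegression(C=10.0).fit(X, labels)    # weak regularization
strong = LogisticRegression(C=0.01).fit(X, labels)  # strong regularization

# Coefficient magnitudes shrink as C decreases.
print(np.abs(weak.coef_).max(), np.abs(strong.coef_).max())
```

In practice C would be tuned with cross-validation (e.g. `LogisticRegressionCV`) rather than picked by hand.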
❓ Metrics
Difficulty: advanced
Which metric is best to evaluate logistic regression on imbalanced text data?
You have a logistic regression model classifying rare positive events in text. Accuracy is high but misleading. Which metric should you use to better evaluate performance?
💡 Hint
Consider a metric that balances false positives and false negatives.
📝 Explanation
F1-score balances precision and recall, providing a better measure for imbalanced classes than accuracy alone.
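A tiny sketch with made-up predictions shows how accuracy can look strong while F1 exposes poor recall on the rare class:

```python
# Sketch: accuracy vs F1 on imbalanced labels (fabricated predictions).
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 95 + [1] * 5            # 5% positive class
y_pred = [0] * 95 + [1, 0, 0, 0, 0]    # model finds only 1 of 5 positives

print(accuracy_score(y_true, y_pred))  # 0.96 — looks great
print(f1_score(y_true, y_pred))        # ~0.33 — reveals the missed positives
```

Here precision is 1.0 but recall is only 0.2, so F1 = 2·(1.0·0.2)/(1.0+0.2) ≈ 0.33, a far more honest summary than 96% accuracy.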
🔧 Debug
Difficulty: expert
Why does this logistic regression training code raise a ValueError?
Examine the code below and identify the cause of the ValueError during model training.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad', 'joyful']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)
💡 Hint
Check if the input data and labels have matching lengths.
📝 Explanation
The labels list has 2 elements while texts has 3, so X has 3 rows but y has only 2 entries; fit raises a ValueError about inconsistent numbers of samples.
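One minimal fix is to supply a label for every text so X and y have matching row counts (the label chosen for 'joyful' here is an illustrative assumption):

```python
# Sketch of the fix: one label per text, so fit() sees matching shapes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['happy', 'sad', 'joyful']
labels = [1, 0, 1]  # one label per text ('joyful' assumed positive)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)  # no ValueError: X.shape[0] == len(labels)
print(model.predict(vectorizer.transform(['happy']))[0])
```

The general rule: scikit-learn estimators require `X.shape[0] == len(y)`, and the ValueError message reports the two inconsistent sample counts.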