
Logistic regression for text in NLP

Introduction

Logistic regression helps us decide if a piece of text belongs to one group or another, like sorting emails into spam or not spam.

It is a good fit when:
You want to tell if a movie review is positive or negative.
You need to classify emails as spam or not spam.
You want to detect if a tweet is about a certain topic or not.
You want to quickly sort customer feedback into categories.
You want a simple model to understand text classification.
Syntax
NLP
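from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder data: any list of strings with one label per string works
texts = ['good film', 'bad film']
labels = [1, 0]
new_texts = ['good story']

# Convert text to word-count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Create and train model
model = LogisticRegression()
model.fit(X, labels)

# Predict new text (transform only, so the training vocabulary is reused)
new_X = vectorizer.transform(new_texts)
predictions = model.predict(new_X)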

CountVectorizer turns each text into a vector of word counts, which is the numeric input the model needs.

LogisticRegression learns a weight for each word and uses the weighted sum of the counts to estimate the probability of each class.
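
As a rough illustration of what the fitted model computes, the sketch below rebuilds the probability of class 1 by hand: a weighted sum of word counts plus an intercept, passed through the sigmoid function. It assumes a binary problem with default LogisticRegression settings; the toy messages and the test phrase 'win money' are made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data (illustrative): 1 = spam, 0 = not spam
texts = ['spam offer', 'hello friend', 'win money now', "let's meet"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(X, labels)

# Word counts for one new message, as a dense 1-D array
x_new = vectorizer.transform(['win money']).toarray()[0]

# Weighted sum of word counts plus intercept, then the sigmoid function
score = float(np.dot(model.coef_[0], x_new) + model.intercept_[0])
manual_prob = 1.0 / (1.0 + np.exp(-score))

# Probability of class 1 as reported by scikit-learn
sklearn_prob = model.predict_proba([x_new])[0, 1]
print(manual_prob, sklearn_prob)
```

The two printed values should agree, which shows that predict is simply thresholding this probability at 0.5.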

Examples
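This example trains on two sentences and predicts the sentiment of a new sentence. The word 'hate' never appears in the training texts, so the vectorizer ignores it and the prediction rests only on 'this' and 'movie'. Treat the result as a demonstration of the workflow, not as a reliable sentiment.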
NLP
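from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Two training sentences: 1 = positive, 0 = negative
texts = ['I love this movie', 'This movie is bad']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

# Reuse the fitted vectorizer so the columns match the training data
new_texts = ['I hate this movie']
new_X = vectorizer.transform(new_texts)
prediction = model.predict(new_X)
print(prediction)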
This example classifies messages as spam (1) or not spam (0). Of the words in the new message, only 'win' appears in the training texts.
NLP
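from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Four training messages: 1 = spam, 0 = not spam
texts = ['spam offer', 'hello friend', 'win money now', "let's meet"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

# Reuse the fitted vectorizer so the columns match the training data
new_texts = ['win a prize']
new_X = vectorizer.transform(new_texts)
prediction = model.predict(new_X)
print(prediction)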
Sample Model

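This program trains a logistic regression model to classify text as positive or negative. It splits the data into training and test sets, trains the model, and prints the test accuracy and predictions. With only ten examples, three of which are held out for testing, the accuracy figure is illustrative only.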

NLP
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data and labels (1=positive, 0=negative)
texts = [
    'I love this product',
    'This is the worst thing ever',
    'Absolutely fantastic experience',
    'I hate it',
    'Not good at all',
    'Best purchase I made',
    'Terrible quality',
    'I am very happy',
    'Do not buy this',
    'Highly recommend it'
]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]

# Split data into training and testing sets
X_train_texts, X_test_texts, y_train, y_test = train_test_split(texts, labels, test_size=0.3, random_state=42)

# Convert text to numbers
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_texts)
X_test = vectorizer.transform(X_test_texts)

# Create and train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Test accuracy: {accuracy:.2f}")
print(f"Test texts: {X_test_texts}")
print(f"Predictions: {y_pred}")
Important Notes

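Logistic regression is a linear model, so it works best when individual words are strong signals of the class. Word counts ignore word order, so phrases such as 'not good' are harder to capture.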

Text must be converted to numbers before training.

More labeled training data usually means better results, partly because words never seen in training are ignored at prediction time.

Summary

Logistic regression can classify text into categories like positive or negative.

Text is first changed into numbers using tools like CountVectorizer.

The model learns from labeled examples and then predicts labels for new text.