Logistic regression helps us decide if a piece of text belongs to one group or another, like sorting emails into spam or not spam.
Logistic regression for text in NLP
Introduction
You want to tell if a movie review is positive or negative.
You need to classify emails as spam or not spam.
You want to detect if a tweet is about a certain topic or not.
You want to quickly sort customer feedback into categories.
You want a simple model to understand text classification.
Syntax
NLP
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Convert text to numbers
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Create and train model
model = LogisticRegression()
model.fit(X, labels)

# Predict new text
new_X = vectorizer.transform(new_texts)
predictions = model.predict(new_X)
CountVectorizer turns each text into a vector of word counts, with one column per word in the training vocabulary.
LogisticRegression learns a weight for each word and uses those weights to separate texts into classes.
Examples
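This example trains on two sentences (1 = positive, 0 = negative) and predicts the sentiment of a new sentence. The word "hate" never appears in the training sentences, so the vectorizer ignores it and the prediction rests only on the words the new sentence shares with the training data.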
NLP
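from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Two training sentences: 1 = positive, 0 = negative
texts = ['I love this movie', 'This movie is bad']
labels = [1, 0]

# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train the model
model = LogisticRegression()
model.fit(X, labels)

# Predict the label of a new sentence
new_texts = ['I hate this movie']
new_X = vectorizer.transform(new_texts)
prediction = model.predict(new_X)
print(prediction)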
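A prediction like the one above comes from a weighted sum of the word counts passed through the logistic (sigmoid) function. The sketch below recomputes that probability by hand from the fitted model's `coef_` and `intercept_` attributes and compares it with `predict_proba`. The helper name `sigmoid` is our own, not part of scikit-learn, and the comparison assumes the default binary setup used in this example.

```python
import math

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love this movie', 'This movie is bad']
labels = [1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X, labels)


def sigmoid(z):
    """Hypothetical helper: map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Word counts for the new sentence
new_X = vectorizer.transform(['I hate this movie'])
new_counts = new_X.toarray()[0]

# Score = intercept + sum(weight * count) over the vocabulary
score = model.intercept_[0] + sum(
    w * c for w, c in zip(model.coef_[0], new_counts)
)
manual_prob = sigmoid(score)

# Probability of class 1 as reported by scikit-learn
sklearn_prob = model.predict_proba(new_X)[0, 1]

print(f"manual: {manual_prob:.4f}  predict_proba: {sklearn_prob:.4f}")
```

The two numbers should agree, which shows that the model is a set of per-word weights plus an intercept.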
This example classifies messages as spam (1) or not spam (0).
NLP
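from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Four training messages: 1 = spam, 0 = not spam
texts = ['spam offer', 'hello friend', 'win money now', "let's meet"]
labels = [1, 0, 1, 0]

# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train the model
model = LogisticRegression()
model.fit(X, labels)

# Predict the label of a new message
new_texts = ['win a prize']
new_X = vectorizer.transform(new_texts)
prediction = model.predict(new_X)
print(prediction)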
Sample Model
This program trains a logistic regression model to classify text as positive or negative. It holds out 30% of the examples as a test set, trains on the rest, and prints the test accuracy along with the test texts and their predicted labels.
NLP
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data and labels (1=positive, 0=negative)
texts = [
    'I love this product',
    'This is the worst thing ever',
    'Absolutely fantastic experience',
    'I hate it',
    'Not good at all',
    'Best purchase I made',
    'Terrible quality',
    'I am very happy',
    'Do not buy this',
    'Highly recommend it'
]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]

# Split data into training and testing sets
X_train_texts, X_test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42
)

# Convert text to numbers
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_texts)
X_test = vectorizer.transform(X_test_texts)

# Create and train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Test accuracy: {accuracy:.2f}")
print(f"Test texts: {X_test_texts}")
print(f"Predictions: {y_pred}")
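The vectorize-then-train steps in the program above can also be bundled into one object. The sketch below uses scikit-learn's `make_pipeline`, which the original program does not use. It is shown only as an optional convenience, on the same sample data; the variable name `clf` is our own.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Same sample data as above (1=positive, 0=negative)
texts = [
    'I love this product', 'This is the worst thing ever',
    'Absolutely fantastic experience', 'I hate it', 'Not good at all',
    'Best purchase I made', 'Terrible quality', 'I am very happy',
    'Do not buy this', 'Highly recommend it',
]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]

train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42
)

# The pipeline applies CountVectorizer, then LogisticRegression
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(train_texts, y_train)      # fits the vocabulary and the model together
y_pred = clf.predict(test_texts)   # raw strings go straight in

accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the pipeline fits the vectorizer inside `fit`, the vocabulary is built from the training texts only, just as in the step-by-step version.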
Important Notes
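Logistic regression works best when the class is signaled by the presence of particular words. Because CountVectorizer only counts words, word order and context are lost.
Text must be converted to numbers before training, and new text must be converted with the same fitted vectorizer before prediction.
More labeled training data usually means better results. With only a handful of examples, as in the demos here, predictions are unreliable.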
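One reason simple, direct wording works best is that word counts ignore word order. In this minimal sketch (the two sentences are made up for illustration), two sentences with opposite meanings produce identical count vectors, so no model trained on these counts could tell them apart.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up sentences: same words, opposite meaning
texts = ['the food was good not bad', 'the food was bad not good']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts).toarray()

# Compare the two rows element by element
same_vector = bool((X[0] == X[1]).all())
print(same_vector)  # True
```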
Summary
Logistic regression can classify text into categories like positive or negative.
Text is first changed into numbers using tools like CountVectorizer.
The model learns from labeled examples and then predicts labels for new text.