Sentiment analysis with scikit-learn in ML Python

Introduction

Sentiment analysis classifies text as expressing a positive or a negative feeling. With scikit-learn, a simple model takes only a few steps: convert the text into word counts, then train a classifier on labeled examples.

Use it when you want to:
Find out whether customer reviews are happy or unhappy.
Check whether social media posts are positive or negative about a topic.
Analyze survey feedback quickly.
Sort emails or messages by mood.
Monitor brand reputation by scanning online comments.
Syntax
ML Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Prepare text data and labels
texts = ["I love this!", "This is bad.", "Amazing product", "Not good", "I am happy"]
labels = [1, 0, 1, 0, 1]  # 1=positive, 0=negative

# Convert text to numbers
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

# Train model
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Check accuracy
accuracy = accuracy_score(y_test, predictions)

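CountVectorizer turns each text into a vector of word counts, which is the numeric input the model needs.

MultinomialNB is a Naive Bayes classifier designed for count features, which makes it a simple and effective baseline for text classification.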

Examples
Passing stop_words='english' removes common English words like 'the' or 'and' so the model focuses on more informative words.
ML Python
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(texts)
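alpha sets the strength of additive (Laplace) smoothing, which keeps a word that never appeared with a class during training from driving that class's probability to zero. The default is 1.0; smaller values smooth less.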
ML Python
model = MultinomialNB(alpha=0.5)
model.fit(X_train, y_train)
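To see what alpha does, this small sketch trains on two toy texts (made up for illustration) and compares the predicted probability of the positive class for two alpha values. The dictionary name probs is only illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["good good", "bad bad"]
train_labels = [1, 0]  # 1=positive, 0=negative

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_new = vectorizer.transform(["good"])

probs = {}
for alpha in (1.0, 0.1):
    model = MultinomialNB(alpha=alpha).fit(X_train, train_labels)
    # Column 1 of predict_proba corresponds to class 1 (positive)
    probs[alpha] = model.predict_proba(X_new)[0, 1]
    print(f"alpha={alpha}: P(positive) = {probs[alpha]:.2f}")
# alpha=1.0: P(positive) = 0.75
# alpha=0.1: P(positive) = 0.95
```

Less smoothing makes the model more confident about words it saw in only one class, which can help or hurt depending on how much data you have.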
To predict the sentiment of new text, first convert it with the same fitted vectorizer, using transform rather than fit_transform.
ML Python
predictions = model.predict(vectorizer.transform(["I hate this"]))
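If remembering to transform new text by hand feels error-prone, one option is to bundle the vectorizer and the model with make_pipeline, so raw strings can be passed directly to fit and predict. This is a sketch on a tiny made-up training set, not a recommendation for real data sizes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["great product", "love it", "terrible service", "awful quality"]
train_labels = [1, 1, 0, 0]  # 1=positive, 0=negative

# The pipeline vectorizes the text, then passes the counts to the model
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

new_texts = ["love this great thing", "awful terrible"]
new_predictions = classifier.predict(new_texts).tolist()
print(new_predictions)  # [1, 0]
```

Words that were not in the training texts, such as "thing" here, are simply ignored when new text is transformed.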
Sample Model

This program trains a simple model to tell if text is positive or negative. It shows accuracy and compares predictions to actual labels.

ML Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Sample text data and labels
texts = [
    "I love this product",
    "This is the worst thing ever",
    "Absolutely fantastic experience",
    "I hate it",
    "Not good at all",
    "I am very happy",
    "Terrible service",
    "Best purchase I've made",
    "Awful, will not buy again",
    "Really pleased with this"
]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]  # 1=positive, 0=negative

# Convert text to numeric features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=1)

# Train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy: {accuracy:.2f}")
print("Test predictions:", predictions)
print("Actual labels:  ", y_test)
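
The program above converts all texts to features before splitting, so the vocabulary also contains words from the test texts. A possible variant, sketched here with the same data, splits the raw texts first and fits the vectorizer on the training part only. The names train_texts and test_texts are introduced just for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

texts = [
    "I love this product", "This is the worst thing ever",
    "Absolutely fantastic experience", "I hate it", "Not good at all",
    "I am very happy", "Terrible service", "Best purchase I've made",
    "Awful, will not buy again", "Really pleased with this",
]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]  # 1=positive, 0=negative

# Split the raw texts first
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=1
)

# Learn the vocabulary from training texts only, then reuse it for the test texts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

model = MultinomialNB().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy on {len(test_texts)} test texts: {accuracy:.2f}")
```

With only three test texts, the accuracy can take just four values (0.00, 0.33, 0.67, 1.00), so treat it as a rough indication.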
Important Notes

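More labeled data usually gives better results; a handful of sentences like the ones here is only enough to demonstrate the workflow.

Try a different model or additional text cleaning to improve results.

Accuracy is the fraction of test examples the model labels correctly.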
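
One way to try a different model is to compare candidates with cross-validation, which averages accuracy over several train/test splits. The sketch below reuses the sample texts; the second candidate (TfidfVectorizer with LogisticRegression) is just one possible alternative chosen for illustration, and no particular scores are claimed.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "I love this product", "This is the worst thing ever",
    "Absolutely fantastic experience", "I hate it", "Not good at all",
    "I am very happy", "Terrible service", "Best purchase I've made",
    "Awful, will not buy again", "Really pleased with this",
]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]  # 1=positive, 0=negative

candidates = {
    "counts + Naive Bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    "tf-idf + logistic regression": make_pipeline(TfidfVectorizer(), LogisticRegression()),
}

mean_scores = {}
for name, candidate in candidates.items():
    # 5-fold cross-validation: each fold holds out 2 of the 10 texts
    scores = cross_val_score(candidate, texts, labels, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {mean_scores[name]:.2f}")
```

On a dataset this small the comparison is very noisy; the pattern becomes useful once you have many more labeled texts.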

Summary

Sentiment analysis determines whether text is positive or negative.

scikit-learn turns words into numbers and trains a simple model on them.

Check accuracy on held-out test data to see how well the model works.