Mlops · Program · Beginner · 2 min read

Python sklearn Program to Analyze Sentiment with ML

Use CountVectorizer to convert text into numeric features and LogisticRegression from sklearn to train a sentiment classifier. For example, model.fit(vectorizer.fit_transform(texts), labels) trains the model, and model.predict(vectorizer.transform(new_texts)) predicts the sentiment of new texts.
📋

Examples

Input: I love this product
Output: Positive sentiment predicted
Input: This is the worst movie ever
Output: Negative sentiment predicted
Input: (empty text)
Output: Neutral or unable to predict sentiment
🧠

How to Think About It

To analyze sentiment, first convert text into numbers that a computer can understand using a tool like CountVectorizer. Then, train a simple model like LogisticRegression on labeled examples of positive and negative texts. Finally, use the trained model to predict the sentiment of new texts.
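The whole workflow can also be sketched as a single sklearn Pipeline, which bundles the vectorizer and the model so one fit call and one predict call handle both steps. This is a minimal sketch using the same toy data as the code below:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# The pipeline vectorizes and classifies in one object, so you cannot
# accidentally vectorize training and new texts differently
sentiment_pipeline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
sentiment_pipeline.fit(texts, labels)

print(sentiment_pipeline.predict(['Amazing experience', 'Worst ever']))
```

The pipeline is especially handy once you start saving models to disk, because the vocabulary and the classifier travel together as one object.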
📐

Algorithm

1. Collect sample texts labeled as positive or negative
2. Convert texts into numeric features using CountVectorizer
3. Train a Logistic Regression model on these features and labels
4. Use the trained model to predict sentiment on new texts
5. Output the predicted sentiment as positive or negative
💻

Code

sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]  # 1=positive, 0=negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

new_texts = ['I hate this', 'What a great day']
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

for text, pred in zip(new_texts, predictions):
    sentiment = 'Positive' if pred == 1 else 'Negative'
    print(f'Text: "{text}" -> Sentiment: {sentiment}')
Output
Text: "I hate this" -> Sentiment: Negative
Text: "What a great day" -> Sentiment: Positive
🔍

Dry Run

Let's trace the example texts ['I hate this', 'What a great day'] through the code.

1. Vectorize new texts: convert ['I hate this', 'What a great day'] into numeric features using the learned vocabulary.
2. Predict sentiment: the model predicts [0, 1], meaning Negative for the first text and Positive for the second.
3. Print results: output 'Negative' for 'I hate this' and 'Positive' for 'What a great day'.

The vocabulary learned from the training texts (alphabetical; single-letter tokens like 'I' and 'a' are dropped by default) is ['amazing', 'bad', 'ever', 'experience', 'is', 'love', 'this', 'worst'].

Text | Vectorized Features | Prediction | Sentiment
I hate this | [0 0 0 0 0 0 1 0] | 0 | Negative
What a great day | [0 0 0 0 0 0 0 0] | 1 | Positive

Note that 'What a great day' shares no words with the training vocabulary, so its vector is all zeros and the prediction rests on the model's learned intercept alone.
💡

Why This Works

Step 1: Text to numbers

We use CountVectorizer to turn words into numbers so the model can understand text.

Step 2: Train model

The LogisticRegression model learns patterns from labeled examples to distinguish positive and negative sentiment.

Step 3: Predict sentiment

The model uses learned patterns to predict if new texts are positive or negative.
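If you want a confidence score rather than just a hard label, LogisticRegression also exposes predict_proba. Here is a small sketch reusing the same toy training data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

# predict_proba returns [P(negative), P(positive)] for each text
new_texts = ['I love this', 'Worst ever']
for text, probs in zip(new_texts, model.predict_proba(vectorizer.transform(new_texts))):
    print(f'{text}: P(positive) = {probs[1]:.2f}')
```

A probability near 0.5 signals the model is unsure, which is useful for flagging texts (like the all-zeros case above) that share no words with the training data.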

🔄

Alternative Approaches

Use TfidfVectorizer instead of CountVectorizer
sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

new_texts = ['I hate this', 'What a great day']
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

for text, pred in zip(new_texts, predictions):
    sentiment = 'Positive' if pred == 1 else 'Negative'
    print(f'Text: "{text}" -> Sentiment: {sentiment}')
TfidfVectorizer weighs words by importance, often improving accuracy but slightly increasing complexity.
Use Multinomial Naive Bayes instead of Logistic Regression
sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_texts = ['I hate this', 'What a great day']
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

for text, pred in zip(new_texts, predictions):
    sentiment = 'Positive' if pred == 1 else 'Negative'
    print(f'Text: "{text}" -> Sentiment: {sentiment}')
Naive Bayes is simple and fast, good for small datasets but may be less accurate than Logistic Regression.

Complexity: O(n*m) time, O(n*m) space

Time Complexity

Vectorizing n texts over a vocabulary of m unique words takes O(n*m), and each Logistic Regression training iteration also costs roughly O(n*m), so total training time scales with the number of iterations.

Space Complexity

Storing the vectorized matrix requires O(n*m) space, where n is number of texts and m is vocabulary size.
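In practice the picture is better than O(n*m): fit_transform returns a scipy sparse matrix, so the memory actually used is proportional to the number of nonzero counts, not the full n*m grid. A quick check on the toy data:

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
X = CountVectorizer().fit_transform(texts)

# The matrix is logically 4 x 8, but only the 9 nonzero counts are stored
print(X.shape)  # (4, 8): 4 texts, 8 vocabulary words
print(X.nnz)    # 9
```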

Which Approach is Fastest?

Naive Bayes trains faster than Logistic Regression but may be less accurate; TfidfVectorizer adds slight overhead but can improve results.

Approach | Time | Space | Best For
CountVectorizer + LogisticRegression | O(n*m) | O(n*m) | Balanced accuracy and speed
TfidfVectorizer + LogisticRegression | O(n*m) | O(n*m) | Better accuracy with more computation
CountVectorizer + MultinomialNB | O(n*m) | O(n*m) | Fast training on small datasets
💡
CountVectorizer already lowercases text and strips punctuation by default; for better results, consider extra preprocessing such as removing stop words or handling negations before vectorizing.
⚠️
Beginners often call fit_transform on new texts instead of reusing the trained vectorizer's transform method; this builds a new, mismatched vocabulary and causes shape errors or wrong predictions.
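A minimal sketch of the correct pattern: fit the vectorizer once on the training texts, then reuse its transform method for anything new (the comment shows the mistake to avoid):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love this', 'This is bad']
vectorizer = CountVectorizer()
model = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(texts), [1, 0])

new_texts = ['this is great']

# Wrong: CountVectorizer().fit_transform(new_texts) would build a NEW
# vocabulary with a different number of columns, so the model's learned
# coefficients no longer line up with the features.

# Right: reuse the fitted vectorizer so columns match the training features
X_new = vectorizer.transform(new_texts)
print(model.predict(X_new))
```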