Python sklearn Program to Analyze Sentiment with ML
Use CountVectorizer to convert text to numbers and LogisticRegression from sklearn to train a sentiment model; for example, model.fit(vectorizer.fit_transform(texts), labels) trains the model and model.predict(vectorizer.transform(new_texts)) predicts sentiment.
How to Think About It
First, convert the texts into numeric features with CountVectorizer. Then, train a simple model like LogisticRegression on labeled examples of positive and negative texts. Finally, use the trained model to predict the sentiment of new texts.
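If you prefer to keep these steps bundled together, sklearn's Pipeline can chain the vectorizer and classifier so a single fit() call runs both; here is a minimal sketch (the step names 'vectorizer' and 'classifier' are arbitrary labels, and the two training sentences are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Chain vectorizing and training: fit() runs both steps in order.
sentiment_model = Pipeline([
    ('vectorizer', CountVectorizer()),                  # text -> word counts
    ('classifier', LogisticRegression(max_iter=1000)),  # counts -> sentiment
])
sentiment_model.fit(['I love this', 'This is bad'], [1, 0])
print(sentiment_model.predict(['love it or not']))  # raw text goes straight in
```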
Code
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled training data: 1 = positive, 0 = negative
texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

# Step 1: turn the texts into word-count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Step 2: train a logistic regression classifier
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# Step 3: vectorize new texts with the same vocabulary and predict
new_texts = ['I hate this', 'What a great day']
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

for text, pred in zip(new_texts, predictions):
    sentiment = 'Positive' if pred == 1 else 'Negative'
    print(f'Text: "{text}" -> Sentiment: {sentiment}')
```
Dry Run
Let's trace the example texts ['I hate this', 'What a great day'] through the code:
Vectorize new texts
Convert ['I hate this', 'What a great day'] into numeric features using the learned vocabulary; words not seen during training, such as 'hate', 'what', 'great', and 'day', are simply ignored
Predict sentiment
The model predicts [0, 1], meaning Negative for the first text and Positive for the second
Print results
Output 'Negative' for 'I hate this' and 'Positive' for 'What a great day'
| Text | Vectorized Features | Prediction | Sentiment |
|---|---|---|---|
| I hate this | [0 0 0 0 0 0 1 0] | 0 | Negative |
| What a great day | [0 0 0 0 0 0 0 0] | 1 | Positive |
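You can verify the vocabulary and the vectors behind this table yourself. CountVectorizer's default tokenizer lowercases text and drops single-character tokens such as 'I' and 'a', leaving an eight-word vocabulary in alphabetical order. Continuing from the Code section above:

```python
# Continuing from the Code section above
print(vectorizer.get_feature_names_out())
# ['amazing' 'bad' 'ever' 'experience' 'is' 'love' 'this' 'worst']
print(X_new.toarray())
# Row 1 ('I hate this'): only 'this' is in the vocabulary
# Row 2 ('What a great day'): no known words, so the row is all zeros
```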
Why This Works
Step 1: Text to numbers
We use CountVectorizer to turn words into numbers so the model can understand text.
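To see this concretely, here is a tiny standalone example with two illustrative sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ['good movie', 'bad movie']
vec = CountVectorizer()
counts = vec.fit_transform(docs)
print(vec.get_feature_names_out())  # ['bad' 'good' 'movie']
print(counts.toarray())             # [[0 1 1]
                                    #  [1 0 1]]
```

Each row represents one text, and each column counts how often one vocabulary word appears in it.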
Step 2: Train model
The LogisticRegression model learns patterns from labeled examples to distinguish positive and negative sentiment.
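You can inspect what the model learned: each vocabulary word gets a weight, and positive weights push predictions toward the positive class. A short check, continuing from the Code section (the exact numbers depend on the solver and will vary between runs and versions):

```python
# Pair each vocabulary word with its learned weight.
for word, weight in zip(vectorizer.get_feature_names_out(), model.coef_[0]):
    print(f'{word}: {weight:+.3f}')
```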
Step 3: Predict sentiment
The model uses learned patterns to predict if new texts are positive or negative.
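When you need a confidence score rather than a hard 0/1 label, LogisticRegression also provides predict_proba; continuing from the Code section:

```python
# Column 1 holds P(positive) for each text.
probabilities = model.predict_proba(X_new)
for text, p in zip(new_texts, probabilities[:, 1]):
    print(f'{text}: P(positive) = {p:.2f}')
```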
Alternative Approaches
```python
# Alternative 1: TF-IDF weighting instead of raw word counts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

# TfidfVectorizer downweights words that appear in many texts
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

new_texts = ['I hate this', 'What a great day']
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

for text, pred in zip(new_texts, predictions):
    sentiment = 'Positive' if pred == 1 else 'Negative'
    print(f'Text: "{text}" -> Sentiment: {sentiment}')
```
```python
# Alternative 2: Multinomial Naive Bayes on word counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# MultinomialNB is a fast probabilistic classifier suited to count features
model = MultinomialNB()
model.fit(X, labels)

new_texts = ['I hate this', 'What a great day']
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

for text, pred in zip(new_texts, predictions):
    sentiment = 'Positive' if pred == 1 else 'Negative'
    print(f'Text: "{text}" -> Sentiment: {sentiment}')
```
Complexity: O(n*m) time, O(n*m) space
Time Complexity
Vectorizing n texts over a vocabulary of m unique words takes O(n*m) time, and each pass of Logistic Regression training also costs roughly O(n*m), so total training time grows with the number of iterations.
Space Complexity
Storing the vectorized matrix requires O(n*m) space in the worst case, where n is the number of texts and m is the vocabulary size; in practice sklearn returns a sparse matrix, so memory grows with the number of nonzero counts.
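A quick way to see this, continuing from the Code section:

```python
# fit_transform returns a scipy sparse matrix: only nonzero counts are stored.
print(X.shape)  # (4, 8): 4 texts, 8 vocabulary words
print(X.nnz)    # number of nonzero entries actually stored
```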
Which Approach is Fastest?
Naive Bayes trains faster than Logistic Regression but may be less accurate; TfidfVectorizer adds slight preprocessing overhead but often improves results. To compare speeds on your own data, see the timing sketch after the table below.
| Approach | Time | Space | Best For |
|---|---|---|---|
| CountVectorizer + LogisticRegression | O(n*m) | O(n*m) | Balanced accuracy and speed |
| TfidfVectorizer + LogisticRegression | O(n*m) | O(n*m) | Better accuracy with more computation |
| CountVectorizer + MultinomialNB | O(n*m) | O(n*m) | Fast training on small datasets |
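As a rough way to compare training speed, you can time each combination directly; here is a minimal sketch using the example texts (timings on four sentences are not meaningful, so substitute a real corpus):

```python
import time
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this', 'This is bad', 'Amazing experience', 'Worst ever']
labels = [1, 0, 1, 0]

# Time the fit() call for each vectorizer/classifier pair.
for vec, clf in [(CountVectorizer(), LogisticRegression(max_iter=1000)),
                 (TfidfVectorizer(), LogisticRegression(max_iter=1000)),
                 (CountVectorizer(), MultinomialNB())]:
    X = vec.fit_transform(texts)
    start = time.perf_counter()
    clf.fit(X, labels)
    elapsed = time.perf_counter() - start
    print(f'{type(vec).__name__} + {type(clf).__name__}: {elapsed:.4f}s')
```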