
Naive Bayes for text in NLP

Introduction

Naive Bayes lets us quickly predict the category of a text, such as spam or not spam, by applying Bayes' rule to simple word-count statistics.

To filter spam emails from normal emails.
To sort customer reviews into positive or negative groups.
To detect the topic of news articles automatically.
To classify short messages or tweets by subject.
To help chatbots understand user intent from text.
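The "simple math" behind all of these uses is Bayes' rule: each category is scored by its prior probability multiplied by the probability of each word under that category, with smoothing so a word unseen in a category does not zero out the whole score. A minimal hand-rolled sketch — the word counts and priors below are made up purely for illustration:

```python
import math

# Hypothetical per-category word counts (illustrative numbers only)
word_counts = {
    'spam':     {'free': 20, 'money': 15, 'hello': 2},
    'not spam': {'free': 1,  'money': 2,  'hello': 30},
}
priors = {'spam': 0.4, 'not spam': 0.6}
vocab = {'free', 'money', 'hello'}

def log_score(words, label):
    # Work in log space to avoid underflow; add-one (Laplace)
    # smoothing keeps unseen words from producing zero probability
    total = sum(word_counts[label].values())
    score = math.log(priors[label])
    for w in words:
        count = word_counts[label].get(w, 0)
        score += math.log((count + 1) / (total + len(vocab)))
    return score

message = ['free', 'money']
best = max(priors, key=lambda label: log_score(message, label))
print(best)  # prints 'spam': those words are far more common in spam
```

MultinomialNB in scikit-learn does essentially this, just vectorized over a full vocabulary.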
Syntax
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Create a vectorizer to turn text into numbers
vectorizer = CountVectorizer()

# Convert text data to number counts
X_train_counts = vectorizer.fit_transform(texts)

# Create the Naive Bayes model
model = MultinomialNB()

# Train the model with text numbers and labels
model.fit(X_train_counts, labels)

# Predict new text category
X_new_counts = vectorizer.transform(new_texts)
predicted = model.predict(X_new_counts)

Use CountVectorizer to convert text into numbers that the model can understand.

MultinomialNB works well for text data with word counts.
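The vectorizer and the model can also be chained into a single scikit-learn `Pipeline`, which keeps the fit and transform steps in sync and lets you pass raw text straight to `fit` and `predict`. A sketch, assuming scikit-learn is installed:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

# The pipeline vectorizes and classifies in one fit/predict call
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(['I love it']))  # predicts 'positive'
```

This is handy in practice because the same pipeline object can be saved, cross-validated, or swapped to a different vectorizer without changing the surrounding code.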

Examples
This example trains a simple model to classify short movie reviews as positive or negative.
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['I love it']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)
This example shows how to detect spam messages using Naive Bayes.
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['spam message here', 'hello friend']
labels = ['spam', 'not spam']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['free money']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)
Sample Model

This program trains a Naive Bayes model on small text data to classify positive or negative sentiment. It shows the accuracy and predictions on test data.

Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data and labels
texts = [
    'I love this phone',
    'This movie is great',
    'I hate this movie',
    'This phone is bad',
    'I enjoy watching movies',
    'I dislike this phone',
    'This movie is fantastic',
    'This phone is terrible'
]
labels = ['positive', 'positive', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25, random_state=42)

# Convert text to number counts
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Create and train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train_counts, y_train)

# Predict on test data
y_pred = model.predict(X_test_counts)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Test accuracy: {accuracy:.2f}")
print(f"Predictions: {y_pred}")
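Beyond the hard label, the model can report how confident it is: `predict_proba` returns a probability for each class, in the order given by `model.classes_`. A small sketch with made-up spam data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['spam message here', 'hello friend',
         'free spam offer', 'hello again friend']
labels = ['spam', 'not spam', 'spam', 'not spam']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

X_new = vectorizer.transform(['spam offer'])
# Columns of predict_proba follow model.classes_
probs = model.predict_proba(X_new)
print(dict(zip(model.classes_, probs[0])))
```

Probabilities are useful when you want a threshold, for example only flagging a message as spam when the model is more than 90% sure.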
Important Notes

Naive Bayes assumes words occur independently of each other given the class — a simplification that is rarely true of real language but works surprisingly well for text.

More data usually helps the model learn better categories.

Text must be converted to numbers before training the model.
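One practical consequence of these notes: MultinomialNB applies smoothing (controlled by its `alpha` parameter, Laplace add-one smoothing by default) so that a vocabulary word that never appeared in some class still gets a small nonzero probability there, and one unexpected word cannot zero out a class's score. A sketch with tiny illustrative data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['good great film', 'bad awful film']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# alpha=1.0 is the default (Laplace / add-one smoothing)
model = MultinomialNB(alpha=1.0)
model.fit(X, labels)

# 'good' never appears in the negative class; smoothing gives it a
# small nonzero probability there instead of zeroing the score.
# Words outside the training vocabulary are simply dropped by the
# vectorizer at transform time.
X_new = vectorizer.transform(['good film'])
print(model.predict(X_new))  # predicts 'positive'
```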

Summary

Naive Bayes is a fast way to classify text into categories.

It uses word counts and simple math to guess the label.

Works well for spam detection, sentiment analysis, and topic sorting.