What is Naive Bayes for text in NLP?

Introduction

Naive Bayes is a fast probabilistic classifier that predicts the category of a text, such as spam or not spam, by applying Bayes' rule to the words it contains. It is typically used:

To filter spam emails from normal emails.

To sort customer reviews into positive or negative groups.

To detect the topic of news articles automatically.

To classify short messages or tweets by subject.

To help chatbots understand user intent from text.

Syntax

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# texts, labels, and new_texts below are placeholders for your own data
# Create a vectorizer to turn text into numbers
vectorizer = CountVectorizer()

# Convert text data to number counts
X_train_counts = vectorizer.fit_transform(texts)

# Create the Naive Bayes model
model = MultinomialNB()

# Train the model with text numbers and labels
model.fit(X_train_counts, labels)

# Predict new text category
X_new_counts = vectorizer.transform(new_texts)
predicted = model.predict(X_new_counts)

CountVectorizer converts each text into a vector of word counts, the numeric form the model requires.

MultinomialNB models how often each word occurs in each class, which suits word-count features.

Examples

This example trains a simple model to classify short movie reviews as positive or negative.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['I love it']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)
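
Beyond the single predicted label, it can help to see how confident the model is. The sketch below reuses the same toy data and calls `predict_proba`, whose columns follow the order of `model.classes_`. The variable names are illustrative.

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(texts), labels)

# One row per input text, one column per class in model.classes_
probs = model.predict_proba(vectorizer.transform(['I love it']))

for label, p in zip(model.classes_, probs[0]):
    print(f"{label}: {p:.2f}")
```

Only "love" is in the training vocabulary, so it is the only word in the new text that influences the probabilities.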

This example shows how to detect spam messages using Naive Bayes.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

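# The spam text shares words with the message classified below;
# with no shared words the model could only fall back on class priors
texts = ['free money offer', 'hello friend']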
labels = ['spam', 'not spam']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['free money']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)

Sample Model

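This program trains a Naive Bayes model on a small labeled dataset to classify sentiment as positive or negative, then prints the accuracy and predictions on a held-out test set. With only eight samples, the accuracy is illustrative rather than a reliable estimate.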

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data and labels
texts = [
    'I love this phone',
    'This movie is great',
    'I hate this movie',
    'This phone is bad',
    'I enjoy watching movies',
    'I dislike this phone',
    'This movie is fantastic',
    'This phone is terrible'
]
labels = ['positive', 'positive', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25, random_state=42)

# Convert text to number counts
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Create and train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train_counts, y_train)

# Predict on test data
y_pred = model.predict(X_test_counts)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Test accuracy: {accuracy:.2f}")
print(f"Predictions: {y_pred}")
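
The vectorizer and the model always travel together, so one optional variation is to bundle them. The sketch below uses `make_pipeline` from scikit-learn. The training sentences and the names `train_texts`, `train_labels`, and `clf` are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data, for illustration only
train_texts = ['I love this phone', 'This movie is great',
               'I hate this movie', 'This phone is bad']
train_labels = ['positive', 'positive', 'negative', 'negative']

# The pipeline vectorizes and classifies in one object
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

# Raw strings go in directly; the pipeline applies the fitted vectorizer
result = clf.predict(['I love this movie'])
print(result)
```

A pipeline also prevents the common mistake of calling `fit_transform` on test data instead of `transform`.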

Important Notes

Naive Bayes assumes that words occur independently of one another given the class. This is rarely true of real text, but it often works well in practice.

More labeled training data usually gives better estimates of the word probabilities for each class.

Text must be converted to numbers before training the model.
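
Written out, the independence assumption means a class is scored by multiplying its prior by one probability per word. The notation is an assumption made here for illustration: c is a class and w_1, ..., w_n are the words of the document.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

In practice, implementations usually sum log-probabilities instead of multiplying, which avoids numerical underflow on long documents.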

Summary

Naive Bayes is a fast way to classify text into categories.

It combines word counts with Bayes' rule to pick the most probable label.

It works well for spam detection, sentiment analysis, and topic sorting.
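
The math behind the model can be sketched without scikit-learn. The version below is a minimal illustration, not the library's implementation. It assumes whitespace tokenization, add-one (Laplace) smoothing, and hypothetical helper names `train` and `predict`.

```python
import math
from collections import Counter, defaultdict

def train(texts, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        words = text.lower().split()  # assumption: split on whitespace
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the class with the highest log-probability score."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen class/word pairs above zero
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

texts = ['free money offer', 'hello friend']
labels = ['spam', 'not spam']
params = train(texts, labels)
guess = predict('free money', *params)
print(guess)
```

The add-one smoothing here corresponds to the default `alpha=1.0` in MultinomialNB. Without it, a word that never appeared in a class would force that class's probability to zero.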

Practice

1. What is the main assumption behind the Naive Bayes algorithm when used for text classification? (easy)

A. Words always appear in a fixed order

B. Words in a document are independent of each other given the class label

C. All documents have the same length

D. The frequency of words does not affect classification

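Solution

Step 1: Understand Naive Bayes assumption

Naive Bayes treats each feature as conditionally independent of the others once the class label is known.

Step 2: Relate assumption to text classification

For text, the features are words, so the model assumes each word in a document occurs independently of the other words given the class.

Final Answer:

B. Words in a document are independent of each other given the class label

Quick Check:

Naive Bayes ignores word order, places no constraint on document length, and relies on word frequencies, so A, C, and D do not apply.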
