Naive Bayes for text in NLP
Introduction
Naive Bayes quickly guesses the category of a piece of text, such as spam or not spam, by combining word counts with simple probability rules.
Syntax
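```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder data: replace with your own texts, labels and new texts
texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']
new_texts = ['I love it']

# Create a vectorizer to turn text into numbers
vectorizer = CountVectorizer()
# Learn the vocabulary and convert the training texts to word counts
X_train_counts = vectorizer.fit_transform(texts)

# Create the Naive Bayes model
model = MultinomialNB()
# Train the model on the word counts and labels
model.fit(X_train_counts, labels)

# Convert new texts with the SAME fitted vectorizer (transform, not fit_transform)
X_new_counts = vectorizer.transform(new_texts)
# Predict the category of each new text
predicted = model.predict(X_new_counts)
```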
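CountVectorizer turns each text into a vector of word counts (a bag-of-words representation), which is the numeric input the model needs.
MultinomialNB models how often each word occurs in each class, so it is a natural fit for word-count features.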
Examples
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['I love it']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)  # ['positive']
```
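```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['spam message here', 'hello friend']
labels = ['spam', 'not spam']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['free money']
# Neither 'free' nor 'money' appears in the training vocabulary, so X_new is all zeros.
X_new = vectorizer.transform(new_text)
# With no word evidence and equal class priors, the two classes tie;
# the tie goes to the first entry of model.classes_, so this prints ['not spam'].
prediction = model.predict(X_new)
print(prediction)
```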
Sample Model
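This program trains a Naive Bayes model on a small set of sentences to classify sentiment as positive or negative. It holds out 25% of the data (2 of the 8 sentences) for testing and prints the test accuracy and the predicted labels. With this little data, the accuracy is only illustrative.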
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data and labels
texts = [
    'I love this phone',
    'This movie is great',
    'I hate this movie',
    'This phone is bad',
    'I enjoy watching movies',
    'I dislike this phone',
    'This movie is fantastic',
    'This phone is terrible',
]
labels = ['positive', 'positive', 'negative', 'negative',
          'positive', 'negative', 'positive', 'negative']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)

# Convert text to number counts
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Create and train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train_counts, y_train)

# Predict on test data
y_pred = model.predict(X_test_counts)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
print(f"Predictions: {y_pred}")
```
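A common variation on the program above, shown here as a sketch rather than as part of the original example, wraps the vectorizer and the model in a scikit-learn Pipeline built with make_pipeline. The pipeline takes raw text directly, so training data and new data always pass through the same fitted vectorizer. The four training sentences are a subset of the sample data above.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this phone', 'This movie is great', 'I hate this movie', 'This phone is bad']
labels = ['positive', 'positive', 'negative', 'negative']

# One estimator that vectorizes raw text and then applies Naive Bayes
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)  # fit_transform on the vectorizer, then fit on the model

# predict() runs the vectorizer's transform() internally, never fit_transform()
predictions = model.predict(['I love this movie', 'This movie is bad'])
print(predictions)
```

Calling fit on the pipeline fits the vectorizer and the model in one step, and calling predict transforms and then classifies, which removes the risk of accidentally refitting the vectorizer on new text.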
Important Notes
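Naive Bayes assumes that words occur independently of each other once the class is known. This is rarely true of real text, but the simplification is fast and often works well in practice.
More training data usually gives the model more reliable word statistics for each category.
Text must be converted to numbers, such as word counts, before training, and new text must be converted with the same fitted vectorizer.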
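One practical consequence of fitting the vectorizer on the training texts is that its vocabulary is frozen at that point: words it has never seen are silently dropped when new text is transformed. The sketch below uses the spam example's training texts to show this.

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectorizer.fit(['spam message here', 'hello friend'])

# vocabulary_ maps each word seen during fit to its column index
print(sorted(vectorizer.vocabulary_))

# 'free' and 'money' were never seen during fit, so they are dropped
X_new = vectorizer.transform(['free money'])
print(X_new.nnz)  # number of stored non-zero counts in the sparse row

# Only the known word 'hello' survives here
X_mixed = vectorizer.transform(['hello free money'])
print(X_mixed.nnz)
```

This is why the 'free money' input in the Examples section carries no word evidence at all: after transformation it is an empty row, and the prediction rests on the class priors alone.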
Summary
Naive Bayes is a fast way to classify text into categories.
It combines word counts with class prior probabilities to pick the most likely label.
It works well for spam detection, sentiment analysis, and topic sorting.
Practice
1. What is the main assumption behind the Naive Bayes algorithm when used for text classification?
easy
Solution
Step 1: Understand Naive Bayes assumption
Naive Bayes assumes that each feature (word) is conditionally independent of the other features given the class label.
Step 2: Relate assumption to text classification
Once the class is known, the presence of one word does not change the probability of any other word in the same document, so the model can score each word separately and combine the results.
Final Answer:
Words in a document are independent of each other given the class label -> Option B
Quick Check:
Naive Bayes = word independence assumption [OK]
Hint: Naive Bayes treats words as independent features [OK]
Common Mistakes:
- Thinking word order matters
- Assuming word frequency is ignored
- Believing documents must be same length
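The first common mistake above can be checked directly. Because CountVectorizer keeps only word counts and Naive Bayes scores each word independently given the class, reordering the words of a sentence should change neither the feature vector nor the predicted probabilities. This sketch reuses the two training sentences from the Examples section.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Same words, different order
a = vectorizer.transform(['this movie is bad'])
b = vectorizer.transform(['bad is movie this'])

same_counts = a.toarray().tolist() == b.toarray().tolist()
same_probs = model.predict_proba(a).tolist() == model.predict_proba(b).tolist()
print(same_counts, same_probs)
```

Both values print True. Word order is discarded by the bag-of-words representation; the independence assumption is what then lets the model score each remaining word on its own.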
2. Which of the following is the correct way to calculate the probability of a document belonging to a class using Naive Bayes?
easy
Solution
Step 1: Recall Naive Bayes formula for text
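The probability of a class given a document is proportional to the prior probability of the class times the product of the conditional probabilities of each word given the class:
$P(\text{class} \mid \text{doc}) \propto P(\text{class}) \prod_{\text{word} \in \text{doc}} P(\text{word} \mid \text{class})$
The product runs over every word occurrence, so a word that appears twice contributes its probability twice.
Step 2: Match formula to options
$P(\text{class}) \prod_{\text{word}} P(\text{word} \mid \text{class})$ correctly multiplies the class prior by the product of the $P(\text{word} \mid \text{class})$ terms.
Final Answer:
$P(\text{class}) \prod_{\text{word}} P(\text{word} \mid \text{class})$ -> Option C
Quick Check: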
Naive Bayes uses product of word probabilities [OK]
Hint: Multiply class prior by product of word likelihoods [OK]
Common Mistakes:
- Adding probabilities instead of multiplying
- Dividing probabilities incorrectly
- Subtracting probabilities
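The formula can be computed by hand and compared with MultinomialNB. The sketch below works in log space, adding log-probabilities instead of multiplying probabilities, because a product of many small numbers underflows to zero on long documents while the logarithm preserves which class scores highest. It assumes the setup MultinomialNB uses by default: class priors from class frequencies and Laplace smoothing with alpha = 1, so P(word|class) = (count(word, class) + 1) / (total words in class + vocabulary size). The data is the first example from the Examples section.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this movie', 'This movie is bad']
labels = np.array(['positive', 'negative'])

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts).toarray()  # dense word counts, one row per text
alpha = 1.0                                     # Laplace smoothing (MultinomialNB's default)
model = MultinomialNB(alpha=alpha).fit(X, labels)

x_new = vectorizer.transform(['I love it']).toarray()[0]
vocab_size = X.shape[1]

scores = {}
for c in model.classes_:
    rows = X[labels == c]
    log_prior = np.log(len(rows) / len(X))      # log P(class)
    word_counts = rows.sum(axis=0)              # count(word, class)
    # Laplace-smoothed log P(word | class) for every vocabulary word
    log_p_word = np.log((word_counts + alpha) / (word_counts.sum() + alpha * vocab_size))
    # log[P(class) * prod P(word|class)]: each word's log-probability
    # is added once per occurrence in the new text
    scores[str(c)] = float(log_prior + (x_new * log_p_word).sum())

print(scores)
print(max(scores, key=scores.get), model.predict(x_new.reshape(1, -1))[0])
```

For 'I love it', only 'love' is in the vocabulary ('I' and 'it' are dropped), so the positive score is log(1/2) + log(2/8) and the negative score is log(1/2) + log(1/9). The hand-computed winner should match model.predict.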
3. Given the following code snippet using sklearn's MultinomialNB for text classification, what will be the predicted class for the input text ['love this movie']?
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this movie', 'I hate this movie']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = vectorizer.transform(['love this movie'])
prediction = model.predict(new_text)
print(prediction[0])
```
medium
Solution
Step 1: Understand training data and labels
The model is trained on two texts: one labeled 'positive' and one labeled 'negative'. They differ only in 'love' versus 'hate', so those are the words that separate the classes.
Step 2: Analyze prediction input
The input 'love this movie' shares 'this' and 'movie' with both training texts, so those words count equally for both classes. Only 'love' tips the balance, and it appeared in the positive example, so the model predicts 'positive'.
Final Answer:
positive -> Option D
Quick Check:
Word 'love' matches positive class [OK]
Hint: Check which class words in input appeared during training [OK]
Common Mistakes:
- Confusing label names with words
- Ignoring vectorizer transformation
- Predicting word instead of class
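To see how decisive the Q3 prediction is, the sketch below prints the class probabilities from predict_proba for the same model and input.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['I love this movie', 'I hate this movie']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

new_text = vectorizer.transform(['love this movie'])
proba = model.predict_proba(new_text)[0]

print(model.classes_)  # classes are stored in sorted order
print(proba)           # one probability per class, in the same order
```

model.classes_ is ['negative', 'positive'], and the probabilities come out at about 1/3 and 2/3. With Laplace smoothing, 'this' and 'movie' each have probability 2/7 under both classes and cancel out, while 'love' has 2/7 under positive and 1/7 under negative, which gives odds of 2 to 1 in favor of positive.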
4. Consider this code snippet using Naive Bayes for text classification:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['spam spam spam', 'ham ham ham']
labels = ['spam', 'ham']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = vectorizer.transform(['spam ham spam'])
prediction = model.predict(new_text)
print(prediction[0])
```
The output is unexpected. What is the likely cause?
medium
Solution
Step 1: Analyze training and input data
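The training data has clearly separated spam and ham texts, but the input text mixes words from both classes.
Step 2: Understand Naive Bayes behavior with mixed words
Naive Bayes multiplies the per-word likelihoods for each class, together with the class prior. Here 'spam' appears twice and 'ham' once, so the spam class still scores higher, but with much less confidence than a pure input would get. With mixed evidence like this, the result depends on word counts and priors and can look surprising.
Final Answer:
The input text contains words from both classes causing confusion -> Option A
Quick Check: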
Mixed class words confuse Naive Bayes prediction [OK]
Hint: Mixed class words can confuse Naive Bayes predictions [OK]
Common Mistakes:
- Assuming unseen words cause error
- Thinking vectorizer was not fitted
- Believing labels must be numeric
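The Q4 model is not guessing at random: it weighs each word by how often it occurs. The sketch below retrains the same model and prints P(spam) for a pure input, the Q4 input, and an evenly mixed input.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ['spam spam spam', 'ham ham ham']
labels = ['spam', 'ham']

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Column of predict_proba that belongs to the 'spam' class
spam_index = list(model.classes_).index('spam')

spam_prob = {}
for text in ['spam spam', 'spam ham spam', 'spam ham']:
    proba = model.predict_proba(vectorizer.transform([text]))[0]
    spam_prob[text] = float(proba[spam_index])
    print(f"{text!r}: P(spam) = {spam_prob[text]:.3f}")
```

With Laplace smoothing, each class gives its own word probability 4/5 and the other class's word 1/5. 'spam ham spam' therefore scores 16/125 for spam against 4/125 for ham, so P(spam) = 0.8. For 'spam ham' the two scores are equal and P(spam) = 0.5, the point where the evidence is perfectly balanced.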
5. You want to improve a Naive Bayes text classifier that often misclassifies short texts with rare words. Which approach is best to reduce this problem?
hard
Solution
Step 1: Identify problem with rare words
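A word that never appeared with a class during training gets an estimated P(word|class) of zero. Because Naive Bayes multiplies word probabilities, that single zero wipes out the whole score for the class, no matter how strongly the other words point to it.
Step 2: Apply Laplace smoothing
Laplace smoothing adds one to every word count (more generally, a small pseudo-count α) before computing probabilities, so no word gets probability zero and rare words are handled gracefully.
Final Answer:
Use Laplace smoothing to handle rare or unseen words -> Option A
Quick Check: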
Laplace smoothing fixes zero probability issues [OK]
Hint: Add smoothing to avoid zero probabilities for rare words [OK]
Common Mistakes:
- Thinking removing stop words fixes rare word issue
- Believing more classes always improve accuracy
- Ignoring smoothing effects on probabilities
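The effect of smoothing can be seen without any library. The hypothetical helper word_probs below estimates P(word|class) from one class's documents with additive smoothing: alpha = 0 gives raw relative frequencies, and alpha = 1 is Laplace smoothing. The spam documents and the shared vocabulary (where 'friend' is assumed to come from another class's training text) are made up for illustration. In scikit-learn the same knob is the alpha parameter of MultinomialNB, which defaults to 1.0.

```python
from collections import Counter


def word_probs(docs, vocab, alpha):
    """Hypothetical helper: estimate P(word | class) from one class's documents.

    alpha=0 gives raw relative frequencies; alpha=1 is Laplace smoothing.
    """
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts[w] for w in vocab)
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}


spam_docs = ['win money now', 'free money']
vocab = ['free', 'friend', 'money', 'now', 'win']  # 'friend' assumed to come from another class

no_smoothing = word_probs(spam_docs, vocab, alpha=0)
laplace = word_probs(spam_docs, vocab, alpha=1)

print(no_smoothing['friend'])  # 0.0: one unseen word zeroes the whole product for this class
print(laplace['friend'])       # small but non-zero
print(sum(laplace.values()))   # smoothed probabilities still sum to 1
```

Without smoothing, any spam-class score that includes 'friend' is multiplied by zero; with Laplace smoothing 'friend' gets 1/10, and the probabilities over the vocabulary still form a proper distribution.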
