Naive Bayes helps us quickly classify text into categories, such as spam or not spam, by combining Bayes' theorem with simple word counts.
Naive Bayes for text in NLP
Introduction
To filter spam emails from normal emails.
To sort customer reviews into positive or negative groups.
To detect the topic of news articles automatically.
To classify short messages or tweets by subject.
To help chatbots understand user intent from text.
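All of these use cases rest on the same Bayes-rule idea: estimate the probability of a class given the text from simple counts. As a sketch with made-up numbers (the counts below are purely illustrative), here is how the probability that a message containing the word "free" is spam would be computed:

```python
# Hypothetical training counts (illustrative numbers only)
spam_with_free = 30   # spam messages containing "free"
spam_total = 40       # total spam messages
ham_with_free = 5     # normal messages containing "free"
ham_total = 60        # total normal messages

# Prior probability of each class
p_spam = spam_total / (spam_total + ham_total)
p_ham = ham_total / (spam_total + ham_total)

# Likelihood of seeing "free" in each class
p_free_given_spam = spam_with_free / spam_total
p_free_given_ham = ham_with_free / ham_total

# Bayes' theorem: P(spam | "free")
numerator = p_free_given_spam * p_spam
evidence = numerator + p_free_given_ham * p_ham
p_spam_given_free = numerator / evidence
print(f"P(spam | 'free') = {p_spam_given_free:.2f}")
```

Libraries like scikit-learn do this same counting for every word in the vocabulary, so you never have to compute the probabilities by hand.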
Syntax
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Create a vectorizer to turn text into numbers
vectorizer = CountVectorizer()

# Convert text data to number counts
X_train_counts = vectorizer.fit_transform(texts)

# Create the Naive Bayes model
model = MultinomialNB()

# Train the model with text numbers and labels
model.fit(X_train_counts, labels)

# Predict new text category
X_new_counts = vectorizer.transform(new_texts)
predicted = model.predict(X_new_counts)
Use CountVectorizer to convert text into numbers that the model can understand.
MultinomialNB works well for text data with word counts.
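The vectorizer and the model can also be chained into a single object with scikit-learn's `make_pipeline`, so raw strings go in and predictions come out. A minimal sketch, using the same tiny example data as below:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

texts = ['I love this movie', 'This movie is bad']  # tiny example data
labels = ['positive', 'negative']

# The pipeline runs CountVectorizer, then MultinomialNB, in one step
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
prediction = model.predict(['I love it'])
print(prediction)
```

This avoids the common mistake of calling `fit_transform` on test data: the pipeline always applies `transform` with the vocabulary learned during training.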
Examples
This example trains a simple model to classify short movie reviews as positive or negative.
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['I love this movie', 'This movie is bad']
labels = ['positive', 'negative']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['I love it']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)
This example shows how to detect spam messages using Naive Bayes.
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ['spam message here', 'hello friend']
labels = ['spam', 'not spam']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

new_text = ['free money']
X_new = vectorizer.transform(new_text)
prediction = model.predict(X_new)
print(prediction)
Sample Model
This program trains a Naive Bayes model on small text data to classify positive or negative sentiment. It shows the accuracy and predictions on test data.
Python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data and labels
texts = [
    'I love this phone',
    'This movie is great',
    'I hate this movie',
    'This phone is bad',
    'I enjoy watching movies',
    'I dislike this phone',
    'This movie is fantastic',
    'This phone is terrible'
]
labels = ['positive', 'positive', 'negative', 'negative',
          'positive', 'negative', 'positive', 'negative']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

# Convert text to number counts
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Create and train the Naive Bayes model
model = MultinomialNB()
model.fit(X_train_counts, y_train)

# Predict on test data
y_pred = model.predict(X_test_counts)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
print(f"Predictions: {y_pred}")
Important Notes
Naive Bayes assumes each word appears independently of the others given the class, a simplification that is "naive" but works surprisingly well for text.
More data usually helps the model learn better categories.
Text must be converted to numbers before training the model.
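The independence assumption above can be sketched in a few lines: the probability of a whole message under a class is treated as the product of its individual word probabilities. The per-word likelihoods below are made-up numbers, purely for illustration:

```python
# Hypothetical per-word likelihoods for the "spam" class
word_probs_spam = {'free': 0.20, 'money': 0.15, 'now': 0.10}

message = ['free', 'money', 'now']

# Naive independence: multiply the per-word probabilities
p_message_given_spam = 1.0
for word in message:
    p_message_given_spam *= word_probs_spam[word]

print(p_message_given_spam)
```

In practice, libraries work with log-probabilities instead of raw products, since multiplying many small numbers quickly underflows.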
Summary
Naive Bayes is a fast way to classify text into categories.
It uses word counts and Bayes' theorem to pick the most likely label.
Works well for spam detection, sentiment analysis, and topic sorting.