
Naive Bayes Classifier in Python with sklearn Explained

A Naive Bayes classifier is a simple probabilistic machine learning model that classifies data using Bayes' theorem under the assumption that features are independent. In sklearn, it is easy to implement with classes like GaussianNB for continuous features and MultinomialNB for count data.
⚙️ How It Works

Imagine you want to guess the type of fruit based on its color and size. The Naive Bayes classifier looks at each feature (color, size) separately and calculates the chance of the fruit being a certain type assuming these features do not affect each other. This is the "naive" part — it assumes features are independent, which is often not true but works well in practice.

It uses Bayes' theorem, which updates the probability of a class (like fruit type) based on new evidence (features). The model calculates the probability of each class given the features and picks the class with the highest probability as the prediction.

This method is fast, needs little data to train, and works well for text classification, spam detection, and other tasks where features can be treated as independent.
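For intuition, the class-picking step described above can be sketched by hand: multiply the class prior by each feature's likelihood, then pick the class with the highest score. The fruit classes, priors, and likelihoods below are made-up numbers for illustration, not values learned from real data:

```python
# Made-up priors and per-feature likelihoods for two fruit classes
priors = {"apple": 0.6, "grape": 0.4}
likelihoods = {
    "apple": {"red": 0.7, "small": 0.2},
    "grape": {"red": 0.3, "small": 0.9},
}

def predict(features):
    # P(class | features) is proportional to P(class) * product of P(feature | class)
    scores = {}
    for fruit, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[fruit][f]
        scores[fruit] = score
    # Normalize so the scores sum to 1
    total = sum(scores.values())
    return {fruit: s / total for fruit, s in scores.items()}

probs = predict(["red", "small"])
print(max(probs, key=probs.get), probs)  # grape wins: 0.4 * 0.3 * 0.9 > 0.6 * 0.7 * 0.2
```

Because the "naive" assumption multiplies each feature's probability independently, adding a new feature just adds one more factor per class, which is why the method stays fast as dimensionality grows.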

💻 Example

This example shows how to use GaussianNB from sklearn to classify simple data points into two classes.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data: features are [height, weight]
X = [[180, 80], [160, 60], [170, 65], [150, 50], [165, 55], [185, 90]]
# Labels: 0 = Class A, 1 = Class B
y = [0, 1, 0, 1, 1, 0]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Create and train the model
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Predictions: {y_pred}")
print(f"Accuracy: {accuracy:.2f}")
```

Output:

```
Predictions: [1 0]
Accuracy: 1.00
```
🎯 When to Use

Use Naive Bayes classifiers when you need a quick, simple model that works well with small datasets and when features are mostly independent. It is especially good for text classification tasks like spam detection, sentiment analysis, and document categorization.

It is less effective when features strongly depend on each other or when you need very high accuracy with complex data patterns.
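As a sketch of the text-classification use case, MultinomialNB pairs naturally with sklearn's CountVectorizer, which turns raw text into the word counts the model expects. The tiny spam/not-spam corpus below is invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam
texts = [
    "win a free prize now",
    "limited offer win money",
    "meeting at noon tomorrow",
    "lunch with the team today",
]
labels = [1, 1, 0, 0]

# CountVectorizer produces word counts; MultinomialNB models those counts
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free money offer", "see you at lunch"]))
```

Wrapping the two steps in a pipeline keeps the vectorizer's vocabulary and the classifier together, so new text can be classified with a single predict call.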

Key Points

  • Naive Bayes assumes features are independent, which simplifies calculations.
  • It uses Bayes' theorem to calculate class probabilities.
  • sklearn provides easy-to-use Naive Bayes classes like GaussianNB and MultinomialNB.
  • Works well for text data and small datasets.
  • Fast to train and predict.

Key Takeaways

  • The Naive Bayes classifier uses probability and assumes feature independence to classify data.
  • It is simple, fast, and effective for text classification and small datasets.
  • In Python, sklearn's GaussianNB is commonly used for continuous data.
  • Best used when features are mostly independent and quick results are needed.
  • Not ideal for complex feature dependencies or very high accuracy demands.