What is Log Loss in Machine Learning with Python | sklearn
Log loss measures how well a classification model predicts probabilities for classes. It penalizes wrong predictions with a higher score, so lower log loss means better predictions. In Python, you can calculate it using sklearn.metrics.log_loss.

How It Works
Log loss, also called logistic loss or cross-entropy loss, measures the difference between the true labels and the predicted probabilities of a classification model. Imagine you are guessing the chance of rain tomorrow. If you say 90% chance but it doesn't rain, you are more wrong than if you said 60%. Log loss captures this idea by penalizing confident wrong guesses more than unsure ones.
It works by taking the predicted probability for the true class and calculating the negative logarithm of that probability. If the model predicts a probability close to 1 for the correct class, the log loss is near zero (good). If it predicts a probability close to 0, the log loss becomes very large (bad). This helps models learn to give accurate probability estimates, not just correct labels.
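The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not sklearn's implementation; the helper name single_log_loss is hypothetical.

```python
import math

# Hypothetical helper: log loss for one prediction, given the
# probability the model assigned to the true class.
def single_log_loss(p_true_class):
    # Negative log of the probability assigned to the true class
    return -math.log(p_true_class)

# Confident and correct: 0.9 for the true class -> small loss
print(single_log_loss(0.9))

# Confident and wrong: only 0.1 left for the true class -> large loss
print(single_log_loss(0.1))
```

Note that the loss grows without bound as the probability for the true class approaches 0, which is exactly why confident wrong guesses are punished so heavily.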
Example
This example shows how to calculate log loss in Python using sklearn. We have true labels and predicted probabilities for two classes. The log_loss function returns a number showing how well the predictions match the true labels.
from sklearn.metrics import log_loss

# True labels (0 or 1)
y_true = [0, 1, 1, 0]

# Predicted probabilities for class 1
# Each value is the model's confidence that the label is 1
# For example, 0.1 means 10% chance of class 1
y_pred = [0.1, 0.9, 0.8, 0.3]

# Calculate log loss
loss = log_loss(y_true, y_pred)
print(f"Log Loss: {loss:.4f}")
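The same function also handles more than two classes. In that case you pass one row of probabilities per sample, with one column per class; the rows here are made-up example values.

```python
from sklearn.metrics import log_loss

# True labels for a three-class problem (classes 0, 1, 2)
y_true = [0, 2, 1, 2]

# Predicted probabilities: one row per sample, one column per class.
# Each row sums to 1.
y_pred = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

loss = log_loss(y_true, y_pred)
print(f"Multiclass Log Loss: {loss:.4f}")
```

Under the hood, sklearn picks out the probability each row assigns to the true class, takes the negative log, and averages over all samples.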
When to Use
Use log loss when you want to evaluate how well a classification model predicts probabilities, not just labels. It is especially useful for models that output probabilities, like logistic regression or neural networks.
For example, in medical diagnosis, predicting the chance of disease accurately is important. Log loss helps measure if the model's probability estimates are reliable. It is also used in competitions like Kaggle to rank models based on their probability predictions.
Key Points
- Log loss measures the accuracy of predicted probabilities for classification.
- Lower log loss means better probability predictions.
- It penalizes confident wrong predictions more than unsure ones.
- Use sklearn.metrics.log_loss in Python to calculate it easily.
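To see the asymmetric penalty from the key points in action, compare an unsure wrong prediction with a confident wrong one (the probability values below are arbitrary examples):

```python
from sklearn.metrics import log_loss

# Both samples truly belong to class 1
y_true = [1, 1]

# Unsure prediction: only 40% confidence in the correct class
unsure = log_loss(y_true, [0.4, 0.4], labels=[0, 1])

# Confidently wrong: just 5% confidence in the correct class
confident_wrong = log_loss(y_true, [0.05, 0.05], labels=[0, 1])

print(f"Unsure:           {unsure:.4f}")
print(f"Confidently wrong: {confident_wrong:.4f}")
```

The labels argument is needed here because y_true contains only one class; it tells sklearn the full set of possible labels. The confidently wrong model scores several times worse, even though both models picked the wrong label.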