Keras as TensorFlow's High-Level API - Model Metrics & Evaluation

Keras, TensorFlow's high-level API, makes it straightforward to build, train, and evaluate models. The key evaluation metrics depend on the task: for classification, accuracy, precision, recall, and F1 score; for regression, mean squared error (MSE) or mean absolute error (MAE). These metrics quantify how well a Keras model has learned.
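These metrics can be requested directly when compiling a Keras model. A minimal sketch, assuming TensorFlow 2.x (the layer sizes and four-feature input are illustrative assumptions, not from the original):

```python
# Sketch: compiling a Keras binary classifier that tracks
# accuracy, precision, and recall during training/evaluation.
# Layer sizes and input shape are hypothetical.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.Recall(name="recall"),
    ],
)
```

After training, `model.evaluate(x_test, y_test)` reports all three metrics alongside the loss.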
For classification tasks, the predictions from a Keras model are often summarized in a confusion matrix:
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive           | TP       | FN
Negative           | FP       | TN
This helps calculate precision, recall, and accuracy from Keras model predictions.
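The derivation of these metrics from the four confusion-matrix cells can be sketched in plain Python (the counts below are hypothetical illustration numbers):

```python
# Sketch: computing accuracy, precision, and recall from
# confusion-matrix counts. All counts are hypothetical.
def precision(tp, fp):
    # Of everything predicted positive, how much was truly positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all true positives, how many did the model catch?
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    # Fraction of all predictions that were correct.
    return (tp + tn) / (tp + tn + fp + fn)

# Example counts: 80 TP, 10 FN, 20 FP, 90 TN
tp, fn, fp, tn = 80, 10, 20, 90
print(precision(tp, fp))          # 80 / 100 = 0.8
print(recall(tp, fn))             # 80 / 90  ≈ 0.889
print(accuracy(tp, tn, fp, fn))   # 170 / 200 = 0.85
```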
When using Keras for classification, precision and recall trade off:
- High precision: Few false alarms. Good for spam filters so real emails aren't marked spam.
- High recall: Few missed positives. Good for medical tests so sick patients aren't missed.
Keras lets you tune models to balance these by changing thresholds or loss functions.
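The threshold trade-off can be demonstrated without Keras at all: given predicted probabilities (the probabilities and labels below are made-up illustration data), lowering the decision threshold raises recall while precision tends to fall.

```python
# Sketch: how the decision threshold trades precision against recall.
# Probabilities and labels are hypothetical illustration data.
def metrics_at_threshold(probs, labels, threshold):
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(labels, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

probs  = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]

# Stricter threshold: fewer false alarms, more missed positives.
print(metrics_at_threshold(probs, labels, 0.7))  # (1.0, 0.667)
# Looser threshold: catches every positive, but more false alarms.
print(metrics_at_threshold(probs, labels, 0.3))  # (0.6, 1.0)
```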
Using Keras, a good classification model might have (rough rules of thumb; acceptable values depend on the task):
- Accuracy above 85%
- Precision and recall above 80%
- F1 score close to precision and recall
Bad models have accuracy near random chance, or highly unbalanced precision and recall (e.g., 95% precision but 10% recall).
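The F1 score makes the "unbalanced" case above concrete: as the harmonic mean of precision and recall, it is dragged down sharply by whichever is worse.

```python
# Sketch: F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Balanced precision/recall: F1 stays close to both.
print(f1(0.82, 0.80))   # ≈ 0.81
# Very unbalanced: F1 collapses toward the weaker metric.
print(f1(0.95, 0.10))   # ≈ 0.18
```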
Common pitfalls when evaluating Keras models:
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced.
- Data leakage: If test data leaks into training, metrics look falsely good.
- Overfitting: High training accuracy but low test accuracy means the model memorized the training data rather than learning general patterns.
- Ignoring recall or precision: Only looking at accuracy can hide poor performance on important classes.
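The accuracy paradox from the list above can be shown with a few lines of plain Python (the 5%-positive class split is a hypothetical example):

```python
# Sketch: the accuracy paradox on imbalanced data.
# A classifier that always predicts "negative" looks accurate
# but never catches a single positive. Labels are hypothetical.
labels = [1] * 5 + [0] * 95   # 5% positive class
preds  = [0] * 100            # always predict negative

accuracy = sum(y == yh for y, yh in zip(labels, preds)) / len(labels)
tp = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 1)
fn = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 0)
recall = tp / (tp + fn)

print(accuracy)  # 0.95 — looks great
print(recall)    # 0.0  — catches no positives at all
```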
Your Keras model has 98% accuracy but 12% recall on fraud cases. Is it good for production? Why not?
Answer: No. With 12% recall, the model misses 88% of fraud cases, which is unacceptable for fraud detection. The 98% accuracy is misleading because fraud is rare: predicting "not fraud" almost every time yields high accuracy by default. Recall must improve substantially before production use.
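The scale of the problem is easy to see in absolute terms (the case volume below is an assumed figure, not from the question):

```python
# Sketch: what 12% recall means in absolute terms.
# The number of true fraud cases is a hypothetical assumption.
fraud_cases = 1000
recall = 0.12   # from the question

caught = int(fraud_cases * recall)
missed = fraud_cases - caught
print(caught, missed)  # 120 caught, 880 missed
```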