In binary classification, the goal is to separate two classes, such as "spam" vs "not spam" emails. The key metrics are Precision, Recall, and F1 score. Precision is the fraction of predicted positives that are actually positive. Recall is the fraction of actual positives that the model finds. F1 score is the harmonic mean of the two, so it is high only when both are high. Accuracy measures overall correctness, but it can be misleading when the classes are imbalanced.
Binary classification model in TensorFlow - Model Metrics & Evaluation
Confusion Matrix:
                      Actual Positive    Actual Negative
Predicted Positive    TP = 70            FP = 10
Predicted Negative    FN = 20            TN = 100
Total samples = TP + FP + FN + TN = 70 + 10 + 20 + 100 = 200
From this matrix, we calculate:
- Precision = TP / (TP + FP) = 70 / (70 + 10) = 0.875
- Recall = TP / (TP + FN) = 70 / (70 + 20) = 0.778
- F1 score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.875 * 0.778) / (0.875 + 0.778) ≈ 0.824
- Accuracy = (TP + TN) / Total = (70 + 100) / 200 = 0.85
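The arithmetic above can be checked with a few lines of plain Python. This is only a sketch that plugs in the counts from the confusion matrix; the variable names are illustrative.

```python
# Counts taken from the confusion matrix above.
tp, fp, fn, tn = 70, 10, 20, 100

total = tp + fp + fn + tn
precision = tp / (tp + fp)   # correct share of predicted positives
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

F1 can also be written directly in terms of counts as 2*TP / (2*TP + FP + FN), which gives the same value.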
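Imagine a spam filter. High precision means few good emails are wrongly marked as spam (few false alarms). High recall means catching most spam emails. If we tune only for recall, the filter flags messages aggressively and many good emails may be lost. Losing a legitimate email usually costs more than seeing an occasional spam message, so for spam filters precision is typically the more important metric.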
Now think about a cancer detector. Missing a cancer case (false negative) is very bad. So, high recall is critical to catch all sick patients, even if some healthy people get extra tests (lower precision). Here, recall matters more.
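One common way to move between these two regimes is to change the decision threshold applied to the model's predicted probabilities. The sketch below uses made-up labels and scores (they are assumptions, not data from this page) and a hypothetical helper `precision_recall` to show that a high threshold favors precision while a low threshold favors recall.

```python
# Hypothetical ground-truth labels (1 = positive) and predicted probabilities.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]


def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Strict threshold: fewer false alarms (the spam-filter preference).
print("threshold 0.70:", precision_recall(labels, scores, 0.70))
# Lenient threshold: fewer missed positives (the cancer-detector preference).
print("threshold 0.35:", precision_recall(labels, scores, 0.35))
```

On this toy data the strict threshold makes no false-positive errors but misses two positives, while the lenient threshold finds every positive at the cost of two false alarms.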
Good metrics for binary classification depend on the problem:
- Good: Precision and recall above 0.8, a balanced F1 score, and accuracy above 0.8 are often a reasonable sign that the model works well, although the acceptable level depends on how costly each type of error is.
- Bad: High accuracy but very low recall (e.g., 0.98 accuracy but 0.1 recall) means the model misses most positives. Likewise, high recall with very low precision means many false alarms.
Always check precision and recall together, not just accuracy.
- Accuracy paradox: In imbalanced data, a model predicting only the majority class can have high accuracy but be useless.
- Data leakage: When test data leaks into training, metrics look unrealistically good.
- Overfitting indicators: Very high training accuracy but low test accuracy means the model memorizes training data and won't generalize.
- Ignoring class imbalance: Not adjusting metrics or thresholds when classes are uneven can mislead evaluation.
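The accuracy paradox from the list above is easy to reproduce. The sketch below assumes a made-up dataset with 5 positives and 95 negatives and a "model" that always predicts the negative class.

```python
# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
labels = [1] * 5 + [0] * 95
# A "model" that always predicts the majority (negative) class.
predictions = [0] * len(labels)

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
actual_positives = sum(labels)

accuracy = correct / len(labels)
recall = true_positives / actual_positives

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The accuracy looks strong, yet recall is zero because not a single positive case is found.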
Your binary classification model has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production? Why or why not?
Answer: No, it is not good. The model misses 88% of positive cases, which is very risky for fraud detection. High accuracy is misleading here because most data is negative. You need to improve recall to catch more fraud cases.
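To make the answer concrete, the sketch below reconstructs one confusion matrix that is consistent with 98% accuracy and 12% recall. The dataset size (10,000 transactions) and the 2% fraud rate are assumptions chosen for illustration; the question does not state them.

```python
# Hypothetical volume: 10,000 transactions, 200 of them (2%) fraudulent.
total = 10_000
actual_fraud = 200

recall = 0.12
accuracy = 0.98

tp = round(recall * actual_fraud)        # fraud cases caught
fn = actual_fraud - tp                   # fraud cases missed
errors = round((1 - accuracy) * total)   # all misclassified transactions
fp = errors - fn                         # legitimate transactions flagged
tn = total - tp - fn - fp                # legitimate transactions passed

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"missed fraud cases: {fn} of {actual_fraud}")
```

Under these assumptions the model lets 176 of 200 fraud cases through while still reporting 98% accuracy.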
Practice
Solution
Step 1: Understand output layer role in binary classification
The output layer must produce a probability between 0 and 1 to represent two classes.
Step 2: Identify suitable activation function
Sigmoid activation compresses output to range [0, 1], perfect for binary decisions.
Final Answer:
Sigmoid -> Option D
Quick Check:
Binary output needs sigmoid = Sigmoid [OK]
- Using softmax for binary output
- Using ReLU which outputs unbounded values
- Using tanh which outputs between -1 and 1
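The output ranges mentioned above can be compared with the standard library alone. This is a minimal sketch with hand-written `sigmoid` and `relu` helpers, not the TensorFlow implementations.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: zero for negative inputs, unbounded above."""
    return max(0.0, x)


for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  "
          f"tanh={math.tanh(x):+.4f}  relu={relu(x):.1f}")
```

Only the sigmoid values can be read directly as a probability of the positive class: tanh goes negative and ReLU has no upper bound.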
Solution
Step 1: Identify appropriate loss for binary classification
Binary classification requires 'binary_crossentropy' loss to measure error correctly.
Step 2: Check optimizer and metrics
'adam' optimizer and 'accuracy' metric are standard choices for training and evaluation.
Final Answer:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) -> Option A
Quick Check:
Binary loss = binary_crossentropy [OK]
- Using categorical_crossentropy for binary tasks
- Using mean_squared_error which is for regression
- Choosing hinge loss which is for SVMs
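To see what binary cross-entropy measures, the sketch below computes it by hand in plain Python. The function is a hypothetical stand-in for illustration, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid taking log(0), and the labels and probabilities are made up.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]
good_probs = [0.9, 0.1, 0.8, 0.2]  # confident and correct
bad_probs = [0.1, 0.9, 0.2, 0.8]   # confident and wrong

loss_good = binary_crossentropy(y_true, good_probs)
loss_bad = binary_crossentropy(y_true, bad_probs)
print(f"good predictions: {loss_good:.3f}")
print(f"bad predictions:  {loss_bad:.3f}")
```

The loss is small when the predicted probability is close to the true label and grows quickly for confident mistakes, which is the behavior a sigmoid output needs during training.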
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(5,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Solution
Step 1: Analyze the last layer configuration
The last Dense layer has 1 unit and sigmoid activation, so output shape is (batch_size, 1).
Step 2: Understand batch dimension placeholder
TensorFlow uses None for batch size, so output shape is (None, 1).
Final Answer:
(None, 1) -> Option A
Quick Check:
Output units = 1 means shape = (None, 1) [OK]
- Confusing input shape with output shape
- Ignoring batch size dimension
- Assuming output shape is (1,) without batch
Solution
Step 1: Identify the cause of poor accuracy
Using categorical_crossentropy loss with a single sigmoid output causes wrong loss calculation.
Step 2: Apply correct loss function
Switching to binary_crossentropy aligns loss with sigmoid output for binary classification.
Final Answer:
Use binary_crossentropy loss instead of categorical_crossentropy -> Option B
Quick Check:
Loss must match output activation [OK]
- Using softmax for binary output
- Removing output activation causing invalid probabilities
- Assuming batch size alone fixes accuracy
Solution
Step 1: Choose model complexity for dataset size
Two layers with relu then sigmoid balance learning capacity and binary output.
Step 2: Select correct loss and optimizer
Binary_crossentropy fits binary tasks; adam optimizer adapts well for small datasets.
Final Answer:
Sequential model with two Dense layers (10 units relu, then 1 unit sigmoid), compile with binary_crossentropy and adam -> Option C
Quick Check:
Two layers + sigmoid + binary_crossentropy = Best practice [OK]
- Using softmax for binary classification
- Using tanh output activation
- Using mean_squared_error loss for classification
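Putting the pieces from the practice answers together, a model of this kind might be set up as sketched below. The 5-feature input follows the snippet earlier in this section; adding `Precision` and `Recall` to the metrics list is a suggestion in line with the advice to look beyond accuracy, not something the quiz answers require.

```python
import tensorflow as tf

# Assumed setup: 5 input features, matching the earlier snippet.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # one probability per sample
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.Recall(name="recall"),
    ],
)

# The leading None is the batch dimension.
print(model.output_shape)
```

With these metrics attached, `model.fit` and `model.evaluate` report precision and recall alongside accuracy, using the default 0.5 decision threshold.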
