Binary classification model in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Binary classification model
Problem: Build a model that classifies whether a flower is Iris Setosa, based on its petal and sepal measurements.
Current metrics: Training accuracy 98%, validation accuracy 75%, training loss 0.05, validation loss 0.60
Issue: The model is overfitting. Training accuracy is very high, but validation accuracy is much lower.
Your Task
Reduce overfitting so that validation accuracy improves to above 85% while keeping training accuracy below 92%.
You can only modify the model architecture and training parameters.
Do not change the dataset or preprocessing steps.
Solution
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load data
iris = load_iris()
X = iris.data
# Binary target: 1 if Setosa, else 0
y = (iris.target == 0).astype(int)

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# Build model with dropout and smaller layers
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),  # 4 features: sepal and petal length and width
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train model
history = model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_val, y_val), callbacks=[early_stop], verbose=0)

# Evaluate
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)

print(f'Training accuracy: {train_acc*100:.2f}%, Validation accuracy: {val_acc*100:.2f}%')
print(f'Training loss: {train_loss:.3f}, Validation loss: {val_loss:.3f}')
Added Dropout layers (rate 0.3) after each hidden Dense layer to reduce overfitting.
Reduced the hidden layers to 16 and 8 units to limit model capacity.
Added early stopping on validation loss (patience 10, restore_best_weights=True), so training halts once validation loss stops improving and the best weights are restored.
Used Adam with a learning rate of 0.001 (the Keras default) for stable training.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.60

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.30

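Dropout randomly zeroes a fraction of the hidden activations at each training step, so the network cannot rely on any single unit, and early stopping ends training once validation loss stops improving. Together they keep the model from memorizing the training data, which narrows the gap between training and validation accuracy.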
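To see what dropout does in each phase, the sketch below calls a standalone tf.keras.layers.Dropout layer on a tensor of ones, once with training=True and once with training=False. In Keras, dropped units are set to 0 and the surviving units are scaled by 1 / (1 - rate) during training, while at inference the layer passes its input through unchanged. The rate of 0.3 mirrors the solution; the input tensor and the seed are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

rate = 0.3  # same dropout rate as the solution model
dropout = tf.keras.layers.Dropout(rate, seed=0)  # fixed seed only makes the mask repeatable
x = tf.ones((4, 8))  # arbitrary batch: 4 samples, 8 activations each

# Training mode: each unit is zeroed with probability `rate`,
# and the surviving units are scaled by 1 / (1 - rate)
train_out = dropout(x, training=True).numpy()

# Inference mode: the layer returns its input unchanged
infer_out = dropout(x, training=False).numpy()

print(np.unique(train_out))                  # only 0.0 and 1 / (1 - rate) can appear
print(np.array_equal(infer_out, x.numpy()))  # True
```

Because of the 1 / (1 - rate) scaling during training, the expected activation is the same in both modes, which is why no rescaling is needed at prediction time.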
Bonus Experiment
Try using L2 regularization instead of dropout to reduce overfitting and compare results.
💡 Hint
Add kernel_regularizer=tf.keras.regularizers.l2(0.01) to Dense layers and remove dropout layers.
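A minimal sketch of the bonus variant, assuming the same 4 scaled input features, layer sizes, and optimizer as the solution: each hidden Dense layer gets kernel_regularizer=tf.keras.regularizers.l2(0.01) and the Dropout layers are removed. The helper name build_l2_model is hypothetical. Keras adds the L2 penalty (0.01 times the sum of squared kernel weights of each regularized layer) to the loss, so loss values from this model include the penalty and are not directly comparable to the dropout model's loss; comparing accuracy is more meaningful.

```python
import tensorflow as tf

def build_l2_model(l2_factor=0.01):
    """Hypothetical helper: solution layer sizes, L2 regularization instead of dropout."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        # Each regularizer adds l2_factor * sum(kernel ** 2) to the loss
        tf.keras.layers.Dense(16, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(l2_factor)),
        tf.keras.layers.Dense(8, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(l2_factor)),
        tf.keras.layers.Dense(1, activation='sigmoid'),  # no Dropout layers anywhere
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

l2_model = build_l2_model()
l2_model.summary()
```

To compare against the dropout model, train it with the same model.fit call and EarlyStopping callback as the solution script (reusing its X_train, y_train, X_val, y_val) and compare the resulting validation accuracy.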

Practice

(1/5)
1. What activation function is commonly used in the output layer of a binary classification model in TensorFlow?
easy
A. Tanh
B. ReLU
C. Softmax
D. Sigmoid

Solution

  1. Step 1: Understand output layer role in binary classification

    The output layer must produce a single probability between 0 and 1: the probability of the positive class.
  2. Step 2: Identify suitable activation function

    Sigmoid maps any real value into the open interval (0, 1), so its output can be read directly as the probability of the positive class.
  3. Final Answer:

    Sigmoid -> Option D
  4. Quick Check:

    Binary output -> sigmoid activation
Hint: Binary output needs a sigmoid activation.
Common Mistakes:
  • Using softmax on a single output unit, which always outputs 1.0
  • Using ReLU, which outputs unbounded non-negative values
  • Using tanh, which outputs values between -1 and 1
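A quick numerical check of the points above, using arbitrary logits: tf.math.sigmoid maps each value into (0, 1), while tf.nn.softmax applied over a single output unit always returns 1.0, so its "prediction" carries no information.

```python
import tensorflow as tf

logits = tf.constant([[-3.0], [0.0], [2.5]])  # arbitrary pre-activation outputs, one unit per sample

sigmoid_probs = tf.math.sigmoid(logits)         # each value lies strictly between 0 and 1
softmax_probs = tf.nn.softmax(logits, axis=-1)  # normalizes over a single unit, so always 1.0

print(sigmoid_probs.numpy().ravel())
print(softmax_probs.numpy().ravel())  # [1. 1. 1.]
```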
2. Which of the following is the correct way to compile a binary classification model in TensorFlow?
easy
A. model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
B. model.compile(optimizer='rmsprop', loss='hinge', metrics=['accuracy'])
C. model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
D. model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Solution

  1. Step 1: Identify appropriate loss for binary classification

    With a single sigmoid output, 'binary_crossentropy' is the standard loss: it measures the log loss between the predicted probability and the 0/1 label.
  2. Step 2: Check optimizer and metrics

    'adam' optimizer and 'accuracy' metric are standard choices for training and evaluation.
  3. Final Answer:

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) -> Option A
  4. Quick Check:

    Binary loss = binary_crossentropy
Hint: Use binary_crossentropy loss for binary classification.
Common Mistakes:
  • Using categorical_crossentropy with a single sigmoid output
  • Using mean_squared_error, which is meant for regression
  • Choosing hinge loss, the margin-based loss used by SVMs
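For reference, binary cross-entropy averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples, where y is the 0/1 label and p is the sigmoid output. The sketch below computes it by hand with NumPy and compares it with tf.keras.losses.BinaryCrossentropy; the labels and probabilities are made-up values for illustration.

```python
import numpy as np
import tensorflow as tf

y_true = np.array([[1.0], [0.0], [1.0], [0.0]], dtype=np.float32)  # made-up labels
y_pred = np.array([[0.9], [0.2], [0.6], [0.4]], dtype=np.float32)  # made-up sigmoid outputs

# Manual binary cross-entropy, averaged over the batch
manual = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Keras implementation (defaults: from_logits=False, mean over the batch)
keras_bce = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred).numpy()

print(f"manual: {manual:.4f}, keras: {keras_bce:.4f}")  # both 0.3375
```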
3. Given the following TensorFlow model code, what will be the shape of the output layer?
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation='relu', input_shape=(5,)),
  tf.keras.layers.Dense(1, activation='sigmoid')
])
medium
A. (None, 1)
B. (None, 10)
C. (5, 1)
D. (1,)

Solution

  1. Step 1: Analyze the last layer configuration

    The last Dense layer has 1 unit, so the output shape is (batch_size, 1); the sigmoid activation does not change the shape.
  2. Step 2: Understand batch dimension placeholder

    Keras reports the batch dimension as None because the batch size is not fixed when the model is defined, so the output shape is (None, 1).
  3. Final Answer:

    (None, 1) -> Option A
  4. Quick Check:

    Output units = 1 means shape = (None, 1)
Hint: Output shape = (None, units in the last layer), where None is the batch size.
Common Mistakes:
  • Confusing input shape with output shape
  • Ignoring batch size dimension
  • Assuming output shape is (1,) without batch
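To confirm the shape empirically, the sketch below rebuilds the question's model (using tf.keras.Input instead of input_shape, which defines the same input) and feeds it a dummy batch of 3 samples. The batch dimension that Keras reports as None takes the concrete value 3 here; the batch size is arbitrary.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),                      # 5 features per sample
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # 1 output unit
])

dummy_batch = np.zeros((3, 5), dtype=np.float32)  # arbitrary batch of 3 samples
predictions = model(dummy_batch)

print(predictions.shape)  # (3, 1): batch dimension, then 1 unit
model.summary()           # the output layer is listed with shape (None, 1)
```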
4. You trained a binary classification model with a single sigmoid output unit and compiled it with categorical_crossentropy, but the accuracy stays around 50% after many epochs. Which fix is most likely to improve the model?
medium
A. Change the output activation to softmax
B. Use binary_crossentropy loss instead of categorical_crossentropy
C. Increase the batch size to 1024
D. Remove the activation function from the output layer

Solution

  1. Step 1: Identify the cause of poor accuracy

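    With a single sigmoid output, categorical_crossentropy first rescales the predictions to sum to 1 across the class axis, which always yields 1.0, so the loss stays near 0 and the weights receive almost no gradient.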
  2. Step 2: Apply correct loss function

    Switching to binary_crossentropy aligns loss with sigmoid output for binary classification.
  3. Final Answer:

    Use binary_crossentropy loss instead of categorical_crossentropy -> Option B
  4. Quick Check:

    Loss must match output activation
Hint: Match the loss to the output activation for correct training.
Common Mistakes:
  • Using softmax for binary output
  • Removing the output activation, which produces unbounded outputs instead of probabilities
  • Assuming batch size alone fixes accuracy
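A sketch of why categorical_crossentropy stalls here: in current Keras implementations, the predictions are rescaled to sum to 1 across the last axis before the log is taken. With only one output unit that rescaled value is always 1.0, so the loss is essentially 0 for every sample, even for badly wrong predictions. The labels and outputs below are made-up values; Keras may also emit a warning that categorical_crossentropy expects more than one class.

```python
import numpy as np
import tensorflow as tf

y_true = np.array([[1.0], [0.0], [1.0], [0.0]], dtype=np.float32)  # made-up 0/1 labels
y_pred = np.array([[0.1], [0.9], [0.2], [0.8]], dtype=np.float32)  # badly wrong sigmoid outputs

# Per-sample losses for the same predictions under both loss functions
cce = tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy()
bce = tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy()

print("categorical:", cce)  # all close to 0 despite the wrong predictions
print("binary:     ", bce)  # large values that penalize the mistakes
```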
5. You want to build a binary classification model to predict if an email is spam or not. Your dataset has 1000 samples with 20 features each. Which model architecture and compile settings are best?
hard
A. Sequential model with one Dense layer (1 unit, sigmoid), compile with binary_crossentropy and adam
B. Sequential model with one Dense layer (20 units, softmax), compile with categorical_crossentropy and sgd
C. Sequential model with two Dense layers (10 units relu, then 1 unit sigmoid), compile with binary_crossentropy and adam
D. Sequential model with three Dense layers (64 relu, 32 relu, 1 tanh), compile with mean_squared_error and rmsprop

Solution

  1. Step 1: Choose model complexity for dataset size

    A small ReLU hidden layer adds modest nonlinear capacity for 1,000 samples with 20 features, and a single sigmoid unit outputs the spam probability.
  2. Step 2: Select correct loss and optimizer

    binary_crossentropy matches the sigmoid output, and the Adam optimizer adapts per-parameter step sizes, so it usually trains well with its default settings.
  3. Final Answer:

    Sequential model with two Dense layers (10 units relu, then 1 unit sigmoid), compile with binary_crossentropy and adam -> Option C
  4. Quick Check:

    Two layers + sigmoid + binary_crossentropy = best practice
Hint: Use ReLU hidden layers, a sigmoid output, and binary_crossentropy.
Common Mistakes:
  • Using softmax for binary classification
  • Using tanh output activation
  • Using mean_squared_error loss for classification
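A minimal sketch of Option C, assuming 20 numeric, already-scaled features per email (how those features are extracted is outside this question). The hidden layer has 20 × 10 + 10 = 210 parameters and the sigmoid output layer 10 + 1 = 11, for 221 trainable parameters in total. The variable name spam_model is illustrative.

```python
import tensorflow as tf

spam_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                     # 20 features per email
    tf.keras.layers.Dense(10, activation='relu'),    # small hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid'),  # probability that the email is spam
])

spam_model.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])

print(spam_model.count_params())  # 221
```

Training would then be a call such as spam_model.fit(X, y, validation_split=0.2), where X (shape (1000, 20)) and y (0/1 labels) are placeholders for your own data.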