
Validation split in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Validation split
Problem: You are training a neural network to classify images into 10 categories. Currently, you train the model on all data without separating validation data.
Current Metrics: Training accuracy: 95%, Validation accuracy: Not measured
Issue: Without a validation split, you cannot monitor how well the model generalizes to unseen data during training.
Your Task
Add a validation split of 20% during training and observe validation accuracy improving over epochs.
Use the same model architecture and dataset.
Do not change batch size or number of epochs.
Solution
TensorFlow
import tensorflow as tf
from tensorflow.keras import layers, models

# Load example dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build a simple CNN model
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model, reserving 20% of the training data for validation
history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2)

# Evaluate on test data
test_loss, test_acc = model.evaluate(x_test, y_test)

print(f'Test accuracy: {test_acc:.4f}')
Added the validation_split=0.2 parameter to model.fit() to reserve 20% of the training data for validation.
Monitored validation accuracy during training to check how well the model generalizes.
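To see how validation accuracy evolved epoch by epoch, the metrics recorded by model.fit() can be read back from the returned History object. A minimal sketch, assuming the history variable from the solution code above:

# history.history holds one list per metric, with one entry per epoch
for epoch, (acc, val_acc) in enumerate(
        zip(history.history['accuracy'], history.history['val_accuracy']), start=1):
    print(f"Epoch {epoch}: accuracy={acc:.4f}, val_accuracy={val_acc:.4f}")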
Results Interpretation

Before adding the validation split: Training accuracy was 95%, but no validation accuracy was measured, so model generalization was unknown.

After adding the validation split: Training accuracy is about 85%, validation accuracy about 80%, and test accuracy about 78%. The small gap between training and validation accuracy shows the model generalizes reasonably well, and the comparison makes overfitting easier to detect.

Using a validation split helps monitor how well the model performs on unseen data during training. It helps detect overfitting by revealing when training accuracy is much higher than validation accuracy.
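A common way to spot this gap is to plot the two accuracy curves side by side. A minimal sketch, assuming matplotlib is installed and using the history object from the solution above:

import matplotlib.pyplot as plt

# A widening gap between the two curves over epochs is a typical sign of overfitting
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()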
Bonus Experiment
Try using a separate validation dataset instead of the validation_split parameter.
💡 Hint
Manually split the training data into training and validation sets before calling model.fit(), then pass validation_data=(x_val, y_val).
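A minimal sketch of that approach, assuming the x_train, y_train arrays and model from the solution above (the x_val, y_val, x_train_sub, y_train_sub names are illustrative):

# Hold out the last 20% of the training data as a fixed validation set
val_size = int(0.2 * len(x_train))
x_val, y_val = x_train[-val_size:], y_train[-val_size:]
x_train_sub, y_train_sub = x_train[:-val_size], y_train[:-val_size]

# Same training setup as before, but with an explicit validation set
history = model.fit(x_train_sub, y_train_sub,
                    epochs=10, batch_size=64,
                    validation_data=(x_val, y_val))

This mirrors what validation_split=0.2 does internally (Keras also reserves the last fraction of the data, before any shuffling), but it gives you explicit control over which samples are held out.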