TensorFlow · ~20 mins

K-fold cross-validation in TensorFlow - ML Experiment: Train & Evaluate

Experiment - K-fold cross-validation
Problem: You want to evaluate how well your neural network model will perform on new data. Currently, you train the model once and test it once, which might give a biased result.
Current Metrics: Training accuracy: 92%, Validation accuracy: 88%
Issue: A single train-test split might not represent the model's true performance; the validation accuracy could change noticeably if you split the data differently.
Your Task
Use K-fold cross-validation to get a more reliable estimate of model performance by training and validating the model on different data splits.
Use 5 folds for cross-validation.
Keep the same model architecture and training parameters.
Use TensorFlow and Keras only.
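Before looking at the solution, it helps to see what K-fold splitting actually does. The sketch below (names like `kfold_indices` are illustrative, not part of any library) builds the five train/validation index pairs by hand with NumPy only: shuffle the indices once, cut them into five folds, and let each fold take one turn as the validation set.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs, mimicking a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)          # shuffle once up front
    folds = np.array_split(indices, n_splits)     # cut into n_splits folds
    for i in range(n_splits):
        val_idx = folds[i]                        # fold i is held out
        train_idx = np.concatenate(
            [folds[j] for j in range(n_splits) if j != i]
        )                                         # all other folds train
        yield train_idx, val_idx

# Each of the 1000 samples lands in exactly one validation fold.
for train_idx, val_idx in kfold_indices(1000, n_splits=5):
    print(len(train_idx), len(val_idx))  # 800 200, five times
```

This is the same contract `sklearn.model_selection.KFold(shuffle=True)` provides in the solution below: every sample is validated exactly once, and trained on in the other four folds.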
Solution
TensorFlow
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import KFold

# Generate dummy data
X = np.random.rand(1000, 20)
y = (np.sum(X, axis=1) > 10).astype(int)  # Binary target; 10 is the expected row sum of 20 uniforms, so classes are roughly balanced

# Define model architecture function
def create_model():
    model = Sequential([
        Dense(32, activation='relu', input_shape=(20,)),
        Dense(16, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

kf = KFold(n_splits=5, shuffle=True, random_state=42)
val_accuracies = []

for train_index, val_index in kf.split(X):
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]

    model = create_model()
    model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
    loss, accuracy = model.evaluate(X_val, y_val, verbose=0)
    val_accuracies.append(accuracy)

average_val_accuracy = np.mean(val_accuracies)
print(f'Average validation accuracy over 5 folds: {average_val_accuracy:.4f}')
Implemented 5-fold cross-validation using sklearn's KFold.
Trained and evaluated the model on each fold separately.
Calculated average validation accuracy across all folds.
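The solution reports only the mean, but the fold-to-fold spread is just as informative: a small standard deviation means the estimate is stable across splits. A minimal sketch, using hypothetical per-fold accuracies (illustrative values, not results from the model above):

```python
import numpy as np

# Hypothetical per-fold validation accuracies (illustrative values only)
val_accuracies = [0.88, 0.90, 0.89, 0.87, 0.91]

mean_acc = np.mean(val_accuracies)
std_acc = np.std(val_accuracies)   # population std over the 5 folds
print(f'Validation accuracy: {mean_acc:.4f} ± {std_acc:.4f}')
```

Reporting "mean ± std" instead of a single number makes it obvious when one lucky split is inflating the score.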
Results Interpretation

Before K-fold: Validation accuracy = 88%

After K-fold: Average validation accuracy = 89%

K-fold cross-validation gives a more reliable and stable estimate of model performance by testing on multiple data splits instead of just one.
Bonus Experiment
Try increasing the number of folds to 10 and observe how the average validation accuracy and training time change.
💡 Hint
More folds give a better estimate but increase training time because the model trains more times.
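A back-of-envelope estimate of that cost, assuming training time scales with the number of samples processed: with k folds you train k models, each on (k-1)/k of the data, so total work per epoch is about (k-1) passes over the full dataset.

```python
def relative_training_cost(n_splits):
    """Training samples processed per epoch, relative to one pass over
    the full dataset: n_splits models each see (n_splits-1)/n_splits of it."""
    return n_splits * (n_splits - 1) / n_splits  # simplifies to n_splits - 1

for k in (5, 10):
    print(f'{k} folds: ~{relative_training_cost(k):.0f}x one full-dataset epoch')

ratio = relative_training_cost(10) / relative_training_cost(5)
print(f'10-fold vs 5-fold cost ratio: ~{ratio:.2f}x')
```

So doubling the folds from 5 to 10 roughly doubles the training time (about 2.25x by this estimate), in exchange for validation sets that are half as large but sampled twice as often.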