K-fold cross-validation estimates how well a machine learning model will perform on new data. It splits the dataset into k parts and trains and tests the model k times, each time holding out a different part for testing.
K-fold cross-validation in TensorFlow
Introduction
Use K-fold cross-validation in situations like these:
When you want to see how well your model performs on different parts of your data.
When you have limited data and want to use all of it for both training and testing.
When you want to reduce the risk of overfitting by testing the model on unseen data multiple times.
When comparing different models to find the best one.
When tuning model settings (hyperparameters) to get the best performance.
Syntax
TensorFlow
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Train and test your model here
n_splits sets the number of folds (parts) the data is split into.
shuffle=True shuffles the data before splitting so each fold is a random sample; random_state fixes the shuffle so the splits are reproducible.
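One way to see the effect of random_state: two KFold objects created with the same random_state produce exactly the same splits, which is useful for reproducible experiments. A minimal sketch on a small toy array:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 samples, one feature each

# Two KFold objects with the same random_state yield identical splits.
splits_a = [test for _, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X)]
splits_b = [test for _, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X)]

for a, b in zip(splits_a, splits_b):
    print(np.array_equal(a, b))  # True for every fold
```

Without random_state, each run of the script would shuffle differently, and fold-level results would not be directly comparable across runs.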
Examples
This splits the data into 3 parts and prints the indices of the training and testing sets for each fold.
TensorFlow
kf = KFold(n_splits=3)

for train_idx, test_idx in kf.split(X):
    print('Train:', train_idx, 'Test:', test_idx)
This splits data into 4 parts with shuffling and prints sizes of train and test sets.
TensorFlow
kf = KFold(n_splits=4, shuffle=True, random_state=1)

for train_idx, test_idx in kf.split(X):
    print('Train size:', len(train_idx), 'Test size:', len(test_idx))
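Note that fold sizes are only equal when the sample count divides evenly by n_splits. Per scikit-learn's KFold behavior, when it does not, the first n_samples % n_splits folds get one extra sample. A small sketch with 10 samples and 3 folds:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 samples do not divide evenly into 3 folds

# Collect the test-fold sizes: the first fold is one sample larger.
sizes = [len(test_idx) for _, test_idx in KFold(n_splits=3).split(X)]
print(sizes)  # [4, 3, 3]
```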
Sample Model
This code splits data into 4 parts, trains a simple neural network on 3 parts, and tests on the remaining part. It repeats this for all folds and prints loss and accuracy for each.
TensorFlow
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

# Create simple data
X = np.array([[i] for i in range(20)])
y = np.array([0 if i < 10 else 1 for i in range(20)])

kf = KFold(n_splits=4, shuffle=True, random_state=42)
fold = 1

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Build a simple model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation='relu', input_shape=(1,)),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Train model
    model.fit(X_train, y_train, epochs=10, verbose=0)

    # Evaluate model
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f'Fold {fold} - Loss: {loss:.4f}, Accuracy: {accuracy:.4f}')
    fold += 1
Important Notes
Shuffle the data before splitting (unless sample order matters, as in time series) so each fold is a fair random sample.
K-fold helps check model stability by testing on different data parts.
More folds mean more training rounds and longer run time, but a more reliable performance estimate.
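The per-fold results are usually combined into a single mean and spread, which is the cross-validated performance estimate. A minimal sketch of that aggregation, using the same toy data as the sample model but with a simple threshold rule standing in for the trained network (so the sketch runs without TensorFlow):

```python
import numpy as np
from sklearn.model_selection import KFold

# Same toy data as the sample model: label is 0 for i < 10, 1 otherwise.
X = np.array([[i] for i in range(20)])
y = np.array([0 if i < 10 else 1 for i in range(20)])

kf = KFold(n_splits=4, shuffle=True, random_state=42)
accuracies = []
for train_index, test_index in kf.split(X):
    # Stand-in for model training/evaluation: a threshold learned from the training fold.
    threshold = X[train_index].mean()
    preds = (X[test_index].ravel() >= threshold).astype(int)
    accuracies.append((preds == y[test_index]).mean())

# The mean across folds is the cross-validated estimate; the std shows stability.
print(f'Mean accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}')
```

In the sample model above, the same pattern applies: collect each fold's accuracy in a list inside the loop, then report the mean and standard deviation after the loop.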
Summary
K-fold cross-validation splits data into parts to test the model multiple times.
It helps check how well the model works on new data.
Use it to avoid overfitting and choose the best model.