K-fold cross-validation estimates how well a machine learning model will perform on new data. It splits the dataset into k parts and trains and tests the model k times, each time holding out a different part for testing.
K-fold cross-validation in TensorFlow
Introduction
Use K-fold cross-validation in situations like these:
When you want to see how well your model performs on different parts of your data.
When you have limited data and want to use all of it for both training and testing.
When you want to reduce the risk of overfitting by testing the model on unseen data multiple times.
When comparing different models to find the best one.
When tuning model settings (hyperparameters) to get the best performance.
Syntax
TensorFlow
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Train and test your model here
n_splits sets the number of folds (parts) the data is split into.
shuffle=True shuffles the data before splitting so each fold is a random sample; random_state fixes the shuffle so the splits are reproducible.
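One way to see the effect of random_state: two KFold objects created with the same random_state produce exactly the same splits, which is useful for reproducible experiments. A minimal sketch on a small toy array:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 samples, one feature each

# Two KFold objects with the same random_state yield identical splits.
splits_a = [test for _, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X)]
splits_b = [test for _, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X)]

for a, b in zip(splits_a, splits_b):
    print(np.array_equal(a, b))  # True for every fold
```

Without random_state, each run of the script would shuffle differently, and fold-level results would not be directly comparable across runs.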
Examples
This splits the data into 3 parts and prints the indices of the training and testing sets for each fold.
TensorFlow
kf = KFold(n_splits=3)

for train_idx, test_idx in kf.split(X):
    print('Train:', train_idx, 'Test:', test_idx)
This splits data into 4 parts with shuffling and prints sizes of train and test sets.
TensorFlow
kf = KFold(n_splits=4, shuffle=True, random_state=1)

for train_idx, test_idx in kf.split(X):
    print('Train size:', len(train_idx), 'Test size:', len(test_idx))
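Note that fold sizes are only equal when the sample count divides evenly by n_splits. Per scikit-learn's KFold behavior, when it does not, the first n_samples % n_splits folds get one extra sample. A small sketch with 10 samples and 3 folds:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 samples do not divide evenly into 3 folds

# Collect the test-fold sizes: the first fold is one sample larger.
sizes = [len(test_idx) for _, test_idx in KFold(n_splits=3).split(X)]
print(sizes)  # [4, 3, 3]
```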
Sample Model
This code splits data into 4 parts, trains a simple neural network on 3 parts, and tests on the remaining part. It repeats this for all folds and prints loss and accuracy for each.
TensorFlow
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

# Create simple data
X = np.array([[i] for i in range(20)])
y = np.array([0 if i < 10 else 1 for i in range(20)])

kf = KFold(n_splits=4, shuffle=True, random_state=42)
fold = 1

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Build a simple model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation='relu', input_shape=(1,)),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Train model
    model.fit(X_train, y_train, epochs=10, verbose=0)

    # Evaluate model
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f'Fold {fold} - Loss: {loss:.4f}, Accuracy: {accuracy:.4f}')
    fold += 1
Important Notes
Shuffle the data before splitting (unless sample order matters, as in time series) so each fold is a fair random sample.
K-fold helps check model stability by testing on different data parts.
More folds mean more training rounds and longer run time, but a more reliable performance estimate.
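The per-fold results are usually combined into a single mean and spread, which is the cross-validated performance estimate. A minimal sketch of that aggregation, using the same toy data as the sample model but with a simple threshold rule standing in for the trained network (so the sketch runs without TensorFlow):

```python
import numpy as np
from sklearn.model_selection import KFold

# Same toy data as the sample model: label is 0 for i < 10, 1 otherwise.
X = np.array([[i] for i in range(20)])
y = np.array([0 if i < 10 else 1 for i in range(20)])

kf = KFold(n_splits=4, shuffle=True, random_state=42)
accuracies = []
for train_index, test_index in kf.split(X):
    # Stand-in for model training/evaluation: a threshold learned from the training fold.
    threshold = X[train_index].mean()
    preds = (X[test_index].ravel() >= threshold).astype(int)
    accuracies.append((preds == y[test_index]).mean())

# The mean across folds is the cross-validated estimate; the std shows stability.
print(f'Mean accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}')
```

In the sample model above, the same pattern applies: collect each fold's accuracy in a list inside the loop, then report the mean and standard deviation after the loop.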
Summary
K-fold cross-validation splits data into parts to test the model multiple times.
It helps check how well the model works on new data.
Use it to avoid overfitting and choose the best model.