TensorFlow · ~5 mins

Batching and shuffling in TensorFlow

Introduction

Batching speeds up training by processing small groups of examples at once. Shuffling randomizes the order of the data so the model generalizes better and avoids learning order-dependent patterns.

Batching and shuffling are useful:

When training a model on a large dataset that cannot fit in memory all at once.
When you want to speed up training by processing multiple examples together.
When you want to prevent the model from learning the order of the data.
When you want to give the model varied data each epoch to improve accuracy.
When preparing data for neural network training in TensorFlow.
Syntax
TensorFlow
dataset = tf.data.Dataset.from_tensor_slices(data)
dataset = dataset.shuffle(buffer_size).batch(batch_size)

shuffle(buffer_size) fills a buffer with buffer_size elements and repeatedly emits one at random, replacing it with the next element from the dataset.

batch(batch_size) groups consecutive elements into batches of the given size.
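The buffer mechanism behind shuffle() can be illustrated without TensorFlow. The sketch below (a plain-Python approximation, with a hypothetical buffered_shuffle helper) mirrors the idea: keep a buffer of buffer_size elements, emit a random one, and refill from the stream. When buffer_size covers the whole dataset, this degenerates to a full shuffle.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Approximation of tf.data's buffer-based shuffle: hold a buffer of
    buffer_size elements, repeatedly emit a random one, refill from the stream."""
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for x in it:
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    # For each remaining element, emit a random buffered element
    # and put the new element in its place.
    for x in it:
        i = rng.randrange(len(buffer))
        out = buffer[i]
        buffer[i] = x
        yield out
    # Drain whatever is left in the buffer, in random order.
    rng.shuffle(buffer)
    yield from buffer

print(list(buffered_shuffle(range(10), buffer_size=4, seed=0)))
```

Note that with a small buffer the first emitted element can only come from the first few elements of the stream, which is why a buffer smaller than the dataset gives only partial randomization.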

Examples
This creates batches of 2 from the shuffled numbers 1 to 5; since 5 is not divisible by 2, the final batch contains a single element.
TensorFlow
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])
dataset = dataset.shuffle(5).batch(2)
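The grouping that batch() performs can be sketched in plain Python (make_batches is a hypothetical helper, not a TensorFlow API). Like tf.data's batch(), it emits the final short batch unless asked to drop it, mirroring the drop_remainder option.

```python
def make_batches(items, batch_size, drop_remainder=False):
    """Group consecutive elements into lists of batch_size,
    like tf.data's batch(); the final batch may be smaller."""
    batch = []
    for x in items:
        batch.append(x)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_remainder:
        yield batch

print(list(make_batches([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Dropping the remainder gives every batch a fixed shape, which some models require.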
Shuffle with a buffer of 100 and a batch size of 32 for feature data.
TensorFlow
dataset = tf.data.Dataset.from_tensor_slices(features)
dataset = dataset.shuffle(100).batch(32)
Sample Model

This program creates a dataset of numbers 0 to 9, shuffles them, and groups them into batches of 3. It prints each batch to show the effect of batching and shuffling.

TensorFlow
import tensorflow as tf

# Sample data: numbers 0 to 9
numbers = tf.data.Dataset.from_tensor_slices(tf.range(10))

# Shuffle with buffer size 10 and batch size 3
batched_dataset = numbers.shuffle(buffer_size=10).batch(3)

print("Batches:")
for batch in batched_dataset:
    print(batch.numpy())
Important Notes

Shuffling with a buffer size equal to or larger than the dataset size ensures full randomization.

Batch size affects memory use and training speed; smaller batches use less memory but require more steps per epoch.

Always shuffle training data but usually do not shuffle validation or test data.
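The order of the two calls also matters: shuffling before batching mixes individual elements, while batching first and shuffling afterwards only reorders whole batches. A plain-Python sketch of the difference (under the assumption of batch size 2 on six elements):

```python
import random

data = list(range(6))
rng = random.Random(0)

# Shuffle then batch: elements are mixed before grouping.
elems = data[:]
rng.shuffle(elems)
shuffle_then_batch = [elems[i:i + 2] for i in range(0, len(elems), 2)]

# Batch then shuffle: only the batch order changes;
# each batch still holds consecutive elements.
batches = [data[i:i + 2] for i in range(0, len(data), 2)]
rng.shuffle(batches)
batch_then_shuffle = batches

print(shuffle_then_batch)
print(batch_then_shuffle)
```

This is why the usual pipeline is shuffle(buffer_size).batch(batch_size), in that order.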

Summary

Batching groups data to speed up training and use memory efficiently.

Shuffling mixes data to help the model learn better and avoid bias.

In TensorFlow, use shuffle() and batch() on datasets to prepare data for training.