Batching speeds up training by processing small groups of examples at once. Shuffling randomizes the order of examples so the model does not learn spurious patterns from how the data happens to be ordered.
Batching and shuffling in TensorFlow
```python
dataset = tf.data.Dataset.from_tensor_slices(data)
dataset = dataset.shuffle(buffer_size).batch(batch_size)
```
shuffle(buffer_size) fills a buffer with buffer_size elements and draws each output element at random from that buffer, refilling it as it goes.
batch(batch_size) groups consecutive elements into batches of the given size (the final batch may be smaller if the dataset size is not a multiple of batch_size).
```python
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])
dataset = dataset.shuffle(5).batch(2)
```
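Note that with five elements and a batch size of 2, the final batch holds only one element. A small sketch of that behavior, and of the `drop_remainder` option that discards the incomplete batch:

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])

# batch(2) on five elements yields batch sizes [2, 2, 1].
sizes_all = [len(b.numpy()) for b in dataset.batch(2)]
print(sizes_all)  # [2, 2, 1]

# drop_remainder=True discards the incomplete final batch.
sizes_full = [len(b.numpy()) for b in dataset.batch(2, drop_remainder=True)]
print(sizes_full)  # [2, 2]
```

Dropping the remainder is useful when downstream code requires a fixed batch shape.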
```python
dataset = tf.data.Dataset.from_tensor_slices(features)
dataset = dataset.shuffle(100).batch(32)
```
This program creates a dataset of numbers 0 to 9, shuffles them, and groups them into batches of 3. It prints each batch to show the effect of batching and shuffling.
```python
import tensorflow as tf

# Sample data: numbers 0 to 9
numbers = tf.data.Dataset.from_tensor_slices(tf.range(10))

# Shuffle with buffer size 10 and batch size 3
batched_dataset = numbers.shuffle(buffer_size=10).batch(3)

print("Batches:")
for batch in batched_dataset:
    print(batch.numpy())
```
Shuffling with a buffer size equal to or larger than the dataset size gives a fully uniform shuffle; smaller buffers only mix elements that are near each other in the original order.
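To see why the buffer size matters, consider the two extremes: a buffer of 1 can never reorder anything, while a buffer covering the whole dataset allows any permutation. A sketch (the seed is only for repeatability, not part of the technique):

```python
import tensorflow as tf

data = tf.range(10)

# A buffer of size 1 always holds exactly one element, so each draw
# just returns the next element: the "shuffle" is the identity.
unshuffled = [int(x) for x in tf.data.Dataset.from_tensor_slices(data).shuffle(1)]
print(unshuffled)  # [0, 1, 2, ..., 9] in original order

# A buffer covering the whole dataset permits full randomization.
shuffled = [int(x) for x in tf.data.Dataset.from_tensor_slices(data).shuffle(10, seed=42)]
print(shuffled)
```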
Batch size affects memory use and training speed; smaller batches use less memory but may train slower.
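The trade-off is easy to quantify: for a dataset of N examples, one epoch takes ceil(N / batch_size) steps, so larger batches mean fewer but heavier steps. A sketch with an illustrative dataset of 1000 examples:

```python
import math
import tensorflow as tf

n_examples = 1000  # illustrative size, not from a real dataset
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([n_examples, 4]))

# Count how many steps one pass over the data takes at each batch size.
steps = {bs: sum(1 for _ in dataset.batch(bs)) for bs in (16, 64, 256)}
for bs, n in steps.items():
    print(f"batch_size={bs}: {n} steps per epoch "
          f"(ceil({n_examples}/{bs}) = {math.ceil(n_examples / bs)})")
```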
Always shuffle training data but usually do not shuffle validation or test data.
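A typical pipeline therefore shuffles only the training split. A minimal sketch, with made-up random arrays standing in for real features:

```python
import tensorflow as tf

# Hypothetical in-memory splits (illustrative shapes, not a real dataset).
train_x = tf.random.normal([100, 8])
val_x = tf.random.normal([20, 8])

# Training data: shuffle every epoch, then batch.
train_ds = tf.data.Dataset.from_tensor_slices(train_x).shuffle(100).batch(32)

# Validation data: batch only, keeping the original order so that
# evaluation metrics are reproducible from run to run.
val_ds = tf.data.Dataset.from_tensor_slices(val_x).batch(32)
```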
Batching groups data to speed up training and use memory efficiently.
Shuffling mixes data to help the model learn better and avoid bias.
In TensorFlow, use shuffle() and batch() on datasets to prepare data for training.