
Prefetching for performance in TensorFlow

Introduction

Prefetching helps your model get data faster by preparing the next batch while it is still training on the current one. This makes training smoother and quicker.

Prefetching is most useful in cases like these:

When training a model on a large dataset that does not fit into memory.
When you want to reduce waiting time between training steps.
When using data pipelines that load and preprocess data on the fly.
When you want to improve GPU or TPU utilization by feeding data continuously.
When training models with complex data augmentation that slows down data loading.
Syntax
TensorFlow
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

buffer_size controls how many elements to prepare in advance; when prefetch is applied after batch, each element is a full batch.

Using tf.data.AUTOTUNE lets TensorFlow decide the best buffer size automatically.

Examples
This prepares 1 batch ahead of time.
TensorFlow
dataset = dataset.prefetch(1)
TensorFlow automatically chooses the best number of batches to prefetch.
TensorFlow
dataset = dataset.prefetch(tf.data.AUTOTUNE)
Sample Model

This code creates a dataset of numbers, squares them, batches them in groups of 2, and uses prefetching to prepare the next batch while the current one is processed. It then prints each batch.

TensorFlow
import tensorflow as tf

# Create a simple dataset of numbers 0 to 9
raw_dataset = tf.data.Dataset.range(10)

# Map a function to square each number
mapped_dataset = raw_dataset.map(lambda x: x * x)

# Batch the data
batched_dataset = mapped_dataset.batch(2)

# Add prefetching to improve performance
prefetched_dataset = batched_dataset.prefetch(tf.data.AUTOTUNE)

# Iterate and print batches
for batch in prefetched_dataset:
    print(batch.numpy())
Output
[0 1]
[4 9]
[16 25]
[36 49]
[64 81]
Important Notes

Prefetching works best when your data loading or preprocessing is slower than model training.

Using tf.data.AUTOTUNE is recommended for most cases to let TensorFlow optimize performance.

Prefetching does not change your data; it only speeds up how fast data is fed to the model.
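A quick sketch to confirm that last point: iterating the same pipeline with and without prefetch yields identical values, only the timing differs.

```python
import tensorflow as tf

# Same pipeline twice; the only difference is the prefetch call.
base = tf.data.Dataset.range(5).map(lambda x: x * x).batch(2)
prefetched = base.prefetch(tf.data.AUTOTUNE)

without = [b.numpy().tolist() for b in base]
with_pf = [b.numpy().tolist() for b in prefetched]
print(without == with_pf)  # True: prefetching changes timing, not values
```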

Summary

Prefetching prepares data batches ahead of time to reduce waiting during training.

Use dataset.prefetch(tf.data.AUTOTUNE) for automatic buffer size tuning.

It helps keep your GPU or TPU busy and speeds up training.