
Caching datasets in TensorFlow

Introduction

Caching a dataset speeds up your program by storing its elements in memory or on disk after the first pass, so the data does not have to be loaded or preprocessed again on every epoch.

Caching is most useful when:

Your dataset fits in memory and you want to speed up training.
Loading or preprocessing the data is expensive and you want to avoid repeating those steps.
You reuse the same dataset multiple times during training or evaluation.
You want to reduce the time spent waiting for data during model training.
Syntax
TensorFlow
dataset = dataset.cache(filename=None)

If filename is None, the dataset is cached in memory.

If you provide a filename, the dataset is cached on disk at that location.

Examples
Caches the dataset in memory for faster access during training.
TensorFlow
dataset = dataset.cache()
Caches the dataset on disk at '/tmp/cache_file'. Useful if the dataset is too large to fit in memory.
TensorFlow
dataset = dataset.cache('/tmp/cache_file')
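Where cache() sits in a pipeline matters. A minimal sketch of a commonly recommended ordering (the numbers here stand in for real records; buffer and batch sizes are illustrative):

```python
import tensorflow as tf

# Stand-in for an expensive-to-load dataset.
dataset = tf.data.Dataset.range(10)

# Deterministic, expensive preprocessing goes BEFORE cache() so its
# results are computed only once.
dataset = dataset.map(lambda x: x * x)

# cache() snapshots the preprocessed elements after the first full pass.
dataset = dataset.cache()

# Random operations (shuffle, augmentation) go AFTER cache() so they
# still vary between epochs instead of being frozen into the cache.
dataset = dataset.shuffle(buffer_size=10).batch(4).prefetch(tf.data.AUTOTUNE)

for batch in dataset:
    print(batch.numpy())
```

Anything placed after cache() is re-executed on every epoch, so keeping only cheap, randomized steps there preserves both the speedup and per-epoch variation.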
Sample Model

This code creates a dataset of numbers from 0 to 4, squares each number, and caches the results in memory. The first iteration computes and caches the squared numbers. The second iteration reads from the cache, making it faster.

TensorFlow
import tensorflow as tf

# Create a simple dataset
numbers = tf.data.Dataset.range(5)

# Map a function to square the numbers
squared = numbers.map(lambda x: x * x)

# Cache the dataset in memory
cached_dataset = squared.cache()

# Iterate twice to show caching effect
print('First iteration:')
for num in cached_dataset:
    print(num.numpy())

print('Second iteration:')
for num in cached_dataset:
    print(num.numpy())
Output
First iteration:
0
1
4
9
16
Second iteration:
0
1
4
9
16
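To observe the caching effect directly, you can time the two passes. A small sketch, using tf.py_function with an artificial time.sleep delay to simulate expensive preprocessing (the delay and function name are illustrative):

```python
import time
import tensorflow as tf

def slow_square(x):
    # Simulate expensive per-element preprocessing.
    time.sleep(0.1)
    return x * x

dataset = tf.data.Dataset.range(5)
dataset = dataset.map(
    lambda x: tf.py_function(slow_square, [x], Tout=tf.int64)
).cache()

start = time.perf_counter()
list(dataset)  # first pass: runs slow_square and fills the cache
first = time.perf_counter() - start

start = time.perf_counter()
list(dataset)  # second pass: served from the cache
second = time.perf_counter() - start

print(f'first pass:  {first:.2f}s')
print(f'second pass: {second:.2f}s')
```

The first pass takes roughly the sum of the delays; the second pass skips slow_square entirely and finishes almost immediately.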
Important Notes

Caching in memory is fast but requires enough RAM to hold the entire dataset.

Caching on disk is slower than memory but works for datasets that do not fit in RAM.

Place cache() after expensive deterministic preprocessing and before random operations such as shuffle() or augmentation: steps before cache() are frozen into the snapshot, while steps after it are re-run on every epoch.

The cache is only finalized once the dataset has been iterated through in its entirety; a partial pass does not populate it.

Use caching to avoid repeating expensive preprocessing steps.
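For disk caching, a short sketch showing that the snapshot files appear only after a full pass (the temporary directory and 'squares_cache' name are illustrative; exact file names may vary by TensorFlow version):

```python
import os
import tempfile
import tensorflow as tf

# Any writable path works as the cache location.
cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, 'squares_cache')

dataset = tf.data.Dataset.range(5).map(lambda x: x * x).cache(cache_path)

# Iterate fully once so TensorFlow finalizes the on-disk snapshot.
list(dataset)

# Snapshot files prefixed with the cache filename now exist on disk.
print(os.listdir(cache_dir))
```

Later iterations, including in a new process, read from these files instead of re-running the map step.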

Summary

Caching saves dataset results to speed up repeated access.

Use dataset.cache() to cache in memory or dataset.cache(filename) to cache on disk.

Caching helps reduce training time by avoiding repeated data loading or processing.