Imagine you are training a deep learning model. Why does efficient data loading help prevent bottlenecks during training?
Think about what happens if the model waits for data.
Efficient data loading keeps the processor (CPU or GPU) busy by supplying the next batch before it is needed, so the compute hardware never sits idle waiting on I/O.
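The idea can be sketched in plain Python with a small producer/consumer buffer (an analogy, not the tf.data implementation): a background loader fills a bounded queue while the training loop consumes from it, so loading overlaps with compute.

```python
import queue
import threading
import time

buf = queue.Queue(maxsize=2)  # bounded "prefetch" buffer of 2 items

def producer():
    for i in range(5):
        time.sleep(0.02)       # stand-in for disk read / preprocessing
        buf.put(i)
    buf.put(None)              # sentinel: no more data

threading.Thread(target=producer, daemon=True).start()

results = []
while True:
    item = buf.get()           # the "training step" takes the next item
    if item is None:
        break
    results.append(item)
print(results)
```

While the consumer works on one item, the producer is already preparing the next, which is exactly the overlap that prevents the bottleneck.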
What will be the output of this TensorFlow code snippet that uses prefetch?
import tensorflow as tf

# Create a dataset of numbers 0 to 4
dataset = tf.data.Dataset.range(5)
# Map function to square each number
dataset = dataset.map(lambda x: x * x)
# Prefetch 2 elements
dataset = dataset.prefetch(2)
# Collect all elements into a list
result = list(dataset.as_numpy_iterator())
print(result)
Remember what map and prefetch do.
The map squares each number and prefetch does not change the values, it only overlaps loading with consumption, so the output is [0, 1, 4, 9, 16].
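A quick way to check this without TensorFlow: the pipeline is equivalent to squaring each of 0 through 4.

```python
# Equivalent plain-Python computation: square each element of range(5).
result = [x * x for x in range(5)]
print(result)  # [0, 1, 4, 9, 16]
```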
You have a large image dataset stored on disk. Which data loading strategy will best prevent bottlenecks during training?
Think about balancing memory use and speed.
Using tf.data with a parallel map (num_parallel_calls) and prefetch overlaps loading and preprocessing with training, so the GPU is not starved while images are read from disk.
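As a rough analogy in plain Python (not the tf.data API itself), parallelizing a slow per-item load shows why this helps; the 0.05 s sleep is a stand-in for reading and decoding one image.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_load(x):
    time.sleep(0.05)   # stand-in for disk read + decode of one image
    return x * x

items = list(range(8))

start = time.perf_counter()
sequential = [slow_load(x) for x in items]       # about 8 * 0.05 s
seq_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:  # like num_parallel_calls=4
    parallel = list(pool.map(slow_load, items))  # about 2 * 0.05 s
par_time = time.perf_counter() - start

assert sequential == parallel  # same results, delivered faster
```

The parallel version produces identical results in roughly a quarter of the time, which is the same effect a parallel map has inside a tf.data pipeline.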
If data loading is slow and causes the GPU to wait 30% of the time, what is the maximum possible speedup if data loading is optimized to zero wait?
Use the formula: speedup = 1 / (1 - fraction_waiting)
If the GPU waits 30% of the time, max speedup = 1 / (1 - 0.3) = 1 / 0.7 ≈ 1.43 times faster.
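The arithmetic can be checked in a couple of lines; this is the same Amdahl-style bound, where eliminating a 30% wait caps the speedup at 1 / (1 - 0.3).

```python
# Amdahl-style bound: removing a 30% wait gives at most 1 / (1 - 0.3) speedup.
fraction_waiting = 0.3
speedup = 1 / (1 - fraction_waiting)
print(round(speedup, 2))  # 1.43
```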
Given this TensorFlow data pipeline code, why might training be slower than expected?
import time

import tensorflow as tf

def load_and_preprocess(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    return image

paths = tf.constant(['img1.jpg', 'img2.jpg', 'img3.jpg'])
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(2)

for batch in dataset:
    # Simulate training step
    time.sleep(0.1)
Check if data loading happens in parallel or sequentially.
Without num_parallel_calls, the map decodes and resizes images one at a time on a single thread, and without prefetch the training step must wait for each batch, so the input pipeline becomes the bottleneck and slows training.
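A fixed version of the pipeline would parallelize the map and prefetch the output. The sketch below keeps the same pipeline shape but maps over numbers instead of image paths so it runs without files on disk (assuming TensorFlow is installed).

```python
import tensorflow as tf

# Same pipeline shape, with the map parallelized and the output prefetched.
# Numbers stand in for images so the snippet runs without files on disk.
dataset = tf.data.Dataset.range(6)
dataset = dataset.map(lambda x: x * x,
                      num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(2)
dataset = dataset.prefetch(tf.data.AUTOTUNE)
batches = [list(b) for b in dataset.as_numpy_iterator()]
print(batches)  # [[0, 1], [4, 9], [16, 25]]
```

AUTOTUNE lets tf.data pick the degree of parallelism and the prefetch buffer size at runtime; the element order is preserved because map is deterministic by default.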