Batching and shuffling in TensorFlow - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output shape after batching?
Given a TensorFlow dataset of 100 samples, each sample is a vector of length 10. If we batch the dataset with batch size 20, what will be the shape of one batch?
TensorFlow
import tensorflow as tf

# Create dataset of 100 samples, each sample shape (10,)
dataset = tf.data.Dataset.from_tensor_slices(tf.random.uniform([100, 10]))

# Batch with size 20
batched_dataset = dataset.batch(20)

for batch in batched_dataset.take(1):
    print(batch.shape)
A. (5, 10)
B. (10, 20)
C. (100, 10)
D. (20, 10)
Hint: Batching stacks samples along a new leading (first) dimension.
🧠 Conceptual
intermediate
Why shuffle a dataset before training?
What is the main reason to shuffle a dataset before training a machine learning model?
A. To ensure the model sees data in a random order, preventing learning bias from data order
B. To reduce the dataset size by removing duplicates
C. To increase the batch size automatically
D. To normalize the input features
Hint: Think about how data order might affect learning.
Hyperparameter
advanced
Choosing shuffle buffer size
In TensorFlow, the shuffle() method requires a buffer size parameter. What is the effect of increasing the shuffle buffer size?
A. It decreases randomness but speeds up training
B. It increases randomness of shuffling but uses more memory
C. It changes the batch size automatically
D. It normalizes the dataset features
Hint: Think about how buffer size affects the number of samples held before shuffling.
🔧 Debug
advanced
Why does this TensorFlow dataset not shuffle properly?
Consider this code snippet:
import tensorflow as tf

raw_data = tf.data.Dataset.range(10)
shuffled_data = raw_data.shuffle(buffer_size=5)

for item in shuffled_data:
    print(item.numpy())
Why might the output order not be fully randomized?
A. Because the shuffle buffer size is smaller than the dataset size, limiting randomness
B. Because shuffle() requires batch() to work properly
C. Because the dataset is not batched before shuffling
D. Because shuffle() only works on datasets with more than 100 samples
Hint: Think about how shuffle buffer size relates to dataset size.
Model Choice
expert
Best practice for shuffling and batching in TensorFlow pipeline
You want to prepare a TensorFlow dataset for training a neural network. Which pipeline order is best for performance and correctness?
A. Shuffle and batch simultaneously using batch(shuffle=True)
B. Batch the dataset first, then shuffle the batches
C. Shuffle the dataset first, then batch it
D. Neither shuffle nor batch the dataset
Hint: Consider how shuffling affects individual samples versus batches.

Practice

1. What is the main purpose of batching data in TensorFlow during training?
easy
A. To group data into smaller sets for faster and efficient training
B. To randomly mix data to avoid bias
C. To increase the size of the dataset
D. To convert data into images

Solution

  1. Step 1: Understand batching concept

    Batching means grouping data into smaller sets instead of using all data at once.
  2. Step 2: Identify batching benefit

    This grouping helps speed up training and uses memory efficiently.
  3. Final Answer:

    To group data into smaller sets for faster and efficient training -> Option A
  4. Quick Check:

    Batching = grouping data for efficiency [OK]
Hint: Batching groups data; shuffling mixes data [OK]
Common Mistakes:
  • Confusing batching with shuffling
  • Thinking batching increases dataset size
  • Believing batching changes data type
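
The grouping described in this solution can be sketched in plain Python. This is a simplified model of what batching does, not TensorFlow's implementation, and the helper name `batch` is an assumption made for illustration.

```python
def batch(items, batch_size):
    """Yield consecutive groups of up to batch_size items (illustrative helper)."""
    group = []
    for item in items:
        group.append(item)
        if len(group) == batch_size:
            yield group
            group = []
    if group:  # leftover items form a final, smaller batch
        yield group


for b in batch(range(10), 4):
    print(b)
# prints [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
```

Each group is processed in one training step, so memory use depends on the batch size rather than on the size of the whole dataset.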
2. Which of the following is the correct way to shuffle and batch a TensorFlow dataset named ds with batch size 32?
easy
A. ds.batch(100).shuffle(32)
B. ds.batch(32).shuffle(100)
C. ds.shuffle(32).batch(100)
D. ds.shuffle(100).batch(32)

Solution

  1. Step 1: Recall correct order of operations

    To mix individual samples, shuffle the dataset first, then batch it.
  2. Step 2: Match batch size and shuffle buffer

    The question asks for batch size 32, so the call must be batch(32). The shuffle buffer is usually at least as large as the batch size, so shuffle(100).batch(32) is correct.
  3. Final Answer:

    ds.shuffle(100).batch(32) -> Option D
  4. Quick Check:

    Shuffle before batch = ds.shuffle().batch() [OK]
Hint: Shuffle first, then batch with correct sizes [OK]
Common Mistakes:
  • Batching before shuffling
  • Using smaller shuffle buffer than batch size
  • Mixing batch and shuffle parameters
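
To see why the shuffle buffer size matters, the sketch below models buffered shuffling in plain Python: fill a buffer, emit a random element from it, and refill that slot from the input. It follows the behaviour documented for shuffle(), but it is only an approximation for illustration; `buffered_shuffle` and its `seed` parameter are hypothetical names, not part of TensorFlow.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Illustrative model of shuffling through a fixed-size buffer."""
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    # Fill the buffer with the first buffer_size elements.
    for item in it:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            break
    # Emit a random buffered element and refill its slot from the input.
    for item in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


small = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
large = list(buffered_shuffle(range(10), buffer_size=10, seed=0))
print(small)  # the first element can only be 0, 1, or 2
print(large)  # any element can come first
```

With a small buffer, early outputs can only come from the first few inputs, so the order stays partly sorted. A buffer at least as large as the dataset allows any ordering, at the cost of holding more samples in memory.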
3. What will be the output shape of batches if you run the following code on a dataset of 100 samples with shape (28, 28, 1)?
batched_ds = ds.batch(20)
for batch in batched_ds:
    print(batch.shape)
medium
A. (20, 28, 28) for all batches
B. (20, 28, 28, 1) for all batches
C. (100, 28, 28, 1) for all batches
D. (28, 28, 1) for all batches

Solution

  1. Step 1: Understand batch size effect on shape

    Batching stacks samples along a new leading dimension, so each batch has shape (batch_size, *sample_shape).
  2. Step 2: Calculate batch shapes for 100 samples with batch size 20

    100 / 20 = 5 batches. Because 100 is divisible by 20, every batch, including the last, holds exactly 20 samples.
  3. Final Answer:

    (20, 28, 28, 1) for all batches -> Option B
  4. Quick Check:

    Batch shape = (batch_size, sample_shape) [OK]
Hint: Batch shape adds batch size as first dimension [OK]
Common Mistakes:
  • Ignoring batch dimension in shape
  • Assuming last batch is smaller when divisible
  • Confusing sample shape with batch shape
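
The shape rule from this solution can be checked with a small calculation. The helper `batch_shapes` below is hypothetical and only does the arithmetic; it does not create any tensors.

```python
def batch_shapes(num_samples, batch_size, sample_shape):
    """Return the shape of each batch, in order (illustrative helper)."""
    shapes = []
    remaining = num_samples
    while remaining > 0:
        size = min(batch_size, remaining)
        # The batch dimension is prepended to the per-sample shape.
        shapes.append((size, *sample_shape))
        remaining -= size
    return shapes


shapes = batch_shapes(100, 20, (28, 28, 1))
print(len(shapes))  # 5
print(set(shapes))  # {(20, 28, 28, 1)}
```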
4. You wrote this code but the dataset is not shuffled properly:
ds = tf.data.Dataset.range(10)
ds = ds.batch(2).shuffle(5)

What is the main issue?
medium
A. Shuffle should be called before batch to mix individual elements
B. Shuffle buffer size is too large
C. Batch size must be 1 for shuffle to work
D. Dataset.range(10) cannot be shuffled

Solution

  1. Step 1: Analyze order of shuffle and batch

    Shuffling after batching shuffles batches, not individual elements.
  2. Step 2: Correct order for proper shuffling

    Shuffle should be called before batch to mix individual data points.
  3. Final Answer:

    Shuffle should be called before batch to mix individual elements -> Option A
  4. Quick Check:

    Shuffle before batch for proper mixing [OK]
Hint: Shuffle before batch to mix single items [OK]
Common Mistakes:
  • Calling shuffle after batch
  • Using too small shuffle buffer
  • Thinking batch size must be 1
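
The difference between the two orders can be sketched in plain Python. This is a simplified model that uses a full shuffle instead of a buffered one, and the function names are assumptions made for illustration.

```python
import random


def batch(items, batch_size):
    """Split items into consecutive lists of up to batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def batch_then_shuffle(items, batch_size, rng):
    # Only the order of whole batches changes; pairs stay together.
    batches = batch(items, batch_size)
    rng.shuffle(batches)
    return batches


def shuffle_then_batch(items, batch_size, rng):
    # Individual elements are mixed before they are grouped.
    items = list(items)
    rng.shuffle(items)
    return batch(items, batch_size)


rng = random.Random(0)
print(batch_then_shuffle(range(10), 2, rng))  # every batch is still [2k, 2k + 1]
print(shuffle_then_batch(range(10), 2, rng))  # batches can hold any two elements
```

In the first case, 0 always travels with 1, 2 with 3, and so on, which is why the dataset in the question does not look properly shuffled.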
5. You have a dataset with 103 samples. You want to shuffle it with a buffer size of 50 and batch it with size 20. How many batches will you get and what will be the size of the last batch if you use:
ds.shuffle(50).batch(20)
hard
A. 6 batches; last batch size 20
B. 5 batches; last batch size 20
C. 6 batches; last batch size 3
D. 5 batches; last batch size 3

Solution

  1. Step 1: Calculate number of batches

    103 samples divided by batch size 20 gives 5 full batches (20*5=100) plus 1 partial batch with 3 samples.
  2. Step 2: Understand shuffle effect on batch count

    Shuffling changes only the order of the samples, not how many there are, so there are still 6 batches and the last one holds 3 samples.
  3. Final Answer:

    6 batches; last batch size 3 -> Option C
  4. Quick Check:

    103/20 = 5 full + 1 partial batch [OK]
Hint: Divide samples by batch size; last batch may be smaller [OK]
Common Mistakes:
  • Ignoring last partial batch
  • Assuming shuffle changes batch count
  • Miscounting batches as 5 instead of 6
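
The batch count can be worked out with integer division. The helper `count_batches` below is hypothetical. Its `drop_remainder` flag mirrors the argument of the same name on tf.data.Dataset.batch, which discards a final partial batch when set to True.

```python
def count_batches(num_samples, batch_size, drop_remainder=False):
    """Return (number of batches, size of the last batch)."""
    full, leftover = divmod(num_samples, batch_size)
    if leftover and not drop_remainder:
        # The leftover samples form one extra, smaller batch.
        return full + 1, leftover
    return full, (batch_size if full else 0)


print(count_batches(103, 20))                       # (6, 3)
print(count_batches(103, 20, drop_remainder=True))  # (5, 20)
```

Dropping the remainder is a common choice when a model needs a fixed batch size, at the cost of skipping a few samples each epoch.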