
Dataset from files in TensorFlow - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of loading text files with tf.data
What is the output of this code snippet that loads text lines from two files using TensorFlow's tf.data API?
TensorFlow
import tensorflow as tf

# Create two small text files
with open('file1.txt', 'w') as f:
    f.write('apple\nbanana')
with open('file2.txt', 'w') as f:
    f.write('cherry\ndate')

# Load dataset from files
files = ['file1.txt', 'file2.txt']
dataset = tf.data.TextLineDataset(files)

# Collect all lines into a list
lines = list(dataset.as_numpy_iterator())
print(lines)
A. [b'file1.txt', b'file2.txt']
B. SyntaxError
C. [b'apple banana', b'cherry date']
D. [b'apple', b'banana', b'cherry', b'date']
💡 Hint
tf.data.TextLineDataset reads each line from all files in order.
Data Output
intermediate
Number of elements in a dataset from multiple CSV files
Given three CSV files, each with 2 data rows and no header row, how many elements does the dataset created by tf.data.experimental.CsvDataset contain when it loads all three files?
TensorFlow
import tensorflow as tf

# Assume files: data1.csv, data2.csv, data3.csv each with 2 rows
files = ['data1.csv', 'data2.csv', 'data3.csv']
dataset = tf.data.experimental.CsvDataset(files, [tf.float32, tf.int32])

count = 0
for _ in dataset:
    count += 1
print(count)
A. 6
B. 3
C. 2
D. 9
💡 Hint
Each file has 2 rows, and dataset reads all rows from all files.
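The snippet above assumes the three CSV files already exist. A minimal self-contained sketch follows, assuming hypothetical sample values and a temporary directory (neither comes from the original problem), so the row count can be checked by running it:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample data: three CSV files, two rows each, no header row.
tmp_dir = tempfile.mkdtemp()
files = []
for i in range(3):
    path = os.path.join(tmp_dir, f"data{i + 1}.csv")
    with open(path, "w") as f:
        f.write("1.5,10\n2.5,20\n")
    files.append(path)

# One record default per column; passing a dtype marks the column as required.
dataset = tf.data.experimental.CsvDataset(files, [tf.float32, tf.int32])

# Each element is a tuple with one value per column.
rows = [(float(x), int(y)) for x, y in dataset.as_numpy_iterator()]
count = len(rows)
print(count, rows[0])
```

If the files started with a header row, passing header=True to CsvDataset would skip the first line of every file.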
🔧 Debug
advanced
Error raised when loading non-existent files
What error will this code raise when it iterates over a TextLineDataset built from a file that does not exist?
TensorFlow
import tensorflow as tf

files = ['missing_file.txt']
dataset = tf.data.TextLineDataset(files)

for line in dataset:
    print(line.numpy())
A. No error, prints nothing
B. tf.errors.NotFoundError
C. ValueError
D. FileNotFoundError
💡 Hint
TensorFlow raises its own error type for missing files.
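One way to handle this case is sketched below. It assumes eager execution and a hypothetical file name that does not exist on disk. The error typically surfaces when the file is first read during iteration rather than when the dataset object is built, so the whole pipeline sits inside the try block:

```python
import tensorflow as tf

# Hypothetical path that is assumed not to exist.
files = ["missing_file_for_demo_12345.txt"]

caught = None
try:
    dataset = tf.data.TextLineDataset(files)
    for line in dataset:
        print(line.numpy())
except tf.errors.NotFoundError as err:
    # TensorFlow reports the missing file with its own exception class.
    caught = err
    print("Caught:", type(err).__name__)
```

Checking paths up front, for example with tf.io.gfile.exists, is an alternative when failing early is preferable to catching the error mid-iteration.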
🚀 Application
advanced
Creating a dataset from image files with labels
You have a folder with images and a CSV file mapping image filenames to labels. Which approach most directly creates a tf.data.Dataset yielding (image_tensor, label) pairs?
A. Use tf.data.TextLineDataset on the CSV, parse the lines, then map to load images with tf.io.read_file and tf.image.decode_jpeg
B. Use tf.data.Dataset.list_files on the images, then map to load images and assign labels manually
C. Use tf.data.experimental.CsvDataset on the CSV, then map to load images and parse labels
D. Use tf.data.Dataset.from_tensor_slices with image paths and labels loaded into memory
💡 Hint
CsvDataset is designed to read CSV files with typed columns.
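A rough sketch of the CsvDataset approach follows. The file names, the two-column layout (filename,label), the header row, and the tiny generated JPEGs are all assumptions made for illustration, not details from the problem:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample data: two tiny JPEGs and a labels.csv with a header row.
tmp_dir = tempfile.mkdtemp()
for name in ["cat.jpg", "dog.jpg"]:
    pixels = tf.zeros([4, 4, 3], dtype=tf.uint8)
    tf.io.write_file(os.path.join(tmp_dir, name), tf.io.encode_jpeg(pixels))

csv_path = os.path.join(tmp_dir, "labels.csv")
with open(csv_path, "w") as f:
    f.write("filename,label\ncat.jpg,0\ndog.jpg,1\n")

# Typed columns: filename is a string, label is an int32.
records = tf.data.experimental.CsvDataset(
    csv_path, record_defaults=[tf.string, tf.int32], header=True
)

def load_example(filename, label):
    # Build the full path, then read and decode the image file.
    path = tf.strings.join([tmp_dir, filename], separator=os.sep)
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return image, label

dataset = records.map(load_example)

examples = [(img.shape, int(lbl)) for img, lbl in dataset.as_numpy_iterator()]
print(examples)
```

Because the labels arrive already typed as int32, the map function only has to deal with loading the image.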
🧠 Conceptual
expert
Effect of interleave on dataset from multiple files
What is the main difference between tf.data.TextLineDataset(files) and tf.data.Dataset.from_tensor_slices(files).interleave(tf.data.TextLineDataset, cycle_length=2) when reading multiple text files?
A. TextLineDataset reads the files sequentially, one after another; interleave alternates between the files, mixing their lines
B. TextLineDataset reads files in parallel; interleave reads files sequentially
C. Both produce the same output order
D. TextLineDataset reads only the first file; interleave reads all files
💡 Hint
Interleave cycles through datasets to mix their elements.
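The two orderings can be compared directly. The sketch below assumes two hypothetical files, a.txt and b.txt, with two lines each, and sets block_length=1 explicitly (which is also the default) so that one line is taken from each file in turn:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample files with two lines each.
tmp_dir = tempfile.mkdtemp()
files = []
for name, text in [("a.txt", "a1\na2"), ("b.txt", "b1\nb2")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(text)
    files.append(path)

# Reads every line of a.txt, then every line of b.txt.
sequential = tf.data.TextLineDataset(files)

# Opens both files and takes one line from each in turn.
interleaved = tf.data.Dataset.from_tensor_slices(files).interleave(
    lambda path: tf.data.TextLineDataset(path),
    cycle_length=2,
    block_length=1,
)

sequential_lines = [x.decode() for x in sequential.as_numpy_iterator()]
interleaved_lines = [x.decode() for x in interleaved.as_numpy_iterator()]
print(sequential_lines)
print(interleaved_lines)
```

Without num_parallel_calls, interleave reads the files one element at a time in a fixed order; only the order of the lines changes, not how many workers read them.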

Practice

1. What is the main purpose of using tf.data.Dataset.from_tensor_slices() with file paths in TensorFlow?
easy
A. To convert tensors into image files
B. To directly read image data from files into memory
C. To save datasets to disk as files
D. To create a dataset that holds file paths which can be read later

Solution

  1. Step 1: Understand the function purpose

    tf.data.Dataset.from_tensor_slices() creates a dataset from a tensor, often a list of file paths, not the file contents themselves.
  2. Step 2: Clarify dataset content

    The dataset holds file paths as strings, which can be mapped later to read actual file data.
  3. Final Answer:

    To create a dataset that holds file paths which can be read later -> Option D
  4. Quick Check:

    from_tensor_slices(file_paths) = dataset of paths [OK]
Hint: from_tensor_slices holds paths, not file data
Common Mistakes:
  • Thinking it reads file contents immediately
  • Confusing dataset creation with saving files
  • Assuming it converts tensors to images
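The distinction between holding paths and reading contents can be seen in a short sketch. It assumes two hypothetical text files written to a temporary directory:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample files.
tmp_dir = tempfile.mkdtemp()
file_paths = []
for name, text in [("a.txt", "hello"), ("b.txt", "world")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(text)
    file_paths.append(path)

# Each element is a scalar string tensor holding a path, not file contents.
path_ds = tf.data.Dataset.from_tensor_slices(file_paths)
print(path_ds.element_spec)

# The files are read only when the mapped dataset is iterated.
content_ds = path_ds.map(tf.io.read_file)
contents = list(content_ds.as_numpy_iterator())
print(contents)
```

Keeping paths in the dataset and reading inside map means only the files currently being processed need to be in memory.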
2. Which of the following is the correct way to create a dataset from a list of image file paths in TensorFlow?
easy
A. dataset = tf.data.Dataset.from_tensor_slices(image_paths)
B. dataset = tf.data.Dataset.read_files(image_paths)
C. dataset = tf.data.Dataset.load(image_paths)
D. dataset = tf.data.Dataset.create(image_paths)

Solution

  1. Step 1: Recall correct TensorFlow method

    The method to create a dataset from a list of tensors (like file paths) is from_tensor_slices().
  2. Step 2: Verify options

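    tf.data.Dataset.read_files() and tf.data.Dataset.create() do not exist. tf.data.Dataset.load() exists in recent TensorFlow releases, but it restores a dataset previously written with save() from a directory path; it does not build a dataset from a list of file paths.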
  3. Final Answer:

    dataset = tf.data.Dataset.from_tensor_slices(image_paths) -> Option A
  4. Quick Check:

    Correct method is from_tensor_slices [OK]
Hint: Use from_tensor_slices for lists of file paths
Common Mistakes:
  • Using methods that do not exist (read_files, create) or that serve a different purpose (load)
  • Confusing dataset creation with file reading
  • Misspelling method names
3. Given the code below, what will be the output when iterating over the dataset?
import tensorflow as tf
image_paths = ["img1.jpg", "img2.jpg"]
dataset = tf.data.Dataset.from_tensor_slices(image_paths)
for item in dataset:
    print(item.numpy().decode())
medium
A. Error: decode() not found
B. [b'img1.jpg', b'img2.jpg']
C. img1.jpg\nimg2.jpg
D. Tensor objects printed

Solution

  1. Step 1: Understand dataset content

    The dataset contains string tensors of file paths: b'img1.jpg', b'img2.jpg'.
  2. Step 2: Decode bytes to string

    Calling item.numpy() returns bytes, and decode() converts bytes to normal strings.
  3. Final Answer:

    img1.jpg\nimg2.jpg -> Option C
  4. Quick Check:

    Decoded bytes = file names [OK]
Hint: Use .numpy().decode() to get a Python string from a string tensor
Common Mistakes:
  • Printing tensor directly without decoding
  • Expecting list output instead of individual prints
  • Confusing bytes and strings
4. Identify the error in the following code snippet that tries to read image files from paths:
import tensorflow as tf
image_paths = ["img1.jpg", "img2.jpg"]
dataset = tf.data.Dataset.from_tensor_slices(image_paths)
dataset = dataset.map(tf.io.read_file)
for img in dataset:
    print(img.numpy().shape)
medium
A. Cannot print shape of a scalar string tensor
B. tf.io.read_file is not a valid function
C. from_tensor_slices requires a tensor, not list
D. map() cannot be used on datasets

Solution

  1. Step 1: Analyze dataset after map

    After mapping tf.io.read_file, each element is a scalar string tensor containing raw file bytes.
  2. Step 2: Understand tensor shape

    img.numpy() returns Python bytes (raw file content), which has no .shape attribute. Printing img.numpy().shape raises AttributeError.
  3. Final Answer:

    Cannot print shape of a scalar string tensor -> Option A
  4. Quick Check:

    img.numpy() is bytes; no .shape [OK]
Hint: img.numpy() on a scalar string tensor returns Python bytes, which have no .shape attribute
Common Mistakes:
  • Assuming read_file returns image tensor
  • Thinking from_tensor_slices rejects lists
  • Believing map() is invalid on datasets
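A possible fix is to decode the raw bytes before asking for a shape. The sketch below assumes two hypothetical 8x6 JPEG files generated on the fly and uses tf.io.decode_jpeg as the decoding step:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample images: two 8x6 RGB JPEGs.
tmp_dir = tempfile.mkdtemp()
image_paths = []
for name in ["img1.jpg", "img2.jpg"]:
    path = os.path.join(tmp_dir, name)
    tf.io.write_file(path, tf.io.encode_jpeg(tf.zeros([8, 6, 3], dtype=tf.uint8)))
    image_paths.append(path)

dataset = tf.data.Dataset.from_tensor_slices(image_paths)

# After read_file, each element is a scalar string tensor of raw bytes.
raw = dataset.map(tf.io.read_file)

# Decoding turns the bytes into a height x width x channels tensor.
decoded = raw.map(lambda data: tf.io.decode_jpeg(data, channels=3))

raw_types = [type(x.numpy()) for x in raw]
shapes = [tuple(img.shape) for img in decoded]
print(raw_types, shapes)
```

Note that img.shape on the undecoded tensor itself is valid and gives an empty shape; it is the extra .numpy() call that produces a bytes object without a shape.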
5. You want to create a TensorFlow dataset from a folder of images, resize each image to 128x128, and batch them in groups of 16. Which code snippet correctly achieves this?
hard
A. dataset = tf.keras.utils.image_dataset_from_directory('images', image_size=(128,128), batch_size=16)
B. dataset = tf.data.Dataset.list_files('images/*').map(lambda x: tf.image.resize(tf.io.decode_image(tf.io.read_file(x), expand_animations=False), (128,128))).batch(16)
C. dataset = tf.data.Dataset.from_tensor_slices('images').map(tf.io.read_file).batch(16)
D. dataset = tf.keras.preprocessing.image_dataset_from_directory('images', batch_size=128, image_size=(16,16))

Solution

  1. Step 1: Understand dataset creation from folder

    Option B uses list_files to get file paths, then maps each path through tf.io.read_file, tf.io.decode_image, and tf.image.resize. Passing expand_animations=False makes decode_image return a 3-D tensor with a known rank, which tf.image.resize requires inside map.
  2. Step 2: Check batch and resize parameters

    Images are resized to (128,128) and batched in groups of 16 as required.
  3. Final Answer:

    dataset = tf.data.Dataset.list_files('images/*').map(lambda x: tf.image.resize(tf.io.decode_image(tf.io.read_file(x), expand_animations=False), (128,128))).batch(16) -> Option B
  4. Quick Check:

    list_files + map + resize + batch = correct pipeline [OK]
Hint: Use list_files + map with decode and resize, then batch
Common Mistakes:
  • Using wrong batch size or image size parameters
  • Confusing keras and tf.data APIs
  • Not decoding images before resizing
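The list_files pipeline can be tried end to end with generated data. In this sketch the 20 small JPEGs, the temporary folder, and the helper name load_and_resize are assumptions for illustration. It passes expand_animations=False to tf.io.decode_image so that the decoded tensor has a known rank when tf.image.resize runs inside map:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample folder: 20 JPEGs of varying height.
images_dir = tempfile.mkdtemp()
for i in range(20):
    pixels = tf.zeros([10 + i, 12, 3], dtype=tf.uint8)
    tf.io.write_file(
        os.path.join(images_dir, f"img{i}.jpg"), tf.io.encode_jpeg(pixels)
    )

def load_and_resize(path):
    # Read raw bytes, decode to a 3-D image tensor, then resize.
    data = tf.io.read_file(path)
    image = tf.io.decode_image(data, channels=3, expand_animations=False)
    return tf.image.resize(image, (128, 128))

dataset = (
    tf.data.Dataset.list_files(os.path.join(images_dir, "*.jpg"))
    .map(load_and_resize)
    .batch(16)
)

batch_shapes = [tuple(batch.shape) for batch in dataset]
print(batch_shapes)
```

With 20 images and a batch size of 16, the last batch is smaller than the rest; batch(16, drop_remainder=True) would discard it instead.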