What is a dataset from files in TensorFlow?

A dataset built from files lets you load and work with data stored on disk, so you can train machine learning models on real data.

Dataset from files in TensorFlow - Syntax, Examples & Explanation

Practice

1. What is the main purpose of using tf.data.Dataset.from_tensor_slices() with file paths in TensorFlow?

easy

A. To convert tensors into image files

B. To directly read image data from files into memory

C. To save datasets to disk as files

D. To create a dataset that holds file paths which can be read later

Solution

Step 1: Understand the function purpose
tf.data.Dataset.from_tensor_slices() creates a dataset by slicing its input along the first dimension. Given a list of file paths, each element is one path string; the files themselves are not read.
Step 2: Clarify dataset content
The dataset holds file paths as strings, which can be mapped later to read actual file data.
Final Answer:
To create a dataset that holds file paths which can be read later -> Option D
Quick Check:
from_tensor_slices(file_paths) = dataset of paths [OK]

Hint: Remember: from_tensor_slices holds paths, not file data [OK]

Common Mistakes:

Thinking it reads file contents immediately
Confusing dataset creation with saving files
Assuming it converts tensors to images
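
To make the point above concrete, here is a minimal sketch that inspects what the dataset holds. The file names are placeholders and do not need to exist, because nothing is read from disk at this stage.

```python
import tensorflow as tf

# Placeholder file names; no file is opened when the dataset is created.
image_paths = ["img1.jpg", "img2.jpg"]

# Each element is one scalar string tensor holding a single path.
dataset = tf.data.Dataset.from_tensor_slices(image_paths)

spec = dataset.element_spec
print(spec.dtype)   # the string dtype
print(spec.shape)   # a scalar shape

# Number of elements equals the number of paths in the list.
n_elements = int(dataset.cardinality().numpy())
print(n_elements)
```

The element spec shows a scalar string per element, which is why a later map() step is needed to turn paths into file contents.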

2. Which of the following is the correct way to create a dataset from a list of image file paths in TensorFlow?

easy

A. dataset = tf.data.Dataset.from_tensor_slices(image_paths)

B. dataset = tf.data.Dataset.read_files(image_paths)

C. dataset = tf.data.Dataset.load(image_paths)

D. dataset = tf.data.Dataset.create(image_paths)

Solution

Step 1: Recall correct TensorFlow method
The method to create a dataset from a list of tensors (like file paths) is from_tensor_slices().
Step 2: Verify options
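tf.data.Dataset.read_files() and tf.data.Dataset.create() do not exist. tf.data.Dataset.load() does exist in recent TensorFlow releases, but it restores a dataset previously written with save() from a directory path; it does not build a dataset from a list of file paths.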
Final Answer:
dataset = tf.data.Dataset.from_tensor_slices(image_paths) -> Option A
Quick Check:
Correct method is from_tensor_slices [OK]

Hint: Use from_tensor_slices for lists of file paths [OK]

Common Mistakes:

Using methods that do not exist (read_files, create) or that serve a different purpose (load)
Confusing dataset creation with file reading
Misspelling method names

3. Given the code below, what will be the output when iterating over the dataset?

import tensorflow as tf
image_paths = ["img1.jpg", "img2.jpg"]
dataset = tf.data.Dataset.from_tensor_slices(image_paths)
for item in dataset:
    print(item.numpy().decode())

medium

A. Error: decode() not found

B. [b'img1.jpg', b'img2.jpg']

C. img1.jpg\nimg2.jpg

D. Tensor objects printed

Solution

Step 1: Understand dataset content
The dataset contains string tensors of file paths: b'img1.jpg', b'img2.jpg'.
Step 2: Decode bytes to string
Calling item.numpy() returns bytes, and decode() converts bytes to normal strings.
Final Answer:
img1.jpg\nimg2.jpg -> Option C
Quick Check:
Decoded bytes = file names [OK]

Hint: Use .numpy().decode() to get string from tensor [OK]

Common Mistakes:

Printing tensor directly without decoding
Expecting list output instead of individual prints
Confusing bytes and strings
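
The bytes-versus-string distinction can be sketched as follows. This example uses as_numpy_iterator() as an alternative to calling .numpy() on each element; the file names are placeholders.

```python
import tensorflow as tf

image_paths = ["img1.jpg", "img2.jpg"]
dataset = tf.data.Dataset.from_tensor_slices(image_paths)

# as_numpy_iterator() yields the same values as item.numpy():
# Python bytes objects for scalar string tensors.
raw_values = list(dataset.as_numpy_iterator())

# decode() turns each bytes object into a regular Python str.
decoded = [raw.decode("utf-8") for raw in raw_values]

print(raw_values)  # bytes objects
print(decoded)     # plain strings
```

Printing the tensor itself, without .numpy().decode(), shows a tf.Tensor representation rather than the bare file name.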

4. Identify the error in the following code snippet that tries to read image files from paths:

import tensorflow as tf
image_paths = ["img1.jpg", "img2.jpg"]
dataset = tf.data.Dataset.from_tensor_slices(image_paths)
dataset = dataset.map(tf.io.read_file)
for img in dataset:
    print(img.numpy().shape)

medium

A. Cannot print shape of a scalar string tensor

B. tf.io.read_file is not a valid function

C. from_tensor_slices requires a tensor, not list

D. map() cannot be used on datasets

Solution

Step 1: Analyze dataset after map
After mapping tf.io.read_file, each element is a scalar string tensor containing raw file bytes.
Step 2: Understand tensor shape
img.numpy() returns Python bytes (raw file content), which has no .shape attribute. Printing img.numpy().shape raises AttributeError.
Final Answer:
Cannot print shape of a scalar string tensor -> Option A
Quick Check:
img.numpy() is bytes; no .shape [OK]

Hint: img.numpy() on a scalar string tensor is a Python bytes object, which has no .shape attribute [OK]

Common Mistakes:

Assuming read_file returns image tensor
Thinking from_tensor_slices rejects lists
Believing map() is invalid on datasets
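
One way to get a usable shape is to decode the raw bytes into an image tensor after reading them. The sketch below is an illustration under assumptions: it writes two tiny placeholder JPEGs to a temporary directory so that it can run anywhere, and it uses tf.io.decode_jpeg because the placeholder files are known to be JPEGs.

```python
import os
import tempfile

import tensorflow as tf

# Write two small placeholder JPEGs (4 x 6 pixels) so the example is runnable.
tmp_dir = tempfile.mkdtemp()
image_paths = []
for name in ["img1.jpg", "img2.jpg"]:
    path = os.path.join(tmp_dir, name)
    pixels = tf.zeros([4, 6, 3], dtype=tf.uint8)
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    image_paths.append(path)

dataset = tf.data.Dataset.from_tensor_slices(image_paths)

# After read_file, each element is a scalar string tensor of raw file bytes.
raw_dataset = dataset.map(tf.io.read_file)
raw_types = [type(raw.numpy()).__name__ for raw in raw_dataset]
print(raw_types)

# Decoding turns the raw bytes into a height x width x channels tensor.
image_dataset = raw_dataset.map(lambda data: tf.io.decode_jpeg(data, channels=3))
shapes = [tuple(img.shape) for img in image_dataset]
print(shapes)
```

Only the decoded elements carry image dimensions; the raw elements are opaque byte strings.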

5. You want to create a TensorFlow dataset from a folder of images, resize each image to 128x128, and batch them in groups of 16. Which code snippet correctly achieves this?

hard

A. dataset = tf.keras.utils.image_dataset_from_directory('images', image_size=(128,128), batch_size=16)

B. dataset = tf.data.Dataset.list_files('images/*').map(lambda x: tf.image.resize(tf.io.decode_image(tf.io.read_file(x), expand_animations=False), (128,128))).batch(16)

C. dataset = tf.data.Dataset.from_tensor_slices('images').map(tf.io.read_file).batch(16)

D. dataset = tf.keras.preprocessing.image_dataset_from_directory('images', batch_size=128, image_size=(16,16))

Solution

Step 1: Understand dataset creation from folder
Option B uses list_files to collect the file paths, then maps each path through read_file, decode_image, and resize. Passing expand_animations=False gives the decoded tensor a known rank, which tf.image.resize requires inside map().
Step 2: Check batch and resize parameters
Images are resized to (128,128) and batched in groups of 16 as required. Option A also resizes to 128x128 and batches by 16, but by default it infers labels from class subdirectories and yields (images, labels) pairs rather than images from a flat folder. Option C passes a single directory name instead of file paths and never decodes or resizes. Option D swaps the batch size and the image size.
Final Answer:
dataset = tf.data.Dataset.list_files('images/*').map(lambda x: tf.image.resize(tf.io.decode_image(tf.io.read_file(x), expand_animations=False), (128,128))).batch(16) -> Option B
Quick Check:
list_files + map + resize + batch = correct pipeline [OK]

Hint: Use list_files + map with decode and resize, then batch [OK]

Common Mistakes:

Using wrong batch size or image size parameters
Confusing keras and tf.data APIs
Not decoding images before resizing
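
The list_files, map, resize, batch pipeline can be sketched end to end as below. This is an illustration under assumptions: it generates 20 placeholder JPEGs in a temporary directory instead of using a real 'images' folder, and the helper name load_and_resize is hypothetical. decode_image is called with expand_animations=False so that the decoded tensor has a known rank, which tf.image.resize needs inside map().

```python
import os
import tempfile

import tensorflow as tf

# Create 20 small placeholder JPEGs so the pipeline has files to read.
tmp_dir = tempfile.mkdtemp()
for i in range(20):
    pixels = tf.zeros([10, 12, 3], dtype=tf.uint8)
    tf.io.write_file(os.path.join(tmp_dir, f"img{i}.jpg"), tf.io.encode_jpeg(pixels))


def load_and_resize(path):
    """Read one file, decode it to a 3-channel image, and resize to 128x128."""
    data = tf.io.read_file(path)
    image = tf.io.decode_image(data, channels=3, expand_animations=False)
    return tf.image.resize(image, (128, 128))


dataset = (
    tf.data.Dataset.list_files(os.path.join(tmp_dir, "*.jpg"))
    .map(load_and_resize)
    .batch(16)
)

# With 20 files and a batch size of 16, the last batch holds the remaining 4.
batch_shapes = [tuple(batch.shape) for batch in dataset]
print(batch_shapes)
```

Note that list_files shuffles the file order by default, so the order of images within batches can differ between runs; pass shuffle=False if a fixed order matters.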
