tf.data.Dataset is TensorFlow's API for building input pipelines for machine learning. It lets you load data, transform it, and feed it to a model one step at a time.
tf.data.Dataset creation in TensorFlow
Introduction
tf.data.Dataset is a good fit in the following situations:
When you have a list or array of data and want to process it in batches.
When you want to read training data from files, such as images or text.
When you need to shuffle or repeat data during training.
When you want to apply transformations to your data, for example by mapping a function over every element.
When you want to build efficient input pipelines for TensorFlow models.
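To make the file-reading and mapping use cases concrete, here is a minimal sketch. It assumes a small text file with one number per line; the file name `numbers.txt` and its contents are invented for illustration and written to a temporary directory so the example runs on its own.

```python
import os
import tempfile

import tensorflow as tf

# Write a small throwaway text file so the example is self-contained.
# The file name and contents are hypothetical.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "numbers.txt")
with open(path, "w") as f:
    f.write("1.5\n2.5\n4.0\n")

# Each line of the file becomes one string element of the dataset.
lines = tf.data.TextLineDataset(path)

# map() applies a transformation to every element: here, string -> float32.
values = lines.map(lambda line: tf.strings.to_number(line, out_type=tf.float32))

parsed = [float(v.numpy()) for v in values]
print(parsed)  # [1.5, 2.5, 4.0]
```

The file is read lazily, line by line, as the dataset is iterated.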
Syntax
TensorFlow
tf.data.Dataset.from_tensor_slices(data)
tf.data.Dataset.from_generator(generator_function, output_signature=output_signature)
tf.data.Dataset.from_tensors(tensor)
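from_tensor_slices slices the input along its first dimension, so each row becomes one element.
from_generator builds a dataset from a Python generator, which suits data that is produced on the fly.
from_tensors wraps the entire input as a single element.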
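The difference between slicing an input and wrapping it whole is easiest to see by counting elements. The sketch below feeds the same 3x2 tensor to both constructors and inspects the result with `cardinality()` and `element_spec`; the variable names are illustrative only.

```python
import tensorflow as tf

matrix = tf.constant([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

sliced = tf.data.Dataset.from_tensor_slices(matrix)  # one element per row
whole = tf.data.Dataset.from_tensors(matrix)         # one element in total

# cardinality() reports how many elements a dataset holds.
sliced_count = int(sliced.cardinality().numpy())
whole_count = int(whole.cardinality().numpy())

# element_spec describes the shape and dtype of a single element.
sliced_shape = sliced.element_spec.shape.as_list()
whole_shape = whole.element_spec.shape.as_list()

print(sliced_count, sliced_shape)  # 3 [2]
print(whole_count, whole_shape)    # 1 [3, 2]
```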
Examples
This creates a dataset from a simple list and prints each item.
TensorFlow
import tensorflow as tf

# Create dataset from a list
data = [1, 2, 3, 4]
dataset = tf.data.Dataset.from_tensor_slices(data)

for item in dataset:
    print(item.numpy())
This creates a dataset with one element (the whole tensor).
TensorFlow
import tensorflow as tf

# Create dataset from a single tensor
tensor = tf.constant([[1, 2], [3, 4]])
dataset = tf.data.Dataset.from_tensors(tensor)

for item in dataset:
    print(item.numpy())
This creates a dataset from a generator function that yields values.
TensorFlow
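import tensorflow as tf

def gen():
    for i in range(3):
        yield i * 2

# output_signature replaces the deprecated output_types argument
dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int32),
)

for item in dataset:
    print(item.numpy())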
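Generators often yield more than a single scalar. As a rough sketch, the example below assumes a hypothetical generator, `sequence_gen`, that yields (sequence, label) pairs with sequences of different lengths. It describes them with `output_signature`, the argument that TensorFlow 2.x documents for declaring the structure of generator output.

```python
import tensorflow as tf

def sequence_gen():
    # Hypothetical data source: yields (sequence, label) pairs where the
    # sequence length differs from one element to the next.
    for n in range(1, 4):
        yield list(range(n)), n % 2

dataset = tf.data.Dataset.from_generator(
    sequence_gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),  # variable-length sequence
        tf.TensorSpec(shape=(), dtype=tf.int32),       # scalar label
    ),
)

pairs = [(seq.numpy().tolist(), int(label.numpy())) for seq, label in dataset]
print(pairs)  # [([0], 1), ([0, 1], 0), ([0, 1, 2], 1)]
```

A `None` entry in a `TensorSpec` shape marks a dimension whose size may vary between elements.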
Sample Program
This program creates a dataset from a list of numbers and prints each number. It shows how to start using tf.data.Dataset with simple data.
TensorFlow
import tensorflow as tf

# Sample data: list of numbers
numbers = [10, 20, 30, 40, 50]

# Create dataset from the list
dataset = tf.data.Dataset.from_tensor_slices(numbers)

# Print each element
print("Dataset elements:")
for element in dataset:
    print(element.numpy())
Output
Dataset elements:
10
20
30
40
50
Important Notes
from_tensor_slices splits the input along its first dimension, so a list of five numbers becomes a dataset of five elements.
Use from_generator when data is too large to fit in memory or needs to be generated on the fly.
Datasets can be chained with methods like batch(), shuffle(), and map() for more complex pipelines.
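A chained pipeline might look like the sketch below. The feature and label values are made up, and the order of the steps (map, then shuffle, then batch) is one reasonable choice rather than the only one.

```python
import tensorflow as tf

# Made-up features and labels for illustration.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
labels = [0, 1, 0, 1]

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .map(lambda x, y: (x / 10.0, y))   # scale each feature vector
    .shuffle(buffer_size=4, seed=0)    # buffer covers the whole dataset
    .batch(2)                          # group elements into batches of 2
)

batch_shapes = [(x.shape.as_list(), y.shape.as_list()) for x, y in dataset]
print(batch_shapes)  # [([2, 2], [2]), ([2, 2], [2])]
```

Shuffling before batching mixes individual elements; shuffling after batching would only reorder whole batches.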
Summary
tf.data.Dataset helps manage data for TensorFlow models easily.
You can create datasets from lists, tensors, or generators.
Datasets let you process data step-by-step for training or evaluation.
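As an illustration of feeding a dataset to a model for training and evaluation, here is a minimal sketch using Keras. The toy data (y = 2x), the single-layer model, and the training settings are all assumptions chosen to keep the example short, not a recommended setup.

```python
import tensorflow as tf

# Toy regression data following y = 2x (invented for this sketch).
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [[0.0], [2.0], [4.0], [6.0]]

# Keras accepts a batched dataset of (inputs, targets) pairs directly.
train_ds = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(2)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

# Training and evaluation both iterate over the dataset batch by batch.
history = model.fit(train_ds, epochs=2, verbose=0)
loss = model.evaluate(train_ds, verbose=0)
print(len(history.history["loss"]), float(loss) >= 0.0)
```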