We use datasets from files to load and work with data stored on disk, which lets us train machine learning models on real data.
Dataset from files in TensorFlow
Introduction
You have images saved in folders and want to train a model to recognize them.
You want to read text data from files to analyze or build a language model.
You have CSV files with tabular data for prediction tasks.
You want to load large datasets without loading everything into memory at once.
You want to preprocess data as you load it for efficient training.
Syntax
TensorFlow
tf.data.Dataset.from_tensor_slices(filenames)

# or for images
image_dataset = tf.keras.utils.image_dataset_from_directory(
    directory_path,
    batch_size=32,
    image_size=(256, 256)
)
from_tensor_slices creates a dataset from a list of file paths.
image_dataset_from_directory loads images from folders and labels them automatically.
Examples
This creates a dataset from a list of text file names and prints each file name.
TensorFlow
import tensorflow as tf

filenames = ["file1.txt", "file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
for file in dataset:
    print(file.numpy())
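The dataset above only yields file names, not their contents. For text files, TensorFlow also provides tf.data.TextLineDataset, which yields one line at a time across all listed files. A minimal sketch (the two file names and their contents are made up here so the example is self-contained):

```python
import tensorflow as tf

# Create two small text files so the example runs on its own
with open("file1.txt", "w") as f:
    f.write("first line\nsecond line\n")
with open("file2.txt", "w") as f:
    f.write("third line\n")

# TextLineDataset yields one line per element, across all listed files
line_dataset = tf.data.TextLineDataset(["file1.txt", "file2.txt"])
for line in line_dataset:
    print(line.numpy().decode("utf-8"))
```

Because lines are read lazily, this works even when the files are far too large to fit in memory.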
This loads images from a folder called 'images', resizes them to 128x128, and prints the shape of one batch and its labels.
TensorFlow
import tensorflow as tf

image_dataset = tf.keras.utils.image_dataset_from_directory(
    "./images",
    batch_size=16,
    image_size=(128, 128)
)
for images, labels in image_dataset.take(1):
    print(images.shape, labels.numpy())
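The introduction also mentions CSV files with tabular data. One way to load them is to read lines with tf.data.TextLineDataset and parse each line with tf.io.decode_csv. This is a minimal sketch; the file name data.csv, its columns, and its values are invented for illustration:

```python
import tensorflow as tf

# Write a tiny CSV file so the example runs on its own (made-up data)
with open("data.csv", "w") as f:
    f.write("height,weight,label\n1.7,65.0,0\n1.8,80.0,1\n")

# Skip the header row, then parse each remaining line into typed tensors
csv_lines = tf.data.TextLineDataset("data.csv").skip(1)

def parse_line(line):
    # record_defaults sets the type of each column: float, float, int
    height, weight, label = tf.io.decode_csv(
        line, record_defaults=[0.0, 0.0, 0])
    features = tf.stack([height, weight])
    return features, label

csv_dataset = csv_lines.map(parse_line)
for features, label in csv_dataset:
    print(features.numpy(), label.numpy())
```

Each element is now a (features, label) pair, which is the shape most Keras training loops expect.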
Sample Model
This program creates two text files, loads their paths into a TensorFlow dataset, reads the file contents, and prints them.
TensorFlow
import tensorflow as tf
import os

# Create example text files
os.makedirs('data', exist_ok=True)
with open('data/file1.txt', 'w') as f:
    f.write('Hello TensorFlow')
with open('data/file2.txt', 'w') as f:
    f.write('Dataset from files')

# List of file paths
filenames = ["data/file1.txt", "data/file2.txt"]

# Create dataset from file names
file_dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Function to read file content
@tf.function
def read_file(filename):
    text = tf.io.read_file(filename)
    return text

# Map function to dataset
text_dataset = file_dataset.map(read_file)

# Print contents
for text in text_dataset:
    print(text.numpy().decode('utf-8'))
Output

Hello TensorFlow
Dataset from files
Important Notes
Use tf.io.read_file to read file contents inside the dataset pipeline.
Mapping a function over the dataset lets you preprocess each element as it loads.
Datasets stream elements from disk on demand, so large data never has to fit in memory all at once.
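The notes above can be combined into one input pipeline: map reads and preprocesses files lazily, batch groups elements, and prefetch overlaps loading with training. A minimal sketch, reusing the files created in the Sample Model above (and recreating them if they are missing):

```python
import os
import tensorflow as tf

# Make sure the example files from the Sample Model exist
os.makedirs("data", exist_ok=True)
contents = {"data/file1.txt": "Hello TensorFlow",
            "data/file2.txt": "Dataset from files"}
for name, text in contents.items():
    with open(name, "w") as f:
        f.write(text)

pipeline = (
    tf.data.Dataset.from_tensor_slices(list(contents))
    .map(tf.io.read_file, num_parallel_calls=tf.data.AUTOTUNE)  # read files lazily, in parallel
    .batch(2)                                                   # group elements into batches
    .prefetch(tf.data.AUTOTUNE)                                 # overlap loading with consumption
)

for batch in pipeline:
    print(batch.numpy())
```

AUTOTUNE lets TensorFlow pick the parallelism and prefetch buffer size at runtime, which is usually a sensible default.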
Summary
Datasets from files let you load data stored on disk easily.
You can create datasets from file paths or directly from image folders.
Use mapping to read and preprocess file contents during loading.