
Loading datasets (CSV, built-in datasets) in ML Python

Introduction

We need data to teach computers to learn. Loading datasets means getting data from files or built-in sources so we can use it to train models.

Common situations where you load a dataset:
When you want to train a model using your own data saved in a CSV file.
When you want to quickly try machine learning with example datasets that come with libraries.
When you want to explore or understand data before building a model.
When you want to test your code with standard datasets everyone uses.
When you want to prepare data for analysis or visualization.
Syntax
ML Python
import pandas as pd
from sklearn import datasets

# Load CSV file
data = pd.read_csv('file.csv')

# Load built-in dataset
iris = datasets.load_iris()

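Use pandas.read_csv() to load a CSV file into a DataFrame.

Built-in loaders such as datasets.load_iris() return a dictionary-like Bunch object. The feature values are in its data attribute and the labels are in its target attribute.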
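To make the structure of a built-in dataset concrete, the short sketch below loads Iris and prints the main attributes of the returned object. It uses only attributes that scikit-learn documents for load_iris(); the printed layout may vary slightly between versions.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The returned object supports both attribute access and dictionary-style keys.
print(iris.data.shape)           # feature matrix: (150, 4)
print(iris.target.shape)         # one integer label per row: (150,)
print(iris.feature_names)        # names of the four feature columns
print(list(iris.target_names))   # class names matching labels 0, 1, 2
```

The features and labels are NumPy arrays, which is why the examples later convert them to a DataFrame when column names are wanted.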

Examples
Loads data from a CSV file named 'data.csv' into a table-like structure called a DataFrame.
ML Python
import pandas as pd
data = pd.read_csv('data.csv')
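Real CSV files do not always match the defaults of read_csv(). As a rough sketch, the example below reads semicolon-separated text and keeps only some columns. The column names and values are made up for illustration, and StringIO stands in for a file on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical file contents: semicolon-separated, with an id column we do not need.
raw = "id;height;weight;label\n1;1.70;65;a\n2;1.82;80;b\n"

df = pd.read_csv(
    StringIO(raw),                           # a file path would work the same way
    sep=";",                                 # field delimiter used in the text
    usecols=["height", "weight", "label"],   # load only these columns
)
print(df)
```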
Loads the Iris flower dataset included in scikit-learn for quick experiments.
ML Python
from sklearn import datasets
iris = datasets.load_iris()
Prints the first 5 rows of the Iris feature array. This assumes iris was loaded as in the previous example.
ML Python
print(iris.data[:5])
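load_iris() also accepts options that change the form of the result. The sketch below shows two of them: return_X_y=True, which returns the features and labels as a tuple, and as_frame=True, which returns pandas objects (this assumes scikit-learn 0.23 or newer).

```python
from sklearn import datasets

# Features and labels as a (X, y) tuple of NumPy arrays.
X, y = datasets.load_iris(return_X_y=True)
print(X.shape, y.shape)

# pandas objects instead of NumPy arrays (assumes scikit-learn >= 0.23).
iris = datasets.load_iris(as_frame=True)
print(iris.frame.head())   # feature columns plus a 'target' column
```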
Sample Program

This program loads the Iris dataset from scikit-learn, converts it to a DataFrame, and prints the first 3 rows. It then reads CSV text from an in-memory string, which read_csv() treats like a file, and prints the resulting DataFrame.

ML Python
from io import StringIO

import pandas as pd
from sklearn import datasets

# Load built-in Iris dataset
iris = datasets.load_iris()

# Convert to DataFrame for easier use
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

# Show first 3 rows
print(iris_df.head(3))

# Load CSV example (simulate with CSV string)
csv_data = '''sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
6.7,3.1,4.7,1.5,versicolor
7.2,3.6,6.1,2.5,virginica
'''

# Read CSV from string (like a file)
csv_df = pd.read_csv(StringIO(csv_data))
print(csv_df)
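In the sample program, the target column of iris_df holds integer class codes, while the CSV data uses species names. One possible way to line the two up is to index the array of class names with the codes, as sketched here. The species column name is chosen for illustration.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df["target"] = iris.target

# target_names is an array of class names; indexing it with the integer
# codes yields one name per row.
iris_df["species"] = iris.target_names[iris.target]

# One row per class, showing which code maps to which name.
print(iris_df[["target", "species"]].drop_duplicates())
```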
Important Notes

CSV files are plain text, with one row per line and values separated by commas. They are a common way to store data tables and can be opened with spreadsheet programs.
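To see what a CSV file looks like on disk, this small sketch writes a DataFrame to a temporary file, prints the raw text, and reads it back. The file name and the values are placeholders.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"sepal_length": [5.1, 6.7], "species": ["setosa", "versicolor"]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")   # placeholder file name
    df.to_csv(path, index=False)                  # index=False skips the row-number column

    with open(path) as f:
        text = f.read()                           # the raw file contents
    loaded = pd.read_csv(path)                    # the same data, parsed back

print(text)
print(loaded)
```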

Built-in datasets are great for learning because they are clean and ready to use.

Always check your data after loading to understand its shape and content.
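A few quick checks after loading might look like the sketch below. The CSV text is invented for illustration and deliberately leaves one value empty so the missing-value count has something to report.

```python
from io import StringIO

import pandas as pd

# Made-up data with one empty sepal_width value in the second row.
csv_data = """sepal_length,sepal_width,species
5.1,3.5,setosa
6.7,,versicolor
7.2,3.6,virginica
"""
df = pd.read_csv(StringIO(csv_data))

print(df.shape)        # (number of rows, number of columns)
print(df.dtypes)       # data type pandas inferred for each column
print(df.head())       # first rows, to eyeball the values

missing = df.isna().sum()   # count of missing values per column
print(missing)
```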

Summary

Loading datasets means getting data from files or built-in sources to use in machine learning.

Use pandas.read_csv() for CSV files and libraries like scikit-learn for built-in datasets.

Always look at your data after loading to make sure it loaded correctly.