Machine learning models learn from data. Loading a dataset means reading data from a file or a built-in source into memory so it can be used to train a model.
Loading datasets (CSV, built-in datasets) in ML Python
import pandas as pd
from sklearn import datasets

# Load CSV file
data = pd.read_csv('file.csv')

# Load built-in dataset
iris = datasets.load_iris()
Use pandas.read_csv() to load a CSV file into a DataFrame.
Built-in datasets like Iris are returned as an object that holds the feature values (data), the labels (target), and the column names (feature_names).
import pandas as pd
data = pd.read_csv('data.csv')
from sklearn import datasets
iris = datasets.load_iris()
print(iris.data[:5])
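To see what the loaded Iris object contains beyond the raw numbers, a minimal sketch like the one below can help. It only assumes scikit-learn is installed; the attribute names used (data, target, feature_names, target_names) are the ones scikit-learn provides for this dataset.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Feature matrix: one row per flower, one column per measurement
print(iris.data.shape)      # (150, 4)

# Labels: one integer class code per row
print(iris.target.shape)    # (150,)

# Human-readable names for the feature columns and the classes
print(iris.feature_names)
print(iris.target_names)
```

Checking the shapes first makes it clear that each of the 150 rows in data has exactly one matching label in target.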
This program loads the Iris dataset from scikit-learn, converts it to a pandas DataFrame, and prints the first 3 rows. Then it simulates loading a CSV file from a string and prints that data.
from io import StringIO

import pandas as pd
from sklearn import datasets

# Load built-in Iris dataset
iris = datasets.load_iris()

# Convert to DataFrame for easier use
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

# Show first 3 rows
print(iris_df.head(3))

# Load CSV example (simulate with CSV string)
csv_data = '''sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
6.7,3.1,4.7,1.5,versicolor
7.2,3.6,6.1,2.5,virginica
'''

# Read CSV from string (like a file)
csv_df = pd.read_csv(StringIO(csv_data))
print(csv_df)
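As an alternative to building the DataFrame by hand, scikit-learn can return pandas objects directly. The sketch below assumes scikit-learn 0.23 or later, where load_iris() accepts as_frame=True; on older versions the manual conversion is the way to go.

```python
from sklearn import datasets

# as_frame=True asks scikit-learn to return pandas objects
# (assumes scikit-learn 0.23 or later)
iris = datasets.load_iris(as_frame=True)

# iris.frame holds the four feature columns plus a 'target' column
iris_df = iris.frame

print(iris_df.head(3))
print(iris_df.shape)    # (150, 5)
```

Both approaches give the same kind of table; this one simply saves the step of attaching the column names and labels yourself.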
CSV files are common for storing data tables and can be opened with spreadsheet programs.
Built-in datasets are great for learning because they are clean and ready to use.
Always check your data after loading to understand its shape and content.
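A quick check after loading might look like the sketch below. It reuses a small in-memory CSV string as a stand-in for a real file (the sample rows are illustrative only) and looks at the shape, the first rows, the column types, and the number of missing values.

```python
from io import StringIO

import pandas as pd

# Small in-memory CSV used as a stand-in for a real file
csv_data = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
6.7,3.1,4.7,1.5,versicolor
7.2,3.6,6.1,2.5,virginica
"""
df = pd.read_csv(StringIO(csv_data))

# How many rows and columns were loaded?
print(df.shape)

# What do the first rows look like?
print(df.head())

# Which type did pandas infer for each column?
print(df.dtypes)

# How many values are missing in each column?
missing_per_column = df.isna().sum()
print(missing_per_column)
```

If the shape, column types, or missing-value counts are not what you expect, the file was probably read with the wrong separator, header, or encoding.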
Loading datasets means getting data from files or built-in sources to use in machine learning.
Use pandas.read_csv() for CSV files and libraries like scikit-learn for built-in datasets.
Always look at your data after loading to make sure it loaded correctly.