ML Python programming (~20 mins)

Loading datasets (CSV, built-in datasets) in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of loading a CSV with pandas
What is the output of the following code snippet when loading a CSV file with pandas and printing the first 3 rows?
ML Python
import pandas as pd
from io import StringIO
csv_data = '''name,age,score
Alice,30,85
Bob,25,90
Charlie,35,88
David,40,92'''
df = pd.read_csv(StringIO(csv_data))
print(df.head(3))
A
      name  age  score
0    Alice   30     85
1      Bob   25     90
2  Charlie   35     88
B
88  53  eilrahC  2
09  52  boB  1
58  03  ecilA  0
erocs  ega  eman
C
   name  age  score
0  Alice  30  85
1  Bob  25  90
2  Charlie  35  88
D
   name  age  score
0  Alice  30  85
1  Bob  25  90
2  Charlie  35  88
3  David  40  92
Model Choice
intermediate
Choosing dataset loading method for built-in datasets
You want to load the Iris dataset for a classification task using scikit-learn. Which code snippet correctly loads the dataset as a dictionary-like object (a Bunch) whose data and target attributes are NumPy arrays?
A
import pandas as pd
data = pd.read_csv('iris.csv')
X, y = data.iloc[:, :-1], data.iloc[:, -1]
B
from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target
C
from sklearn.datasets import fetch_iris
data = fetch_iris()
X, y = data.data, data.target
D
from sklearn.datasets import load_iris
data = load_iris(as_frame=True)
X, y = data.data, data.target
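One way to compare the candidate snippets is to inspect what each call returns. The following is a minimal sketch, assuming scikit-learn 0.23 or later (where `as_frame` is available) and pandas are installed; the variable names are illustrative only.

```python
from sklearn.datasets import load_iris

# Default call: a dict-like Bunch whose fields are NumPy arrays.
bunch = load_iris()

# Same loader, but asking for pandas containers instead of arrays.
frame_bunch = load_iris(as_frame=True)

print(type(bunch).__name__)
print(type(bunch.data).__name__, bunch.data.shape)
print(type(bunch.target).__name__, bunch.target.shape)
print(type(frame_bunch.data).__name__, frame_bunch.data.shape)
print(type(frame_bunch.target).__name__, frame_bunch.target.shape)
```

Comparing the printed type names shows which options match the "arrays" wording of the question. Reading a local `iris.csv` only works if such a file actually exists in the working directory.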
Metrics
advanced
Evaluating dataset loading time
You load a large CSV dataset using pandas with and without specifying dtypes. Which statement about loading time and memory usage is true?
A
Specifying dtypes reduces loading time but increases memory usage.
B
Specifying dtypes increases loading time but reduces memory usage.
C
Specifying dtypes has no effect on loading time or memory usage.
D
Specifying dtypes reduces loading time and memory usage because pandas can optimize data storage.
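The memory side of this question can be checked empirically. The sketch below builds a small synthetic CSV in memory (the column names `id`, `group`, and `value` and the chosen narrow dtypes are assumptions made for illustration) and compares the footprint of the default load against a load with explicit dtypes.

```python
import pandas as pd
from io import StringIO

# Synthetic data: 10,000 rows with small integer and float values.
rows = "\n".join(f"{i},{i % 100},{i * 0.5}" for i in range(10_000))
csv_data = "id,group,value\n" + rows

# Default load: pandas infers 64-bit integer and float columns.
default_df = pd.read_csv(StringIO(csv_data))

# Explicit load: narrower dtypes that still hold every value in this data.
typed_df = pd.read_csv(
    StringIO(csv_data),
    dtype={"id": "int32", "group": "int8", "value": "float32"},
)

default_bytes = default_df.memory_usage(deep=True).sum()
typed_bytes = typed_df.memory_usage(deep=True).sum()

print(default_df.dtypes)
print(typed_df.dtypes)
print(default_bytes, typed_bytes)
```

Loading time can be compared the same way by wrapping each `read_csv` call with `time.perf_counter()`, though the measured difference depends on the file and the machine, so it is best judged on your own data.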
🔧 Debug
advanced
Identifying error when loading CSV with missing header
What error, if any, does the following code raise when it loads a CSV file that has no header row and header=None is not passed to pandas?
ML Python
import pandas as pd
from io import StringIO
csv_data = '''Alice,30,85
Bob,25,90
Charlie,35,88'''
df = pd.read_csv(StringIO(csv_data))
print(df.head())
A
No error; pandas treats the first row as the header, so the data is loaded with the column names 'Alice', '30', '85'
B
TypeError because data types cannot be inferred
C
ValueError because of mismatched columns
D
ParserError due to missing header row
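To see how pandas handles this input, the same data can be loaded twice: once as in the question and once with `header=None`. This is a minimal sketch; the column names passed to `names` are assumptions chosen for illustration.

```python
import pandas as pd
from io import StringIO

csv_data = """Alice,30,85
Bob,25,90
Charlie,35,88"""

# As in the question: no header argument, so pandas picks the header itself.
inferred = pd.read_csv(StringIO(csv_data))

# Telling pandas there is no header row and supplying column names explicitly.
explicit = pd.read_csv(
    StringIO(csv_data),
    header=None,
    names=["name", "age", "score"],  # hypothetical names for this example
)

print(list(inferred.columns), len(inferred))
print(list(explicit.columns), len(explicit))
```

Comparing the two row counts shows whether any record was consumed as column labels.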
🧠 Conceptual
expert
Understanding dataset loading with train_test_split
You load a built-in dataset and split it into training and testing sets using scikit-learn's train_test_split. Which statement is true about the resulting datasets?
A
train_test_split randomly shuffles data before splitting, so training and testing sets are representative samples.
B
train_test_split splits data sequentially without shuffling, so testing set contains last samples only.
C
train_test_split always splits data into equal halves regardless of test_size parameter.
D
train_test_split requires the dataset to be pre-shuffled manually before splitting.
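The behaviour described in these options can be observed directly on the Iris data, which is stored ordered by class. The sketch below assumes scikit-learn is installed; `test_size=0.2` and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Default behaviour (shuffle is left at its default value).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Same split size with shuffling switched off, for comparison.
_, _, _, y_test_seq = train_test_split(X, y, test_size=0.2, shuffle=False)

print(len(y_train), len(y_test))         # sizes follow test_size
print(sorted(set(y_test.tolist())))      # classes present in the default split
print(sorted(set(y_test_seq.tolist())))  # classes present without shuffling
```

Comparing the class labels in the two test sets shows what shuffling does to the split. Passing `stratify=y` is another option when class proportions need to be preserved exactly.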