Challenge - 5 Problems

🎖️

Dataset Loading Master

Get all challenges correct to earn this badge!

Test your skills under time pressure!

❓ Predict Output

intermediate

Output of loading a CSV with pandas

What is the output of the following code snippet when loading a CSV file with pandas and printing the first 3 rows?

import pandas as pd
from io import StringIO
csv_data = '''name,age,score
Alice,30,85
Bob,25,90
Charlie,35,88
David,40,92'''
df = pd.read_csv(StringIO(csv_data))
print(df.head(3))

      name  age  score
0    Alice   30     85
1      Bob   25     90
2  Charlie   35     88

88  53  eilrahC  2
09  52  boB  1
58  03  ecilA  0
erocs  ega  eman

   name  age  score
0  Alice  30  85
1  Bob  25  90
2  Charlie  35  88

   name  age  score
0  Alice  30  85
1  Bob  25  90
2  Charlie  35  88
3  David  40  92

❓ Model Choice

intermediate

Choosing dataset loading method for built-in datasets

You want to load the Iris dataset for a classification task using scikit-learn. Which code snippet correctly loads the dataset as a dictionary-like object whose data and target attributes are NumPy arrays?

import pandas as pd
data = pd.read_csv('iris.csv')
X, y = data.iloc[:, :-1], data.iloc[:, -1]

from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target

from sklearn.datasets import fetch_iris
data = fetch_iris()
X, y = data.data, data.target

from sklearn.datasets import load_iris
data = load_iris(as_frame=True)
X, y = data.data, data.target
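
One way to compare the scikit-learn options above is to inspect what each call returns. The following is a minimal sketch, assuming scikit-learn 0.23 or later (needed for as_frame) and pandas are installed; the variable names are illustrative only.

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: a dict subclass that also allows attribute access.
iris = load_iris()
X, y = iris.data, iris.target  # NumPy arrays by default
print(type(iris).__name__, type(X).__name__, X.shape, y.shape)

# With as_frame=True the container is still a Bunch, but data and target
# become a pandas DataFrame and Series instead of NumPy arrays.
iris_df = load_iris(as_frame=True)
print(type(iris_df.data).__name__, type(iris_df.target).__name__)

# return_X_y=True skips the container and returns the two arrays directly.
X2, y2 = load_iris(return_X_y=True)
```

Whether the NumPy or the pandas form is preferable depends on the downstream code; scikit-learn estimators accept both.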

❓ Metrics

advanced

Evaluating dataset loading time

You load a large CSV dataset using pandas with and without specifying dtypes. Which statement about loading time and memory usage is true?

A. Specifying dtypes reduces loading time but increases memory usage.

B. Specifying dtypes increases loading time but reduces memory usage.

C. Specifying dtypes has no effect on loading time or memory usage.

D. Specifying dtypes reduces loading time and memory usage because pandas can optimize data storage.
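
The memory side of these statements can be measured directly. The sketch below uses a small synthetic CSV (made-up data standing in for a large file) and assumes every numeric value fits in int8; it does not measure loading time, which depends on the file and machine and is better timed with time.perf_counter on real data.

```python
from io import StringIO

import pandas as pd

# Synthetic stand-in for a large CSV; the values are invented for illustration.
rows = "\n".join(f"user{i},{20 + i % 50},{60 + i % 40}" for i in range(1000))
csv_data = "name,age,score\n" + rows

# Without dtypes, pandas infers int64 for the integer columns.
df_default = pd.read_csv(StringIO(csv_data))

# Explicit narrower dtypes; valid here only because all values fit in int8.
df_typed = pd.read_csv(StringIO(csv_data), dtype={"age": "int8", "score": "int8"})

mem_default = df_default.memory_usage(deep=True).sum()
mem_typed = df_typed.memory_usage(deep=True).sum()
print(f"default: {mem_default} bytes, typed: {mem_typed} bytes")
```

Choosing a dtype that is too narrow for the data raises an error or loses information, so the declared types should be checked against the actual value ranges.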

🔧 Debug

advanced

Identifying error when loading CSV with missing header

What error, if any, will the following code raise when it loads a CSV file that has no header row without specifying header=None?

import pandas as pd
from io import StringIO
csv_data = '''Alice,30,85
Bob,25,90
Charlie,35,88'''
df = pd.read_csv(StringIO(csv_data))
print(df.head())

A. No error; pandas treats the first row as the header, so the data is loaded with column names 'Alice', '30', '85'

B. TypeError because data types cannot be inferred

C. ValueError because of mismatched columns

D. ParserError due to missing header row
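
The behavior in question can be checked by loading the same data twice, once with the defaults and once with header=None. This is a minimal sketch; the column names passed to names= are hypothetical labels chosen for illustration.

```python
from io import StringIO

import pandas as pd

csv_data = "Alice,30,85\nBob,25,90\nCharlie,35,88"

# Default behavior: the first line is consumed as column names.
df_default = pd.read_csv(StringIO(csv_data))
print(list(df_default.columns))  # ['Alice', '30', '85']
print(len(df_default))           # 2

# header=None keeps every line as data; names= supplies explicit labels
# (these labels are assumptions, not part of the file).
df_fixed = pd.read_csv(
    StringIO(csv_data), header=None, names=["name", "age", "score"]
)
print(len(df_fixed))             # 3
```

Omitting names= while keeping header=None also works; pandas then labels the columns 0, 1, 2.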

🧠 Conceptual

expert

Understanding dataset loading with train_test_split

You load a built-in dataset and split it into training and testing sets using scikit-learn's train_test_split. Which statement is true about the resulting datasets?

A. train_test_split randomly shuffles the data before splitting by default, so the training and testing sets are representative samples.

B. train_test_split splits the data sequentially without shuffling, so the testing set contains the last samples only.

C. train_test_split always splits the data into equal halves regardless of the test_size parameter.

D. train_test_split requires the dataset to be pre-shuffled manually before splitting.
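
The effect of shuffling and of test_size can be observed on a tiny ordered array. The sketch below assumes NumPy and scikit-learn are installed and uses a toy dataset of ten samples invented for illustration; random_state=0 is an arbitrary seed chosen to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy ordered data: sample i has label i.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# Default call: shuffle=True, so the test set is drawn at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(y_train), len(y_test))  # 8 2

# shuffle=False splits sequentially: the test set is the tail of the data.
_, _, y_train_seq, y_test_seq = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print(y_test_seq)  # [8 9]
```

For classification data with imbalanced classes, the stratify parameter can additionally keep class proportions similar across the two sets.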
