Challenge - 5 Problems

Reproducible Analysis Master

❓ Predict Output

intermediate

What is the output of this code using a fixed random seed?

This code sets a random seed before generating random numbers. What will be the printed output?

Data Analysis Python

import numpy as np
np.random.seed(42)
random_numbers = np.random.rand(3)
print(random_numbers)

A. [0.37454012 0.95071431 0.73199394]

B. [0.37454012 0.95071431 0.59865848]

C. [0.5488135 0.71518937 0.60276338]

D. [0.64589411 0.43758721 0.891773 ]
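
The idea behind this question can be checked without looking at any specific values. The sketch below is only an illustration: the `draw` helper and the seeds 123 and 124 are hypothetical and not part of the challenge. It reseeds NumPy's legacy global generator and compares the draws.

```python
import numpy as np

def draw(seed, n=4):
    """Reseed NumPy's legacy global generator, then draw n uniform floats."""
    np.random.seed(seed)
    return np.random.rand(n)

first = draw(123)
second = draw(123)  # same seed, so the same sequence is generated again
other = draw(124)   # a different seed starts a different sequence

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```

Reseeding restarts the sequence from the same point, which is why the first comparison holds on every run.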

❓ Data Output

intermediate

What is the shape of the DataFrame after filtering?

Given this DataFrame and filtering code, what is the shape of the resulting DataFrame?

Data Analysis Python

import pandas as pd
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['A'] > 2]
print(filtered_df.shape)

A. (3, 1)

B. (3, 2)

C. (2, 2)

D. (2, 1)
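
Boolean-mask filtering can be explored on separate data before answering. The following is a minimal sketch on a hypothetical `scores` table (its column names and values are assumptions made for illustration, not taken from the challenge).

```python
import pandas as pd

# Hypothetical data, unrelated to the DataFrame in the question.
scores = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "score": [10, 40, 25, 5, 30],
    "passed": [False, True, True, False, True],
})

# Comparing a column to a scalar yields a boolean Series, one entry per row.
mask = scores["score"] >= 25

# Indexing with the mask keeps only the rows where it is True.
kept = scores[mask]

print(mask.sum(), kept.shape)
```

Comparing `mask.sum()` with `kept.shape` shows how the number of `True` entries in the mask relates to the dimensions of the filtered result.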

❓ Visualization

advanced

Which call produces a reproducible histogram with fixed bin edges?

You want a histogram that looks the same every time you run the code, with bin edges that do not depend on the range of the data. Which option sets the bin edges explicitly?

Data Analysis Python

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
data = np.random.randn(1000)

A. plt.hist(data, bins=np.linspace(-4,4,21)); plt.show()

B. plt.hist(data, bins=20); plt.show()

C. plt.hist(data); plt.show()

D. plt.hist(data, bins='auto'); plt.show()

🔧 Debug

advanced

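What does this code produce after saving a DataFrame with index=False?

This code saves a DataFrame to CSV without the index column, then reads the file back and inspects its first line. What is the first line of the file, or which error is raised?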

Data Analysis Python

import pandas as pd
df = pd.DataFrame({'x':[1,2], 'y':[3,4]})
df.to_csv('output.csv', index=False)
with open('output.csv') as f:
    lines = f.readlines()
print(repr(lines[0]))  # repr makes the trailing newline visible

A. IndexError

B. '0,x,y\n'

C. FileNotFoundError

D. 'x,y\n'

🚀 Application

expert

Which option ensures a reproducible shuffled train-test split in scikit-learn?

You want to shuffle your dataset and split it into training and testing sets reproducibly. Which code snippet guarantees the same shuffled split every time?

Data Analysis Python

from sklearn.model_selection import train_test_split
X = [[i] for i in range(10)]
y = [0,1,0,1,0,1,0,1,0,1]

A. train_test_split(X, y, test_size=0.3, shuffle=False)

B. train_test_split(X, y, test_size=0.3, random_state=42)

C. train_test_split(X, y, test_size=0.3)

D. train_test_split(X, y, test_size=0.3, random_state=None)
