Data Analysis Python | data | ~20 mins

Reproducible analysis patterns in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
2:00 remaining
What is the output of this code using a fixed random seed?
This code sets a random seed before generating random numbers. What will be the printed output?
import numpy as np
np.random.seed(42)
random_numbers = np.random.rand(3)
print(random_numbers)
A. [0.37454012 0.95071431 0.73199394]
B. [0.37454012 0.95071431 0.59865848]
C. [0.5488135 0.71518937 0.60276338]
D. [0.64589411 0.43758721 0.891773 ]
💡 Hint
Setting a random seed fixes the sequence of random numbers generated.
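The hint can be checked directly. Below is a minimal sketch that uses a different seed than the problem; the helper name `draw` is hypothetical and introduced only for illustration. It also shows NumPy's newer `default_rng` generator, which keeps the seed local to one object instead of changing global state.

```python
import numpy as np


def draw(seed, n=3):
    """Reseed NumPy's global generator, then draw n uniform samples."""
    np.random.seed(seed)
    return np.random.rand(n)


# Same seed, same sequence.
first = draw(123)
second = draw(123)
print(np.array_equal(first, second))

# The Generator API scopes the seed to a single object.
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))
```

Reseeding before every draw is what makes the two calls match; drawing twice after a single seed call would give two different arrays.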
Data Output
intermediate
2:00 remaining
What is the shape of the DataFrame after filtering?
Given this DataFrame and filtering code, what is the shape of the resulting DataFrame?
import pandas as pd
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Keep only the rows where column 'A' is greater than 2
filtered_df = df[df['A'] > 2]
print(filtered_df.shape)
A. (3, 1)
B. (3, 2)
C. (2, 2)
D. (2, 1)
💡 Hint
Count how many rows have 'A' values greater than 2.
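The counting approach in the hint can be sketched on a separate, made-up table (the `scores` data and column names below are hypothetical). A boolean mask removes rows only, so the row count equals the number of `True` values in the mask and the column count is unchanged.

```python
import pandas as pd

# Hypothetical data, unrelated to the problem above.
scores = pd.DataFrame(
    {"name": ["a", "b", "c", "d", "e"], "score": [10, 25, 30, 5, 40]}
)

mask = scores["score"] >= 25   # boolean Series, one entry per row
kept = scores[mask]            # rows where the mask is True

print(int(mask.sum()), kept.shape)
```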
Visualization
advanced
2:00 remaining
Which plot shows a reproducible histogram with fixed bins?
You want to create a histogram that looks the same every time you run the code. Which option produces a reproducible histogram with fixed bins?
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
data = np.random.randn(1000)
A. plt.hist(data, bins=np.linspace(-4, 4, 21)); plt.show()
B. plt.hist(data, bins=20); plt.show()
C. plt.hist(data); plt.show()
D. plt.hist(data, bins='auto'); plt.show()
💡 Hint
Fixed bins mean the bin edges are explicitly set.
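To see why explicit edges matter, here is a rough sketch using `np.histogram` rather than a plot, so it runs without a display. The two samples and the edge range are made up for illustration. An integer bin count derives its edges from the data's minimum and maximum, while an explicit array of edges stays the same for any dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
sample_a = rng.normal(size=500)
sample_b = rng.normal(size=500) * 2.0   # wider spread than sample_a

# Explicit edges: 12 equal-width bins from -6 to 6.
edges = np.linspace(-6, 6, 13)
_, edges_a = np.histogram(sample_a, bins=edges)
_, edges_b = np.histogram(sample_b, bins=edges)
print(np.array_equal(edges_a, edges_b))   # edges do not depend on the data

# Integer bin count: edges are derived from each sample's range.
_, auto_a = np.histogram(sample_a, bins=12)
_, auto_b = np.histogram(sample_b, bins=12)
print(np.array_equal(auto_a, auto_b))
```

Values outside the explicit range are dropped from the counts, so the edges should be chosen wide enough for the data you expect.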
🔧 Debug
advanced
2:00 remaining
What does this code print after saving a DataFrame with index=False?
This code saves a DataFrame to CSV with index=False, then reads the first line of the file back. What is printed, or what error is raised?
import pandas as pd
df = pd.DataFrame({'x':[1,2], 'y':[3,4]})
df.to_csv('output.csv', index=False)
with open('output.csv') as f:
    lines = f.readlines()
print(repr(lines[0]))
A. IndexError
B. '0,x,y\n'
C. FileNotFoundError
D. 'x,y\n'
💡 Hint
Check the first line of the CSV file when index=False is used.
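The effect of the `index` argument can be compared without touching the file system, since `to_csv` returns the CSV text as a string when no path is given. This is a small sketch on a made-up table (`table` and its columns are hypothetical).

```python
import pandas as pd

# Hypothetical data, unrelated to the problem above.
table = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [4, 19]})

with_index = table.to_csv()                # default: index column is written
without_index = table.to_csv(index=False)  # index column is omitted

# Compare the header lines of the two outputs.
print(repr(with_index.splitlines()[0]))
print(repr(without_index.splitlines()[0]))
```

Writing without the index avoids an extra unnamed column appearing when the file is read back, which helps keep a save-and-reload round trip stable.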
🚀 Application
expert
3:00 remaining
Which option ensures a reproducible shuffled train-test split in scikit-learn?
You want to shuffle your dataset and split it into training and testing sets reproducibly. Which code snippet guarantees the same shuffled split every time?
from sklearn.model_selection import train_test_split
X = [[i] for i in range(10)]
y = [0,1,0,1,0,1,0,1,0,1]
A. train_test_split(X, y, test_size=0.3, shuffle=False)
B. train_test_split(X, y, test_size=0.3, random_state=42)
C. train_test_split(X, y, test_size=0.3)
D. train_test_split(X, y, test_size=0.3, random_state=None)
💡 Hint
Setting random_state fixes the randomness of the split.
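A quick way to test the hint is to run the same split twice and compare the results. The sketch below assumes scikit-learn is installed and uses made-up data and a different seed than the options above; the helper name `split` is hypothetical.

```python
from sklearn.model_selection import train_test_split

# Hypothetical data, unrelated to the problem above.
features = [[i] for i in range(20)]
labels = [i % 2 for i in range(20)]


def split(seed):
    """Shuffle and split with a fixed integer seed."""
    return train_test_split(
        features, labels, test_size=0.25, random_state=seed
    )


run_1 = split(7)
run_2 = split(7)

# Each result is [X_train, X_test, y_train, y_test] as plain lists here,
# so the two runs can be compared directly.
print(run_1 == run_2)
```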