Challenge - 5 Problems
Reproducible Analysis Master
❓ Predict Output
Difficulty: intermediate
What is the output of this code using a fixed random seed?
This code sets a random seed before generating random numbers. What will be the printed output?
Data Analysis Python
```python
import numpy as np

np.random.seed(42)
random_numbers = np.random.rand(3)
print(random_numbers)
```
💡 Hint
Setting a random seed fixes the sequence of random numbers generated.
Explanation
Calling np.random.seed(42) fixes the state of NumPy's global random number generator, so np.random.rand(3) produces the same three numbers on every run. The printed output is [0.37454012 0.95071431 0.73199394].
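A quick way to check the reproducibility claim is to reseed and draw again. The minimal sketch below compares two draws made after the same np.random.seed(42) call, and also shows NumPy's newer Generator API (np.random.default_rng), which keeps its seeded state in a generator object instead of a global. The variable names are illustrative only. Note that default_rng(42) produces a different stream from the legacy np.random.seed(42), so its numbers will not match the values above.

```python
import numpy as np

# Legacy global-state API: reseeding restarts the same sequence
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(first)                          # [0.37454012 0.95071431 0.73199394]
print(np.array_equal(first, second))  # True

# Generator API: each generator object carries its own seeded state
gen_first = np.random.default_rng(42).random(3)
gen_second = np.random.default_rng(42).random(3)
print(np.array_equal(gen_first, gen_second))  # True
```

Passing a seeded generator around explicitly avoids surprises when other code also draws from the global state between your calls.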
❓ Data Output
Difficulty: intermediate
What is the shape of the DataFrame after filtering?
Given this DataFrame and filtering code, what is the shape of the resulting DataFrame?
Data Analysis Python
```python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
filtered_df = df[df['A'] > 2]
```
💡 Hint
Count how many rows have 'A' values greater than 2.
Explanation
Only the rows where 'A' is 3 or 4 satisfy 'A' > 2, so 2 rows remain. Filtering rows keeps both columns, so the shape is (2, 2).
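To see the filter step by step, the sketch below prints the boolean mask behind df['A'] > 2, then the filtered result's shape and row labels. The names mask and renumbered are illustrative. Boolean filtering keeps the original index labels (2 and 3); reset_index(drop=True) renumbers them from 0 if a clean index is needed.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

mask = df['A'] > 2      # boolean Series, one value per row
filtered_df = df[mask]  # keeps only the rows where mask is True

print(mask.tolist())               # [False, False, True, True]
print(filtered_df.shape)           # (2, 2)
print(filtered_df.index.tolist())  # [2, 3]

# Drop the old labels and renumber the remaining rows from 0
renumbered = filtered_df.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1]
```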
❓ Visualization
Difficulty: advanced
Which plot shows a reproducible histogram with fixed bins?
You want to create a histogram that looks the same every time you run the code. Which option produces a reproducible histogram with fixed bins?
Data Analysis Python
```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.randn(1000)
```
💡 Hint
Fixed bins mean the bin edges are explicitly set.
Explanation
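Option A sets explicit bin edges: 20 equal-width bins spanning -4 to 4. Because the edges are written into the code rather than derived from the data's minimum and maximum, the binning stays the same across runs, and together with the fixed seed the histogram is identical every time.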
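The fixed-bin idea can be checked without drawing anything. This sketch assumes the edges are built with np.linspace(-4, 4, 21) (21 edges give 20 bins) and uses a hypothetical helper, binned_counts, that regenerates the seeded data and bins it with np.histogram. Two runs with the same seed and the same edges give identical counts.

```python
import numpy as np

# Fixed edges: 21 boundaries define 20 equal-width bins from -4 to 4
edges = np.linspace(-4, 4, 21)

def binned_counts(seed):
    """Hypothetical helper: regenerate seeded data and bin it with fixed edges."""
    np.random.seed(seed)
    data = np.random.randn(1000)
    counts, bin_edges = np.histogram(data, bins=edges)
    return counts, bin_edges

counts_a, edges_a = binned_counts(0)
counts_b, edges_b = binned_counts(0)
print(np.array_equal(counts_a, counts_b))  # True
print(len(counts_a))                       # 20
```

The same edges array can be passed to plt.hist(data, bins=edges) to draw the histogram. Values outside -4 to 4 fall outside every bin and are not counted, so the range should cover the data you care about.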
🔧 Debug
Difficulty: advanced
What does this code print after saving a DataFrame with index=False?
This code saves a DataFrame to CSV without its index and reads back the first line. What is printed, and is any error raised?
Data Analysis Python
```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
df.to_csv('output.csv', index=False)

with open('output.csv') as f:
    lines = f.readlines()
print(lines[0])
```
💡 Hint
Check the first line of the CSV file when index=False is used.
Explanation
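No error is raised. With index=False, pandas writes no index column, so the header contains only the column names and lines[0] is 'x,y\n'. print displays x,y followed by a blank line, because the string already ends in a newline.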
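For contrast, the sketch below writes the same DataFrame with and without the index and reads each version back. It relies on to_csv returning the CSV text as a string when no path is given; the variable names are illustrative. With the default index=True the header starts with an empty field for the unnamed index, and reading it back adds an extra 'Unnamed: 0' column, while the index=False version round-trips to the original DataFrame.

```python
import io

import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# With no path argument, to_csv returns the CSV text
with_index = df.to_csv()                # default index=True
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])       # ,x,y
print(without_index.splitlines()[0])    # x,y

# Reading back: the written index becomes an extra unnamed column
print(list(pd.read_csv(io.StringIO(with_index)).columns))  # ['Unnamed: 0', 'x', 'y']
print(pd.read_csv(io.StringIO(without_index)).equals(df))  # True
```

Writing with index=False (when the index carries no meaning) keeps saved files stable and avoids accumulating 'Unnamed' columns across repeated save and load cycles.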
🚀 Application
Difficulty: expert
Which option ensures reproducible train-test split in scikit-learn?
You want to split your dataset into training and testing sets reproducibly. Which code snippet guarantees the same split every time?
Data Analysis Python
```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```
💡 Hint
Setting random_state fixes the randomness of the split.
Explanation
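Passing a fixed integer to random_state, such as random_state=42, seeds the shuffle, so train_test_split returns the same split on every run. Any fixed integer works; 42 is only a convention. Without random_state, the split can change from run to run.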
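A minimal check of this behavior: the sketch below calls train_test_split twice on the same data with the same random_state and compares the results. The names split_1 and split_2 are illustrative. Each call returns [X_train, X_test, y_train, y_test], and with the default test_size of 0.25, 3 of the 10 samples go to the test set.

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Same integer seed on both calls, so the shuffle is identical
split_1 = train_test_split(X, y, random_state=42)
split_2 = train_test_split(X, y, random_state=42)

X_train, X_test, y_train, y_test = split_1
print(split_1 == split_2)  # True
print(len(X_test))         # 3
```

Because the inputs are plain lists, the returned pieces are lists too, which is why a direct == comparison works here; with NumPy arrays you would compare each piece with np.array_equal instead.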