Challenge - 5 Problems
SciPy-Pandas Mastery
❓ Predict Output
Intermediate (time limit: 2:00)
Output of scipy.stats.ttest_ind with pandas DataFrames
Given two pandas DataFrames representing two groups of data, what is the output of the following code snippet?
SciPy
import pandas as pd
from scipy import stats

# Create two groups of data
data1 = pd.DataFrame({'score': [88, 92, 85, 90, 87]})
data2 = pd.DataFrame({'score': [78, 81, 79, 83, 80]})

# Perform independent t-test
result = stats.ttest_ind(data1['score'], data2['score'])
print((round(result.statistic, 2), round(result.pvalue, 3)))
💡 Hint
Remember that the t-statistic sign depends on which group mean is larger.
Explanation
The first group has the higher mean score (88.4 versus 80.2), so the t-statistic is positive: about 5.53 with the default pooled-variance test. The p-value is well below 0.01, indicating a statistically significant difference between the groups.
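As a rough check on the sign and size of the statistic, the pooled-variance formula can be worked by hand and compared with SciPy. This is a minimal sketch: the names `group1`, `group2`, `t_manual`, and `swapped` are illustrative and not part of the original challenge.

```python
import numpy as np
import pandas as pd
from scipy import stats

group1 = pd.Series([88, 92, 85, 90, 87])
group2 = pd.Series([78, 81, 79, 83, 80])

# Pooled variance, matching ttest_ind's default equal_var=True
n1, n2 = len(group1), len(group2)
pooled_var = ((n1 - 1) * group1.var(ddof=1)
              + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (group1.mean() - group2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

result = stats.ttest_ind(group1, group2)
swapped = stats.ttest_ind(group2, group1)  # reversed argument order

print(round(float(t_manual), 2))          # 5.53
print(round(float(result.statistic), 2))  # 5.53
print(swapped.statistic < 0)              # True: swapping the groups flips the sign
```

Wrapping the values in `float()` before rounding keeps the printed output free of NumPy scalar wrappers, which newer NumPy versions show when a tuple of NumPy floats is printed.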
❓ data_output
Intermediate (time limit: 2:00)
Result of scipy.stats.linregress on pandas Series
What is the output of the linear regression performed on the following pandas Series using scipy.stats.linregress?
SciPy
import pandas as pd
from scipy import stats

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

result = stats.linregress(x, y)
print((round(result.slope, 2), round(result.intercept, 2)))
💡 Hint
Slope is the change in y per unit change in x.
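Explanation
The least-squares slope is 0.6 and the intercept is 2.2. With mean x = 3 and mean y = 4, the slope is the sum of products of deviations divided by the sum of squared x deviations, 6 / 10 = 0.6, and the intercept is 4 - 0.6 × 3 = 2.2.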
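The fit can be checked by hand with the ordinary least-squares formulas and then compared against `linregress`. This is a small sketch; `dx`, `dy`, `slope_manual`, and `intercept_manual` are illustrative names.

```python
import pandas as pd
from scipy import stats

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

# Deviations from the means (mean of x is 3, mean of y is 4)
dx = x - x.mean()
dy = y - y.mean()

slope_manual = (dx * dy).sum() / (dx ** 2).sum()       # 6 / 10
intercept_manual = y.mean() - slope_manual * x.mean()  # 4 - 0.6 * 3

result = stats.linregress(x, y)

print(round(float(slope_manual), 2), round(float(intercept_manual), 2))  # 0.6 2.2
print(round(float(result.slope), 2), round(float(result.intercept), 2))  # 0.6 2.2
```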
❓ visualization
Advanced (time limit: 2:30)
Visualizing correlation matrix with SciPy and Pandas
You have a pandas DataFrame with multiple numeric columns and compute its Pearson correlation matrix. (The snippet below does this with pandas' DataFrame.corr(); the pearsonr import is not used.) Which code snippet correctly produces a heatmap visualization of this correlation matrix using matplotlib?
SciPy
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

# Sample data
np.random.seed(0)
data = pd.DataFrame(np.random.rand(5, 4), columns=list('ABCD'))

# Calculate correlation matrix
corr = data.corr()

# Visualization code here
💡 Hint
A heatmap uses imshow with a color map to show matrix values.
Explanation
Option C correctly uses imshow to display the correlation matrix as a heatmap with colorbar and axis labels.
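The answer options are not reproduced here, so the following is only a sketch of what an `imshow`-based heatmap with a colorbar and axis labels might look like, not necessarily the exact code of the correct option. The colormap and the fixed color range of -1 to 1 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same sample data as in the challenge
np.random.seed(0)
data = pd.DataFrame(np.random.rand(5, 4), columns=list('ABCD'))
corr = data.corr()  # Pearson correlation by default

fig, ax = plt.subplots()
# Fix the color range to [-1, 1] so colors are comparable across datasets
im = ax.imshow(corr.to_numpy(), cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, label='Pearson r')

# Label both axes with the column names
ticks = range(len(corr.columns))
ax.set_xticks(ticks)
ax.set_xticklabels(corr.columns)
ax.set_yticks(ticks)
ax.set_yticklabels(corr.index)
ax.set_title('Correlation matrix')

# Call plt.show() in an interactive session to display the figure
```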
🔧 Debug
Advanced (time limit: 2:00)
Identify the error in applying scipy.stats.kstest on pandas Series
What error will the following code raise when running the Kolmogorov-Smirnov test on a pandas Series?
SciPy
import pandas as pd
from scipy import stats

sample = pd.Series([0.1, 0.4, 0.35, 0.8, 0.9])

result = stats.kstest(sample, 'uniform')
print(result)
💡 Hint
Check if scipy.stats.kstest accepts pandas Series directly.
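Explanation
No error is raised. scipy.stats.kstest accepts array-like inputs, including pandas Series, so the code runs and returns the test result. With 'uniform' and no extra arguments, the sample is compared against the uniform distribution on [0, 1].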
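A quick way to confirm this is to run the test on both the Series and its underlying NumPy array and compare the results. The sketch below also prints the statistic, which for this sample is the largest gap between the empirical CDF and the uniform CDF; the name `result_array` is illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

sample = pd.Series([0.1, 0.4, 0.35, 0.8, 0.9])

# Pass the Series directly, then the equivalent NumPy array
result = stats.kstest(sample, 'uniform')
result_array = stats.kstest(sample.to_numpy(), 'uniform')

print(round(float(result.statistic), 2))                       # 0.2
print(np.isclose(result.statistic, result_array.statistic))    # True
print(result.pvalue > 0.05)                                    # True: no evidence against uniformity
```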
🚀 Application
Expert (time limit: 3:00)
Using SciPy and Pandas to find the most correlated feature
Given a pandas DataFrame with numeric columns, which code snippet correctly finds the column most positively correlated with column 'target' using SciPy's pearsonr function?
SciPy
import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    'target': [1, 2, 3, 4, 5],
    'feat1': [5, 4, 3, 2, 1],
    'feat2': [2, 3, 4, 5, 6],
    'feat3': [5, 5, 5, 5, 5]
})

# Find most correlated feature with 'target'
💡 Hint
Use pearsonr to get correlation coefficient, not p-value.
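Explanation
Option A correctly uses pearsonr to compute correlation coefficients and finds the column with the highest positive correlation. In this data, feat2 rises in lockstep with target (r = 1), feat1 is perfectly negatively correlated (r = -1), and feat3 is constant, so its correlation with target is undefined.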
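Since the answer options are not shown here, the following is one possible way to do the search, offered as a sketch rather than as the text of Option A. Skipping constant columns is an added assumption: the Pearson coefficient is undefined for them, so they are left out before `pearsonr` is called. The names `correlations` and `best` are illustrative.

```python
import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    'target': [1, 2, 3, 4, 5],
    'feat1': [5, 4, 3, 2, 1],
    'feat2': [2, 3, 4, 5, 6],
    'feat3': [5, 5, 5, 5, 5],
})

correlations = {}
for col in data.columns.drop('target'):
    # A constant column has zero variance, so Pearson's r is undefined for it
    if data[col].nunique() <= 1:
        continue
    r, _p_value = pearsonr(data[col], data['target'])  # keep r, ignore the p-value
    correlations[col] = float(r)

# Column with the largest (most positive) coefficient
best = max(correlations, key=correlations.get)
print(best)  # feat2
```

Taking the maximum of the raw coefficient selects the most positive correlation; to find the strongest relationship regardless of direction, the absolute value of `r` would be compared instead.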