
SciPy with Pandas for data handling - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate, 2:00 time limit)
Output of scipy.stats.ttest_ind with pandas DataFrames
Given two pandas DataFrames representing two groups of data, what does the following code snippet print? (Under NumPy 2.x each value is printed wrapped as np.float64(...); the options show the bare numbers.)
SciPy
import pandas as pd
from scipy import stats

# Create two groups of data
data1 = pd.DataFrame({'score': [88, 92, 85, 90, 87]})
data2 = pd.DataFrame({'score': [78, 81, 79, 83, 80]})

# Perform independent t-test
result = stats.ttest_ind(data1['score'], data2['score'])
print((round(result.statistic, 2), round(result.pvalue, 3)))
A. (-5.53, 0.001)
B. (5.53, 0.001)
C. (-1.23, 0.234)
D. (1.23, 0.234)
💡 Hint: Remember that the sign of the t-statistic depends on which group mean is larger.
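A quick way to check the sign and size of the statistic is to evaluate the pooled-variance formula that ttest_ind applies by default (equal_var=True) and compare it with SciPy. This is a sketch assuming a recent NumPy, pandas and SciPy; the names a, b, pooled_var, t_manual and p_manual are illustrative only:

```python
import numpy as np
import pandas as pd
from scipy import stats

data1 = pd.DataFrame({'score': [88, 92, 85, 90, 87]})
data2 = pd.DataFrame({'score': [78, 81, 79, 83, 80]})
a = data1['score'].to_numpy(dtype=float)
b = data2['score'].to_numpy(dtype=float)

# Pooled variance: total squared deviations over n1 + n2 - 2 degrees of freedom
n1, n2 = len(a), len(b)
pooled_var = (((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()) / (n1 + n2 - 2)

# t = (mean1 - mean2) / sqrt(pooled_var * (1/n1 + 1/n2))
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

result = stats.ttest_ind(data1['score'], data2['score'])
print(round(float(t_manual), 2), round(float(p_manual), 3))                # 5.53 0.001
print(round(float(result.statistic), 2), round(float(result.pvalue), 3))  # 5.53 0.001
```

The statistic is positive because data1 is passed first and has the larger mean (88.4 vs 80.2); swapping the arguments flips the sign but leaves the p-value unchanged.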
Data Output (intermediate, 2:00 time limit)
Result of scipy.stats.linregress on pandas Series
What is the output of the linear regression performed on the following pandas Series using scipy.stats.linregress?
SciPy
import pandas as pd
from scipy import stats

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

result = stats.linregress(x, y)
print((round(result.slope, 2), round(result.intercept, 2)))
A. (0.6, 2.2)
B. (0.7, 1.6)
C. (0.8, 1.4)
D. (0.5, 2.5)
💡 Hint: The slope is the change in y per unit change in x.
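The slope and intercept follow from the closed-form least-squares formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)), so the answer can be checked by hand. A minimal sketch; dx, dy, slope and intercept are illustrative names:

```python
import pandas as pd
from scipy import stats

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

# Closed-form least squares on the centred data
dx = x - x.mean()
dy = y - y.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()  # Sxy / Sxx = 6 / 10
intercept = y.mean() - slope * x.mean()    # 4 - 0.6 * 3

result = stats.linregress(x, y)
print(round(float(slope), 2), round(float(intercept), 2))                # 0.6 2.2
print(round(float(result.slope), 2), round(float(result.intercept), 2))  # 0.6 2.2
```

Here Sxy = 6 and Sxx = 10, giving a slope of 0.6 and an intercept of 4 - 0.6 * 3 = 2.2.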
Visualization (advanced, 2:30 time limit)
Visualizing correlation matrix with SciPy and Pandas
You have a pandas DataFrame with multiple numeric columns and compute its Pearson correlation matrix with pandas' DataFrame.corr() (the same coefficients scipy.stats.pearsonr gives for each column pair). Which code snippet correctly produces a heatmap of this correlation matrix using matplotlib?
SciPy
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

# Sample data
np.random.seed(0)
data = pd.DataFrame(np.random.rand(5, 4), columns=list('ABCD'))

# Calculate correlation matrix
corr = data.corr()

# Visualization code here
A
plt.bar(corr.columns, corr.iloc[0])
plt.show()
B
plt.plot(corr)
plt.show()
C
plt.imshow(corr, cmap='coolwarm', interpolation='none')
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns)
plt.show()
D
plt.scatter(corr.columns, corr.columns)
plt.show()
💡 Hint: A heatmap uses imshow with a colormap to show matrix values.
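Because the problem mentions SciPy while the snippet uses DataFrame.corr(), it may help to see that both give the same Pearson matrix. The sketch below, assuming Matplotlib with the non-interactive Agg backend, fills the matrix pair by pair with pearsonr and then draws the heatmap in the style of option C; pinning vmin/vmax to [-1, 1] is an added choice, not part of the original option.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

np.random.seed(0)
data = pd.DataFrame(np.random.rand(5, 4), columns=list('ABCD'))

# Build the Pearson matrix pair by pair with SciPy (diagonal is 1 by definition)
cols = data.columns
corr = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
for i, ci in enumerate(cols):
    for cj in cols[i + 1:]:
        r = pearsonr(data[ci], data[cj])[0]
        corr.loc[ci, cj] = corr.loc[cj, ci] = r  # the matrix is symmetric

# Heatmap with a fixed [-1, 1] colour scale so colours mean the same r in every plot
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap='coolwarm', vmin=-1, vmax=1, interpolation='none')
fig.colorbar(im, ax=ax)
ax.set_xticks(range(len(cols)))
ax.set_xticklabels(cols)
ax.set_yticks(range(len(cols)))
ax.set_yticklabels(cols)
# Use plt.show() or fig.savefig(...) here to display or save the figure
plt.close(fig)
```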
🔧 Debug (advanced, 2:00 time limit)
Identify the error in applying scipy.stats.kstest to a pandas Series
Which error, if any, does the following code raise when running the Kolmogorov-Smirnov test on a pandas Series?
SciPy
import pandas as pd
from scipy import stats

sample = pd.Series([0.1, 0.4, 0.35, 0.8, 0.9])

result = stats.kstest(sample, 'uniform')
print(result)
A. TypeError: 'Series' object is not iterable
B. AttributeError: 'Series' object has no attribute 'shape'
C. ValueError: Data must be 1-dimensional array-like
D. No error; prints KstestResult(statistic=..., pvalue=..., ...)
💡 Hint: Check whether scipy.stats.kstest accepts a pandas Series directly.
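kstest converts its first argument to an array, so a Series behaves like any other 1-D array-like. The sketch below, assuming a recent SciPy, runs the test on the Series and on its NumPy array and recomputes the statistic D by hand from the empirical CDF; x, d_plus, d_minus and d are illustrative names:

```python
import numpy as np
import pandas as pd
from scipy import stats

sample = pd.Series([0.1, 0.4, 0.35, 0.8, 0.9])

# kstest accepts any 1-D array-like, so the Series works directly
res_series = stats.kstest(sample, 'uniform')
res_array = stats.kstest(sample.to_numpy(), 'uniform')

# Manual D: largest gap between the empirical CDF and the U(0, 1) CDF
x = np.sort(sample.to_numpy())
n = len(x)
cdf = stats.uniform.cdf(x)
d_plus = (np.arange(1, n + 1) / n - cdf).max()  # ECDF above the model CDF
d_minus = (cdf - np.arange(n) / n).max()        # model CDF above the ECDF
d = max(d_plus, d_minus)

print(round(float(res_series.statistic), 2))  # 0.2
```

For the sorted sample [0.1, 0.35, 0.4, 0.8, 0.9] the largest gap is 0.2, for example 3/5 - 0.4 at the third point.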
🚀 Application (expert, 3:00 time limit)
Using SciPy and Pandas to find the most correlated feature
Given a pandas DataFrame with numeric columns, which code snippet correctly finds the column most positively correlated with column 'target' using SciPy's pearsonr function?
SciPy
import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    'target': [1, 2, 3, 4, 5],
    'feat1': [5, 4, 3, 2, 1],
    'feat2': [2, 3, 4, 5, 6],
    'feat3': [5, 5, 5, 5, 5]
})

# Find most correlated feature with 'target'
A
corrs = {col: pearsonr(data['target'], data[col])[0] for col in data.columns if col != 'target'}
most_corr = max(corrs, key=corrs.get)
print(most_corr)
B
corrs = data.corr()['target'].drop('target')
most_corr = corrs.idxmax()
print(most_corr)
C
most_corr = data.corrwith(data['target']).idxmax()
print(most_corr)
D
most_corr = max(data.columns, key=lambda col: pearsonr(data['target'], data[col])[1])
print(most_corr)
💡 Hint: Use the correlation coefficient that pearsonr returns (index 0), not its p-value (index 1).
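Option A relies on how max() treats NaN: feat3 is constant, so pearsonr returns nan (with a warning), and max() still returns 'feat2' only because every comparison involving NaN is False and feat3 is not the first key. A more defensive sketch, assuming a recent SciPy, collects the coefficients in a Series and drops NaN before idxmax(); the helper pearson_r is hypothetical:

```python
import math
import warnings

import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    'target': [1, 2, 3, 4, 5],
    'feat1': [5, 4, 3, 2, 1],
    'feat2': [2, 3, 4, 5, 6],
    'feat3': [5, 5, 5, 5, 5],
})

def pearson_r(x, y):
    """Pearson r via SciPy; a constant input yields NaN."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence the constant-input warning
        return float(pearsonr(x, y)[0])

# Coefficient of every non-target column with 'target'
corrs = pd.Series({col: pearson_r(data['target'], data[col])
                   for col in data.columns if col != 'target'})

# Drop undefined (NaN) correlations explicitly before picking the maximum
most_corr = corrs.dropna().idxmax()
print(most_corr)  # feat2
```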