Challenge - 5 Problems

SciPy-Pandas Mastery

❓ Predict Output

intermediate

Output of scipy.stats.ttest_ind with pandas DataFrames

Given two pandas DataFrames representing two groups of data, what is the output of the following code snippet?

import pandas as pd
from scipy import stats

# Create two groups of data
data1 = pd.DataFrame({'score': [88, 92, 85, 90, 87]})
data2 = pd.DataFrame({'score': [78, 81, 79, 83, 80]})

# Perform independent t-test
result = stats.ttest_ind(data1['score'], data2['score'])
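# float() keeps the printed tuple plain on NumPy 2.x, where NumPy scalars print as np.float64(...)
print((round(float(result.statistic), 2), round(float(result.pvalue), 3)))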

A. (-5.53, 0.001)

B. (5.53, 0.001)

C. (-1.23, 0.234)

D. (1.23, 0.234)
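
To check an answer by hand, the equal-variance (pooled) t statistic that ttest_ind uses by default can be recomputed from the group means and sample variances. The sketch below is illustrative only: the helper name pooled_t and the variables g1, g2, manual_t and scipy_t are assumptions, not part of the challenge.

```python
import math

import pandas as pd
from scipy import stats

# Same scores as in the challenge, held as plain Series
g1 = pd.Series([88, 92, 85, 90, 87])
g2 = pd.Series([78, 81, 79, 83, 80])


def pooled_t(a, b):
    """Student's t statistic with a pooled variance (hypothetical helper)."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance, weighting each group by its degrees of freedom
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return float((a.mean() - b.mean()) / math.sqrt(sp2 * (1 / n1 + 1 / n2)))


manual_t = pooled_t(g1, g2)
scipy_t = float(stats.ttest_ind(g1, g2).statistic)

print(round(manual_t, 2), round(scipy_t, 2))
```

The sign of the statistic follows the argument order: the first group has the larger mean here, so the statistic is positive.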

❓ Data Output

intermediate

Result of scipy.stats.linregress on pandas Series

What is the output of the linear regression performed on the following pandas Series using scipy.stats.linregress?

import pandas as pd
from scipy import stats

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

result = stats.linregress(x, y)
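# float() keeps the printed tuple plain on NumPy 2.x, where NumPy scalars print as np.float64(...)
print((round(float(result.slope), 2), round(float(result.intercept), 2)))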

A. (0.6, 2.2)

B. (0.7, 1.6)

C. (0.8, 1.4)

D. (0.5, 2.5)
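
One way to verify a choice is to recompute the ordinary least squares fit directly: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept follows from the means. This is a minimal sketch; the names dx, dy, manual_slope, manual_intercept and fit are introduced here for illustration.

```python
import pandas as pd
from scipy import stats

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Closed-form simple linear regression
manual_slope = float((dx * dy).sum() / (dx ** 2).sum())
manual_intercept = float(y.mean() - manual_slope * x.mean())

# Reference fit from SciPy for comparison
fit = stats.linregress(x, y)

print(round(manual_slope, 2), round(manual_intercept, 2))
```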

❓ Visualization

advanced

Visualizing correlation matrix with SciPy and Pandas

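You have a pandas DataFrame with multiple numeric columns and compute its Pearson correlation matrix with DataFrame.corr(), which uses the Pearson method by default (pearsonr is imported from SciPy in the setup but is not needed for this step). Which code snippet correctly produces a heatmap visualization of this correlation matrix using matplotlib?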

import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

# Sample data
np.random.seed(0)
data = pd.DataFrame(np.random.rand(5, 4), columns=list('ABCD'))

# Calculate correlation matrix
corr = data.corr()

# Visualization code here

A.
plt.bar(corr.columns, corr.iloc[0])
plt.show()

B.
plt.plot(corr)
plt.show()

C.
plt.imshow(corr, cmap='coolwarm', interpolation='none')
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns)
plt.show()

D.
plt.scatter(corr.columns, corr.columns)
plt.show()
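
The setup imports pearsonr but builds the matrix with data.corr(). As a rough sketch of how the same matrix could be assembled pair by pair with SciPy, the loop below calls pearsonr on every column pair and compares the result with pandas. The names cols, scipy_corr and matches_pandas are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Same sample data as in the challenge setup
np.random.seed(0)
data = pd.DataFrame(np.random.rand(5, 4), columns=list("ABCD"))

cols = data.columns

# pearsonr returns (statistic, pvalue); index 0 is the correlation coefficient
scipy_corr = pd.DataFrame(
    [[pearsonr(data[a], data[b])[0] for b in cols] for a in cols],
    index=cols,
    columns=cols,
)

# DataFrame.corr() defaults to Pearson, so the two matrices should agree
matches_pandas = bool(np.allclose(scipy_corr, data.corr()))
print(matches_pandas)
```

Either matrix can be passed to the plotting code, since both are square DataFrames indexed by column name.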

🔧 Debug

advanced

Identify the error in applying scipy.stats.kstest on pandas Series

What error, if any, will the following code raise when running the one-sample Kolmogorov-Smirnov test on a pandas Series?

import pandas as pd
from scipy import stats

sample = pd.Series([0.1, 0.4, 0.35, 0.8, 0.9])

result = stats.kstest(sample, 'uniform')
print(result)

A. TypeError: 'Series' object is not iterable

B. AttributeError: 'Series' object has no attribute 'shape'

C. ValueError: Data must be 1-dimensional array-like

D. No error, outputs KstestResult(statistic=..., pvalue=...)
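
A quick way to explore this question is to run the test on both the Series and its NumPy array, and to recompute the statistic by hand. The sketch assumes 'uniform' refers to SciPy's default uniform distribution on [0, 1], whose CDF is F(x) = x on that interval; the names res_series, res_array and manual_d are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

sample = pd.Series([0.1, 0.4, 0.35, 0.8, 0.9])

# Run the test on the Series and on the underlying NumPy array
res_series = stats.kstest(sample, "uniform")
res_array = stats.kstest(sample.to_numpy(), "uniform")

# Manual KS statistic against the U(0, 1) CDF, F(x) = x
x = np.sort(sample.to_numpy())
n = len(x)
d_plus = np.max(np.arange(1, n + 1) / n - x)   # ECDF above the CDF
d_minus = np.max(x - np.arange(0, n) / n)      # CDF above the ECDF
manual_d = float(max(d_plus, d_minus))

print(res_series.statistic, res_array.statistic, manual_d)
```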

🚀 Application

expert

Using SciPy and Pandas to find the most correlated feature

Given a pandas DataFrame with numeric columns, which code snippet correctly finds the column most positively correlated with column 'target' using SciPy's pearsonr function?

import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    'target': [1, 2, 3, 4, 5],
    'feat1': [5, 4, 3, 2, 1],
    'feat2': [2, 3, 4, 5, 6],
    'feat3': [5, 5, 5, 5, 5]
})

# Find most correlated feature with 'target'

A.
corrs = {col: pearsonr(data['target'], data[col])[0] for col in data.columns if col != 'target'}
most_corr = max(corrs, key=corrs.get)
print(most_corr)

B.
corrs = data.corr()['target'].drop('target')
most_corr = corrs.idxmax()
print(most_corr)

C.
most_corr = data.corrwith(data['target']).idxmax()
print(most_corr)

D.
most_corr = max(data.columns, key=lambda col: pearsonr(data['target'], data[col])[1])
print(most_corr)
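
One detail worth noticing in this data is that feat3 is constant, so its Pearson correlation with any column is undefined: pearsonr returns NaN for it and emits a warning. A possible defensive variant, sketched below, skips constant columns before comparing coefficients. The helper name most_positive_feature is hypothetical and not part of the challenge.

```python
import pandas as pd
from scipy.stats import pearsonr

data = pd.DataFrame({
    'target': [1, 2, 3, 4, 5],
    'feat1': [5, 4, 3, 2, 1],
    'feat2': [2, 3, 4, 5, 6],
    'feat3': [5, 5, 5, 5, 5],
})


def most_positive_feature(df, target='target'):
    """Return the feature with the largest Pearson r against `target` (hypothetical helper)."""
    scores = {}
    for col in df.columns.drop(target):
        if df[col].nunique() < 2:
            # Constant column: Pearson r is undefined, so leave it out
            continue
        # Index 0 of the pearsonr result is the coefficient; index 1 is the p-value
        scores[col] = float(pearsonr(df[target], df[col])[0])
    return max(scores, key=scores.get)


print(most_positive_feature(data))
```

Filtering first avoids relying on how max() happens to order NaN values, which depends on the order in which the columns are visited.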
