Challenge - 5 Problems
Exploratory Data Analysis Master
❓ Predict Output
intermediate
Output of basic data summary
What is the output of the following code snippet that summarizes a DataFrame?
Pandas
import pandas as pd

data = {'Age': [25, 30, 22, 40, 28],
        'Salary': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)
summary = df.describe()
print(summary)
💡 Hint
Look at the output of pandas DataFrame describe() method.
The describe() method returns a DataFrame with count, mean, std, min, quartiles, and max for each numeric column.
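As a rough sketch of what describe() returns for this DataFrame, the snippet below reads individual statistics out of the summary by row label. The variable names stat_names, mean_age, and max_salary are illustrative and not part of the challenge.

```python
import pandas as pd

data = {'Age': [25, 30, 22, 40, 28],
        'Salary': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)

summary = df.describe()

# The summary is itself a DataFrame: the statistics are the row labels,
# and the numeric columns of df are the column labels.
stat_names = list(summary.index)

# Individual statistics can be looked up with .loc[row, column].
mean_age = summary.loc['mean', 'Age']
max_salary = summary.loc['max', 'Salary']

print(stat_names)
print(mean_age, max_salary)
```

Because the result is an ordinary DataFrame, it can be sliced, transposed, or rounded like any other.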
❓ Data Output
intermediate
Count of missing values per column
Given the DataFrame below, what is the output of counting missing values per column?
Pandas
import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 22, 40],
        'Salary': [50000, 60000, 45000, None]}
df = pd.DataFrame(data)
missing_counts = df.isnull().sum()
print(missing_counts)
💡 Hint
Check which values are None or np.nan in each column.
The isnull() method marks None as missing. Each column has exactly one missing value.
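A minimal sketch of the same check, assuming one of the gaps is written as np.nan instead of None to show that isnull() treats both the same way. The names mask, total_missing, and rows_with_missing are illustrative additions.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, np.nan, 22, 40],       # np.nan used here instead of None
        'Salary': [50000, 60000, 45000, None]}
df = pd.DataFrame(data)

# Boolean DataFrame: True wherever a value is missing (None or NaN).
mask = df.isnull()

# Summing booleans column-wise counts the missing values per column.
missing_counts = mask.sum()

# Summing again gives the total across the whole DataFrame.
total_missing = int(mask.sum().sum())

# Rows that contain at least one missing value.
rows_with_missing = df[mask.any(axis=1)]

print(missing_counts)
print(total_missing)
```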
❓ Visualization
advanced
Correct histogram plot code
Which option produces a histogram plot of the 'Age' column with 5 bins using matplotlib?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

data = {'Age': [25, 30, 22, 40, 28, 35, 33, 31, 29, 27]}
df = pd.DataFrame(data)
# Choose the correct code to plot histogram
💡 Hint
matplotlib's hist function takes data and bins as arguments.
plt.hist() is the correct matplotlib function for plotting a histogram. The option that uses pandas plotting also draws a histogram but requires plt.show() to display it. The option that calls plt.plot is invalid because plt.plot does not accept a bins argument. The remaining df.plot.hist option is invalid because it passes an argument that df.plot.hist does not expect.
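One way the plt.hist() approach might look is sketched below. The Agg backend line is an assumption added so the snippet runs without a display, and the axis labels are illustrative; neither is part of the challenge.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {'Age': [25, 30, 22, 40, 28, 35, 33, 31, 29, 27]}
df = pd.DataFrame(data)

# plt.hist takes the data and the number of bins; it returns the bin counts,
# the bin edges, and the drawn patches.
counts, bin_edges, _ = plt.hist(df['Age'], bins=5)

plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
```

With 5 bins there are 6 bin edges, spanning the minimum to the maximum age.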
🔧 Debug
advanced
Identify error in groupby aggregation
What error does the following code raise?
Pandas
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)
result = df.groupby('Category').agg({'Value': 'sum', 'NonExistent': 'mean'})
print(result)
💡 Hint
Check if all keys in agg dictionary exist in DataFrame columns.
The aggregation dictionary references a column 'NonExistent' which is not in the DataFrame, causing a KeyError.
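The check suggested in the hint can be sketched as follows: compare the keys of the aggregation dictionary against the DataFrame's columns before calling agg(). The names agg_spec, missing, and valid_spec are hypothetical helpers for illustration, and dropping the unknown key is only one possible way to handle it.

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)

agg_spec = {'Value': 'sum', 'NonExistent': 'mean'}

# Keys of the spec that are not columns of df.
missing = [col for col in agg_spec if col not in df.columns]
print("Missing columns:", missing)

# Keep only the entries that refer to existing columns, then aggregate.
valid_spec = {col: func for col, func in agg_spec.items() if col in df.columns}
result = df.groupby('Category').agg(valid_spec)
print(result)
```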
🚀 Application
expert
Calculate correlation matrix and interpret output
Given the DataFrame below, what is the correlation coefficient between 'X' and 'Y'?
Pandas
import pandas as pd
import numpy as np

data = {'X': [1, 2, 3, 4, 5],
        'Y': [2, 4, 6, 8, 10],
        'Z': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
corr_matrix = df.corr()
print(round(corr_matrix.loc['X', 'Y'], 2))
💡 Hint
Check if Y is a perfect linear function of X.
Y is exactly 2 times X, so the correlation is perfectly positive and the code prints 1.0.
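To see why a perfect linear relationship gives this result, the sketch below computes the Pearson coefficient by hand and compares it with df.corr(). The pearson helper is a hypothetical function written for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd

data = {'X': [1, 2, 3, 4, 5],
        'Y': [2, 4, 6, 8, 10],
        'Z': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float(numerator / denominator)

r_xy = pearson(df['X'], df['Y'])   # Y rises in lockstep with X
r_xz = pearson(df['X'], df['Z'])   # Z falls in lockstep with X
r_xy_pandas = df.corr().loc['X', 'Y']

print(round(r_xy, 2), round(r_xz, 2))
```

Z decreases by exactly one for each unit increase in X, so its coefficient with X sits at the opposite extreme.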