Challenge - 5 Problems
Data Exploration Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate | 2:00 remaining
What is the output of this data summary code?
Given the DataFrame below, what will be the output of the df.describe() method?
Pandas
    import pandas as pd

    data = {'age': [25, 30, 22, 40, 28],
            'income': [50000, 60000, 45000, 80000, 52000]}
    df = pd.DataFrame(data)
    print(df.describe())
💡 Hint
Look carefully at the 25th percentile (the 25% row) for both columns.
Explanation
The describe() method calculates count, mean, std, min, 25%, 50%, 75%, and max for numeric columns. With five values per column, the 25th percentile falls exactly on the second-smallest value, so it is 25.0 for age and 50000.0 for income.
❓ data_output
intermediate | 2:00 remaining
How many missing values are in the DataFrame?
Given the DataFrame below, what is the output of df.isnull().sum()?
Pandas
    import pandas as pd
    import numpy as np

    data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
            'age': [25, np.nan, 30, 22],
            'income': [50000, 60000, np.nan, 45000]}
    df = pd.DataFrame(data)
    print(df.isnull().sum())
💡 Hint
Check which columns have missing values and count them.
Explanation
The 'age' column has one missing value (NaN), and 'income' also has one missing value. The 'name' column has no missing values, so the output lists 0 for name, 1 for age, and 1 for income.
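The counts can be checked with a short sketch that reuses the data from the question. The variable names missing_per_column and total_missing are illustrative additions, not part of the original challenge.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, np.nan, 30, 22],
    'income': [50000, 60000, np.nan, 45000],
}
df = pd.DataFrame(data)

# isnull() marks every cell as True/False; sum() counts the True values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Summing the per-column counts gives the total number of missing cells.
total_missing = int(missing_per_column.sum())
print(total_missing)
```

Note that df.isnull().sum() returns one count per column; a single overall total needs the second sum.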
❓ visualization
advanced | 2:00 remaining
Which plot best shows the distribution of a numeric column?
You want to understand how the values in the 'age' column are spread out. Which plot below is best for this?
Pandas
    import pandas as pd
    import matplotlib.pyplot as plt

    data = {'age': [22, 25, 25, 30, 35, 40, 40, 40, 45, 50]}
    df = pd.DataFrame(data)

    # Option A
    plt.hist(df['age'], bins=5)
    plt.title('Histogram of Age')
    plt.show()

    # Option B
    plt.scatter(range(len(df)), df['age'])
    plt.title('Scatter plot of Age')
    plt.show()

    # Option C
    plt.bar(df['age'].value_counts().index, df['age'].value_counts())
    plt.title('Bar chart of Age counts')
    plt.show()

    # Option D
    plt.boxplot(df['age'])
    plt.title('Boxplot of Age')
    plt.show()
💡 Hint
Think about which plot shows how often each age range occurs.
Explanation
A histogram groups data into bins and shows how many values fall into each range, which is ideal for understanding distribution.
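To make the binning concrete, the sketch below computes the bin edges and counts numerically with numpy.histogram instead of drawing them. It assumes the same 'age' data and bins=5 as Option A; by default plt.hist uses the same equal-width binning, so these counts should match the bar heights of that plot.

```python
import numpy as np
import pandas as pd

# Same 'age' data as in the question.
df = pd.DataFrame({'age': [22, 25, 25, 30, 35, 40, 40, 40, 45, 50]})

# Split the range [min, max] into 5 equal-width bins and count the values in each.
counts, edges = np.histogram(df['age'], bins=5)

# Print each bin as "low - high: count".
for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f"{low:.1f} - {high:.1f}: {count}")
```

The ages cluster in the lowest bin and around 40, which is the kind of pattern a histogram is meant to reveal.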
🧠 Conceptual
advanced | 1:30 remaining
Why is data exploration important before modeling?
Which reason below best explains why exploring data is a crucial step before building a predictive model?
💡 Hint
Think about what problems might happen if you don't understand your data first.
Explanation
Exploring data helps find mistakes, missing values, and patterns that affect model accuracy and reliability.
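As a rough illustration of how a few exploration calls surface such problems, the sketch below uses a small hypothetical DataFrame with two planted issues: a missing income and an impossible age. The data and the "age must not be negative" rule are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two planted problems: one NaN income and one negative age.
df = pd.DataFrame({
    'age': [25, 30, -5, 40, 28],
    'income': [50000, 60000, 45000, np.nan, 52000],
})

# Missing values per column.
missing = df.isnull().sum()
print(missing)

# Summary statistics; the 'min' row for age exposes the invalid entry.
summary = df.describe()
print(summary)

# Rows that break a simple sanity rule (assumed here: age must not be negative).
invalid_age = df[df['age'] < 0]
print(invalid_age)
```

Left unnoticed, either problem would feed into a model as if it were valid data.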
🔧 Debug
expert | 2:00 remaining
What error does this code raise during data exploration?
Consider this code snippet. What error will it raise when run?
Pandas
    import pandas as pd

    data = {'age': [25, 30, 22], 'income': [50000, 60000]}
    df = pd.DataFrame(data)
    print(df.describe())
💡 Hint
Check if all columns have the same number of values.
Explanation
The 'income' list has only 2 values while 'age' has 3, so pd.DataFrame(data) raises a ValueError before describe() is ever reached.
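The failure can be reproduced safely by catching the exception, as in the sketch below. The second half shows one possible workaround, not the only one: wrapping each list in a pd.Series lets pandas align on the index and pad the shorter column with NaN. The names error and df_fixed are illustrative.

```python
import pandas as pd

data = {'age': [25, 30, 22], 'income': [50000, 60000]}

# Building a DataFrame from lists of unequal length fails at construction time.
error = None
try:
    df = pd.DataFrame(data)
except ValueError as exc:
    error = exc
    print(f"ValueError: {exc}")

# One possible workaround: Series are aligned on their index,
# so the shorter column is padded with NaN instead of raising.
df_fixed = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})
print(df_fixed)
```

Whether padding with NaN is appropriate depends on the data; often the mismatch signals a collection error that should be fixed at the source.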