
Exploratory data analysis workflow in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of basic data summary
What is the output of the following code snippet that summarizes a DataFrame?
Pandas
import pandas as pd

data = {'Age': [25, 30, 22, 40, 28], 'Salary': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)
summary = df.describe()
print(summary)
A
{'Age': {'count': 5.0, 'mean': 29.0, 'std': 6.856, 'min': 22.0, '25%': 25.0, '50%': 28.0, '75%': 30.0, 'max': 40.0}, 'Salary': {'count': 5.0, 'mean': 57400.0, 'std': 13740.451, 'min': 45000.0, '25%': 50000.0, '50%': 52000.0, '75%': 60000.0, 'max': 80000.0}}
B
   Age  Salary
0   25   50000
1   30   60000
2   22   45000
3   40   80000
4   28   52000
C
             Age        Salary
count   5.000000      5.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
D
Age       29.0
Salary  57400.0
dtype: float64
💡 Hint
Look at the output of pandas DataFrame describe() method.
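To make the hint concrete, here is a minimal sketch that inspects what describe() returns for the same data, without printing the full table. The variable names (stat_names, age_std, matches_series_std) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 40, 28],
    'Salary': [50000, 60000, 45000, 80000, 52000],
})

summary = df.describe()

# describe() returns a DataFrame: the statistics form the row index,
# and the original numeric columns stay as columns.
stat_names = list(summary.index)

# The 'std' row is the sample standard deviation (ddof=1),
# which is also the default for Series.std().
age_std = summary.loc['std', 'Age']
matches_series_std = abs(age_std - df['Age'].std()) < 1e-9
```

Because the result is itself a DataFrame, individual statistics can be read with .loc, as done for age_std above.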
Data Output
intermediate
Count of missing values per column
Given the DataFrame below, what is the output of counting missing values per column?
Pandas
import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', None, 'David'], 'Age': [25, None, 22, 40], 'Salary': [50000, 60000, 45000, None]}
df = pd.DataFrame(data)
missing_counts = df.isnull().sum()
print(missing_counts)
A
Name      1
Age       1
Salary    1
dtype: int64
B
Name      1
Age       0
Salary    1
dtype: int64
C
Name      0
Age       1
Salary    1
dtype: int64
D
Name      1
Age       1
Salary    0
dtype: int64
💡 Hint
Check which values are None or np.nan in each column.
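The sketch below illustrates the hint on a small, hypothetical DataFrame (not the one in the question): pandas treats both None and np.nan as missing, and summing the boolean mask counts them per column. The column names and values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one None in a string column, one np.nan in a numeric column.
df = pd.DataFrame({
    'Name': ['Alice', None],
    'Age': [np.nan, 30],
    'City': ['Oslo', 'Lima'],
})

# isnull() (an alias of isna()) marks None and NaN as True.
mask = df.isnull()

# True counts as 1 when summed, so this gives missing values per column.
per_column = mask.sum()
total_missing = int(per_column.sum())

# The mean of the mask gives the share of missing values, here as a percentage.
pct_missing = df.isna().mean() * 100
```

Reporting the percentage alongside the raw count is often more useful on larger datasets, where absolute counts are hard to judge.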
Visualization
advanced
Correct histogram plot code
Which option plots a histogram of the 'Age' column with 5 bins by calling matplotlib's pyplot API directly?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

data = {'Age': [25, 30, 22, 40, 28, 35, 33, 31, 29, 27]}
df = pd.DataFrame(data)

# Choose the correct code to plot histogram
A
df.plot.hist('Age', bins=5)
plt.show()
B
df['Age'].plot.hist(bins=5)
plt.show()
C
plt.plot(df['Age'], bins=5)
plt.show()
D
plt.hist(df['Age'], bins=5)
plt.show()
💡 Hint
matplotlib's hist function takes data and bins as arguments.
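One way to check what a 5-bin histogram of this data should look like, without opening a plot window, is to compute the bins with numpy.histogram. The matplotlib documentation describes its hist function as binning with numpy.histogram, so the counts below should match the bar heights. This is a rough sketch, not a replacement for the plot.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22, 40, 28, 35, 33, 31, 29, 27]})

# bins=5 splits the range [min, max] into 5 equal-width intervals.
counts, edges = np.histogram(df['Age'], bins=5)

# Five bins need six edges; the width is (max - min) / 5.
bin_width = edges[1] - edges[0]
```

Here counts holds the number of values per bin and edges holds the bin boundaries, which is handy for asserting on a histogram in a test.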
🔧 Debug
advanced
Identify error in groupby aggregation
What error does the following code raise?
Pandas
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)
result = df.groupby('Category').agg({'Value': 'sum', 'NonExistent': 'mean'})
print(result)
A
KeyError: 'NonExistent'
B
TypeError: unsupported operand type(s) for +: 'int' and 'str'
C
ValueError: No numeric types to aggregate
D
No error, prints grouped sums and means
💡 Hint
Check if all keys in agg dictionary exist in DataFrame columns.
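The check suggested by the hint can be sketched as a small guard that runs before calling agg. The names requested, missing, and valid are hypothetical helpers introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 15, 25]})

# Aggregation spec as in the question, including a column that is not in df.
requested = {'Value': 'sum', 'NonExistent': 'mean'}

# Collect the keys that do not match any column.
missing = [col for col in requested if col not in df.columns]

# Keep only the entries that refer to real columns, then aggregate.
valid = {col: func for col, func in requested.items() if col in df.columns}
result = df.groupby('Category').agg(valid)
```

In real code it is usually better to raise or log when missing is non-empty than to drop the keys silently. The filtering here only shows that the remaining spec aggregates cleanly.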
🚀 Application
expert
Calculate correlation matrix and interpret output
Given the DataFrame below, what is the Pearson correlation coefficient between 'X' and 'Y', rounded to two decimal places?
Pandas
import pandas as pd
import numpy as np

data = {'X': [1, 2, 3, 4, 5], 'Y': [2, 4, 6, 8, 10], 'Z': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
corr_matrix = df.corr()
print(round(corr_matrix.loc['X', 'Y'], 2))
A
-1.00
B
1.00
C
0.00
D
0.50
💡 Hint
Check if Y is a perfect linear function of X.
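For reference, df.corr() uses the Pearson coefficient by default. The sketch below computes it by hand on a small, hypothetical pair of columns (a and b, not the question's data) and compares the result with pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so the relationship is positive but not perfectly linear.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 3, 2, 4]})

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from each column's mean.
dx = df['a'] - df['a'].mean()
dy = df['b'] - df['b'].mean()
r_manual = float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# The same value read from the correlation matrix.
r_pandas = float(df.corr().loc['a', 'b'])
```

A value of exactly 1 or -1 only occurs when one column is a perfect linear function of the other; here the two swapped values in b pull r below 1.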