Challenge - 5 Problems

Exploratory Data Analysis Master

❓ Predict Output

intermediate

Output of basic data summary

What is the output of the following code snippet that summarizes a DataFrame?

Pandas

import pandas as pd

data = {'Age': [25, 30, 22, 40, 28], 'Salary': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)
summary = df.describe()
print(summary)

A. {'Age': {'count': 5.0, 'mean': 29.0, 'std': 6.856, 'min': 22.0, '25%': 25.0, '50%': 28.0, '75%': 30.0, 'max': 40.0}, 'Salary': {'count': 5.0, 'mean': 57400.0, 'std': 13740.451, 'min': 45000.0, '25%': 50000.0, '50%': 52000.0, '75%': 60000.0, 'max': 80000.0}}

B.
   Age  Salary
0   25   50000
1   30   60000
2   22   45000
3   40   80000
4   28   52000

C.
             Age        Salary
count   5.000000      5.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000

D.
Age       29.0
Salary  57400.0
dtype: float64
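
A note on the `std` row: `describe()` reports the sample standard deviation (divisor n - 1), not the population value. The sketch below checks this by hand on a made-up column; the `scores` data is hypothetical and not part of the challenge.

```python
import math
import pandas as pd

# Hypothetical data, chosen only to illustrate the calculation.
scores = pd.Series([2, 4, 4, 6], name="scores")

summary = scores.describe()

# Sample standard deviation by hand: sum of squared deviations over n - 1.
mean = sum(scores) / len(scores)
sample_var = sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)
manual_std = math.sqrt(sample_var)

# The two values should agree up to floating-point error.
print(summary["std"], manual_std)
```

Passing `ddof=0` to `Series.std()` gives the population value instead, which is a common source of small mismatches when comparing against other tools.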

❓ Data Output

intermediate

Count of missing values per column

Given the DataFrame below, what is the output of counting missing values per column?

Pandas

import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', None, 'David'], 'Age': [25, None, 22, 40], 'Salary': [50000, 60000, 45000, None]}
df = pd.DataFrame(data)
missing_counts = df.isnull().sum()
print(missing_counts)

A.
Name      1
Age       1
Salary    1
dtype: int64

B.
Name      1
Age       0
Salary    1
dtype: int64

C.
Name      0
Age       1
Salary    1
dtype: int64

D.
Name      1
Age       1
Salary    0
dtype: int64
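
The `isnull().sum()` idiom works in two steps: `isnull()` builds a boolean frame, and `sum()` counts the `True` values in each column. A minimal sketch on a hypothetical frame (`df_demo` and its columns are made up for illustration):

```python
import pandas as pd

# Hypothetical frame; None marks a missing entry.
df_demo = pd.DataFrame({"city": ["Oslo", None, None], "temp": [3.5, 4.0, None]})

mask = df_demo.isnull()                 # boolean DataFrame, True where a value is missing
per_column = mask.sum()                 # column-wise count of True values
total_missing = int(per_column.sum())   # grand total across all columns

print(per_column)
print(total_missing)
```

Summing a second time, as in `total_missing`, collapses the per-column counts into a single number for the whole frame.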

❓ Visualization

advanced

Correct histogram plot code

Which option calls matplotlib's pyplot interface directly to plot a histogram of the 'Age' column with 5 bins?

Pandas

import pandas as pd
import matplotlib.pyplot as plt

data = {'Age': [25, 30, 22, 40, 28, 35, 33, 31, 29, 27]}
df = pd.DataFrame(data)

# Choose the correct code to plot histogram

A.
df.plot.hist('Age', bins=5)
plt.show()

B.
df['Age'].plot.hist(bins=5)
plt.show()

C.
plt.plot(df['Age'], bins=5)
plt.show()

D.
plt.hist(df['Age'], bins=5)
plt.show()
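
For intuition about what `bins=5` means, the sketch below uses `numpy.histogram`, which computes the same kind of equal-width binning without drawing anything. The `values` list is hypothetical and unrelated to the challenge data.

```python
import numpy as np

# Hypothetical values, chosen so the bin edges come out as round numbers.
values = [10, 12, 15, 18, 20, 20]

# bins=5 splits the range [min, max] into 5 equal-width intervals.
counts, edges = np.histogram(values, bins=5)

print(edges)   # 6 edges delimit 5 bins
print(counts)  # number of values falling into each bin
```

Every bin is half-open on the right except the last one, which also includes the maximum value.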

🔧 Debug

advanced

Identify error in groupby aggregation

What error does the following code raise?

Pandas

import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)
result = df.groupby('Category').agg({'Value': 'sum', 'NonExistent': 'mean'})
print(result)

A. KeyError: 'NonExistent'

B. TypeError: unsupported operand type(s) for +: 'int' and 'str'

C. ValueError: No numeric types to aggregate

D. No error, prints grouped sums and means
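
As background on the dict form of `agg`, each key names a column and each value names the aggregation applied to that column within every group. A minimal sketch on a hypothetical `sales` frame (all names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical data with two numeric columns and one grouping column.
sales = pd.DataFrame({
    "region": ["N", "S", "N", "S"],
    "units": [1, 2, 3, 4],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# One aggregation per column: total units and average price per region.
result = sales.groupby("region").agg({"units": "sum", "price": "mean"})

print(result)
```

The result is indexed by the grouping key, so individual cells can be read with `result.loc[group, column]`.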

🚀 Application

expert

Calculate correlation matrix and interpret output

Given the DataFrame below, what is the correlation coefficient between 'X' and 'Y'?

Pandas

import pandas as pd
import numpy as np

data = {'X': [1, 2, 3, 4, 5], 'Y': [2, 4, 6, 8, 10], 'Z': [5, 4, 3, 2, 1]}
df = pd.DataFrame(data)
corr_matrix = df.corr()
print(round(corr_matrix.loc['X', 'Y'], 2))

A. -1.00

B. 1.00

C. 0.00

D. 0.50
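
By default, `DataFrame.corr()` computes the Pearson coefficient: the sum of the products of deviations divided by the square root of the product of the summed squared deviations. The sketch below reproduces that by hand on hypothetical columns `a` and `b`, which are not the challenge data.

```python
import math
import pandas as pd

# Hypothetical data, chosen so the arithmetic is easy to follow.
demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 2, 4]})

# Deviations from each column's mean.
da = demo["a"] - demo["a"].mean()
db = demo["b"] - demo["b"].mean()

# Pearson r = sum(da * db) / sqrt(sum(da^2) * sum(db^2))
manual_r = (da * db).sum() / math.sqrt((da ** 2).sum() * (db ** 2).sum())

pandas_r = demo.corr().loc["a", "b"]

print(manual_r, pandas_r)
```

Here the cross-product sum is 4 and each squared-deviation sum is 5, so both routes should give a coefficient of 0.8.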
