Challenge - 5 Problems

Data Analysis Workflow Master

❓ Predict Output

intermediate

What is the output of this data cleaning step?

Given a DataFrame with missing values, what will be the result after running this code?

Data Analysis Python

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

cleaned = df.dropna()
print(cleaned)

A.
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0

B.
     A    B
1  2.0  2.0
3  4.0  4.0

C.
Empty DataFrame
Columns: [A, B]
Index: []

D.
     A    B
0  1.0  NaN
3  4.0  4.0
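As a side note on how `dropna` decides which rows to drop, here is a minimal sketch on a separate, hypothetical frame (the name `demo` and its values are illustrative and are not the data from the question). It compares the default `how='any'` with `how='all'` and with the `subset` parameter.

```python
import pandas as pd

# Hypothetical frame for illustration only (not the quiz data).
demo = pd.DataFrame({
    'X': [1.0, None, None, 4.0],
    'Y': [5.0, 6.0, None, None],
})

# Default how='any': a row is dropped if at least one value is missing.
any_missing = demo.dropna()

# how='all': a row is dropped only if every value in it is missing.
all_missing = demo.dropna(how='all')

# subset=['Y']: only column 'Y' is checked for missing values.
subset_y = demo.dropna(subset=['Y'])

print(len(any_missing), len(all_missing), len(subset_y))  # prints: 1 3 2
```

The original row labels are preserved in each result, so the index of a cleaned frame may have gaps unless `reset_index(drop=True)` is called afterwards.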

❓ Data Output

intermediate

How many unique categories are in this dataset after cleaning?

After removing duplicates, how many unique 'Category' values remain?

Data Analysis Python

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B', 'D', 'D', 'E']
})

cleaned = df.drop_duplicates()
unique_count = cleaned['Category'].nunique()
print(unique_count)
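For readers unsure how `drop_duplicates` and `nunique` relate, the following sketch uses hypothetical data (the names `colors` and `items` are made up for illustration). It suggests that for a single column the number of rows left after `drop_duplicates()` matches `nunique()`, while with several columns `drop_duplicates()` compares whole rows unless `subset` is given.

```python
import pandas as pd

# Hypothetical single-column frame (not the quiz data).
colors = pd.DataFrame({'Color': ['red', 'blue', 'red', 'green', 'blue']})

deduped = colors.drop_duplicates()          # keeps the first occurrence of each row
rows_left = len(deduped)                    # 3 rows remain
distinct = colors['Color'].nunique()        # 3 distinct non-missing values

# Hypothetical two-column frame: rows are compared across all columns by default.
items = pd.DataFrame({
    'Color': ['red', 'red', 'blue'],
    'Size': ['S', 'M', 'S'],
})
full_rows = len(items.drop_duplicates())                  # 3: no fully identical rows
by_color = len(items.drop_duplicates(subset=['Color']))   # 2: one row per colour

print(rows_left, distinct, full_rows, by_color)  # prints: 3 3 3 2
```

Note that `nunique()` ignores missing values by default, whereas `drop_duplicates()` treats a row containing NaN as a row like any other.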

❓ Visualization

advanced

Which plot correctly shows the distribution of 'Age' after cleaning?

Given this cleaned DataFrame, which plot code will produce a histogram of 'Age'?

Data Analysis Python

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Age': [22, 25, 29, 35, 40, 40, 22, 30]
})

A.
plt.hist(df['Age'])
plt.show()

B.
plt.plot(df['Age'])
plt.show()

C.
plt.scatter(df.index, df['Age'])
plt.show()

D.
plt.boxplot(df['Age'])
plt.show()
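A histogram groups values into bins and counts how many fall into each one. As a rough sketch of that binning step without opening a plot window, `numpy.histogram` can be called directly; the values and the choice of three bins below are assumptions for illustration, not part of the question.

```python
import numpy as np

# Hypothetical values (not the quiz data).
values = np.array([1, 2, 2, 3, 3, 3])

# Split the range [1, 3] into 3 equal-width bins and count values per bin.
counts, edges = np.histogram(values, bins=3)

print(counts.tolist())   # prints: [1, 2, 3]
print(len(edges))        # prints: 4 (n bins have n + 1 edges)
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.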

🧠 Conceptual

advanced

What is the correct order of steps in a data analysis workflow?

Arrange these steps in the correct order for a typical data analysis workflow.

A. 4, 2, 1, 3, 5

B. 1, 2, 4, 3, 5

C. 2, 4, 1, 3, 5

D. 2, 1, 4, 3, 5

🚀 Application

expert

What is the mean value of 'Score' after cleaning and filtering?

Given this DataFrame, after removing rows with a missing 'Score' and keeping only rows where 'Score' is at least 70, what is the mean of the remaining scores?

Data Analysis Python

import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cara', 'Dan', 'Eva'],
    'Score': [85, None, 70, 65, 90]
})

cleaned = df.dropna(subset=['Score'])
filtered = cleaned[cleaned['Score'] >= 70]
mean_score = filtered['Score'].mean()
print(round(mean_score, 2))

A. 75.00

B. 80.00

C. 81.67

D. 70.00
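The clean, filter, aggregate pattern used in this question can be sketched on a separate, hypothetical frame (the `temps` data and the threshold of 20 are made up for illustration). The sketch also checks an intermediate result, which is often a useful habit before trusting the final number.

```python
import pandas as pd

# Hypothetical data (not the quiz data).
temps = pd.DataFrame({
    'City': ['Oslo', 'Rome', 'Bern', 'Lima'],
    'Temp': [30.0, None, 18.0, 25.0],
})

# Step 1: drop rows where 'Temp' is missing.
cleaned = temps.dropna(subset=['Temp'])

# Step 2: keep rows that meet the threshold using a boolean mask.
filtered = cleaned[cleaned['Temp'] >= 20]

# Step 3: aggregate what is left.
mean_temp = filtered['Temp'].mean()

print(list(filtered['City']))   # prints: ['Oslo', 'Lima']
print(round(mean_temp, 2))      # prints: 27.5
```

A comparison such as `NaN >= 20` evaluates to False, so the boolean mask alone would also exclude missing values here; the explicit `dropna` mainly makes the intent of the cleaning step visible.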
