Challenge - 5 Problems
Data Analysis Workflow Master
❓ Predict Output
Difficulty: intermediate
What is the output of this data cleaning step?
Given a DataFrame with missing values, what will be the result after running this code?
```python
import pandas as pd

# DataFrame with one missing value in each column
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})
cleaned = df.dropna()
print(cleaned)
```
💡 Hint
dropna() removes rows with any missing values.
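Explanation: By default, dropna() drops every row in which at least one column is NaN (pandas stores None in a numeric column as NaN). Row 0 is missing B and row 2 is missing A, so only the rows with index labels 1 and 3 remain. Because both columns contained missing values, pandas stores them as float64, so the surviving values print as 2.0 and 4.0.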
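You can confirm this by inspecting the surviving index labels. The sketch below reuses the question's DataFrame. The `how='all'` call is added only for contrast. It is a standard `dropna()` parameter that drops a row only when every value in it is missing.

```python
import pandas as pd

# Same data as the question; None becomes NaN, so both columns are float64.
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
})

# Default how='any': drop a row if at least one of its values is NaN.
cleaned = df.dropna()
print(cleaned.index.tolist())   # [1, 3]

# how='all': drop a row only if all of its values are NaN (no such row here).
only_all_missing = df.dropna(how='all')
print(len(only_all_missing))    # 4
```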
❓ Data Output
Difficulty: intermediate
How many unique categories are in this dataset after cleaning?
After removing duplicates, how many unique 'Category' values remain?
```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B', 'D', 'D', 'E']
})
# Remove repeated rows, then count the distinct categories
cleaned = df.drop_duplicates()
unique_count = cleaned['Category'].nunique()
print(unique_count)
```
💡 Hint
drop_duplicates() removes repeated rows; then count the unique categories.
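Explanation: drop_duplicates() keeps the first occurrence of each category, leaving A, B, C, D and E, so the code prints 5. Note that nunique() counts distinct values on its own, so it would also return 5 without the drop_duplicates() step.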
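The sketch below reuses the question's data. It shows which rows drop_duplicates() keeps and that nunique() reaches the same count without it. The value_counts() call is an extra, added only to show how often each category appeared.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B', 'D', 'D', 'E']
})

# drop_duplicates() keeps the first occurrence of each repeated row.
cleaned = df.drop_duplicates()
print(cleaned.index.tolist())          # [0, 1, 3, 5, 7]

# nunique() counts distinct values, so it gives the same answer
# whether or not duplicates were dropped first.
print(df['Category'].nunique())        # 5
print(cleaned['Category'].nunique())   # 5

# value_counts() shows how often each category appeared before cleaning.
print(df['Category'].value_counts().to_dict())
```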
❓ Visualization
Difficulty: advanced
Which plot correctly shows the distribution of 'Age' after cleaning?
Given this cleaned DataFrame, which plot code will produce a histogram of 'Age'?
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Age': [22, 25, 29, 35, 40, 40, 22, 30]
})
```
💡 Hint
A histogram shows the frequency distribution of a single variable.
Explanation: plt.hist(df['Age']) draws a histogram. It groups the ages into bins and plots how many values fall into each bin.
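The quiz's answer options are not reproduced here, but a minimal histogram sketch for the same data might look like this. The bin count (`bins=5`), the axis labels, and the Agg backend (used so the sketch runs without a display) are illustrative choices, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Age': [22, 25, 29, 35, 40, 40, 22, 30]
})

# plt.hist returns the per-bin counts, the bin edges, and the drawn bar patches.
# bins=5 is an illustrative choice (matplotlib's default is 10 bins).
counts, edges, patches = plt.hist(df['Age'], bins=5, edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Count')
plt.title('Distribution of Age')

print(counts)  # [3. 1. 1. 1. 2.]
```

In an interactive session, drop the Agg line and call plt.show() to display the figure. pandas also offers df['Age'].plot.hist(bins=5), which draws through matplotlib as well.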
🧠 Conceptual
Difficulty: advanced
What is the correct order of steps in a data analysis workflow?
Arrange these steps in the correct order for a typical data analysis workflow.
💡 Hint
Think about what you do first: get data, then clean it, then explore.
Explanation: First collect the data, then clean it, explore it for patterns, visualize the findings, and finally draw conclusions.
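The ordering can be sketched as a small pipeline in which each step feeds the next. The function names below are hypothetical, and the data is borrowed from the Score problem in this challenge. The text bar chart stands in for a real plot so the sketch runs without a plotting library.

```python
import pandas as pd


# Hypothetical helpers, one per workflow step; the names are illustrative only.
def collect_data() -> pd.DataFrame:
    # Stand-in for loading a CSV file or querying a database.
    return pd.DataFrame({
        'Name': ['Anna', 'Ben', 'Cara', 'Dan', 'Eva'],
        'Score': [85, None, 70, 65, 90],
    })


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows whose Score is missing.
    return df.dropna(subset=['Score'])


def explore_data(df: pd.DataFrame) -> pd.Series:
    # Summary statistics: count, mean, std, min, quartiles, max.
    return df['Score'].describe()


def visualize_data(df: pd.DataFrame) -> list:
    # Text bar chart (one '#' per 10 points) as a stand-in for a real plot.
    return [f"{name:<5}{'#' * int(score // 10)}"
            for name, score in zip(df['Name'], df['Score'])]


def conclude(summary: pd.Series) -> str:
    return f"Average score across {int(summary['count'])} students: {summary['mean']:.2f}"


raw = collect_data()            # 1. collect
cleaned = clean_data(raw)       # 2. clean
summary = explore_data(cleaned) # 3. explore
for bar in visualize_data(cleaned):  # 4. visualize
    print(bar)
print(conclude(summary))        # 5. conclude
```

Keeping each step in its own function makes the order explicit. It also lets you rerun a later step, such as visualization, without repeating collection.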
🚀 Application
Difficulty: expert
What is the mean value of 'Score' after cleaning and filtering?
Given this DataFrame, after removing rows with a missing 'Score' and keeping only scores >= 70, what is the mean?
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cara', 'Dan', 'Eva'],
    'Score': [85, None, 70, 65, 90]
})
# Remove rows with a missing Score, then keep scores of 70 or more
cleaned = df.dropna(subset=['Score'])
filtered = cleaned[cleaned['Score'] >= 70]
mean_score = filtered['Score'].mean()
print(round(mean_score, 2))
```
💡 Hint
Calculate the mean of the scores that are 70 or above.
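Explanation: dropna(subset=['Score']) removes Ben, whose score is missing, and the >= 70 filter removes Dan (65). The remaining scores are 85, 70 and 90, so the mean is (85 + 70 + 90) / 3 = 245 / 3 ≈ 81.667, which round(..., 2) prints as 81.67.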
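The same calculation can be traced step by step in code. The second half is an added observation, not part of the original question. Because a comparison with NaN evaluates to False, filtering the raw DataFrame gives the same rows, although the explicit dropna() step makes the intent clearer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cara', 'Dan', 'Eva'],
    'Score': [85, None, 70, 65, 90],
})

# Step by step, as in the question.
cleaned = df.dropna(subset=['Score'])          # drops Ben (missing Score)
filtered = cleaned[cleaned['Score'] >= 70]     # drops Dan (65 < 70)
print(filtered['Name'].tolist())               # ['Anna', 'Cara', 'Eva']
print(round(filtered['Score'].mean(), 2))      # 81.67

# NaN >= 70 evaluates to False, so filtering the raw frame
# selects the same rows even without the explicit dropna() step.
direct = df.loc[df['Score'] >= 70, 'Score']
print(round(direct.mean(), 2))                 # 81.67
```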