
Data analysis workflow (collect, clean, explore, visualize, conclude) in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this data cleaning step?
Given a DataFrame with missing values, what will be the result after running this code?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

cleaned = df.dropna()
print(cleaned)
A
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0
B
     A    B
1  2.0  2.0
3  4.0  4.0
C
Empty DataFrame
Columns: [A, B]
Index: []
D
     A    B
0  1.0  NaN
3  4.0  4.0
💡 Hint
dropna() removes rows with any missing values.
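To make the hint concrete, here is a minimal sketch of how `dropna()` decides which rows to drop. The `readings` DataFrame and its column names are hypothetical and are not the data from the question; `how` and `subset` are standard `dropna()` parameters.

```python
import pandas as pd

# Hypothetical data, deliberately different from the question's DataFrame.
readings = pd.DataFrame({
    "temp": [20.5, None, 22.1, None],
    "humidity": [0.40, 0.45, None, None],
})

# Default (how="any"): a row is dropped if any column is missing.
any_missing = readings.dropna()

# how="all": a row is dropped only if every column is missing.
all_missing = readings.dropna(how="all")

# subset: only the listed columns are checked for missing values.
temp_present = readings.dropna(subset=["temp"])

print(len(any_missing), len(all_missing), len(temp_present))  # prints: 1 3 2
```

Note that the surviving rows keep their original index labels; `dropna()` does not renumber them.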
Data Output
intermediate
How many unique categories are in this dataset after cleaning?
After removing duplicates, how many unique 'Category' values remain?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B', 'D', 'D', 'E']
})

cleaned = df.drop_duplicates()
unique_count = cleaned['Category'].nunique()
print(unique_count)
A
5
B
8
C
4
D
6
💡 Hint
drop_duplicates() removes repeated rows; nunique() then counts the distinct categories that remain.
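The difference between removing duplicate rows and counting distinct values is easier to see with more than one column. The sketch below uses a hypothetical `orders` DataFrame (not the data from the question) to show that the number of rows left after `drop_duplicates()` need not equal the number of distinct values in any single column.

```python
import pandas as pd

# Hypothetical data, deliberately different from the question's DataFrame.
orders = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Lima"],
    "item": ["pen", "pen", "ink", "pad", "pen"],
})

# drop_duplicates() compares whole rows, so only the second
# ("Oslo", "pen") row is removed.
deduped = orders.drop_duplicates()
rows_left = len(deduped)

# nunique() counts distinct values in one column; removing
# duplicate rows does not change that count.
distinct_cities = deduped["city"].nunique()

print(rows_left, distinct_cities)  # prints: 4 3
```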
Visualization
advanced
Which plot correctly shows the distribution of 'Age' after cleaning?
Given this cleaned DataFrame, which plot code will produce a histogram of 'Age'?
Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Age': [22, 25, 29, 35, 40, 40, 22, 30]
})
A
plt.hist(df['Age'])
plt.show()
B
plt.plot(df['Age'])
plt.show()
C
plt.scatter(df.index, df['Age'])
plt.show()
D
plt.boxplot(df['Age'])
plt.show()
💡 Hint
A histogram shows the frequency distribution of a single variable.
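To illustrate what "frequency distribution" means here, the sketch below computes the bin counts that a histogram would draw as bars. It uses `numpy.histogram`, which matplotlib's `hist` relies on for binning, so no plot window is needed. The `heights_cm` values and the choice of three bins are assumptions made for this example.

```python
import numpy as np

# Hypothetical values, deliberately different from the question's 'Age' column.
heights_cm = [150, 152, 158, 161, 161, 170, 171, 179]

# Split the value range into 3 equal-width bins and count the values in each.
counts, edges = np.histogram(heights_cm, bins=3)

# Each bar of the histogram would have one of these heights.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

The counts come out as 3, 2 and 3: the bars summarize how often values fall into each range, independent of row order.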
🧠 Conceptual
advanced
What is the correct order of steps in a data analysis workflow?
Arrange these steps in the correct order for a typical data analysis workflow.
A
4, 2, 1, 3, 5
B
1, 2, 4, 3, 5
C
2, 4, 1, 3, 5
D
2, 1, 4, 3, 5
💡 Hint
Think about what you do first: get data, then clean it, then explore.
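As a rough illustration of the stages named in the page title (collect, clean, explore, visualize, conclude), here is a minimal end-to-end sketch. The `raw` DataFrame, its columns, and the text-based bar chart are all assumptions made for the example; a real analysis would typically load data from a file or API and plot with a charting library.

```python
import pandas as pd

# Collect: hypothetical in-memory data stands in for a file or API call.
raw = pd.DataFrame({
    "store": ["north", "south", "south", "east", None],
    "sales": [120.0, 95.0, 95.0, None, 80.0],
})

# Clean: drop rows with missing values, then drop repeated rows.
clean = raw.dropna().drop_duplicates()

# Explore: summary statistics for the numeric column.
summary = clean["sales"].describe()
print(summary["mean"])  # prints: 107.5

# Visualize: a text bar chart keeps this sketch free of plotting dependencies.
for store, sales in zip(clean["store"], clean["sales"]):
    print(f"{store:<6} {'#' * int(sales // 10)}")

# Conclude: state the finding that the earlier steps support.
best = clean.loc[clean["sales"].idxmax(), "store"]
print(f"Highest sales: {best}")  # prints: Highest sales: north
```

Cleaning comes before exploring because summary statistics computed on duplicated or incomplete rows can mislead the later steps.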
🚀 Application
expert
What is the mean value of 'Score' after cleaning and filtering?
Given this DataFrame, after removing rows with a missing 'Score' and keeping only scores >= 70, what is the mean?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cara', 'Dan', 'Eva'],
    'Score': [85, None, 70, 65, 90]
})

cleaned = df.dropna(subset=['Score'])
filtered = cleaned[cleaned['Score'] >= 70]
mean_score = filtered['Score'].mean()
print(round(mean_score, 2))
A
75.00
B
80.00
C
81.67
D
70.00
💡 Hint
Calculate the mean only after keeping the scores of 70 and above.
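The same clean, filter, aggregate pattern can be sketched on different data. The `runs` DataFrame and the 36-minute threshold below are hypothetical and chosen so the example does not reuse the question's numbers.

```python
import pandas as pd

# Hypothetical data, deliberately different from the question's DataFrame.
runs = pd.DataFrame({
    "runner": ["Ali", "Bea", "Cy", "Di"],
    "minutes": [30.0, None, 42.0, 36.0],
})

# Remove rows where the measured column is missing.
timed = runs.dropna(subset=["minutes"])

# Boolean mask: keep only rows that satisfy the condition.
fast = timed[timed["minutes"] <= 36]

# Aggregate the rows that survived both steps.
mean_fast = fast["minutes"].mean()
print(round(mean_fast, 2))  # prints: 33.0
```

Comparisons against a missing value evaluate to False, so the filter alone would also exclude the missing row; calling `dropna()` first simply makes the cleaning step explicit.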