Challenge - 5 Problems

Systematic Cleaning Master

❓ data_output

intermediate

Effect of Missing Data on Mean Calculation

Consider a dataset with missing values. What is the output of the mean calculation without cleaning?

Pandas

import pandas as pd

data = {'score': [10, 20, None, 40, 50]}
df = pd.DataFrame(data)
mean_score = df['score'].mean()
print(mean_score)

A. TypeError

B. None

C. SyntaxError

D. 30.0

❓ data_output

intermediate

Impact of Duplicate Rows on Data Analysis

Given a DataFrame with duplicate rows, how many rows remain after the duplicates are removed?

Pandas

import pandas as pd

data = {'id': [1, 2, 2, 3, 4, 4, 4], 'value': [10, 20, 20, 30, 40, 40, 40]}
df = pd.DataFrame(data)
unique_rows = df.drop_duplicates().shape[0]
print(unique_rows)

🧠 Conceptual

advanced

Why Systematic Cleaning Prevents Analysis Errors

Which statement best explains why systematic data cleaning is important before analysis?

A. It helps identify and fix inconsistencies that could lead to wrong conclusions.

B. It ensures all data is removed to avoid errors.

C. It speeds up the analysis by deleting half the data randomly.

D. It replaces all numbers with zeros to simplify calculations.
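
To make "inconsistencies" concrete, here is a minimal sketch of one common case: the same category spelled several ways. The data, the column names (`city`, `sales`), and the choice of `str.strip().str.title()` as the normalization are all assumptions made for illustration; they are not part of the challenge.

```python
import pandas as pd

# Hypothetical data: the same city appears with three different spellings.
raw = pd.DataFrame({
    "city": ["Paris", "paris ", "PARIS", "Rome"],
    "sales": [100, 200, 300, 400],
})

# Without cleaning, every distinct spelling forms its own group.
totals_raw = raw.groupby("city")["sales"].sum()

# One possible normalization: trim whitespace and unify the letter case.
clean = raw.assign(city=raw["city"].str.strip().str.title())
totals_clean = clean.groupby("city")["sales"].sum()

print(len(totals_raw))   # number of groups before cleaning
print(totals_clean)      # totals per city after cleaning
```

Before cleaning, the Paris sales are split across three groups, so any per-city total or ranking built on the raw labels would likely be misleading.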

❓ Predict Output

advanced

Result of Cleaning and Filtering Data

What DataFrame is printed after dropping rows with missing values and keeping only scores above 25?

Pandas

import pandas as pd

data = {'name': ['Anna', 'Bob', 'Cara', 'Dan'], 'score': [20, None, 30, 40]}
df = pd.DataFrame(data)
df_clean = df.dropna()
df_filtered = df_clean[df_clean['score'] > 25]
print(df_filtered)

A.
   name  score
0  Anna   20.0
2  Cara   30.0
3   Dan   40.0

B.
   name  score
2  Cara   30.0
3   Dan   40.0

C.
   name  score
1   Bob    NaN
2  Cara   30.0
3   Dan   40.0

D.
Empty DataFrame
Columns: [name, score]
Index: []

🚀 Application

expert

Identify the Impact of Unclean Data on Grouped Aggregation

Given this dataset, what is the average score per group if missing values are NOT cleaned?

Pandas

import pandas as pd
import numpy as np

data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'], 'score': [10, np.nan, 20, 30, np.nan, 50]}
df = pd.DataFrame(data)
grouped_mean = df.groupby('group')['score'].mean()
print(grouped_mean)

A.
group
A    NaN
B    25.0
C    NaN
Name: score, dtype: float64

B.
group
A    10.0
B    25.0
C    0.0
Name: score, dtype: float64

C.
group
A    10.0
B    25.0
C    50.0
Name: score, dtype: float64

D.
group
A    20.0
B    25.0
C    50.0
Name: score, dtype: float64
