Challenge - 5 Problems

Master of Combining Multiple Cleaning Steps

❓ Predict Output

intermediate

Output of combined cleaning steps on a DataFrame

What does the following code print after the cleaning steps run?

Pandas

import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', 'New York', None]}
df = pd.DataFrame(data)

# Cleaning steps
result = (df.dropna(subset=['Name'])
            .fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
            .reset_index(drop=True))
print(result)

A.
    Name   Age         City
0  Alice  25.0     New York
1    Bob  27.0  Los Angeles
2  David  22.0      Unknown

B.
    Name        Age         City
0  Alice  25.000000     New York
1    Bob  25.666667  Los Angeles
2  David  22.000000     New York

C.
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2  David  22.0      Unknown

D.
    Name        Age         City
0  Alice  25.000000     New York
1    Bob  25.666667  Los Angeles
2  David  22.000000      Unknown

❓ Data Output

intermediate

Number of rows after multiple cleaning steps

After applying the cleaning steps below, how many rows remain in the DataFrame?

Pandas

import pandas as pd

data = {'Product': ['A', 'B', 'C', 'D', None],
        'Price': [10, None, 15, 20, 25],
        'Stock': [100, 200, None, 150, 300]}
df = pd.DataFrame(data)

cleaned = (df.dropna(subset=['Product'])
             .fillna({'Price': 0, 'Stock': 0})
             .query('Price > 0 and Stock > 0'))
print(len(cleaned))

🔧 Debug

advanced

Identify the error in combined cleaning steps

What error does the following code raise?

Pandas

import pandas as pd

data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)

result = df.bfill().dropna(subset=['C'])

A. KeyError: 'C'

B. TypeError: fillna() got an unexpected keyword argument 'method'

C. AttributeError: 'DataFrame' object has no attribute 'backfill'

D. No error, returns DataFrame

❓ Visualization

advanced

Resulting plot after cleaning and grouping

After cleaning and grouping the data, which per-category sums does the bar plot show? The options give the contents of grouped as a dictionary.

Pandas

import pandas as pd
import matplotlib.pyplot as plt

data = {'Category': ['X', 'Y', 'X', 'Z', None],
        'Value': [10, 20, None, 15, 5]}
df = pd.DataFrame(data)

cleaned = df.dropna(subset=['Category']).fillna({'Value': 0})
grouped = cleaned.groupby('Category').sum()

ax = grouped.plot(kind='bar', legend=False)
plt.ylabel('Sum of Values')
plt.title('Sum of Values by Category')
plt.close()  # Prevent actual plot display in test

print(grouped)

A. {'Value': {'X': 10.0, 'Y': 20.0, 'Z': 15.0}}

B. {'Value': {'X': 10.0, 'Y': 20.0, 'Z': 0.0}}

C. {'Value': {'X': 0.0, 'Y': 20.0, 'Z': 15.0}}

D. {'Value': {'X': 10.0, 'Y': 0.0, 'Z': 15.0}}

🚀 Application

expert

Combining multiple cleaning steps for a real dataset

You have a DataFrame with columns 'ID', 'Score', and 'Grade'. You want to:

Remove rows where 'ID' is missing.
Fill missing 'Score' with the median score.
Replace missing 'Grade' with 'Incomplete'.
Drop rows where 'Score' is less than 50 after filling.

Which code snippet performs all these steps in the order listed?

A. df.fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'}).dropna(subset=['ID']).query('Score >= 50')

B. df.query('Score >= 50').dropna(subset=['ID']).fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'})

C. df.dropna(subset=['ID']).fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'}).query('Score >= 50')

D. df.dropna(subset=['ID']).query('Score >= 50').fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'})
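When comparing snippets like these, it can help to see how the position of the filter relative to the fill changes the result. The sketch below uses a small made-up DataFrame (the values and the names median_score, filtered_first and filled_first are assumptions for illustration, not part of the challenge) to show that a comparison against a missing value evaluates to False, so filtering before filling removes rows whose 'Score' is missing.

```python
import pandas as pd

# Hypothetical sample data, not taken from the challenge.
df = pd.DataFrame({
    "ID": [1, 2, None, 4, 5],
    "Score": [80.0, None, 90.0, 40.0, 60.0],
    "Grade": ["A", None, "A", "F", None],
})

# median() skips missing values: the median of 40, 60, 80, 90 is 70.
median_score = df["Score"].median()

# Filtering first: NaN >= 50 is False, so the row with a missing Score is dropped
# before it could ever be filled.
filtered_first = df.query("Score >= 50")

# Filling first: the missing Score becomes 70 and survives the filter.
filled_first = df.fillna({"Score": median_score}).query("Score >= 50")

print(len(filtered_first))  # 3 rows: scores 80, 90, 60
print(len(filled_first))    # 4 rows: scores 80, 70, 90, 60
```

Running each candidate snippet on a small frame like this one and comparing the results is a quick way to check which orderings keep the filled rows.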
