Challenge - 5 Problems

Master of Handling Inconsistent Values

❓ Predict Output

intermediate

Output of filling missing values with mode

What is the output DataFrame after filling missing values in the 'color' column with the mode of that column?

Pandas

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', None, 'blue', None, 'red']})
mode_color = df['color'].mode()[0]
df['color'] = df['color'].fillna(mode_color)
print(df)

A.
   color
0    red
1   blue
2   blue
3   blue
4   blue
5    red

B.
   color
0    red
1   blue
2    red
3   blue
4    red
5    red

C.
   color
0    red
1   blue
2  None
3   blue
4  None
5    red

D.
   color
0    red
1   blue
2   blue
3   blue
4  None
5    red
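
As a side note on how `mode()` behaves when values are tied, here is a minimal sketch on made-up data (the `sizes` series and its values are hypothetical, not part of the challenge). It assumes the default `dropna=True`, under which missing values are ignored and all tied modes are returned in sorted order.

```python
import pandas as pd

# Hypothetical data: 'M' and 'S' are tied as the most frequent size.
sizes = pd.Series(['M', 'S', None, 'S', 'M', 'L'], name='size')

# mode() skips missing values and returns every tied value, sorted.
modes = sizes.mode()
print(modes.tolist())

# Taking the first element picks the first tied value in sorted order,
# which is not necessarily the value that appears first in the data.
filled = sizes.fillna(modes.iloc[0])
print(filled.tolist())
```

When ties matter for the analysis, it may be safer to inspect the full result of `mode()` before choosing a fill value.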

❓ Data Output

intermediate

Count unique values after standardizing text

After converting all entries in the 'fruit' column to lowercase, how many unique values remain?

Pandas

import pandas as pd

df = pd.DataFrame({'fruit': ['Apple', 'apple', 'Banana', 'BANANA', 'banana', 'Cherry']})
df['fruit'] = df['fruit'].str.lower()
unique_count = df['fruit'].nunique()
print(unique_count)
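
Case is often not the only inconsistency in text columns. The sketch below uses a hypothetical `cities` series to show how stray whitespace can keep values distinct even after lowercasing, and how chaining `str.strip()` with `str.lower()` addresses both.

```python
import pandas as pd

# Hypothetical data with mixed case and stray leading/trailing spaces.
cities = pd.Series([' Paris', 'paris ', 'PARIS', 'Rome', 'rome'], name='city')

# Every raw spelling counts as its own value.
raw_unique = cities.nunique()

# Lowercasing alone leaves the whitespace variants distinct.
lower_only_unique = cities.str.lower().nunique()

# Stripping whitespace and lowercasing collapses the variants.
cleaned = cities.str.strip().str.lower()
clean_unique = cleaned.nunique()

print(raw_unique, lower_only_unique, clean_unique)
```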

🔧 Debug

advanced

Identify error in replacing inconsistent categorical values

What error, if any, does this code raise when trying to replace inconsistent values in the 'status' column?

Pandas

import pandas as pd

df = pd.DataFrame({'status': ['Active', 'active', 'Inactive', 'inactive', 'Pending']})
df['status'] = df['status'].replace({'active': 'Active', 'inactive': 'Inactive'})
print(df)

A. ValueError: cannot replace with different length

B. KeyError: 'active' not found in axis

C. TypeError: unhashable type: 'dict'

D. No error; output is:
     status
0    Active
1    Active
2  Inactive
3  Inactive
4   Pending
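
For context on dictionary-based cleanup, the following sketch contrasts `Series.replace` with `Series.map` on a hypothetical `answers` series. The data and the `fixes` mapping are illustrative assumptions. The point is how each method treats values that are absent from the mapping.

```python
import pandas as pd

# Hypothetical survey answers with inconsistent spellings.
answers = pd.Series(['Yes', 'yes', 'NO', 'no', 'Maybe'], name='answer')
fixes = {'yes': 'Yes', 'NO': 'No', 'no': 'No'}

# replace() swaps only the listed values and leaves everything else as-is.
replaced = answers.replace(fixes)
print(replaced.tolist())

# map() looks every value up in the dict, so unlisted values become NaN.
mapped = answers.map(fixes)
print(mapped.isna().tolist())
```

A partial mapping therefore tends to pair better with `replace`, while `map` suits cases where every expected value is listed.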

❓ Visualization

advanced

Visualizing inconsistent data before and after cleaning

Which description correctly matches the bar charts of category counts in the 'department' column before and after fixing inconsistent capitalization?

Pandas

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'department': ['Sales', 'sales', 'HR', 'hr', 'IT', 'it', 'IT']})
df['department_clean'] = df['department'].str.upper()
counts_before = df['department'].value_counts()
counts_after = df['department_clean'].value_counts()

plt.figure(figsize=(8,4))
plt.subplot(1,2,1)
counts_before.plot(kind='bar', title='Before Cleaning')
plt.subplot(1,2,2)
counts_after.plot(kind='bar', title='After Cleaning')
plt.tight_layout()
plt.show()

A. Left plot shows 2 bars for 'Sales' and 'sales'; right plot shows 1 bar for 'SALES'.

B. Left plot shows 1 bar for 'Sales'; right plot shows 2 bars for 'SALES' and 'IT'.

C. Left plot shows 3 bars for 'Sales', 'HR', 'IT'; right plot shows 3 bars for 'SALES', 'HR', 'IT'.

D. Left plot shows 6 bars for each entry; right plot shows 1 bar for all combined.
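
Since each bar in such a chart corresponds to one row of `value_counts()`, the bars can be checked numerically before plotting. This is a minimal sketch on a hypothetical `regions` series, using `str.title()` as one possible normalization.

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization.
regions = pd.Series(['North', 'north', 'South', 'SOUTH', 'South'], name='region')

# Each distinct spelling gets its own row, and so its own bar.
counts_before = regions.value_counts()

# After normalizing case, the spellings merge and their counts add up.
counts_after = regions.str.title().value_counts()

print(len(counts_before), len(counts_after))
print(counts_after.to_dict())
```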

🚀 Application

expert

Handling mixed data types in a column for analysis

Given a DataFrame column 'age' with mixed types (integers, strings, and NaNs), which option correctly converts all valid ages to integers and replaces invalid or missing entries with the median of the valid ages?

Pandas

import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [25, '30', 'unknown', np.nan, 22, '27']})

# Your task: convert 'age' to integers, replace invalid with median

# median calculation should ignore invalid and NaN values

# Which option achieves this correctly?

A.
df['age'] = pd.to_numeric(df['age'], errors='coerce')
median_age = int(df['age'].median())
df['age'] = df['age'].fillna(median_age).astype(int)
print(df)

B.
df['age'] = df['age'].astype(int)
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age)
print(df)

C.
df['age'] = df['age'].replace('unknown', 0).astype(int)
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age)
print(df)

D.
df['age'] = df['age'].apply(lambda x: int(x) if x.isdigit() else np.nan)
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age).astype(int)
print(df)
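
When cleaning a mixed-type column, it can help to see which entries are unparseable text rather than simply missing. The sketch below does this on a hypothetical `heights` series (the values are invented for illustration), using `pd.to_numeric` with `errors='coerce'`.

```python
import pandas as pd

# Hypothetical mixed-type column: numbers, numeric strings, text, and a gap.
heights = pd.Series([170, '182', 'n/a', None, 165.0], name='height_cm')

# Unparseable entries are coerced to NaN instead of raising an error.
coerced = pd.to_numeric(heights, errors='coerce')

# Entries that were present in the raw data but are NaN after coercion
# are the ones that could not be parsed as numbers.
invalid = heights[coerced.isna() & heights.notna()]

print(coerced.tolist())
print(invalid.tolist())
```

Listing the rejected values this way makes it easier to decide whether they should be imputed, corrected by hand, or dropped.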
