Challenge - 5 Problems
Master of Handling Inconsistent Values
❓ Predict Output
Intermediate, time limit 2:00
Output of filling missing values with mode
What is the output DataFrame after filling missing values in the 'color' column with the mode of that column?
Pandas
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', None, 'blue', None, 'red']})
mode_color = df['color'].mode()[0]
df['color'] = df['color'].fillna(mode_color)
print(df)
💡 Hint
The mode is the most frequent value in the column. Filling missing values replaces NaNs with this value.
The column has two modes ('blue' and 'red', each appearing twice). mode() returns tied modes in sorted order, so mode()[0] is 'blue', and both None entries are filled with 'blue'.
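The tie-breaking behaviour is easy to check directly. The snippet below is a minimal sketch that rebuilds the same column as a standalone Series (the names `colors`, `modes`, and `filled` are illustrative, not part of the challenge) and inspects what mode() returns before filling.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', None, 'blue', None, 'red'])

# mode() ignores missing values by default and returns every tied value, sorted.
modes = colors.mode()
print(modes.tolist())

# Position 0 is therefore the alphabetically first mode, not the first one seen.
filled = colors.fillna(modes[0])
print(filled.tolist())
```

When a tie matters for the analysis, it may be safer to pick the fill value explicitly than to rely on position 0.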
❓ data_output
Intermediate, time limit 1:30
Count unique values after standardizing text
After converting all entries in the 'fruit' column to lowercase, how many unique values remain?
Pandas
import pandas as pd

df = pd.DataFrame({'fruit': ['Apple', 'apple', 'Banana', 'BANANA', 'banana', 'Cherry']})
df['fruit'] = df['fruit'].str.lower()
unique_count = df['fruit'].nunique()
print(unique_count)
💡 Hint
Lowercasing makes 'Apple' and 'apple' the same value.
After lowercasing, the unique fruits are 'apple', 'banana', and 'cherry', totaling 3.
🔧 Debug
Advanced, time limit 2:00
Identify error in replacing inconsistent categorical values
What error does this code raise when trying to replace inconsistent values in the 'status' column?
Pandas
import pandas as pd

df = pd.DataFrame({'status': ['Active', 'active', 'Inactive', 'inactive', 'Pending']})
df['status'] = df['status'].replace({'active': 'Active', 'inactive': 'Inactive'})
print(df)
💡 Hint
Check if replace is case sensitive and if keys exist in the column.
No error is raised. replace() matches the dictionary keys exactly (case-sensitively), so 'active' becomes 'Active' and 'inactive' becomes 'Inactive', while values that match no key are left unchanged.
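A dictionary passed to replace() has to list every variant by hand. As a hedged alternative, and assuming the canonical form is simply "first letter uppercase, rest lowercase", str.capitalize() normalizes all values in one step. The variable names `replaced` and `capitalized` below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'status': ['Active', 'active', 'Inactive', 'inactive', 'Pending']})

# Dictionary-based replace: exact, case-sensitive matches on whole values.
replaced = df['status'].replace({'active': 'Active', 'inactive': 'Inactive'})

# Alternative sketch: normalize the case of every value at once.
capitalized = df['status'].str.capitalize()

print(replaced.tolist())
print(capitalized.tolist())
```

On this data both approaches give the same result; the dictionary is the better fit when variants differ by more than capitalization.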
❓ visualization
Advanced, time limit 2:30
Visualizing inconsistent data before and after cleaning
Which plot correctly shows the count of each category in the 'department' column before and after fixing inconsistent capitalization?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'department': ['Sales', 'sales', 'HR', 'hr', 'IT', 'it', 'IT']})
df['department_clean'] = df['department'].str.upper()

counts_before = df['department'].value_counts()
counts_after = df['department_clean'].value_counts()

plt.figure(figsize=(8,4))
plt.subplot(1,2,1)
counts_before.plot(kind='bar', title='Before Cleaning')
plt.subplot(1,2,2)
counts_after.plot(kind='bar', title='After Cleaning')
plt.tight_layout()
plt.show()
💡 Hint
Check how value_counts counts exact matches before and after uppercasing.
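Before cleaning, value_counts treats each spelling as its own category, so the left plot has six bars ('IT' with a count of 2, the other five with 1 each). After uppercasing, the variants merge into three bars: 'IT' (3), 'SALES' (2), and 'HR' (2).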
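The bar heights can be verified without drawing anything. This is a small sketch that prints the two value_counts results as dictionaries, using the same data as the challenge.

```python
import pandas as pd

df = pd.DataFrame({'department': ['Sales', 'sales', 'HR', 'hr', 'IT', 'it', 'IT']})

# Exact-match counting: every distinct spelling is its own category.
counts_before = df['department'].value_counts()

# Uppercasing first collapses the spelling variants.
counts_after = df['department'].str.upper().value_counts()

print(counts_before.to_dict())
print(counts_after.to_dict())
```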
🚀 Application
Expert, time limit 3:00
Handling mixed data types in a column for analysis
Given a DataFrame column 'age' with mixed types (integers, strings, and NaNs), which option correctly converts all valid ages to integers and replaces invalid entries with the median age?
Pandas
import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [25, '30', 'unknown', np.nan, 22, '27']})

# Your task: convert 'age' to integers, replace invalid with median
# median calculation should ignore invalid and NaN values
# Which option achieves this correctly?
💡 Hint
Use pd.to_numeric with errors='coerce' to convert and set invalid to NaN.
Option A is correct: it converts every valid entry to a number and turns invalid entries into NaN, computes the median while ignoring NaN, fills the NaN entries with that median, and only then casts the column to int.
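The answer options themselves are not reproduced here, so the following is only a sketch of the approach the explanation describes, not the literal text of Option A. The intermediate names `ages` and `median_age` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, '30', 'unknown', np.nan, 22, '27']})

# Convert what can be parsed; 'unknown' becomes NaN instead of raising.
ages = pd.to_numeric(df['age'], errors='coerce')

# Series.median() skips NaN by default, so only 25, 30, 22 and 27 count.
median_age = ages.median()

# Fill the gaps first, then cast; astype(int) would fail while NaN remains.
df['age'] = ages.fillna(median_age).astype(int)
print(df['age'].tolist())
```

The order matters: casting before filling raises an error because NaN has no integer representation. Note also that astype(int) truncates, so a fractional median would lose its decimal part.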