
Handling inconsistent values in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of filling missing values with mode
What is the output DataFrame after filling missing values in the 'color' column with the mode of that column?
Pandas
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', None, 'blue', None, 'red']})
mode_color = df['color'].mode()[0]
df['color'] = df['color'].fillna(mode_color)
print(df)
A
   color
0    red
1   blue
2   blue
3   blue
4   blue
5    red
B
   color
0    red
1   blue
2    red
3   blue
4    red
5    red
C
   color
0    red
1   blue
2  None
3   blue
4  None
5    red
D
   color
0    red
1   blue
2   blue
3   blue
4  None
5    red
💡 Hint
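The mode is the most frequent value in the column, and fillna replaces every missing entry with it. When several values tie for most frequent, mode() returns all of them in sorted order, so mode()[0] picks the first one alphabetically.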
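A minimal sketch of how a tied mode behaves, using a hypothetical 'sizes' Series that is not part of the challenge data:

```python
import pandas as pd

# Hypothetical data: 'M' and 'S' both appear twice, so the mode is tied.
sizes = pd.Series(['S', 'M', None, 'M', None, 'S', 'L'])

# mode() ignores missing values by default and returns every tied value, sorted.
modes = sizes.mode()
fill_value = modes[0]  # first of the tied values in sorted order

# Replace every missing entry with the chosen value.
filled = sizes.fillna(fill_value)

print(modes.tolist())   # ['M', 'S']
print(filled.tolist())  # ['S', 'M', 'M', 'M', 'M', 'S', 'L']
```

Because the tie is broken by sort order rather than by position in the column, it can be worth checking len(modes) before relying on modes[0].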
Data Output
intermediate
Count unique values after standardizing text
After converting all entries in the 'fruit' column to lowercase, how many unique values remain?
Pandas
import pandas as pd

df = pd.DataFrame({'fruit': ['Apple', 'apple', 'Banana', 'BANANA', 'banana', 'Cherry']})
df['fruit'] = df['fruit'].str.lower()
unique_count = df['fruit'].nunique()
print(unique_count)
A
4
B
6
C
3
D
5
💡 Hint
Lowercasing makes 'Apple' and 'apple' the same value.
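The same idea can be sketched on a hypothetical 'cities' Series (illustrative data, not the challenge data), comparing the number of distinct values before and after lowercasing:

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization.
cities = pd.Series(['Paris', 'paris', 'PARIS', 'Lyon', 'lyon'])

# nunique() compares exact strings, so each spelling counts separately.
unique_before = cities.nunique()

# After lowercasing, spellings that differ only by case collapse together.
unique_after = cities.str.lower().nunique()

print(unique_before)  # 5
print(unique_after)   # 2
```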
🔧 Debug
advanced
Identify error in replacing inconsistent categorical values
What error, if any, does this code raise when trying to replace inconsistent values in the 'status' column?
Pandas
import pandas as pd

df = pd.DataFrame({'status': ['Active', 'active', 'Inactive', 'inactive', 'Pending']})
df['status'] = df['status'].replace({'active': 'Active', 'inactive': 'Inactive'})
print(df)
A
ValueError: cannot replace with different length
B
KeyError: 'active' not found in axis
C
TypeError: unhashable type: 'dict'
D
No error; output is:
     status
0    Active
1    Active
2  Inactive
3  Inactive
4   Pending
💡 Hint
Check if replace is case sensitive and if keys exist in the column.
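A short sketch of how replace treats a dictionary mapping, on a hypothetical 'answers' Series (illustrative data only). It assumes whole-value, case-sensitive matching, which is the default for a plain dict, and includes one key that never occurs in the data:

```python
import pandas as pd

# Hypothetical data mixing two spellings of each answer.
answers = pd.Series(['Yes', 'yes', 'No', 'no', 'Maybe'])

# Keys are matched against whole values, case-sensitively.
# 'n/a' does not occur in the Series; unmatched keys are simply skipped.
mapped = answers.replace({'yes': 'Yes', 'no': 'No', 'n/a': 'No'})

print(mapped.tolist())  # ['Yes', 'Yes', 'No', 'No', 'Maybe']
```

Values already in the target spelling, and values with no matching key such as 'Maybe', pass through unchanged.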
Visualization
advanced
Visualizing inconsistent data before and after cleaning
Which description correctly matches the plots of category counts in the 'department' column before and after fixing inconsistent capitalization?
Pandas
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'department': ['Sales', 'sales', 'HR', 'hr', 'IT', 'it', 'IT']})
df['department_clean'] = df['department'].str.upper()
counts_before = df['department'].value_counts()
counts_after = df['department_clean'].value_counts()

plt.figure(figsize=(8,4))
plt.subplot(1,2,1)
counts_before.plot(kind='bar', title='Before Cleaning')
plt.subplot(1,2,2)
counts_after.plot(kind='bar', title='After Cleaning')
plt.tight_layout()
plt.show()
A
Left plot shows separate bars for 'Sales' and 'sales'; right plot shows a single bar for 'SALES'.
B
Left plot shows 1 bar for 'Sales'; right plot shows 2 bars for 'SALES' and 'IT'.
C
Left plot shows 3 bars for 'Sales', 'HR', 'IT'; right plot shows 3 bars for 'SALES', 'HR', 'IT'.
D
Left plot shows 6 bars for each entry; right plot shows 1 bar for all combined.
💡 Hint
Check how value_counts counts exact matches before and after uppercasing.
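Without drawing anything, the counts behind such a bar chart can be inspected directly. A minimal sketch on a hypothetical 'teams' Series (illustrative data, not the challenge data):

```python
import pandas as pd

# Hypothetical data with mixed capitalization.
teams = pd.Series(['Ops', 'ops', 'QA', 'qa', 'QA'])

# value_counts() groups exact matches only, so each spelling is its own category.
counts_before = teams.value_counts()

# Uppercasing first merges spellings that differ only by case.
counts_after = teams.str.upper().value_counts()

print(len(counts_before))  # 4 categories: 'QA', 'Ops', 'ops', 'qa'
print(len(counts_after))   # 2 categories: 'QA', 'OPS'
print(counts_after['QA'])  # 3
```

Each entry in the counts becomes one bar when plotted with kind='bar', so the number of categories is the number of bars.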
🚀 Application
expert
Handling mixed data types in a column for analysis
Given a DataFrame column 'age' with mixed types (integers, strings, and NaNs), which option correctly converts all valid ages to integers and replaces invalid entries with the median age?
Pandas
import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [25, '30', 'unknown', np.nan, 22, '27']})

# Your task: convert 'age' to integers, replace invalid with median

# median calculation should ignore invalid and NaN values

# Which option achieves this correctly?
A
df['age'] = pd.to_numeric(df['age'], errors='coerce')
median_age = int(df['age'].median())
df['age'] = df['age'].fillna(median_age).astype(int)
print(df)
B
df['age'] = df['age'].astype(int)
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age)
print(df)
C
df['age'] = df['age'].replace('unknown', 0).astype(int)
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age)
print(df)
D
df['age'] = df['age'].apply(lambda x: int(x) if x.isdigit() else np.nan)
median_age = df['age'].median()
df['age'] = df['age'].fillna(median_age).astype(int)
print(df)
💡 Hint
Use pd.to_numeric with errors='coerce' to convert and set invalid to NaN.
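A sketch of the coercion step on a hypothetical 'heights' Series (illustrative data only), showing what errors='coerce' does to entries that cannot be parsed as numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing integers, numeric strings, junk text, and NaN.
heights = pd.Series([170, '165', 'n/a', np.nan, 180])

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(heights, errors='coerce')

# median() skips NaN by default, so only the valid values 165, 170, 180 are used.
median_height = numeric.median()

# Fill the gaps first, then cast: a column containing NaN cannot be cast to int.
cleaned = numeric.fillna(median_height).astype(int)

print(numeric.isna().sum())  # 2
print(median_height)         # 170.0
print(cleaned.tolist())      # [170, 165, 170, 170, 180]
```

The order matters here: calling astype(int) before the missing values are filled would fail, because NaN has no integer representation.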