Data validation checks in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Check for missing values in a DataFrame
What is the output of this code that checks for missing values in the DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': ['x', None, 'y', 'z']
})

result = df.isnull().sum()
print(result)
A
A    1
B    1
dtype: int64
B
A    0
B    0
dtype: int64
C
A    1
B    0
dtype: int64
D
A    0
B    1
dtype: int64
💡 Hint
Use the isnull() method to find missing values and sum() to count them per column.
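The hint above can be sketched as follows. This is a minimal example that reuses the DataFrame from the question; the helper name `columns_with_missing` is hypothetical and is only there to show how the per-column counts might feed a validation step.

```python
import pandas as pd

# Same data as in the question: one missing value in each column.
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': ['x', None, 'y', 'z']
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)


def columns_with_missing(frame):
    """Hypothetical helper: return the names of columns that contain missing values."""
    counts = frame.isnull().sum()
    return counts[counts > 0].index.tolist()


print(columns_with_missing(df))
```

Both columns contain exactly one missing value here, so the helper reports both column names.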
Data Output
intermediate
Count unique values per column
What is the output of this code that counts unique values in each column of the DataFrame?
Pandas
import pandas as pd

df = pd.DataFrame({
    'Color': ['red', 'blue', 'red', 'green'],
    'Shape': ['circle', 'square', 'circle', 'triangle']
})

unique_counts = df.nunique()
print(unique_counts)
A
Color    4
Shape    4
dtype: int64
B
Color    3
Shape    3
dtype: int64
C
Color    3
Shape    4
dtype: int64
D
Color    4
Shape    3
dtype: int64
💡 Hint
The nunique() method counts distinct values per column.
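As a rough illustration of the hint, the sketch below counts distinct values per column and then uses the result in a simple validation rule. The `allowed_colors` set is an assumption made for this example and is not part of the original problem.

```python
import pandas as pd

# Same data as in the question: 'red' and 'circle' each appear twice.
df = pd.DataFrame({
    'Color': ['red', 'blue', 'red', 'green'],
    'Shape': ['circle', 'square', 'circle', 'triangle']
})

# nunique() counts distinct non-null values in each column.
unique_counts = df.nunique()
print(unique_counts)

# Assumed validation rule: every colour must come from a known set.
allowed_colors = {'red', 'blue', 'green'}
unexpected_colors = set(df['Color'].unique()) - allowed_colors
print(unexpected_colors)
```

Each column holds three distinct values, and no colour falls outside the assumed allowed set.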
🔧 Debug
advanced
Identify the error in data type validation
What does this code produce when it checks whether every value in column 'Age' is an integer: a boolean result or an exception?
Pandas
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 'thirty-five', 40]})

all_int = df['Age'].apply(lambda x: isinstance(x, int)).all()
print(all_int)
A
False
B
True
C
TypeError
D
ValueError
💡 Hint
Check the data types of each value in the 'Age' column.
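To make the hint concrete, here is a small sketch that inspects the column value by value. The second half, which uses `pd.to_numeric` with `errors='coerce'` to locate the offending entries, is an alternative technique added for illustration and is not part of the original problem.

```python
import pandas as pd

# Same data as in the question: one entry is a string rather than a number.
df = pd.DataFrame({'Age': [25, 30, 'thirty-five', 40]})

# The mixed values are stored in an object column, so each element keeps its Python type.
element_types = df['Age'].apply(lambda x: type(x).__name__).tolist()
print(element_types)

# isinstance() does not raise on a string; it simply returns False for it.
all_int = df['Age'].apply(lambda x: isinstance(x, int)).all()
print(all_int)

# Alternative check: coerce to numbers and list the values that could not be converted.
coerced = pd.to_numeric(df['Age'], errors='coerce')
invalid_values = df.loc[coerced.isna(), 'Age'].tolist()
print(invalid_values)
```

The element-wise check completes without raising, and the coercion step pinpoints which value fails validation.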
🚀 Application
advanced
Detect duplicate rows in a DataFrame
Which option returns every row that belongs to a group of duplicates, including the first occurrence of each group?
Pandas
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Value': ['a', 'b', 'b', 'c', 'd', 'd', 'd']
})
A
df[df.duplicated(keep='first')]
B
df[df.duplicated()]
C
df[df.duplicated(keep='last')]
D
df[df.duplicated(keep=False)]
💡 Hint
Use duplicated() with keep=False to mark all duplicates as True.
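The effect of the `keep` argument can be compared side by side. The sketch below reuses the DataFrame from the question and records which row labels each variant selects; the variable names are chosen for this example only.

```python
import pandas as pd

# Same data as in the question: ID 2 appears twice and ID 4 appears three times.
df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Value': ['a', 'b', 'b', 'c', 'd', 'd', 'd']
})

# Default (keep='first'): the first occurrence is NOT flagged, only later repeats.
repeats_after_first = df[df.duplicated()].index.tolist()

# keep='last': the last occurrence is NOT flagged, only earlier repeats.
repeats_before_last = df[df.duplicated(keep='last')].index.tolist()

# keep=False: every row that has a duplicate anywhere is flagged.
all_duplicated_rows = df[df.duplicated(keep=False)].index.tolist()

print(repeats_after_first)   # [2, 5, 6]
print(repeats_before_last)   # [1, 4, 5]
print(all_duplicated_rows)   # [1, 2, 4, 5, 6]
```

Note that `df.duplicated()` and `df.duplicated(keep='first')` are the same call, because `'first'` is the default.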
🧠 Conceptual
expert
Understanding data validation with custom rules
You want to validate a DataFrame column 'Score' to ensure all values are between 0 and 100 inclusive. Which code snippet correctly returns True if all values meet this condition?
A
(df['Score'] >= 0) & (df['Score'] <= 100).all()
B
df['Score'].apply(lambda x: 0 <= x <= 100).all()
C
df['Score'].between(0, 100).all()
D
((df['Score'] >= 0) & (df['Score'] <= 100)).all()
💡 Hint
Use the pandas between() method for inclusive range checks.
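A short sketch of the range check follows. The two sample DataFrames and the function names are assumptions made for illustration, since the original problem does not define `df`. The last part shows why the placement of parentheses around `.all()` matters.

```python
import pandas as pd

# Hypothetical sample data: one valid frame and one with out-of-range scores.
valid_scores = pd.DataFrame({'Score': [0, 55, 100]})
bad_scores = pd.DataFrame({'Score': [-5, 55, 101]})


def in_range_between(frame):
    """Range check using between(), which includes both endpoints by default."""
    return bool(frame['Score'].between(0, 100).all())


def in_range_mask(frame):
    """Same check with an explicit boolean mask; .all() applies to the combined mask."""
    return bool(((frame['Score'] >= 0) & (frame['Score'] <= 100)).all())


print(in_range_between(valid_scores), in_range_mask(valid_scores))
print(in_range_between(bad_scores), in_range_mask(bad_scores))

# Without the outer parentheses, .all() binds only to the second comparison,
# so the expression evaluates to a Series rather than a single boolean.
misparenthesized = (bad_scores['Score'] >= 0) & (bad_scores['Score'] <= 100).all()
print(type(misparenthesized).__name__)
```

On this sample data the `between()` form and the fully parenthesized mask agree, while the expression without outer parentheses does not reduce to a single True or False.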