Challenge - 5 Problems

Data Validation Master

❓ Predict Output

intermediate

Check for missing values in a DataFrame

What is the output of this code that checks for missing values in the DataFrame?

Pandas

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': ['x', None, 'y', 'z']
})

result = df.isnull().sum()
print(result)

A    1
B    1
dtype: int64

A    0
B    0
dtype: int64

A    1
B    0
dtype: int64

A    0
B    1
dtype: int64

❓ Predict Output

intermediate

Count unique values per column

What is the output of this code that counts unique values in each column of the DataFrame?

Pandas

import pandas as pd

df = pd.DataFrame({
    'Color': ['red', 'blue', 'red', 'green'],
    'Shape': ['circle', 'square', 'circle', 'triangle']
})

unique_counts = df.nunique()
print(unique_counts)

Color    4
Shape    4
dtype: int64

Color    3
Shape    3
dtype: int64

Color    3
Shape    4
dtype: int64

Color    4
Shape    3
dtype: int64

🔧 Debug

advanced

Identify the error in data type validation

What does this code produce when it checks whether all values in column 'Age' are integers: a boolean result or an exception?

Pandas

import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 'thirty-five', 40]})

all_int = df['Age'].apply(lambda x: isinstance(x, int)).all()

A. False

B. True

C. TypeError

D. ValueError
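
A check like the one above only reports whether some entry fails. As a minimal sketch, assuming that coercing with pd.to_numeric is acceptable for this column, the offending rows can also be located. The names is_int, coerced, and bad_rows are illustrative and not part of the challenge.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 'thirty-five', 40]})

# The mixed column is stored with dtype object, so each element keeps its Python type.
is_int = df['Age'].apply(lambda x: isinstance(x, int))

# errors='coerce' turns values that cannot be parsed as numbers into NaN.
coerced = pd.to_numeric(df['Age'], errors='coerce')

# Rows whose 'Age' could not be converted (illustrative name).
bad_rows = df[coerced.isna()]
print(bad_rows)
```

Note that isinstance(x, int) also accepts bool values, because bool is a subclass of int in Python.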

🚀 Application

advanced

Detect duplicate rows in a DataFrame

Which option returns a DataFrame containing every row that has a duplicate, including the first occurrence of each?

Pandas

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Value': ['a', 'b', 'b', 'c', 'd', 'd', 'd']
})

A. df[df.duplicated(keep='first')]

B. df[df.duplicated()]

C. df[df.duplicated(keep='last')]

D. df[df.duplicated(keep=False)]
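
The keep parameter of DataFrame.duplicated controls which occurrences are flagged. The following is a small sketch for comparing the three settings on the DataFrame above. The helper name flagged_index is hypothetical and exists only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Value': ['a', 'b', 'b', 'c', 'd', 'd', 'd']
})

def flagged_index(frame, keep):
    """Return the index labels that duplicated() marks for a given keep setting."""
    return frame[frame.duplicated(keep=keep)].index.tolist()

# Print the flagged row labels for each keep setting so they can be compared.
for keep in ('first', 'last', False):
    print(keep, flagged_index(df, keep))
```

Because keep='first' is the default, df.duplicated() and df.duplicated(keep='first') flag the same rows.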

🧠 Conceptual

expert

Understanding data validation with custom rules

You want to validate a DataFrame column 'Score' to ensure all values are between 0 and 100 inclusive. Which code snippet correctly returns True if all values meet this condition?

A. (df['Score'] >= 0) & (df['Score'] <= 100).all()

B. df['Score'].apply(lambda x: 0 <= x <= 100).all()

C. df['Score'].between(0, 100).all()

D. ((df['Score'] >= 0) & (df['Score'] <= 100)).all()
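
Operator precedence matters when combining comparisons with .all(): a method call binds more tightly than &, so the parentheses decide what .all() reduces. The sketch below compares the type and value that several forms return. The sample scores are made up for illustration, since the challenge does not define df, and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical sample data; 101 is deliberately out of range.
df = pd.DataFrame({'Score': [0, 55, 100, 101]})

# .all() applies only to the second comparison, then & broadcasts over the first.
unparenthesized = (df['Score'] >= 0) & (df['Score'] <= 100).all()

# The whole mask is built first, then reduced to a single boolean.
parenthesized = ((df['Score'] >= 0) & (df['Score'] <= 100)).all()

# between() is inclusive on both ends by default.
with_between = df['Score'].between(0, 100).all()

print(type(unparenthesized).__name__)
print(bool(parenthesized), bool(with_between))
```

Checking the returned type is a quick way to tell whether an expression yields one validation result or a per-row mask.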
