Challenge - 5 Problems


❓ Predict Output

intermediate

Output of chained pipe() calls on a DataFrame

What is the output of the following code that uses pipe() to clean a DataFrame step-by-step?

Pandas

import pandas as pd

def drop_missing(df):
    return df.dropna()

def multiply_age(df):
    df['age'] = df['age'] * 2
    return df

def rename_columns(df):
    return df.rename(columns={'age': 'double_age'})

df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'David'],
    'age': [25, None, 30, 40]
})

result = (df.pipe(drop_missing)
            .pipe(multiply_age)
            .pipe(rename_columns))

print(result)

A.
    name  double_age
0  Alice        50.0
3  David        80.0

B. {'name': ['Alice', 'David'], 'double_age': [50.0, 80.0]}

C.
    name   age
0  Alice  50.0
3  David  80.0

D.
    name   age
0  Alice  25.0
3  David  40.0
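
When predicting printed output, the column dtype matters as much as the values. The sketch below uses a made-up Series (not the quiz DataFrame) to show how pandas stores a None inside an otherwise numeric column, and what dropna() does to the dtype afterwards.

```python
import pandas as pd

# Hypothetical Series for illustration; not the quiz data.
s = pd.Series([1, None, 3])

print(s.dtype)            # float64, because None is stored as NaN

cleaned = s.dropna()      # removes the NaN row but keeps the dtype
print(cleaned.dtype)

doubled = cleaned * 2
print(doubled.tolist())
```

The same reasoning applies to any numeric column that held a missing value at construction time.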

❓ Data Output

intermediate

Number of rows after cleaning pipeline

Given the following cleaning pipeline using pipe(), how many rows remain in the final DataFrame?

Pandas

import pandas as pd

def filter_adults(df):
    return df[df['age'] >= 18]

def drop_duplicates(df):
    return df.drop_duplicates(subset=['name'])

def add_status(df):
    df['status'] = 'adult'
    return df

df = pd.DataFrame({
    'name': ['Anna', 'Ben', 'Anna', 'Cara', 'Dan'],
    'age': [17, 22, 17, 19, 16]
})

result = (df.pipe(filter_adults)
            .pipe(drop_duplicates)
            .pipe(add_status))

print(len(result))
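
A step like add_status assigns a column in place on whatever frame it receives. When that frame is a filtered selection of another DataFrame, this can trigger a SettingWithCopyWarning, depending on the pandas version. A minimal sketch of a non-mutating alternative, assuming a hypothetical add_flag step and made-up data:

```python
import pandas as pd

def add_flag(df):
    # assign() returns a new DataFrame, so the frame passed in is left untouched.
    return df.assign(flag=True)

# Hypothetical data for illustration only.
raw = pd.DataFrame({'item': ['a', 'b', 'c'], 'qty': [1, 5, 9]})

filtered = raw[raw['qty'] > 2]
out = filtered.pipe(add_flag)

print(list(filtered.columns))   # the filtered frame has no 'flag' column
print(list(out.columns))        # the piped result does
```

Returning a new frame from every step keeps each function safe to reuse on its own, outside the chain.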

🔧 Debug

advanced

Identify the error in this pipe() cleaning function

What happens when this code runs, and why?

Pandas

import pandas as pd

def add_column(df):
    df['new_col'] = df['age'] + 10
    # Missing return statement

df = pd.DataFrame({'age': [20, 30, 40]})

result = df.pipe(add_column)
print(result)

A. No error; the code prints None

B. AttributeError: 'NoneType' object has no attribute 'pipe'

C. TypeError: unsupported operand type(s) for +: 'int' and 'str'

D. KeyError: 'age'
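
It can help to recall what pipe() does with the function it is given: df.pipe(func) calls func(df) and hands back whatever that call returns, whether or not it is a DataFrame. A small sketch with a hypothetical add_one helper and made-up data:

```python
import pandas as pd

def add_one(df):
    # Hypothetical step: returns a new DataFrame with every value incremented.
    return df + 1

df = pd.DataFrame({'x': [1, 2, 3]})

via_pipe = df.pipe(add_one)
direct = add_one(df)
print(via_pipe.equals(direct))   # pipe(func) gives the same result as func(df)

# The return value is passed through unchanged, even when it is not a DataFrame.
row_count = df.pipe(len)
print(row_count)
```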

🚀 Application

advanced

Using pipe() to apply multiple cleaning steps with parameters

You want to apply two cleaning steps to a DataFrame using pipe(): first remove rows where 'score' is below a threshold, then add a new column 'passed' that is True when 'score' >= pass_mark. Which code correctly applies these steps with pipe(), passing each parameter by keyword?

Pandas

import pandas as pd

def filter_score(df, threshold):
    return df[df['score'] >= threshold]

def add_passed_column(df, pass_mark):
    df['passed'] = df['score'] >= pass_mark
    return df

df = pd.DataFrame({'score': [55, 70, 40, 90]})

A. result = df.pipe(filter_score, 50, pass_mark=60).pipe(add_passed_column)

B. result = df.pipe(filter_score(threshold=50)).pipe(add_passed_column(pass_mark=60))

C. result = df.pipe(filter_score, 50).pipe(add_passed_column, 60)

D. result = df.pipe(filter_score, threshold=50).pipe(add_passed_column, pass_mark=60)
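
As background on how pipe() handles parameters: any positional or keyword arguments given after the function are forwarded to it, with the DataFrame supplied as the first argument. pipe() also accepts a (callable, keyword_name) tuple for functions that expect the DataFrame under a specific keyword. The sketch below uses hypothetical helpers (scale, scale_into) and made-up data, not the quiz functions.

```python
import pandas as pd

def scale(df, column, factor=1):
    # Hypothetical helper: multiply one column by a factor without mutating df.
    return df.assign(**{column: df[column] * factor})

df = pd.DataFrame({'x': [1, 2, 3]})

# Arguments after the function are forwarded: this calls scale(df, 'x', factor=10).
a = df.pipe(scale, 'x', factor=10)
b = scale(df, 'x', factor=10)
print(a.equals(b))

def scale_into(factor, data):
    # Hypothetical helper whose DataFrame parameter is not the first one.
    return data * factor

# Tuple form: the DataFrame is passed as the keyword named 'data'.
c = df.pipe((scale_into, 'data'), 10)
print(c['x'].tolist())
```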

🧠 Conceptual

expert

Why use pipe() in pandas cleaning pipelines?

Which of the following is the best explanation for why pipe() is useful in building data cleaning pipelines?

A. pipe() automatically parallelizes DataFrame operations for faster execution.

B. pipe() allows chaining functions that take and return DataFrames, improving readability and modularity.

C. pipe() converts DataFrames into numpy arrays for faster numeric computation.

D. pipe() replaces the need for writing custom functions by providing built-in cleaning methods.
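
For a concrete point of comparison, the sketch below writes the same three cleaning steps twice: once as nested function calls and once as a pipe() chain. The step functions and data are hypothetical and chosen only for illustration.

```python
import pandas as pd

def strip_names(df):
    # Remove surrounding whitespace from the 'name' column.
    return df.assign(name=df['name'].str.strip())

def drop_empty(df):
    # Keep only rows whose name is non-empty.
    return df[df['name'] != '']

def upper_names(df):
    # Upper-case the remaining names.
    return df.assign(name=df['name'].str.upper())

df = pd.DataFrame({'name': [' ann ', '', 'bo']})

# Nested calls read inside-out.
nested = upper_names(drop_empty(strip_names(df)))

# The pipe() chain lists the same steps in the order they run.
chained = df.pipe(strip_names).pipe(drop_empty).pipe(upper_names)

print(chained.equals(nested))
print(chained['name'].tolist())
```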
