Pandas · data · ~20 mins

Building cleaning pipelines with pipe() in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of chained pipe() calls on a DataFrame
What is the output of the following code that uses pipe() to clean a DataFrame step-by-step?
Pandas
import pandas as pd

def drop_missing(df):
    return df.dropna()

def multiply_age(df):
    df['age'] = df['age'] * 2
    return df

def rename_columns(df):
    return df.rename(columns={'age': 'double_age'})

df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'David'],
    'age': [25, None, 30, 40]
})

result = (df.pipe(drop_missing)
            .pipe(multiply_age)
            .pipe(rename_columns))

print(result)
A.
    name  double_age
0  Alice        50.0
3  David        80.0
B. {'name': ['Alice', 'David'], 'double_age': [50.0, 80.0]}
C.
    name   age
0  Alice  50.0
3  David  80.0
D.
    name   age
0  Alice  25.0
3  David  40.0
💡 Hint
Remember that pipe() passes the DataFrame to each function as its first argument and returns whatever that function returns.
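To make the hint concrete, here is a minimal sketch on unrelated data showing that a chain of pipe() calls is equivalent to nesting the same function calls. The helpers `strip_names` and `add_name_length` and the `people` DataFrame are hypothetical and not part of the challenge.

```python
import pandas as pd

# Hypothetical cleaning steps: each takes a DataFrame and returns a new one.
def strip_names(df):
    # Remove leading and trailing whitespace from the 'name' column.
    return df.assign(name=df["name"].str.strip())

def add_name_length(df):
    # Add a column holding the length of each name.
    return df.assign(name_len=df["name"].str.len())

people = pd.DataFrame({"name": [" Eve ", "Frank"]})

# Chained form: reads top to bottom in execution order.
piped = people.pipe(strip_names).pipe(add_name_length)

# Nested form: same calls, but read inside out.
nested = add_name_length(strip_names(people))

same = piped.equals(nested)
print(same)
```

Because both helpers use assign(), the original `people` DataFrame is left unchanged.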
Data Output
intermediate
Number of rows after cleaning pipeline
Given the following cleaning pipeline using pipe(), how many rows remain in the final DataFrame?
Pandas
import pandas as pd

def filter_adults(df):
    return df[df['age'] >= 18]

def drop_duplicates(df):
    return df.drop_duplicates(subset=['name'])

def add_status(df):
    df['status'] = 'adult'
    return df

df = pd.DataFrame({
    'name': ['Anna', 'Ben', 'Anna', 'Cara', 'Dan'],
    'age': [17, 22, 17, 19, 16]
})

result = (df.pipe(filter_adults)
            .pipe(drop_duplicates)
            .pipe(add_status))

print(len(result))
A. 4
B. 3
C. 2
D. 5
💡 Hint
Check which rows pass the age filter, then remove duplicates by name.
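One way to check row counts step by step is to drop a small logging function between the stages of a pipeline. The sketch below uses hypothetical inventory data and a hypothetical `log_rows` helper; it is an illustration of the idea, not part of the challenge code.

```python
import pandas as pd

def log_rows(df, label, log):
    # Hypothetical helper: record the row count, then pass the frame through unchanged.
    log.append((label, len(df)))
    return df

log = []
items = pd.DataFrame({
    "sku": ["a", "b", "b", "c"],
    "stock": [0, 5, 5, 2],
})

result = (
    items.pipe(log_rows, "start", log)
    .pipe(lambda d: d[d["stock"] > 0])               # keep rows that are in stock
    .pipe(log_rows, "in stock", log)
    .pipe(lambda d: d.drop_duplicates(subset=["sku"]))  # one row per sku
    .pipe(log_rows, "deduplicated", log)
)

for label, count in log:
    print(label, count)
```

Since `log_rows` returns its input untouched, it can be added or removed without changing the result of the pipeline.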
🔧 Debug
advanced
Identify the error in this pipe() cleaning function
What happens when this code runs, and why?
Pandas
import pandas as pd

def add_column(df):
    df['new_col'] = df['age'] + 10
    # Missing return statement

df = pd.DataFrame({'age': [20, 30, 40]})

result = df.pipe(add_column)
print(result)
A. Prints None; no exception is raised
B. AttributeError: 'NoneType' object has no attribute 'pipe'
C. TypeError: unsupported operand type(s) for +: 'int' and 'str'
D. KeyError: 'age'
💡 Hint
Check what the function returns when used with pipe().
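Since pipe() hands back whatever the step returns, one defensive option is to wrap each step in a check that fails fast when the result is not a DataFrame. The `returns_frame` decorator below is a hypothetical helper written for this sketch, not a pandas feature, and the `orders` data is made up.

```python
import functools
import pandas as pd

def returns_frame(step):
    """Hypothetical decorator: raise if a pipeline step does not return a DataFrame."""
    @functools.wraps(step)
    def wrapper(df, *args, **kwargs):
        out = step(df, *args, **kwargs)
        if not isinstance(out, pd.DataFrame):
            raise TypeError(
                f"{step.__name__} returned {type(out).__name__}, expected DataFrame"
            )
        return out
    return wrapper

@returns_frame
def add_total(df):
    return df.assign(total=df["price"] * df["qty"])

@returns_frame
def forgetful_step(df):
    df.assign(flag=True)  # result is discarded, so the function returns None

orders = pd.DataFrame({"price": [2.0, 3.5], "qty": [3, 2]})
checked = orders.pipe(add_total)

try:
    orders.pipe(forgetful_step)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The check turns a silent wrong result into an immediate error that names the offending step.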
🚀 Application
advanced
Using pipe() to apply multiple cleaning steps with parameters
You want to apply these cleaning steps to a DataFrame using pipe(): remove rows where 'score' is below a threshold, then add a new column 'passed' that is True if 'score' >= pass_mark. Which code correctly applies these steps, passing each parameter to pipe() as a keyword argument?
Pandas
import pandas as pd

def filter_score(df, threshold):
    return df[df['score'] >= threshold]

def add_passed_column(df, pass_mark):
    df['passed'] = df['score'] >= pass_mark
    return df

df = pd.DataFrame({'score': [55, 70, 40, 90]})
A. result = df.pipe(filter_score, 50, pass_mark=60).pipe(add_passed_column)
B. result = df.pipe(filter_score(threshold=50)).pipe(add_passed_column(pass_mark=60))
C. result = df.pipe(filter_score, 50).pipe(add_passed_column, 60)
D. result = df.pipe(filter_score, threshold=50).pipe(add_passed_column, pass_mark=60)
💡 Hint
Remember that df.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs): any extra positional or keyword arguments given to pipe() are forwarded to the function.
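The forwarding rule in the hint can be seen in a small sketch on separate data. The `clip_column` function and the `readings` DataFrame are hypothetical names chosen for illustration.

```python
import pandas as pd

def clip_column(df, column, lower=0, upper=100):
    # Hypothetical step: limit the values of one column to [lower, upper].
    return df.assign(**{column: df[column].clip(lower=lower, upper=upper)})

readings = pd.DataFrame({"temp": [-5, 20, 130]})

# Extra positional arguments follow the DataFrame in the call to the function.
by_position = readings.pipe(clip_column, "temp", 0, 100)

# Extra keyword arguments are forwarded by name.
by_keyword = readings.pipe(clip_column, column="temp", lower=0, upper=100)

# Both are the same as calling the function directly.
direct = clip_column(readings, "temp", lower=0, upper=100)

print(by_position["temp"].tolist())
```

Keyword arguments tend to be easier to read in a long chain, because each value is labelled at the call site.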
🧠 Conceptual
expert
Why use pipe() in pandas cleaning pipelines?
Which of the following is the best explanation for why pipe() is useful in building data cleaning pipelines?
A. pipe() automatically parallelizes DataFrame operations for faster execution.
B. pipe() allows chaining functions that take and return DataFrames, improving readability and modularity.
C. pipe() converts DataFrames into numpy arrays for faster numeric computation.
D. pipe() replaces the need for writing custom functions by providing built-in cleaning methods.
💡 Hint
Think about how pipe() helps organize code.
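As one illustration of how pipe() helps organize code, the steps of a pipeline can be kept in a plain list and applied in order. This is a sketch under assumptions: `run_pipeline`, the step functions, and the `raw` data are all hypothetical.

```python
from functools import reduce
import pandas as pd

def drop_empty_rows(df):
    # Drop rows in which every value is missing.
    return df.dropna(how="all")

def lowercase_columns(df):
    # Normalize column names to lowercase.
    return df.rename(columns=str.lower)

# The pipeline is just data: steps can be reordered, removed, or tested one by one.
STEPS = [drop_empty_rows, lowercase_columns]

def run_pipeline(df, steps):
    # Hypothetical runner: feed the frame through each step with pipe().
    return reduce(lambda frame, step: frame.pipe(step), steps, df)

raw = pd.DataFrame({"City": ["Oslo", None], "Temp": [3.0, None]})
clean = run_pipeline(raw, STEPS)

print(clean)
```

Each step stays a small, independently testable function, and the order of operations is visible in one place.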