Challenge - 5 Problems
❓ Predict Output
intermediate
Output of chained pipe() calls on a DataFrame
What is the output of the following code, which uses pipe() to clean a DataFrame step by step?
Pandas
import pandas as pd

def drop_missing(df):
    return df.dropna()

def multiply_age(df):
    df['age'] = df['age'] * 2
    return df

def rename_columns(df):
    return df.rename(columns={'age': 'double_age'})

df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'David'],
    'age': [25, None, 30, 40]
})

result = (df.pipe(drop_missing)
            .pipe(multiply_age)
            .pipe(rename_columns))
print(result)
💡 Hint
Remember that pipe() passes the DataFrame to each function and returns that function's result, so each step receives the output of the previous step.
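As a rough illustration of the hint, df.pipe(func, *args, **kwargs) behaves like func(df, *args, **kwargs). The sketch below uses a hypothetical add_one helper and toy data that are not part of the challenge.

```python
import pandas as pd

def add_one(df, column):
    """Hypothetical helper: return a copy with `column` incremented by 1."""
    return df.assign(**{column: df[column] + 1})

df = pd.DataFrame({'x': [1, 2, 3]})

via_pipe = df.pipe(add_one, 'x')   # method-chaining form
via_call = add_one(df, 'x')        # plain function call

print(via_pipe.equals(via_call))   # True
print(df['x'].tolist())            # [1, 2, 3]; assign() returned a new frame
```

Because add_one returns a new DataFrame, the input stays untouched and the two call styles give the same result.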
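Explanation: drop_missing removes every row that contains a missing value, so Bob's row (missing age) and the row with the missing name are dropped. multiply_age then doubles the remaining 'age' values, and rename_columns renames 'age' to 'double_age'. The result contains only Alice and David, with double_age values 50.0 and 80.0. The values are floats because the missing entry makes the original 'age' column float64, and the original index labels 0 and 3 are kept.
❓ Data Output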
intermediate
Number of rows after cleaning pipeline
Given the following cleaning pipeline using pipe(), how many rows remain in the final DataFrame?
Pandas
import pandas as pd

def filter_adults(df):
    return df[df['age'] >= 18]

def drop_duplicates(df):
    return df.drop_duplicates(subset=['name'])

def add_status(df):
    df['status'] = 'adult'
    return df

df = pd.DataFrame({
    'name': ['Anna', 'Ben', 'Anna', 'Cara', 'Dan'],
    'age': [17, 22, 17, 19, 16]
})

result = (df.pipe(filter_adults)
            .pipe(drop_duplicates)
            .pipe(add_status))
print(len(result))
💡 Hint
Check which rows pass the age filter, then remove duplicates by name.
Explanation: The rows with age >= 18 are Ben (22, index 1) and Cara (19, index 3); both Anna rows (age 17) and Dan (16) are filtered out. The remaining names are already unique, so drop_duplicates removes nothing, and the final length is 2.
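To check these row counts step by step, one option is to slot a small pass-through function between the stages. This is only a sketch: log_rows is a hypothetical helper, not part of pandas or the challenge, and add_status is rewritten here with assign() so that it returns a new DataFrame instead of modifying the filtered one in place.

```python
import pandas as pd

def filter_adults(df):
    return df[df['age'] >= 18]

def drop_duplicates(df):
    return df.drop_duplicates(subset=['name'])

def add_status(df):
    # assign() returns a new DataFrame rather than mutating the input
    return df.assign(status='adult')

def log_rows(df, label, counts):
    """Hypothetical helper: record the row count, then pass df through unchanged."""
    counts[label] = len(df)
    return df

df = pd.DataFrame({
    'name': ['Anna', 'Ben', 'Anna', 'Cara', 'Dan'],
    'age': [17, 22, 17, 19, 16]
})

counts = {}
result = (df.pipe(log_rows, 'start', counts)
            .pipe(filter_adults)
            .pipe(log_rows, 'after_filter', counts)
            .pipe(drop_duplicates)
            .pipe(log_rows, 'after_dedup', counts)
            .pipe(add_status))

print(counts)       # {'start': 5, 'after_filter': 2, 'after_dedup': 2}
print(len(result))  # 2
```

The extra arguments after the function name ('start', counts) are forwarded by pipe() to log_rows, which is the same mechanism used for parameterized cleaning steps.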
🔧 Debug
advanced
Identify the error in this pipe() cleaning function
What error will this code raise when run, and why?
Pandas
import pandas as pd

def add_column(df):
    df['new_col'] = df['age'] + 10
    # Missing return statement

df = pd.DataFrame({'age': [20, 30, 40]})
result = df.pipe(add_column)
print(result)
💡 Hint
Check what the function returns when used with pipe().
Explanation: add_column modifies the DataFrame in place but has no return statement, so it implicitly returns None. df.pipe(add_column) therefore returns None, and print(result) prints None. No exception is raised; the bug is that result is not a DataFrame, even though the original df has gained new_col as a side effect.
🚀 Application
advanced
Using pipe() to apply multiple cleaning steps with parameters
You want to apply these cleaning steps to a DataFrame using pipe(): remove rows where 'score' is below a threshold, then add a new column 'passed' that is True if 'score' >= pass_mark. Which code correctly applies these steps with parameters using pipe()?
Pandas
import pandas as pd

def filter_score(df, threshold):
    return df[df['score'] >= threshold]

def add_passed_column(df, pass_mark):
    df['passed'] = df['score'] >= pass_mark
    return df

df = pd.DataFrame({'score': [55, 70, 40, 90]})
💡 Hint
Remember that extra arguments to pipe() after the first are passed as positional or keyword arguments to the function.
Explanation: The correct option passes threshold and pass_mark as keyword arguments to the respective functions through pipe(). An option that calls the functions immediately is wrong, because pipe() expects the function itself rather than its result. Passing the extra arguments positionally also works, but the keyword form is more explicit and preferred. The remaining option passes the extra arguments incorrectly.
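The following is only a sketch of the keyword-argument form, with the positional form for comparison. The values 50 and 60 are assumed for illustration, and add_passed_column is given a defensive copy() here so that the filtered frame is not modified in place.

```python
import pandas as pd

def filter_score(df, threshold):
    return df[df['score'] >= threshold]

def add_passed_column(df, pass_mark):
    df = df.copy()  # avoid writing into the filtered frame passed in
    df['passed'] = df['score'] >= pass_mark
    return df

df = pd.DataFrame({'score': [55, 70, 40, 90]})

# Keyword form: pipe() forwards threshold= and pass_mark= to each function.
by_keyword = (df.pipe(filter_score, threshold=50)
                .pipe(add_passed_column, pass_mark=60))

# Positional form: extra positional arguments follow the DataFrame.
by_position = (df.pipe(filter_score, 50)
                 .pipe(add_passed_column, 60))

print(by_keyword)
print(by_keyword.equals(by_position))  # True
```

In both forms the function object is handed to pipe() uncalled; pipe() supplies the DataFrame as the first argument.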
🧠 Conceptual
expert
Why use pipe() in pandas cleaning pipelines?
Which of the following is the best explanation for why pipe() is useful in building data cleaning pipelines?
💡 Hint
Think about how pipe() helps organize code.
Explanation: pipe() passes a DataFrame through a sequence of functions, each of which takes a DataFrame and returns a DataFrame. Chaining the steps this way keeps them in execution order, which makes the code easier to read and maintain.
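A small sketch of the readability point: the same three steps written as nested calls and as a pipe() chain. The function names and data are hypothetical and chosen only for illustration.

```python
import pandas as pd

def drop_missing(df):
    return df.dropna()

def double_age(df):
    return df.assign(age=df['age'] * 2)

def sort_by_age(df):
    return df.sort_values('age', ascending=False)

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Cara'],
    'age': [25, None, 40]
})

# Nested calls read inside-out: the first step is the innermost call.
nested = sort_by_age(double_age(drop_missing(df)))

# The pipe() chain lists the same steps top to bottom in execution order.
piped = (df.pipe(drop_missing)
           .pipe(double_age)
           .pipe(sort_by_age))

print(piped)
print(nested.equals(piped))  # True
```

Every step here returns a DataFrame. A step that returned None would break the chain at the next .pipe() call.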