What is the output of the following code after applying multiple cleaning steps?
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 30, 22],
    'City': ['New York', 'Los Angeles', 'New York', None],
}
df = pd.DataFrame(data)

# Cleaning steps
result = (
    df.dropna(subset=['Name'])
      .fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
      .reset_index(drop=True)
)
print(result)
Look carefully at how missing values are handled and what the mean age is.
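The code first removes the row where 'Name' is missing. It then fills missing 'Age' values with the mean of the 'Age' column. That mean is computed on the original df, so it still includes the age 30 from the dropped row: (25 + 30 + 22)/3 = 25.6667. Missing 'City' values are replaced with 'Unknown', and the index is reset to 0, 1, 2. The result has three rows: Alice (25.0, New York), Bob (25.6667, Los Angeles), and David (22.0, Unknown).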
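The arithmetic above can be checked with a short sketch. It rebuilds the same frame and compares the fill value with the mean taken over only the surviving rows; the names `fill_value` and `surviving_mean` are illustrative and do not appear in the original snippet.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 30, 22],
    'City': ['New York', 'Los Angeles', 'New York', None],
}
df = pd.DataFrame(data)

# Mean over the ORIGINAL frame: the row with the missing name (Age 30) still counts.
fill_value = df['Age'].mean()

# Mean over only the rows that survive dropna, shown for comparison.
surviving_mean = df.dropna(subset=['Name'])['Age'].mean()

result = (
    df.dropna(subset=['Name'])
      .fillna({'Age': fill_value, 'City': 'Unknown'})
      .reset_index(drop=True)
)

print(round(fill_value, 4))   # 25.6667
print(surviving_mean)         # 23.5
print(result)
```

If the intent were to fill with the mean of the cleaned rows only, the mean would have to be computed after `dropna`, which gives 23.5 here instead of 25.6667.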
After applying the cleaning steps below, how many rows remain in the DataFrame?
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', None],
    'Price': [10, None, 15, 20, 25],
    'Stock': [100, 200, None, 150, 300],
}
df = pd.DataFrame(data)

cleaned = (
    df.dropna(subset=['Product'])
      .fillna({'Price': 0, 'Stock': 0})
      .query('Price > 0 and Stock > 0')
)
print(len(cleaned))
Check which rows are dropped and which remain after filtering.
The row whose Product is None is dropped, leaving A, B, C, and D. Missing Price and Stock values are then filled with 0. The query keeps only rows with Price > 0 and Stock > 0, so B (Price filled with 0) and C (Stock filled with 0) are filtered out. Only A and D are left, so 2 rows remain.
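A step-by-step version of the same pipeline makes the row counts visible at each stage. This is only a sketch; the intermediate names `named` and `filled` are introduced here for illustration.

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'C', 'D', None],
    'Price': [10, None, 15, 20, 25],
    'Stock': [100, 200, None, 150, 300],
}
df = pd.DataFrame(data)

named = df.dropna(subset=['Product'])             # drops the row with no product
filled = named.fillna({'Price': 0, 'Stock': 0})   # B gets Price 0, C gets Stock 0
cleaned = filled.query('Price > 0 and Stock > 0') # removes B and C

print(len(named), len(filled), len(cleaned))      # 4 4 2
print(cleaned['Product'].tolist())                # ['A', 'D']
```

Note that `fillna` never changes the row count; only `dropna` and `query` remove rows. The index is not reset in this pipeline, so the surviving rows keep their original labels 0 and 3.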
What error does the following code raise?
import pandas as pd

data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)

result = df.bfill().dropna(subset=['C'])
Check if the column 'C' exists before dropping rows based on it.
The DataFrame has columns 'A' and 'B' only. Trying to drop rows with missing values in column 'C' raises a KeyError because 'C' does not exist.
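The failure can be reproduced and guarded against with a small sketch. The `wanted` list and the filtering step are assumptions added for illustration; they are one possible guard, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
filled = df.bfill()  # B's gap is filled from the next row; A's trailing NaN stays

# Reproduce the error: 'C' is not a column of the frame.
try:
    filled.dropna(subset=['C'])
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

print(error_name)  # KeyError

# One possible guard: keep only the subset columns that actually exist.
wanted = ['B', 'C']  # hypothetical list of columns to check
present = [col for col in wanted if col in filled.columns]
safe = filled.dropna(subset=present)

print(present, len(safe))  # ['B'] 3
```

Silently skipping a missing column can hide a typo in a column name, so in real cleaning code it may be preferable to let the KeyError surface or to log the skipped names.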
After cleaning and grouping the data, what does the bar plot show?
import pandas as pd
import matplotlib.pyplot as plt

data = {'Category': ['X', 'Y', 'X', 'Z', None], 'Value': [10, 20, None, 15, 5]}
df = pd.DataFrame(data)

cleaned = df.dropna(subset=['Category']).fillna({'Value': 0})
grouped = cleaned.groupby('Category').sum()

ax = grouped.plot(kind='bar', legend=False)
plt.ylabel('Sum of Values')
plt.title('Sum of Values by Category')
plt.close()  # Prevent actual plot display in test
print(grouped)
Look at how missing values in 'Value' are replaced and how grouping sums values.
The row with a missing 'Category' is dropped, which also discards its Value of 5. The missing 'Value' is replaced with 0. Grouping by 'Category' then sums the 'Value' column: 'X' has 10 and the filled 0, for a sum of 10; 'Y' has 20; 'Z' has 15. The bar plot therefore shows three bars with heights X = 10, Y = 20, and Z = 15.
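The bar heights can be confirmed without drawing anything. The sketch below repeats the cleaning and grouping steps and reads the sums into a dictionary; `bar_heights` is an illustrative name, and the plotting calls are left out on purpose.

```python
import pandas as pd

data = {'Category': ['X', 'Y', 'X', 'Z', None], 'Value': [10, 20, None, 15, 5]}
df = pd.DataFrame(data)

# Drop the row with no category, then treat the missing value as 0.
cleaned = df.dropna(subset=['Category']).fillna({'Value': 0})

# One sum per category; these sums are what the bars would display.
grouped = cleaned.groupby('Category').sum()
bar_heights = grouped['Value'].to_dict()

print(bar_heights)  # {'X': 10.0, 'Y': 20.0, 'Z': 15.0}
```

The sums are floats because the original 'Value' column contained a missing value, which makes pandas store the column as float64.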
You have a DataFrame with columns 'ID', 'Score', and 'Grade'. You want to:
- Remove rows where 'ID' is missing.
- Fill missing 'Score' with the median score.
- Replace missing 'Grade' with 'Incomplete'.
- Drop rows where 'Score' is less than 50 after filling.
Which code snippet correctly performs all these steps?
Think about the order of operations: drop missing IDs first, then fill missing values, then filter by score.
Option C first removes rows missing 'ID', then fills missing 'Score' and 'Grade', then filters rows with 'Score' >= 50. Other options either fill before dropping missing IDs or filter before filling, which can cause incorrect results.
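The order of operations described above can be sketched as follows. The answer options are not reproduced here, so this is not necessarily the exact text of Option C; the sample data is hypothetical, and computing the median from the rows that remain after dropping missing IDs is an assumption.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    'ID': [1, 2, None, 4, 5],
    'Score': [80, None, 90, 40, 60],
    'Grade': ['B', None, 'A', 'F', None],
})

# Step 1: remove rows where 'ID' is missing.
with_id = df.dropna(subset=['ID'])

# Assumption: the median is taken over the rows that still have an ID.
median_score = with_id['Score'].median()

result = (
    with_id
    .fillna({'Score': median_score, 'Grade': 'Incomplete'})  # Steps 2 and 3
    .query('Score >= 50')                                    # Step 4
    .reset_index(drop=True)
)

print(median_score)  # 60.0
print(result)
```

With this data, the row without an ID is removed first, the missing score becomes 60.0, the two missing grades become 'Incomplete', and the row with a score of 40 is filtered out, leaving the rows for IDs 1, 2, and 5.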