Consider a dataset with missing values. What is the output of the mean calculation without cleaning?
import pandas as pd
data = {'score': [10, 20, None, 40, 50]}
df = pd.DataFrame(data)
mean_score = df['score'].mean()
print(mean_score)
Think about how pandas handles missing values in calculations.
By default, pandas skips None and NaN values when calculating the mean, so only the four available numbers are averaged: (10 + 20 + 40 + 50) / 4 = 30.0.
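The skipping behavior described above can be made visible with the `skipna` parameter of `Series.mean`. The following is a minimal sketch using the same data; the variable names `mean_default` and `mean_strict` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'score': [10, 20, None, 40, 50]})

# Default behavior: missing values are skipped (skipna=True).
mean_default = df['score'].mean()

# With skipna=False, a single missing value makes the result NaN.
mean_strict = df['score'].mean(skipna=False)

# count() reports how many non-missing values the mean was based on.
n_valid = df['score'].count()

print(mean_default)  # 30.0
print(mean_strict)   # nan
print(n_valid)       # 4
```

Checking `count()` alongside the mean is one way to notice that the average rests on fewer values than the column length suggests.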
Given a DataFrame with duplicate rows, what is the number of unique rows after removing duplicates?
import pandas as pd
data = {'id': [1, 2, 2, 3, 4, 4, 4], 'value': [10, 20, 20, 30, 40, 40, 40]}
df = pd.DataFrame(data)
unique_rows = df.drop_duplicates().shape[0]
print(unique_rows)
Removing duplicates keeps only one instance of each repeated row.
The row with id 2 appears twice and the row with id 4 appears three times. Dropping duplicates keeps the first occurrence of each, leaving one row each for ids 1, 2, 3, and 4, so the output is 4.
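To see which rows count as repeats before removing them, `DataFrame.duplicated` can be inspected first. This is a small sketch on the same data; `dup_mask`, `n_repeats`, and `kept_index` are illustrative names, not part of the original exercise.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 2, 3, 4, 4, 4],
                   'value': [10, 20, 20, 30, 40, 40, 40]})

# True for every row that repeats an earlier row (the first occurrence is False).
dup_mask = df.duplicated()
n_repeats = int(dup_mask.sum())

# drop_duplicates keeps the first occurrence and preserves the original index labels.
deduped = df.drop_duplicates()
kept_index = deduped.index.tolist()

print(n_repeats)    # 3
print(kept_index)   # [0, 1, 3, 4]
print(len(deduped)) # 4
```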
Which statement best explains why systematic data cleaning is important before analysis?
Think about the role of data quality in making good decisions.
Systematic cleaning finds and fixes problems like missing or wrong data, so the analysis results are trustworthy.
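One way to make the "finding" step systematic is to summarize common problems before fixing anything. The sketch below assumes a hypothetical helper named `audit` and a made-up four-row DataFrame; neither comes from the original exercise, and a real audit would likely check more than missing values and duplicates.

```python
import pandas as pd

def audit(df):
    """Hypothetical helper: summarize basic data-quality problems."""
    return {
        'rows': len(df),
        # Number of missing values in each column.
        'missing_per_column': df.isna().sum().to_dict(),
        # Number of rows that repeat an earlier row exactly.
        'duplicate_rows': int(df.duplicated().sum()),
    }

# Illustrative data: one repeated row and one missing score.
df = pd.DataFrame({'id': [1, 2, 2, 3],
                   'score': [10.0, 20.0, 20.0, None]})

report = audit(df)
print(report)
```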
What is the output DataFrame after cleaning missing values and filtering scores above 25?
import pandas as pd
data = {'name': ['Anna', 'Bob', 'Cara', 'Dan'], 'score': [20, None, 30, 40]}
df = pd.DataFrame(data)
df_clean = df.dropna()
df_filtered = df_clean[df_clean['score'] > 25]
print(df_filtered)
First remove rows with missing scores, then keep only scores above 25.
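Bob's row is dropped because his score is missing. Anna's score of 20 is not above 25, so only Cara and Dan remain. Their scores print as 30.0 and 40.0 because the missing value made the column a float column, and the rows keep their original index labels 2 and 3.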
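The same two steps can be written as one chain. This is a sketch of one possible variant, not the exercise's own solution: it restricts `dropna` to the `score` column via `subset`, filters with `query`, and resets the index so the result is numbered from 0.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anna', 'Bob', 'Cara', 'Dan'],
                   'score': [20, None, 30, 40]})

result = (
    df.dropna(subset=['score'])   # drop rows where score is missing
      .query('score > 25')        # keep scores strictly above 25
      .reset_index(drop=True)     # renumber rows 0..n-1
)

print(result['name'].tolist())   # ['Cara', 'Dan']
print(result['score'].tolist())  # [30.0, 40.0]
```

Using `subset` avoids dropping rows that are missing a value in some unrelated column.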
Given this dataset, what is the average score per group if missing values are NOT cleaned?
import pandas as pd
import numpy as np
data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'], 'score': [10, np.nan, 20, 30, np.nan, 50]}
df = pd.DataFrame(data)
grouped_mean = df.groupby('group')['score'].mean()
print(grouped_mean)
Remember how pandas handles missing values in group calculations.
Pandas skips missing values when computing the mean within each group. Group A has one valid score, so its mean is 10.0; group B averages 20 and 30 to get 25.0; group C has one valid score, so its mean is 50.0.
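Because the group means above silently rest on different numbers of values, it can help to report the valid count next to each mean. The following is a minimal sketch on the same data; the names `summary` and `group_sizes` are illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'score': [10, np.nan, 20, 30, np.nan, 50]})

grouped = df.groupby('group')['score']

# mean skips NaN; count gives the number of non-missing scores per group.
summary = grouped.agg(['mean', 'count'])

# size counts all rows per group, including those with a missing score.
group_sizes = grouped.size()

print(summary)
print(group_sizes)
```

Here groups A and C each have two rows but only one valid score, which the `count` column makes explicit.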