Which of the following best explains why we perform exploratory data inspection before starting formal analysis?
Think about what problems might exist in raw data that could mislead your analysis.
Exploratory inspection helps identify data issues such as missing values or outliers early. Catching them before formal analysis lets you clean the data first, so results rest on reliable inputs.
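As a small illustration of how an unnoticed data problem can mislead an analysis, the sketch below compares the mean and median of a list of ages in which one value is a hypothetical data-entry error (280 instead of 28). The values are made up for this example and do not come from any dataset in the questions.

```python
from statistics import mean, median

# Hypothetical ages; 280 stands in for a data-entry error (assumed, for illustration).
ages = [25, 30, 22, 40, 280]

avg = mean(ages)    # pulled far upward by the single bad value
mid = median(ages)  # barely affected by it

print(f"mean = {avg}, median = {mid}")  # mean = 79.4, median = 30
```

A mean of 79.4 for a group whose typical age is around 30 is the kind of distortion that a quick look at the raw values would likely catch.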
Given the dataset below, what is the output of df.describe()?
import pandas as pd
data = {'age': [25, 30, 22, 40, 28], 'income': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)
print(df.describe())
Check the count and mean values carefully for both columns.
The describe() method reports count, mean, std, min, the 25%, 50%, and 75% quartiles, and max for each numeric column. The count is 5 for both columns because each has 5 non-null values; the means are 29.0 for age and 57400.0 for income.
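The count and mean rows can be checked by hand. The sketch below rebuilds the same DataFrame and compares the describe() output with the arithmetic done directly; the variable names `summary`, `age_mean`, and `income_mean` are introduced here for illustration only.

```python
import pandas as pd

data = {'age': [25, 30, 22, 40, 28], 'income': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)

summary = df.describe()

# Manual check of the means: sum of the values divided by the number of rows.
age_mean = sum(data['age']) / len(data['age'])           # 145 / 5
income_mean = sum(data['income']) / len(data['income'])  # 287000 / 5

print(summary.loc[['count', 'mean']])
print(age_mean, income_mean)  # 29.0 57400.0
```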
Which boxplot below correctly shows an outlier in the dataset [10, 12, 12, 13, 14, 15, 100]?
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = [10, 12, 12, 13, 14, 15, 100]
df = pd.DataFrame({'values': data})
sns.boxplot(x='values', data=df)
plt.show()
Outliers appear as points outside the whiskers in a boxplot.
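The value 100 is far larger than the rest, so it is drawn as a single outlier point beyond the upper whisker. Because the code passes x='values', the boxplot is horizontal and the point sits to the right of the whisker.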
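Why 100 falls outside the whiskers can be worked out numerically. The sketch below assumes the conventional 1.5 × IQR rule (the default whisker setting in matplotlib and seaborn) and uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points. The fence variable names are illustrative.

```python
from statistics import quantiles

data = [10, 12, 12, 13, 14, 15, 100]

# Quartiles with linear interpolation between neighbouring data points.
q1, _median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: points more than 1.5 * IQR beyond the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 12.0 14.5 2.5
print(lower_fence, upper_fence)  # 8.25 18.25
print(outliers)                  # [100]
```

Every value except 100 lies between 8.25 and 18.25, so only 100 is plotted as a separate point.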
What error does the following code produce?
import pandas as pd
data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)
print(df.isnull().sum())
print(df.missing())
Check if missing() is a valid pandas DataFrame method.
The first print succeeds, but df.missing() raises an AttributeError because pandas DataFrames have no missing() method. The correct method for checking missing values is isnull() (or its alias isna()).
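The behaviour can be reproduced safely by catching the exception. The sketch below runs the valid `isnull().sum()` call and then wraps the invalid `missing()` call in a try/except; the names `missing_counts` and `error_message` are introduced here for illustration.

```python
import pandas as pd

data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)

# Valid: count the missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)  # A has 1 missing value, B has 1

# Invalid: DataFrame has no missing() method, so attribute lookup fails.
error_message = None
try:
    df.missing()
except AttributeError as exc:
    error_message = str(exc)
    print(f"AttributeError: {error_message}")
```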
You have a dataset with many missing values in some columns and a few extreme outliers in others. After exploratory inspection, what is the best next step?
Think about how missing data and outliers affect analysis results.
Cleaning missing data and handling outliers ensures more accurate and reliable analysis results.
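One possible cleaning pass is sketched below. It uses a small hypothetical DataFrame (the values are assumed for illustration), fills missing incomes with the column median, and caps extreme ages at the 1.5 × IQR fences. These are common choices rather than the only correct ones; columns with many missing values may call for dropping the column or a more careful imputation instead.

```python
import pandas as pd

# Hypothetical data: one missing income and one extreme age (assumed for illustration).
df = pd.DataFrame({
    'age': [25, 30, 22, 40, 280],
    'income': [50000, None, 45000, 80000, 52000],
})

# Fill missing incomes with the column median (one common choice among several).
df['income'] = df['income'].fillna(df['income'].median())

# Cap ages at the 1.5 * IQR fences instead of dropping the rows.
q1 = df['age'].quantile(0.25)
q3 = df['age'].quantile(0.75)
iqr = q3 - q1
df['age'] = df['age'].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

print(df)
```

Capping keeps every row in the dataset; whether to cap, drop, or investigate an outlier depends on whether it is an error or a genuine extreme value.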