Why exploratory inspection guides analysis in Data Analysis Python - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why is exploratory data inspection important before analysis?

Which of the following best explains why we perform exploratory data inspection before starting formal analysis?

A. To finalize conclusions without looking at the data distribution.
B. To understand data quality, spot missing values, and detect unusual patterns that affect analysis.
C. To immediately apply machine learning models without checking data.
D. To skip data cleaning and jump directly to visualization.
💡 Hint

Think about what problems might exist in raw data that could mislead your analysis.
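A first inspection pass might look like the sketch below. The toy data is hypothetical (it is not part of the challenge) and is built to contain one missing value and one implausible age, the kind of problems a quick look is meant to surface.

```python
import pandas as pd

# Hypothetical toy data: one missing age and one implausible age (280).
df = pd.DataFrame({
    "age": [25, 30, None, 40, 280],
    "income": [50000, 60000, 45000, 80000, 52000],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Summary statistics: count, mean, std, min, quartiles, max.
summary = df.describe()

print(df.dtypes)
print(missing_per_column)
print(summary)
```

The `count` row of the summary excludes missing values, and the `max` row exposes the implausible age, so both problems show up before any modeling starts.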

data_output
intermediate
Output of basic exploratory commands on a dataset

Given the dataset below, what is the output of df.describe()?

import pandas as pd
data = {'age': [25, 30, 22, 40, 28], 'income': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)
print(df.describe())
A
             age        income
count   5.000000      5.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
B
             age        income
count   4.000000      4.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
C
             age        income
count   5.000000      5.000000
mean   30.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
D
             age        income
count   5.000000      5.000000
mean   29.000000  57000.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
💡 Hint

Check the count and mean values carefully for both columns.
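One way to follow this hint is to recompute the two rows directly, as in the sketch below. It reuses the dataset from the question; the variable names are illustrative.

```python
import pandas as pd

data = {'age': [25, 30, 22, 40, 28], 'income': [50000, 60000, 45000, 80000, 52000]}
df = pd.DataFrame(data)

# The count row reports the number of non-missing values in each column.
age_count = df['age'].count()
income_count = df['income'].count()

# The mean row is the arithmetic mean of each column.
age_mean = df['age'].mean()
income_mean = df['income'].mean()

print(age_count, income_count, age_mean, income_mean)
```

Comparing these four numbers against each option is enough to rule out the distractors.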

visualization
advanced
Identifying outliers with boxplot visualization

Which boxplot below correctly shows an outlier in the dataset [10, 12, 12, 13, 14, 15, 100]?

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = [10, 12, 12, 13, 14, 15, 100]
df = pd.DataFrame({'values': data})
sns.boxplot(x='values', data=df)
plt.show()
A. Boxplot with whiskers extending beyond 100.
B. Boxplot showing multiple outliers below 10.
C. Boxplot with a single point at 100, far beyond the upper whisker, indicating an outlier.
D. Boxplot with no points outside whiskers, all data within range 10 to 100.
💡 Hint

Outliers appear as points outside the whiskers in a boxplot.
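The whisker limits can also be checked numerically. The sketch below assumes the common 1.5 × IQR rule, which is the default whisker setting in matplotlib and seaborn; the variable names are illustrative.

```python
import numpy as np

values = np.array([10, 12, 12, 13, 14, 15, 100])

# Quartiles with linear interpolation (NumPy's default).
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outlier markers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, q3, lower_fence, upper_fence, outliers)
```

Under this rule the fences fall at 8.25 and 18.25, so 10 stays inside the whiskers and only 100 is flagged.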

🔧 Debug
advanced
Error in inspecting missing data with pandas

What error does the following code produce?

import pandas as pd
data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)
print(df.isnull().sum())
print(df.missing())
A. No error, prints counts of missing values
B. TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
C. KeyError: 'missing'
D. AttributeError: 'DataFrame' object has no attribute 'missing'
💡 Hint

Check if missing() is a valid pandas DataFrame method.
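A quick way to test whether a method exists is `hasattr`, as in this sketch. It uses the same DataFrame as the question and `isna()`, which pandas provides as an alias for `isnull()`.

```python
import pandas as pd

data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)

# Valid ways to count missing values per column.
missing_counts = df.isna().sum()

# Check for the attribute instead of calling it and hitting an exception.
has_missing_method = hasattr(df, 'missing')

print(missing_counts)
print(has_missing_method)
```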

🚀 Application
expert
Choosing next steps after exploratory inspection

You have a dataset with many missing values in some columns and a few extreme outliers in others. After exploratory inspection, what is the best next step?

A. Remove or impute missing values and consider transforming or removing outliers before analysis.
B. Ignore missing values and outliers and proceed with analysis as is.
C. Only visualize data without cleaning or transformation.
D. Delete entire dataset and start over with new data.
💡 Hint

Think about how missing data and outliers affect analysis results.
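As a rough illustration of one possible cleaning pass, the sketch below imputes a missing value with the column median and clips an extreme value to the 1.5 × IQR fences. The data and both choices (median imputation, clipping) are assumptions made for the example; the right treatment depends on the dataset and the analysis goal.

```python
import pandas as pd

# Hypothetical data: one missing age and one extreme income.
df = pd.DataFrame({
    "age": [25, None, 22, 40, 28],
    "income": [50000, 60000, 45000, 1_000_000, 52000],
})

cleaned = df.copy()

# Impute the missing age with the median of the observed ages.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

# Clip income to the 1.5 * IQR fences instead of dropping the row.
q1 = cleaned["income"].quantile(0.25)
q3 = cleaned["income"].quantile(0.75)
iqr = q3 - q1
cleaned["income"] = cleaned["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

print(cleaned)
```

Working on a copy keeps the raw data available, which makes it easy to compare results with and without the cleaning step.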