Recall & Review

beginner

What is a data distribution in machine learning?

A data distribution describes how data points are spread across different values. It shows which values are common, which are rare, and which are unusual.
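A simple way to see a distribution is to count how many values fall into each range (bin). The sketch below builds a text histogram with only the standard library; the `ages` data and the bin width of 10 are made-up assumptions for illustration.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    # Sort by bin start so the output reads left to right like a histogram.
    return dict(sorted(counts.items()))

# Made-up example data: ages of 12 customers.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 44, 52, 95]
bin_width = 10

distribution = bin_counts(ages, bin_width)
for start, count in distribution.items():
    # One '#' per data point gives a quick text histogram (empty bins are skipped).
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

The 30s bin is the most common, the 50s are rare, and the lone value in the 90s stands out as unusual.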

beginner

What is an outlier in a dataset?

An outlier is a data point that lies far from most of the other points, either much higher or much lower than the rest. It may be a measurement or data-entry error or a genuine rare event, and either way it can distort what a model learns.
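One way to make "far from most other points" concrete is to score each point by its distance from the center of the data. The sketch below uses the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD), with the commonly cited cut-off of 3.5; the method, the threshold, and the `ages` data are assumptions for illustration. The median is used instead of the mean because a single extreme value can inflate the mean and standard deviation enough to hide itself.

```python
import statistics

def modified_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for each value (MAD = median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined for this data")
    return [0.6745 * (v - med) / mad for v in values]

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the (assumed) threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]

# Made-up example data: ages of 12 customers.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 44, 52, 95]
print(find_outliers(ages))  # [95]
```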

intermediate

Why is it important to detect outliers before training a model?

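Outliers can mislead a model into learning wrong patterns: a single extreme value can pull an average or a fitted line far away from where most of the data lies. Detecting them before training lets you decide whether each one is an error to fix or a genuine rare case to keep, which improves model accuracy and reliability.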
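To see how one point can mislead a model, the sketch below fits a straight line by ordinary least squares to data that follow y = 2x exactly, then refits after replacing one value with a hypothetical data-entry error. The data and the choice of a linear model are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5, 6]
clean_ys = [2, 4, 6, 8, 10, 12]   # follows y = 2x exactly
noisy_ys = [2, 4, 6, 8, 10, 60]   # last value is a hypothetical data-entry error

print(fit_line(xs, clean_ys))  # (2.0, 0.0)
print(fit_line(xs, noisy_ys))  # slope ≈ 8.86, intercept ≈ -16
```

A single bad point more than quadruples the learned slope, so predictions for every other input become wrong as well.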

beginner

Name two common ways to visualize data distributions.

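Histograms and box plots are two common choices. A histogram shows how often values fall into each range (bin); a box plot shows the median, the spread of the middle half of the data, and individual points lying far outside that range, which are flagged as possible outliers.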
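The numbers a box plot draws can be computed directly. The sketch below assumes the common convention that whiskers reach at most 1.5 × IQR (interquartile range) beyond the quartiles, which is, for example, matplotlib's default; the `ages` data are made up. Quartile formulas differ slightly between tools, so another library may report somewhat different Q1 and Q3 values for the same data.

```python
import statistics

def box_plot_stats(values, whis=1.5):
    """Summary a box plot draws: the box (Q1, median, Q3), whiskers, and flagged points."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are drawn individually as possible outliers.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Made-up example data: ages of 12 customers.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 44, 52, 95]
print(box_plot_stats(ages))  # whiskers span 22 to 52; 95 is flagged as an outlier
```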

intermediate

How can you handle outliers in your data?

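You can remove them, transform them (for example, cap extreme values or apply a log transform), or use models that are less sensitive to outliers, such as tree-based models. The right choice depends on the problem and on whether the outliers are errors or real observations.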
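The sketch below shows three of these options side by side: removing points, capping them, and applying a log transform. It assumes outliers are detected with 1.5 × IQR fences, a common convention rather than a rule, and the `ages` data and helper names are hypothetical.

```python
import math
import statistics

def iqr_fences(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, low, high):
    # Option 1: drop points that fall outside the fences.
    return [v for v in values if low <= v <= high]

def cap_outliers(values, low, high):
    # Option 2: clamp extreme points to the nearest fence instead of dropping them.
    return [min(max(v, low), high) for v in values]

def log_transform(values):
    # Option 3: log1p(x) = ln(1 + x) shrinks large values more than small ones.
    # Only valid for x > -1; the ages below are all positive.
    return [math.log1p(v) for v in values]

# Made-up example data: ages of 12 customers.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 44, 52, 95]
low, high = iqr_fences(ages)

print(remove_outliers(ages, low, high))  # 95 is dropped
print(cap_outliers(ages, low, high))     # 95 is replaced by the upper fence
print([round(x, 2) for x in log_transform(ages)])
```

Removing loses information, capping keeps every row but bounds its influence, and a log transform keeps the ordering of values while compressing the long upper tail.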

What does a data distribution tell us?

A. The exact number of data points

B. The model's accuracy

C. Only the average value

D. How data points are spread across values

Which of these is an example of an outlier?

A. A data point close to the average

B. A data point far from most others

C. A missing data point

D. A data point repeated many times

Which visualization is best suited to spotting outliers?

A. Box plot

B. Line chart

C. Histogram

D. Pie chart

Why might outliers be a problem for machine learning models?

A. They always improve model accuracy

B. They make data easier to understand

C. They can mislead the model into learning wrong patterns

D. They reduce the size of the dataset

What is one way to handle outliers?

A. Remove or transform them

B. Always ignore them

C. Add more outliers

D. Always replace them with zeros

Explain what a data distribution is and why it matters in machine learning.

Describe what outliers are and how they can affect machine learning models.