Recall & Review
beginner
What is a data distribution in machine learning?
A data distribution describes how data points are spread across different values. It shows which values are common, which are rare, and which are unusual.
beginner
What is an outlier in a dataset?
An outlier is a data point that differs markedly from most other points in the dataset. It can be much higher or lower than the rest, and it can distort what a model learns.
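One way to make "very different from most other points" concrete is the interquartile range (IQR) rule. The sketch below is only an illustration: the function name `iqr_outliers`, the sample `ages` list, and the multiplier `k=1.5` (a common convention, not something this card prescribes) are all assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k=1.5 is a conventional default, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: most ages cluster in the 20s and 30s.
ages = [22, 25, 27, 28, 30, 31, 33, 35, 120]
print(iqr_outliers(ages))
```

Other rules (for example, a z-score threshold) can flag different points, so the result depends on the rule you pick.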
intermediate
Why is it important to detect outliers before training a model?
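Outliers can mislead a model: a few extreme values can pull its estimates away from the pattern that most of the data follows. Detecting them first lets you decide how to treat them, which helps improve model accuracy and reliability.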
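A small sketch of this effect, using made-up numbers: the mean is the constant prediction that minimizes squared error, so it stands in here for a very simple model. The `salaries` values are hypothetical.

```python
import statistics

# Hypothetical salaries in thousands; one extreme value is then added.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [1000]

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)

# The mean jumps sharply; the median barely moves.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The single extreme value drags the mean far from the typical salary, while the median stays close to the bulk of the data.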
beginner
Name two common ways to visualize data distributions.
Histograms and box plots are two common choices. A histogram shows how often values fall into each range, and a box plot shows the median, the spread, and any outliers.
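As a text-only stand-in for the two plots, the sketch below computes the numbers behind them: bin counts for a histogram, and the quartiles and extremes that a box plot summarizes. The helper names, the bin width, and the `scores` data are assumptions for illustration; a plotting library would normally draw these for you.

```python
import statistics
from collections import Counter

def text_histogram(values, bin_width):
    """Count values per bin of the given width; return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for the values."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)

# Hypothetical test scores with one unusually high value.
scores = [52, 55, 58, 61, 63, 64, 67, 68, 71, 95]

for start, count in text_histogram(scores, 10).items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")  # one '#' per value
print(five_number_summary(scores))
```

In the printed bars, the lone value in the 90s bin sits apart from the rest, which is the same gap a histogram or box plot would make visible.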
intermediate
How can you handle outliers in your data?
You can remove them, transform them (for example, by capping extreme values or applying a log transform), or use models that are less sensitive to outliers. The right choice depends on the problem and the data.
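The first two options can be sketched as follows. This is a minimal illustration, assuming the IQR rule with `k=1.5` to decide what counts as an outlier; the function names and the `incomes` data are hypothetical. The third option, choosing a more robust model, is not shown.

```python
import math
import statistics

def iqr_bounds(values, k=1.5):
    """Lower and upper cut-offs under the IQR rule (k=1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Option 1: drop points outside the bounds."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if lo <= v <= hi]

def clip_outliers(values, k=1.5):
    """Option 2a: cap points at the bounds instead of dropping them."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]

def log_transform(values):
    """Option 2b: compress large values; log1p requires values > -1."""
    return [math.log1p(v) for v in values]

# Hypothetical incomes in thousands, with one extreme value.
incomes = [30, 32, 35, 38, 40, 41, 45, 48, 500]
print(remove_outliers(incomes))
print(clip_outliers(incomes))
print([round(x, 2) for x in log_transform(incomes)])
```

Removing discards information, capping keeps the row but changes its value, and the log transform keeps the ordering while shrinking the gap, so each suits different situations.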
What does a data distribution tell us?
Which of these is an example of an outlier?
Which visualization is best to spot outliers?
Why might outliers be a problem for machine learning models?
What is one way to handle outliers?
Explain what a data distribution is and why it matters in machine learning.
Describe what outliers are and how they can affect machine learning models.