
Why Data Distributions and Outliers Matter in ML with Python: Purpose & Use Cases

The Big Idea

What if you could instantly spot the hidden surprises in your data without endless searching?

The Scenario

Imagine you have a big list of numbers showing daily temperatures for a year. You want to understand the usual weather and spot any strange days that were way hotter or colder.

The Problem

Finding these unusual days by checking each number one by one is slow and error-prone. Without a view of the whole dataset, you might miss genuinely strange days or flag normal days as strange.

The Solution

By learning about data distributions and outliers, you can quickly see the overall pattern of your data and automatically find those unusual days. This helps you understand your data better and avoid mistakes.

Before vs After
Before
# Hand-picked limits: they ignore what is typical for this dataset
for temp in temps:
    if temp > 100 or temp < 0:
        print('Unusual temperature:', temp)
After
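# Average temperature over the whole dataset
mean = sum(temps) / len(temps)
# Population standard deviation: the typical distance from the mean
std = (sum((x - mean) ** 2 for x in temps) / len(temps)) ** 0.5
# Flag values more than 2 standard deviations away from the mean
outliers = [x for x in temps if abs(x - mean) > 2 * std]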
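To try the "After" approach end to end, here is a minimal runnable sketch. The `temps` values are made up for illustration and are not from any real dataset. It uses Python's standard `statistics` module, where `pstdev` computes the same population standard deviation as the formula above.

```python
import statistics

# Hypothetical daily temperatures; the values are invented for this example.
temps = [68, 70, 72, 71, 69, 73, 70, 72, 71, 69, 70, 110]

mean = statistics.fmean(temps)   # average of the data
std = statistics.pstdev(temps)   # population standard deviation

# Keep only the values more than 2 standard deviations from the mean.
outliers = [x for x in temps if abs(x - mean) > 2 * std]
print(outliers)  # prints [110]
```

Note that the threshold comes from the data itself, so the same code would work for temperatures recorded in a different unit or climate without changing any numbers.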
What It Enables

It lets you quickly understand your data's normal range and spot unusual points that might need special attention.
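One way to make the "normal range" concrete is to turn it into a pair of bounds and check new values against them. The sketch below assumes the same "mean plus or minus 2 standard deviations" rule used earlier. The helper name `normal_range`, its parameter `k`, and the sample values are hypothetical and chosen only for illustration.

```python
import statistics

def normal_range(values, k=2):
    """Return (low, high) bounds: mean minus/plus k population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean - k * std, mean + k * std

# Hypothetical sample data, chosen so the numbers are easy to check by hand
# (mean is 5, population standard deviation is 2).
values = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = normal_range(values)

# A new reading is "unusual" if it falls outside the bounds.
new_reading = 15
is_unusual = not (low <= new_reading <= high)
print(low, high, is_unusual)
```

A smaller `k` flags more points as unusual and a larger `k` flags fewer, so the right value depends on how cautious you need to be.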

Real Life Example

Doctors use this to find unusual heart rates or blood test results that could mean a patient needs extra care.

Key Takeaways

Manual checks miss the big picture and are slow.

Data distributions show the normal pattern in data.

Outlier detection flags unusual data points automatically.
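To see what "the normal pattern" looks like, you can count how many values fall into each range and print a simple text histogram. This is a rough sketch using only the standard library. The `temps` values and the `bin_width` of 10 are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical daily temperatures; the values are invented for this example.
temps = [68, 70, 72, 71, 69, 73, 70, 72, 71, 69, 70, 110]
bin_width = 10  # each bar covers a 10-degree range

# Map each temperature to the start of its bin (e.g. 73 -> 70) and count.
counts = Counter((t // bin_width) * bin_width for t in temps)

# Print one row per bin, with one '#' per value in that bin.
for start in sorted(counts):
    end = start + bin_width - 1
    print(f"{start:>3}-{end:<3} {'#' * counts[start]}")
```

Most values pile up in one or two neighboring bins, which is the distribution's normal pattern, while the lone value in a far-away bin stands out as a likely outlier.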