Concept Flow - KDE overlay concept

Start with raw data points

↓

Calculate KDE for dataset 1

↓

Calculate KDE for dataset 2

↓

Plot KDE curves on same graph

↓

Visualize overlapping density

↓

Interpret density peaks and overlaps

We start with raw data, compute KDE for each dataset, then plot them together to see how their densities overlap.

Execution Sample

Matplotlib

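import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Two sample datasets: 100 draws each from N(0, 1) and N(2, 1)
x1 = np.random.normal(0, 1, 100)
x2 = np.random.normal(2, 1, 100)

# Fit a Gaussian kernel density estimate to each dataset
kde1 = gaussian_kde(x1)
kde2 = gaussian_kde(x2)

# Evaluate both KDEs on the same grid so the curves share one x-axis
x = np.linspace(-4, 6, 200)
plt.plot(x, kde1(x), label='Dataset 1')
plt.plot(x, kde2(x), label='Dataset 2')
plt.legend()
plt.show()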

This code creates two datasets, computes their KDEs, and plots both KDE curves on the same graph to show their density overlap.
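
The "density overlap" mentioned above can also be put into a number. The following is a minimal sketch, not part of the original sample: it takes the pointwise minimum of the two curves as the shared region and approximates its area with the trapezoidal rule. The fixed seed and the names `d1`, `d2`, `shared`, and `overlap_area` are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumption: a fixed seed, only so that repeated runs give the same numbers
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 100)
x2 = rng.normal(2, 1, 100)

# Same grid as in the sample above
x = np.linspace(-4, 6, 200)
d1 = gaussian_kde(x1)(x)
d2 = gaussian_kde(x2)(x)

# The lower of the two curves at each grid point traces the shared region
shared = np.minimum(d1, d2)

# Trapezoidal rule on the evenly spaced grid approximates the shared area
dx = x[1] - x[0]
overlap_area = float(np.sum((shared[:-1] + shared[1:]) / 2) * dx)
print(f"approximate overlap area: {overlap_area:.3f}")
```

An area near 0 would suggest the two densities barely share any values, while an area near 1 would suggest they are almost identical. To shade the region on the plot, `shared` can be passed to `plt.fill_between(x, shared, alpha=0.3)`.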

Execution Table

Step	Action	Data/Variable	Result/Output
1	Generate dataset 1	x1	100 points from N(0,1)
2	Generate dataset 2	x2	100 points from N(2,1)
3	Compute KDE for dataset 1	kde1 = gaussian_kde(x1)	kde1 is a KDE function
4	Compute KDE for dataset 2	kde2 = gaussian_kde(x2)	kde2 is a KDE function
5	Create x values for plotting	x = np.linspace(-4,6,200)	Array of 200 points from -4 to 6
6	Evaluate kde1 on x	kde1(x)	Density values for dataset 1
7	Evaluate kde2 on x	kde2(x)	Density values for dataset 2
8	Plot kde1 and kde2	plt.plot	Two KDE curves overlaid
9	Show plot	plt.show()	Visual graph with KDE overlays
10	End	-	Execution complete

💡 All steps completed; KDE overlay plot displayed.

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4	After Step 5	After Step 6	After Step 7	Final
x1	None	100 points N(0,1)	100 points N(0,1)	100 points N(0,1)	100 points N(0,1)	100 points N(0,1)	100 points N(0,1)	100 points N(0,1)	100 points N(0,1)
x2	None	None	100 points N(2,1)	100 points N(2,1)	100 points N(2,1)	100 points N(2,1)	100 points N(2,1)	100 points N(2,1)	100 points N(2,1)
kde1	None	None	None	KDE function for x1	KDE function for x1	KDE function for x1	KDE function for x1	KDE function for x1	KDE function for x1
kde2	None	None	None	None	KDE function for x2	KDE function for x2	KDE function for x2	KDE function for x2	KDE function for x2
x	None	None	None	None	None	Array from -4 to 6 (200 pts)	Array from -4 to 6 (200 pts)	Array from -4 to 6 (200 pts)	Array from -4 to 6 (200 pts)
kde1(x)	None	None	None	None	None	None	Density values (dataset 1)	Density values (dataset 1)	Density values (dataset 1)
kde2(x)	None	None	None	None	None	None	None	Density values (dataset 2)	Density values (dataset 2)
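
The states in the tracker can be checked directly by inspecting the variables. The sketch below does this for dataset 1 only; the fixed seed and the name `density1` are assumptions added for illustration, and plotting is left out so that only the intermediate values are shown.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumption: a fixed seed, only for reproducibility
rng = np.random.default_rng(0)

x1 = rng.normal(0, 1, 100)       # after step 1: 100 raw points
kde1 = gaussian_kde(x1)          # after step 3: a fitted, callable estimator
x = np.linspace(-4, 6, 200)      # after step 5: the evaluation grid
density1 = kde1(x)               # after step 6: one density value per grid point

print(type(kde1).__name__, callable(kde1))
print(x1.shape, x.shape, density1.shape)
print(x[0], x[-1])
print(bool(np.all(density1 >= 0)))
```

Note that `density1` has the length of the grid `x`, not the length of the raw data `x1`.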

Key Moments - 3 Insights

Why do we use the same x values to evaluate both KDEs?

What does the KDE function represent after computation?

Why do the KDE curves overlap instead of being separate?
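
One way to see what the KDE function represents is to rebuild it by hand. The sketch below assumes the default settings of `gaussian_kde` (Gaussian kernels, equal weights, bandwidth taken from the fitted object's `factor` attribute) and averages one Gaussian bump per data point. The names `h`, `bumps`, and `manual_density`, and the fixed seed, are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumption: a fixed seed, only for reproducibility
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 100)
kde1 = gaussian_kde(x1)
x = np.linspace(-4, 6, 200)

# Bandwidth: sample standard deviation scaled by the factor SciPy selected
h = kde1.factor * x1.std(ddof=1)

# One Gaussian bump per data point, evaluated at every grid point: shape (200, 100)
z = (x[:, None] - x1[None, :]) / h
bumps = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))

# The density estimate is the average of all bumps at each grid point
manual_density = bumps.mean(axis=1)
print(np.allclose(manual_density, kde1(x)))
```

Seen this way, `kde1` is a smooth function built from the data, which is why it can be evaluated on any grid, including the one shared with `kde2`.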

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table at step 5. What does the variable x represent?

A. The KDE function for dataset 1

B. The original dataset 1 points

C. An array of points from -4 to 6 used for plotting KDEs

D. The density values of dataset 2

Concept Snapshot

KDE overlay concept:
- Start with two datasets
- Compute KDE for each using gaussian_kde
- Use same x range to evaluate densities
- Plot both KDE curves on one graph
- Where the curves overlap, both datasets have density over the same range of values
- Useful to compare distributions visually
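
The snapshot steps can be wrapped into a small reusable helper. This is a sketch under stated assumptions rather than a library function: `kde_curves`, `num`, and `pad` are hypothetical names, and the grid is derived from the data range instead of being hard-coded to -4 and 6.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_curves(datasets, num=200, pad=2.0):
    """Evaluate one KDE per dataset on a single shared grid.

    Hypothetical helper: `num` is the number of grid points and `pad`
    extends the grid past the data (in data units) so tails are visible.
    """
    datasets = [np.asarray(d, dtype=float) for d in datasets]
    lo = min(d.min() for d in datasets)
    hi = max(d.max() for d in datasets)
    x = np.linspace(lo - pad, hi + pad, num)
    # Every KDE is evaluated on the same x, so the curves are comparable
    densities = [gaussian_kde(d)(x) for d in datasets]
    return x, densities


# Assumption: a fixed seed, only for reproducibility
rng = np.random.default_rng(0)
x, (d1, d2) = kde_curves([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
print(x.shape, d1.shape, d2.shape)
```

Each returned curve can then be drawn with `plt.plot(x, d, label=...)`, exactly as in the execution sample.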

Full Transcript

This visual execution traces the KDE overlay concept. We begin by generating two datasets from normal distributions. Then, we compute KDE functions for each dataset. Next, we create a common x array to evaluate both KDEs, ensuring they align on the same axis. We evaluate the KDE functions on this x array to get density values. Finally, we plot both KDE curves on the same graph to visualize where their densities overlap. This helps us compare the shape and spread of the two datasets visually. Key moments include understanding why the same x values are used for both KDEs, what the KDE functions represent, and why the curves overlap on the plot. The quiz checks understanding of variable roles and the effect of changing the x range.
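
The effect of changing the x range, mentioned at the end of the transcript, can be illustrated by measuring how much of the estimated density a given range covers. The sketch below uses `integrate_box_1d` on the fitted KDE; the narrow range from -1 to 1 and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumption: a fixed seed, only for reproducibility
rng = np.random.default_rng(0)
x2 = rng.normal(2, 1, 100)
kde2 = gaussian_kde(x2)

# Share of the estimated density that falls inside each plotting range
wide = kde2.integrate_box_1d(-4, 6)     # the range used in the sample
narrow = kde2.integrate_box_1d(-1, 1)   # a range that misses the peak near 2

print(f"covered by [-4, 6]: {wide:.3f}")
print(f"covered by [-1, 1]: {narrow:.3f}")
```

A range that covers only a small share of the density would cut off most of the curve for dataset 2, so the plot would understate how the two datasets compare.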