0
0
DSA C++programming~3 mins

Why Median of Data Stream Using Two Heaps in DSA C++?

Choose your learning style9 modes available
The Big Idea

What if you could instantly know the middle value of a never-ending list without sorting it all over and over?

The Scenario

Imagine you are tracking the middle value of a long list of numbers that keep coming in one by one, like scores in a game or temperatures throughout the day.

You try to write down all numbers and sort them every time a new number arrives to find the middle one.

The Problem

Sorting the entire list every time a new number arrives is very slow and tiring, especially if the list grows very large.

It also wastes a lot of time and effort, making it hard to get the middle value quickly.

The Solution

Using two heaps (special trees that keep numbers in order), one to keep the smaller half and one for the larger half, lets you quickly find the middle number without sorting everything again.

This way, you only adjust a few numbers each time, making the process fast and easy.

Before vs After
Before
vector<int> numbers;
// On new number:
numbers.push_back(new_number);
sort(numbers.begin(), numbers.end());
int median = numbers[numbers.size() / 2];
After
priority_queue<int> maxHeap; // smaller half
priority_queue<int, vector<int>, greater<int>> minHeap; // larger half
// On new number:
if (maxHeap.empty() || new_number <= maxHeap.top()) maxHeap.push(new_number);
else minHeap.push(new_number);
// Balance heaps
if (maxHeap.size() > minHeap.size() + 1) {
    minHeap.push(maxHeap.top());
    maxHeap.pop();
} else if (minHeap.size() > maxHeap.size()) {
    maxHeap.push(minHeap.top());
    minHeap.pop();
}
// Median is top of maxHeap or average of tops
What It Enables

This method makes it possible to find the middle value instantly as new numbers keep coming in, no matter how many numbers there are.

Real Life Example

Streaming services use this to show the median rating of a movie as users keep rating it live, without waiting for all ratings to finish.

Key Takeaways

Manually sorting every time is slow and inefficient.

Two heaps keep track of smaller and larger halves efficiently.

Median can be found quickly as data streams in.