Overview - Peak finding (find_peaks)

What is it?

Peak finding is the process of identifying points in data where values reach a local maximum, called peaks. SciPy provides a function called find_peaks (in the scipy.signal module) that locates these peaks in one-dimensional data arrays. This is useful for analyzing signals, detecting important features, or summarizing data patterns. Peaks are points that stand out compared to their neighbors.

Why it matters

Without peak finding, it would be hard to automatically detect important events or features in data like heartbeats in ECG signals, sales spikes in business data, or sound beats in audio. Manually finding peaks is slow and error-prone. Peak finding automates this, enabling faster, more accurate analysis and decision-making in science, engineering, and business.

Where it fits

Before learning peak finding, you should understand basic Python programming and how to work with arrays or lists of numbers. Familiarity with numpy arrays helps. After mastering peak finding, you can explore signal processing techniques, feature extraction, and time series analysis to build more advanced data science skills.

Mental Model

Core Idea

Peak finding locates points in data that are higher than their neighbors, revealing important local maxima.

Think of it like...

Imagine walking along a mountain trail and noting every hilltop you reach that is higher than the points just before and after you. Each hilltop is like a peak in your data.

Data array:  1  3  7  6  4  5  8  7  6  2
Peaks:             ▲           ▲

Build-Up - 7 Steps

1

Foundation: Understanding local maxima in data

Concept: Introduce what a peak or local maximum means in a sequence of numbers.

A local maximum is a point in a list where the value is greater than the values immediately before and after it. For example, in the list [1, 3, 2], the number 3 is a local maximum because it is higher than 1 and 2.

Result

You can identify simple peaks by comparing each number to its neighbors.

Understanding local maxima is the foundation for detecting peaks automatically in data.
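The neighbor comparison described in this step can be sketched in a few lines of plain Python. The helper name local_maxima is hypothetical and only illustrates the idea; it is not part of SciPy.

```python
def local_maxima(values):
    """Return indices of points strictly greater than both immediate neighbors."""
    peaks = []
    # The first and last points have only one neighbor, so they are skipped.
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append(i)
    return peaks


print(local_maxima([1, 3, 2]))                        # [1]
print(local_maxima([1, 3, 7, 6, 4, 5, 8, 7, 6, 2]))  # [2, 6]
```

Note that the result is a list of positions, not values: index 1 in the first list is where the value 3 sits.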

2

Foundation: Working with numpy arrays

3

Intermediate: Basic usage of scipy find_peaks

4

Intermediate: Using parameters to refine peak detection

5

Intermediate: Interpreting peak properties output

6

Advanced: Handling noisy data with peak finding

7

Expert: Custom peak detection with callbacks and masks

Under the Hood

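find_peaks scans the data array and compares each sample with its direct neighbors to find strict local maxima. For a flat-topped peak it reports the middle sample of the plateau, and the first and last samples are never reported because they have only one neighbor. Optional conditions such as height, threshold, distance, prominence, and width are then applied to discard candidates that do not qualify. The neighbor scan runs in compiled code rather than in a Python-level loop, which keeps it fast on large arrays. Prominence measures how much a peak stands out: it is the vertical distance between the peak and its lowest contour line, that is, the higher of the two lowest points found between the peak and the nearest higher sample (or the signal edge) on each side.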

Why designed this way?

The function was designed to be fast and flexible for many signal types. Running the core scan in compiled code and applying the filters with NumPy array operations keeps it fast on large data. Parameters like prominence and width were added to handle real-world noisy signals where simple maxima are not enough. Hand-written Python loops are slower and more error-prone, so this design balances performance and usability.

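Data array: ┌─────────────────────┐
            │ 1 3 7 6 4 5 8 7 6 2 │
            └─────────────────────┘

Process:    ┌──────────────────────────────────────────────┐
            │ Compare neighbors                            │
            └──────────────────────────────────────────────┘
                                   ↓
            ┌──────────────────────────────────────────────┐
            │ Identify local maxima                        │
            └──────────────────────────────────────────────┘
                                   ↓
            ┌──────────────────────────────────────────────┐
            │ Apply filters (height, distance, prominence) │
            └──────────────────────────────────────────────┘
                                   ↓
            ┌──────────────────────────────────────────────┐
            │ Return peak indices and properties           │
            └──────────────────────────────────────────────┘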
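The flow above can be tried on the same example array. This is a minimal sketch; the height threshold of 7.5 is an arbitrary value chosen for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([1, 3, 7, 6, 4, 5, 8, 7, 6, 2])

# With no conditions, every strict local maximum is returned (as indices)
# and the properties dictionary is empty.
peaks, properties = find_peaks(data)
print(peaks.tolist())   # [2, 6]
print(properties)       # {}

# Adding a condition filters the candidates and also returns the
# property that the condition was evaluated on.
tall_peaks, tall_props = find_peaks(data, height=7.5)
print(tall_peaks.tolist())                  # [6]
print(tall_props["peak_heights"].tolist())  # [8.0]
```

Only the conditions you pass show up as keys in the properties dictionary, so request a condition (even a permissive one) when you need the corresponding measurement.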

Myth Busters - 4 Common Misconceptions

Quick: Does find_peaks return the peak values or their indices? Commit to your answer.

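Common Belief: find_peaks returns the actual peak values in the data.

Reality: find_peaks returns the indices of the peaks, together with a dictionary of properties. To get the peak values, index the data with those indices, for example data[peaks].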

Quick: Do you think find_peaks detects all peaks regardless of noise? Commit to yes or no.

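Common Belief: find_peaks always finds every peak in the data, no matter how small or noisy.

Reality: With no conditions set, find_peaks reports every strict local maximum, including tiny ones caused by noise. Once conditions such as height or prominence are set, peaks that fail them are dropped. It never decides on its own which peaks are meaningful.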

Quick: Does increasing the 'distance' parameter allow peaks closer together or farther apart? Commit to your answer.

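Common Belief: Setting a larger distance means peaks can be closer together.

Reality: A larger distance forces peaks to be farther apart. distance is the minimum number of samples between neighboring peaks; when two peaks are too close, the smaller one is removed.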

Quick: Does prominence measure peak height or how much a peak stands out? Commit to your answer.

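Common Belief: Prominence is just the height of the peak.

Reality: Prominence measures how much a peak stands out from the surrounding signal, not its absolute height. A tall peak sitting on the shoulder of an even taller one can have a small prominence.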
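To make the difference between height and prominence concrete, here is a small sketch on a hand-made array (the values are invented for illustration). It uses scipy.signal.peak_prominences to compute the prominence of each detected peak.

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

# A small isolated bump (index 1), a tall peak (index 3),
# and a nearly as tall bump on the tall peak's shoulder (index 5).
x = np.array([0.0, 2.0, 0.0, 8.0, 7.0, 7.5, 7.0, 0.0])

peaks, _ = find_peaks(x)
prominences = peak_prominences(x, peaks)[0]

print(peaks.tolist())        # [1, 3, 5]
print(x[peaks].tolist())     # heights:     [2.0, 8.0, 7.5]
print(prominences.tolist())  # prominences: [2.0, 8.0, 0.5]
```

The shoulder bump at index 5 is almost as high as the main peak, yet it rises only 0.5 above the dip that separates it from that peak, so its prominence is smaller than that of the much lower isolated bump at index 1.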

Expert Zone

1

Prominence calculation depends on the shape of the signal and can differ from simple height, which is crucial for noisy or overlapping peaks.

2

The distance parameter works on sample indices, so understanding your data's sampling rate is key to setting it correctly.

3

find_peaks does not smooth data internally; preprocessing like filtering is often necessary for reliable peak detection.
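The second point, that distance is measured in samples, can be illustrated with a short sketch. The sampling rate fs and the minimum separation in seconds are assumed values for a synthetic 1 Hz sine wave, not recommendations.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 100                  # assumed sampling rate in Hz
min_separation_s = 0.5    # assumed minimum time between real events, in seconds

# find_peaks expects `distance` in samples, so convert from seconds.
distance_samples = int(min_separation_s * fs)   # 50 samples

# Synthetic 1 Hz sine sampled for 3 seconds: one peak per second.
t = np.arange(3 * fs) / fs
signal = np.sin(2 * np.pi * 1.0 * t)

peaks, _ = find_peaks(signal, distance=distance_samples)
peak_times = peaks / fs   # convert sample indices back to seconds

print(peaks.tolist())       # [25, 125, 225]
print(peak_times.tolist())  # [0.25, 1.25, 2.25]
```

Keeping the conversion explicit makes the setting survive a change of sampling rate: only fs needs to be updated.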

When NOT to use

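find_peaks accepts only one-dimensional input and applies simple threshold-style conditions, so it is not suitable for multi-dimensional data or for signals with complex peak shapes that require model fitting. In such cases, use specialized methods like wavelet-based detection (for example scipy.signal.find_peaks_cwt), machine learning classifiers, or peak fitting algorithms.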

Production Patterns

In real-world systems, find_peaks is often combined with preprocessing steps like smoothing or baseline correction. It is used in pipelines for ECG analysis, vibration monitoring, and chromatogram peak detection, often followed by thresholding and feature extraction for classification or anomaly detection.
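As a rough illustration of the smoothing step in such a pipeline, the sketch below applies a moving average before calling find_peaks. The moving average is just one simple choice made here for illustration, and the signal is synthetic: a slow sine (the feature of interest) plus a fast, regular ripple standing in for noise.

```python
import numpy as np
from scipy.signal import find_peaks

t = np.arange(200)
slow = np.sin(2 * np.pi * t / 100)          # true peaks at samples 25 and 125
ripple = 0.2 * np.sin(2 * np.pi * t / 5)    # synthetic high-frequency disturbance
noisy = slow + ripple

# Without preprocessing, every ripple cycle produces its own local maximum.
raw_peaks, _ = find_peaks(noisy)

# 5-sample moving average; mode="valid" avoids zero-padded edges.
window = 5
smoothed = np.convolve(noisy, np.ones(window) / window, mode="valid")

# The "valid" output is shifted by (window - 1) // 2 samples relative to
# the input, so shift the detected indices back to the original time axis.
smooth_peaks, _ = find_peaks(smoothed)
smooth_peaks = smooth_peaks + (window - 1) // 2

print(len(raw_peaks))          # many spurious peaks
print(smooth_peaks.tolist())   # [25, 125]
```

The window length here matches the ripple period, which is why the ripple cancels so cleanly. Real noise is not periodic, so the window (or a different filter) has to be tuned to the data.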

Connections

Signal smoothing

Builds-on

Understanding smoothing helps improve peak detection by reducing noise that causes false peaks.

Local maxima in calculus

Same pattern

Peak finding in data is a discrete version of finding local maxima in continuous functions, linking data science to math.
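The calculus link can be sketched directly: in the discrete case, the first difference plays the role of the derivative, and a local maximum is where it changes sign from positive to negative. This is an illustration of the idea, not how find_peaks is implemented, and it ignores flat plateaus.

```python
import numpy as np

data = np.array([1, 3, 7, 6, 4, 5, 8, 7, 6, 2])

# Discrete analogue of the first derivative.
slope = np.diff(data)

# A local maximum sits where the slope switches from positive to negative.
# slope[i] describes the step from data[i] to data[i + 1], hence the + 1.
peaks = np.where((slope[:-1] > 0) & (slope[1:] < 0))[0] + 1

print(peaks.tolist())   # [2, 6]
```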

Human sensory perception

Analogous process

Just like our brain detects peaks in sound or light intensity to recognize patterns, peak finding algorithms mimic this process in data.

Common Pitfalls

#1 Detecting too many false peaks due to noise.

Wrong approach: peaks, _ = find_peaks(data)

Correct approach: peaks, _ = find_peaks(data, prominence=1)

Root cause: Not using parameters like prominence to filter out small, noise-induced peaks.
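The two calls above can be compared on a small hand-made array (values invented for illustration) with two real peaks and three small bumps. The threshold prominence=1 works for this array's scale; a suitable value for other data is an assumption you would need to check.

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([0.0, 0.3, 0.1, 4.0, 0.2, 0.5, 0.3, 5.0, 0.1, 0.4, 0.0])

# No conditions: the small bumps are reported alongside the real peaks.
all_peaks, _ = find_peaks(data)
print(all_peaks.tolist())    # [1, 3, 5, 7, 9]

# A prominence threshold keeps only peaks that clearly stand out.
real_peaks, props = find_peaks(data, prominence=1)
print(real_peaks.tolist())   # [3, 7]
```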

#2 Confusing peak indices with peak values.

Wrong approach: peak_values = find_peaks(data)

Correct approach: peak_indices, _ = find_peaks(data); peak_values = data[peak_indices]

Root cause: Misunderstanding that find_peaks returns a tuple of (indices, properties), not the peak values. Indexing with peak_indices requires data to be a NumPy array.

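#3 Setting the distance parameter too small, so one event is reported as several closely spaced peaks.

Wrong approach: peaks, _ = find_peaks(data, distance=1)

Correct approach: peaks, _ = find_peaks(data, distance=5)

Root cause: Not considering the sampling rate and the spacing expected between real peaks. distance=1 is the smallest allowed value and imposes no spacing constraint; the value 5 is only illustrative and should be derived from your data.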
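A small sketch of the effect, using an invented array with two peaks only two samples apart and a third one farther away:

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([0, 3, 1, 4, 0, 0, 0, 0, 0, 2, 0])

# distance=1 imposes no spacing constraint: both close peaks are kept.
close_peaks, _ = find_peaks(data, distance=1)
print(close_peaks.tolist())    # [1, 3, 9]

# distance=5 requires at least 5 samples between peaks; of the two
# close peaks, the smaller one (index 1) is removed.
spaced_peaks, _ = find_peaks(data, distance=5)
print(spaced_peaks.tolist())   # [3, 9]
```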

Key Takeaways

Peak finding identifies local maxima in data, which are points higher than their neighbors.

scipy's find_peaks returns indices of peaks and can provide detailed properties like prominence and width.

Parameters like height, distance, and prominence help filter and refine peak detection to suit your data.

Handling noise with smoothing or parameter tuning is essential for reliable peak detection.

Understanding the difference between peak indices and values prevents common mistakes in analysis.