Peak finding (find_peaks) in SciPy - Deep Dive

Overview - Peak finding (find_peaks)
What is it?
Peak finding is the process of identifying points in data where values reach a local maximum, called peaks. The scipy library provides a function called find_peaks that helps locate these peaks in one-dimensional data arrays. This is useful for analyzing signals, detecting important features, or summarizing data patterns. Peaks represent points that stand out compared to their neighbors.
Why it matters
Without peak finding, it would be hard to automatically detect important events or features in data like heartbeats in ECG signals, sales spikes in business data, or sound beats in audio. Manually finding peaks is slow and error-prone. Peak finding automates this, enabling faster, more accurate analysis and decision-making in science, engineering, and business.
Where it fits
Before learning peak finding, you should understand basic Python programming and how to work with arrays or lists of numbers. Familiarity with numpy arrays helps. After mastering peak finding, you can explore signal processing techniques, feature extraction, and time series analysis to build more advanced data science skills.
Mental Model
Core Idea
Peak finding locates points in data that are higher than their neighbors, revealing important local maxima.
Think of it like...
Imagine walking along a mountain trail and noting every hilltop you reach that is higher than the points just before and after you. Each hilltop is like a peak in your data.
Data array:  1  3  7  6  4  5  8  7  6  2
Peaks:             ▲           ▲
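As a quick check, the array from the diagram can be passed straight to find_peaks from scipy.signal. It has exactly two local maxima, the 7 at index 2 and the 8 at index 6, and find_peaks reports their positions rather than their values:

```python
import numpy as np
from scipy.signal import find_peaks

# The example array from the diagram above
data = np.array([1, 3, 7, 6, 4, 5, 8, 7, 6, 2])

# find_peaks returns (indices, properties); only the indices are needed here
peaks, _ = find_peaks(data)
print(peaks)        # [2 6]
print(data[peaks])  # [7 8]
```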
Build-Up - 7 Steps
1
Foundation: Understanding local maxima in data
Concept: Introduce what a peak or local maximum means in a sequence of numbers.
A local maximum is a point in a list where the value is greater than the values immediately before and after it. For example, in the list [1, 3, 2], the number 3 is a local maximum because it is higher than 1 and 2.
Result
You can identify simple peaks by comparing each number to its neighbors.
Understanding local maxima is the foundation for detecting peaks automatically in data.
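The definition translates directly into a loop. The sketch below is plain Python with a hypothetical helper name, local_maxima; it uses strict comparisons, so it skips the first and last elements (they lack a neighbor on one side) and does not detect flat-topped peaks:

```python
def local_maxima(values):
    """Return the indices i where values[i] is strictly greater than both neighbors."""
    peaks = []
    # Start at 1 and stop before the last element: each candidate needs two neighbors
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append(i)
    return peaks


print(local_maxima([1, 3, 2]))                       # [1]
print(local_maxima([1, 3, 7, 6, 4, 5, 8, 7, 6, 2]))  # [2, 6]
```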
2
Foundation: Working with numpy arrays
Concept: Learn how to store and manipulate data using numpy arrays, the data structure find_peaks works with.
Numpy arrays are like lists but optimized for numbers and math operations. You can create one with np.array([1, 3, 2, 5]). They allow fast processing and easy slicing to compare neighbors.
Result
You can efficiently access and analyze data points and their neighbors.
Using numpy arrays enables fast and simple peak detection with scipy.
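Slicing lets you compare every sample with both neighbors at once instead of looping: x[:-2], x[1:-1], and x[2:] line up the left neighbors, the candidates, and the right neighbors element by element. This is a minimal sketch with a hypothetical helper name, local_maxima_np, not how scipy implements find_peaks; unlike find_peaks, which reports the middle sample of a flat peak, this strict test misses plateaus entirely:

```python
import numpy as np


def local_maxima_np(x):
    """Vectorized strict local-maximum test using array slicing."""
    x = np.asarray(x)
    middle = x[1:-1]                                 # samples that have two neighbors
    is_peak = (middle > x[:-2]) & (middle > x[2:])   # greater than left AND right neighbor
    return np.flatnonzero(is_peak) + 1               # shift back to indices into x


data = np.array([1, 3, 2, 5, 4])
print(local_maxima_np(data))  # [1 3]
```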
3
Intermediate: Basic usage of scipy find_peaks
Before reading on: do you think find_peaks returns the peak values or their positions? Commit to your answer.
Concept: Learn how to call find_peaks to get the indices of peaks in data.
Import find_peaks from scipy.signal and pass it your data array. It returns a tuple: an array with the indices where peaks occur, and a dictionary of peak properties that is only filled when you pass filter parameters such as height or prominence. For example:

    import numpy as np
    from scipy.signal import find_peaks

    data = np.array([1, 3, 2, 5, 4])
    peaks, _ = find_peaks(data)
    print(peaks)  # Output: [1 3]

These indices correspond to the values 3 and 5, which are the peaks.
Result
You get the positions of peaks in your data array.
Knowing that find_peaks returns indices lets you easily extract peak values or analyze their locations.
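Because the result holds positions, indexing the original array with it gives the peak values. A short sketch using the same data as the basic example:

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([1, 3, 2, 5, 4])
peaks, properties = find_peaks(data)

# Index the original array with the returned positions to get the values
peak_values = data[peaks]
for index, value in zip(peaks, peak_values):
    print(f"peak at index {index} with value {value}")

# No filter parameters were passed, so the properties dict is empty
print(properties)  # {}
```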
4
Intermediate: Using parameters to refine peak detection
Before reading on: do you think setting a minimum height filters out smaller peaks or larger peaks? Commit to your answer.
Concept: Explore parameters like height, distance, and prominence to control which peaks are detected.
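find_peaks accepts options that filter the candidate peaks:
- height: minimum height a peak must reach (a peak exactly at this value is kept)
- distance: minimum horizontal distance, in samples, between neighboring peaks; when two peaks are closer than this, the smaller one is removed
- prominence: minimum prominence, i.e. how much a peak must stand out from the surrounding baseline
Example:

    peaks, props = find_peaks(data, height=4, distance=2)

This keeps only peaks with a height of at least 4 that are at least 2 samples apart.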
Result
You detect only peaks that meet your criteria, ignoring noise or small bumps.
Using parameters helps tailor peak detection to your specific data and goals.
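The effect of each filter is easiest to see on a small made-up array with five peaks spaced two samples apart. Note that height is inclusive and that distance keeps the taller peak when two are too close; passing height also adds a 'peak_heights' entry to the returned dictionary:

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up data: five peaks (values 2, 5, 1, 4, 3), two samples apart
data = np.array([0, 2, 0, 5, 0, 1, 0, 4, 0, 3, 0])

all_peaks, _ = find_peaks(data)
print(all_peaks)  # [1 3 5 7 9]

# height: keep peaks whose value is at least 3
tall, props = find_peaks(data, height=3)
print(tall)                   # [3 7 9]
print(props["peak_heights"])  # the heights 5, 4 and 3

# distance: peaks must be at least 4 samples apart; taller peaks win
spaced, _ = find_peaks(data, distance=4)
print(spaced)  # [3 7]
```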
5
Intermediate: Interpreting peak properties output
Before reading on: do you think the 'prominence' property measures peak height or how distinct the peak is? Commit to your answer.
Concept: Understand the extra information find_peaks can return about each peak.
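Besides indices, find_peaks returns a dictionary of per-peak properties. Which keys appear depends on the parameters you pass:
- 'peak_heights': added when height is set
- 'prominences', 'left_bases', 'right_bases': added when prominence (or width) is set; the bases are the indices of the lowest points on each side that the prominence is measured from
- 'widths', 'width_heights', 'left_ips', 'right_ips': added when width is set; 'widths' is the peak width at half prominence by default, and 'left_ips'/'right_ips' are the interpolated positions where that width is measured
Example:

    peaks, props = find_peaks(data, prominence=1)
    print(props['prominences'])

This helps analyze peak shape and importance.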
Result
You gain detailed insights about each peak beyond just location.
Knowing peak properties allows deeper analysis and better filtering.
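Here is a sketch on the array from the mental-model diagram. prominence=1 makes find_peaks compute prominences and bases, and width=0 (a filter every peak passes) is a convenient way to also get the width-related keys; the comments note which keys each parameter adds:

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([1, 3, 7, 6, 4, 5, 8, 7, 6, 2])

# prominence=1 adds 'prominences', 'left_bases', 'right_bases'
# width=0 keeps every peak but adds 'widths', 'width_heights', 'left_ips', 'right_ips'
peaks, props = find_peaks(data, prominence=1, width=0)

print(peaks)                                      # [2 6]
print(props["prominences"])                       # [3. 6.]
print(props["left_bases"], props["right_bases"])  # [0 0] [4 9]
print(props["widths"])  # width at half prominence (rel_height=0.5 by default)
print(sorted(props))    # all returned property names
```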
6
Advanced: Handling noisy data with peak finding
Before reading on: do you think smoothing data before peak finding always improves results? Commit to your answer.
Concept: Learn strategies to find peaks in noisy or complex data.
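Real data often contains noise that creates false peaks. Common techniques:
- Smooth the data with a filter before calling find_peaks
- Use the prominence and width parameters to ignore small noise peaks
- Combine peak finding with thresholding
Example:

    from scipy.ndimage import gaussian_filter1d
    # Convert to float first: for integer input, gaussian_filter1d returns integers
    data_smooth = gaussian_filter1d(data.astype(float), sigma=2)
    peaks, _ = find_peaks(data_smooth, prominence=1)

This reduces the impact of noise. Smoothing does not always help, though: too much of it merges nearby peaks, lowers their heights, and can shift their positions, so choose sigma relative to the width of the peaks you care about.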
Result
You detect meaningful peaks even in noisy signals.
Understanding noise handling prevents false detections and improves reliability.
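The sketch below illustrates the smoothing approach on synthetic data: three broad bumps plus Gaussian noise, generated with a fixed seed. The signal, the noise level, sigma=10, and prominence=1 are all assumptions chosen for this example, not recommended defaults. Raw find_peaks reports hundreds of noise peaks, while smoothing plus a prominence filter leaves one peak per bump:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

# Synthetic signal (assumed for illustration): three bumps plus noise
rng = np.random.default_rng(0)
n = 1000
t = np.arange(n) / n
clean = -np.cos(2 * np.pi * 3 * t)   # maxima near samples 167, 500 and 833
noisy = clean + rng.normal(scale=0.2, size=n)

# Raw local maxima: the noise creates many spurious peaks
raw_peaks, _ = find_peaks(noisy)

# Smooth first, then require a clear prominence
smooth = gaussian_filter1d(noisy, sigma=10)
peaks, props = find_peaks(smooth, prominence=1)

print(len(raw_peaks), len(peaks))
print(peaks)
```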
7
Expert: Custom peak detection with masks and post-filtering
Before reading on: do you think find_peaks can detect peaks based on custom rules beyond built-in parameters? Commit to your answer.
Concept: Explore advanced customization by combining find_peaks with masks or custom logic.
While find_peaks has many parameters, sometimes you need custom peak criteria:
- Use boolean masks to limit the search to certain regions of the data
- Post-process the detected peaks with custom filters
- Combine find_peaks with other signal processing methods
Example (data and threshold are placeholders for your own signal and cutoff):

    data = np.array([...])               # your signal
    mask = data > threshold              # True where a sample may count as a peak
    peaks, _ = find_peaks(data)
    filtered_peaks = peaks[mask[peaks]]  # keep only peaks where the mask is True

This approach adapts peak detection to complex needs.
Result
You can tailor peak finding to very specific or unusual data patterns.
Knowing how to combine find_peaks with custom logic unlocks flexible, powerful analysis.
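A runnable version of the mask idea, on a small made-up array. Both rules (a region of interest starting at index 4, and a cutoff equal to the region's median plus 3) are hypothetical examples of custom criteria; boolean indexing with mask[peaks] keeps only the peaks where the mask is True:

```python
import numpy as np
from scipy.signal import find_peaks

data = np.array([0, 4, 0, 2, 0, 6, 0, 3, 0, 5, 0])

# Hypothetical rule 1: only consider samples from index 4 onward
region = np.zeros(data.size, dtype=bool)
region[4:] = True

# Hypothetical rule 2: a cutoff derived from the region's own median
threshold = np.median(data[region]) + 3

# Combine both rules into one boolean mask over the samples
mask = region & (data > threshold)

peaks, _ = find_peaks(data)          # all local maxima: [1 3 5 7 9]
filtered_peaks = peaks[mask[peaks]]  # keep peaks where the mask is True
print(filtered_peaks)                # [5 9]
```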
Under the Hood
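find_peaks first scans the array and marks every sample that is larger than both of its neighbors as a candidate peak; for a flat peak (a run of equal samples), it reports the middle sample, rounding down for even-length runs. It then drops candidates that fail the optional conditions such as height, distance, or prominence. The per-sample loops are implemented in compiled Cython code rather than in Python, so they stay fast on long signals. Prominence is the vertical distance between a peak and its lowest contour line: extend a horizontal line left and right from the peak until it meets a strictly higher sample or the end of the signal, find the minimum of the signal on each side, and subtract the higher of the two minima from the peak's height.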
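The prominence rule can be written out directly: walk left and right from the peak until a strictly higher sample (or the end of the signal), take the minimum on each side, and subtract the higher of the two minima from the peak height. prominence_sketch below is a hypothetical, unoptimized helper that follows this rule and ignores the optional wlen window; scipy.signal.peak_prominences serves as the reference. On the diagram's array, the 7 has prominence 3 while the taller 8 has prominence 6, so prominence is not the same as height:

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences


def prominence_sketch(x, peak):
    """Prominence of x[peak] via the lowest-contour-line rule (no wlen window)."""
    height = x[peak]
    # Walk left until a strictly higher sample or the start, tracking the minimum
    left_min, i = height, peak
    while i > 0 and x[i - 1] <= height:
        i -= 1
        left_min = min(left_min, x[i])
    # Same walk to the right
    right_min, j = height, peak
    while j < len(x) - 1 and x[j + 1] <= height:
        j += 1
        right_min = min(right_min, x[j])
    # The higher of the two minima is the peak's base
    return float(height - max(left_min, right_min))


data = np.array([1, 3, 7, 6, 4, 5, 8, 7, 6, 2])
peaks, _ = find_peaks(data)
mine = [prominence_sketch(data, p) for p in peaks]
reference = peak_prominences(data, peaks)[0]
print(mine)       # [3.0, 6.0]
print(reference)  # [3. 6.]
```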
Why designed this way?
The function was designed to be fast and flexible across many signal types. Running the per-sample work in compiled code keeps it fast on large arrays. Parameters like prominence and width were added to handle real-world noisy signals, where a plain local-maximum test reports far too many peaks. Hand-written Python loops are slower and easy to get wrong at plateaus and array edges, so this design balances performance and usability.
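┌──────────────────────────────────────────────┐
│ Data array: 1 3 7 6 4 5 8 7 6 2              │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Compare each sample with its neighbors       │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Identify local maxima                        │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Apply filters (height, distance, prominence) │
└──────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────┐
│ Return peak indices and properties           │
└──────────────────────────────────────────────┘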
Myth Busters - 4 Common Misconceptions
Quick: Does find_peaks return the peak values or their indices? Commit to your answer.
Common Belief: find_peaks returns the actual peak values in the data.
Reality: find_peaks returns the indices (positions) of peaks, not the values themselves.
Why it matters: Confusing indices with values can cause errors when interpreting results or plotting peaks.
Quick: Do you think find_peaks detects all peaks regardless of noise? Commit to yes or no.
Common Belief: find_peaks always finds every peak in the data, no matter how small or noisy.
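Reality: With default arguments, find_peaks reports every interior local maximum, including tiny bumps caused by noise, and it never reports the first or last sample. Filters such as height or prominence remove noise peaks, but poorly tuned filters can also discard subtle real peaks.
Why it matters: Treating the output as a complete, noise-free list of peaks can lead to wrong conclusions or noisy analysis.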
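A tiny made-up array shows both sides of this: with default arguments every interior local maximum is reported, even a bump only 0.1 above its neighbor, while the tall values at either end are never reported because endpoints have only one neighbor. A prominence filter then removes the bump:

```python
import numpy as np
from scipy.signal import find_peaks

# Tall endpoints, one tiny bump (5.0) and one distinct peak (5.1)
data = np.array([9, 1, 5, 4.9, 5.1, 1, 9])

print(find_peaks(data)[0])                # [2 4]: the tiny bump counts too
print(find_peaks(data, prominence=1)[0])  # [4]: only the distinct peak survives
```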
Quick: Does increasing the 'distance' parameter allow peaks closer together or farther apart? Commit to your answer.
Common Belief: Setting a larger distance means peaks can be closer together.
Reality: A larger distance forces peaks to be farther apart: whenever two peaks are closer than distance samples, the smaller one is removed.
Why it matters: Misunderstanding distance can cause you to miss important peaks or detect too many.
Quick: Does prominence measure peak height or how much a peak stands out? Commit to your answer.
Common Belief: Prominence is just the height of the peak.
Reality: Prominence measures how much a peak stands out from the surrounding baseline: it is the vertical distance between the peak and its lowest contour line, not its height above zero.
Why it matters: Ignoring prominence can lead you to select peaks that are tall but not distinct, which gives poor feature detection.
Expert Zone
1
Prominence calculation depends on the shape of the signal and can differ from simple height, which is crucial for noisy or overlapping peaks.
2
The distance parameter works on sample indices, so understanding your data's sampling rate is key to setting it correctly.
3
find_peaks does not smooth data internally; preprocessing like filtering is often necessary for reliable peak detection.
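Since distance is counted in samples, a minimum spacing known in seconds has to be multiplied by the sampling rate first. A sketch with a hypothetical sampling rate of 250 Hz, a hypothetical minimum spacing of 0.25 s, and a synthetic 1.2 Hz sine as the signal:

```python
import numpy as np
from scipy.signal import find_peaks

fs = 250.0            # hypothetical sampling rate, in samples per second
min_spacing_s = 0.25  # hypothetical minimum time between real peaks, in seconds

# distance is measured in samples, so convert seconds to samples (rounding up)
distance = int(np.ceil(min_spacing_s * fs))  # 63 samples

t = np.arange(int(5 * fs)) / fs              # 5 s of timestamps
signal = np.sin(2 * np.pi * 1.2 * t)         # synthetic 1.2 Hz test signal

peaks, _ = find_peaks(signal, distance=distance)
peak_times = peaks / fs                      # convert indices back to seconds
print(distance)             # 63
print(np.diff(peak_times))  # spacing in seconds, close to 1 / 1.2
```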
When NOT to use
find_peaks is not suitable for multi-dimensional data or signals with complex peak shapes that require model fitting. In such cases, use specialized methods like wavelet transforms, machine learning classifiers, or peak fitting algorithms.
Production Patterns
In real-world systems, find_peaks is often combined with preprocessing steps like smoothing or baseline correction. It is used in pipelines for ECG analysis, vibration monitoring, and chromatogram peak detection, often followed by thresholding and feature extraction for classification or anomaly detection.
Connections
Signal smoothing
Builds-on
Understanding smoothing helps improve peak detection by reducing noise that causes false peaks.
Local maxima in calculus
Same pattern
Peak finding in data is a discrete version of finding local maxima in continuous functions, linking data science to math.
Human sensory perception
Analogous process
Just like our brain detects peaks in sound or light intensity to recognize patterns, peak finding algorithms mimic this process in data.
Common Pitfalls
#1 Detecting too many false peaks due to noise.
Wrong approach: peaks, _ = find_peaks(data)
Correct approach: peaks, _ = find_peaks(data, prominence=1)
Root cause: Not using parameters like prominence to filter out small, noise-induced peaks. The right prominence value depends on the amplitude of your signal.
#2 Confusing peak indices with peak values.
Wrong approach: peak_values = find_peaks(data)
Correct approach:
    peak_indices, _ = find_peaks(data)
    peak_values = data[peak_indices]
Root cause: Misunderstanding that find_peaks returns positions, not values.
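#3 Setting the distance parameter too small, so one broad or noisy peak is reported several times.
Wrong approach: peaks, _ = find_peaks(data, distance=1)
Correct approach: peaks, _ = find_peaks(data, distance=5)
Root cause: Not considering the sampling rate and the minimum spacing expected between real peaks. distance=1 filters nothing, so derive the value (in samples) from that spacing; 5 is only an example.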
Key Takeaways
Peak finding identifies local maxima in data, which are points higher than their neighbors.
scipy's find_peaks returns indices of peaks and can provide detailed properties like prominence and width.
Parameters like height, distance, and prominence help filter and refine peak detection to suit your data.
Handling noise with smoothing or parameter tuning is essential for reliable peak detection.
Understanding the difference between peak indices and values prevents common mistakes in analysis.