Overview - Resampling time series data

What is it?

Resampling time series data means changing the frequency of your data points. You can make data points less frequent (downsampling) or more frequent (upsampling). This helps to analyze trends over different time scales or fill in missing data. It is done by grouping data by time intervals and applying calculations like sums or averages.

Why it matters

Without resampling, it is hard to compare data recorded at different time intervals or to see patterns over longer or shorter periods. For example, daily sales data might be noisy, but weekly totals show clearer trends. Resampling helps clean, summarize, and prepare time data for better decisions and predictions.

Where it fits

Before learning resampling, you should understand basic time series data and how to work with dates and times in pandas. After mastering resampling, you can explore time series forecasting, rolling windows, and advanced time-based feature engineering.

Mental Model

Core Idea

Resampling groups time series data into new time intervals and summarizes or fills data to change its frequency.

Think of it like...

Imagine you have a photo album with pictures taken every day. Resampling is like making a new album with one picture per week by choosing the best photo or combining them, or making a new album with hourly snapshots by guessing what happened between pictures.

Time Series Data
┌─────────────┬─────────────┬─────────────┐
│ 2024-01-01  │ 2024-01-02  │ 2024-01-03  │ ... Original daily data
├─────────────┼─────────────┼─────────────┤
│     10      │     15      │     20      │
└─────────────┴─────────────┴─────────────┘

Resample to Weekly
┌─────────────┬─────────────┐
│ 2024-01-01  │ 2024-01-08  │ ... Weekly sums or averages
├─────────────┼─────────────┤
│     45      │     ...     │
└─────────────┴─────────────┘
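As a minimal sketch of the diagram above, the snippet below builds two weeks of made-up daily values and downsamples them to weekly sums. The values and the names `daily` and `weekly` are illustrative assumptions. By default pandas labels each `'W'` bin with the Sunday that closes it, so the labels here are 2024-01-07 and 2024-01-14 rather than the Monday dates drawn in the diagram.

```python
import pandas as pd

# Two weeks of made-up daily values, starting on Monday 2024-01-01.
days = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=days)

# Downsample: group the days into weekly bins and sum each bin.
weekly = daily.resample("W").sum()
print(weekly)
```

Fourteen daily points become two weekly points, one per bin.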

Build-Up - 7 Steps

1

Foundation: Understanding time series basics

Concept: Learn what time series data is and how pandas stores dates and times.

Time series data is a sequence of data points recorded at specific times. In pandas, dates and times are stored in a DatetimeIndex or in datetime64 columns. This lets pandas understand the order and spacing of data points.

Result

You can load data with dates and pandas recognizes the time order.

Understanding how pandas handles dates is essential because resampling depends on knowing the time intervals between data points.
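A small sketch of that setup, assuming a hypothetical table whose dates arrive as plain strings and out of order. The column names `date` and `sales` are invented for illustration.

```python
import pandas as pd

# Hypothetical raw records with dates stored as plain strings, out of order.
raw = pd.DataFrame({
    "date": ["2024-01-03", "2024-01-01", "2024-01-02"],
    "sales": [20, 10, 15],
})

# Parse the strings, move them into the index, and sort chronologically.
raw["date"] = pd.to_datetime(raw["date"])
ts = raw.set_index("date").sort_index()

print(type(ts.index).__name__)   # DatetimeIndex
print(ts.index.is_monotonic_increasing)
```

Once the index is a sorted DatetimeIndex, the frame is ready for `resample`.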

2

Foundation: Basic pandas grouping by time

3

Intermediate: Downsampling with aggregation

4

Intermediate: Upsampling and filling missing data

5

Intermediate: Custom aggregation functions

6

Advanced: Handling irregular time series

7

Expert: Performance and memory considerations

Under the Hood

Pandas resampling works by creating a new time index at the target frequency. It then groups original data points that fall into each new time bin. For downsampling, it applies aggregation functions to combine grouped points. For upsampling, it inserts new time points and fills missing values using specified methods. Internally, pandas uses efficient Cython code to handle grouping and aggregation.
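One way to see the "group into bins, then aggregate" idea is to compare `resample` with an explicit time-based `groupby`. This is only a sketch of the equivalence in behavior, not a description of the internal code path, and the data is made up.

```python
import pandas as pd

days = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=days)

# Route 1: resample does the binning and the aggregation in one call chain.
via_resample = daily.resample("W").sum()

# Route 2: spell the same steps out with an explicit time-based grouper.
via_groupby = daily.groupby(pd.Grouper(freq="W")).sum()

print(via_resample)
print(via_groupby)
```

Both routes should print the same weekly totals, which suggests that resampling can be read as a grouping operation keyed on time bins.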

Why designed this way?

Resampling was designed to handle the common need to analyze time series at different scales. Grouping by fixed time bins is intuitive and flexible. Aggregation functions allow summarizing data meaningfully. Filling methods for upsampling address the problem of missing data in higher frequency views. Alternatives like manual looping would be slower and error-prone.

Original Data (daily) ──▶ Group by Time Bins ──▶ Aggregate or Fill ──▶ Resampled Data

┌────────────┐     ┌───────────────┐     ┌────────────┐     ┌───────────────┐
│ 2024-01-01 │     │ 2024-01-01    │     │ Aggregate  │     │ 2024-01-01    │
│ 2024-01-02 │ ──▶ │ to 2024-01-07 │ ──▶ │ or Fill    │ ──▶ │ to 2024-01-07 │
│ 2024-01-03 │     │ (weekly bin)  │     │ Function   │     │ (weekly data) │
└────────────┘     └───────────────┘     └────────────┘     └───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does resampling always keep the original number of data points? Commit to yes or no.

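Common Belief: Resampling keeps all original data points but just changes their labels.

Reality: Resampling usually changes the number of data points. Downsampling combines several points into one per bin, and upsampling inserts new time points.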

Quick: When upsampling, does pandas automatically fill missing values? Commit to yes or no.

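Common Belief: Upsampling automatically fills all new time points with meaningful data.

Reality: Upsampling only creates the new time points. They hold NaN until you choose a fill method such as ffill() or interpolate().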

Quick: Is resampling only useful for regular time series? Commit to yes or no.

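Common Belief: Resampling only works well if data points are evenly spaced in time.

Reality: Resampling bins data by timestamp, so it also works on irregularly spaced data. Bins with no observations come back empty (NaN for a mean, 0 for a sum).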

Quick: Does resampling change the original data values? Commit to yes or no.

Common Belief: Resampling changes the original data values directly.

Reality: Resampling returns a new object. The original data is left unchanged.
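To make the point about uneven spacing concrete, here is a minimal sketch with made-up readings taken at irregular times, including a day with no readings at all. The names `readings`, `daily_sum` and `daily_mean` are illustrative.

```python
import pandas as pd

# Made-up readings at uneven times; nothing was recorded on 2024-01-02.
stamps = pd.to_datetime([
    "2024-01-01 08:15",
    "2024-01-01 17:40",
    "2024-01-03 09:05",
])
readings = pd.Series([1.0, 2.0, 4.0], index=stamps)

daily_sum = readings.resample("D").sum()    # empty day sums to 0.0
daily_mean = readings.resample("D").mean()  # empty day becomes NaN

print(pd.DataFrame({"sum": daily_sum, "mean": daily_mean}))
print(len(readings))  # the original series still has its three readings
```

The empty day shows up differently depending on the aggregation, which is worth checking before treating a 0 as a real measurement.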

Expert Zone

1

Resampling with time zones requires careful alignment to avoid shifting data incorrectly.

2

Choosing the right aggregation function affects the meaning of resampled data; sum vs mean can tell very different stories.

3

Upsampling with interpolation methods can introduce artificial trends if not chosen carefully.
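The time zone point in this list can be illustrated with a small sketch. It assumes made-up hourly data stamped in UTC and uses "America/New_York" purely as an example zone. Daily bins are cut at midnight in the index's own time zone, so converting the zone first moves readings into different days.

```python
import pandas as pd

# 48 hourly readings of 1, stamped in UTC (made-up data).
hours = pd.date_range(
    "2024-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC"
)
hourly = pd.Series(1, index=hours)

# Daily bins follow midnight in the index's own time zone.
utc_days = hourly.resample("D").sum()
ny_days = hourly.tz_convert("America/New_York").resample("D").sum()

print(utc_days.tolist())
print(ny_days.tolist())
```

The totals match, but the per-day split differs, so it helps to decide which zone defines a "day" before resampling.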

When NOT to use

Avoid resampling when your analysis depends on exact original timestamps or when data irregularity is meaningful. Instead, use time-aware models or irregular time series methods that do not require fixed intervals.

Production Patterns

In production, resampling is used to prepare data for machine learning models that require fixed time steps, to generate reports at business-relevant intervals, and to align multiple time series with different frequencies before merging.
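A sketch of the alignment pattern, assuming two made-up series: daily sales and weekly ad spend labelled by week-ending Sunday. All names and values are hypothetical.

```python
import pandas as pd

# Made-up inputs: daily sales, and weekly ad spend labelled by week-ending Sunday.
days = pd.date_range("2024-01-01", periods=14, freq="D")
daily_sales = pd.Series(range(1, 15), index=days, name="sales")
weeks = pd.date_range("2024-01-07", periods=2, freq="W")
weekly_spend = pd.Series([100.0, 120.0], index=weeks, name="ad_spend")

# Bring the daily series down to the weekly grid, then join on the shared index.
weekly_sales = daily_sales.resample("W").sum()
combined = pd.concat([weekly_sales, weekly_spend], axis=1)
print(combined)
```

Downsampling the finer series is usually the safer direction, since it does not require inventing values for the coarser one.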

Connections

Aggregation functions

Resampling builds on aggregation by applying it over time groups.

Understanding aggregation helps grasp how resampling summarizes data over new time intervals.
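For instance, several aggregations can be applied to the same time bins in one call. The sketch below uses made-up data, and `value_range` is a hypothetical custom function written only for this example.

```python
import pandas as pd

days = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=days, name="sales")

def value_range(group):
    """Hypothetical custom aggregation: spread between max and min."""
    return group.max() - group.min()

# One column per aggregation, one row per weekly bin.
summary = daily.resample("W").agg(["sum", "mean", value_range])
print(summary)
```

The sum and the mean describe the same weeks very differently, which is why the choice of aggregation deserves a deliberate decision.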

Interpolation in numerical analysis

Upsampling uses interpolation methods similar to those in numerical analysis to estimate missing values.

Knowing interpolation techniques improves how you fill missing time points during upsampling.
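A minimal comparison of two fill strategies, assuming just two made-up observations four days apart. Forward fill repeats the last known value, while linear interpolation assumes a steady change that the data may not support.

```python
import pandas as pd

# Two real observations, four days apart (made-up values).
sparse = pd.Series(
    [0.0, 8.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-05"]),
)

held = sparse.resample("D").ffill()          # repeat the last known value
linear = sparse.resample("D").interpolate()  # straight line between points

print(pd.DataFrame({"ffill": held, "linear": linear}))
```

Neither column is "true" data for the three middle days; each encodes a different assumption about what happened between observations.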

Signal processing

Resampling in time series is similar to changing sampling rates in signal processing.

Recognizing this connection helps understand the effects of downsampling and upsampling on data quality and information loss.

Common Pitfalls

#1 Not specifying an aggregation function when downsampling.

Wrong approach: df.resample('W')

Correct approach: df.resample('W').sum()

Root cause: df.resample('W') on its own returns a Resampler object, not resampled data. Pandas needs an aggregation method to combine the points in each bin, so nothing is computed until one is called.
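A short sketch of the difference, using a made-up frame. The variable names are illustrative.

```python
import pandas as pd

days = pd.date_range("2024-01-01", periods=14, freq="D")
df = pd.DataFrame({"sales": range(1, 15)}, index=days)

pending = df.resample("W")   # a Resampler: bins are defined, nothing is computed
weekly = pending.sum()       # the aggregation produces the actual DataFrame

print(type(pending).__name__)
print(type(weekly).__name__)
```

Printing the types makes it easy to spot when an aggregation step was forgotten.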

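#2 Upsampling without filling missing values.

Wrong approach: df.resample('H').asfreq()

Correct approach: df.resample('H').ffill()

Root cause: Upsampling creates new time points with NaN values; not filling them leads to missing data in analysis. Forward fill is one option, and interpolate() is another. In pandas 2.2 and later the hourly alias is written 'h', and 'H' is deprecated.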
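The sketch below counts the gaps left by each choice. It assumes made-up weekly readings upsampled to daily, which sidesteps the hourly alias difference between pandas versions. The `limit` argument caps how far a value is carried forward.

```python
import pandas as pd

# Made-up weekly readings, upsampled to daily.
weekly = pd.Series(
    [10.0, 20.0],
    index=pd.to_datetime(["2024-01-07", "2024-01-14"]),
)

gaps = weekly.resample("D").asfreq()          # new days are NaN
filled = weekly.resample("D").ffill()         # carry the last value forward
capped = weekly.resample("D").ffill(limit=2)  # but for at most two days

print(gaps.isna().sum(), filled.isna().sum(), capped.isna().sum())
```

Capping the fill is a middle ground when a stale value should not be trusted for too long.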

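#3 Applying resampling on a non-datetime index.

Wrong approach: df.resample('M').sum() # when index is not datetime

Correct approach: df.set_index('date_column').resample('M').sum()

Root cause: Resampling requires a datetime-like index to group data by time intervals, so 'date_column' must hold datetime values (convert it with pd.to_datetime if it holds strings). In pandas 2.2 and later the month-end alias is written 'ME', and 'M' is deprecated.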
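A sketch of the failure and two ways around it, assuming a hypothetical frame whose dates live in a column. It uses the weekly alias `'W'` instead of a monthly one so that it runs the same way across pandas versions.

```python
import pandas as pd

# Hypothetical frame where the dates live in a column, not in the index.
df = pd.DataFrame({
    "date_column": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-09"]),
    "sales": [10, 15, 20],
})

try:
    df.resample("W").sum()   # the index is a plain RangeIndex
except TypeError as err:
    print("resample failed:", err)

# Either name the datetime column with on=..., or move it into the index.
by_column = df.resample("W", on="date_column").sum()
by_index = df.set_index("date_column").resample("W").sum()
print(by_column)
```

The `on=` form leaves the original frame's index untouched, which can be convenient when the date column is needed again later.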

Key Takeaways

Resampling changes the frequency of time series data by grouping or expanding time points.

Downsampling reduces data points by aggregating over larger time intervals, while upsampling increases points by inserting new times and filling missing values.

Choosing the right aggregation and filling methods is crucial for meaningful resampled data.

Resampling works best with a datetime index and requires careful handling of irregular data and missing values.

Understanding resampling prepares you to analyze time series data at different scales and prepare it for modeling.