How to Do RFM Analysis in Python: Simple Steps and Example
To do RFM analysis in Python, calculate Recency, Frequency, and Monetary values from your customer data using pandas. Then score each metric and combine the scores to segment customers by their purchase behavior.
Syntax
RFM analysis computes three metrics per customer and then scores them:
- Recency: Days since last purchase.
- Frequency: Number of purchases in a period.
- Monetary: Total money spent.
- Use pandas to group data by customer and calculate these metrics.
- Assign scores (e.g., 1-5) to each metric based on quantiles.
- Combine scores to create an RFM segment.
python
import pandas as pd

def calculate_rfm(df, customer_id_col, date_col, amount_col, current_date):
    # Calculate Recency
    recency_df = df.groupby(customer_id_col)[date_col].max().reset_index()
    recency_df['Recency'] = (current_date - recency_df[date_col]).dt.days

    # Calculate Frequency
    frequency_df = df.groupby(customer_id_col)[date_col].count().reset_index()
    frequency_df.columns = [customer_id_col, 'Frequency']

    # Calculate Monetary
    monetary_df = df.groupby(customer_id_col)[amount_col].sum().reset_index()
    monetary_df.columns = [customer_id_col, 'Monetary']

    # Merge all
    rfm = recency_df.merge(frequency_df, on=customer_id_col).merge(monetary_df, on=customer_id_col)
    return rfm
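The three group-by passes and two merges above can also be written as a single named aggregation. The following is a minimal sketch rather than the function used in this article: `calculate_rfm_agg` and the intermediate `LastPurchase` column are hypothetical names, and unlike `calculate_rfm` the result does not keep the last-purchase date column.

```python
import pandas as pd


def calculate_rfm_agg(df, customer_id_col, date_col, amount_col, current_date):
    """Hypothetical one-pass variant: one groupby with named aggregation."""
    rfm = (
        df.groupby(customer_id_col)
        .agg(
            LastPurchase=(date_col, "max"),  # most recent order per customer
            Frequency=(date_col, "count"),   # number of orders per customer
            Monetary=(amount_col, "sum"),    # total spend per customer
        )
        .reset_index()
    )
    # Recency in whole days between the reference date and the last order
    rfm["Recency"] = (pd.Timestamp(current_date) - rfm["LastPurchase"]).dt.days
    return rfm[[customer_id_col, "Recency", "Frequency", "Monetary"]]


# Made-up sample orders for illustration
orders = pd.DataFrame({
    "CustomerID": [1, 2, 1],
    "OrderDate": pd.to_datetime(["2024-05-01", "2024-05-03", "2024-05-10"]),
    "Amount": [100, 200, 150],
})
rfm_agg = calculate_rfm_agg(orders, "CustomerID", "OrderDate", "Amount", "2024-05-21")
print(rfm_agg)
```

Wrapping `current_date` in `pd.Timestamp` lets the sketch accept a date string as well as a `datetime` object.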
Example
This example shows how to calculate RFM metrics and assign scores to segment customers.
python
import pandas as pd
from datetime import datetime

# Sample data
data = {
    'CustomerID': [1, 2, 1, 3, 2, 1],
    'OrderDate': [
        '2024-05-01', '2024-05-03', '2024-05-10',
        '2024-04-25', '2024-05-15', '2024-05-20'
    ],
    'Amount': [100, 200, 150, 300, 250, 100]
}

# Create DataFrame
df = pd.DataFrame(data)
df['OrderDate'] = pd.to_datetime(df['OrderDate'])

# Current date for recency calculation
current_date = datetime(2024, 5, 21)

# Calculate RFM (calculate_rfm is the function defined in the Syntax section)
rfm = calculate_rfm(df, 'CustomerID', 'OrderDate', 'Amount', current_date)

# Assign scores 1-5 based on quantiles
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1])
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5])

# Combine scores
rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)
rfm['RFM_Score'] = rfm[['R_Score', 'F_Score', 'M_Score']].astype(int).sum(axis=1)

print(rfm)
Output
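   CustomerID  OrderDate  Recency  Frequency  Monetary R_Score F_Score M_Score RFM_Segment  RFM_Score
0           1 2024-05-20        1          3       350       5       5       3         553         13
1           2 2024-05-15        6          2       450       3       3       5         335         11
2           3 2024-04-25       26          1       300       1       1       1         111          3
With only three customers, each metric's values fall in the first, third, and fifth quintile bins, so only the scores 1, 3, and 5 appear. A realistic dataset is needed to see the full 1-5 range.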
Common Pitfalls
- Not converting dates to datetime type causes errors in recency calculation.
- Using raw counts for frequency without grouping by customer.
- Assigning scores incorrectly by not using quantiles or ranking.
- Mixing up score directions: a lower recency value means a more recent purchase, so it should get a higher score.
python
import pandas as pd

# Small sample with dates stored as strings
df = pd.DataFrame({
    'CustomerID': [1, 2, 1],
    'OrderDate': ['2024-05-01', '2024-05-03', '2024-05-10']
})

# Wrong: recency calculated without datetime conversion
# df['OrderDate'] is string, so subtraction fails

# Right way:
df['OrderDate'] = pd.to_datetime(df['OrderDate'])

# Wrong: frequency calculated without grouping
# frequency = df['OrderDate'].count()  # This counts all orders, not per customer

# Right way:
frequency = df.groupby('CustomerID')['OrderDate'].count()
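The last two pitfalls, scoring without ranking and reversed score directions, can be sketched in the same way. This is a minimal illustration on made-up values (the `freq` and `recency` Series are hypothetical). `pd.qcut` raises a `ValueError` when tied values produce duplicate bin edges, which ranking with `method='first'` avoids, and reversing the labels gives the most recent customers the highest recency score.

```python
import pandas as pd

# Hypothetical frequency counts with many ties
freq = pd.Series([1, 1, 1, 1, 2, 3])

# Pitfall: tied values give duplicate quantile edges, so qcut raises ValueError
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    qcut_failed = False
except ValueError:
    qcut_failed = True

# Ranking with method='first' breaks ties by position, so every edge is unique
f_score = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

# Hypothetical recency values in days; fewer days means more recent
recency = pd.Series([2, 10, 30, 60, 120])
# Reversed labels: the smallest recency gets 5, the largest gets 1
r_score = pd.qcut(recency, 5, labels=[5, 4, 3, 2, 1])

print(qcut_failed)
print(f_score.tolist())
print(r_score.tolist())
```

Ranking by position is one way to break ties. It means customers with identical counts can receive different scores, which may or may not be acceptable for a given analysis.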
Quick Reference
- Recency: Days since last purchase (lower is better).
- Frequency: Number of purchases (higher is better).
- Monetary: Total spent (higher is better).
- Use pandas.qcut to assign scores 1-5 based on quantiles.
- Combine scores as strings or sum for segmentation.
Key Takeaways
Convert date columns to datetime before calculating recency.
Group data by customer to calculate frequency and monetary values correctly.
Use quantiles with pandas.qcut so that each score covers a roughly equal share of customers.
Remember lower recency means more recent, so assign higher scores accordingly.
Combine RFM scores to segment customers for targeted marketing.
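As one way to act on the last takeaway, the summed score can be bucketed into named segments. The sketch below is illustrative only: the segment names and cut-off values are assumptions chosen for the example, not part of RFM itself, and `label_segments` is a hypothetical helper.

```python
import pandas as pd


def label_segments(rfm_score):
    """Hypothetical helper: bucket a summed RFM score (3-15) into named segments."""
    # Assumed cut-offs: 3-6 low, 7-11 mid, 12-15 high
    return pd.cut(
        rfm_score,
        bins=[2, 6, 11, 15],
        labels=["Low value", "Mid value", "High value"],
    )


# Made-up summed scores for three customers
scores = pd.Series([13, 11, 3], name="RFM_Score")
segments = label_segments(scores)
print(segments.tolist())
```

In practice the cut-offs would be tuned to the business and the score distribution rather than fixed in advance.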