
How to Do RFM Analysis in Python: Simple Steps and Example

To do RFM analysis in Python, calculate Recency, Frequency, and Monetary values from your customer data using pandas. Then, score each metric and combine them to segment customers based on their purchase behavior.

Syntax

RFM analysis scores each customer on three metrics:

  • Recency: days since the customer's last purchase.
  • Frequency: number of purchases in the analysis period.
  • Monetary: total amount spent in that period.

The workflow has three steps:

  • Group the data by customer with pandas and calculate the three metrics.
  • Assign each metric a score (e.g., 1-5) based on quantiles.
  • Combine the scores into an RFM segment.

The function below implements the first step:
python
import pandas as pd

def calculate_rfm(df, customer_id_col, date_col, amount_col, current_date):
    # Calculate Recency
    recency_df = df.groupby(customer_id_col)[date_col].max().reset_index()
    recency_df['Recency'] = (current_date - recency_df[date_col]).dt.days

    # Calculate Frequency
    frequency_df = df.groupby(customer_id_col)[date_col].count().reset_index()
    frequency_df.columns = [customer_id_col, 'Frequency']

    # Calculate Monetary
    monetary_df = df.groupby(customer_id_col)[amount_col].sum().reset_index()
    monetary_df.columns = [customer_id_col, 'Monetary']

    # Merge all
    rfm = recency_df.merge(frequency_df, on=customer_id_col).merge(monetary_df, on=customer_id_col)

    return rfm
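
The three separate groupby passes above can also be collapsed into a single call. The following is a minimal sketch using pandas named aggregation; the function name `calculate_rfm_compact`, the intermediate `LastPurchase` column, and the small `orders` frame are illustrative choices, not part of the original function.

```python
import pandas as pd


def calculate_rfm_compact(df, customer_id_col, date_col, amount_col, current_date):
    # Hypothetical variant of calculate_rfm: one groupby with named aggregation
    rfm = (
        df.groupby(customer_id_col)
        .agg(
            LastPurchase=(date_col, "max"),  # most recent order per customer
            Frequency=(date_col, "count"),   # number of order rows per customer
            Monetary=(amount_col, "sum"),    # total amount spent per customer
        )
        .reset_index()
    )
    # Recency in whole days between the reference date and the last purchase
    rfm["Recency"] = (pd.Timestamp(current_date) - rfm["LastPurchase"]).dt.days
    return rfm[[customer_id_col, "Recency", "Frequency", "Monetary"]]


# Illustrative data only
orders = pd.DataFrame({
    "CustomerID": [1, 2, 1],
    "OrderDate": pd.to_datetime(["2024-05-01", "2024-05-03", "2024-05-10"]),
    "Amount": [100, 200, 150],
})
rfm_compact = calculate_rfm_compact(orders, "CustomerID", "OrderDate", "Amount", "2024-05-21")
print(rfm_compact)
```

Unlike the version above, this sketch drops the last-purchase date from the result and keeps only the three metrics.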

Example

This example calculates RFM metrics for a small set of orders, then assigns scores to segment the customers. It reuses the calculate_rfm function defined above.

python
import pandas as pd
from datetime import datetime

# Sample data
data = {
    'CustomerID': [1, 2, 1, 3, 2, 1],
    'OrderDate': [
        '2024-05-01', '2024-05-03', '2024-05-10',
        '2024-04-25', '2024-05-15', '2024-05-20'
    ],
    'Amount': [100, 200, 150, 300, 250, 100]
}

# Create DataFrame
df = pd.DataFrame(data)
df['OrderDate'] = pd.to_datetime(df['OrderDate'])

# Current date for recency calculation
current_date = datetime(2024, 5, 21)

# Calculate RFM
rfm = calculate_rfm(df, 'CustomerID', 'OrderDate', 'Amount', current_date)

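# Assign scores 1-5 based on quantiles
# Recency labels are reversed: fewer days since the last purchase earns a higher score
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1])
# rank(method='first') breaks ties so qcut always gets unique bin edges
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5])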

# Combine scores
rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)
rfm['RFM_Score'] = rfm[['R_Score', 'F_Score', 'M_Score']].astype(int).sum(axis=1)

print(rfm)
Output
   CustomerID  OrderDate  Recency  Frequency  Monetary  R_Score  F_Score  M_Score  RFM_Segment  RFM_Score
0           1 2024-05-20        1          3       350        5        5        3          553         13
1           2 2024-05-15        6          2       450        3        3        5          335         11
2           3 2024-04-25       26          1       300        1        1        1          111          3
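
Numeric scores are often translated into named groups before they are used for marketing. The sketch below shows one way to do that. The `label_segment` function, its segment names, and its thresholds are all hypothetical and would need tuning to real data; the small `scores` frame holds illustrative values and stands in for the scored `rfm` table.

```python
import pandas as pd


def label_segment(r, f, m):
    # Hypothetical rules on a 1-5 scale; adjust names and thresholds to your data
    if r >= 4 and f >= 4:
        return "Champion"          # bought recently and often
    if r <= 2 and (f >= 3 or m >= 3):
        return "At risk"           # used to be valuable, has not bought lately
    if f >= 3 or m >= 4:
        return "Loyal"             # steady buyer or high spender
    return "Low engagement"


# Illustrative scores only
scores = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "R_Score": [5, 3, 1],
    "F_Score": [5, 3, 1],
    "M_Score": [3, 5, 1],
})
scores["Segment"] = [
    label_segment(r, f, m)
    for r, f, m in zip(scores["R_Score"], scores["F_Score"], scores["M_Score"])
]
print(scores[["CustomerID", "Segment"]])
```

Keeping the rules in one small function makes them easy to review and change as the customer base evolves.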

Common Pitfalls

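  • Leaving dates as strings: the recency subtraction fails unless the column is converted to datetime first.
  • Counting orders across the whole table instead of grouping by customer, which gives one total rather than a per-customer frequency.
  • Scoring without quantiles or ranking: fixed cut-offs may not fit your data, and pd.qcut raises an error when tied values produce duplicate bin edges.
  • Mixing up score directions: a lower recency means a more recent purchase, so it should receive a higher score.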
python
import pandas as pd

# Small frame with dates still stored as strings
df = pd.DataFrame({
    'CustomerID': [1, 2, 1],
    'OrderDate': ['2024-05-01', '2024-05-03', '2024-05-10'],
})

# Wrong: recency calculated without datetime conversion
# pd.Timestamp('2024-05-21') - df['OrderDate']  # fails: the column holds strings

# Right way:
df['OrderDate'] = pd.to_datetime(df['OrderDate'])

# Wrong: frequency calculated without grouping
# frequency = df['OrderDate'].count()  # This counts all orders, not per customer

# Right way:
frequency = df.groupby('CustomerID')['OrderDate'].count()
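
The scoring pitfalls can be shown in the same way. The sketch below uses made-up `freq` and `recency` Series. It shows `pd.qcut` raising a `ValueError` when tied values make bin edges collide, ranking with `method="first"` to avoid that, and reversed labels for recency so that recent customers score high.

```python
import pandas as pd

# Made-up frequencies with many ties (illustrative data only)
freq = pd.Series([1, 1, 1, 1, 2, 2, 3, 5, 8, 13])

try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    qcut_failed = False
except ValueError:
    # Several quantiles fall on the same value, so the bin edges are not unique
    qcut_failed = True

# Ranking first breaks ties by order of appearance, so the edges become unique
f_score = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)

# Recency: fewer days is better, so the labels run from 5 down to 1
recency = pd.Series([1, 6, 26, 40, 90])
r_score = pd.qcut(recency, 5, labels=[5, 4, 3, 2, 1]).astype(int)

print(qcut_failed, f_score.tolist(), r_score.tolist())
```

Note that ranking with `method="first"` can place customers with identical values in different score bins; whether that is acceptable depends on how the scores will be used.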

Quick Reference

  • Recency: Days since last purchase (lower is better).
  • Frequency: Number of purchases (higher is better).
  • Monetary: Total spent (higher is better).
  • Use pandas.qcut to assign scores 1-5 based on quantiles.
  • Combine the scores by concatenating them as strings for a segment code, or by summing them for a single overall score.

Key Takeaways

Convert date columns to datetime before calculating recency.
Group data by customer to calculate frequency and monetary values correctly.
Use pandas.qcut to assign quantile-based RFM scores, so each score reflects a customer's standing relative to the others.
Remember lower recency means more recent, so assign higher scores accordingly.
Combine RFM scores to segment customers for targeted marketing.