## Problem Statement
When too many requests hit a service at once, it can slow down or crash, causing poor user experience and downtime. Without control, abusive or accidental traffic spikes can exhaust resources and take the service down entirely.

The diagram shows a client sending requests through a rate limiter before they reach the service. The rate limiter tracks per-client request counts in a cache and rejects requests once a client exceeds its limit.
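The cache-backed counting in the diagram can be sketched as a fixed-window counter. This is an illustrative sketch, not the implementation used later: the in-memory dict stands in for the shared cache (e.g. Redis), and the `allow`, `LIMIT`, and `WINDOW` names are assumptions made here for clarity.

```python
import time
from collections import defaultdict

LIMIT = 5    # requests allowed per client per window
WINDOW = 10  # window length in seconds

# Key: (client_id, window_start) -> request count. In production this
# lookup would hit a shared cache rather than process-local memory.
counts = defaultdict(int)

def allow(client_id, now=None):
    """Return True if the request is within the limit for the current window."""
    now = time.time() if now is None else now
    window_start = int(now // WINDOW) * WINDOW  # align to window boundary
    key = (client_id, window_start)
    if counts[key] >= LIMIT:
        return False
    counts[key] += 1
    return True
```

Fixed windows are simple but allow a burst of up to `2 * LIMIT` requests straddling a window boundary, which is one reason the example below uses a token bucket instead.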
### Before (no rate limiting)

```python
from flask import Flask

app = Flask(__name__)

@app.route('/api')
def api():
    return 'Success'
```

### After (with simple token bucket rate limiting)

```python
import time

from flask import Flask, jsonify, request

app = Flask(__name__)

RATE_LIMIT = 5    # bucket capacity: burst of requests allowed
TIME_WINDOW = 10  # seconds to refill a full bucket

clients = {}  # per-IP bucket state: {'tokens': float, 'last': timestamp}

@app.route('/api')
def api():
    client_ip = request.remote_addr
    now = time.time()
    if client_ip not in clients:
        clients[client_ip] = {'tokens': RATE_LIMIT, 'last': now}
    bucket = clients[client_ip]
    # Refill at RATE_LIMIT / TIME_WINDOW tokens per second, capped at capacity.
    elapsed = now - bucket['last']
    bucket['tokens'] = min(RATE_LIMIT,
                           bucket['tokens'] + elapsed * (RATE_LIMIT / TIME_WINDOW))
    bucket['last'] = now
    if bucket['tokens'] < 1:
        return jsonify({'error': 'Rate limit exceeded'}), 429
    bucket['tokens'] -= 1
    return 'Success'
```
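The refill arithmetic is easier to verify outside Flask. Below is a hedged sketch that extracts the same token bucket logic into a standalone class; the `TokenBucket` name and the injected `now` clock are assumptions made here so the behavior can be tested deterministically.

```python
class TokenBucket:
    """Token bucket with the same refill math as the handler above:
    capacity tokens, refilled at capacity / window tokens per second."""

    def __init__(self, capacity=5, window=10):
        self.capacity = capacity
        self.refill_rate = capacity / window  # tokens added per second
        self.tokens = float(capacity)        # bucket starts full
        self.last = 0.0                      # timestamp of last check

    def allow(self, now):
        """Consume one token if available; `now` is an injected clock."""
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens < 1:
            return False
        self.tokens -= 1
        return True
```

With the defaults, a client can burst 5 requests at once, then regains one token every 2 seconds: 5 calls at `now=0` succeed, the 6th fails, and by `now=2` one token has refilled.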