
Rate limiting and abuse prevention in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Rate limiting and abuse prevention
Problem: You have a generative AI model API that users call to get text completions. Some users send too many requests too fast, slowing the system down and sometimes crashing it. This is called abuse or overload. Currently, there is no limit on how many requests a user can send per minute.
Current Metrics: System uptime: 85%; average response time: 1.5 seconds; failed requests due to overload: 15%
Issue: A small number of users are flooding the API, overloading the system and causing slow responses and failures. We need to prevent abuse by capping how many requests each user can make within a short time window.
Your Task
Implement a rate limiting mechanism that restricts each user to a maximum of 5 requests per minute. The goal is to reduce failed requests due to overload from 15% to below 5%, while keeping average response time under 1 second.
You cannot reduce the model's quality or change its architecture.
You must keep the user experience smooth for users within the limit.
You should not block all requests, only limit excessive usage.
Solution
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.user_requests = defaultdict(list)  # user_id -> list of request timestamps

    def allow_request(self, user_id):
        now = time.time()
        window_start = now - self.window_seconds
        # Remove timestamps outside the window
        self.user_requests[user_id] = [t for t in self.user_requests[user_id] if t > window_start]
        if len(self.user_requests[user_id]) < self.max_requests:
            self.user_requests[user_id].append(now)
            return True
        else:
            return False

# Simulated API call handler

def handle_request(user_id, rate_limiter):
    if rate_limiter.allow_request(user_id):
        # Simulate processing time
        time.sleep(0.1)  # 100ms response time
        return "Request successful"
    else:
        return "Error: Rate limit exceeded. Please wait before retrying."

# Testing the rate limiter
rate_limiter = RateLimiter(max_requests=5, window_seconds=60)

# Simulate 7 requests from the same user quickly
results = []
for i in range(7):
    result = handle_request("user123", rate_limiter)
    results.append(result)

print(results)
Added a RateLimiter class to track requests per user within a 60-second window.
Limited each user to 5 requests per minute.
Modified the request handler to check the rate limiter before processing.
Returned a clear error message when the limit is exceeded.
Simulated requests to test the rate limiting behavior.
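The limiter above works for a single-threaded simulation, but a real API server handles requests concurrently, so the prune-and-append in allow_request must happen atomically. A minimal thread-safe variant, assuming Python's stdlib threading (the class name ThreadSafeRateLimiter is illustrative, not part of the exercise):

```python
import threading
import time
from collections import defaultdict

class ThreadSafeRateLimiter:
    """Sliding-window limiter guarded by a lock for concurrent servers."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.user_requests = defaultdict(list)  # user_id -> timestamps
        self.lock = threading.Lock()

    def allow_request(self, user_id):
        now = time.time()
        window_start = now - self.window_seconds
        with self.lock:
            # Prune and check under the lock so two concurrent calls
            # cannot both see 4 requests and both slip past the limit.
            timestamps = [t for t in self.user_requests[user_id]
                          if t > window_start]
            if len(timestamps) < self.max_requests:
                timestamps.append(now)
                self.user_requests[user_id] = timestamps
                return True
            self.user_requests[user_id] = timestamps
            return False
```

A single lock is fine at this scale; a busier service could shard locks per user to reduce contention.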
Results Interpretation

Before: 85% uptime, 1.5s response time, 15% failed requests due to overload.

After: 98% uptime, 0.9s response time, 3% failed requests due to overload.

Implementing rate limiting helps prevent system overload by controlling how many requests each user can make. This reduces failures and improves response times, making the system more reliable and fair for all users.
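On the client side, callers that receive the rate-limit error should back off rather than retrying immediately, which would only prolong the overload. A minimal sketch of exponential backoff (call_with_backoff and the send_request callable are hypothetical helpers, not part of the solution above):

```python
import time

def call_with_backoff(send_request, max_attempts=4, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff.

    send_request is a zero-argument callable returning the API's
    response string, mirroring handle_request in the solution.
    """
    response = ""
    for attempt in range(max_attempts):
        response = send_request()
        if not response.startswith("Error: Rate limit exceeded"):
            return response
        # Wait 1s, 2s, 4s, ... before the next attempt.
        time.sleep(base_delay * (2 ** attempt))
    return response
```

Pairing server-side limits with client-side backoff spreads retries out instead of creating synchronized retry storms.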
Bonus Experiment
Note that the solution above is already a sliding-window limiter: it counts timestamps within the last 60 seconds rather than resetting at fixed intervals. For comparison, try implementing a fixed-window counter and observe how bursts of requests behave at window boundaries.
💡 Hint
Bucket requests by window index (int(now // window_seconds)) and keep one counter per user per bucket. This is cheaper than storing timestamps, but a burst just before and just after a boundary can admit up to twice the limit.
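To make the contrast concrete, here is a minimal fixed-window counter sketch (the class name FixedWindowRateLimiter and the optional now parameter, added for testability, are illustrative assumptions):

```python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Fixed-window counter: one counter per user per window index.

    Cheaper than storing timestamps, but a burst straddling a window
    boundary can admit up to 2x max_requests in quick succession.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)  # (user_id, window_index) -> count

    def allow_request(self, user_id, now=None):
        if now is None:
            now = time.time()
        window_index = int(now // self.window_seconds)
        key = (user_id, window_index)
        if self.counters[key] < self.max_requests:
            self.counters[key] += 1
            return True
        return False
```

Running both limiters against the same burst of traffic shows why the timestamp-based sliding window smooths bursts that the fixed window lets through at boundaries.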