
Rate limiting and abuse prevention in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Rate limiting and abuse prevention

Problem: You have a generative AI model API that users call to get text completions. Some users send too many requests too quickly, which slows the system down and sometimes crashes it. This overload, whether accidental or deliberate abuse, goes unchecked because there is currently no limit on how many requests a user can send per minute.

Current Metrics: System uptime: 85%; average response time: 1.5 seconds; requests failing due to overload: 15%

Issue: Bursts of requests from a few users overload the system, causing slow responses and failures. To prevent this abuse, limit how many requests each user can make in a short time.

Your Task

Implement a rate limiting mechanism that restricts each user to a maximum of 5 requests per minute. The goal is to reduce failed requests due to overload from 15% to below 5%, while keeping average response time under 1 second.

You cannot reduce the model's quality or change its architecture.

You must keep the user experience smooth for users within the limit.

You should not block all requests, only limit excessive usage.

Solution

import time
from collections import defaultdict

class RateLimiter:
    """Allow each user at most max_requests within the last window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.user_requests = defaultdict(list)  # user_id -> list of request timestamps

    def allow_request(self, user_id):
        # time.monotonic() never jumps backwards, unlike time.time()
        now = time.monotonic()
        window_start = now - self.window_seconds
        # Keep only timestamps that are still inside the window
        recent = [t for t in self.user_requests[user_id] if t > window_start]
        self.user_requests[user_id] = recent
        if len(recent) < self.max_requests:
            recent.append(now)  # record this request
            return True
        return False  # limit reached: reject without recording

# Simulated API call handler

def handle_request(user_id, rate_limiter):
    if rate_limiter.allow_request(user_id):
        # Simulate processing time
        time.sleep(0.1)  # 100ms response time
        return "Request successful"
    else:
        return "Error: Rate limit exceeded. Please wait before retrying."

# Testing the rate limiter
rate_limiter = RateLimiter(max_requests=5, window_seconds=60)

# Simulate 7 requests from the same user in quick succession
results = [handle_request("user123", rate_limiter) for _ in range(7)]

# The first 5 requests succeed; the last 2 are rejected
print(results)

Added a RateLimiter class to track requests per user within a 60-second window.

Limited each user to 5 requests per minute.

Modified the request handler to check the rate limiter before processing.

Returned a clear error message when the limit is exceeded.

Simulated requests to test the rate limiting behavior.
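The error message tells the user to wait, but not for how long. Below is a minimal sketch, assuming the same timestamp-list approach as the solution, of how the remaining wait could be reported. The class name `RateLimiterWithRetry`, the `retry_after` method, and the `clock` parameter are hypothetical additions for illustration and are not part of the original solution.

```python
import time
from collections import defaultdict


class RateLimiterWithRetry:
    """Per-user limiter that can also report how long to wait (illustrative)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # hypothetical hook so a fake clock can be injected
        self.user_requests = defaultdict(list)

    def _prune(self, user_id, now):
        # Keep only timestamps that are still inside the window.
        window_start = now - self.window_seconds
        self.user_requests[user_id] = [
            t for t in self.user_requests[user_id] if t > window_start
        ]
        return self.user_requests[user_id]

    def allow_request(self, user_id):
        now = self.clock()
        recent = self._prune(user_id, now)
        if len(recent) < self.max_requests:
            recent.append(now)
            return True
        return False

    def retry_after(self, user_id):
        """Seconds until the next request would be allowed (0.0 if allowed now)."""
        now = self.clock()
        recent = self._prune(user_id, now)
        if len(recent) < self.max_requests:
            return 0.0
        # Timestamps are stored in order, so the first one expires first.
        return recent[0] + self.window_seconds - now
```

In an HTTP API, a value like this is commonly returned with a 429 Too Many Requests status and a Retry-After header, so that well-behaved clients can back off instead of retrying immediately.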

Results Interpretation

Before: 85% uptime, 1.5s response time, 15% failed requests due to overload.

After: 98% uptime, 0.9s response time, 3% failed requests due to overload.

Implementing rate limiting helps prevent system overload by controlling how many requests each user can make. This reduces failures and improves response times, making the system more reliable and fair for all users.
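The fairness point can be illustrated with a small simulation. The sketch below replays a made-up traffic pattern against the same per-user logic, passing the time in explicitly so nothing has to sleep. The `simulate` helper, the user names, and the traffic pattern are assumptions for illustration; the simulation only counts allowed and rejected requests and does not measure uptime or latency.

```python
from collections import defaultdict


class RateLimiter:
    """Same per-user timestamp logic as the solution, with an explicit `now`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.user_requests = defaultdict(list)

    def allow_request(self, user_id, now):
        window_start = now - self.window_seconds
        recent = [t for t in self.user_requests[user_id] if t > window_start]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self.user_requests[user_id] = recent
        return allowed


def simulate(duration_seconds=120):
    """Replay a made-up traffic pattern; count allowed/rejected requests per user."""
    limiter = RateLimiter(max_requests=5, window_seconds=60)
    stats = defaultdict(lambda: {"allowed": 0, "rejected": 0})
    for second in range(duration_seconds):
        senders = ["heavy_user"]          # hypothetical user: one request per second
        if second % 20 == 0:
            senders.append("light_user")  # hypothetical user: one request per 20 s
        for user_id in senders:
            allowed = limiter.allow_request(user_id, now=second)
            stats[user_id]["allowed" if allowed else "rejected"] += 1
    return dict(stats)


if __name__ == "__main__":
    for user_id, counts in simulate().items():
        print(user_id, counts)
```

Under these assumptions the heavy user gets 10 of 120 requests through, while all 6 of the light user's requests succeed: the limit only affects the user who exceeds it.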

Bonus Experiment

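The solution above already behaves as a sliding window rather than a fixed one: it stores request timestamps and counts those from the last 60 seconds. Try making that sliding window more efficient, so that expired timestamps are dropped without rebuilding the whole list on every call.

💡 Hint

Timestamps are appended in order, so the oldest is always first. Keep them in a queue and remove entries from the front while they are older than 60 seconds, then count what remains.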
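One possible approach to the bonus experiment is a sliding-window limiter backed by `collections.deque`, sketched below. The class name `SlidingWindowLimiter` and the `clock` parameter are hypothetical; this is one way to do it under those assumptions, not the reference answer.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window limiter that drops expired timestamps from the left."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # hypothetical hook so a fake clock can be injected
        self.user_requests = defaultdict(deque)  # user_id -> ordered timestamps

    def allow_request(self, user_id):
        now = self.clock()
        window_start = now - self.window_seconds
        timestamps = self.user_requests[user_id]
        # The oldest timestamp is at the left; pop until it is inside the window.
        while timestamps and timestamps[0] <= window_start:
            timestamps.popleft()
        if len(timestamps) < self.max_requests:
            timestamps.append(now)
            return True
        return False
```

Because each timestamp is appended once and popped at most once, the pruning cost is spread across calls instead of scanning the full list every time.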

Practice

1. What is the main purpose of rate limiting in AI services?

easy

A. To improve the accuracy of AI models

B. To increase the speed of AI predictions

C. To stop too many requests from one user in a short time

D. To reduce the size of the AI model

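Solution

Step 1: Understand rate limiting concept. Rate limiting caps how many requests a single client may send within a given time window.

Step 2: Identify the main goal. The cap protects the service from overload and abuse; it does not change model accuracy, prediction speed, or model size.

Final Answer: C. To stop too many requests from one user in a short time

Quick Check: The limiter in this experiment does exactly this by capping each user at 5 requests per minute.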
