Rate limiting and abuse prevention in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Rate limiting and abuse prevention
Problem: You have a generative AI model API that users call to get text completions. Some users send too many requests too quickly, which slows the system down and sometimes crashes it. This is called abuse or overload. Currently, there is no limit on how many requests a user can send per minute.
Current Metrics: System uptime: 85%, Average response time: 1.5 seconds, Failed requests due to overload: 15%
Issue: The system is overloaded by too many requests from a few users, which causes slow responses and failures. We need to prevent abuse by limiting how many requests each user can make in a short time.
Your Task
Implement a rate limiting mechanism that restricts each user to a maximum of 5 requests per minute. The goal is to reduce failed requests due to overload from 15% to below 5%, while keeping average response time under 1 second.
You cannot reduce the model's quality or change its architecture.
You must keep the user experience smooth for users within the limit.
You should not block all requests, only limit excessive usage.
Solution
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.user_requests = defaultdict(list)  # user_id -> list of request timestamps

    def allow_request(self, user_id):
        now = time.monotonic()  # monotonic clock is unaffected by system clock changes
        window_start = now - self.window_seconds
        # Remove timestamps outside the window
        self.user_requests[user_id] = [t for t in self.user_requests[user_id] if t > window_start]
        if len(self.user_requests[user_id]) < self.max_requests:
            self.user_requests[user_id].append(now)
            return True
        else:
            return False

# Simulated API call handler

def handle_request(user_id, rate_limiter):
    if rate_limiter.allow_request(user_id):
        # Simulate processing time
        time.sleep(0.1)  # 100ms response time
        return "Request successful"
    else:
        return "Error: Rate limit exceeded. Please wait before retrying."

# Testing the rate limiter
rate_limiter = RateLimiter(max_requests=5, window_seconds=60)

# Simulate 7 requests from the same user quickly
results = []
for i in range(7):
    result = handle_request("user123", rate_limiter)
    results.append(result)

print(results)  # the first 5 requests succeed, the last 2 are rate limited
Added a RateLimiter class to track requests per user within a 60-second window.
Limited each user to 5 requests per minute.
Modified the request handler to check the rate limiter before processing.
Returned a clear error message when the limit is exceeded.
Simulated requests to test the rate limiting behavior.
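The error message in the solution tells the caller to wait, but not for how long. One possible extension, sketched below, reports how many seconds remain until the oldest request in the window expires. This is the kind of value an HTTP API would typically send in a Retry-After header alongside a 429 Too Many Requests status. The function name seconds_until_allowed and the explicit now argument are assumptions made for illustration; they are not part of the solution above.

```python
import math


def seconds_until_allowed(timestamps, max_requests, window_seconds, now):
    """Return 0 if a request is allowed now, else the whole seconds to wait.

    `timestamps` holds the times of one user's accepted requests, oldest
    first. Hypothetical helper, not part of the original solution.
    """
    # Keep only requests that are still inside the window
    recent = [t for t in timestamps if t > now - window_seconds]
    if len(recent) < max_requests:
        return 0
    # A slot frees up once this request falls out of the window
    oldest_blocking = recent[len(recent) - max_requests]
    return math.ceil(oldest_blocking + window_seconds - now)


# Five requests at t = 0..4 s, limit of 5 per 60 s
history = [0, 1, 2, 3, 4]
print(seconds_until_allowed(history, 5, 60, now=10))  # 50
print(seconds_until_allowed(history, 5, 60, now=61))  # 0
```

Passing now in explicitly keeps the helper easy to test; a real handler would probably read it from the same clock the limiter uses.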
Results Interpretation

Before: 85% uptime, 1.5s response time, 15% failed requests due to overload.

After: 98% uptime, 0.9s response time, 3% failed requests due to overload.

Implementing rate limiting helps prevent system overload by controlling how many requests each user can make. This reduces failures and improves response times, making the system more reliable and fair for all users.
Bonus Experiment
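Try implementing a sliding window rate limiter and compare it with a fixed window counter to see how it smooths out bursts of requests. The solution above already filters timestamps against a moving 60-second window, so it can serve as a starting point.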
💡 Hint
Use timestamps and count requests in the last 60 seconds dynamically rather than fixed intervals.
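As a rough illustration of the hint, the sketch below keeps each user's timestamps in a deque and drops expired entries from the left on every call, so the count always reflects the last 60 seconds. A fixed window counter that resets on the minute can, by contrast, let through up to twice the limit around a window boundary. The class name SlidingWindowLimiter and the clock parameter are assumptions added here so the behaviour can be checked without sleeping; they do not come from the original exercise.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log limiter (illustrative sketch)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable clock, an assumption for testing
        self.user_requests = defaultdict(deque)  # user_id -> timestamps, oldest first

    def allow_request(self, user_id):
        now = self.clock()
        timestamps = self.user_requests[user_id]
        # Drop timestamps that have fallen out of the last window_seconds
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        if len(timestamps) < self.max_requests:
            timestamps.append(now)
            return True
        return False


# Drive the limiter with a fake clock instead of real time
fake_now = [0.0]
limiter = SlidingWindowLimiter(5, 60, clock=lambda: fake_now[0])

first_burst = [limiter.allow_request("user123") for _ in range(7)]
fake_now[0] = 30.0  # still inside the window of the first burst
mid_window = limiter.allow_request("user123")
fake_now[0] = 60.0  # the requests made at t=0 have now expired
after_window = limiter.allow_request("user123")

print(first_burst)   # [True, True, True, True, True, False, False]
print(mid_window)    # False
print(after_window)  # True
```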

Practice

1. What is the main purpose of rate limiting in AI services?
easy
A. To improve the accuracy of AI models
B. To increase the speed of AI predictions
C. To stop too many requests from one user in a short time
D. To reduce the size of the AI model

Solution

  1. Step 1: Understand rate limiting concept

    Rate limiting is designed to control how many requests a user can make in a short period.
  2. Step 2: Identify the main goal

    The goal is to prevent overload and abuse by stopping too many requests quickly.
  3. Final Answer:

    To stop too many requests from one user in a short time -> Option C
  4. Quick Check:

    Rate limiting = stop excess requests [OK]
Hint: Rate limiting controls request frequency to prevent overload [OK]
Common Mistakes:
  • Confusing rate limiting with improving model accuracy
  • Thinking rate limiting speeds up predictions
  • Assuming rate limiting reduces model size
2. Which Python code snippet correctly implements a simple rate limiter that blocks further requests once 5 calls have been made (requests_count holds the number of calls made so far)?
easy
A. if requests_count >= 5: block_request()
B. if requests_count == 5: allow_request()
C. if requests_count < 5: block_request()
D. if requests_count > 5: block_request()

Solution

  1. Step 1: Understand the condition for blocking

    We want to block requests when the count reaches or exceeds 5, so >= 5 is correct.
  2. Step 2: Check each option

    if requests_count >= 5: block_request() uses '>= 5' to block requests, which matches the requirement.
  3. Final Answer:

    if requests_count >= 5: block_request() -> Option A
  4. Quick Check:

    Block when count is 5 or more = >= 5 [OK]
Hint: Use '>=' to include the limit value when blocking [OK]
Common Mistakes:
  • Using '>' misses blocking exactly at 5
  • Using '<' blocks requests that are still under the limit and lets excess ones through
  • Allowing request at count 5 instead of blocking
3. Given the code below, what will be printed after 7 calls to check_request()?
requests_count = 0
def block_request():
    print('Blocked')
def allow_request():
    print('Allowed')
def check_request():
    global requests_count
    requests_count += 1
    if requests_count >= 5:
        block_request()
    else:
        allow_request()

for _ in range(7):
    check_request()
medium
A. Allowed printed 7 times
B. Blocked printed 5 times, Allowed printed 2 times
C. Allowed printed 5 times, Blocked printed 2 times
D. Allowed printed 4 times, Blocked printed 3 times

Solution

  1. Step 1: Track requests_count and output

    For calls 1 to 4, requests_count is less than 5, so 'Allowed' prints. For calls 5 to 7, requests_count is 5 or more, so 'Blocked' prints.
  2. Step 2: Count prints

    'Allowed' prints 4 times, 'Blocked' prints 3 times.
  3. Final Answer:

    Allowed printed 4 times, Blocked printed 3 times -> Option D
  4. Quick Check:

    4 Allowed + 3 Blocked = 7 calls [OK]
Hint: Count calls before and after limit to find outputs [OK]
Common Mistakes:
  • Counting 'Allowed' as 5 times instead of 4
  • Confusing when blocking starts
  • Ignoring global variable increment
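Because check_request() increments the counter before comparing it, the limiter in this question lets only 4 requests through. If the intent were to allow exactly 5, one option is to compare first and count only accepted requests. This is a minimal sketch; the name MAX_ALLOWED and the returned strings are illustrative choices, not part of the question.

```python
MAX_ALLOWED = 5  # illustrative limit
requests_count = 0


def check_request():
    """Allow the request if fewer than MAX_ALLOWED have been accepted so far."""
    global requests_count
    if requests_count >= MAX_ALLOWED:
        return "Blocked"
    requests_count += 1  # count only requests that were accepted
    return "Allowed"


outcomes = [check_request() for _ in range(7)]
print(outcomes.count("Allowed"), outcomes.count("Blocked"))  # 5 2
```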
4. The following code is meant to allow 2 calls and block from the 3rd call onward, but it allows 3 calls and blocks only from the 4th. What is the error?
requests_count = 0
def check_request():
    global requests_count
    requests_count += 1
    if requests_count > 3:
        print('Blocked')
    else:
        print('Allowed')
medium
A. The requests_count should start at 1, not 0
B. The condition should be '>= 3' instead of '> 3'
C. The print statements are reversed
D. The global keyword is missing

Solution

  1. Step 1: Analyze the blocking condition

    The code blocks only when requests_count > 3, so blocking starts at 4th call, not 3rd.
  2. Step 2: Fix condition to block at 3 calls

    Changing condition to '>= 3' will block starting at the 3rd call as intended.
  3. Final Answer:

    The condition should be '>= 3' instead of '> 3' -> Option B
  4. Quick Check:

    Block at 3 calls means '>= 3' [OK]
Hint: Use '>=' to include the limit call in blocking [OK]
Common Mistakes:
  • Using '>' blocks too late
  • Starting count at 1 instead of 0 is unnecessary
  • Forgetting global keyword (but it's present here)
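A quick way to check an off-by-one fix like this is to simulate the counter and report which call is the first to be blocked. The helper below is a hypothetical test aid (first_blocked_call is not part of the question); it mirrors the question's logic of incrementing first and comparing afterwards.

```python
import operator


def first_blocked_call(compare, threshold, total_calls=10):
    """Return the 1-based number of the first blocked call, or None.

    The counter is incremented first, then the call is blocked when
    compare(counter, threshold) is true.
    """
    requests_count = 0
    for call_number in range(1, total_calls + 1):
        requests_count += 1
        if compare(requests_count, threshold):
            return call_number
    return None


print(first_blocked_call(operator.gt, 3))  # 4  (original condition '> 3')
print(first_blocked_call(operator.ge, 3))  # 3  (fix from option B, '>= 3')
print(first_blocked_call(operator.gt, 2))  # 3  (equivalent fix, '> 2')
```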
5. You want to prevent abuse by limiting users to 10 requests per minute. Which approach best combines rate limiting with user tracking in Python?
hard
A. Use a dictionary to store user IDs with timestamps of their requests, then block if more than 10 in last 60 seconds
B. Reset a global request count every minute without user distinction
C. Block all requests after 10 total requests regardless of user
D. Allow unlimited requests but slow down responses after 10 requests

Solution

  1. Step 1: Understand per-user rate limiting

    To limit requests per user, we must track each user's request times separately.
  2. Step 2: Choose data structure and logic

    A dictionary with user IDs as keys and lists of request timestamps as values lets us count each user's requests in the last 60 seconds and block once the limit of 10 is reached.
  3. Final Answer:

    Use a dictionary to store user IDs with timestamps of their requests, then block if more than 10 in last 60 seconds -> Option A
  4. Quick Check:

    Per-user tracking + time window = dictionary with timestamps [OK]
Hint: Track each user's timestamps to count requests per minute [OK]
Common Mistakes:
  • Using global count ignores individual users
  • Blocking all users after total requests causes unfair blocking
  • Slowing responses is not strict rate limiting
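To see why a single global counter is unfair, the sketch below replays the same traffic through a global counter and a per-user counter. It is deliberately simplified: all requests are assumed to fall inside one 60-second window, so timestamps are omitted, and the user IDs are made up for the example.

```python
from collections import defaultdict

LIMIT = 10  # requests allowed per window

# Simulated traffic inside one window (hypothetical user IDs):
# "heavy_user" sends 12 requests, then "light_user" sends 1.
traffic = ["heavy_user"] * 12 + ["light_user"]

# Global counter: no user distinction
global_count = 0
global_decisions = []
for user_id in traffic:
    allowed = global_count < LIMIT
    if allowed:
        global_count += 1
    global_decisions.append((user_id, allowed))

# Per-user counter: each user has their own budget
per_user_count = defaultdict(int)
per_user_decisions = []
for user_id in traffic:
    allowed = per_user_count[user_id] < LIMIT
    if allowed:
        per_user_count[user_id] += 1
    per_user_decisions.append((user_id, allowed))

print(global_decisions[-1])    # ('light_user', False)
print(per_user_decisions[-1])  # ('light_user', True)
```

With the global counter, the heavy user exhausts the shared budget and the light user's only request is rejected; with per-user tracking, only the heavy user's 11th and 12th requests are blocked.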