Rate limiting and abuse prevention in Prompt Engineering / GenAI - Model Metrics & Evaluation

Metrics & Evaluation - Rate limiting and abuse prevention
Which metrics matter for rate limiting and abuse prevention, and why

For rate limiting and abuse prevention, the key metrics are False Positive Rate and False Negative Rate. False positives mean blocking good users, which hurts user experience. False negatives mean letting bad users abuse the system, which causes harm. Balancing these is critical.

Precision and recall are also important: precision is the share of blocked users who were truly abusive, and recall is the share of abusive users who were caught. High recall prevents abuse; high precision avoids blocking good users.

Confusion matrix example
    |---------------------------|
    |           | Predicted     |
    | Actual    | Abuse | Good  |
    |-----------|-------|-------|
    | Abuse     | 90    | 10    |
    | Good User | 15    | 885   |
    |---------------------------|

    TP = 90 (abusive users correctly blocked)
    FN = 10 (abusive users missed)
    FP = 15 (good users wrongly blocked)
    TN = 885 (good users correctly allowed)
    Total = 1000
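
As a quick check, the metrics named earlier can be computed directly from these four counts. This is a minimal sketch; the variable names are illustrative and not part of the lesson.

```python
# Counts taken from the confusion matrix above.
tp, fn, fp, tn = 90, 10, 15, 885

total = tp + fn + fp + tn
precision = tp / (tp + fp)            # share of blocked users who were abusive
recall = tp / (tp + fn)               # share of abusive users who were blocked
false_positive_rate = fp / (fp + tn)  # share of good users wrongly blocked
false_negative_rate = fn / (fn + tp)  # share of abusive users missed
accuracy = (tp + tn) / total

print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"FPR={false_positive_rate:.3f} FNR={false_negative_rate:.3f}")
print(f"accuracy={accuracy:.3f}")
```

For this matrix, precision is about 0.857, recall is 0.900, the false positive rate is about 0.017, the false negative rate is 0.100, and accuracy is 0.975.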
    
Precision vs Recall tradeoff with examples

If you set strict limits, you catch almost all abusers (high recall) but block many good users (low precision). This frustrates real users.

If you set loose limits, you block fewer good users (high precision) but miss many abusers (low recall), risking system abuse.

Example: A chat app wants to stop spam. High recall means catching most spammers but may block some normal users. High precision means blocking only real spammers but some spam may get through.
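
The tradeoff can be seen by sweeping the limit over a small data set. The sketch below uses made-up numbers: the `users` list, its request counts, and the three limits are hypothetical and chosen only for illustration.

```python
# Hypothetical data: (requests per minute, is the user actually abusive?)
users = [
    (2, False), (4, False), (6, False), (9, False), (14, False), (22, False),
    (11, True), (18, True), (30, True), (45, True),
]

def precision_recall(users, limit):
    """Block every user whose request count exceeds `limit`, then score the result."""
    tp = sum(1 for count, abusive in users if count > limit and abusive)
    fp = sum(1 for count, abusive in users if count > limit and not abusive)
    fn = sum(1 for count, abusive in users if count <= limit and abusive)
    # Precision is undefined when nobody is blocked; report 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for limit in (10, 15, 25):  # strict, middle, loose
    p, r = precision_recall(users, limit)
    print(f"limit={limit:>2}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the strict limit of 10 gives precision 0.67 and recall 1.00, while the loose limit of 25 gives precision 1.00 and recall 0.50, which mirrors the two cases described above.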

Good vs Bad metric values for this use case

Good: Precision of about 90% or higher and recall above 85%. This means most blocked users are truly abusive and most abusers are caught.

Bad: Precision below 50% means more than half of the blocked users are good users. Recall below 50% means more than half of the abusers slip through.

Common pitfalls in metrics
  • Accuracy paradox: If abuse is rare, a model blocking no one can have high accuracy but is useless.
  • Data leakage: Training or evaluating with future or leaked information inflates metrics, and the model then fails in real use.
  • Overfitting: Model works well on training data but fails to generalize, causing poor real-world abuse detection.
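
The accuracy paradox can be illustrated with the class counts from the confusion matrix example (100 abusive users, 900 good users). The sketch below scores a hypothetical "block no one" baseline; the function name is an assumption made for this example.

```python
def score_block_no_one(num_abusive, num_good):
    """Score a baseline that never blocks anyone."""
    tp, fp = 0, 0                     # nobody is blocked
    fn, tn = num_abusive, num_good    # every abuser is missed, every good user allowed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    return accuracy, recall

accuracy, recall = score_block_no_one(num_abusive=100, num_good=900)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The baseline reaches 0.90 accuracy while catching no abusers at all (recall 0.00), which is why accuracy alone is a poor guide when abuse is rare.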
Self-check question

Your abuse prevention model has 98% accuracy but only 12% recall on abusive users. Is it good for production?

Answer: No. The high accuracy is misleading because abuse is rare. The very low recall means it misses most abusers, so it won't effectively prevent abuse.

Key Result
Balancing high recall and precision is key to effective rate limiting and abuse prevention.

Practice

1. What is the main purpose of rate limiting in AI services?
easy
A. To improve the accuracy of AI models
B. To increase the speed of AI predictions
C. To stop too many requests from one user in a short time
D. To reduce the size of the AI model

Solution

  1. Step 1: Understand rate limiting concept

    Rate limiting is designed to control how many requests a user can make in a short period.
  2. Step 2: Identify the main goal

    The goal is to prevent overload and abuse by rejecting requests that arrive too quickly from the same user.
  3. Final Answer:

    To stop too many requests from one user in a short time -> Option C
  4. Quick Check:

    Rate limiting = stop excess requests [OK]
Hint: Rate limiting controls request frequency to prevent overload [OK]
Common Mistakes:
  • Confusing rate limiting with improving model accuracy
  • Thinking rate limiting speeds up predictions
  • Assuming rate limiting reduces model size
2. Which Python code snippet correctly implements a simple rate limiter that blocks requests once the request count reaches 5?
easy
A. if requests_count >= 5: block_request()
B. if requests_count == 5: allow_request()
C. if requests_count < 5: block_request()
D. if requests_count > 5: block_request()

Solution

  1. Step 1: Understand the condition for blocking

    We want to block requests when the count reaches or exceeds 5, so >= 5 is correct.
  2. Step 2: Check each option

    if requests_count >= 5: block_request() uses '>= 5' to block requests, which matches the requirement.
  3. Final Answer:

    if requests_count >= 5: block_request() -> Option A
  4. Quick Check:

    Block when count is 5 or more = >= 5 [OK]
Hint: Use '>=' to include the limit value when blocking [OK]
Common Mistakes:
  • Using '>' misses blocking exactly at 5
  • Using '<' blocks too early
  • Allowing request at count 5 instead of blocking
3. Given the code below, what will be printed after 7 calls to check_request()?
requests_count = 0
def block_request():
    print('Blocked')
def allow_request():
    print('Allowed')
def check_request():
    global requests_count
    requests_count += 1
    if requests_count >= 5:
        block_request()
    else:
        allow_request()

for _ in range(7):
    check_request()
medium
A. Allowed printed 7 times
B. Blocked printed 5 times, Allowed printed 2 times
C. Allowed printed 5 times, Blocked printed 2 times
D. Allowed printed 4 times, Blocked printed 3 times

Solution

  1. Step 1: Track requests_count and output

    For calls 1 to 4, requests_count is less than 5, so 'Allowed' prints. For calls 5 to 7, requests_count is 5 or more, so 'Blocked' prints.
  2. Step 2: Count prints

    'Allowed' prints 4 times, 'Blocked' prints 3 times.
  3. Final Answer:

    Allowed printed 4 times, Blocked printed 3 times -> Option D
  4. Quick Check:

    4 Allowed + 3 Blocked = 7 calls [OK]
Hint: Count calls before and after limit to find outputs [OK]
Common Mistakes:
  • Counting 'Allowed' as 5 times instead of 4
  • Confusing when blocking starts
  • Ignoring global variable increment
4. The following code is meant to start blocking at the 3rd call, but it only starts blocking at the 4th call. What is the error?
requests_count = 0
def check_request():
    global requests_count
    requests_count += 1
    if requests_count > 3:
        print('Blocked')
    else:
        print('Allowed')
medium
A. The requests_count should start at 1, not 0
B. The condition should be '>= 3' instead of '> 3'
C. The print statements are reversed
D. The global keyword is missing

Solution

  1. Step 1: Analyze the blocking condition

    The code blocks only when requests_count > 3, so blocking starts at the 4th call, not the 3rd.
  2. Step 2: Fix condition to block at 3 calls

    Changing condition to '>= 3' will block starting at the 3rd call as intended.
  3. Final Answer:

    The condition should be '>= 3' instead of '> 3' -> Option B
  4. Quick Check:

    Block at 3 calls means '>= 3' [OK]
Hint: Use '>=' to include the limit call in blocking [OK]
Common Mistakes:
  • Using '>' blocks too late
  • Starting the count at 1 instead of 0 hides the off-by-one error instead of fixing the condition
  • Forgetting global keyword (but it's present here)
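
The fix from question 4 can be checked with a short sketch. It keeps the quiz's global counter but returns the decision instead of printing it, an assumption made here so the result is easy to inspect.

```python
requests_count = 0

def check_request():
    """Return 'Blocked' from the 3rd call onward, 'Allowed' before that."""
    global requests_count
    requests_count += 1
    if requests_count >= 3:  # '>=' includes the 3rd call itself
        return 'Blocked'
    return 'Allowed'

results = [check_request() for _ in range(4)]
print(results)  # ['Allowed', 'Allowed', 'Blocked', 'Blocked']
```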
5. You want to prevent abuse by limiting users to 10 requests per minute. Which approach best combines rate limiting with user tracking in Python?
hard
A. Use a dictionary to store user IDs with timestamps of their requests, then block if more than 10 in last 60 seconds
B. Reset a global request count every minute without user distinction
C. Block all requests after 10 total requests regardless of user
D. Allow unlimited requests but slow down responses after 10 requests

Solution

  1. Step 1: Understand per-user rate limiting

    To limit requests per user, we must track each user's request times separately.
  2. Step 2: Choose data structure and logic

    A dictionary with user IDs as keys and each user's request timestamps as values lets us count that user's requests in the last 60 seconds and block if the count is over 10.
  3. Final Answer:

    Use a dictionary to store user IDs with timestamps of their requests, then block if more than 10 in last 60 seconds -> Option A
  4. Quick Check:

    Per-user tracking + time window = dictionary with timestamps [OK]
Hint: Track each user's timestamps to count requests per minute [OK]
Common Mistakes:
  • Using global count ignores individual users
  • Blocking all users after total requests causes unfair blocking
  • Slowing responses is not strict rate limiting
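
The approach in option A can be sketched as a per-user sliding window. This is one possible sketch, not a production implementation: the names `request_log`, `allow_request`, `MAX_REQUESTS`, and `WINDOW_SECONDS` are assumptions, only allowed requests are recorded, and the optional `now` argument exists so the behavior can be checked without waiting.

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS = 10      # limit from the question
WINDOW_SECONDS = 60    # one minute

# Maps each user ID to the timestamps of that user's recent allowed requests.
request_log = defaultdict(deque)

def allow_request(user_id, now=None):
    """Return True if the request is allowed, False if it should be blocked."""
    if now is None:
        now = time.monotonic()
    timestamps = request_log[user_id]
    # Drop timestamps that have fallen out of the 60-second window.
    while timestamps and now - timestamps[0] >= WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) >= MAX_REQUESTS:
        return False  # this user already made 10 requests in the window
    timestamps.append(now)
    return True

# One request per second from the same user: the 11th and 12th are blocked.
decisions = [allow_request("alice", now=float(t)) for t in range(12)]
print(decisions.count(True), decisions.count(False))  # 10 2
```

Because each user has a separate deque, a burst from one user does not affect anyone else, and a blocked user is allowed again once the oldest timestamps age out of the window.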