Error handling and rate limits focus on system reliability and user experience rather than traditional ML accuracy metrics. Key metrics include error rate (how often requests fail), latency (response time), and throughput (requests handled per second). Monitoring these helps ensure the system responds well under load and recovers gracefully from errors.
Error handling and rate limits in Prompt Engineering / GenAI - Model Metrics & Evaluation
Request Outcome Table:
| Outcome | Count |
|---------------|-------|
| Successful | 950 |
| Rate Limited | 30 |
| Error (500s) | 20 |
| Timeout | 0 |
Total Requests = 1000
This table shows how many requests succeeded, were blocked by rate limits, or failed due to errors.
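The counts in the table can be turned into rates with a few lines of Python. This is a minimal sketch: the dictionary keys are illustrative names, and counting only 5xx errors and timeouts toward the error rate (with rate-limited requests tracked separately) is an assumption about how to group outcomes.

```python
# Counts copied from the request outcome table above.
outcomes = {
    "successful": 950,
    "rate_limited": 30,
    "error_5xx": 20,
    "timeout": 0,
}

total = sum(outcomes.values())

# Share of all requests that ended in each outcome.
rates = {name: count / total for name, count in outcomes.items()}

# Assumption: "error rate" counts server errors and timeouts only;
# rate-limited requests are reported as their own rate.
error_rate = (outcomes["error_5xx"] + outcomes["timeout"]) / total

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
print(f"error rate: {error_rate:.1%}")
```

With these counts, 95% of requests succeed, 3% are rate limited, and 2% fail with server errors.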
In error handling and rate limits, the tradeoff is between strict limits and user experience. Setting very low rate limits reduces errors but may block legitimate users (false positives). Setting high limits improves access but risks system overload (false negatives).
Example: A chat app with a strict rate limit stops nearly all abusive bursts (high recall on bad requests) but also blocks legitimate users who send many messages quickly (low recall on good requests). A looser limit lets more legitimate traffic through (higher recall on good requests) but risks slowdowns under load.
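To make the tradeoff concrete, one option is to summarize how a limiter treated abusive and legitimate requests under two settings. The sketch below uses invented counts purely for illustration; the function name and its parameters are hypothetical, not part of any library.

```python
def limiter_outcomes(blocked_abusive, allowed_abusive, blocked_legit, allowed_legit):
    """Summarize how a rate limiter treated abusive and legitimate requests."""
    return {
        # Share of abusive requests that the limiter stopped.
        "abusive_blocked_share": blocked_abusive / (blocked_abusive + allowed_abusive),
        # Share of legitimate requests that got through (recall of good requests).
        "legit_allowed_share": allowed_legit / (allowed_legit + blocked_legit),
    }

# Hypothetical counts, for illustration only.
strict = limiter_outcomes(blocked_abusive=48, allowed_abusive=2,
                          blocked_legit=120, allowed_legit=830)
loose = limiter_outcomes(blocked_abusive=30, allowed_abusive=20,
                         blocked_legit=5, allowed_legit=945)

for name, summary in (("strict", strict), ("loose", loose)):
    print(name, {key: f"{value:.0%}" for key, value in summary.items()})
```

In this made-up data the strict setting stops more abuse but lets fewer legitimate requests through, while the loose setting does the opposite.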
- Good: Error rate under 1%, rate limit triggered only on abuse, latency under 200ms, throughput meets demand.
- Bad: Error rate above 5%, frequent rate limit blocks for normal users, latency spikes over 1 second, system crashes under load.
- Ignoring error types: Treating all errors equally hides critical failures.
- Data leakage: Mixing test traffic into production logs distorts the measured error rate.
- Overfitting to metrics: Tuning only to reduce error rate may cause overly strict rate limits harming users.
- Accuracy paradox: High success rate may hide many blocked users if rate limits are too strict.
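The first pitfall above, treating all errors equally, can be avoided by breaking failures down by type before reporting a single error rate. The following sketch assumes failures are logged with HTTP status codes; the list of codes is invented for illustration.

```python
from collections import Counter

# Hypothetical status codes of failed requests, for illustration only.
failed_statuses = [429, 429, 500, 429, 503, 500, 429, 504]

# Count failures per exact status code.
by_status = Counter(failed_statuses)

# Group into coarse categories: 429 means the client was rate limited,
# 5xx means the server itself failed.
by_category = Counter(
    "rate_limited" if status == 429
    else "server_error" if 500 <= status < 600
    else "other"
    for status in failed_statuses
)

print(by_status.most_common())
print(dict(by_category))
```

Separating rate-limit responses from server errors shows whether a rising failure count points to strict limits or to a failing backend.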
Your system shows 98% success rate but 12% of legitimate users get blocked by rate limits. Is it good for production? Why or why not?
Answer: No, because even though most requests succeed, blocking 12% of good users harms user experience and may reduce trust. Rate limits need adjustment to balance protection and access.
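The gap between the two numbers in this question comes from measuring at different levels: success rate is per request, while blocking is felt per user. A small sketch with a made-up log (100 hypothetical users sending 6 requests each) reproduces the same pattern.

```python
# Hypothetical log: 100 legitimate users, 6 requests each.
# The first 12 users each have exactly one request blocked by the rate limiter.
requests = []
for user_id in range(100):
    for i in range(6):
        blocked = user_id < 12 and i == 0
        requests.append({"user": user_id, "blocked": blocked})

# Request-level view: fraction of requests that were not blocked.
success_rate = sum(not r["blocked"] for r in requests) / len(requests)

# User-level view: fraction of users who were blocked at least once.
blocked_users = {r["user"] for r in requests if r["blocked"]}
all_users = {r["user"] for r in requests}
blocked_user_share = len(blocked_users) / len(all_users)

print(f"request success rate: {success_rate:.0%}")
print(f"users blocked at least once: {blocked_user_share:.0%}")
```

Here only 12 of 600 requests are blocked, yet 12 of 100 users experience a block, so tracking both views gives a fuller picture than the success rate alone.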
Practice
Solution
Step 1: Understand error handling purpose
Error handling is used to manage unexpected problems during app execution.
Step 2: Connect to AI app context
In AI apps, error handling helps keep the app running smoothly despite issues.
Final Answer:
To keep the app running smoothly even when problems happen -> Option A
Quick Check:
Error handling = keep app running smoothly [OK]
- Thinking error handling speeds up training
- Confusing error handling with increasing requests
- Believing error handling reduces model size
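As a rough illustration of the purpose described in the solution above, the sketch below catches a failure from a model call and returns a fallback message instead of crashing. `ModelUnavailableError`, `call_model`, and `answer` are hypothetical names invented for this example, and the model call is a stub that always fails.

```python
class ModelUnavailableError(Exception):
    """Hypothetical error raised when the model backend cannot be reached."""


def call_model(prompt):
    # Hypothetical stand-in for a real model call; it always fails here
    # so that the fallback path is exercised.
    raise ModelUnavailableError("backend unreachable")


def answer(prompt):
    """Return the model's reply, or a friendly fallback if the call fails."""
    try:
        return call_model(prompt)
    except ModelUnavailableError:
        # The app keeps running and tells the user what happened.
        return "Sorry, the assistant is temporarily unavailable. Please try again."


print(answer("Summarize this article"))
```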
Solution
Step 1: Identify correct try-except syntax
Python uses a try: block followed by except: to catch errors.
Step 2: Match syntax with options
The option below uses the correct try-except structure; the others use invalid keywords.
```python
try:
    response = call_api()
except:
    print('Error occurred')
```
Final Answer:
The try-except block shown in Step 2 -> Option A
Quick Check:
try-except syntax = try: block followed by except: block [OK]
- Using catch instead of except
- Using if error instead of try-except
- Writing invalid keywords like error handling:
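The bare `except:` in the quiz answer is valid syntax, but it also catches `KeyboardInterrupt` and `SystemExit`, so in practice it is often narrowed to the error you expect. A minimal sketch, assuming a hypothetical `ApiError` type and a stubbed `call_api()` that always fails:

```python
class ApiError(Exception):
    """Hypothetical error type standing in for a client library's exception."""


def call_api():
    # Hypothetical stub that always fails, so the except branch runs.
    raise ApiError("service returned HTTP 500")


try:
    response = call_api()
except ApiError as exc:
    # Only the expected error type is handled; anything else still propagates.
    response = None
    print(f"Error occurred: {exc}")
```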
```python
import time
try:
    response = call_api()
except RateLimitError:
    print('Rate limit hit, waiting...')
    time.sleep(10)
    response = call_api()
print('Done')
```
Solution
Step 1: Understand try-except with RateLimitError
If call_api() raises RateLimitError, the except block runs, printing the message and waiting 10 seconds.
Step 2: After waiting, call_api() runs again and then prints 'Done'
So the output includes the message and 'Done' on separate lines.
Final Answer:
Rate limit hit, waiting...
Done
-> Option D
Quick Check:
RateLimitError caught, message + Done printed [OK]
- Assuming no message prints
- Thinking program crashes on rate limit
- Ignoring the second call_api() after sleep
```python
try:
    response = call_api()
except RateLimitError
    print('Too many requests')
    time.sleep(5)
    response = call_api()
```
Solution
Step 1: Check except syntax
Python requires a colon ':' after except RateLimitError to start the block.
Step 2: Verify other parts
time.sleep() is valid, retrying call_api() is allowed, and the print syntax is correct.
Final Answer:
Missing colon after except RateLimitError -> Option C
Quick Check:
except needs colon ':' [OK]
- Forgetting colon after except
- Thinking sleep() is invalid in except
- Believing retry is not allowed
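The missing colon identified in the solution above is a syntax error, so Python rejects the snippet before running any of it. One way to see this is to compile the source text with and without the colon; the helper name `compiles` is made up for this sketch.

```python
# The quiz snippet as source text, with the colon missing after the except clause.
broken = """
try:
    response = call_api()
except RateLimitError
    print('Too many requests')
"""

# Same snippet with the colon added.
fixed = broken.replace("except RateLimitError", "except RateLimitError:")


def compiles(source):
    """Return True if the source parses as valid Python, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(broken))
print(compiles(fixed))
```

Compiling only checks syntax, so the undefined names `call_api` and `RateLimitError` do not matter here.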
Solution
Step 1: Understand exponential backoff with retries
We want to retry after waiting, doubling the wait time after each failure, and stop on success.
Step 2: Analyze options for correct loop and break
The following option uses a while True loop, tries call_api(), breaks on success, and doubles the wait after each RateLimitError:
```python
import time
wait = 1
while True:
    try:
        response = call_api()
        break
    except RateLimitError:
        time.sleep(wait)
        wait *= 2
```
Step 3: Check other options
This option breaks on success but uses a for loop with a fixed number of tries (less flexible):
```python
import time
wait = 1
for _ in range(3):
    try:
        response = call_api()
        break
    except RateLimitError:
        time.sleep(wait)
        wait *= 2
```
This option increments the wait linearly, not exponentially:
```python
import time
wait = 1
while True:
    try:
        response = call_api()
    except RateLimitError:
        time.sleep(wait)
        wait += 1
    else:
        break
```
This option doubles the wait before sleeping, so the first pause is already 2 seconds and the order is less clear:
```python
import time
wait = 1
for _ in range(3):
    try:
        response = call_api()
    except RateLimitError:
        wait *= 2
        time.sleep(wait)
    else:
        break
```
Final Answer:
The while True loop shown in Step 2 -> Option B
Quick Check:
Retry loop with exponential backoff = while True loop that breaks on success and doubles wait after each RateLimitError [OK]
- Using for loop limits retries too strictly
- Incrementing wait linearly instead of doubling
- Not breaking loop on success
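The quiz answer retries forever, which suits the exercise. Outside a quiz, a common variation is to cap the number of attempts and the maximum wait, and to add random jitter so many clients do not retry at the same moment. The sketch below is one such variation, not the quiz's answer: `call_with_backoff`, its parameters, and the `RateLimitError` class are hypothetical names chosen for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client library's rate limit exception."""


def call_with_backoff(call, max_attempts=5, base_wait=1.0, max_wait=30.0,
                      sleep=time.sleep):
    """Call `call()`, retrying on RateLimitError with capped exponential backoff.

    Re-raises the last RateLimitError once max_attempts is exhausted.
    """
    wait = base_wait
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up instead of retrying forever
            # Sleep a random duration up to the current wait (jitter).
            sleep(random.uniform(0, wait))
            # Double the wait for the next failure, up to the cap.
            wait = min(wait * 2, max_wait)


# Usage: a hypothetical call that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


recorded_waits = []  # capture waits instead of really sleeping
result = call_with_backoff(flaky_call, sleep=recorded_waits.append)
print(result, len(recorded_waits))
```

Passing `sleep` as a parameter is a design choice that keeps the function testable without real delays; in normal use the default `time.sleep` applies.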
