Experiment - Rate limiting and abuse prevention
Problem: You have a generative AI model API that users call to get text completions. Some users send too many requests too quickly, which slows the system down and sometimes crashes it. This kind of abuse overloads the service. Currently, there is no limit on how many requests a user can send per minute.
Current Metrics: System uptime: 85%, Average response time: 1.5 seconds, Requests failing due to overload: 15%
Issue: Some users send so many requests that the system becomes overloaded, which causes slow responses and failures. To prevent this abuse, we need to limit how many requests each user can make within a short time window.
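
One possible way to enforce a per-user limit is a sliding-window counter, sketched below in Python. This is a minimal illustration and not a prescribed solution: the class name `SlidingWindowRateLimiter`, the `allow` method, and the default of 60 requests per 60 seconds are all assumptions chosen for the example, and the state is kept in memory for a single process only.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per user within any `window_seconds` span.

    Hypothetical helper for illustration; the limits are assumed values.
    """

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        # user_id -> timestamps of that user's accepted requests, oldest first
        self._timestamps = defaultdict(deque)

    def allow(self, user_id):
        """Return True and record the request if the user is under the limit."""
        now = self.clock()
        stamps = self._timestamps[user_id]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False  # over the limit; the request is not recorded
        stamps.append(now)
        return True
```

In an API handler, `allow(user_id)` would be called before the request is forwarded to the model, and a rejected request would typically be answered with HTTP 429 (Too Many Requests) instead of being processed. A deployment with several API servers would need shared state for the counters, which this sketch does not cover.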