Design: Horizontal Pod Autoscaler (HPA) System
Design the autoscaling control loop and its integration with Kubernetes. Out of scope: detailed Kubernetes cluster management, pod scheduling, and application-level scaling logic.
Functional Requirements
FR1: Automatically scale the number of pod replicas of a target workload (for example, a Deployment) based on observed metrics.
FR2: Support scaling based on resource metrics (CPU and memory utilization) and custom metrics such as request rate.
FR3: Ensure minimum and maximum pod replica limits are respected.
FR4: Provide near real-time scaling decisions: a change in observed metrics should produce a scaling decision in under 30 seconds.
FR5: Maintain system availability during scaling operations.
FR6: Expose metrics and scaling status for monitoring.
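FR1 through FR3 can be illustrated with the replica calculation that Kubernetes documents for its Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0, evaluated per metric with the largest proposal winning, then clamped to the configured bounds. The Python sketch below is only an illustration under those assumptions, not this design's implementation; the function names are hypothetical, and the 0.1 tolerance mirrors the Kubernetes default.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Proposal for one metric: ceil(current_replicas * current / target).

    Returns current_replicas unchanged when the ratio is within `tolerance`
    of 1.0, so small fluctuations do not trigger a scaling action.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def clamp(replicas, min_replicas, max_replicas):
    """Enforce the configured replica bounds (FR3)."""
    return max(min_replicas, min(max_replicas, replicas))


def scale_decision(current_replicas, metrics, min_replicas, max_replicas):
    """Combine several metrics (FR2) by taking the largest proposal.

    `metrics` is a list of (current_value, target_value) pairs; this shape
    is an assumption made for the sketch.
    """
    proposals = [
        desired_replicas(current_replicas, current, target)
        for current, target in metrics
    ]
    return clamp(max(proposals), min_replicas, max_replicas)
```

For example, 4 replicas averaging 90% CPU against a 60% target produce a proposal of 6 replicas, while 62% against the same target falls inside the tolerance and leaves the count at 4.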
Non-Functional Requirements
NFR1: Handle up to 10,000 pods across multiple namespaces.
NFR2: Scaling decisions must be evaluated at least once every 15 seconds.
NFR3: System availability target of 99.9% uptime.
NFR4: Scaling actions must avoid thrashing (rapid alternation between scaling up and scaling down).
NFR5: Integrate with the Kubernetes API and the Metrics Server.
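One common way to meet NFR4 is a scale-down stabilization window: scale-up proposals are applied immediately, but a scale-down is applied only once every proposal seen during the window agrees that fewer replicas are enough. The Python sketch below is modeled on the stabilization behavior Kubernetes describes for its HPA and is not a definitive implementation; the class name, method name, and the explicit `now` timestamp argument are hypothetical, and the 300-second default matches the Kubernetes scale-down default.

```python
from collections import deque


class ScaleDownStabilizer:
    """Damps scale-down by remembering recent replica proposals."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, proposed_replicas), oldest first

    def recommend(self, now, desired, current):
        """Return the replica count to apply at time `now` (in seconds)."""
        self._history.append((now, desired))
        # Drop proposals that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        if desired >= current:
            # Scale up (or hold) without delay to protect availability.
            return desired
        # Scale down only as far as the highest proposal still in the window,
        # and never above the current count.
        highest_recent = max(replicas for _, replicas in self._history)
        return min(current, highest_recent)
```

With a 60-second window, a workload at 10 replicas that receives a proposal of 4 at t=15 stays at 10 while the earlier proposal of 10 (recorded at t=0) is still in the window; by t=70 that proposal has expired and the scale-down to 4 goes through.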