Rate limiting in Microservices - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is rate limiting in system design?
Rate limiting is a technique to control how many requests a user or client can make to a service in a given time. It helps prevent overload and abuse.
intermediate
Name two common algorithms used for rate limiting.
Two common algorithms are Token Bucket and Leaky Bucket. They help decide when to allow or reject requests based on limits.
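As an illustration of the second algorithm, here is a minimal sketch of a leaky bucket used as a meter. The class name `LeakyBucket`, its parameters, and the choice to pass the clock in as `now` are assumptions made for this example, not part of the original card.

```python
class LeakyBucket:
    """Leaky bucket used as a meter: the level drains at a constant rate."""

    def __init__(self, capacity: float, leak_rate: float, now: float = 0.0):
        self.capacity = capacity    # maximum level the bucket can hold
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0            # current fill level
        self.last = now             # time of the previous check

    def allow(self, now: float) -> bool:
        # Drain whatever leaked out since the previous call
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        # Each request adds one unit; reject if it would overflow
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False


bucket = LeakyBucket(capacity=2, leak_rate=1.0)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(1.0))                      # True, one unit has drained
```

The bucket rejects as soon as it is full, so the sustained rate cannot exceed `leak_rate` no matter how requests are spaced.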
beginner
Why is rate limiting important in microservices?
Rate limiting protects microservices from too many requests that can cause crashes or slowdowns. It ensures fair use and system stability.
intermediate
What is the difference between fixed window and sliding window rate limiting?
Fixed window counts requests in fixed time blocks (like per minute), so a burst that straddles a block boundary can briefly pass up to twice the limit. Sliding window counts requests over the trailing interval, which gives smoother limits.
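The difference is easiest to see on a burst around a window boundary. The sketch below is illustrative only: the function names and the sample timestamps are assumptions, and the sliding variant is the simple log-based form.

```python
from collections import deque


def fixed_window(timestamps, limit, window):
    """Return an allow/reject decision per timestamp using fixed blocks."""
    decisions, current_block, count = [], None, 0
    for t in timestamps:
        block = int(t // window)        # which fixed block t falls in
        if block != current_block:      # new block: reset the counter
            current_block, count = block, 0
        allowed = count < limit
        count += allowed
        decisions.append(allowed)
    return decisions


def sliding_window(timestamps, limit, window):
    """Return a decision per timestamp using a log of accepted requests."""
    decisions, log = [], deque()
    for t in timestamps:
        while log and log[0] <= t - window:  # drop entries older than the window
            log.popleft()
        allowed = len(log) < limit
        if allowed:
            log.append(t)
        decisions.append(allowed)
    return decisions


# Three requests just before the 60 s boundary and three just after it
burst = [58, 59, 59.5, 60, 60.5, 61]
print(sum(fixed_window(burst, limit=3, window=60)))    # 6 accepted
print(sum(sliding_window(burst, limit=3, window=60)))  # 3 accepted
```

With a limit of 3 per minute, the fixed window lets all six requests through because its counter resets at second 60, while the sliding window still sees the three earlier requests.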
advanced
How can distributed rate limiting be implemented across multiple servers?
Distributed rate limiting can use a shared data store like Redis to keep counters consistent across servers, ensuring limits apply globally.
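A minimal sketch of the shared-counter idea follows. It assumes a store object exposing `incr(key)` and `expire(key, seconds)`, which mirrors the Redis INCR and EXPIRE commands; `FakeStore` is a hypothetical in-memory stand-in so the example runs without a Redis server, and the key format is an assumption.

```python
class FakeStore:
    """In-memory stand-in exposing the two Redis-style calls used below."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key, seconds):
        return True  # expiry is not simulated in this stand-in


def allow_request(store, client_id, limit, window_seconds, now):
    """Fixed-window check against a counter shared by every server."""
    window_index = int(now // window_seconds)
    key = f"ratelimit:{client_id}:{window_index}"  # hypothetical key format
    count = store.incr(key)                # an atomic increment in real Redis
    if count == 1:
        store.expire(key, window_seconds)  # let old windows clean themselves up
    return count <= limit


shared = FakeStore()
# Two "servers" handle requests for the same client through the same store
server_a = [allow_request(shared, "alice", 3, 60, now=10) for _ in range(2)]
server_b = [allow_request(shared, "alice", 3, 60, now=20) for _ in range(2)]
print(server_a, server_b)  # [True, True] [True, False]
```

Because both servers increment the same key, the fourth request is rejected regardless of which server receives it. In a real deployment the increment and the expiry would typically be made atomic together, for example with a server-side script, so that a crash between the two calls cannot leave a key without a TTL.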
What is the main goal of rate limiting?
A. Increase the number of requests a user can make
B. Prevent system overload by limiting requests
C. Store user data securely
D. Improve database query speed
Which algorithm allows bursts of requests but controls average rate over time?
A. Sliding window
B. Fixed window
C. Token bucket
D. Round robin
In microservices, why might you use a shared Redis store for rate limiting?
A. To synchronize rate limits across servers
B. To cache images
C. To store user passwords
D. To log errors
What happens when a client exceeds the rate limit?
A. The request is rejected or throttled
B. The request is delayed indefinitely
C. The client is permanently banned
D. The request is accepted and processed
Which rate limiting method provides smoother control by tracking requests continuously?
A. Leaky bucket
B. Fixed window
C. Token bucket
D. Sliding window
Explain how rate limiting helps maintain system stability in microservices.
Think about what happens if too many requests hit a service at once.
    Describe the difference between fixed window and sliding window rate limiting techniques.
    Consider how requests are counted over time.

      Practice

      1. What is the main purpose of rate limiting in microservices?
      easy
      A. To control how many requests a user can make in a given time
      B. To increase the speed of the service
      C. To store user data securely
      D. To balance the load between servers

      Solution

      1. Step 1: Understand the concept of rate limiting

        Rate limiting is designed to restrict the number of requests a user or client can send to a service within a certain time frame.
      2. Step 2: Identify the main goal of rate limiting

        The main goal is to prevent overload and abuse by controlling request frequency, not to speed up services or store data.
      3. Final Answer:

        To control how many requests a user can make in a given time -> Option A
      4. Quick Check:

        Rate limiting = Control request count [OK]
      Hint: Rate limiting limits request count per time [OK]
      Common Mistakes:
      • Confusing rate limiting with load balancing
      • Thinking rate limiting speeds up the service
      • Mixing rate limiting with data storage
      2. Which of the following is the correct way to represent a fixed window rate limiter allowing 100 requests per minute in pseudocode?
      easy
      A. if requests_in_last_minute < 100 then block else allow
      B. if requests_in_last_hour > 100 then block else allow
      C. if requests_in_last_minute > 100 then block else allow
      D. if requests_in_last_second > 100 then allow else block

      Solution

      1. Step 1: Understand fixed window rate limiting logic

        Fixed window rate limiting counts requests in a fixed time window (e.g., 1 minute) and blocks if the count exceeds the limit.
      2. Step 2: Match the correct condition for allowing or blocking

        If requests exceed 100 in the current minute, block; otherwise, allow. Assuming the count includes the incoming request, "if requests_in_last_minute > 100 then block else allow" (Option C) matches this logic.
      3. Final Answer:

        if requests_in_last_minute > 100 then block else allow -> Option C
      4. Quick Check:

        Fixed window limit = block if over limit [OK]
      Hint: Block when requests exceed limit in fixed window [OK]
      Common Mistakes:
      • Using wrong time window (hour instead of minute)
      • Reversing the condition (blocking when under limit)
      • Allowing requests when they should be blocked
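The fixed window logic from this question can be sketched as a small class. The name `FixedWindowLimiter` and the explicit `now` argument are assumptions for illustration. Note that this sketch compares the count before adding the incoming request, so the test is `count >= limit` rather than `> limit`.

```python
class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_index = None  # index of the window being counted
        self.count = 0            # requests accepted in that window

    def allow(self, now: float) -> bool:
        index = int(now // self.window_seconds)
        if index != self.window_index:  # a new window started: reset
            self.window_index = index
            self.count = 0
        if self.count >= self.limit:    # limit already used up
            return False
        self.count += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
accepted = sum(limiter.allow(now=30.0) for _ in range(150))
print(accepted)                 # 100
print(limiter.allow(now=61.0))  # True, the next window has a fresh counter
```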
      3. Given this pseudocode for a token bucket rate limiter:
      bucket_capacity = 5
      refill_rate = 1 token per second
      current_tokens = 3
      request_tokens = 2
      if current_tokens >= request_tokens:
          current_tokens -= request_tokens
          allow request
      else:
          block request

      What happens if a request for 4 tokens arrives immediately?
      medium
      A. Request is allowed and tokens reduce to -1
      B. Request is blocked because refill rate is too low
      C. Request is allowed and tokens reduce to 1
      D. Request is blocked because not enough tokens

      Solution

      1. Step 1: Check current tokens against requested tokens

        Current tokens are 3, request needs 4 tokens, which is more than available.
      2. Step 2: Determine if request is allowed or blocked

        Since current tokens (3) < request tokens (4), the request is blocked.
      3. Final Answer:

        Request is blocked because not enough tokens -> Option D
      4. Quick Check:

        Tokens < request = block [OK]
      Hint: Allow only if tokens ≥ requested tokens [OK]
      Common Mistakes:
      • Allowing request when tokens are insufficient
      • Ignoring token count and refill rate
      • Assuming tokens can go negative
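The pseudocode in this question leaves the refill step implicit. A runnable sketch that includes it, using the same numbers (capacity 5, 1 token per second, 3 tokens available), might look like this; the class name and the `now` parameter are assumptions.

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float, tokens: float, now: float = 0.0):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = tokens            # tokens currently available
        self.last_refill = now

    def allow(self, requested: float, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= requested:
            self.tokens -= requested
            return True
        return False  # tokens are left untouched, so they never go negative


bucket = TokenBucket(capacity=5, refill_rate=1.0, tokens=3, now=0.0)
print(bucket.allow(4, now=0.0))  # False: 3 tokens < 4 requested
print(bucket.allow(4, now=1.0))  # True: one second added 1 token
print(bucket.tokens)             # 0.0
```

The first call reproduces the answer above (blocked); the second shows that the same request succeeds once the bucket has refilled to 4 tokens.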
      4. A microservice uses a sliding window rate limiter but users report some requests are blocked even when they seem under the limit. Which is the most likely cause?
      medium
      A. The sliding window is not updating timestamps correctly
      B. The service has too many servers without shared state
      C. The rate limit is set too high
      D. The users are sending requests too slowly

      Solution

      1. Step 1: Understand sliding window rate limiter behavior

        Sliding window requires accurate tracking of request timestamps across all servers to count requests correctly.
      2. Step 2: Identify issue with multiple servers and no shared state

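        If servers do not share state, each counts requests independently. When the global limit is split into per-server quotas, a client routed mostly to one server can exhaust that server's share and be blocked while its total is still under the limit. (If every server enforces the full limit instead, the opposite error occurs and the client can exceed the global limit.)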
      3. Final Answer:

        The service has too many servers without shared state -> Option B
      4. Quick Check:

        Multiple servers need shared state for sliding window [OK]
      Hint: Sliding window needs shared state across servers [OK]
      Common Mistakes:
      • Blaming slow user requests
      • Assuming rate limit is too high causes blocking
      • Ignoring distributed state issues
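To make the shared-state issue concrete, the sketch below runs the same sliding window log in three setups. The class, the helper `accepted`, and the specific limits are assumptions chosen for illustration; the "per-server quota" setup is one plausible way unshared state leads to blocking under the limit.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have slid out of the window
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True


def accepted(limiters, n_requests):
    """Send one request per second, round-robin across the given limiters."""
    return sum(limiters[i % len(limiters)].allow(now=float(i)) for i in range(n_requests))


# 1) Full limit (4/min) on each of two servers, no shared state: over-admits
independent = [SlidingWindowLog(4, 60), SlidingWindowLog(4, 60)]
print(accepted(independent, 8))       # 8

# 2) Limit split into quotas of 2 per server, client pinned to one server:
#    blocked after 2 requests although the global limit is 4
quota = SlidingWindowLog(2, 60)
print(accepted([quota], 4))           # 2

# 3) One log shared by both servers: exactly the global limit
shared = SlidingWindowLog(4, 60)
print(accepted([shared, shared], 8))  # 4
```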
      5. You design a rate limiter for a microservice that must handle 10 million users, each allowed 100 requests per hour. Which approach best balances accuracy and scalability?
      hard
      A. Use distributed token buckets with local caches and periodic sync
      B. Use a centralized fixed window counter stored in a single database
      C. Use client-side rate limiting without server checks
      D. Use a sliding window log storing every request timestamp centrally

      Solution

      1. Step 1: Analyze scalability needs for 10 million users

        A centralized fixed window counter in a single database (Option B) or a central log of every request timestamp (Option D) becomes a bottleneck and adds latency at this scale.
      2. Step 2: Evaluate distributed token bucket with local caches

        Distributed token buckets with local caches reduce central load and sync periodically, balancing accuracy and scalability well.
      3. Step 3: Consider client-side rate limiting

        Client-side rate limiting without server checks (Option C) is unreliable because clients can bypass the limits.
      4. Final Answer:

        Use distributed token buckets with local caches and periodic sync -> Option A
      5. Quick Check:

        Distributed token bucket = scalable + approximately accurate [OK]
      Hint: Distributed token buckets scale best for millions [OK]
      Common Mistakes:
      • Choosing centralized storage causing bottlenecks
      • Relying only on client-side limits
      • Storing all request logs centrally causing overload
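One way to picture Option A is a batch-leasing variant: each server leases a small batch of a client's tokens from a central budget and answers from that local cache until it runs out. This is a sketch under stated assumptions, not a definitive design. The names `CentralBudget` and `Node`, the batch size, syncing on demand rather than on a timer, and the omission of the hourly refill are all simplifications introduced here.

```python
class CentralBudget:
    """Central store holding each client's remaining tokens for the hour."""

    def __init__(self, limit_per_client: int):
        self.limit = limit_per_client
        self.remaining = {}  # client_id -> tokens not yet leased out

    def lease(self, client_id: str, batch: int) -> int:
        left = self.remaining.get(client_id, self.limit)
        granted = min(batch, left)
        self.remaining[client_id] = left - granted
        return granted


class Node:
    """A server that answers from a local token cache and syncs when empty."""

    def __init__(self, central: CentralBudget, batch: int):
        self.central = central
        self.batch = batch
        self.local = {}      # client_id -> locally cached tokens
        self.sync_calls = 0  # how often this node contacted the central store

    def allow(self, client_id: str) -> bool:
        if self.local.get(client_id, 0) == 0:
            self.sync_calls += 1
            self.local[client_id] = self.central.lease(client_id, self.batch)
        if self.local[client_id] > 0:
            self.local[client_id] -= 1
            return True
        return False


central = CentralBudget(limit_per_client=100)
nodes = [Node(central, batch=10), Node(central, batch=10)]
first = sum(nodes[i % 2].allow("user-1") for i in range(100))
calls = sum(n.sync_calls for n in nodes)
print(first, calls)              # 100 10
print(nodes[0].allow("user-1"))  # False, the hourly budget is spent
```

In this run, 100 requests cost only 10 trips to the central store. The trade-off is accuracy: tokens leased to a server that then receives no traffic are stranded, so a client can be blocked slightly before reaching its limit.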