
DNS and how domain resolution works in HLD - Scalability & System Analysis

Scalability Analysis - DNS and how domain resolution works
Growth Table: DNS Resolution at Different Scales
| Users | Queries per Second (QPS) | DNS Server Load | Latency | Cache Hit Rate |
| --- | --- | --- | --- | --- |
| 100 | ~10 | Single DNS server handles easily | Low (~10-20 ms) | High (local resolver cache effective) |
| 10,000 | ~1,000 | Single DNS server or small cluster | Low to moderate (~20-50 ms) | High (recursive resolvers cache) |
| 1,000,000 | ~100,000 | Multiple DNS servers with load balancing | Moderate (~50-100 ms) | High (global caching, CDN involvement) |
| 100,000,000 | ~10,000,000 | Large distributed DNS infrastructure, Anycast | Low to moderate (~20-80 ms) | Very high (multi-layer caching, CDNs) |
First Bottleneck

The first bottleneck in DNS resolution is usually the authoritative DNS servers for the domain. As query volume grows, these servers receive more requests than they can handle, which increases latency or causes dropped queries. Recursive resolvers and local caches reduce this load but cannot eliminate it, because cached records expire after their TTL and every resolver with a cold cache must still query the authoritative servers. At very large scales, network bandwidth and latency also affect resolution speed.
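
The effect of caching on authoritative load can be estimated with simple arithmetic. The sketch below assumes that every cache miss results in exactly one authoritative query. The hit rates are illustrative assumptions, not measurements, and `authoritative_qps` is a hypothetical helper name.

```python
def authoritative_qps(client_qps: float, cache_hit_rate: float) -> float:
    """Estimate the queries per second that miss resolver caches.

    Assumption: each cache miss produces exactly one authoritative query.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return client_qps * (1.0 - cache_hit_rate)


# 100K client QPS corresponds to the 1M-user row of the growth table.
# The hit rates below are assumed values chosen only for illustration.
for hit_rate in (0.0, 0.90, 0.99):
    load = authoritative_qps(100_000, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ~{load:,.0f} QPS reach authoritative servers")
```

Even a high hit rate leaves a residual load that grows linearly with client traffic, which is why the authoritative tier still needs to scale.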

Scaling Solutions
  • Caching: Use recursive resolvers and local caches to reduce repeated queries.
  • Load Balancing: Distribute queries across multiple authoritative DNS servers.
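  • Anycast Routing: Deploy authoritative servers globally behind the same IP address so that network routing delivers each query to the nearest server.
  • DNS Hierarchy: Rely on delegation across root, TLD, and authoritative servers so that no single tier answers every query; resolvers cache these referrals, and large domains can delegate subdomains to separate zones to spread load.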
  • CDNs: Use CDN DNS services to offload traffic and improve latency.
  • Rate Limiting and Security: Protect servers from DNS floods and attacks.
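
The caching item in the list above can be sketched as a small TTL-based cache, similar in spirit to what a recursive resolver keeps. This is a simplified illustration and not how any particular resolver is implemented. The `TTLCache` class, the `lookup` callback, and the injectable `clock` are hypothetical names introduced here, and the upstream lookup is a stub that performs no network I/O.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache name -> address answers until their TTL expires."""

    def __init__(
        self,
        lookup: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup      # upstream query: returns (address, ttl_seconds)
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0  # counts queries that missed the cache

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit: no upstream traffic
        address, ttl = self._lookup(name)
        self.upstream_queries += 1
        self._entries[name] = (address, now + ttl)
        return address


def stub_lookup(name: str) -> Tuple[str, float]:
    # Stand-in for an authoritative query; 192.0.2.10 is a documentation address.
    return ("192.0.2.10", 300.0)


cache = TTLCache(stub_lookup)
for _ in range(1000):
    cache.resolve("example.com")
print(f"upstream queries for 1000 lookups: {cache.upstream_queries}")
```

Only the first lookup within a TTL window reaches the upstream server, so longer TTLs trade freshness for lower authoritative load.
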
Back-of-Envelope Cost Analysis
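  • At 1M users generating ~100K QPS, authoritative servers must handle up to ~100K queries per second in the worst case (no cache hits); effective resolver caching lowers this substantially.
  • Each DNS query is small (~100 bytes), so 100K QPS amounts to ~10 MB/s (~80 Mbit/s) of inbound query traffic in total; responses add outbound traffic on top of that.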
  • Storage is minimal, mostly for zone files (few MBs to GBs depending on domain complexity).
  • Network infrastructure must support low latency and high availability.
  • Scaling beyond 10M QPS requires global distribution and Anycast to keep latency low.
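
The bandwidth estimate above is a direct multiplication, reproduced here as a short sketch. The 100-byte query size comes from the text; `bandwidth_mb_per_s` is a hypothetical helper, and the calculation covers inbound queries only.

```python
QUERY_BYTES = 100  # approximate query size stated in the text


def bandwidth_mb_per_s(qps: int, bytes_per_message: int) -> float:
    """Aggregate bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    return qps * bytes_per_message / 1_000_000


for qps in (100_000, 10_000_000):
    mb = bandwidth_mb_per_s(qps, QUERY_BYTES)
    print(f"{qps:,} QPS -> {mb:,.0f} MB/s inbound ({mb * 8:,.0f} Mbit/s)")
```

Because responses carry the question plus the answer records, outbound traffic is at least as large as inbound traffic and should be budgeted separately.
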
Interview Tip

When discussing DNS scalability in an interview, start by explaining the DNS hierarchy and caching layers. Then identify the bottleneck (authoritative servers) and propose solutions like caching, Anycast, and load balancing. Use numbers to justify your reasoning and mention security considerations. Keep your explanation clear and structured.

Self Check Question

Your authoritative DNS server handles 1,000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: First, deploy additional authoritative servers behind load balancing or Anycast to distribute the increased query load, and make caching more effective (for example, by using longer TTLs on records that rarely change). This reduces pressure on any single server and keeps latency low.
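
The answer can be backed with a rough capacity estimate. The sketch below is a sizing heuristic under stated assumptions, not a rule from the source: the per-server capacity of 5,000 QPS, the 50% target utilization, and the single spare server are hypothetical values, and `servers_needed` is a helper name introduced here.

```python
import math


def servers_needed(
    peak_qps: float,
    per_server_qps: float,
    target_utilization: float = 0.5,  # assumed headroom for traffic spikes
    spares: int = 1,                  # assumed spare for failures or maintenance
) -> int:
    """Estimate how many authoritative servers a given peak load requires."""
    if per_server_qps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity or utilization")
    usable_qps = per_server_qps * target_utilization
    return math.ceil(peak_qps / usable_qps) + spares


PER_SERVER_QPS = 5_000  # hypothetical capacity; measure the real value under load

print("before growth:", servers_needed(1_000, PER_SERVER_QPS))
print("after 10x growth:", servers_needed(10_000, PER_SERVER_QPS))
```

Running the estimate before and after the growth makes the scaling gap explicit, which is the kind of number an interviewer usually expects.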

Key Result
DNS resolution scales well with caching and distributed authoritative servers. The authoritative DNS servers are the first bottleneck, and they require horizontal scaling and Anycast routing to handle millions of queries per second efficiently.