
News feed generation in HLD - System Design Exercise

Design: News Feed Generation System
Design covers feed generation, storage, and retrieval. Does not cover user authentication or content moderation.
Functional Requirements
FR1: Users can follow other users and see their posts in a personalized feed
FR2: Feed should show posts sorted by recency and relevance
FR3: Support 10 million active users with 1 million new posts per day
FR4: Feed updates should appear within 5 seconds of new posts
FR5: Users can like and comment on posts
FR6: Support pagination and infinite scrolling in the feed
FR7: Allow filtering feed by topics or hashtags
Non-Functional Requirements
NFR1: System must handle 100,000 concurrent feed requests
NFR2: API response time for feed retrieval should be under 200ms (p99)
NFR3: System availability should be 99.9% uptime
NFR4: Data consistency for likes and comments must be strong
NFR5: Storage should be scalable for growing user and post data
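Before designing, it helps to turn FR3 and NFR1 into rough write rates. The back-of-envelope arithmetic below uses the numbers stated above; the average-followers figure is an assumption for illustration, not part of the spec.

```python
# Back-of-envelope capacity estimation from the stated requirements.
POSTS_PER_DAY = 1_000_000    # FR3
ACTIVE_USERS = 10_000_000    # FR3
AVG_FOLLOWERS = 200          # assumed, not given in the requirements

SECONDS_PER_DAY = 24 * 60 * 60
post_writes_per_sec = POSTS_PER_DAY / SECONDS_PER_DAY           # ~11.6/s
# Fan-out on write: each new post becomes one feed-cache write per follower.
fanout_writes_per_sec = post_writes_per_sec * AVG_FOLLOWERS     # ~2,300/s

print(f"post writes/sec:    {post_writes_per_sec:.1f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:.0f}")
```

The takeaway: raw post ingestion is tiny, but fan-out multiplies it by follower count, which is why the fan-out strategy and celebrity accounts dominate the scaling discussion.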
Think Before You Design
Questions to Ask
❓ What is the expected read/write ratio for the feed? (Reads typically dominate heavily.)
❓ How should posts from "celebrity" users with millions of followers be handled?
❓ Is the 5-second freshness target (FR4) a hard requirement, or is eventual delivery acceptable?
❓ Do posts include media (images/video), or text only?
❓ Does strong consistency (NFR4) apply to the like/comment counts shown in the feed, or only to the canonical counts?
❓ Should the feed default to chronological order or relevance ranking (FR2), and can users switch?
Key Components
User service to manage follow relationships
Post storage database
Feed generation service (push or pull model)
Cache layer for fast feed retrieval
Ranking and personalization engine
API gateway for client requests
Design Patterns
Fan-out on write vs fan-out on read
Caching strategies (e.g., Redis, Memcached)
Message queues for asynchronous processing
Sharding and partitioning for scalability
Event-driven architecture for updates
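The fan-out trade-off above can be made concrete with a minimal sketch. Plain dicts stand in for Redis and Post Storage here, and all function names are illustrative, not a real API.

```python
from collections import defaultdict

posts = []                      # stands in for Post Storage
followers = defaultdict(set)    # followee_id -> set of follower_ids
feed_cache = defaultdict(list)  # user_id -> precomputed feed (fan-out on write)

def publish_fanout_on_write(author, content):
    """Write path does the work: push the post into every follower's feed."""
    post = {"author": author, "content": content}
    posts.append(post)
    for follower in followers[author]:
        feed_cache[follower].insert(0, post)   # newest first

def read_feed_fanout_on_write(user):
    """Read path is a cheap cache lookup."""
    return feed_cache[user]

def read_feed_fanout_on_read(user, following):
    """Read path does the work: merge the timelines of everyone followed."""
    return [p for p in reversed(posts) if p["author"] in following]
```

Fan-out on write optimizes read latency (NFR2) but amplifies writes for high-follower accounts; fan-out on read avoids write amplification at the cost of expensive reads. Real systems often use a hybrid: push for most users, pull for celebrities.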
Reference Architecture
Client
  |
  v
API Gateway
  |
  v
Feed Service <--> User Service
  |               |
  |               v
  |           Follow DB
  |
  v
Cache (Redis)
  |
  v
Post Storage (NoSQL DB)
  |
  v
Message Queue --> Feed Generator Worker
  |
  v
Ranking Engine

Components
API Gateway
Nginx or AWS API Gateway
Handles client requests, routes to feed service
Feed Service
Node.js/Java microservice
Handles feed retrieval, pagination, filtering
User Service
Java/Spring Boot
Manages user data and follow relationships
Follow DB
Relational DB (PostgreSQL)
Stores user follow relationships
Cache
Redis
Stores precomputed feeds for fast retrieval
Post Storage
NoSQL DB (Cassandra or DynamoDB)
Stores posts and metadata
Message Queue
Kafka or RabbitMQ
Queues new posts for feed generation
Feed Generator Worker
Python/Java worker
Processes new posts, pushes updates to followers' feeds
Ranking Engine
Custom service or ML model
Ranks posts by relevance and recency
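A hypothetical scoring function for the Ranking Engine might combine exponential recency decay with a log-damped engagement term; the half-life and weights below are illustrative placeholders, not tuned values.

```python
import math
import time

HALF_LIFE_S = 6 * 3600  # assumed: a post's recency score halves every 6 hours

def score(post_ts, likes, comments, now=None):
    """Recency-decayed score, boosted by engagement (comments weighted higher)."""
    if now is None:
        now = time.time()
    age = max(0.0, now - post_ts)
    recency = 0.5 ** (age / HALF_LIFE_S)
    engagement = math.log1p(likes + 2 * comments)
    return recency * (1 + engagement)

def rank(posts, now=None):
    """posts: iterable of dicts with timestamp/likes/comments keys."""
    return sorted(
        posts,
        key=lambda p: score(p["timestamp"], p["likes"], p["comments"], now),
        reverse=True,
    )
```

The multiplicative form keeps fresh posts on top (FR2/FR4) while letting strong engagement extend a post's lifetime; a production system would replace this heuristic with an ML model, as noted above.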
Request Flow
1. User posts new content via API Gateway
2. Post stored in Post Storage
3. Post event sent to Message Queue
4. Feed Generator Worker consumes event, fetches followers from Follow DB
5. Worker pushes post to followers' feed cache in Redis
6. User requests feed via API Gateway
7. Feed Service fetches feed from Redis cache
8. Ranking Engine adjusts order based on relevance
9. Feed Service returns sorted feed to user
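Step 7 must also satisfy FR6 (pagination and infinite scroll). A sketch of keyset (cursor) pagination over the cached feed, assuming the Feed Generator Worker keeps the cache sorted by descending post ID:

```python
def feed_page(cached_feed, cursor=None, limit=20):
    """Keyset pagination: return posts with post_id < cursor, newest first.

    cached_feed: list of post dicts sorted by descending post_id.
    Returns (page, next_cursor); next_cursor is None on the last page.
    """
    if cursor is not None:
        items = [p for p in cached_feed if p["post_id"] < cursor]
    else:
        items = cached_feed
    page = items[:limit]
    next_cursor = page[-1]["post_id"] if len(items) > limit else None
    return page, next_cursor
```

The client echoes `next_cursor` back on each scroll. Unlike offset pagination, keyset cursors stay stable while new posts arrive at the head of the feed, so infinite scrolling never shows duplicates.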
Database Schema
Entities:
- User(user_id PK, name, ...)
- Follow(follower_id FK->User, followee_id FK->User, PK(follower_id, followee_id))
- Post(post_id PK, user_id FK->User, content, timestamp, topics)
- Like(user_id FK->User, post_id FK->Post, PK(user_id, post_id))
- Comment(comment_id PK, post_id FK->Post, user_id FK->User, content, timestamp)

Relationships:
- User to Follow is 1:N (one user can follow many users)
- User to Post is 1:N
- Post to Like is 1:N
- Post to Comment is 1:N
Scaling Discussion
Bottlenecks
Feed cache size grows too large for Redis memory
Message queue overwhelmed by high post volume
Ranking engine latency increases with feed size
Database hotspots on popular users or posts
API Gateway throttling under heavy concurrent requests
Solutions
Shard Redis cache by user ID or region; use eviction policies
Partition message queue topics; scale consumers horizontally
Use approximate ranking or pre-rank feeds offline
Use database sharding and read replicas
Use load balancers and autoscaling for API Gateway
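Sharding the Redis feed cache by user ID can be sketched as simple hash-based routing; the shard count here is an assumption for illustration.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count

def shard_for(user_id: str) -> int:
    """Stable shard index from an MD5 of the user ID (hashing only, not security)."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Note that plain modulo hashing reshuffles most keys whenever NUM_SHARDS changes; consistent hashing, or Redis Cluster's fixed 16384-hash-slot scheme, limits that movement and is the usual production choice.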
Interview Tips
Time: 10 min for requirements and clarifications, 15 min for architecture and components, 10 min for data flow and database design, 10 min for scaling and trade-offs discussion
Clarify real-time vs batch feed generation trade-offs
Explain fan-out on write vs fan-out on read
Discuss caching strategies and cache invalidation
Describe how to handle large scale with sharding and partitioning
Mention consistency needs for likes and comments
Highlight how ranking improves user experience