
News feed generation in HLD - System Design Exercise

Design: News Feed Generation System
Design covers feed generation, storage, and retrieval. Does not cover user authentication or content moderation.
Functional Requirements
FR1: Users can follow other users and see their posts in a personalized feed
FR2: Feed should show posts sorted by recency and relevance
FR3: Support 10 million active users with 1 million new posts per day
FR4: Feed updates should appear within 5 seconds of new posts
FR5: Users can like and comment on posts
FR6: Support pagination and infinite scrolling in the feed
FR7: Allow filtering feed by topics or hashtags
Non-Functional Requirements
NFR1: System must handle 100,000 concurrent feed requests
NFR2: API response time for feed retrieval should be under 200ms (p99)
NFR3: System availability should be 99.9% uptime
NFR4: Data consistency for likes and comments must be strong
NFR5: Storage should be scalable for growing user and post data
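Before designing, it helps to turn FR3 and NFR1 into rough write rates. The back-of-envelope arithmetic below uses the numbers stated above; the average-followers figure is an assumption for illustration, not part of the spec.

```python
# Back-of-envelope capacity estimation from the stated requirements.
POSTS_PER_DAY = 1_000_000    # FR3
ACTIVE_USERS = 10_000_000    # FR3
AVG_FOLLOWERS = 200          # assumed, not given in the requirements

SECONDS_PER_DAY = 24 * 60 * 60
post_writes_per_sec = POSTS_PER_DAY / SECONDS_PER_DAY           # ~11.6/s
# Fan-out on write: each new post becomes one feed-cache write per follower.
fanout_writes_per_sec = post_writes_per_sec * AVG_FOLLOWERS     # ~2,300/s

print(f"post writes/sec:    {post_writes_per_sec:.1f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:.0f}")
```

The takeaway: raw post ingestion is tiny, but fan-out multiplies it by follower count, which is why the fan-out strategy and celebrity accounts dominate the scaling discussion.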
Think Before You Design
Questions to Ask
❓ What is the expected read/write ratio for the feed? (Reads typically dominate heavily.)
❓ How should posts from "celebrity" users with millions of followers be handled?
❓ Is the 5-second freshness target (FR4) a hard requirement, or is eventual delivery acceptable?
❓ Do posts include media (images/video), or text only?
❓ Does strong consistency (NFR4) apply to the like/comment counts shown in the feed, or only to the canonical counts?
❓ Should the feed default to chronological order or relevance ranking (FR2), and can users switch?
Key Components
User service to manage follow relationships
Post storage database
Feed generation service (push or pull model)
Cache layer for fast feed retrieval
Ranking and personalization engine
API gateway for client requests
Design Patterns
Fan-out on write vs fan-out on read
Caching strategies (e.g., Redis, Memcached)
Message queues for asynchronous processing
Sharding and partitioning for scalability
Event-driven architecture for updates
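The fan-out trade-off above can be made concrete with a minimal sketch. Plain dicts stand in for Redis and Post Storage here, and all function names are illustrative, not a real API.

```python
from collections import defaultdict

posts = []                      # stands in for Post Storage
followers = defaultdict(set)    # followee_id -> set of follower_ids
feed_cache = defaultdict(list)  # user_id -> precomputed feed (fan-out on write)

def publish_fanout_on_write(author, content):
    """Write path does the work: push the post into every follower's feed."""
    post = {"author": author, "content": content}
    posts.append(post)
    for follower in followers[author]:
        feed_cache[follower].insert(0, post)   # newest first

def read_feed_fanout_on_write(user):
    """Read path is a cheap cache lookup."""
    return feed_cache[user]

def read_feed_fanout_on_read(user, following):
    """Read path does the work: merge the timelines of everyone followed."""
    return [p for p in reversed(posts) if p["author"] in following]
```

Fan-out on write optimizes read latency (NFR2) but amplifies writes for high-follower accounts; fan-out on read avoids write amplification at the cost of expensive reads. Real systems often use a hybrid: push for most users, pull for celebrities.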
Reference Architecture
Client
  |
  v
API Gateway
  |
  v
Feed Service <--> User Service
  |               |
  |               v
  |           Follow DB
  |
  v
Cache (Redis)
  |
  v
Post Storage (NoSQL DB)
  |
  v
Message Queue --> Feed Generator Worker
  |
  v
Ranking Engine

Components
API Gateway
Nginx or AWS API Gateway
Handles client requests, routes to feed service
Feed Service
Node.js/Java microservice
Handles feed retrieval, pagination, filtering
User Service
Java/Spring Boot
Manages user data and follow relationships
Follow DB
Relational DB (PostgreSQL)
Stores user follow relationships
Cache
Redis
Stores precomputed feeds for fast retrieval
Post Storage
NoSQL DB (Cassandra or DynamoDB)
Stores posts and metadata
Message Queue
Kafka or RabbitMQ
Queues new posts for feed generation
Feed Generator Worker
Python/Java worker
Processes new posts, pushes updates to followers' feeds
Ranking Engine
Custom service or ML model
Ranks posts by relevance and recency
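A hypothetical scoring function for the Ranking Engine might combine exponential recency decay with a log-damped engagement term; the half-life and weights below are illustrative placeholders, not tuned values.

```python
import math
import time

HALF_LIFE_S = 6 * 3600  # assumed: a post's recency score halves every 6 hours

def score(post_ts, likes, comments, now=None):
    """Recency-decayed score, boosted by engagement (comments weighted higher)."""
    if now is None:
        now = time.time()
    age = max(0.0, now - post_ts)
    recency = 0.5 ** (age / HALF_LIFE_S)
    engagement = math.log1p(likes + 2 * comments)
    return recency * (1 + engagement)

def rank(posts, now=None):
    """posts: iterable of dicts with timestamp/likes/comments keys."""
    return sorted(
        posts,
        key=lambda p: score(p["timestamp"], p["likes"], p["comments"], now),
        reverse=True,
    )
```

The multiplicative form keeps fresh posts on top (FR2/FR4) while letting strong engagement extend a post's lifetime; a production system would replace this heuristic with an ML model, as noted above.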
Request Flow
1. User posts new content via API Gateway
2. Post stored in Post Storage
3. Post event sent to Message Queue
4. Feed Generator Worker consumes event, fetches followers from Follow DB
5. Worker pushes post to followers' feed cache in Redis
6. User requests feed via API Gateway
7. Feed Service fetches feed from Redis cache
8. Ranking Engine adjusts order based on relevance
9. Feed Service returns sorted feed to user
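Step 7 must also satisfy FR6 (pagination and infinite scroll). A sketch of keyset (cursor) pagination over the cached feed, assuming the Feed Generator Worker keeps the cache sorted by descending post ID:

```python
def feed_page(cached_feed, cursor=None, limit=20):
    """Keyset pagination: return posts with post_id < cursor, newest first.

    cached_feed: list of post dicts sorted by descending post_id.
    Returns (page, next_cursor); next_cursor is None on the last page.
    """
    if cursor is not None:
        items = [p for p in cached_feed if p["post_id"] < cursor]
    else:
        items = cached_feed
    page = items[:limit]
    next_cursor = page[-1]["post_id"] if len(items) > limit else None
    return page, next_cursor
```

The client echoes `next_cursor` back on each scroll. Unlike offset pagination, keyset cursors stay stable while new posts arrive at the head of the feed, so infinite scrolling never shows duplicates.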
Database Schema
Entities:
- User(user_id PK, name, ...)
- Follow(follower_id FK->User, followee_id FK->User, PK(follower_id, followee_id))
- Post(post_id PK, user_id FK->User, content, timestamp, topics)
- Like(user_id FK->User, post_id FK->Post, PK(user_id, post_id))
- Comment(comment_id PK, post_id FK->Post, user_id FK->User, content, timestamp)

Relationships:
- User to Follow is 1:N (one user can follow many users)
- User to Post is 1:N
- Post to Like is 1:N
- Post to Comment is 1:N
Scaling Discussion
Bottlenecks
Feed cache size grows too large for Redis memory
Message queue overwhelmed by high post volume
Ranking engine latency increases with feed size
Database hotspots on popular users or posts
API Gateway throttling under heavy concurrent requests
Solutions
Shard Redis cache by user ID or region; use eviction policies
Partition message queue topics; scale consumers horizontally
Use approximate ranking or pre-rank feeds offline
Use database sharding and read replicas
Use load balancers and autoscaling for API Gateway
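Sharding the Redis feed cache by user ID can be sketched as simple hash-based routing; the shard count here is an assumption for illustration.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count

def shard_for(user_id: str) -> int:
    """Stable shard index from an MD5 of the user ID (hashing only, not security)."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Note that plain modulo hashing reshuffles most keys whenever NUM_SHARDS changes; consistent hashing, or Redis Cluster's fixed 16384-hash-slot scheme, limits that movement and is the usual production choice.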
Interview Tips
Time: 10 min for requirements and clarifications, 15 min for architecture and components, 10 min for data flow and database design, 10 min for scaling and trade-offs discussion
Clarify real-time vs batch feed generation trade-offs
Explain fan-out on write vs fan-out on read
Discuss caching strategies and cache invalidation
Describe how to handle large scale with sharding and partitioning
Mention consistency needs for likes and comments
Highlight how ranking improves user experience