
How to Design a Scalable News Feed System

To design a news feed, build a system that collects user posts and interactions, ranks them by relevance, and delivers them efficiently using caching and database sharding. Use a fan-out on write or fan-out on read strategy to generate feeds at scale.

Syntax

A news feed system typically involves these parts:

  • Data ingestion: Collect posts and user actions.
  • Storage: Use databases to store posts and user info.
  • Feed generation: Create personalized feeds using ranking algorithms.
  • Delivery: Serve feeds quickly using caching and APIs.

Key design patterns include fan-out on write (push each new post to followers' feeds when it is created) and fan-out on read (generate the feed when the user requests it).

python
class NewsFeedSystem:
    def ingest_post(self, post):
        # Save post to database
        pass

    def generate_feed(self, user_id):
        # Fetch posts, rank, and return feed
        pass

    def deliver_feed(self, user_id):
        # Use cache or database to serve feed
        pass
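For contrast with the skeleton above, here is a minimal in-memory sketch of fan-out on write. The class name `FanOutOnWriteFeed`, the `max_feed_size` cap, and the tuple layout are illustrative assumptions, not part of the original design; a real system would likely store post IDs in a cache or database rather than in Python dictionaries.

```python
from collections import defaultdict, deque


class FanOutOnWriteFeed:
    """Hypothetical sketch: every user has a precomputed, capped feed."""

    def __init__(self, max_feed_size=100):
        # followee -> set of followers
        self.followers = defaultdict(set)
        # follower -> newest-first feed; the oldest entries fall off when full
        self.feeds = defaultdict(lambda: deque(maxlen=max_feed_size))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def publish(self, author, content):
        # Write path: copy the post into every follower's stored feed.
        for follower in self.followers[author]:
            self.feeds[follower].appendleft((author, content))

    def get_feed(self, user_id, limit=5):
        # Read path: no scanning or sorting, just slice the stored feed.
        return list(self.feeds.get(user_id, ()))[:limit]


feed = FanOutOnWriteFeed()
feed.follow('user1', 'user2')
feed.publish('user2', 'Hello from user2')
feed.publish('user3', 'Hello from user3')   # user1 does not follow user3
feed.publish('user2', 'Another post')
print(feed.get_feed('user1'))
# [('user2', 'Another post'), ('user2', 'Hello from user2')]
```

The cost moves to `publish`, which does one write per follower. That is why this pattern gives fast reads but expensive writes.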

Example

This example shows a simple fan-out on read feed. It fetches posts from followed users, sorts them by timestamp, and returns the most recent ones.

python
from datetime import datetime

class Post:
    def __init__(self, user_id, content, timestamp):
        self.user_id = user_id
        self.content = content
        self.timestamp = timestamp

class NewsFeed:
    def __init__(self):
        self.posts = []
        self.follow_map = {}

    def add_post(self, post):
        self.posts.append(post)

    def follow(self, follower, followee):
        self.follow_map.setdefault(follower, set()).add(followee)

    def get_feed(self, user_id, limit=5):
        followees = self.follow_map.get(user_id, set())
        feed_posts = [p for p in self.posts if p.user_id in followees]
        feed_posts.sort(key=lambda p: p.timestamp, reverse=True)
        return feed_posts[:limit]

# Usage
feed = NewsFeed()
feed.follow('user1', 'user2')
feed.add_post(Post('user2', 'Hello from user2', datetime(2024, 6, 1, 10, 0)))
feed.add_post(Post('user3', 'Hello from user3', datetime(2024, 6, 1, 9, 0)))
feed.add_post(Post('user2', 'Another post', datetime(2024, 6, 1, 11, 0)))

user1_feed = feed.get_feed('user1')
for post in user1_feed:
    print(f"{post.user_id}: {post.content} at {post.timestamp}")
Output
user2: Another post at 2024-06-01 11:00:00
user2: Hello from user2 at 2024-06-01 10:00:00
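The example above orders posts by timestamp only. A relevance score that also accounts for user interaction could be sketched as follows. The `likes` and `comments` fields, the six-hour half-life, and the weighting are all assumptions chosen for illustration; real ranking functions are tuned against engagement data.

```python
import math
from datetime import datetime


def relevance_score(post_time, likes, comments, now, half_life_hours=6.0):
    """Hypothetical score: damped engagement multiplied by recency decay."""
    age_hours = max((now - post_time).total_seconds() / 3600.0, 0.0)
    # 1.0 for a brand-new post, halving every half_life_hours
    recency = 0.5 ** (age_hours / half_life_hours)
    # log1p damps large counts so a single viral post does not dominate
    engagement = 1.0 + math.log1p(likes + 2 * comments)
    return recency * engagement


def rank_posts(posts, now, limit=5):
    """posts: iterable of dicts with 'timestamp', 'likes', 'comments' keys."""
    ranked = sorted(
        posts,
        key=lambda p: relevance_score(p["timestamp"], p["likes"], p["comments"], now),
        reverse=True,
    )
    return ranked[:limit]


now = datetime(2024, 6, 1, 12, 0)
posts = [
    {"content": "Another post", "timestamp": datetime(2024, 6, 1, 11, 0), "likes": 0, "comments": 0},
    {"content": "Hello from user2", "timestamp": datetime(2024, 6, 1, 10, 0), "likes": 50, "comments": 10},
]
for post in rank_posts(posts, now):
    print(post["content"])
```

With these assumed weights, the older post with many interactions ranks above the newer post with none. This is the kind of reordering that a pure timestamp sort cannot produce.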

Common Pitfalls

Common mistakes when designing news feeds include:

  • Not handling scale: Without caching or sharding, feed generation becomes slow.
  • Ignoring personalization: Showing all posts without ranking reduces user engagement.
  • Using fan-out on write without limits: Can overload the system when a user has millions of followers.
  • Not updating feeds in real time: Users expect fresh content quickly.
none
## Wrong approach: Fan-out on write without limits
# Push every post to all followers immediately
# This can cause overload if a user has many followers

## Better approach: Use queues and batch updates
# Fan-out in batches or on read to reduce load
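The batching and read-time fallback described above can be sketched in memory as follows. `FANOUT_LIMIT`, `BATCH_SIZE`, and the `HybridFeed` class are hypothetical names and values. The in-process `deque` stands in for a real message queue, and the sketch ignores what happens when an author later drops back below the limit.

```python
from collections import defaultdict, deque

FANOUT_LIMIT = 3   # hypothetical threshold; a real system would tune this
BATCH_SIZE = 2     # hypothetical batch size, kept tiny for the demo


class HybridFeed:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> followers
        self.following = defaultdict(set)   # user -> authors they follow
        self.inbox = defaultdict(list)      # user -> posts pushed to them
        self.outbox = defaultdict(list)     # author -> own posts (for pull)
        self.queue = deque()                # pending (post, followers) batches
        self.seq = 0                        # increasing counter, orders posts

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def publish(self, author, content):
        self.seq += 1
        post = (self.seq, author, content)
        self.outbox[author].append(post)
        targets = sorted(self.followers[author])
        if len(targets) > FANOUT_LIMIT:
            return  # too many followers: skip the push, readers pull instead
        # Enqueue the fan-out in small batches instead of writing inline.
        for i in range(0, len(targets), BATCH_SIZE):
            self.queue.append((post, targets[i:i + BATCH_SIZE]))

    def process_one_batch(self):
        """A background worker would call this repeatedly."""
        if not self.queue:
            return False
        post, batch = self.queue.popleft()
        for user in batch:
            self.inbox[user].append(post)
        return True

    def get_feed(self, user, limit=5):
        posts = set(self.inbox[user])  # a set avoids duplicates if a post is both pushed and pulled
        for author in self.following[user]:
            if len(self.followers[author]) > FANOUT_LIMIT:
                posts.update(self.outbox[author])  # pull at read time
        return sorted(posts, reverse=True)[:limit]
```

Posts from accounts with few followers reach inboxes once a worker drains the queue. Posts from accounts above the limit are merged in only when a follower opens the feed.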

Quick Reference

  • Fan-out on write: Push posts to followers' feeds when created. Fast reads, costly writes.
  • Fan-out on read: Generate the feed when the user requests it. Slower reads, cheaper writes.
  • Ranking: Use factors like recency, user interaction, and popularity.
  • Caching: Store popular feeds to reduce database load.
  • Sharding: Split data by user or post ID to scale databases.
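The caching and sharding entries above can be illustrated with a short sketch. `shard_for`, `TTLFeedCache`, the shard count, and the 30-second TTL are assumptions made for this example. A production setup would more likely put a dedicated cache server in front of sharded databases.

```python
import hashlib
import time

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard with a stable hash (same input, same shard)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class TTLFeedCache:
    """Cache generated feeds briefly so repeated reads skip feed generation."""

    def __init__(self, build_feed, ttl_seconds=30.0, clock=time.monotonic):
        self.build_feed = build_feed   # callable: user_id -> feed (a list)
        self.ttl = ttl_seconds
        self.clock = clock             # injectable so expiry can be tested
        self.entries = {}              # user_id -> (expires_at, feed)

    def get(self, user_id):
        now = self.clock()
        hit = self.entries.get(user_id)
        if hit is not None and hit[0] > now:
            return hit[1]                    # fresh cache hit
        feed = self.build_feed(user_id)      # miss or expired: rebuild
        self.entries[user_id] = (now + self.ttl, feed)
        return feed

    def invalidate(self, user_id):
        # Call when a user's feed changes and must not be served stale.
        self.entries.pop(user_id, None)


cache = TTLFeedCache(lambda user_id: [f"posts for {user_id}"])
print(cache.get("user1"), "stored on shard", shard_for("user1"))
```

Modulo-based sharding is simple, but it remaps most keys when the shard count changes. Consistent hashing is a common alternative when shards are added or removed often.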

Key Takeaways

  • Choose fan-out on write or fan-out on read based on scale and latency needs.
  • Rank feed items by relevance using recency and user interactions to improve engagement.
  • Implement caching and database sharding to handle large user bases efficiently.
  • Avoid pushing updates to millions of followers instantly to prevent system overload.
  • Design for real-time updates to keep the feed fresh and engaging.