You need to design a system to store and query a social graph with billions of users and their connections. Which architecture best supports fast retrieval of a user's friends and mutual friends?
Think about which storage type naturally represents relationships and supports fast graph queries.
Graph databases are designed to efficiently store and traverse relationships, making them ideal for social graphs where queries like mutual friends are common.
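As a rough illustration of why relationship-centric storage makes this query cheap, the sketch below keeps each user's friends in an adjacency set and computes mutual friends as a set intersection. The SocialGraph class and its method names are hypothetical, and an in-memory dict stands in for a real graph database.

```python
from collections import defaultdict


class SocialGraph:
    """Toy in-memory adjacency-set graph (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each user ID to the set of that user's friend IDs.
        self._adj = defaultdict(set)

    def add_friendship(self, a, b):
        # Friendship is symmetric, so record the edge in both directions.
        self._adj[a].add(b)
        self._adj[b].add(a)

    def friends(self, user):
        # Return a copy so callers cannot mutate internal state.
        return set(self._adj.get(user, ()))

    def mutual_friends(self, a, b):
        # Mutual friends are the intersection of the two adjacency sets.
        return self._adj.get(a, set()) & self._adj.get(b, set())


graph = SocialGraph()
graph.add_friendship(1, 2)
graph.add_friendship(1, 3)
graph.add_friendship(2, 3)
graph.add_friendship(2, 4)
print(sorted(graph.mutual_friends(1, 2)))  # [3]
```

The intersection cost depends on the sizes of the two friend lists rather than on the total number of users, which is the property the answer relies on.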
Your social graph system must handle millions of new friendship connections per minute. Which approach best supports this write load while maintaining query performance?
Consider how to distribute data to avoid bottlenecks and allow parallel writes.
Partitioning (sharding) the graph by user ID spreads writes across servers so they can proceed in parallel, enabling high throughput. Query speed is maintained because a user's friend list lives on a single shard, so a friend lookup touches only one server.
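A minimal sketch of partitioning by user ID, assuming a fixed shard count and simple modulo placement. NUM_SHARDS, shard_for, and the in-memory dicts are illustrative stand-ins; a production system would more likely use a stable hash or consistent hashing so that shards can be added without moving most of the data.

```python
NUM_SHARDS = 4  # assumed shard count, chosen only for this example

# Each shard is modeled as a dict: user ID -> set of friend IDs.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    # Modulo placement: the owner's ID alone decides which shard holds the list.
    return user_id % NUM_SHARDS


def add_friendship(a: int, b: int) -> None:
    # One friendship becomes two writes, each sent to the owner's shard.
    for owner, friend in ((a, b), (b, a)):
        shards[shard_for(owner)].setdefault(owner, set()).add(friend)


def friends(user_id: int) -> set:
    # A friend lookup touches exactly one shard.
    return set(shards[shard_for(user_id)].get(user_id, set()))


add_friendship(1, 6)
print(shard_for(1), shard_for(6))  # 1 2
print(friends(1), friends(6))      # {6} {1}
```

Note the tradeoff this exposes: a single friendship write can span two shards, and a mutual-friends query between users on different shards has to fetch from both.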
In a distributed social graph system, which choice best describes the tradeoff when prioritizing availability over consistency?
Think about the CAP theorem and what happens when availability is prioritized.
Prioritizing availability means every reachable node keeps answering requests during a network partition, but some responses may be stale or inconsistent until the replicas reconverge.
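The tradeoff can be shown with a toy two-replica model. This is a simplified sketch rather than a real replication protocol: the Replica class and the sync function are hypothetical, and the "partition" is simulated by simply not propagating a write.

```python
class Replica:
    """Toy replica holding friend lists (hypothetical, for illustration only)."""

    def __init__(self):
        self.friends = {}

    def write(self, user, friend):
        self.friends.setdefault(user, set()).add(friend)

    def read(self, user):
        # Always answers, even if this replica has missed recent writes.
        return set(self.friends.get(user, set()))


def sync(source, target):
    # Simulated anti-entropy step: merge everything source knows into target.
    for user, friend_ids in source.friends.items():
        target.friends.setdefault(user, set()).update(friend_ids)


replica_a, replica_b = Replica(), Replica()

# During the partition the write reaches only replica_a.
replica_a.write(1, 2)
stale_read = replica_b.read(1)   # replica_b still answers, with old data

# After the partition heals, the replicas converge.
sync(replica_a, replica_b)
fresh_read = replica_b.read(1)

print(stale_read, fresh_read)  # set() {2}
```

A consistency-first design would instead have replica_b refuse or block the read during the partition, trading availability for correctness.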
Which data model best supports efficient queries for "friends of friends" in a social graph?
Consider which model naturally expresses multi-level relationships.
The graph model allows direct traversal of edges: a friends-of-friends query follows two hops from the starting user, without the repeated self-joins a relational model would need.
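A two-hop traversal can be sketched over a plain adjacency map. The function name friends_of_friends and the sample data are made up for illustration; excluding the user and their direct friends from the result is a common convention, assumed here.

```python
def friends_of_friends(adj, user):
    """Return users exactly two hops from `user`, excluding direct friends."""
    direct = adj.get(user, set())
    two_hop = set()
    for friend in direct:
        # Second hop: follow each friend's own edges.
        two_hop |= adj.get(friend, set())
    # Drop the starting user and anyone who is already a direct friend.
    return two_hop - direct - {user}


# Sample adjacency map: user ID -> set of friend IDs (symmetric edges).
sample_adj = {
    1: {2, 3},
    2: {1, 4},
    3: {1, 4, 5},
    4: {2, 3},
    5: {3},
}
print(sorted(friends_of_friends(sample_adj, 1)))  # [4, 5]
```

The work done is proportional to the number of edges touched in the two hops, not to the size of the whole graph.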
Estimate the storage needed to store a social graph with 1 billion users, each having an average of 200 friends. Assume each user ID and friend ID requires 8 bytes, and each friendship is stored bidirectionally.
Calculate total friendships, multiply by bytes per friendship, and consider bidirectional storage.
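1 billion users × 200 friends = 200 billion directed edges, that is, 100 billion friendships each stored in both directions. Each edge stores two 8-byte IDs = 16 bytes. Total: 200 billion × 16 bytes = 3.2 trillion bytes = 3.2 TB, before indexes, replication, or other overhead.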
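The same estimate as a quick calculation, using only the figures given in the question. The variable names are illustrative, and TB is taken as the decimal unit of 10^12 bytes.

```python
USERS = 1_000_000_000   # 1 billion users
AVG_FRIENDS = 200       # average friends per user
ID_BYTES = 8            # bytes per user ID or friend ID

# Each user's friend list contributes one directed edge per friend, so this
# count already includes both directions of every friendship.
directed_edges = USERS * AVG_FRIENDS
undirected_friendships = directed_edges // 2

# Each directed edge stores a (user ID, friend ID) pair.
bytes_per_edge = 2 * ID_BYTES
total_bytes = directed_edges * bytes_per_edge
total_tb = total_bytes / 10**12

print(f"{directed_edges:,} directed edges")  # 200,000,000,000 directed edges
print(f"{total_tb:.1f} TB")                  # 3.2 TB
```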
