Online presence system in HLD - System Design Exercise

Design: Online Presence System
In scope: real-time presence tracking, status updates, and presence queries. Out of scope: user authentication, messaging, and notification delivery.
Functional Requirements
FR1: Track and display users' online/offline status in real-time
FR2: Support up to 100,000 concurrent users
FR3: Allow users to see the presence status of their contacts/friends
FR4: Update presence status within 2 seconds of change
FR5: Provide an API for clients to query presence status
FR6: Handle user login, logout, and idle states
FR7: Ensure data consistency and availability
Non-Functional Requirements
NFR1: System must have 99.9% uptime
NFR2: API response latency p99 under 200ms
NFR3: Scale to 100,000 concurrent connections
NFR4: Support mobile and web clients
NFR5: Data retention for presence status is 24 hours
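The requirements above can be turned into rough load numbers before any components are chosen. The following is a back-of-envelope sketch only: the 100,000 concurrent users figure comes from FR2/NFR3, while the heartbeat interval, record size, status-change rate, and contacts-per-user values are assumptions picked purely for illustration.

```python
# Back-of-envelope load estimate for the presence system.
# Only CONCURRENT_USERS comes from the requirements; every other constant
# below is an illustrative assumption, not a figure from the exercise.
CONCURRENT_USERS = 100_000             # FR2 / NFR3
HEARTBEAT_INTERVAL_S = 10              # assumption
BYTES_PER_PRESENCE_RECORD = 100        # assumption: id, status, timestamp, overhead
STATUS_CHANGES_PER_USER_PER_HOUR = 6   # assumption
ONLINE_CONTACTS_PER_USER = 50          # assumption

# Each connected client sends one heartbeat per interval.
heartbeats_per_second = CONCURRENT_USERS / HEARTBEAT_INTERVAL_S

# Memory needed to hold one current-status record per concurrent user.
cache_megabytes = CONCURRENT_USERS * BYTES_PER_PRESENCE_RECORD / 1_000_000

# Status changes per second, and the messages produced when each change
# is fanned out to the user's online contacts.
changes_per_second = CONCURRENT_USERS * STATUS_CHANGES_PER_USER_PER_HOUR / 3600
fanout_messages_per_second = changes_per_second * ONLINE_CONTACTS_PER_USER

if __name__ == "__main__":
    print(f"heartbeats/s:      {heartbeats_per_second:,.0f}")
    print(f"cache size (MB):   {cache_megabytes:,.1f}")
    print(f"status changes/s:  {changes_per_second:,.0f}")
    print(f"fan-out msgs/s:    {fanout_messages_per_second:,.0f}")
```

Under these assumptions the current-status data is small, and the heartbeat and fan-out traffic dominates the load, which is worth confirming with the interviewer before sizing anything.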
Think Before You Design
Key Components
Presence status store (in-memory cache and persistent storage)
Real-time communication layer (WebSocket or similar)
API gateway for client requests
User session manager
Message broker for event propagation
Database for storing presence history
Design Patterns
Publish-Subscribe for real-time updates
Cache aside pattern for presence data
Heartbeat mechanism to detect client disconnects
Sharding to scale presence data storage
Event sourcing for presence changes
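Of the patterns listed, cache-aside is perhaps the easiest to show in a few lines. The sketch below is a minimal illustration, not a production design: `PresenceCache`, its TTL default, and the `load_from_store` callback are hypothetical names, and a plain dictionary stands in for both Redis and the persistent store.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PresenceCache:
    """Cache-aside reader: try the cache first, load from the store on a miss."""

    def __init__(
        self,
        load_from_store: Callable[[str], Optional[str]],
        ttl_seconds: float = 30.0,  # assumption: illustrative TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_store
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (status, expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.store_reads = 0  # counts cache misses, useful for inspection

    def get_status(self, user_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit
        # Cache miss or expired entry: read the store, then populate the cache.
        self.store_reads += 1
        status = self._load(user_id) or "offline"
        self._entries[user_id] = (status, now + self._ttl)
        return status

    def invalidate(self, user_id: str) -> None:
        """Drop a cached entry, e.g. after the user's status was written."""
        self._entries.pop(user_id, None)


if __name__ == "__main__":
    store = {"alice": "online"}          # stands in for the persistent store
    cache = PresenceCache(store.get)
    print(cache.get_status("alice"))     # first read goes to the store
    print(cache.get_status("alice"))     # second read is served from the cache
    print(cache.store_reads)
```

A cached entry can be stale until it expires or is invalidated, so the TTL and the invalidation path together bound how far a read can lag behind a status change.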
Reference Architecture
Client Devices (Web/Mobile)
       |
       | WebSocket / API Requests
       v
  +-------------------+
  |   API Gateway     |
  +-------------------+
       |
       | REST API / WS
       v
  +-------------------+       +-------------------+
  | Presence Service  |<----->| Message Broker    |
  +-------------------+       +-------------------+
       |
       | Cache (Redis) for fast presence data
       v
  +-------------------+
  | Persistent Store  |
  | (NoSQL DB)        |
  +-------------------+
Components
API Gateway (Nginx / Envoy): Handles client connections and routes API and WebSocket requests.
Presence Service (Node.js / Go microservice): Manages user presence state, updates the cache and database, and publishes events.
Message Broker (Apache Kafka / Redis PubSub): Distributes presence change events to interested services and clients.
Cache (Redis): Stores current presence status for fast read/write access.
Persistent Store (Cassandra / DynamoDB): Stores presence history and durable data.
Client Devices (Web browsers, Mobile apps): Send presence updates and receive real-time presence info.
Request Flow
1. The user logs in from a client device and opens a WebSocket connection through the API Gateway.
2. The Presence Service marks the user as online and updates the Redis cache and the persistent store.
3. The Presence Service publishes a 'user online' event to the Message Broker.
4. Clients subscribed to that user's presence receive the event over their WebSocket connections.
5. When the user goes idle or logs out, the client sends a status update; the Presence Service updates the cache and database and publishes the corresponding event.
6. Clients query presence status through the API Gateway; the Presence Service reads from the Redis cache for low latency.
7. Clients send periodic heartbeat messages; when heartbeats stop arriving, the Presence Service treats the client as disconnected and marks the user offline.
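The flow above can be sketched in a single process to make the moving parts concrete. This is a simplified illustration under stated assumptions: `InMemoryBroker` and `PresenceService` are hypothetical classes, dictionaries stand in for Redis and the Message Broker, the persistent store is omitted, and the 30-second timeout is an arbitrary example value.

```python
import time
from typing import Callable, Dict, List

Event = Dict[str, str]


class InMemoryBroker:
    """Toy publish-subscribe broker standing in for Kafka or Redis Pub/Sub."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)


class PresenceService:
    """Tracks status per user and marks users offline when heartbeats stop."""

    def __init__(
        self,
        broker: InMemoryBroker,
        timeout_seconds: float = 30.0,  # assumption: illustrative timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._broker = broker
        self._timeout = timeout_seconds
        self._clock = clock
        self._status: Dict[str, str] = {}       # stands in for the Redis cache
        self._last_seen: Dict[str, float] = {}  # last heartbeat per connected user

    def connect(self, user_id: str) -> None:
        self._last_seen[user_id] = self._clock()
        self.set_status(user_id, "online")

    def heartbeat(self, user_id: str) -> None:
        # Only refresh users that currently hold a connection.
        if user_id in self._last_seen:
            self._last_seen[user_id] = self._clock()

    def set_status(self, user_id: str, status: str) -> None:
        if self.get_status(user_id) == status:
            return  # no change, so no event
        self._status[user_id] = status
        self._broker.publish("presence", {"user_id": user_id, "status": status})

    def disconnect(self, user_id: str) -> None:
        self._last_seen.pop(user_id, None)
        self.set_status(user_id, "offline")

    def get_status(self, user_id: str) -> str:
        return self._status.get(user_id, "offline")

    def sweep(self) -> List[str]:
        """Mark users offline whose last heartbeat is older than the timeout."""
        now = self._clock()
        expired = [u for u, seen in self._last_seen.items()
                   if now - seen > self._timeout]
        for user_id in expired:
            self.disconnect(user_id)
        return expired
```

In this sketch `sweep` would be called periodically by a timer. The timeout is the main knob: a shorter value detects dropped connections sooner but requires more frequent heartbeats and risks marking users offline during brief network stalls.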
Database Schema
Entities:
- UserPresence(user_id PK, status ENUM('online', 'offline', 'idle', 'busy'), last_updated TIMESTAMP)
- PresenceHistory(id PK, user_id FK, status ENUM('online', 'offline', 'idle', 'busy'), timestamp TIMESTAMP)
Relationships:
- UserPresence stores current status per user
- PresenceHistory stores time-series records of status changes
- user_id links presence data to user identity
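To make the two entities concrete, here is a small runnable sketch using SQLite from the Python standard library. SQLite is swapped in only so the example runs anywhere; the reference architecture names Cassandra or DynamoDB, whose data modeling differs. The table and column names mirror the entities above, while `open_db`, `record_status`, the index, and the text timestamp format are illustrative assumptions.

```python
import sqlite3

# The ENUM from the schema is expressed as a CHECK constraint, since SQLite
# has no ENUM type. The user_id foreign key is left unenforced here because
# the users table belongs to the out-of-scope authentication system.
SCHEMA = """
CREATE TABLE user_presence (
    user_id      TEXT PRIMARY KEY,
    status       TEXT NOT NULL
                 CHECK (status IN ('online', 'offline', 'idle', 'busy')),
    last_updated TEXT NOT NULL
);
CREATE TABLE presence_history (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id   TEXT NOT NULL,
    status    TEXT NOT NULL
              CHECK (status IN ('online', 'offline', 'idle', 'busy')),
    timestamp TEXT NOT NULL
);
CREATE INDEX idx_history_user_time ON presence_history (user_id, timestamp);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_status(conn: sqlite3.Connection, user_id: str,
                  status: str, ts: str) -> None:
    """Overwrite the current status and append a history row in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO user_presence (user_id, status, last_updated) "
            "VALUES (?, ?, ?)",
            (user_id, status, ts),
        )
        conn.execute(
            "INSERT INTO presence_history (user_id, status, timestamp) "
            "VALUES (?, ?, ?)",
            (user_id, status, ts),
        )


if __name__ == "__main__":
    conn = open_db()
    record_status(conn, "alice", "online", "2024-01-01T00:00:00Z")
    record_status(conn, "alice", "idle", "2024-01-01T00:05:00Z")
    print(conn.execute("SELECT * FROM user_presence").fetchall())
    print(conn.execute("SELECT COUNT(*) FROM presence_history").fetchone())
```

The split matters for access patterns: UserPresence holds one row per user that is overwritten on each change, while PresenceHistory only grows and is the table subject to the retention policy.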
Scaling Discussion
Bottlenecks
API Gateway handling large number of concurrent WebSocket connections
Redis cache memory limits for storing presence of all users
Message Broker throughput for event distribution
Database write throughput for presence history
Detecting client disconnects accurately at scale
Solutions
Use multiple API Gateway instances with load balancer and sticky sessions for WebSocket connections
Shard Redis cache by user ID or region to distribute memory load
Partition Message Broker topics and use consumer groups for parallel processing
Use a scalable NoSQL database with write-optimized design for presence history
Implement heartbeat and timeout mechanisms with distributed coordination to detect disconnects
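Sharding by user ID, as suggested above, needs a mapping from user to shard that every service instance computes identically. The following is one simple way to sketch it, assuming a fixed shard count; `shard_for_user` is a hypothetical helper, not part of any Redis client.

```python
import hashlib


def shard_for_user(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index in the range [0, shard_count).

    A cryptographic digest is used instead of Python's built-in hash()
    because hash() on strings is randomized per process, so different
    service instances would disagree on the shard.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "-> shard", shard_for_user(user, 4))
```

Plain modulo hashing remaps most users whenever the shard count changes. If shards are expected to be added or removed often, consistent hashing or a fixed number of hash slots assigned to nodes are common alternatives worth mentioning in the discussion.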
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Clarify real-time requirements and presence states
Explain choice of technologies for low latency and scalability
Describe how cache and persistent storage complement each other
Discuss how message broker enables event-driven updates
Address handling of client disconnects and stale data
Outline scaling strategies and bottleneck mitigation