
How to Design a Distributed Cache: Key Concepts and Example

To design a distributed cache, split cached data across multiple nodes to improve speed and scalability, using consistent hashing or sharding to distribute keys. Ensure data consistency with cache invalidation or TTL, and handle node failures with replication or fallback to the main database.

📐

Syntax

A distributed cache system typically involves these parts:

Cache Nodes: Multiple servers storing parts of the cache.
Consistent Hashing: A method to assign keys to cache nodes evenly, so that adding or removing a node remaps only a small share of the keys.
Cache Client: The application component that queries the cache.
Cache Invalidation: Mechanism to keep cache data fresh.
Replication: Copying data to multiple nodes for fault tolerance.

javascript

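class DistributedCache {
    constructor(nodes) {
        this.nodes = nodes; // list of cache servers, each with name, get() and set()
        this.hashRing = this.createHashRing(nodes);
    }

    // 32-bit FNV-1a style string hash (non-cryptographic, chosen for brevity)
    hash(str) {
        let h = 0x811c9dc5;
        for (let i = 0; i < str.length; i++) {
            h = Math.imul(h ^ str.charCodeAt(i), 0x01000193) >>> 0;
        }
        return h;
    }

    createHashRing(nodes) {
        // Place each node at a point on the ring, sorted by hash.
        // Simplified example: one point per node, no virtual nodes.
        return nodes
            .map(node => ({ hash: this.hash(node.name), node }))
            .sort((a, b) => a.hash - b.hash);
    }

    getNode(key) {
        // Pick the first node clockwise from the key's hash, wrapping around
        const keyHash = this.hash(key);
        const entry = this.hashRing.find(e => e.hash >= keyHash) || this.hashRing[0];
        return entry.node;
    }

    get(key) {
        return this.getNode(key).get(key);
    }

    set(key, value) {
        this.getNode(key).set(key, value);
    }
}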
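
The parts list mentions cache invalidation, which the class above does not show. As a minimal sketch, a cache node can attach an expiry time to each entry and treat expired entries as misses. The `TtlCacheNode` name, the `ttlMs` parameter, and the injectable `now` clock are assumptions made for illustration, not part of any particular cache product.

```javascript
// Hypothetical cache node with a per-entry TTL (time to live)
class TtlCacheNode {
    constructor(name, now = () => Date.now()) {
        this.name = name;
        this.now = now;          // injectable clock, handy for testing
        this.store = new Map();  // key -> { value, expiresAt }
    }

    set(key, value, ttlMs = 60000) {
        this.store.set(key, { value, expiresAt: this.now() + ttlMs });
    }

    get(key) {
        const entry = this.store.get(key);
        if (entry === undefined) return null;   // never cached
        if (this.now() >= entry.expiresAt) {
            this.store.delete(key);             // lazily drop the stale entry
            return null;
        }
        return entry.value;
    }

    // Event-based invalidation: call this when the source data changes
    invalidate(key) {
        this.store.delete(key);
    }
}

// Usage with a fake clock so the expiry is easy to see
let fakeTime = 0;
const ttlNode = new TtlCacheNode('NodeA', () => fakeTime);
ttlNode.set('apple', 'fruit', 1000);
console.log(ttlNode.get('apple')); // fruit
fakeTime = 1000;
console.log(ttlNode.get('apple')); // null (expired)
```

Expiry here is checked lazily on read. A real node would usually also need a background sweep or a size limit so that entries that are never read again do not stay in memory.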

💻

Example

This example shows a simple distributed cache with two nodes. To stay short, it places keys with a basic hash (the sum of character codes) modulo the node count rather than a full consistent-hash ring.

javascript

class CacheNode {
    constructor(name) {
        this.name = name;
        this.store = new Map();
    }

    get(key) {
        // ?? (not ||) so stored falsy values such as 0 or '' are not reported as misses
        return this.store.get(key) ?? null;
    }

    set(key, value) {
        this.store.set(key, value);
    }
}

class DistributedCache {
    constructor(nodes) {
        this.nodes = nodes;
    }

    getNode(key) {
        // Simple hash: sum char codes mod nodes count
        const hash = [...key].reduce((acc, c) => acc + c.charCodeAt(0), 0);
        return this.nodes[hash % this.nodes.length];
    }

    get(key) {
        const node = this.getNode(key);
        return node.get(key);
    }

    set(key, value) {
        const node = this.getNode(key);
        node.set(key, value);
    }
}

const nodeA = new CacheNode('NodeA');
const nodeB = new CacheNode('NodeB');
const cache = new DistributedCache([nodeA, nodeB]);

cache.set('apple', 'fruit');
cache.set('carrot', 'vegetable');

console.log(cache.get('apple'));
console.log(cache.get('carrot'));
console.log(cache.get('banana'));

Output

fruit
vegetable
null
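
The `getNode` method in the example places keys with a hash modulo the node count. As a rough illustration of why consistent hashing is recommended for real deployments, the sketch below counts how many of 1,000 made-up keys change owner when a third node is added under modulo placement. The helper names and the key format are hypothetical.

```javascript
// Same simple hash as the example: sum of character codes
function simpleHash(key) {
    return [...key].reduce((acc, c) => acc + c.charCodeAt(0), 0);
}

// Count keys whose owning node index changes when the node count changes
function countMovedKeys(keys, oldCount, newCount) {
    return keys.filter(
        k => simpleHash(k) % oldCount !== simpleHash(k) % newCount
    ).length;
}

const sampleKeys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
const moved = countMovedKeys(sampleKeys, 2, 3);
console.log(`${moved} of ${sampleKeys.length} keys moved`);
```

For evenly spread hash values, `h % 2` and `h % 3` agree only when `h % 6` is 0 or 1, so roughly two thirds of keys move, and each move is a cache miss. A consistent-hash ring would, on average, move only the share of keys taken over by the new node.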

⚠️

Common Pitfalls

Common mistakes when designing distributed caches include:

Ignoring cache consistency: Not updating or invalidating cache leads to stale data.
Uneven data distribution: Poor hashing causes some nodes to be overloaded.
No fault tolerance: Single node failure causes data loss or downtime.
Over-caching: Caching too much data wastes memory and can push frequently used entries out of the cache, which lowers the hit rate.

Always plan for cache invalidation, use consistent hashing, and replicate data for reliability.

javascript

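/* Wrong: key length is a poor hash (many keys share a length, so load is uneven),
   and plain modulo remaps most keys whenever the node count changes */
function getNodeSimple(key, nodes) {
    return nodes[key.length % nodes.length];
}

/* Right: Use consistent hashing library or algorithm to distribute keys evenly */
// 32-bit FNV-1a style string hash; any well-distributed hash works here
function hash(str) {
    let h = 0x811c9dc5;
    for (const ch of str) h = Math.imul(h ^ ch.codePointAt(0), 0x01000193) >>> 0;
    return h;
}

// hashRing: array of nodes, each with a numeric `hash`, sorted by ascending hash
function getNodeConsistentHash(key, hashRing) {
    const keyHash = hash(key);
    // find the first node clockwise from the key hash; wrap to the start if none
    return hashRing.find(node => node.hash >= keyHash) || hashRing[0];
}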
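
The advice to replicate data can be combined with the hash ring: one common approach is to store each key on the first few distinct nodes found clockwise from the key's hash. The sketch below assumes a ring of `{ hash, name }` points sorted by `hash`; the `getReplicaNodes` name and the hard-coded ring positions are hypothetical.

```javascript
// Hypothetical helper: pick `replicas` distinct nodes clockwise from keyHash.
// ring: array of { hash, name } sorted by ascending hash
// (several points may share a name when virtual nodes are used).
function getReplicaNodes(keyHash, ring, replicas) {
    // index of the first ring point at or after the key's hash; wrap to 0 if none
    let start = ring.findIndex(point => point.hash >= keyHash);
    if (start === -1) start = 0;

    const chosen = [];
    for (let i = 0; i < ring.length && chosen.length < replicas; i++) {
        const name = ring[(start + i) % ring.length].name;
        if (!chosen.includes(name)) chosen.push(name); // skip repeats of the same node
    }
    return chosen;
}

// Ring positions are made up for the demonstration
const demoRing = [
    { hash: 100, name: 'NodeA' },
    { hash: 200, name: 'NodeB' },
    { hash: 300, name: 'NodeC' },
];
console.log(getReplicaNodes(150, demoRing, 2)); // [ 'NodeB', 'NodeC' ]
console.log(getReplicaNodes(350, demoRing, 2)); // [ 'NodeA', 'NodeB' ]
```

A write would then go to every node in the returned list, and a read could try them in order until one responds. How writes are acknowledged and how replicas are kept in sync is a separate design decision that this sketch does not cover.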

📊

Quick Reference

Consistent Hashing: Distributes keys evenly and minimizes rebalancing.
Cache Invalidation: Use TTL or event-based invalidation to keep data fresh.
Replication: Store copies on multiple nodes to handle failures.
Fallback: On cache miss or failure, query the main database.
Monitoring: Track cache hit/miss rates and node health.
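
The fallback and monitoring items can be sketched together as a read helper (often called the cache-aside pattern): try the cache, load from the database on a miss or cache error, and count hits and misses. Here `cache` is any object with `get` and `set`, such as the `DistributedCache` shown earlier, and `loadFromDb` is a hypothetical stand-in for a real database query.

```javascript
// Hypothetical cache-aside read helper with hit/miss counters
function createCachedReader(cache, loadFromDb) {
    const stats = { hits: 0, misses: 0, errors: 0 };

    async function read(key) {
        try {
            const cached = await cache.get(key);
            if (cached !== null && cached !== undefined) {
                stats.hits++;
                return cached;
            }
        } catch (err) {
            stats.errors++; // cache unreachable: fall through to the database
        }
        stats.misses++;
        const value = await loadFromDb(key);
        try {
            await cache.set(key, value); // repopulate for the next reader
        } catch (err) {
            stats.errors++; // a failed cache write should not fail the read
        }
        return value;
    }

    return { read, stats };
}

// Usage with an in-memory Map standing in for the cache
const store = new Map();
const demoCache = {
    get: key => (store.has(key) ? store.get(key) : null),
    set: (key, value) => { store.set(key, value); },
};
const reader = createCachedReader(demoCache, async key => `db-value-for-${key}`);

reader.read('apple')                       // miss: loads from the stand-in database
    .then(() => reader.read('apple'))      // hit: served from the cache
    .then(value => console.log(value, reader.stats));
```

The hit rate is `hits / (hits + misses)`. Exporting these counters to whatever metrics system is already in use is one simple way to track cache effectiveness over time.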

✅

Key Takeaways

Use consistent hashing to distribute cache keys evenly across nodes.

Implement cache invalidation or TTL to avoid stale data.

Replicate cache data to handle node failures and improve reliability.

Design clients to fall back to the main database on cache misses.

Monitor cache performance and node health regularly.