0
0
HldConceptBeginner · 3 min read

Bloom Filter: What It Is and How It Works in System Design

A Bloom filter is a space-efficient data structure used to test whether an element is in a set, allowing false positives but no false negatives. It uses multiple hash functions to set bits in a bit array, quickly checking membership with minimal memory.
⚙️

How It Works

Imagine you have a large list of items, and you want to quickly check if a new item is already in that list without storing the entire list. A Bloom filter helps by using a simple bit array and several hash functions. When you add an item, each hash function points to a position in the bit array, and those bits are set to 1.

Later, to check if an item is in the set, you run the same hash functions and check those bit positions. If any bit is 0, the item is definitely not in the set. If all bits are 1, the item might be in the set, but there is a small chance of a false positive. This trade-off lets Bloom filters use very little memory and answer membership queries very fast.

💻

Example

This example shows a simple Bloom filter in Python using built-in hash functions and a bit array to add and check items.
python
class BloomFilter:
    def __init__(self, size=1000, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size

    def _hashes(self, item):
        hashes = []
        for i in range(self.hash_count):
            combined = f"{item}-{i}"
            hash_val = hash(combined) % self.size
            hashes.append(hash_val)
        return hashes

    def add(self, item):
        for hash_val in self._hashes(item):
            self.bit_array[hash_val] = 1

    def check(self, item):
        return all(self.bit_array[hash_val] for hash_val in self._hashes(item))

# Usage example
bf = BloomFilter(size=20, hash_count=3)
bf.add("apple")
bf.add("banana")
print(bf.check("apple"))   # True
print(bf.check("banana"))  # True
print(bf.check("cherry"))  # False or True (false positive possible)
Output
True True False
🎯

When to Use

Use Bloom filters when you need to quickly check if an item is in a large set but can tolerate some false positives. They are great for saving memory and speeding up lookups.

Common uses include:

  • Web browsers to check if a URL is in a blacklist
  • Databases to avoid expensive disk lookups
  • Distributed systems to reduce network calls
  • Spam filters and caching systems

Key Points

  • Bloom filters use multiple hash functions and a bit array.
  • They never give false negatives but can give false positives.
  • They are very memory efficient compared to storing full data.
  • They are fast for membership queries.
  • They do not support removing items easily.

Key Takeaways

Bloom filters efficiently test set membership with some false positives but no false negatives.
They use multiple hash functions to set bits in a fixed-size bit array.
Ideal for large data sets where memory and speed matter more than perfect accuracy.
Commonly used in caching, databases, and network systems to reduce expensive lookups.
They do not support item removal without special variants.