Bloom Filter: What It Is and How It Works in System Design
Bloom filter is a space-efficient data structure used to test whether an element is in a set, allowing false positives but no false negatives. It uses multiple hash functions to set bits in a bit array, quickly checking membership with minimal memory.How It Works
Imagine you have a large list of items, and you want to quickly check if a new item is already in that list without storing the entire list. A Bloom filter helps by using a simple bit array and several hash functions. When you add an item, each hash function points to a position in the bit array, and those bits are set to 1.
Later, to check if an item is in the set, you run the same hash functions and check those bit positions. If any bit is 0, the item is definitely not in the set. If all bits are 1, the item might be in the set, but there is a small chance of a false positive. This trade-off lets Bloom filters use very little memory and answer membership queries very fast.
Example
class BloomFilter: def __init__(self, size=1000, hash_count=3): self.size = size self.hash_count = hash_count self.bit_array = [0] * size def _hashes(self, item): hashes = [] for i in range(self.hash_count): combined = f"{item}-{i}" hash_val = hash(combined) % self.size hashes.append(hash_val) return hashes def add(self, item): for hash_val in self._hashes(item): self.bit_array[hash_val] = 1 def check(self, item): return all(self.bit_array[hash_val] for hash_val in self._hashes(item)) # Usage example bf = BloomFilter(size=20, hash_count=3) bf.add("apple") bf.add("banana") print(bf.check("apple")) # True print(bf.check("banana")) # True print(bf.check("cherry")) # False or True (false positive possible)
When to Use
Use Bloom filters when you need to quickly check if an item is in a large set but can tolerate some false positives. They are great for saving memory and speeding up lookups.
Common uses include:
- Web browsers to check if a URL is in a blacklist
- Databases to avoid expensive disk lookups
- Distributed systems to reduce network calls
- Spam filters and caching systems
Key Points
- Bloom filters use multiple hash functions and a bit array.
- They never give false negatives but can give false positives.
- They are very memory efficient compared to storing full data.
- They are fast for membership queries.
- They do not support removing items easily.