
HashSet for unique elements in C Sharp (C#) - Deep Dive

Overview - HashSet for unique elements
What is it?
A HashSet<T> is a C# collection that stores only unique elements. Adding an item that is already present has no effect, so each item appears at most once. Adding, removing, and checking for items are fast on average, no matter how many items the set holds. It is useful when you need a collection without repeated values.
Why it matters
Without a HashSet, you would have to manually check for duplicates when adding items, which is slow and error-prone. HashSet makes it easy and fast to keep only unique items, saving time and avoiding bugs. This helps in tasks like filtering data, tracking unique users, or managing sets of options.
Where it fits
Before learning HashSet, you should understand basic collections like arrays and lists. After HashSet, you can explore other set operations like intersections and unions, or learn about dictionaries for key-value pairs.
Mental Model
Core Idea
A HashSet is like a special box that only lets you keep one copy of each item, ignoring duplicates automatically.
Think of it like...
Imagine a guest list for a party where each name can only appear once. If someone tries to add the same name again, the list stays the same. The HashSet works like that guest list, ensuring no duplicate names.
HashSet Structure:
┌───────────────┐
│   HashSet     │
│ ┌───────────┐ │
│ │ Unique    │ │
│ │ Elements  │ │
│ └───────────┘ │
│ Add()        │
│ Remove()     │
│ Contains()   │
└───────────────┘
Build-Up - 6 Steps
1. Foundation: What is a HashSet in C#
Concept: Introducing the HashSet collection and its purpose.
In C#, HashSet<T> is a collection that stores unique elements of type T. When you add an item, the set checks whether an equal item already exists. If it does, Add() returns false and leaves the set unchanged, so a HashSet never contains duplicates.
Result
You get a collection that automatically filters out repeated items.
Understanding that HashSet enforces uniqueness by design helps you avoid manual duplicate checks.
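As a small illustration of that filtering, the sketch below builds a set from a list that contains repeats. The variable names and sample data are made up for the example, and it is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input data with repeated values.
var visits = new List<string> { "ana", "ben", "ana", "cy", "ben" };

// Building a HashSet<string> from the list drops the repeats.
var uniqueVisitors = new HashSet<string>(visits);

Console.WriteLine(visits.Count);          // 5
Console.WriteLine(uniqueVisitors.Count);  // 3
```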
2. Foundation: Basic HashSet operations
Concept: Learn how to add, remove, and check items in a HashSet.
You can use Add() to insert items, Remove() to delete them, and Contains() to check if an item exists. For example: var set = new HashSet<string>(); set.Add("apple"); set.Add("banana"); bool hasApple = set.Contains("apple"); set.Remove("banana");
Result
You can manage unique items easily with simple methods.
Knowing these basic methods lets you manipulate unique collections efficiently.
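Add() and Remove() also report what happened through their bool return values, which is often handy. A minimal sketch, with illustrative variable names, written as a top-level program (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

var fruits = new HashSet<string>();

bool addedFirst = fruits.Add("apple");   // true: "apple" was not present
bool addedAgain = fruits.Add("apple");   // false: duplicate, set unchanged
bool removed = fruits.Remove("banana");  // false: "banana" was never added

Console.WriteLine($"{addedFirst} {addedAgain} {removed} {fruits.Count}");
// prints: True False False 1
```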
3. Intermediate: How HashSet prevents duplicates
🤔 Before reading on: do you think HashSet checks duplicates by scanning all items or using a faster method? Commit to your answer.
Concept: HashSet uses a hash function to quickly find if an item exists instead of scanning all items.
Each item's GetHashCode() method converts it into an integer (its hash code). The HashSet uses this number to go straight to the part of its internal structure where the item would be stored. Because different items can share a hash code, it then compares the candidates with Equals() to confirm whether the item is really present.
Result
Adding or checking items happens very fast, even with many elements.
Understanding hashing explains why HashSet is much faster than lists for uniqueness checks.
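One way to see that no full scan happens is to count how often Equals() runs during a lookup. The Item type below is hypothetical and exists only for this experiment; the sketch is a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

var set = new HashSet<Item>();
for (int i = 0; i < 1000; i++)
{
    set.Add(new Item(i));
}

Item.EqualsCalls = 0;                       // count only the lookup below
bool found = set.Contains(new Item(742));

Console.WriteLine(found);                   // True
Console.WriteLine(Item.EqualsCalls < 1000); // True: no full scan was needed

// Hypothetical element type that counts how often Equals is invoked.
sealed class Item
{
    public static int EqualsCalls;
    public int Id { get; }
    public Item(int id) => Id = id;

    public override int GetHashCode() => Id;   // distinct ids give distinct hash codes
    public override bool Equals(object? obj)
    {
        EqualsCalls++;
        return obj is Item other && other.Id == Id;
    }
}
```

A List<Item> holding the same 1000 items would have to call Equals() on item after item until it reached a match.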
4. Intermediate: Set operations with HashSet
🤔 Before reading on: do you think HashSet can combine two sets keeping only unique items or not? Commit to your answer.
Concept: HashSet supports operations like union, intersection, and difference to combine or compare sets.
You can use methods like UnionWith(), IntersectWith(), and ExceptWith() to perform set operations. Each of these modifies the set it is called on. For example, UnionWith() adds all unique items from another collection: var set1 = new HashSet<int> { 1, 2, 3 }; var set2 = new HashSet<int> { 3, 4, 5 }; set1.UnionWith(set2); // set1 now contains 1, 2, 3, 4, 5
Result
You can easily combine or compare unique collections without duplicates.
Knowing these operations lets you solve complex problems involving groups of unique items.
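The three operations can be compared side by side. Because they change the set in place, this sketch works on copies of the first set; the variable names are illustrative, and it is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

var a = new HashSet<int> { 1, 2, 3 };
var b = new HashSet<int> { 3, 4, 5 };

// Each *With method modifies the set it is called on, so work on copies.
var union = new HashSet<int>(a);
union.UnionWith(b);                 // { 1, 2, 3, 4, 5 }

var intersection = new HashSet<int>(a);
intersection.IntersectWith(b);      // { 3 }

var difference = new HashSet<int>(a);
difference.ExceptWith(b);           // { 1, 2 }

Console.WriteLine(union.SetEquals(new[] { 1, 2, 3, 4, 5 }));  // True
Console.WriteLine(intersection.SetEquals(new[] { 3 }));       // True
Console.WriteLine(difference.SetEquals(new[] { 1, 2 }));      // True
Console.WriteLine(a.Overlaps(b));                             // True: both contain 3
```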
5. Advanced: Customizing uniqueness with IEqualityComparer
🤔 Before reading on: do you think HashSet can treat 'apple' and 'APPLE' as the same or different by default? Commit to your answer.
Concept: HashSet can use custom rules to decide if two items are equal by providing an equality comparer.
By default, HashSet uses the default equality for the type (case-sensitive for strings). You can pass an IEqualityComparer<T> to the constructor to change this behavior. For example, to ignore case: var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase); set.Add("apple"); set.Add("APPLE"); // ignored as duplicate. This lets you control what 'unique' means.
Result
You can define your own uniqueness rules for complex types or special cases.
Understanding equality customization prevents bugs when default rules don't fit your needs.
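For your own types, you can write a comparer by implementing IEqualityComparer<T>. The following is a sketch under assumed names: the User class and EmailComparer are hypothetical, and the rule "same email, ignoring case, means same user" is just one possible definition of uniqueness. It is a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

var users = new HashSet<User>(new EmailComparer());
bool first = users.Add(new User("Ana", "ana@example.com"));
bool second = users.Add(new User("Ana B.", "ANA@example.com"));

Console.WriteLine($"{first} {second} {users.Count}");  // True False 1

// Hypothetical element type for this sketch.
sealed class User
{
    public string Name { get; }
    public string Email { get; }
    public User(string name, string email) { Name = name; Email = email; }
}

// Treats two users as equal when their emails match, ignoring case.
sealed class EmailComparer : IEqualityComparer<User>
{
    public bool Equals(User? x, User? y) =>
        string.Equals(x?.Email, y?.Email, StringComparison.OrdinalIgnoreCase);

    // Must agree with Equals: equal emails must produce equal hash codes.
    public int GetHashCode(User user) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(user.Email);
}
```

The key design point is that GetHashCode() and Equals() follow the same rule. If they disagreed, two "equal" users could land in different buckets and both be stored.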
6. Expert: HashSet internal resizing and performance
🤔 Before reading on: do you think HashSet size stays fixed or changes as you add items? Commit to your answer.
Concept: HashSet dynamically resizes its internal storage to keep operations fast as it grows.
Internally, HashSet uses buckets to group items by hash code. As more items are added, buckets get crowded and lookups must compare more candidates. To prevent this, HashSet automatically increases the number of buckets (resizes) and redistributes the items. A resize is costly, but it happens rarely, so operations stay fast on average.
Result
HashSet maintains fast add and lookup times even with many elements.
Knowing about resizing helps you understand performance trade-offs and when to pre-size a HashSet.
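If you know roughly how many items are coming, you can reserve capacity up front. The sketch below assumes a runtime where the HashSet<T>(int capacity) constructor and EnsureCapacity() are available (for example .NET Core 2.1 or later); the workload size is a made-up number, and the code is a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

const int expectedCount = 100_000;   // hypothetical workload size

// Reserve room up front so the adds below do not trigger repeated resizes.
var ids = new HashSet<int>(expectedCount);
for (int i = 0; i < expectedCount; i++)
{
    ids.Add(i);
}

// EnsureCapacity returns the capacity now available (at least the requested value).
int capacity = ids.EnsureCapacity(expectedCount * 2);
Console.WriteLine(capacity >= expectedCount * 2);   // True

// After removing most items, TrimExcess releases unused storage.
ids.RemoveWhere(id => id >= 10);
ids.TrimExcess();
Console.WriteLine(ids.Count);                       // 10
```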
Under the Hood
HashSet stores items using an array of buckets indexed by a value derived from each item's hash code. When adding or searching, it computes the hash code, finds the bucket, and checks for equality with the items already there. If a collision occurs (different items that land in the same bucket), the colliding items are chained together; in .NET the chain is built from index links inside an internal entries array rather than separate linked-list nodes. When the internal storage fills up, HashSet resizes by allocating larger arrays and redistributing all items into the new buckets.
Why designed this way?
HashSet was designed to provide very fast membership tests and insertions, unlike lists that scan all items. Hashing allows these operations to run in constant time on average. Resizing balances memory use against speed. Alternatives based on balanced trees, such as SortedSet<T>, keep items sorted but take O(log n) time per operation, which is slower for simple uniqueness checks. HashSet's design is a tradeoff optimized for fast average-case performance.
HashSet Internal Structure:

[Item] --hash--> [Bucket Array]
┌───────────────┐
│ Bucket 0      │ -> Item A
│ Bucket 1      │ -> Item B -> Item C (collision)
│ Bucket 2      │ -> empty
│ ...           │
└───────────────┘

Resize triggers when buckets fill up:
Old Buckets -> New Larger Buckets
Rehash all items to new buckets
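The diagram can be turned into a toy implementation. This is a teaching sketch only: ToyHashSet is a hypothetical class that uses a List<T> per bucket and doubles the bucket array when it fills up, which is simpler than the layout and growth policy the real HashSet<T> uses. It assumes non-null items and is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

var toy = new ToyHashSet<string>();
Console.WriteLine(toy.Add("apple"));       // True
Console.WriteLine(toy.Add("apple"));       // False: already present
for (int i = 0; i < 20; i++) toy.Add("item" + i);   // forces several resizes
Console.WriteLine(toy.Contains("item7"));  // True
Console.WriteLine(toy.Count);              // 21

// Teaching sketch only: separate chaining with List<T> buckets.
// The real HashSet<T> uses a more compact layout.
sealed class ToyHashSet<T>
{
    private List<T>[] _buckets = NewBuckets(4);
    public int Count { get; private set; }

    private static List<T>[] NewBuckets(int size)
    {
        var buckets = new List<T>[size];
        for (int i = 0; i < size; i++) buckets[i] = new List<T>();
        return buckets;
    }

    // Map a hash code to a bucket index; the mask clears the sign bit.
    private static int IndexFor(T item, int bucketCount) =>
        (item.GetHashCode() & 0x7FFFFFFF) % bucketCount;

    public bool Contains(T item)
    {
        // Only the one bucket selected by the hash code is searched.
        foreach (T existing in _buckets[IndexFor(item, _buckets.Length)])
        {
            if (EqualityComparer<T>.Default.Equals(existing, item)) return true;
        }
        return false;
    }

    public bool Add(T item)
    {
        if (Contains(item)) return false;
        if (Count >= _buckets.Length) Resize();   // keep about one item per bucket
        _buckets[IndexFor(item, _buckets.Length)].Add(item);
        Count++;
        return true;
    }

    private void Resize()
    {
        // Double the bucket array and rehash every item into it.
        var bigger = NewBuckets(_buckets.Length * 2);
        foreach (List<T> bucket in _buckets)
        {
            foreach (T item in bucket)
            {
                bigger[IndexFor(item, bigger.Length)].Add(item);
            }
        }
        _buckets = bigger;
    }
}
```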
Myth Busters - 4 Common Misconceptions
Quick: Does HashSet preserve the order of items added? Commit to yes or no.
Common Belief: HashSet keeps items in the order you add them.
Reality: HashSet does NOT guarantee insertion order; enumeration order depends on hash codes and internal layout. Items may happen to come out in insertion order in simple cases, but you cannot rely on it.
Why it matters: Relying on order can cause bugs when iterating over a HashSet expecting the original sequence.
Quick: Can HashSet store multiple identical items if you add them repeatedly? Commit to yes or no.
Common Belief: HashSet allows duplicates if you add the same item multiple times.
Reality: HashSet ignores duplicate additions; only one copy of each unique item exists.
Why it matters: Expecting duplicates can lead to incorrect assumptions about collection size or contents.
Quick: Does HashSet use the object's memory address to check uniqueness? Commit to yes or no.
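Common Belief: HashSet checks uniqueness by comparing memory addresses of objects.
Reality: HashSet uses the element's GetHashCode() and Equals() methods (or the comparer you supply), not memory addresses as such. For classes that do not override these methods, however, the defaults amount to reference equality, so two separate objects with identical contents count as different elements.
Why it matters: Custom objects need proper GetHashCode and Equals implementations to be treated as equal by value in a HashSet.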
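The difference is easy to observe with two small types, one that keeps the default equality and one that defines value equality. Both types are hypothetical, the second uses a C# 9 record to get Equals() and GetHashCode() generated automatically, and the sketch is a top-level program.

```csharp
using System;
using System.Collections.Generic;

// No Equals/GetHashCode override: each instance counts as a distinct element.
var plain = new HashSet<PlainPoint>();
plain.Add(new PlainPoint { X = 1, Y = 2 });
plain.Add(new PlainPoint { X = 1, Y = 2 });
Console.WriteLine(plain.Count);    // 2

// Records generate value-based Equals and GetHashCode from their properties.
var valued = new HashSet<ValuePoint>();
valued.Add(new ValuePoint(1, 2));
valued.Add(new ValuePoint(1, 2));
Console.WriteLine(valued.Count);   // 1

// Hypothetical types for this sketch.
class PlainPoint
{
    public int X { get; set; }
    public int Y { get; set; }
}

record ValuePoint(int X, int Y);
```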
Quick: Is HashSet always faster than a List for all operations? Commit to yes or no.
Common Belief: HashSet is always faster than List for any operation.
Reality: HashSet is faster for membership tests and uniqueness checks on larger collections, but it offers no indexing or guaranteed order, and for very small collections a simple List scan can be just as fast.
Why it matters: Choosing HashSet blindly can cause performance or logic issues if order or small size matters.
Expert Zone
1. HashSet's performance depends heavily on the quality of the hash function; poor hash functions cause many collisions and slow operations.
2. When using mutable objects as elements, changing their state after adding them to a HashSet can break uniqueness guarantees and cause hard-to-find bugs.
3. Pre-sizing a HashSet with the expected number of elements reduces costly resizing and improves performance in large data scenarios.
When NOT to use
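Avoid HashSet when you need to preserve insertion order; use a List<T> instead, optionally alongside a HashSet<T> for fast duplicate checks. If you need elements kept in sorted order, use SortedSet<T>. If you need key-value pairs, use Dictionary<TKey, TValue>. For small collections where performance is not critical, a List with manual checks might be simpler.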
Production Patterns
In real systems, HashSet is used for filtering duplicates from large data streams, managing unique user IDs, implementing fast lookups in caching layers, and performing set operations in algorithms like graph traversal or recommendation engines.
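As one concrete instance of the graph traversal pattern, here is a minimal breadth-first search sketch that uses a HashSet<int> as its "visited" set. The graph is a made-up example with a cycle, and the code is a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical directed graph as an adjacency list; note the cycle 1 -> 2 -> 3 -> 1.
var graph = new Dictionary<int, int[]>
{
    [1] = new[] { 2, 3 },
    [2] = new[] { 3, 4 },
    [3] = new[] { 1 },
    [4] = Array.Empty<int>(),
};

var visited = new HashSet<int> { 1 };
var queue = new Queue<int>();
queue.Enqueue(1);
var order = new List<int>();

while (queue.Count > 0)
{
    int node = queue.Dequeue();
    order.Add(node);
    foreach (int next in graph[node])
    {
        // Add returns false for nodes already seen, so the cycle cannot loop forever.
        if (visited.Add(next)) queue.Enqueue(next);
    }
}

Console.WriteLine(string.Join(", ", order));   // 1, 2, 3, 4
```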
Connections
Dictionary
HashSet is like a Dictionary without values, storing only keys uniquely.
Understanding HashSet helps grasp how Dictionary manages keys and values efficiently.
Mathematical Set Theory
HashSet implements the concept of a mathematical set with unique elements and set operations.
Knowing set theory clarifies why operations like union and intersection behave as they do in HashSet.
Database Indexing
HashSet's hashing mechanism is similar to how database indexes quickly find records.
Recognizing this connection helps understand performance optimization in both programming and databases.
Common Pitfalls
#1 Assuming HashSet preserves the order of added items.
Wrong approach: var set = new HashSet<int>(); set.Add(3); set.Add(1); set.Add(2); foreach (var item in set) { Console.WriteLine(item); } // assumes 3, 1, 2, but the order is not guaranteed
Correct approach: var list = new List<int> { 3, 1, 2 }; foreach (var item in list) { Console.WriteLine(item); } // preserves order
Root cause: Misunderstanding that HashSet is unordered and does not track insertion sequence.
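When you need both uniqueness and the original order, one common approach is to pair a List<T> with a HashSet<T>. A minimal sketch with made-up input data, written as a top-level program (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

var input = new[] { 3, 1, 3, 2, 1 };   // hypothetical input with repeats

var seen = new HashSet<int>();
var ordered = new List<int>();
foreach (int value in input)
{
    // Add returns true only the first time a value is seen.
    if (seen.Add(value)) ordered.Add(value);
}

Console.WriteLine(string.Join(", ", ordered));   // 3, 1, 2
```

The list carries the order and the set answers "have I seen this before?" quickly, so neither has to do the other's job.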
#2 Using mutable objects as HashSet elements and modifying them after insertion.
Wrong approach: class Person { public string Name; public override int GetHashCode() => Name.GetHashCode(); public override bool Equals(object obj) => obj is Person other && other.Name == Name; } var set = new HashSet<Person>(); var p = new Person { Name = "Alice" }; set.Add(p); p.Name = "Bob"; // changes hash code bool contains = set.Contains(p); // returns false unexpectedly
Correct approach: Use immutable objects or avoid changing properties used in GetHashCode and Equals after adding to HashSet.
Root cause: Changing object state breaks the hash code and equality contract required by HashSet.
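One way to follow the correct approach is to make the element type immutable and replace elements instead of editing them. The sketch below uses a C# 9 positional record; the Person type here is a hypothetical stand-in, and the code is a top-level program.

```csharp
using System;
using System.Collections.Generic;

var people = new HashSet<Person>();
var alice = new Person("Alice");
people.Add(alice);

// To "rename", remove the old value and add a modified copy.
people.Remove(alice);
var bob = alice with { Name = "Bob" };
people.Add(bob);

Console.WriteLine(people.Contains(bob));     // True
Console.WriteLine(people.Contains(alice));   // False
Console.WriteLine(people.Count);             // 1

// Positional records have init-only properties, so Name cannot change after construction.
record Person(string Name);
```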
#3 Not providing a custom comparer when needed, causing unexpected duplicates.
Wrong approach: var set = new HashSet<string>(); set.Add("apple"); set.Add("APPLE"); // both added, because the default comparison is case-sensitive
Correct approach: var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase); set.Add("apple"); set.Add("APPLE"); // second ignored as duplicate
Root cause: Ignoring case sensitivity or custom equality needs leads to logical duplicates.
Key Takeaways
HashSet is a collection that automatically keeps only unique elements, preventing duplicates.
It uses hashing to quickly add, remove, and check items, making it faster than lists for uniqueness tasks.
HashSet does not preserve the order of items; it focuses on uniqueness and speed.
Custom equality comparers let you define what 'unique' means for your data types.
Understanding HashSet internals helps avoid common bugs and optimize performance in real applications.