Skip to main content

6.8 - Sets

Sets are collections that contain no duplicate elements. They model the mathematical concept of a set and are particularly useful when you need to ensure uniqueness of elements or perform set operations like union, intersection, and difference. This section explores set implementations, operations, and applications in C#.

6.8.1 - Using HashSet<T> for Unique Collections

HashSet<T> is a collection that contains unique elements and provides high-performance set operations. It uses a hash table for storage, which allows for fast lookups, insertions, and deletions.

6.8.1.1 - Creating and Populating a HashSet

using System.Collections.Generic;

// Creating an empty HashSet
HashSet<int> numbers = new HashSet<int>();

// Adding elements
numbers.Add(1);
numbers.Add(2);
numbers.Add(3);

// Adding duplicate elements (will be ignored)
bool added = numbers.Add(1); // Returns false, as 1 is already in the set

// Creating a HashSet from an existing collection
int[] array = { 1, 2, 3, 4, 5, 3, 2, 1 };
HashSet<int> uniqueNumbers = new HashSet<int>(array); // Contains 1, 2, 3, 4, 5

// Creating a HashSet with a custom comparer
HashSet<string> caseInsensitiveSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
caseInsensitiveSet.Add("apple");
bool contains = caseInsensitiveSet.Contains("APPLE"); // Returns true

6.8.1.2 - Basic Operations

// Checking if an element exists
bool contains = numbers.Contains(2); // Returns true

// Removing an element
bool removed = numbers.Remove(2); // Returns true if 2 was in the set

// Getting the count of elements
int count = numbers.Count;

// Clearing all elements
numbers.Clear();

// Checking if the set is empty
bool isEmpty = numbers.Count == 0;

6.8.1.3 - Iterating Through a HashSet

// Using a foreach loop
foreach (int number in numbers)
{
Console.WriteLine(number);
}

// Using LINQ
var evenNumbers = numbers.Where(n => n % 2 == 0);

6.8.1.4 - Set Operations

HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4, 5 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5, 6, 7 };

// Union: Modifies set1 to contain all elements from both sets
set1.UnionWith(set2); // set1 now contains 1, 2, 3, 4, 5, 6, 7

// Intersection: Modifies set1 to contain only elements common to both sets
set1 = new HashSet<int> { 1, 2, 3, 4, 5 };
set1.IntersectWith(set2); // set1 now contains 3, 4, 5

// Difference: Modifies set1 to contain elements not in set2
set1 = new HashSet<int> { 1, 2, 3, 4, 5 };
set1.ExceptWith(set2); // set1 now contains 1, 2

// Symmetric Difference: Modifies set1 to contain elements in either set but not both
set1 = new HashSet<int> { 1, 2, 3, 4, 5 };
set1.SymmetricExceptWith(set2); // set1 now contains 1, 2, 6, 7

6.8.1.5 - Set Predicates

HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 1, 2, 3, 4, 5 };
HashSet<int> set3 = new HashSet<int> { 4, 5, 6 };

// IsSubsetOf: Checks if set1 is a subset of set2
bool isSubset = set1.IsSubsetOf(set2); // Returns true

// IsSupersetOf: Checks if set2 is a superset of set1
bool isSuperset = set2.IsSupersetOf(set1); // Returns true

// IsProperSubsetOf: Checks if set1 is a proper subset of set2
bool isProperSubset = set1.IsProperSubsetOf(set2); // Returns true

// IsProperSupersetOf: Checks if set2 is a proper superset of set1
bool isProperSuperset = set2.IsProperSupersetOf(set1); // Returns true

// Overlaps: Checks if set1 and set3 have at least one common element
bool overlaps = set1.Overlaps(set3); // Returns false

// SetEquals: Checks if set1 and set2 contain the same elements
bool setEquals = set1.SetEquals(set2); // Returns false

6.8.1.6 - Performance Characteristics

HashSet<T> provides the following average time complexities:

  • Add: O(1)
  • Remove: O(1)
  • Contains: O(1)
  • Count: O(1)
  • Union, Intersection, Difference: O(n), where n is the number of elements in the other set

These operations are fast because HashSet<T> uses a hash table internally. However, the actual performance depends on the quality of the hash function for the type T and the number of collisions.

6.8.2 - SortedSet<T> for Ordered Unique Collections

SortedSet<T> is a collection that contains unique elements and maintains them in sorted order. It uses a balanced binary search tree (specifically, a red-black tree) for storage, which allows for efficient insertions, deletions, and lookups while maintaining the elements in sorted order.

6.8.2.1 - Creating and Populating a SortedSet

using System.Collections.Generic;

// Creating an empty SortedSet
SortedSet<int> numbers = new SortedSet<int>();

// Adding elements
numbers.Add(5);
numbers.Add(3);
numbers.Add(1);
numbers.Add(4);
numbers.Add(2);

// Elements are automatically sorted
// numbers now contains 1, 2, 3, 4, 5 in that order

// Creating a SortedSet from an existing collection
int[] array = { 5, 3, 1, 4, 2 };
SortedSet<int> sortedNumbers = new SortedSet<int>(array);

// Creating a SortedSet with a custom comparer
SortedSet<string> reverseSortedStrings = new SortedSet<string>(Comparer<string>.Create((a, b) => b.CompareTo(a)));
reverseSortedStrings.Add("apple");
reverseSortedStrings.Add("banana");
reverseSortedStrings.Add("cherry");
// reverseSortedStrings now contains "cherry", "banana", "apple" in that order

6.8.2.2 - Basic Operations

// Checking if an element exists
bool contains = numbers.Contains(3); // Returns true

// Removing an element
bool removed = numbers.Remove(3); // Returns true if 3 was in the set

// Getting the count of elements
int count = numbers.Count;

// Clearing all elements
numbers.Clear();

// Checking if the set is empty
bool isEmpty = numbers.Count == 0;

6.8.2.3 - Range Operations

SortedSet<T> provides efficient range operations:

SortedSet<int> numbers = new SortedSet<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Getting a view of a range
SortedSet<int> subset = numbers.GetViewBetween(3, 7);
// subset now contains 3, 4, 5, 6, 7

// Removing a range
numbers.RemoveWhere(n => n >= 8); // Removes 8, 9, 10

// Finding the minimum and maximum elements
int min = numbers.Min; // Returns 1
int max = numbers.Max; // Returns 7

6.8.2.4 - Set Operations

SortedSet<T> supports the same set operations as HashSet<T>:

SortedSet<int> set1 = new SortedSet<int> { 1, 2, 3, 4, 5 };
SortedSet<int> set2 = new SortedSet<int> { 3, 4, 5, 6, 7 };

// Union: Modifies set1 to contain all elements from both sets
set1.UnionWith(set2); // set1 now contains 1, 2, 3, 4, 5, 6, 7

// Intersection: Modifies set1 to contain only elements common to both sets
set1 = new SortedSet<int> { 1, 2, 3, 4, 5 };
set1.IntersectWith(set2); // set1 now contains 3, 4, 5

// Difference: Modifies set1 to contain elements not in set2
set1 = new SortedSet<int> { 1, 2, 3, 4, 5 };
set1.ExceptWith(set2); // set1 now contains 1, 2

// Symmetric Difference: Modifies set1 to contain elements in either set but not both
set1 = new SortedSet<int> { 1, 2, 3, 4, 5 };
set1.SymmetricExceptWith(set2); // set1 now contains 1, 2, 6, 7

6.8.2.5 - Performance Characteristics

SortedSet<T> provides the following average time complexities:

  • Add: O(log n)
  • Remove: O(log n)
  • Contains: O(log n)
  • Count: O(1)
  • Min, Max: O(1)
  • GetViewBetween: O(log n)
  • Union, Intersection, Difference: O(n log n), where n is the number of elements in the other set

These operations are slower than HashSet<T> but provide the benefit of maintaining elements in sorted order.

6.8.3 - Set Operations (Union, Intersection, Difference)

Set operations are fundamental to working with sets. C# provides methods for performing these operations on both HashSet<T> and SortedSet<T>.

6.8.3.1 - Union

The union of two sets A and B is the set of all elements that are in either A or B or both.

HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };

// Method 1: Using UnionWith (modifies set1)
set1.UnionWith(set2);
// set1 now contains 1, 2, 3, 4, 5

// Method 2: Using LINQ (creates a new set)
HashSet<int> union = new HashSet<int>(set1.Union(set2));

6.8.3.2 - Intersection

The intersection of two sets A and B is the set of all elements that are in both A and B.

HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5, 6 };

// Method 1: Using IntersectWith (modifies set1)
set1.IntersectWith(set2);
// set1 now contains 3, 4

// Method 2: Using LINQ (creates a new set)
HashSet<int> intersection = new HashSet<int>(set1.Intersect(set2));

6.8.3.3 - Difference

The difference of two sets A and B (A - B) is the set of all elements that are in A but not in B.

HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5, 6 };

// Method 1: Using ExceptWith (modifies set1)
set1.ExceptWith(set2);
// set1 now contains 1, 2

// Method 2: Using LINQ (creates a new set)
HashSet<int> difference = new HashSet<int>(set1.Except(set2));

6.8.3.4 - Symmetric Difference

The symmetric difference of two sets A and B is the set of all elements that are in either A or B but not in both.

HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5, 6 };

// Method 1: Using SymmetricExceptWith (modifies set1)
set1.SymmetricExceptWith(set2);
// set1 now contains 1, 2, 5, 6

// Method 2: Using LINQ (creates a new set)
HashSet<int> symmetricDifference = new HashSet<int>(set1.Except(set2).Union(set2.Except(set1)));

6.8.3.5 - Set Predicates

Set predicates allow you to check relationships between sets:

HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 1, 2, 3, 4, 5 };
HashSet<int> set3 = new HashSet<int> { 1, 2, 3 };

// IsSubsetOf: Checks if set1 is a subset of set2
bool isSubset = set1.IsSubsetOf(set2); // Returns true

// IsSupersetOf: Checks if set2 is a superset of set1
bool isSuperset = set2.IsSupersetOf(set1); // Returns true

// IsProperSubsetOf: Checks if set1 is a proper subset of set2
bool isProperSubset = set1.IsProperSubsetOf(set2); // Returns true

// IsProperSupersetOf: Checks if set2 is a proper superset of set1
bool isProperSuperset = set2.IsProperSupersetOf(set1); // Returns true

// SetEquals: Checks if set1 and set3 contain the same elements
bool setEquals = set1.SetEquals(set3); // Returns true

6.8.3.6 - Implementing Custom Set Operations

You can implement custom set operations using LINQ:

// Cartesian product of two sets
static IEnumerable<(T, U)> CartesianProduct<T, U>(IEnumerable<T> set1, IEnumerable<U> set2)
{
return from item1 in set1
from item2 in set2
select (item1, item2);
}

// Power set (set of all subsets)
static IEnumerable<IEnumerable<T>> PowerSet<T>(IEnumerable<T> set)
{
List<T> list = set.ToList();
int n = list.Count;
int powerSetCount = 1 << n; // 2^n

for (int i = 0; i < powerSetCount; i++)
{
List<T> subset = new List<T>();
for (int j = 0; j < n; j++)
{
if ((i & (1 << j)) > 0)
subset.Add(list[j]);
}
yield return subset;
}
}

// Usage
HashSet<int> set1 = new HashSet<int> { 1, 2 };
HashSet<string> set2 = new HashSet<string> { "a", "b" };

var product = CartesianProduct(set1, set2).ToList();
// product contains (1, "a"), (1, "b"), (2, "a"), (2, "b")

var powerSet = PowerSet(set1).ToList();
// powerSet contains {}, {1}, {2}, {1, 2}

6.8.4 - Performance Considerations

When working with sets, consider the following performance aspects:

6.8.4.1 - HashSet<T> vs. SortedSet<T>

OperationHashSet<T>SortedSet<T>
AddO(1)O(log n)
RemoveO(1)O(log n)
ContainsO(1)O(log n)
Min/MaxO(n)O(1)
IterationUnorderedOrdered

Choose HashSet&lt;T&gt; when:

  • You need fast lookups, insertions, and deletions
  • You don't care about the order of elements
  • You need to check for membership frequently

Choose SortedSet&lt;T&gt; when:

  • You need elements to be in sorted order
  • You need to perform range operations
  • You need to find the minimum or maximum element quickly

6.8.4.2 - Hash Function Quality

The performance of HashSet&lt;T&gt; depends on the quality of the hash function for type T. A good hash function should:

  1. Distribute elements uniformly across the hash table
  2. Be fast to compute
  3. Generate the same hash code for equal objects
  4. Generate different hash codes for unequal objects (ideally)

If you're using custom types with HashSet<T>, ensure that you override GetHashCode() and Equals() appropriately:

public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }

public override bool Equals(object obj)
{
if (obj is not Person other)
return false;

return FirstName == other.FirstName && LastName == other.LastName;
}

public override int GetHashCode()
{
return HashCode.Combine(FirstName, LastName);
}
}

### 6.4.3 - Set Operations Performance

Set operations like union, intersection, and difference can be expensive, especially for large sets. Consider the following:

- `UnionWith`, `IntersectWith`, `ExceptWith`, and `SymmetricExceptWith` modify the set in place, which is more efficient than creating a new set.
- LINQ methods like `Union`, `Intersect`, and `Except` create new collections, which can be less efficient for large sets.
- For `SortedSet<T>`, set operations are slower than for `HashSet<T>` due to the need to maintain the sorted order.

### 6.6.4.4 - Memory Usage

Sets can use a significant amount of memory, especially `HashSet<T>` with its hash table. Consider the following:

- `HashSet<T>` typically uses more memory than a simple list or array due to the hash table structure.
- `SortedSet<T>` uses memory for the balanced binary search tree structure.
- Both `HashSet<T>` and `SortedSet<T>` store only unique elements, which can save memory compared to storing duplicates.

### 6.6.4.5 - Capacity and Load Factor

`HashSet<T>` has a capacity and a load factor that affect its performance:

- The capacity is the number of buckets in the hash table.
- The load factor is the ratio of the number of elements to the capacity.
- When the load factor exceeds a threshold (typically 0.7), the hash table is resized, which is an expensive operation.

You can specify the initial capacity when creating a `HashSet<T>` to avoid resizing:

```csharp
// Create a HashSet with an initial capacity of 1000
HashSet<int> largeSet = new HashSet<int>(1000);

6.6.5 - Set Applications

Sets are used in various applications:

  1. Removing Duplicates: Sets automatically remove duplicates from a collection.
  2. Membership Testing: Sets provide efficient membership testing.
  3. Mathematical Set Operations: Sets support union, intersection, difference, and other set operations.
  4. Unique Constraint: Sets enforce uniqueness of elements.
  5. Graph Algorithms: Sets are used in graph algorithms like breadth-first search and depth-first search.
  6. Database Operations: Sets are used in database operations like joins and set operations.
  7. Spell Checking: Sets are used in spell checking algorithms.
  8. Network Packet Filtering: Sets are used in network packet filtering.

6.6.6 - Best Practices

Here are some best practices for working with sets in C#:

  1. Choose the Right Set Type: Use HashSet<T> for fast lookups and SortedSet<T> for ordered elements.
  2. Override GetHashCode and Equals: If you're using custom types with HashSet<T>, ensure that you override GetHashCode() and Equals() appropriately.
  3. Use Set Operations: Use set operations like union, intersection, and difference when appropriate.
  4. Consider Performance: Be mindful of the performance characteristics of different set operations.
  5. Avoid Unnecessary Conversions: Avoid unnecessary conversions between different collection types.
  6. Use LINQ with Care: Use LINQ methods with care, as they create new collections.
  7. Initialize with Capacity: Initialize HashSet<T> with an appropriate capacity when you know the approximate number of elements.
  8. Use ReadOnlyCollection: Use ReadOnlyCollection<T> to expose a read-only view of a set when appropriate.

In the next section, we'll explore advanced data structures that combine the concepts we've learned so far to solve specific problems efficiently.