Skip to main content

6.5 - Dictionaries and Hash Tables

Hash tables and dictionaries are data structures that store key-value pairs and provide efficient lookup, insertion, and deletion operations. They use a hash function to map keys to array indices, allowing for constant-time access in the average case. This section explores hash tables and dictionaries in detail, including their implementations, operations, and applications in C#.

6.5.1 - Key-Value Pair Storage with Dictionary<TKey, TValue>

6.5.1.1 - Dictionary Basics

A hash function is a function that maps data of arbitrary size to fixed-size values. In the context of hash tables, a hash function maps keys to array indices. A good hash function should:

  1. Be deterministic: The same key should always produce the same hash value.
  2. Be efficient to compute.
  3. Distribute keys uniformly across the array to minimize collisions.
  4. Generate a hash value within the range of array indices.

6.5.1.2 - Collisions

A collision occurs when two different keys hash to the same array index. There are several strategies to handle collisions:

  1. Separate Chaining: Each array element contains a linked list of key-value pairs that hash to the same index.
  2. Open Addressing: When a collision occurs, the algorithm searches for the next available slot in the array.
    • Linear Probing: Check the next slot, then the next, and so on.
    • Quadratic Probing: Check slots at quadratic intervals.
    • Double Hashing: Use a second hash function to determine the interval.

6.5.1.3 - Load Factor

The load factor of a hash table is the ratio of the number of stored elements to the size of the array. As the load factor increases, the probability of collisions also increases, which can degrade performance. Most hash table implementations resize the array when the load factor exceeds a certain threshold.

6.5.2 - Hashing Concepts and Collision Resolution

C# provides several hash table implementations:

6.5.2.1 - Hashtable (Non-Generic)

The Hashtable class is a non-generic collection that stores key-value pairs as objects:

using System.Collections;

// Creating a hashtable
Hashtable hashtable = new Hashtable();

// Adding key-value pairs
hashtable.Add("key1", "value1");
hashtable.Add("key2", "value2");
hashtable.Add(3, "value3"); // Different key type

// Accessing values
string value1 = (string)hashtable["key1"];

// Checking if a key exists
bool containsKey = hashtable.ContainsKey("key1");

// Removing a key-value pair
hashtable.Remove("key2");

// Getting the number of key-value pairs
int count = hashtable.Count;

// Iterating through the hashtable
foreach (DictionaryEntry entry in hashtable)
{
Console.WriteLine($"Key: {entry.Key}, Value: {entry.Value}");
}

// Clearing the hashtable
hashtable.Clear();

6.5.2.2 - Dictionary<TKey, TValue> (Generic)

The Dictionary<TKey, TValue> class is a generic collection that provides type safety:

using System.Collections.Generic;

// Creating a dictionary
Dictionary<string, int> dictionary = new Dictionary<string, int>();

// Adding key-value pairs
dictionary.Add("one", 1);
dictionary.Add("two", 2);
dictionary.Add("three", 3);

// Alternative way to add or update a key-value pair
dictionary["four"] = 4;

// Accessing values
int value = dictionary["one"];

// Trying to get a value safely
if (dictionary.TryGetValue("five", out int result))
{
Console.WriteLine($"Value: {result}");
}
else
{
Console.WriteLine("Key not found");
}

// Checking if a key exists
bool containsKey = dictionary.ContainsKey("one");

// Checking if a value exists
bool containsValue = dictionary.ContainsValue(2);

// Removing a key-value pair
dictionary.Remove("two");

// Getting the number of key-value pairs
int count = dictionary.Count;

// Iterating through the dictionary
foreach (KeyValuePair<string, int> kvp in dictionary)
{
Console.WriteLine($"Key: {kvp.Key}, Value: {kvp.Value}");
}

// Iterating through keys
foreach (string key in dictionary.Keys)
{
Console.WriteLine($"Key: {key}");
}

// Iterating through values
foreach (int value in dictionary.Values)
{
Console.WriteLine($"Value: {value}");
}

// Clearing the dictionary
dictionary.Clear();

6.5.2.3 - ConcurrentDictionary<TKey, TValue>

The ConcurrentDictionary<TKey, TValue> class is a thread-safe dictionary that can be used in multi-threaded applications:

using System.Collections.Concurrent;

// Creating a concurrent dictionary
ConcurrentDictionary<string, int> concurrentDictionary = new ConcurrentDictionary<string, int>();

// Adding key-value pairs
concurrentDictionary.TryAdd("one", 1);
concurrentDictionary.TryAdd("two", 2);

// Getting or adding a value
int value = concurrentDictionary.GetOrAdd("three", 3);

// Updating a value
concurrentDictionary.AddOrUpdate("one", 10, (key, oldValue) => oldValue + 10);

// Trying to remove a key-value pair
if (concurrentDictionary.TryRemove("two", out int removedValue))
{
Console.WriteLine($"Removed value: {removedValue}");
}

// Iterating through the concurrent dictionary
foreach (KeyValuePair<string, int> kvp in concurrentDictionary)
{
Console.WriteLine($"Key: {kvp.Key}, Value: {kvp.Value}");
}

6.5.3 - SortedDictionary and SortedList

C# provides sorted dictionary implementations that maintain keys in a sorted order:

// SortedDictionary example
SortedDictionary<string, int> sortedDict = new SortedDictionary<string, int>();
sortedDict.Add("zebra", 5);
sortedDict.Add("apple", 1);
sortedDict.Add("banana", 2);

// Keys are automatically sorted
foreach (var kvp in sortedDict)
{
Console.WriteLine($"{kvp.Key}: {kvp.Value}");
}
// Output:
// apple: 1
// banana: 2
// zebra: 5

// SortedList example
SortedList<string, int> sortedList = new SortedList<string, int>();
sortedList.Add("zebra", 5);
sortedList.Add("apple", 1);
sortedList.Add("banana", 2);

public CustomHashTable()
{
buckets = new Entry[DefaultCapacity];
count = 0;
}

public int Count => count;

public void Add(TKey key, TValue value)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

// Check if we need to resize
if ((float)count / buckets.Length >= LoadFactorThreshold)
Resize();

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];

// Check if the key already exists
while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
throw new ArgumentException("An element with the same key already exists");

entry = entry.Next;
}

// Add the new entry at the beginning of the bucket
Entry newEntry = new Entry(key, value);
newEntry.Next = buckets[bucketIndex];
buckets[bucketIndex] = newEntry;
count++;
}

public bool TryGetValue(TKey key, out TValue value)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];

while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
{
value = entry.Value;
return true;
}

entry = entry.Next;
}

value = default;
return false;
}

public bool Remove(TKey key)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];
Entry previous = null;

while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
{
if (previous == null)
buckets[bucketIndex] = entry.Next;
else
previous.Next = entry.Next;

count--;
return true;
}

previous = entry;
entry = entry.Next;
}

return false;
}

public bool ContainsKey(TKey key)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];

while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
return true;

entry = entry.Next;
}

return false;
}

public void Clear()
{
Array.Clear(buckets, 0, buckets.Length);
count = 0;
}

private int GetBucketIndex(TKey key)
{
// Get the hash code and ensure it's positive
int hashCode = key.GetHashCode() & 0x7FFFFFFF;
// Map the hash code to a bucket index
return hashCode % buckets.Length;
}

private void Resize()
{
Entry[] oldBuckets = buckets;
buckets = new Entry[oldBuckets.Length * 2];
count = 0;

// Rehash all entries
for (int i = 0; i < oldBuckets.Length; i++)
{
Entry entry = oldBuckets[i];
while (entry != null)
{
Add(entry.Key, entry.Value);
entry = entry.Next;
}
}
}
}

6.5.4 - Hash Table Operations

Here are the common operations performed on hash tables:

6.5.4.1 - Insertion

Insertion in a hash table involves computing the hash of the key, finding the corresponding bucket, and adding the key-value pair to that bucket:

public void Add(TKey key, TValue value)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

// Check if we need to resize
if ((float)count / buckets.Length >= LoadFactorThreshold)
Resize();

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];

// Check if the key already exists
while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
throw new ArgumentException("An element with the same key already exists");

entry = entry.Next;
}

// Add the new entry at the beginning of the bucket
Entry newEntry = new Entry(key, value);
newEntry.Next = buckets[bucketIndex];
buckets[bucketIndex] = newEntry;
count++;
}

6.5.4.2 - Lookup

Lookup in a hash table involves computing the hash of the key, finding the corresponding bucket, and searching for the key in that bucket:

public bool TryGetValue(TKey key, out TValue value)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];

while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
{
value = entry.Value;
return true;
}

entry = entry.Next;
}

value = default;
return false;
}

6.5.4.3 - Deletion

Deletion in a hash table involves computing the hash of the key, finding the corresponding bucket, and removing the key-value pair from that bucket:

public bool Remove(TKey key)
{
if (key == null)
throw new ArgumentNullException(nameof(key));

int bucketIndex = GetBucketIndex(key);
Entry entry = buckets[bucketIndex];
Entry previous = null;

while (entry != null)
{
if (EqualityComparer<TKey>.Default.Equals(entry.Key, key))
{
if (previous == null)
buckets[bucketIndex] = entry.Next;
else
previous.Next = entry.Next;

count--;
return true;
}

previous = entry;
entry = entry.Next;
}

return false;
}

6.5.5 - Hash Table Applications

Hash tables are used in various applications:

  1. Database Indexing: Hash tables are used to create indices for database tables.
  2. Caching: Hash tables are used to implement caches, such as DNS caches and web caches.
  3. Symbol Tables: Hash tables are used in compilers and interpreters to store variable names and their values.
  4. Associative Arrays: Hash tables are used to implement associative arrays, which map keys to values.
  5. Sets: Hash tables are used to implement sets, which store unique elements.

6.5.4 - Performance Characteristics

Different dictionary implementations have different performance characteristics:

6.5.4.1 - Dictionary<TKey, TValue>

Dictionary<TKey, TValue> offers the following performance characteristics:

  • Access: O(1) - Constant time on average
  • Search: O(1) - Constant time on average
  • Insertion: O(1) - Constant time on average
  • Deletion: O(1) - Constant time on average
  • Memory Usage: Moderate, as it uses an array of buckets and linked lists

These are average-case complexities. In the worst case (many collisions), the performance can degrade to O(n).

6.5.4.2 - SortedDictionary<TKey, TValue>

SortedDictionary<TKey, TValue> offers the following performance characteristics:

  • Access: O(log n) - Logarithmic time
  • Search: O(log n) - Logarithmic time
  • Insertion: O(log n) - Logarithmic time
  • Deletion: O(log n) - Logarithmic time
  • Memory Usage: Higher than SortedList, as it uses a self-balancing binary search tree

SortedDictionary<TKey, TValue> is optimized for frequent insertions and deletions.

6.5.4.3 - SortedList<TKey, TValue>

SortedList<TKey, TValue> offers the following performance characteristics:

  • Access: O(log n) - Logarithmic time
  • Search: O(log n) - Logarithmic time
  • Insertion: O(n) - Linear time (may require shifting elements)
  • Deletion: O(n) - Linear time (may require shifting elements)
  • Memory Usage: Lower than SortedDictionary, as it uses a sorted array

SortedList<TKey, TValue> is optimized for scenarios where insertions and deletions are infrequent but memory usage is a concern.

6.5.4.4 - ConcurrentDictionary<TKey, TValue>

ConcurrentDictionary<TKey, TValue> offers the following performance characteristics:

  • Access: O(1) - Constant time on average
  • Search: O(1) - Constant time on average
  • Insertion: O(1) - Constant time on average
  • Deletion: O(1) - Constant time on average
  • Thread Safety: Excellent, designed for concurrent access
  • Memory Usage: Higher than Dictionary<TKey, TValue> due to synchronization overhead

ConcurrentDictionary<TKey, TValue> is optimized for scenarios where multiple threads need to access and modify the dictionary simultaneously.

6.5.5 - Common Dictionary Operations

Here's a comparison of Dictionary<TKey, TValue> and Hashtable:

FeatureDictionary<TKey, TValue>Hashtable
Type SafetyYes (with generics)No (uses objects)
PerformanceBetter for value typesBetter for reference types
Thread SafetyNoNo
Null KeysNoYes
Memory UsageLower for value typesHigher due to boxing/unboxing
OrderingUnorderedUnordered

6.5.6 - Performance Considerations

When working with hash tables and dictionaries, consider the following performance aspects:

  1. Hash Function Quality: The quality of the hash function affects the distribution of keys and, consequently, the performance of the hash table.
  2. Load Factor: A high load factor increases the probability of collisions, which can degrade performance.
  3. Collision Resolution Strategy: The choice of collision resolution strategy affects the performance of the hash table, especially in the case of many collisions.
  4. Key Type: The choice of key type affects the performance of the hash table, as different types have different hash functions and equality comparisons.
  5. Memory Usage: Hash tables typically use more memory than other data structures due to the need to allocate an array of buckets.

6.5.7 - Best Practices

Here are some best practices for working with hash tables and dictionaries in C#:

  1. Choose the Right Collection: Use Dictionary<TKey, TValue> for most applications, as it provides type safety and good performance.
  2. Implement GetHashCode and Equals Correctly: If you use custom types as keys, ensure that they correctly implement GetHashCode and Equals.
  3. Consider Thread Safety: Use ConcurrentDictionary<TKey, TValue> for multi-threaded applications.
  4. Be Mindful of Memory Usage: Hash tables can use a significant amount of memory, especially with a low load factor.
  5. Use TryGetValue: Use TryGetValue instead of the indexer when you're not sure if a key exists, to avoid exceptions.
  6. Consider Ordering Requirements: If you need keys to be ordered, use SortedDictionary<TKey, TValue> or SortedList<TKey, TValue>.

In the next section, we'll explore trees, which are hierarchical data structures with various applications in computer science.