4 - Search Algorithms in Data Structures
While array-based searching algorithms like Linear Search and Binary Search are fundamental, many specialized data structures are designed specifically to optimize search operations. These data structures organize data in ways that enable more efficient searching, often achieving better time complexity than what's possible with simple arrays.
Search in a Binary Search Tree (BST)
A Binary Search Tree is a tree data structure where each node has at most two children (left and right), and for each node, all elements in the left subtree are less than the node's value, and all elements in the right subtree are greater.
How It Works
- Start at the root node.
- If the target equals the current node's value, return the node.
- If the target is less than the current node's value, search in the left subtree.
- If the target is greater than the current node's value, search in the right subtree.
- Repeat until the target is found or a leaf node is reached without finding the target.
Implementation
/// <summary>
/// Represents a binary search tree data structure that maintains a sorted collection of elements.
/// </summary>
/// <typeparam name="T">The type of elements in the tree, must implement IComparable.</typeparam>
public class BinarySearchTree<T> where T : IComparable<T>
{
/// <summary>
/// Represents a node in the binary search tree.
/// </summary>
private class Node
{
/// <summary>
/// Gets or sets the value stored in this node.
/// </summary>
public T Value { get; set; }
/// <summary>
/// Gets or sets the left child node (contains values less than this node's value).
/// </summary>
public Node Left { get; set; }
/// <summary>
/// Gets or sets the right child node (contains values greater than this node's value).
/// </summary>
public Node Right { get; set; }
/// <summary>
/// Initializes a new instance of the Node class with the specified value.
/// </summary>
/// <param name="value">The value to store in this node.</param>
public Node(T value)
{
Value = value;
Left = null;
Right = null;
}
}
/// <summary>
/// The root node of the binary search tree.
/// </summary>
private Node root;
/// <summary>
/// Initializes a new instance of the BinarySearchTree class.
/// </summary>
public BinarySearchTree()
{
root = null;
}
/// <summary>
/// Inserts a new value into the binary search tree.
/// </summary>
/// <param name="value">The value to insert.</param>
/// <remarks>
/// If the value already exists in the tree, this implementation ignores it
/// (no duplicates are stored).
/// </remarks>
public void Insert(T value)
{
// Validate input
if (value == null)
throw new ArgumentNullException(nameof(value), "Value cannot be null");
root = InsertRecursive(root, value);
}
/// <summary>
/// Recursively inserts a value into the binary search tree.
/// </summary>
/// <param name="node">The current node being examined.</param>
/// <param name="value">The value to insert.</param>
/// <returns>The updated node reference after insertion.</returns>
private Node InsertRecursive(Node node, T value)
{
// Base case: If we've reached a null node, create a new node with the value
if (node == null)
{
return new Node(value);
}
// Compare the value to the current node's value
int comparison = value.CompareTo(node.Value);
if (comparison < 0)
{
// Value is less than the current node's value, insert in the left subtree
node.Left = InsertRecursive(node.Left, value);
}
else if (comparison > 0)
{
// Value is greater than the current node's value, insert in the right subtree
node.Right = InsertRecursive(node.Right, value);
}
// If value equals node.Value (comparison == 0), we ignore it (no duplicates)
return node;
}
/// <summary>
/// Searches for a value in the binary search tree using a recursive approach.
/// </summary>
/// <param name="value">The value to search for.</param>
/// <returns>True if the value is found; otherwise, false.</returns>
public bool Search(T value)
{
// Validate input
if (value == null)
throw new ArgumentNullException(nameof(value), "Value cannot be null");
return SearchRecursive(root, value);
}
/// <summary>
/// Recursively searches for a value in the binary search tree.
/// </summary>
/// <param name="node">The current node being examined.</param>
/// <param name="value">The value to search for.</param>
/// <returns>True if the value is found; otherwise, false.</returns>
private bool SearchRecursive(Node node, T value)
{
// Base case: If we've reached a null node, the value is not in the tree
if (node == null)
{
return false;
}
// Compare the value to the current node's value
int comparison = value.CompareTo(node.Value);
if (comparison == 0)
{
return true; // Value found
}
if (comparison < 0)
{
// Value is less than the current node's value, search in the left subtree
return SearchRecursive(node.Left, value);
}
// Value is greater than the current node's value, search in the right subtree
return SearchRecursive(node.Right, value);
}
/// <summary>
/// Searches for a value in the binary search tree using an iterative approach.
/// </summary>
/// <param name="value">The value to search for.</param>
/// <returns>True if the value is found; otherwise, false.</returns>
/// <remarks>
/// This iterative implementation is more efficient in terms of space complexity
/// compared to the recursive approach, as it doesn't use the call stack.
/// </remarks>
public bool SearchIterative(T value)
{
// Validate input
if (value == null)
throw new ArgumentNullException(nameof(value), "Value cannot be null");
Node current = root;
// Traverse the tree until we find the value or reach a null node
while (current != null)
{
// Compare the value to the current node's value
int comparison = value.CompareTo(current.Value);
if (comparison == 0)
{
return true; // Value found
}
if (comparison < 0)
{
// Value is less than the current node's value, move to the left subtree
current = current.Left;
}
else
{
// Value is greater than the current node's value, move to the right subtree
current = current.Right;
}
}
return false; // Value not found
}
/// <summary>
/// Gets the number of nodes in the binary search tree.
/// </summary>
/// <returns>The number of nodes in the tree.</returns>
public int Count()
{
return CountNodes(root);
}
/// <summary>
/// Recursively counts the number of nodes in the tree.
/// </summary>
/// <param name="node">The current node being examined.</param>
/// <returns>The number of nodes in the subtree rooted at the given node.</returns>
private int CountNodes(Node node)
{
if (node == null)
return 0;
return 1 + CountNodes(node.Left) + CountNodes(node.Right);
}
}
Characteristics
- Time Complexity:
- Average Case: O(log n) for a balanced tree
- Worst Case: O(n) for a degenerate tree (essentially a linked list)
- Space Complexity:
- Recursive: O(h) where h is the height of the tree
- Iterative: O(1)
- Advantages:
- Efficient search, insert, and delete operations
- Maintains sorted order
- Can be used for range queries
- Disadvantages:
- Performance degrades if the tree becomes unbalanced
- More complex than array-based structures
- Requires more memory than arrays
Balanced Binary Search Trees
To address the potential O(n) worst-case performance of a standard BST, several balanced BST variants ensure that the tree remains approximately balanced:
- AVL Tree: Maintains strict balance by ensuring that the heights of the left and right subtrees differ by at most 1.
- Red-Black Tree: Maintains approximate balance using a set of properties and color-based rules.
- B-Tree: A generalization of BST that allows nodes to have more than two children, commonly used in databases and file systems.
These balanced variants guarantee O(log n) worst-case performance for search operations.
Search in a Hash Table
A Hash Table (or Hash Map) is a data structure that implements an associative array, mapping keys to values using a hash function to compute an index into an array of buckets.
How It Works
- Compute the hash code of the key.
- Map the hash code to an index in the array (typically using modulo).
- Check if the key exists at that index.
- If there's a collision (multiple keys map to the same index), resolve it using techniques like chaining or open addressing.
Implementation
/// <summary>
/// Represents a hash table data structure that maps keys to values using a hash function.
/// This implementation uses separate chaining for collision resolution.
/// </summary>
/// <typeparam name="TKey">The type of keys in the hash table.</typeparam>
/// <typeparam name="TValue">The type of values in the hash table.</typeparam>
public class HashTable<TKey, TValue>
{
/// <summary>
/// Represents an entry (key-value pair) in the hash table.
/// </summary>
private class Entry
{
/// <summary>
/// Gets or sets the key of this entry.
/// </summary>
public TKey Key { get; set; }
/// <summary>
/// Gets or sets the value associated with the key.
/// </summary>
public TValue Value { get; set; }
/// <summary>
/// Gets or sets the next entry in the linked list (for collision resolution).
/// </summary>
public Entry Next { get; set; }
/// <summary>
/// Initializes a new instance of the Entry class with the specified key and value.
/// </summary>
/// <param name="key">The key for this entry.</param>
/// <param name="value">The value associated with the key.</param>
public Entry(TKey key, TValue value)
{
Key = key;
Value = value;
Next = null;
}
}
/// <summary>
/// The array of buckets, each containing a linked list of entries.
/// </summary>
private Entry[] buckets;
/// <summary>
/// The current capacity of the hash table (number of buckets).
/// </summary>
private int capacity;
/// <summary>
/// The current number of key-value pairs in the hash table.
/// </summary>
private int count;
/// <summary>
/// The load factor threshold that triggers resizing (default: 0.75).
/// </summary>
private readonly double loadFactorThreshold;
/// <summary>
/// Initializes a new instance of the HashTable class with the specified initial capacity
/// and load factor threshold.
/// </summary>
/// <param name="capacity">The initial capacity (number of buckets).</param>
/// <param name="loadFactorThreshold">The load factor threshold that triggers resizing.</param>
/// <exception cref="ArgumentOutOfRangeException">
/// Thrown when capacity is less than 1 or loadFactorThreshold is not positive.
/// </exception>
public HashTable(int capacity = 16, double loadFactorThreshold = 0.75)
{
if (capacity < 1)
throw new ArgumentOutOfRangeException(nameof(capacity), "Capacity must be at least 1");
if (loadFactorThreshold <= 0 || loadFactorThreshold >= 1)
throw new ArgumentOutOfRangeException(nameof(loadFactorThreshold),
"Load factor threshold must be between 0 and 1 exclusive");
this.capacity = capacity;
this.loadFactorThreshold = loadFactorThreshold;
buckets = new Entry[capacity];
count = 0;
}
/// <summary>
/// Gets the current number of key-value pairs in the hash table.
/// </summary>
public int Count => count;
/// <summary>
/// Adds a key-value pair to the hash table or updates the value if the key already exists.
/// </summary>
/// <param name="key">The key to add or update.</param>
/// <param name="value">The value to associate with the key.</param>
/// <exception cref="ArgumentNullException">Thrown when the key is null.</exception>
public void Put(TKey key, TValue value)
{
if (key == null)
throw new ArgumentNullException(nameof(key), "Key cannot be null");
// Check if we need to resize the hash table
if ((double)count / capacity >= loadFactorThreshold)
{
Resize();
}
// Calculate the bucket index for this key
int index = GetIndex(key);
// Check if the key already exists in the bucket
Entry current = buckets[index];
while (current != null)
{
if (current.Key.Equals(key))
{
current.Value = value; // Update the existing value
return;
}
current = current.Next;
}
// Key doesn't exist, add a new entry at the beginning of the chain
// (This is O(1) compared to adding at the end which would be O(n))
Entry newEntry = new Entry(key, value);
newEntry.Next = buckets[index];
buckets[index] = newEntry;
count++;
}
/// <summary>
/// Attempts to get the value associated with the specified key.
/// </summary>
/// <param name="key">The key to locate.</param>
/// <param name="value">
/// When this method returns, contains the value associated with the specified key,
/// if the key is found; otherwise, the default value for the type of the value parameter.
/// </param>
/// <returns>True if the hash table contains an element with the specified key; otherwise, false.</returns>
/// <exception cref="ArgumentNullException">Thrown when the key is null.</exception>
public bool TryGetValue(TKey key, out TValue value)
{
if (key == null)
throw new ArgumentNullException(nameof(key), "Key cannot be null");
// Calculate the bucket index for this key
int index = GetIndex(key);
// Search for the key in the bucket's linked list
Entry current = buckets[index];
while (current != null)
{
if (current.Key.Equals(key))
{
value = current.Value;
return true;
}
current = current.Next;
}
// Key not found
value = default;
return false;
}
/// <summary>
/// Determines whether the hash table contains the specified key.
/// </summary>
/// <param name="key">The key to locate.</param>
/// <returns>True if the hash table contains an element with the specified key; otherwise, false.</returns>
/// <exception cref="ArgumentNullException">Thrown when the key is null.</exception>
public bool ContainsKey(TKey key)
{
if (key == null)
throw new ArgumentNullException(nameof(key), "Key cannot be null");
// Calculate the bucket index for this key
int index = GetIndex(key);
// Search for the key in the bucket's linked list
Entry current = buckets[index];
while (current != null)
{
if (current.Key.Equals(key))
{
return true;
}
current = current.Next;
}
return false;
}
/// <summary>
/// Removes the value with the specified key from the hash table.
/// </summary>
/// <param name="key">The key of the element to remove.</param>
/// <returns>True if the element is successfully found and removed; otherwise, false.</returns>
/// <exception cref="ArgumentNullException">Thrown when the key is null.</exception>
public bool Remove(TKey key)
{
if (key == null)
throw new ArgumentNullException(nameof(key), "Key cannot be null");
// Calculate the bucket index for this key
int index = GetIndex(key);
Entry previous = null;
Entry current = buckets[index];
// Search for the key in the bucket's linked list
while (current != null)
{
if (current.Key.Equals(key))
{
// Remove the entry from the linked list
if (previous == null)
{
// The entry is the first in the list
buckets[index] = current.Next;
}
else
{
// The entry is not the first in the list
previous.Next = current.Next;
}
count--;
return true;
}
previous = current;
current = current.Next;
}
return false; // Key not found
}
/// <summary>
/// Calculates the bucket index for a key.
/// </summary>
/// <param name="key">The key to calculate the index for.</param>
/// <returns>The bucket index for the key.</returns>
private int GetIndex(TKey key)
{
// Get the hash code of the key
int hashCode = key.GetHashCode();
// Ensure the hash code is positive
if (hashCode < 0)
hashCode = -hashCode;
// Map the hash code to a bucket index
return hashCode % capacity;
}
/// <summary>
/// Resizes the hash table by doubling its capacity and rehashing all entries.
/// </summary>
private void Resize()
{
int newCapacity = capacity * 2;
Entry[] newBuckets = new Entry[newCapacity];
// Rehash all entries into the new buckets array
for (int i = 0; i < capacity; i++)
{
Entry current = buckets[i];
while (current != null)
{
// Save the next entry before we modify current.Next
Entry next = current.Next;
// Calculate the new bucket index for this entry
int newIndex = current.Key.GetHashCode() % newCapacity;
if (newIndex < 0)
newIndex = -newIndex;
// Insert the entry at the beginning of the new bucket's linked list
current.Next = newBuckets[newIndex];
newBuckets[newIndex] = current;
// Move to the next entry in the original bucket
current = next;
}
}
// Update the hash table's buckets and capacity
buckets = newBuckets;
capacity = newCapacity;
}
/// <summary>
/// Clears all key-value pairs from the hash table.
/// </summary>
public void Clear()
{
// Reset all buckets to null
for (int i = 0; i < capacity; i++)
{
buckets[i] = null;
}
count = 0;
}
}
Collision Resolution Strategies
1. Chaining
In the implementation above, we used chaining, where each bucket contains a linked list of entries. When a collision occurs, the new entry is added to the list.
2. Open Addressing
Another approach is open addressing, where all entries are stored in the bucket array itself. When a collision occurs, we probe for the next available slot according to a probing sequence:
- Linear Probing: Check the next slot, then the next, and so on.
- Quadratic Probing: Check slots at quadratic intervals (1, 4, 9, ...).
- Double Hashing: Use a second hash function to determine the interval.
// Example of linear probing
private int FindSlot(TKey key, bool forInsertion)
{
int hashCode = key.GetHashCode();
if (hashCode < 0)
hashCode = -hashCode;
int index = hashCode % capacity;
int step = 0;
while (true)
{
int currentIndex = (index + step) % capacity;
if (buckets[currentIndex] == null)
{
return forInsertion ? currentIndex : -1;
}
if (buckets[currentIndex].Key.Equals(key))
{
return currentIndex;
}
step++;
if (step >= capacity)
{
return -1; // Table is full or key not found
}
}
}
Characteristics
- Time Complexity:
- Average Case: O(1) for search, insert, and delete
- Worst Case: O(n) when many collisions occur
- Space Complexity: O(n)
- Advantages:
- Very fast average-case performance
- Flexible key types (with appropriate hash functions)
- Constant-time operations regardless of data size
- Disadvantages:
- Performance degrades with high collision rates
- Doesn't maintain order
- Requires a good hash function
- May require resizing, which is expensive
When to Use
- When fast lookups by key are the primary operation
- When the data doesn't need to be ordered
- When the keys have a good hash function
- When memory usage is not a critical constraint
Search in a Trie
A Trie (also called a prefix tree) is a tree-like data structure used to store a dynamic set of strings, where the keys are usually strings. It's particularly efficient for prefix-based searches.
How It Works
- Each node in the trie represents a character in a string.
- The path from the root to a node forms a prefix of one or more strings.
- Leaf nodes or specially marked nodes indicate complete strings.
- To search for a string, follow the path corresponding to each character.
Implementation
/// <summary>
/// Represents a trie (prefix tree) data structure for efficient string operations.
/// </summary>
/// <remarks>
/// A trie is a tree-like data structure that stores a dynamic set of strings, where the keys
/// are usually strings. Unlike a binary search tree, no node in the tree stores the key
/// associated with that node; instead, its position in the tree defines the key with which
/// it is associated. All the descendants of a node have a common prefix of the string
/// associated with that node, and the root is associated with the empty string.
/// </remarks>
public class Trie
{
/// <summary>
/// Represents a node in the trie.
/// </summary>
private class TrieNode
{
/// <summary>
/// Gets or sets the dictionary of child nodes, keyed by character.
/// </summary>
public Dictionary<char, TrieNode> Children { get; set; }
/// <summary>
/// Gets or sets a value indicating whether this node represents the end of a word.
/// </summary>
public bool IsEndOfWord { get; set; }
/// <summary>
/// Initializes a new instance of the TrieNode class.
/// </summary>
public TrieNode()
{
Children = new Dictionary<char, TrieNode>();
IsEndOfWord = false;
}
}
/// <summary>
/// The root node of the trie.
/// </summary>
private readonly TrieNode root;
/// <summary>
/// Initializes a new instance of the Trie class.
/// </summary>
public Trie()
{
root = new TrieNode();
}
/// <summary>
/// Inserts a word into the trie.
/// </summary>
/// <param name="word">The word to insert.</param>
/// <exception cref="ArgumentNullException">Thrown when the word is null.</exception>
public void Insert(string word)
{
// Validate input
if (word == null)
throw new ArgumentNullException(nameof(word), "Word cannot be null");
TrieNode current = root;
// Traverse the trie for each character in the word
foreach (char c in word)
{
// If the current character doesn't exist in the children,
// create a new node for it
if (!current.Children.ContainsKey(c))
{
current.Children[c] = new TrieNode();
}
// Move to the child node corresponding to the current character
current = current.Children[c];
}
// Mark the end of the word
current.IsEndOfWord = true;
}
/// <summary>
/// Searches for a word in the trie.
/// </summary>
/// <param name="word">The word to search for.</param>
/// <returns>True if the word is found; otherwise, false.</returns>
/// <exception cref="ArgumentNullException">Thrown when the word is null.</exception>
public bool Search(string word)
{
// Validate input
if (word == null)
throw new ArgumentNullException(nameof(word), "Word cannot be null");
// Find the node corresponding to the last character of the word
TrieNode node = FindNode(word);
// The word exists if the node is found and it's marked as the end of a word
return node != null && node.IsEndOfWord;
}
/// <summary>
/// Determines whether the trie contains any words that start with the specified prefix.
/// </summary>
/// <param name="prefix">The prefix to search for.</param>
/// <returns>True if any word in the trie starts with the specified prefix; otherwise, false.</returns>
/// <exception cref="ArgumentNullException">Thrown when the prefix is null.</exception>
public bool StartsWith(string prefix)
{
// Validate input
if (prefix == null)
throw new ArgumentNullException(nameof(prefix), "Prefix cannot be null");
// The prefix exists if we can find a node corresponding to its last character
return FindNode(prefix) != null;
}
/// <summary>
/// Finds the node corresponding to the last character of a string.
/// </summary>
/// <param name="prefix">The string to find.</param>
/// <returns>The node corresponding to the last character if found; otherwise, null.</returns>
private TrieNode FindNode(string prefix)
{
TrieNode current = root;
// Traverse the trie for each character in the prefix
foreach (char c in prefix)
{
// If the current character doesn't exist in the children, the prefix doesn't exist
if (!current.Children.ContainsKey(c))
{
return null; // Prefix not found
}
// Move to the child node corresponding to the current character
current = current.Children[c];
}
// Return the node corresponding to the last character
return current;
}
/// <summary>
/// Gets all words in the trie with the specified prefix.
/// </summary>
/// <param name="prefix">The prefix to search for.</param>
/// <returns>A list of all words with the specified prefix.</returns>
/// <exception cref="ArgumentNullException">Thrown when the prefix is null.</exception>
public List<string> GetAllWordsWithPrefix(string prefix)
{
// Validate input
if (prefix == null)
throw new ArgumentNullException(nameof(prefix), "Prefix cannot be null");
List<string> results = new List<string>();
// Find the node corresponding to the prefix
TrieNode prefixNode = FindNode(prefix);
// If the prefix exists in the trie, collect all words starting with it
if (prefixNode != null)
{
CollectWords(prefixNode, prefix, results);
}
return results;
}
/// <summary>
/// Recursively collects all words starting from a node.
/// </summary>
/// <param name="node">The current node being examined.</param>
/// <param name="currentPrefix">The prefix accumulated so far.</param>
/// <param name="results">The list to store the collected words.</param>
private void CollectWords(TrieNode node, string currentPrefix, List<string> results)
{
// If this node is the end of a word, add the current prefix to the results
if (node.IsEndOfWord)
{
results.Add(currentPrefix);
}
// Recursively collect words from all child nodes
foreach (var entry in node.Children)
{
CollectWords(entry.Value, currentPrefix + entry.Key, results);
}
}
}
Characteristics
- Time Complexity:
- Search: O(m) where m is the length of the key
- Insert: O(m)
- Delete: O(m)
- Prefix Search: O(m + k) where k is the number of matches
- Space Complexity: O(n * m) where n is the number of keys and m is the average key length
- Advantages:
- Fast prefix-based searches
- Efficient for dictionary implementations
- Can find all words with a given prefix
- No hash collisions
- Disadvantages:
- Higher memory usage than hash tables
- Not as efficient for single key lookups as hash tables
- More complex implementation
When to Use
- Autocomplete features
- Spell checkers
- IP routing (longest prefix matching)
- Dictionary implementations
- When prefix-based operations are common
Comparison of Data Structure Search Algorithms
Data Structure | Average Search | Worst Search | Space | Ordered | Key Constraints |
---|---|---|---|---|---|
Binary Search Tree | O(log n) | O(n) | O(n) | Yes | Comparable |
Balanced BST (AVL, Red-Black) | O(log n) | O(log n) | O(n) | Yes | Comparable |
Hash Table | O(1) | O(n) | O(n) | No | Hashable |
Trie | O(m) | O(m) | O(n * m) | Lexicographic | Strings |
Where:
- n is the number of elements
- m is the length of the key (for tries)
Performance Characteristics
- Binary Search Tree: Good all-around performance for ordered data, but can degrade if unbalanced.
- Balanced BST: Guarantees logarithmic performance, good for ordered operations.
- Hash Table: Extremely fast lookups on average, but doesn't maintain order.
- Trie: Specialized for string operations, particularly prefix searches.
Practical Considerations
Implementation in C#
When implementing search in data structures in C#, consider:
-
Built-in Collections: .NET provides optimized implementations of many data structures:
Dictionary<TKey, TValue>
: Hash table implementationSortedDictionary<TKey, TValue>
: Balanced binary search treeSortedSet<T>
: Balanced binary search tree for sets
-
Custom Implementations: When built-in collections don't meet your needs:
- Consider performance characteristics for your specific use case
- Benchmark different implementations
- Balance complexity against performance requirements
Choosing the Right Data Structure
The best data structure depends on your specific requirements:
- Need ordered data and range queries: Use a balanced BST
- Need fastest possible lookups and order doesn't matter: Use a hash table
- Working with strings and need prefix operations: Use a trie
- Need both fast lookups and ordering: Consider a combination of data structures
Real-World Applications
Binary Search Tree
- Database indexing
- Priority queues
- Expression evaluation
- Hierarchical data representation
Hash Table
- Caches
- Symbol tables in compilers
- Database indexing for equality queries
- Implementing sets with fast membership testing
Trie
- Autocomplete systems
- Spell checkers
- IP routing tables
- Word games
Conclusion
Specialized data structures offer significant performance advantages for search operations compared to simple arrays. By understanding the characteristics and trade-offs of each data structure, you can select the most appropriate one for your specific requirements.
In practice, the choice of data structure often depends on factors like the type of searches you need to perform (exact match, range, prefix), the frequency of insertions and deletions, memory constraints, and whether maintaining order is important.
The .NET Framework provides efficient implementations of many of these data structures, but understanding their underlying principles helps you use them effectively and implement custom solutions when needed.