2 - Pattern Matching Algorithms

Pattern matching algorithms are used to find occurrences of a pattern within a text. These algorithms are fundamental in text processing, search engines, bioinformatics, and many other applications.

Naive String Matching

The naive approach to string matching involves checking for the pattern at each possible position in the text by comparing characters one by one.

How Naive String Matching Works

Slide the pattern over the text one by one.
For each position, compare the pattern with the text character by character.
If all characters match, record the position as a match.
If any character doesn't match, move to the next position.

Implementation

/// <summary>
/// Implements the naive string matching algorithm to find all occurrences of a pattern in a text.
/// </summary>
/// <param name="text">The text to search within.</param>
/// <param name="pattern">The pattern to search for.</param>
/// <returns>A list of starting positions where the pattern was found in the text.</returns>
public static List<int> NaiveStringMatch(string text, string pattern)
{
    // Validate input parameters
    if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
    {
        return new List<int>();
    }
    
    List<int> matches = new List<int>();
    int n = text.Length;
    int m = pattern.Length;
    
    // Slide the pattern over the text one position at a time
    for (int i = 0; i <= n - m; i++)
    {
        int j;
        
        // Check for pattern match at current position by comparing each character
        for (j = 0; j < m; j++)
        {
            // If any character doesn't match, stop checking this position
            if (text[i + j] != pattern[j])
            {
                break;
            }
        }
        
        // If j reached the pattern length, we found a complete match
        if (j == m)
        {
            matches.Add(i);
        }
    }
    
    return matches;
}

Time and Space Complexity

Time Complexity: O(n × m) in the worst case, where n is the length of the text and m is the length of the pattern.
Space Complexity: O(1) if we only need to know if the pattern exists, O(n) if we need to store all match positions.

Applications

Simple Text Searches: When the pattern and text are small.
Educational Purposes: To understand the basics of string matching.
Baseline for Comparison: To compare with more advanced algorithms.

Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm improves upon the naive approach by avoiding redundant comparisons using a prefix function that captures the pattern's structure.

How KMP Works

Preprocess the pattern to build a prefix function (or failure function).
Use this function to skip characters that are known to match based on previous comparisons.
When a mismatch occurs, use the prefix function to determine how far to shift the pattern.

Implementation

/// <summary>
/// Implements the Knuth-Morris-Pratt (KMP) string matching algorithm to efficiently find all occurrences 
/// of a pattern in a text by avoiding redundant comparisons.
/// </summary>
/// <param name="text">The text to search within.</param>
/// <param name="pattern">The pattern to search for.</param>
/// <returns>A list of starting positions where the pattern was found in the text.</returns>
public static List<int> KMPSearch(string text, string pattern)
{
    List<int> matches = new List<int>();
    
    // Validate input parameters
    if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
    {
        return matches;
    }
    
    int n = text.Length;
    int m = pattern.Length;
    
    // If pattern is empty or longer than text
    if (m == 0 || m > n)
    {
        return matches;
    }
    
    // Preprocess the pattern to build the LPS (Longest Proper Prefix which is also Suffix) array
    // This array helps us skip characters we know will match based on previously matched characters
    int[] lps = ComputeLPSArray(pattern);
    
    int i = 0; // Index for text
    int j = 0; // Index for pattern
    
    while (i < n)
    {
        // Current characters match - advance both pointers
        if (pattern[j] == text[i])
        {
            i++;
            j++;
        }
        
        // Pattern completely found - record the match and continue searching
        if (j == m)
        {
            // The match starts at (i - j)
            matches.Add(i - j);
            
            // Use the LPS array to determine where to continue matching
            // This is key to the KMP algorithm's efficiency
            j = lps[j - 1];
        }
        // Mismatch after j matches
        else if (i < n && pattern[j] != text[i])
        {
            // If we've matched some characters, use the LPS array to skip redundant comparisons
            // This is where KMP gains its efficiency over naive matching
            if (j != 0)
            {
                j = lps[j - 1];
                // Note: we don't increment i here because we need to check the current text character
                // against the new pattern position determined by the LPS array
            }
            else
            {
                // If we're at the beginning of the pattern, just move to the next text character
                i++;
            }
        }
    }
    
    return matches;
}

/// <summary>
/// Computes the Longest Proper Prefix which is also Suffix (LPS) array for the KMP algorithm.
/// This array helps determine how far to shift the pattern when a mismatch occurs.
/// </summary>
/// <param name="pattern">The pattern for which to compute the LPS array.</param>
/// <returns>An array where lps[i] contains the length of the longest proper prefix of pattern[0..i] 
/// which is also a suffix of pattern[0..i].</returns>
private static int[] ComputeLPSArray(string pattern)
{
    int m = pattern.Length;
    int[] lps = new int[m];
    
    // Length of the previous longest prefix suffix
    int len = 0;
    
    // The first entry in lps is always 0 (a single character has no proper prefix that is also a suffix)
    lps[0] = 0;
    int i = 1;
    
    // Calculate lps[i] for i = 1 to m-1
    while (i < m)
    {
        // If characters match, extend the prefix-suffix
        if (pattern[i] == pattern[len])
        {
            len++;
            lps[i] = len;
            i++;
        }
        else
        {
            // Characters don't match
            if (len != 0)
            {
                // Look for a shorter prefix that is also a suffix
                // This is a key insight: we don't need to start from scratch
                len = lps[len - 1];
                // Note: we don't increment i here because we need to recheck the current character
            }
            else
            {
                // No prefix matches, so lps[i] is 0
                lps[i] = 0;
                i++;
            }
        }
    }
    
    return lps;
}

Time and Space Complexity

Time Complexity: O(n + m) where n is the length of the text and m is the length of the pattern.
Space Complexity: O(m) for the LPS array.

Applications

Text Editors: For efficient search and replace operations.
Bioinformatics: Finding DNA patterns.
Network Security: Intrusion detection systems that look for specific patterns in network traffic.

Boyer-Moore Algorithm

The Boyer-Moore algorithm is one of the most efficient string matching algorithms, especially for large alphabets. It uses two heuristics: the bad character rule and the good suffix rule.

How Boyer-Moore Works

Preprocess the pattern to build the bad character and good suffix tables.
Start matching from the end of the pattern rather than the beginning.
When a mismatch occurs, use the bad character and good suffix rules to determine how far to shift the pattern.
The algorithm can skip portions of the text, making it very efficient for long patterns.

Implementation

/// <summary>
/// Implements the Boyer-Moore string matching algorithm, which is efficient for large alphabets
/// by using the bad character rule to skip portions of the text.
/// </summary>
/// <param name="text">The text to search within.</param>
/// <param name="pattern">The pattern to search for.</param>
/// <returns>A list of starting positions where the pattern was found in the text.</returns>
public static List<int> BoyerMooreSearch(string text, string pattern)
{
    List<int> matches = new List<int>();
    
    // Validate input parameters
    if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
    {
        return matches;
    }
    
    int n = text.Length;
    int m = pattern.Length;
    
    // Handle edge cases
    if (m == 0 || m > n)
    {
        return matches;
    }
    
    // Preprocess the pattern to build the bad character table
    // This table helps determine how far to shift the pattern when a mismatch occurs
    int[] badChar = PreprocessBadCharacter(pattern);
    
    // s is the shift of the pattern with respect to text
    int s = 0;
    
    // Continue until the pattern can still fit in the remaining text
    while (s <= (n - m))
    {
        int j = m - 1;
        
        // Start matching from the end of the pattern
        // This is a key feature of Boyer-Moore: matching from right to left
        while (j >= 0 && pattern[j] == text[s + j])
        {
            j--;
        }
        
        // If j became negative, the pattern was found (all characters matched)
        if (j < 0)
        {
            // Record the match
            matches.Add(s);
            
            // Shift the pattern to align with the next potential match
            // If we're not at the end of the text, use the bad character rule
            if (s + m < n)
            {
                // Align the pattern so the next character in text matches its last occurrence in pattern
                s += m - badChar[text[s + m]];
            }
            else
            {
                // We're at the end of the text, so just move one position
                s += 1;
            }
        }
        else
        {
            // A mismatch occurred at position j
            // Use the bad character rule to determine how far to shift the pattern
            // The shift should align the mismatched character in text with its rightmost occurrence in pattern
            int badCharShift = j - badChar[text[s + j]];
            
            // Ensure we always move forward at least one position to avoid infinite loops
            s += Math.Max(1, badCharShift);
        }
    }
    
    return matches;
}

/// <summary>
/// Preprocesses the pattern to create the bad character table for the Boyer-Moore algorithm.
/// This table stores the rightmost occurrence of each character in the pattern.
/// </summary>
/// <param name="pattern">The pattern to preprocess.</param>
/// <returns>An array where badChar[c] is the index of the rightmost occurrence of character c in the pattern,
/// or -1 if the character doesn't appear in the pattern.</returns>
private static int[] PreprocessBadCharacter(string pattern)
{
    int m = pattern.Length;
    
    // Create an array for all possible characters (assuming ASCII)
    // For Unicode, a Dictionary<char, int> would be more appropriate
    int[] badChar = new int[256];
    
    // Initialize all occurrences as -1 (character not found in pattern)
    for (int i = 0; i < 256; i++)
    {
        badChar[i] = -1;
    }
    
    // Fill the actual value of the rightmost occurrence of each character
    // If a character appears multiple times, the last occurrence is recorded
    for (int i = 0; i < m; i++)
    {
        badChar[pattern[i]] = i;
    }
    
    return badChar;
}

Time and Space Complexity

Time Complexity: O(n × m) in the worst case, but O(n/m) in the best case. The average case is much better than the naive algorithm.
Space Complexity: O(k) where k is the size of the alphabet (256 for ASCII).

Applications

Text Editors: For efficient search operations, especially with large texts.
Spell Checkers: For finding similar words.
Bioinformatics: For DNA sequence alignment.

Rabin-Karp Algorithm

The Rabin-Karp algorithm uses hashing to find patterns in text. It calculates a hash value for the pattern and for each possible substring of the text, then compares only when the hash values match.

How Rabin-Karp Works

Calculate the hash value of the pattern.
Calculate the hash value of each m-length substring of the text.
If the hash values match, verify that the substring actually matches the pattern (to handle hash collisions).
Use a rolling hash function to efficiently calculate the hash of the next substring.

Implementation

/// <summary>
/// Implements the Rabin-Karp string matching algorithm, which uses hashing to find patterns in text.
/// This algorithm is particularly useful for multiple pattern matching.
/// </summary>
/// <param name="text">The text to search within.</param>
/// <param name="pattern">The pattern to search for.</param>
/// <returns>A list of starting positions where the pattern was found in the text.</returns>
public static List<int> RabinKarpSearch(string text, string pattern)
{
    List<int> matches = new List<int>();
    
    // Validate input parameters
    if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
    {
        return matches;
    }
    
    int n = text.Length;
    int m = pattern.Length;
    
    // Handle edge cases
    if (m == 0 || m > n)
    {
        return matches;
    }
    
    // Define constants for the hash function
    const int d = 256; // Number of characters in the input alphabet (assuming ASCII)
    const int q = 101; // A prime number for the hash function (larger primes reduce collisions)
    
    // Calculate the value of h = d^(m-1) % q
    // This value is used in the rolling hash calculation
    int h = 1;
    for (int i = 0; i < m - 1; i++)
    {
        h = (h * d) % q;
    }
    
    // Calculate the initial hash values for the pattern and the first window of text
    int p = 0; // Hash value for pattern
    int t = 0; // Hash value for current text window
    
    for (int i = 0; i < m; i++)
    {
        // The hash function treats strings as base-d numbers
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }
    
    // Slide the pattern over the text one position at a time
    for (int i = 0; i <= n - m; i++)
    {
        // If hash values match, verify character by character
        // This step is necessary because different strings can have the same hash value (collision)
        if (p == t)
        {
            // Verify that the substring actually matches the pattern
            bool match = true;
            for (int j = 0; j < m; j++)
            {
                if (text[i + j] != pattern[j])
                {
                    match = false;
                    break;
                }
            }
            
            // If all characters match, record the position
            if (match)
            {
                matches.Add(i);
            }
        }
        
        // Calculate the hash value for the next window of text
        // This is the key to the efficiency of Rabin-Karp: the rolling hash
        if (i < n - m)
        {
            // Remove the contribution of the leftmost character of the current window
            // Add the contribution of the next character to the right
            // The formula is: t = (d * (t - text[i] * h) + text[i + m]) % q
            
            t = (d * (t - text[i] * h) + text[i + m]) % q;
            
            // We might get a negative value for t, so convert it to positive
            if (t < 0)
            {
                t = (t + q);
            }
        }
    }
    
    return matches;
}

Time and Space Complexity

Time Complexity: O(n × m) in the worst case (when all hash values match but strings don't), but O(n + m) on average.
Space Complexity: O(1) for the hash values.

Applications

Plagiarism Detection: Finding duplicate content.
Document Similarity: Identifying similar documents.
Multiple Pattern Matching: Can be extended to search for multiple patterns simultaneously.

Aho-Corasick Algorithm

The Aho-Corasick algorithm efficiently finds all occurrences of a set of patterns in a text. It builds a finite state machine from the patterns and then processes the text in a single pass.

How Aho-Corasick Works

Build a trie (prefix tree) from the set of patterns.
Add failure links to the trie to handle cases where a partial match fails.
Add output links to identify when a pattern has been matched.
Process the text character by character, following the trie and reporting matches.

Implementation

/// <summary>
/// Implements the Aho-Corasick algorithm for efficient multiple pattern matching.
/// This algorithm builds a finite state machine from the patterns and processes the text in a single pass.
/// </summary>
public class AhoCorasick
{
    /// <summary>
    /// Represents a node in the Aho-Corasick trie (prefix tree).
    /// </summary>
    private class TrieNode
    {
        /// <summary>
        /// Maps characters to child nodes in the trie.
        /// </summary>
        public Dictionary<char, TrieNode> Children { get; set; }
        
        /// <summary>
        /// The failure link points to the longest proper suffix of the current node that is also a prefix of some pattern.
        /// </summary>
        public TrieNode Failure { get; set; }
        
        /// <summary>
        /// Indices of patterns that end at this node.
        /// </summary>
        public List<int> PatternIndices { get; set; }
        
        /// <summary>
        /// The character value associated with this node.
        /// </summary>
        public char Value { get; set; }
        
        /// <summary>
        /// Initializes a new instance of the TrieNode class.
        /// </summary>
        /// <param name="value">The character value for this node.</param>
        public TrieNode(char value)
        {
            Value = value;
            Children = new Dictionary<char, TrieNode>();
            PatternIndices = new List<int>();
            Failure = null;
        }
    }
    
    private readonly TrieNode root;
    private readonly string[] patterns;
    
    /// <summary>
    /// Initializes a new instance of the AhoCorasick class with the specified patterns.
    /// </summary>
    /// <param name="patterns">An array of patterns to search for.</param>
    public AhoCorasick(string[] patterns)
    {
        // Validate input
        if (patterns == null)
        {
            throw new ArgumentNullException(nameof(patterns));
        }
        
        this.patterns = patterns;
        root = new TrieNode(' ');
        
        // Build the trie and failure links
        BuildTrie();
        BuildFailureLinks();
    }
    
    /// <summary>
    /// Builds a trie (prefix tree) from the patterns.
    /// </summary>
    private void BuildTrie()
    {
        // Add each pattern to the trie
        for (int i = 0; i < patterns.Length; i++)
        {
            string pattern = patterns[i];
            
            // Skip empty patterns
            if (string.IsNullOrEmpty(pattern))
            {
                continue;
            }
            
            TrieNode current = root;
            
            // Add each character of the pattern to the trie
            foreach (char c in pattern)
            {
                // Create a new node if the character doesn't exist
                if (!current.Children.ContainsKey(c))
                {
                    current.Children[c] = new TrieNode(c);
                }
                
                // Move to the next node
                current = current.Children[c];
            }
            
            // Mark the end of the pattern by storing its index
            current.PatternIndices.Add(i);
        }
    }
    
    /// <summary>
    /// Builds failure links for the trie using breadth-first search.
    /// Failure links are used to efficiently transition to the next state when a mismatch occurs.
    /// </summary>
    private void BuildFailureLinks()
    {
        Queue<TrieNode> queue = new Queue<TrieNode>();
        
        // Set failure links for depth 1 nodes to root
        // These are the immediate children of the root
        foreach (var child in root.Children.Values)
        {
            child.Failure = root;
            queue.Enqueue(child);
        }
        
        // Process the remaining nodes level by level (BFS)
        while (queue.Count > 0)
        {
            TrieNode current = queue.Dequeue();
            
            // Process each child of the current node
            foreach (var entry in current.Children)
            {
                char c = entry.Key;
                TrieNode child = entry.Value;
                
                // Add the child to the queue for later processing
                queue.Enqueue(child);
                
                // Start from the failure link of the current node
                TrieNode failure = current.Failure;
                
                // Find the appropriate failure link for the child
                // Keep following failure links until we find a node that has a child with the same character
                while (failure != null && !failure.Children.ContainsKey(c))
                {
                    failure = failure.Failure;
                }
                
                // If no appropriate failure link was found, set it to the root
                if (failure == null)
                {
                    child.Failure = root;
                }
                else
                {
                    // Set the failure link to the child of the found node with the same character
                    child.Failure = failure.Children[c];
                    
                    // Add pattern indices from the failure node
                    // This is important for finding overlapping patterns
                    child.PatternIndices.AddRange(child.Failure.PatternIndices);
                }
            }
        }
    }
    
    /// <summary>
    /// Searches for all occurrences of the patterns in the given text.
    /// </summary>
    /// <param name="text">The text to search in.</param>
    /// <returns>A list of tuples containing the pattern index and its position in the text.</returns>
    public List<(int patternIndex, int position)> Search(string text)
    {
        List<(int, int)> matches = new List<(int, int)>();
        
        // Validate input
        if (string.IsNullOrEmpty(text))
        {
            return matches;
        }
        
        TrieNode current = root;
        
        // Process each character in the text
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            
            // Follow failure links until we find a node that has a transition for the current character
            // or until we reach the root
            while (current != root && !current.Children.ContainsKey(c))
            {
                current = current.Failure;
            }
            
            // If the current node has a transition for the character, follow it
            if (current.Children.ContainsKey(c))
            {
                current = current.Children[c];
            }
            
            // Check if any patterns end at the current node
            // If so, record the matches
            foreach (int patternIndex in current.PatternIndices)
            {
                // Calculate the starting position of the match
                int position = i - patterns[patternIndex].Length + 1;
                matches.Add((patternIndex, position));
            }
        }
        
        return matches;
    }
}

/// <summary>
/// Demonstrates how to use the AhoCorasick class for multiple pattern matching.
/// </summary>
public static void AhoCorasickExample()
{
    // Define the patterns to search for
    string[] patterns = { "he", "she", "his", "hers" };
    
    // Define the text to search in
    string text = "ushers";
    
    // Create an instance of the AhoCorasick class with the patterns
    AhoCorasick ac = new AhoCorasick(patterns);
    
    // Search for all occurrences of the patterns in the text
    List<(int patternIndex, int position)> matches = ac.Search(text);
    
    // Print the results
    foreach (var match in matches)
    {
        Console.WriteLine($"Pattern '{patterns[match.patternIndex]}' found at position {match.position}");
    }
    
    // Expected output:
    // Pattern 'she' found at position 1
    // Pattern 'he' found at position 2
    // Pattern 'hers' found at position 2
}

Time and Space Complexity

Time Complexity: O(n + m + z) where n is the length of the text, m is the total length of all patterns, and z is the number of matches.
Space Complexity: O(m) for the trie.

Applications

Intrusion Detection Systems: Finding multiple attack signatures in network traffic.
Spam Filters: Identifying multiple spam patterns in emails.
Bioinformatics: Finding multiple DNA patterns in a genome.
Search Engines: Finding multiple keywords in documents.

Regular Expression Matching

Regular expressions provide a powerful way to describe patterns in text. Regular expression matching algorithms determine if a string matches a given pattern.

How Regular Expression Matching Works

Parse the regular expression to build a finite automaton (NFA or DFA).
Process the input string using the automaton to determine if it matches the pattern.

Implementation Using .NET's Built-in Support

public static bool IsMatch(string text, string pattern)
{
    try
    {
        Regex regex = new Regex(pattern);
        return regex.IsMatch(text);
    }
    catch (ArgumentException ex)
    {
        Console.WriteLine($"Invalid regular expression: {ex.Message}");
        return false;
    }
}

public static List<string> FindMatches(string text, string pattern)
{
    List<string> matches = new List<string>();
    
    try
    {
        Regex regex = new Regex(pattern);
        MatchCollection matchCollection = regex.Matches(text);
        
        foreach (Match match in matchCollection)
        {
            matches.Add(match.Value);
        }
    }
    catch (ArgumentException ex)
    {
        Console.WriteLine($"Invalid regular expression: {ex.Message}");
    }
    
    return matches;
}

public static List<(string, int)> FindMatchesWithPositions(string text, string pattern)
{
    List<(string, int)> matches = new List<(string, int)>();
    
    try
    {
        Regex regex = new Regex(pattern);
        MatchCollection matchCollection = regex.Matches(text);
        
        foreach (Match match in matchCollection)
        {
            matches.Add((match.Value, match.Index));
        }
    }
    catch (ArgumentException ex)
    {
        Console.WriteLine($"Invalid regular expression: {ex.Message}");
    }
    
    return matches;
}

Time and Space Complexity

Time Complexity: Varies depending on the regular expression. In the worst case, it can be exponential for certain patterns with backtracking.
Space Complexity: O(m) where m is the size of the pattern.

Applications

Form Validation: Checking if input matches expected formats (email, phone, etc.).
Text Processing: Finding and extracting structured data from text.
Lexical Analysis: Tokenizing input for compilers and interpreters.
Data Extraction: Pulling specific information from documents.

Comparison of Pattern Matching Algorithms

Algorithm	Time Complexity	Space Complexity	Best For	Limitations
Naive	O(n × m)	O(1)	Simple cases, short patterns	Inefficient for long patterns
KMP	O(n + m)	O(m)	General-purpose pattern matching	More complex to implement
Boyer-Moore	O(n × m) worst, O(n/m) best	O(k)	Long patterns, large alphabets	Complex preprocessing
Rabin-Karp	O(n × m) worst, O(n + m) average	O(1)	Multiple pattern matching	Hash collisions can slow it down
Aho-Corasick	O(n + m + z)	O(m)	Multiple pattern matching	Memory intensive for large pattern sets
Regex	Varies (can be exponential)	O(m)	Complex pattern matching	Can be slow for certain patterns

Best Practices and Optimization

Choosing the Right Algorithm

Use Naive String Matching for:
- Simple cases with short patterns and texts
- When simplicity is more important than performance
Use KMP for:
- General-purpose pattern matching
- When the pattern has many repeating subpatterns
Use Boyer-Moore for:
- Long patterns and large alphabets
- When the pattern is expected to appear infrequently
Use Rabin-Karp for:
- Multiple pattern matching
- When hash function can be efficiently computed
Use Aho-Corasick for:
- Multiple pattern matching with large pattern sets
- When memory is not a constraint
Use Regular Expressions for:
- Complex pattern matching needs
- When flexibility is more important than raw performance

Optimizing Pattern Matching

Preprocess Patterns: For algorithms like KMP, Boyer-Moore, and Aho-Corasick, preprocess patterns once and reuse for multiple searches.
Choose Appropriate Data Structures: Use efficient data structures for the specific algorithm.
Consider the Alphabet Size: Some algorithms perform better with larger alphabets.
Use Built-in Functions: For regular expressions, use the built-in .NET implementation which is highly optimized.
Limit Backtracking: For regular expressions, avoid patterns that cause excessive backtracking.
Parallel Processing: For large texts, consider processing different segments in parallel.

Comprehensive Example: Pattern Matching Algorithms in Action

The following example demonstrates how to implement and compare different pattern matching algorithms in a real-world scenario. This program searches for multiple patterns in a text document and compares the performance of different algorithms.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace StringPatternMatchingDemo
{
    /// <summary>
    /// Demonstrates and compares different string pattern matching algorithms.
    /// </summary>
    public class PatternMatchingDemo
    {
        /// <summary>
        /// Main entry point for the pattern matching demonstration.
        /// </summary>
        public static void Main()
        {
            Console.WriteLine("String Pattern Matching Algorithms Demonstration");
            Console.WriteLine("================================================");
            
            // Sample text for demonstration
            string text = LoadSampleText();
            
            // Patterns to search for
            string[] patterns = {
                "algorithm",
                "pattern matching",
                "string",
                "complexity",
                "performance"
            };
            
            Console.WriteLine($"Text length: {text.Length} characters");
            Console.WriteLine($"Searching for {patterns.Length} patterns...\n");
            
            // Compare algorithms
            CompareAlgorithms(text, patterns);
            
            // Demonstrate practical application
            Console.WriteLine("\nPractical Application: Keyword Highlighting");
            Console.WriteLine("-------------------------------------------");
            HighlightKeywords(text, patterns);
        }
        
        /// <summary>
        /// Loads sample text for demonstration purposes.
        /// In a real application, this might come from a file or database.
        /// </summary>
        private static string LoadSampleText()
        {
            // For demonstration, we'll use a hardcoded text about algorithms
            // In a real application, you might use:
            // return File.ReadAllText("sample.txt");
            
            return @"
String pattern matching is a fundamental problem in computer science. 
It involves finding occurrences of a pattern within a larger text.
Several algorithms have been developed to solve this problem efficiently.

The naive approach simply checks each position in the text to see if the pattern matches.
While simple to implement, it has O(n*m) time complexity in the worst case,
where n is the length of the text and m is the length of the pattern.

More advanced algorithms like Knuth-Morris-Pratt (KMP) improve performance by avoiding
redundant comparisons. KMP has O(n+m) time complexity, making it much more efficient
for long texts and patterns.

The Boyer-Moore algorithm can achieve even better performance in practice by using
bad character and good suffix rules to skip portions of the text. This makes it
particularly efficient for large alphabets.

The Rabin-Karp algorithm uses hashing to quickly identify potential matches,
making it useful for multiple pattern searches.

Regular expressions provide a powerful way to describe complex patterns,
though they may have higher overhead for simple pattern matching tasks.

When choosing a pattern matching algorithm, consider factors like:
- The size of the text and pattern
- The complexity of the pattern
- The alphabet size
- Performance requirements
- Memory constraints

String pattern matching algorithms are essential in various applications including:
- Text editors and word processors
- Search engines
- DNA sequence analysis
- Plagiarism detection
- Network security
- Data mining and analysis

Understanding the strengths and limitations of each algorithm helps in selecting
the most appropriate one for specific requirements.
";
        }
        
        /// <summary>
        /// Compares the performance of different pattern matching algorithms.
        /// </summary>
        /// <param name="text">The text to search within.</param>
        /// <param name="patterns">The patterns to search for.</param>
        private static void CompareAlgorithms(string text, string[] patterns)
        {
            Console.WriteLine("Algorithm Performance Comparison");
            Console.WriteLine("--------------------------------");
            
            // Create a table header
            Console.WriteLine($"{"Pattern",-20} {"Naive",-10} {"KMP",-10} {"Boyer-Moore",-15} {"Rabin-Karp",-15} {"Regex",-10}");
            Console.WriteLine(new string('-', 80));
            
            foreach (string pattern in patterns)
            {
                // Measure time for each algorithm
                long naiveTime = MeasureExecutionTime(() => NaiveStringMatch(text, pattern));
                long kmpTime = MeasureExecutionTime(() => KMPSearch(text, pattern));
                long boyerMooreTime = MeasureExecutionTime(() => BoyerMooreSearch(text, pattern));
                long rabinKarpTime = MeasureExecutionTime(() => RabinKarpSearch(text, pattern));
                long regexTime = MeasureExecutionTime(() => RegexSearch(text, pattern));
                
                // Display results
                Console.WriteLine($"{pattern,-20} {naiveTime,-10} {kmpTime,-10} {boyerMooreTime,-15} {rabinKarpTime,-15} {regexTime,-10}");
            }
            
            Console.WriteLine("\nTimes are in microseconds (μs). Lower is better.");
            Console.WriteLine("Note: Performance may vary based on pattern characteristics and text content.");
        }
        
        /// <summary>
        /// Measures the execution time of an action in microseconds.
        /// </summary>
        /// <param name="action">The action to measure.</param>
        /// <returns>The execution time in microseconds.</returns>
        private static long MeasureExecutionTime(Action action)
        {
            // For more accurate measurements, we'll run multiple iterations
            const int iterations = 10;
            Stopwatch stopwatch = new Stopwatch();
            
            // Warm up (to avoid JIT compilation overhead)
            action();
            
            // Measure
            stopwatch.Start();
            for (int i = 0; i < iterations; i++)
            {
                action();
            }
            stopwatch.Stop();
            
            // Return average time in microseconds
            return stopwatch.ElapsedTicks * 1_000_000 / (Stopwatch.Frequency * iterations);
        }
        
        /// <summary>
        /// Demonstrates a practical application: highlighting keywords in text.
        /// </summary>
        /// <param name="text">The text to process.</param>
        /// <param name="keywords">The keywords to highlight.</param>
        private static void HighlightKeywords(string text, string[] keywords)
        {
            // In a console application, we'll simulate highlighting by printing
            // a small excerpt around each match
            
            Console.WriteLine("Excerpts containing keywords:");
            
            foreach (string keyword in keywords)
            {
                // Find all occurrences using KMP (efficient for this task)
                List<int> matches = KMPSearch(text, keyword);
                
                if (matches.Count > 0)
                {
                    Console.WriteLine($"\nKeyword: \"{keyword}\" ({matches.Count} occurrences)");
                    
                    // Show the first 3 occurrences with context
                    int showCount = Math.Min(3, matches.Count);
                    for (int i = 0; i < showCount; i++)
                    {
                        int position = matches[i];
                        int contextStart = Math.Max(0, position - 30);
                        int contextLength = Math.Min(keyword.Length + 60, text.Length - contextStart);
                        
                        string before = text.Substring(contextStart, position - contextStart);
                        string match = text.Substring(position, keyword.Length);
                        string after = text.Substring(position + keyword.Length, 
                            Math.Min(30, text.Length - position - keyword.Length));
                        
                        Console.WriteLine($"  ...{before}[{match}]{after}...");
                    }
                    
                    if (matches.Count > 3)
                    {
                        Console.WriteLine($"  (and {matches.Count - 3} more occurrences)");
                    }
                }
                else
                {
                    Console.WriteLine($"\nKeyword: \"{keyword}\" (no occurrences)");
                }
            }
        }
        
        #region Algorithm Implementations
        
        /// <summary>
        /// Implements the naive string matching algorithm.
        /// </summary>
        public static List<int> NaiveStringMatch(string text, string pattern)
        {
            List<int> matches = new List<int>();
            
            if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
                return matches;
            
            int n = text.Length;
            int m = pattern.Length;
            
            for (int i = 0; i <= n - m; i++)
            {
                int j;
                for (j = 0; j < m; j++)
                {
                    if (text[i + j] != pattern[j])
                        break;
                }
                
                if (j == m)
                    matches.Add(i);
            }
            
            return matches;
        }
        
        /// <summary>
        /// Implements the Knuth-Morris-Pratt (KMP) string matching algorithm.
        /// </summary>
        public static List<int> KMPSearch(string text, string pattern)
        {
            List<int> matches = new List<int>();
            
            if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
                return matches;
            
            int n = text.Length;
            int m = pattern.Length;
            
            // Compute the LPS array
            int[] lps = new int[m];
            ComputeLPSArray(pattern, lps);
            
            int i = 0; // Index for text
            int j = 0; // Index for pattern
            
            while (i < n)
            {
                if (pattern[j] == text[i])
                {
                    i++;
                    j++;
                }
                
                if (j == m)
                {
                    matches.Add(i - j);
                    j = lps[j - 1];
                }
                else if (i < n && pattern[j] != text[i])
                {
                    if (j != 0)
                        j = lps[j - 1];
                    else
                        i++;
                }
            }
            
            return matches;
        }
        
        /// <summary>
        /// Computes the Longest Proper Prefix which is also Suffix (LPS) array for KMP.
        /// </summary>
        private static void ComputeLPSArray(string pattern, int[] lps)
        {
            int length = 0;
            int i = 1;
            
            lps[0] = 0; // LPS of first character is always 0
            
            while (i < pattern.Length)
            {
                if (pattern[i] == pattern[length])
                {
                    length++;
                    lps[i] = length;
                    i++;
                }
                else
                {
                    if (length != 0)
                    {
                        length = lps[length - 1];
                    }
                    else
                    {
                        lps[i] = 0;
                        i++;
                    }
                }
            }
        }
        
        /// <summary>
        /// Implements the Boyer-Moore string matching algorithm.
        /// </summary>
        public static List<int> BoyerMooreSearch(string text, string pattern)
        {
            List<int> matches = new List<int>();
            
            if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
                return matches;
            
            int n = text.Length;
            int m = pattern.Length;
            
            // Preprocess bad character table
            int[] badChar = new int[256];
            for (int i = 0; i < 256; i++)
                badChar[i] = -1;
            
            for (int i = 0; i < m; i++)
                badChar[pattern[i]] = i;
            
            int s = 0; // Shift of the pattern
            
            while (s <= (n - m))
            {
                int j = m - 1;
                
                while (j >= 0 && pattern[j] == text[s + j])
                    j--;
                
                if (j < 0)
                {
                    matches.Add(s);
                    s += (s + m < n) ? m - badChar[text[s + m]] : 1;
                }
                else
                {
                    s += Math.Max(1, j - badChar[text[s + j]]);
                }
            }
            
            return matches;
        }
        
        /// <summary>
        /// Implements the Rabin-Karp string matching algorithm.
        /// </summary>
        public static List<int> RabinKarpSearch(string text, string pattern)
        {
            List<int> matches = new List<int>();
            
            if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
                return matches;
            
            int n = text.Length;
            int m = pattern.Length;
            
            // Base value for hash calculation (a prime number)
            const int prime = 101;
            
            // Calculate hash value for pattern and first window of text
            int patternHash = 0;
            int textHash = 0;
            int h = 1;
            
            // Calculate h = pow(256, m-1) % prime
            for (int i = 0; i < m - 1; i++)
                h = (h * 256) % prime;
            
            // Calculate hash value for pattern and first window of text
            for (int i = 0; i < m; i++)
            {
                patternHash = (256 * patternHash + pattern[i]) % prime;
                textHash = (256 * textHash + text[i]) % prime;
            }
            
            // Slide pattern over text one by one
            for (int i = 0; i <= n - m; i++)
            {
                // Check if hash values match
                if (patternHash == textHash)
                {
                    // Check for characters one by one
                    bool match = true;
                    for (int j = 0; j < m; j++)
                    {
                        if (text[i + j] != pattern[j])
                        {
                            match = false;
                            break;
                        }
                    }
                    
                    if (match)
                        matches.Add(i);
                }
                
                // Calculate hash value for next window
                if (i < n - m)
                {
                    textHash = (256 * (textHash - text[i] * h) + text[i + m]) % prime;
                    
                    // Make sure hash value is positive
                    if (textHash < 0)
                        textHash += prime;
                }
            }
            
            return matches;
        }
        
        /// <summary>
        /// Uses .NET's Regex class for pattern matching.
        /// </summary>
        public static List<int> RegexSearch(string text, string pattern)
        {
            List<int> matches = new List<int>();
            
            if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(pattern))
                return matches;
            
            // Escape special regex characters in the pattern
            string escapedPattern = Regex.Escape(pattern);
            
            // Find all matches
            MatchCollection matchCollection = Regex.Matches(text, escapedPattern);
            
            foreach (Match match in matchCollection)
            {
                matches.Add(match.Index);
            }
            
            return matches;
        }
        
        #endregion
    }
}

Step-by-Step Explanation

This comprehensive example demonstrates how to implement and use different pattern matching algorithms in C#. Let's break it down:

Program Structure:
- The program is organized into a class called PatternMatchingDemo with a Main method as the entry point.
- It includes implementations of various pattern matching algorithms and utility methods for performance comparison.
Sample Data:
- The LoadSampleText method provides a text about pattern matching algorithms for demonstration.
- In a real application, this text might come from a file, database, or user input.
Algorithm Comparison:
- The CompareAlgorithms method measures and compares the performance of different pattern matching algorithms.
- It uses the MeasureExecutionTime method to accurately time each algorithm's execution.
- Results are displayed in a table format for easy comparison.
Practical Application:
- The HighlightKeywords method demonstrates a real-world application: finding and highlighting keywords in text.
- It shows context around each match, similar to what a search feature in a text editor might do.
Algorithm Implementations:
- Naive String Matching: The simplest approach, checking each position in the text.
- Knuth-Morris-Pratt (KMP): Uses a preprocessed array to avoid redundant comparisons.
- Boyer-Moore: Uses bad character heuristics to skip portions of the text.
- Rabin-Karp: Uses hashing to quickly identify potential matches.
- Regex: Uses .NET's built-in regular expression engine.
Performance Considerations:
- The program includes warm-up runs to avoid JIT compilation overhead.
- It runs multiple iterations to get more accurate timing results.
- It converts timing results to microseconds for better readability.

Key Takeaways

Algorithm Selection: Different algorithms perform better in different scenarios. The choice depends on factors like text length, pattern complexity, and alphabet size.
Performance Measurement: Accurate performance measurement requires careful methodology, including warm-up runs and multiple iterations.
Practical Applications: Pattern matching algorithms have many real-world applications, such as text search, keyword highlighting, and data analysis.
Implementation Details: Each algorithm has its own preprocessing steps and matching logic, with different time and space complexity characteristics.
Built-in vs. Custom: While .NET provides built-in methods like String.IndexOf and Regex.Matches, understanding the underlying algorithms helps in selecting the most appropriate approach for specific requirements.

Conclusion

Pattern matching algorithms are essential tools for text processing and analysis. By understanding the characteristics, strengths, and limitations of each algorithm, you can choose the most appropriate one for your specific requirements.

Whether you're searching for a single pattern, multiple patterns, or complex patterns described by regular expressions, these algorithms provide efficient solutions for finding and analyzing patterns in text data.

The example program demonstrates how to implement these algorithms in C# and how to use them in practical applications. By studying and experimenting with these implementations, you can gain a deeper understanding of pattern matching algorithms and their applications in software development.

Naive String Matching​

How Naive String Matching Works​

Implementation​

Time and Space Complexity​

Applications​

Knuth-Morris-Pratt (KMP) Algorithm​

How KMP Works​

Implementation​

Time and Space Complexity​

Applications​

Boyer-Moore Algorithm​

How Boyer-Moore Works​

Implementation​

Time and Space Complexity​

Applications​

Rabin-Karp Algorithm​

How Rabin-Karp Works​

Implementation​

Time and Space Complexity​

Applications​

Aho-Corasick Algorithm​

How Aho-Corasick Works​

Implementation​

Time and Space Complexity​

Applications​

Regular Expression Matching​

How Regular Expression Matching Works​

Implementation Using .NET's Built-in Support​

Time and Space Complexity​

Applications​

Comparison of Pattern Matching Algorithms​

Best Practices and Optimization​

Choosing the Right Algorithm​

Optimizing Pattern Matching​

Comprehensive Example: Pattern Matching Algorithms in Action​

Step-by-Step Explanation​

Key Takeaways​

Conclusion​

Naive String Matching

How Naive String Matching Works

Implementation

Time and Space Complexity

Applications

Knuth-Morris-Pratt (KMP) Algorithm

How KMP Works

Implementation

Time and Space Complexity

Applications

Boyer-Moore Algorithm

How Boyer-Moore Works

Implementation

Time and Space Complexity

Applications

Rabin-Karp Algorithm

How Rabin-Karp Works

Implementation

Time and Space Complexity

Applications

Aho-Corasick Algorithm

How Aho-Corasick Works

Implementation

Time and Space Complexity

Applications

Regular Expression Matching

How Regular Expression Matching Works

Implementation Using .NET's Built-in Support

Time and Space Complexity

Applications

Comparison of Pattern Matching Algorithms

Best Practices and Optimization

Choosing the Right Algorithm

Optimizing Pattern Matching

Comprehensive Example: Pattern Matching Algorithms in Action

Step-by-Step Explanation

Key Takeaways

Conclusion