4.5 - Regular Expressions
Regular expressions (regex) provide a powerful and flexible way to search, match, and manipulate text based on patterns. In C#, regular expressions are implemented through the System.Text.RegularExpressions namespace, which provides the Regex class and related types for working with regular expressions.
4.5.1 - Regex Basics
Regular expressions consist of patterns that describe sets of strings. These patterns can include literal characters, character classes, quantifiers, and other special constructs.
4.5.1.1 - Creating and Using Regex Objects
The Regex class is the primary class for working with regular expressions in C#:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Create a Regex object
Regex regex = new Regex(@"\d+"); // Match one or more digits
// Test if a string matches the pattern
string input = "The price is $42.99";
bool isMatch = regex.IsMatch(input);
Console.WriteLine($"Does the input contain digits? {isMatch}");
// Find the first match
Match match = regex.Match(input);
if (match.Success)
{
Console.WriteLine($"First match: {match.Value}");
Console.WriteLine($"Position: {match.Index}");
Console.WriteLine($"Length: {match.Length}");
}
// Find all matches
MatchCollection matches = regex.Matches(input);
Console.WriteLine($"Number of matches: {matches.Count}");
foreach (Match m in matches)
{
Console.WriteLine($"Match: {m.Value} at position {m.Index}");
}
// Replace matches
string replaced = regex.Replace(input, "XX");
Console.WriteLine($"After replacement: {replaced}");
}
}
4.5.1.2 - Static Regex Methods
The Regex class also provides static methods for one-time regex operations:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "The price is $42.99";
string pattern = @"\d+";
// Static IsMatch method
bool isMatch = Regex.IsMatch(input, pattern);
Console.WriteLine($"Does the input contain digits? {isMatch}");
// Static Match method
Match match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine($"First match: {match.Value}");
}
// Static Matches method
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match m in matches)
{
Console.WriteLine($"Match: {m.Value}");
}
// Static Replace method
string replaced = Regex.Replace(input, pattern, "XX");
Console.WriteLine($"After replacement: {replaced}");
// Static Split method
string[] parts = Regex.Split("apple,banana;cherry", "[,;]");
Console.WriteLine("Split results:");
foreach (string part in parts)
{
Console.WriteLine($" {part}");
}
}
}
4.5.1.3 - Basic Regex Patterns
Here are some common regex patterns and their meanings:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Test various regex patterns
TestPattern(@"\d", "Matches a single digit", "The number is 42");
TestPattern(@"\d+", "Matches one or more digits", "The number is 42");
TestPattern(@"\w", "Matches a single word character (letter, digit, or underscore)", "Hello, World!");
TestPattern(@"\w+", "Matches one or more word characters", "Hello, World!");
TestPattern(@"\s", "Matches a single whitespace character", "Hello, World!");
TestPattern(@"\s+", "Matches one or more whitespace characters", "Hello, World!");
TestPattern(@"[aeiou]", "Matches any vowel", "Hello, World!");
TestPattern(@"[^aeiou]", "Matches any character that is not a vowel", "Hello, World!");
TestPattern(@"^Hello", "Matches 'Hello' at the start of the string", "Hello, World!");
TestPattern(@"World!$", "Matches 'World!' at the end of the string", "Hello, World!");
TestPattern(@"a|b", "Matches either 'a' or 'b'", "apple and banana");
TestPattern(@"(apple|banana)", "Matches either 'apple' or 'banana'", "I like apples and bananas");
TestPattern(@"a{2}", "Matches exactly 2 consecutive 'a' characters", "The balloon floated away");
TestPattern(@"a{2,}", "Matches 2 or more consecutive 'a' characters", "The balloon floated away");
TestPattern(@"a{1,3}", "Matches 1 to 3 consecutive 'a' characters", "The balloon floated away");
}
static void TestPattern(string pattern, string description, string input)
{
Console.WriteLine($"Pattern: {pattern}");
Console.WriteLine($"Description: {description}");
Console.WriteLine($"Input: {input}");
MatchCollection matches = Regex.Matches(input, pattern);
Console.WriteLine($"Matches ({matches.Count}):");
foreach (Match match in matches)
{
Console.WriteLine($" '{match.Value}' at position {match.Index}");
}
Console.WriteLine();
}
}
4.5.2 - Pattern Matching
This section covers the building blocks of patterns: character classes, quantifiers, anchors and boundaries, and a few commonly used validation patterns.
4.5.2.1 - Character Classes
Character classes allow you to match any one of a set of characters:
Example:
using System;
using System.Linq; // Needed for Cast<Match>() in TestPattern
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Predefined character classes
TestPattern(@"\d", "Digits", "abc123xyz");
TestPattern(@"\D", "Non-digits", "abc123xyz");
TestPattern(@"\w", "Word characters", "abc123_xyz");
TestPattern(@"\W", "Non-word characters", "abc123_xyz!");
TestPattern(@"\s", "Whitespace", "Hello World");
TestPattern(@"\S", "Non-whitespace", "Hello World");
// Custom character classes
TestPattern(@"[aeiou]", "Vowels", "Hello World");
TestPattern(@"[^aeiou]", "Non-vowels", "Hello World");
TestPattern(@"[a-z]", "Lowercase letters", "Hello World");
TestPattern(@"[A-Z]", "Uppercase letters", "Hello World");
TestPattern(@"[0-9]", "ASCII digits (\\d also matches other Unicode digits)", "Hello 123");
TestPattern(@"[a-zA-Z]", "Letters", "Hello 123");
TestPattern(@"[a-zA-Z0-9]", "Alphanumeric characters", "Hello 123!");
}
static void TestPattern(string pattern, string description, string input)
{
Console.WriteLine($"Pattern: {pattern}");
Console.WriteLine($"Description: {description}");
Console.WriteLine($"Input: {input}");
MatchCollection matches = Regex.Matches(input, pattern);
Console.WriteLine($"Matches ({matches.Count}):");
if (matches.Count > 0)
{
string result = string.Join(", ", matches.Cast<Match>().Select(m => $"'{m.Value}'"));
Console.WriteLine($" {result}");
}
Console.WriteLine();
}
}
4.5.2.2 - Quantifiers
Quantifiers specify how many times a character or group should be matched:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Basic quantifiers
TestPattern(@"a+", "One or more 'a'", "aardvark");
TestPattern(@"a*", "Zero or more 'a'", "baaardvark");
TestPattern(@"a?", "Zero or one 'a'", "baaardvark");
// Specific quantifiers
TestPattern(@"a{2}", "Exactly 2 'a'", "baaardvark");
TestPattern(@"a{2,}", "2 or more 'a'", "baaardvark");
TestPattern(@"a{1,3}", "1 to 3 'a'", "baaardvark");
// Greedy vs. lazy quantifiers
TestPattern(@"<.+>", "Greedy match between < and >", "<tag>value</tag>");
TestPattern(@"<.+?>", "Lazy match between < and >", "<tag>value</tag>");
// Combining character classes and quantifiers
TestPattern(@"\d+", "One or more digits", "The price is $42.99");
TestPattern(@"\w+", "One or more word characters", "Hello, World!");
TestPattern(@"\s+", "One or more whitespace characters", "Hello, World!");
}
static void TestPattern(string pattern, string description, string input)
{
Console.WriteLine($"Pattern: {pattern}");
Console.WriteLine($"Description: {description}");
Console.WriteLine($"Input: {input}");
MatchCollection matches = Regex.Matches(input, pattern);
Console.WriteLine($"Matches ({matches.Count}):");
foreach (Match match in matches)
{
Console.WriteLine($" '{match.Value}' at position {match.Index}");
}
Console.WriteLine();
}
}
4.5.2.3 - Anchors and Boundaries
Anchors and boundaries match positions rather than characters:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Start and end anchors
TestPattern(@"^Hello", "Starts with 'Hello'", "Hello, World!");
TestPattern(@"^Hello", "Starts with 'Hello'", "Hi, Hello, World!");
TestPattern(@"World!$", "Ends with 'World!'", "Hello, World!");
TestPattern(@"World!$", "Ends with 'World!'", "Hello, World! Hi");
// Word boundaries
TestPattern(@"\bcat\b", "The word 'cat'", "The cat sat on the mat");
TestPattern(@"\bcat\b", "The word 'cat'", "The cats sat on the mat");
TestPattern(@"\bcat\b", "The word 'cat'", "The concatenation of strings");
// Non-word boundaries
TestPattern(@"cat\B", "'cat' not at a word boundary", "The cat sat on the mat");
TestPattern(@"cat\B", "'cat' not at a word boundary", "The cats sat on the mat");
TestPattern(@"cat\B", "'cat' not at a word boundary", "The concatenation of strings");
// Line anchors in multiline mode
string multiline = "First line\nSecond line\nThird line";
TestPatternWithOptions(@"^Second", "Line starting with 'Second'", multiline, RegexOptions.Multiline);
TestPatternWithOptions(@"line$", "Line ending with 'line'", multiline, RegexOptions.Multiline);
}
static void TestPattern(string pattern, string description, string input)
{
Console.WriteLine($"Pattern: {pattern}");
Console.WriteLine($"Description: {description}");
Console.WriteLine($"Input: {input}");
MatchCollection matches = Regex.Matches(input, pattern);
Console.WriteLine($"Matches ({matches.Count}):");
foreach (Match match in matches)
{
Console.WriteLine($" '{match.Value}' at position {match.Index}");
}
Console.WriteLine();
}
static void TestPatternWithOptions(string pattern, string description, string input, RegexOptions options)
{
Console.WriteLine($"Pattern: {pattern}");
Console.WriteLine($"Description: {description}");
Console.WriteLine($"Input: {input}");
Console.WriteLine($"Options: {options}");
MatchCollection matches = Regex.Matches(input, pattern, options);
Console.WriteLine($"Matches ({matches.Count}):");
foreach (Match match in matches)
{
Console.WriteLine($" '{match.Value}' at position {match.Index}");
}
Console.WriteLine();
}
}
4.5.2.4 - Common Regex Patterns
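Here are some commonly used regex patterns for validation tasks. They are simplified and check only the general shape of the input, so treat them as a first filter rather than a complete validator: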
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Email validation
string emailPattern = @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$";
TestValidation(emailPattern, "Email", "user@example.com");
TestValidation(emailPattern, "Email", "invalid-email");
// Phone number validation (US format)
string phonePattern = @"^\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})$";
TestValidation(phonePattern, "Phone number", "123-456-7890");
TestValidation(phonePattern, "Phone number", "(123) 456-7890");
TestValidation(phonePattern, "Phone number", "1234567890");
TestValidation(phonePattern, "Phone number", "123-45-7890");
// Date validation (MM/DD/YYYY)
string datePattern = @"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}$";
TestValidation(datePattern, "Date (MM/DD/YYYY)", "01/01/2023");
TestValidation(datePattern, "Date (MM/DD/YYYY)", "13/01/2023");
// URL validation
string urlPattern = @"^(https?|ftp)://[^\s/$.?#].[^\s]*$";
TestValidation(urlPattern, "URL", "https://www.example.com");
TestValidation(urlPattern, "URL", "invalid-url");
// Password strength validation (at least 8 characters, with at least one uppercase letter, one lowercase letter, one digit, and one special character)
string passwordPattern = @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\da-zA-Z]).{8,}$";
TestValidation(passwordPattern, "Strong password", "Passw0rd!");
TestValidation(passwordPattern, "Strong password", "password");
// IP address validation
string ipPattern = @"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$";
TestValidation(ipPattern, "IP address", "192.168.1.1");
TestValidation(ipPattern, "IP address", "256.256.256.256");
}
static void TestValidation(string pattern, string description, string input)
{
Console.WriteLine($"Pattern: {pattern}");
Console.WriteLine($"Description: {description} validation");
Console.WriteLine($"Input: {input}");
bool isValid = Regex.IsMatch(input, pattern);
Console.WriteLine($"Is valid: {isValid}");
Console.WriteLine();
}
}
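The date pattern above checks only the shape of the input, so a string such as 02/31/2023 passes. One way to close that gap is to follow the regex check with DateTime.TryParseExact. The following is a minimal sketch of that idea; the DateCheck class and the IsRealDate method are illustrative names, not part of the original example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateCheck
{
    // Same MM/DD/YYYY shape check as datePattern in the example above.
    private static readonly Regex Shape =
        new Regex(@"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}$");

    // Hypothetical helper: true only if the text has the right shape
    // and also names a date that exists on the calendar.
    public static bool IsRealDate(string input)
    {
        return Shape.IsMatch(input)
            && DateTime.TryParseExact(input, "MM/dd/yyyy",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out _);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(DateCheck.IsRealDate("01/31/2023")); // True
        Console.WriteLine(DateCheck.IsRealDate("02/31/2023")); // False: right shape, no such day
        Console.WriteLine(DateCheck.IsRealDate("13/01/2023")); // False: rejected by the regex
    }
}
```

The regex stays useful as a cheap first filter, while the calendar rules are left to DateTime.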
4.5.3 - Regex Options
The RegexOptions enumeration provides options that modify the behavior of regular expressions.
4.5.3.1 - Case Sensitivity
By default, regex matching is case-sensitive. You can make it case-insensitive using the IgnoreCase option:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "Hello, World!";
string pattern = "hello";
// Case-sensitive matching (default)
bool caseSensitiveMatch = Regex.IsMatch(input, pattern);
Console.WriteLine($"Case-sensitive match: {caseSensitiveMatch}");
// Case-insensitive matching
bool caseInsensitiveMatch = Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
Console.WriteLine($"Case-insensitive match: {caseInsensitiveMatch}");
// Using the IgnoreCase option with a Regex object
Regex caseInsensitiveRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match match = caseInsensitiveRegex.Match(input);
if (match.Success)
{
Console.WriteLine($"Found match: {match.Value}");
}
}
}
4.5.3.2 - Multiline Mode
The Multiline option changes the behavior of the ^ and $ anchors to match the beginning and end of each line, rather than the beginning and end of the entire string:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "First line\nSecond line\nThird line";
// Without Multiline option
Console.WriteLine("Without Multiline option:");
MatchCollection matches1 = Regex.Matches(input, @"^.*$");
foreach (Match match in matches1)
{
Console.WriteLine($" '{match.Value}'");
}
// With Multiline option
Console.WriteLine("\nWith Multiline option:");
MatchCollection matches2 = Regex.Matches(input, @"^.*$", RegexOptions.Multiline);
foreach (Match match in matches2)
{
Console.WriteLine($" '{match.Value}'");
}
// Finding lines that start with "Second"
Console.WriteLine("\nLines starting with 'Second':");
MatchCollection matches3 = Regex.Matches(input, @"^Second.*$", RegexOptions.Multiline);
foreach (Match match in matches3)
{
Console.WriteLine($" '{match.Value}'");
}
}
}
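One detail worth knowing: in Multiline mode, $ matches just before \n, and . matches every character except \n. On text with Windows line endings (\r\n), each line matched by ^.*$ therefore keeps a trailing \r. A common workaround is to consume an optional \r before $ and read the line from a capture group. The sketch below assumes that approach; LineSplitter and GetLines are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LineSplitter
{
    // Returns each line without its trailing \r, for both \n and \r\n input.
    public static List<string> GetLines(string text)
    {
        var lines = new List<string>();
        // (.*?) captures the line lazily; \r? swallows a carriage return, if any.
        foreach (Match m in Regex.Matches(text, @"^(.*?)\r?$", RegexOptions.Multiline))
        {
            lines.Add(m.Groups[1].Value);
        }
        return lines;
    }
}

class Program
{
    static void Main()
    {
        string windowsText = "First line\r\nSecond line\r\nThird line";

        // Naive pattern: the first two values end with '\r'.
        foreach (Match m in Regex.Matches(windowsText, @"^.*$", RegexOptions.Multiline))
        {
            Console.WriteLine($"Naive length: {m.Value.Length}");
        }

        // Pattern with \r?: the captured group holds only the line text.
        foreach (string line in LineSplitter.GetLines(windowsText))
        {
            Console.WriteLine($"'{line}'");
        }
    }
}
```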
4.5.3.3 - Single-line Mode
The Singleline option changes the behavior of the . (dot) metacharacter to match any character, including newlines:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "First line\nSecond line\nThird line";
// Without Singleline option
Console.WriteLine("Without Singleline option:");
Match match1 = Regex.Match(input, @"First.*Third");
Console.WriteLine($" Match found: {match1.Success}");
// With Singleline option
Console.WriteLine("\nWith Singleline option:");
Match match2 = Regex.Match(input, @"First.*Third", RegexOptions.Singleline);
Console.WriteLine($" Match found: {match2.Success}");
if (match2.Success)
{
Console.WriteLine($" Match value: '{match2.Value}'");
}
}
}
4.5.3.4 - Other Regex Options
The RegexOptions enumeration includes several other options, such as IgnorePatternWhitespace and ExplicitCapture. Options can be combined with the | operator:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// IgnorePatternWhitespace option
string patternWithWhitespace = @"
\b # Word boundary
[A-Z] # Uppercase letter
[a-z]+ # One or more lowercase letters
\b # Word boundary
";
string input = "Hello World";
Console.WriteLine("IgnorePatternWhitespace option:");
MatchCollection matches1 = Regex.Matches(input, patternWithWhitespace, RegexOptions.IgnorePatternWhitespace);
foreach (Match match in matches1)
{
Console.WriteLine($" '{match.Value}'");
}
// ExplicitCapture option
string capturePattern = @"(\d{3})-(\d{3})-(\d{4})";
string phoneNumber = "123-456-7890";
Console.WriteLine("\nWithout ExplicitCapture option:");
Match match2 = Regex.Match(phoneNumber, capturePattern);
for (int i = 0; i < match2.Groups.Count; i++)
{
Console.WriteLine($" Group {i}: '{match2.Groups[i].Value}'");
}
Console.WriteLine("\nWith ExplicitCapture option:");
Match match3 = Regex.Match(phoneNumber, capturePattern, RegexOptions.ExplicitCapture);
for (int i = 0; i < match3.Groups.Count; i++)
{
Console.WriteLine($" Group {i}: '{match3.Groups[i].Value}'");
}
// Combining multiple options
Console.WriteLine("\nCombining multiple options:");
Regex regex = new Regex(@"^hello.*world$",
RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);
bool isMatch = regex.IsMatch("HELLO\nWORLD");
Console.WriteLine($" Match found: {isMatch}");
}
}
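ExplicitCapture is most useful together with named groups: parentheses used only for grouping stop capturing, while the groups you name are kept. The following is a minimal sketch under that assumption; the pattern and the CapturedGroupNames helper are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExplicitCaptureDemo
{
    // Hypothetical helper: lists the group names a pattern exposes under the given options.
    public static string[] CapturedGroupNames(string pattern, RegexOptions options)
    {
        return new Regex(pattern, options).GetGroupNames();
    }
}

class Program
{
    static void Main()
    {
        // One named group and two unnamed groups.
        string pattern = @"(?<area>\d{3})-(\d{3})-(\d{4})";

        string[] defaultNames = ExplicitCaptureDemo.CapturedGroupNames(pattern, RegexOptions.None);
        string[] explicitNames = ExplicitCaptureDemo.CapturedGroupNames(pattern, RegexOptions.ExplicitCapture);

        // Default: group 0, the two unnamed groups, and "area".
        Console.WriteLine($"Default groups: {string.Join(", ", defaultNames)}");
        // ExplicitCapture: only group 0 and "area" remain.
        Console.WriteLine($"ExplicitCapture groups: {string.Join(", ", explicitNames)}");

        Match match = Regex.Match("123-456-7890", pattern, RegexOptions.ExplicitCapture);
        Console.WriteLine($"Area code: {match.Groups["area"].Value}");
    }
}
```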
4.5.4 - Capture Groups
Capture groups allow you to extract specific parts of a matched pattern.
4.5.4.1 - Basic Capture Groups
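Parentheses () in a regex pattern create capture groups. Group 0 is always the entire match; capture groups are numbered from 1 in the order of their opening parentheses: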
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Basic capture groups
string pattern = @"(\d{3})-(\d{3})-(\d{4})";
string input = "Phone: 123-456-7890";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine($"Full match: {match.Value}");
Console.WriteLine($"Group 1 (area code): {match.Groups[1].Value}");
Console.WriteLine($"Group 2 (prefix): {match.Groups[2].Value}");
Console.WriteLine($"Group 3 (line number): {match.Groups[3].Value}");
}
// Multiple matches with groups
string multipleInput = "Contact us at 123-456-7890 or 987-654-3210";
MatchCollection matches = Regex.Matches(multipleInput, pattern);
Console.WriteLine("\nMultiple matches:");
foreach (Match m in matches)
{
Console.WriteLine($"Phone: {m.Value}");
Console.WriteLine($" Area code: {m.Groups[1].Value}");
Console.WriteLine($" Prefix: {m.Groups[2].Value}");
Console.WriteLine($" Line number: {m.Groups[3].Value}");
}
}
}
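Capture groups are also available in replacement strings, where $1, $2, and so on refer to numbered groups and ${name} refers to a named group. The sketch below reuses the phone pattern from the example above; PhoneFormatter and its method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class PhoneFormatter
{
    // Rearranges the three numbered groups into "(123) 456-7890" form.
    public static string ToParenthesized(string input) =>
        Regex.Replace(input, @"(\d{3})-(\d{3})-(\d{4})", "($1) $2-$3");

    // Keeps the named groups and masks the last four digits.
    public static string MaskLineNumber(string input) =>
        Regex.Replace(input, @"(?<area>\d{3})-(?<prefix>\d{3})-\d{4}", "${area}-${prefix}-XXXX");
}

class Program
{
    static void Main()
    {
        string input = "Phone: 123-456-7890";
        Console.WriteLine(PhoneFormatter.ToParenthesized(input)); // Phone: (123) 456-7890
        Console.WriteLine(PhoneFormatter.MaskLineNumber(input));  // Phone: 123-456-XXXX
    }
}
```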
4.5.4.2 - Named Capture Groups
Named capture groups make it easier to reference groups by name rather than by index:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Named capture groups
string pattern = @"(?<areaCode>\d{3})-(?<prefix>\d{3})-(?<lineNumber>\d{4})";
string input = "Phone: 123-456-7890";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine($"Full match: {match.Value}");
Console.WriteLine($"Area code: {match.Groups["areaCode"].Value}");
Console.WriteLine($"Prefix: {match.Groups["prefix"].Value}");
Console.WriteLine($"Line number: {match.Groups["lineNumber"].Value}");
}
// Parsing dates with named groups
string datePattern = @"(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4})";
string dateInput = "Date: 12/31/2023";
Match dateMatch = Regex.Match(dateInput, datePattern);
if (dateMatch.Success)
{
Console.WriteLine("\nDate components:");
Console.WriteLine($"Month: {dateMatch.Groups["month"].Value}");
Console.WriteLine($"Day: {dateMatch.Groups["day"].Value}");
Console.WriteLine($"Year: {dateMatch.Groups["year"].Value}");
// Convert to DateTime (the constructor throws ArgumentOutOfRangeException for impossible dates such as 99/99/2023)
if (int.TryParse(dateMatch.Groups["month"].Value, out int month) &&
int.TryParse(dateMatch.Groups["day"].Value, out int day) &&
int.TryParse(dateMatch.Groups["year"].Value, out int year))
{
DateTime date = new DateTime(year, month, day);
Console.WriteLine($"Parsed date: {date:D}");
}
}
}
}
4.5.4.3 - Non-capturing Groups
Non-capturing groups allow you to group parts of a pattern without creating a capture group:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Regular capturing groups
string capturePattern = @"(\d{3})-(\d{3})-(\d{4})";
string input = "Phone: 123-456-7890";
Match captureMatch = Regex.Match(input, capturePattern);
Console.WriteLine("With capturing groups:");
for (int i = 0; i < captureMatch.Groups.Count; i++)
{
Console.WriteLine($" Group {i}: '{captureMatch.Groups[i].Value}'");
}
// Non-capturing groups
string nonCapturePattern = @"(?:\d{3})-(?:\d{3})-(\d{4})";
Match nonCaptureMatch = Regex.Match(input, nonCapturePattern);
Console.WriteLine("\nWith non-capturing groups:");
for (int i = 0; i < nonCaptureMatch.Groups.Count; i++)
{
Console.WriteLine($" Group {i}: '{nonCaptureMatch.Groups[i].Value}'");
}
// Mixed capturing and non-capturing groups
string mixedPattern = @"(?:\d{3})-(\d{3})-(\d{4})";
Match mixedMatch = Regex.Match(input, mixedPattern);
Console.WriteLine("\nWith mixed groups:");
for (int i = 0; i < mixedMatch.Groups.Count; i++)
{
Console.WriteLine($" Group {i}: '{mixedMatch.Groups[i].Value}'");
}
}
}
4.5.4.4 - Backreferences
Backreferences allow you to match the same text that was matched by a previous capturing group:
Example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
// Backreferences with numbered groups
string pattern1 = @"(\w+)\s+\1";
string input1 = "hello hello world";
MatchCollection matches1 = Regex.Matches(input1, pattern1);
Console.WriteLine("Backreferences with numbered groups:");
foreach (Match match in matches1)
{
Console.WriteLine($" '{match.Value}'");
Console.WriteLine($" Group 1: '{match.Groups[1].Value}'");
}
// Backreferences with named groups
string pattern2 = @"(?<word>\w+)\s+\k<word>";
string input2 = "hello hello world world";
MatchCollection matches2 = Regex.Matches(input2, pattern2);
Console.WriteLine("\nBackreferences with named groups:");
foreach (Match match in matches2)
{
Console.WriteLine($" '{match.Value}'");
Console.WriteLine($" Group 'word': '{match.Groups["word"].Value}'");
}
// HTML tag matching
string htmlPattern = @"<(\w+)>.*?</\1>";
string htmlInput = "<div>Content</div><span>More content</span>";
MatchCollection htmlMatches = Regex.Matches(htmlInput, htmlPattern);
Console.WriteLine("\nHTML tag matching:");
foreach (Match match in htmlMatches)
{
Console.WriteLine($" '{match.Value}'");
Console.WriteLine($" Tag: '{match.Groups[1].Value}'");
}
}
}
4.5.5 - Regex Performance
Regular expressions can be powerful but also computationally expensive. Here are some techniques to improve regex performance.
4.5.5.1 - Compiled Regex
The Compiled option compiles the pattern to IL instead of interpreting it. This can speed up matching for patterns that are used repeatedly, at the cost of slower Regex construction:
Example:
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string pattern = @"\b\w+\b";
string input = "The quick brown fox jumps over the lazy dog";
// Create regex objects
Regex standardRegex = new Regex(pattern);
Regex compiledRegex = new Regex(pattern, RegexOptions.Compiled);
// Measure performance
int iterations = 100000;
Console.WriteLine($"Running {iterations} iterations...");
// Standard regex
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
// MatchCollection is evaluated lazily; reading Count forces the matching to run
_ = standardRegex.Matches(input).Count;
}
sw1.Stop();
// Compiled regex
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
_ = compiledRegex.Matches(input).Count;
}
sw2.Stop();
Console.WriteLine($"Standard regex: {sw1.ElapsedMilliseconds} ms");
Console.WriteLine($"Compiled regex: {sw2.ElapsedMilliseconds} ms");
}
}
4.5.5.2 - Regex Caching
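The static methods of the Regex class cache the patterns they parse, so repeated calls with the same pattern avoid parsing it again. Constructing a new Regex instance on every call does not use this cache: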
Example:
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string pattern = @"\b\w+\b";
string input = "The quick brown fox jumps over the lazy dog";
// Measure performance
int iterations = 10000;
Console.WriteLine($"Running {iterations} iterations...");
// Creating new Regex objects each time
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex regex = new Regex(pattern);
// Reading Count forces the lazily evaluated MatchCollection to run the matches
_ = regex.Matches(input).Count;
}
sw1.Stop();
// Using static methods (which use caching)
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
_ = Regex.Matches(input, pattern).Count;
}
sw2.Stop();
Console.WriteLine($"New Regex objects: {sw1.ElapsedMilliseconds} ms");
Console.WriteLine($"Static methods: {sw2.ElapsedMilliseconds} ms");
}
}
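When the same pattern is used in many places or on a hot path, an alternative to relying on the static-method cache is to build the Regex once and keep it in a static readonly field. The sketch below assumes that approach; WordCounter is a hypothetical class. Regex.CacheSize is the static property that controls how many patterns the static methods keep cached.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordCounter
{
    // Built once and reused by every call.
    private static readonly Regex WordRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);

    // Reading Count runs the matching and returns the number of words found.
    public static int CountWords(string text) => WordRegex.Matches(text).Count;
}

class Program
{
    static void Main()
    {
        string input = "The quick brown fox jumps over the lazy dog";
        Console.WriteLine($"Words: {WordCounter.CountWords(input)}"); // Words: 9

        // Size of the cache used by the static Regex methods.
        Console.WriteLine($"Static-method cache size: {Regex.CacheSize}");
    }
}
```

A shared field also keeps the pattern in one place, which makes it easier to change later.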
4.5.5.3 - Optimizing Regex Patterns
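How a pattern is written affects how much backtracking the engine performs. The example below times greedy and lazy quantifiers, and anchored and unanchored patterns: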
Example:
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "The quick brown fox jumps over the lazy dog";
// Greedy pattern: .* first runs to the end of the input, then backtracks
string inefficientPattern = @"\w+.*\w+";
// Lazy pattern: .*? grows one character at a time. It matches different (shorter) text and is not always faster
string efficientPattern = @"\w+.*?\w+";
// Measure performance
int iterations = 10000;
Console.WriteLine($"Running {iterations} iterations...");
// Inefficient pattern
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.Match(input, inefficientPattern);
}
sw1.Stop();
// Efficient pattern
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.Match(input, efficientPattern);
}
sw2.Stop();
Console.WriteLine($"Inefficient pattern: {sw1.ElapsedMilliseconds} ms");
Console.WriteLine($"Efficient pattern: {sw2.ElapsedMilliseconds} ms");
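// Anchoring patterns: an anchored pattern can fail fast, but ^\d+$ (entire input is digits) answers a different question than \d+ (input contains digits)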
string unanchoredPattern = @"\d+";
string anchoredPattern = @"^\d+$";
string numericInput = "12345";
string mixedInput = "abc12345def";
Console.WriteLine("\nAnchoring patterns:");
// Unanchored pattern with numeric input
Stopwatch sw3 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.IsMatch(numericInput, unanchoredPattern);
}
sw3.Stop();
// Anchored pattern with numeric input
Stopwatch sw4 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.IsMatch(numericInput, anchoredPattern);
}
sw4.Stop();
Console.WriteLine($"Unanchored pattern (numeric input): {sw3.ElapsedMilliseconds} ms");
Console.WriteLine($"Anchored pattern (numeric input): {sw4.ElapsedMilliseconds} ms");
// Unanchored pattern with mixed input
Stopwatch sw5 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.IsMatch(mixedInput, unanchoredPattern);
}
sw5.Stop();
// Anchored pattern with mixed input
Stopwatch sw6 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.IsMatch(mixedInput, anchoredPattern);
}
sw6.Stop();
Console.WriteLine($"Unanchored pattern (mixed input): {sw5.ElapsedMilliseconds} ms");
Console.WriteLine($"Anchored pattern (mixed input): {sw6.ElapsedMilliseconds} ms");
}
}
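When reading timings like these, keep in mind that the compared patterns do not match the same text, so the numbers are not a like-for-like comparison. The sketch below prints what each pattern actually matches; PatternComparison and FirstMatch are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Hypothetical helper: returns the text of the first match, or "" if there is none.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;
}

class Program
{
    static void Main()
    {
        string input = "The quick brown fox jumps over the lazy dog";

        // Greedy: .* extends as far as possible, so the whole sentence is matched.
        Console.WriteLine($"Greedy: '{PatternComparison.FirstMatch(input, @"\w+.*\w+")}'");

        // Lazy: .*? stops as early as possible, so only 'The quick' is matched.
        Console.WriteLine($"Lazy:   '{PatternComparison.FirstMatch(input, @"\w+.*?\w+")}'");

        // Unanchored asks "contains digits?"; anchored asks "consists only of digits?".
        Console.WriteLine(Regex.IsMatch("abc12345def", @"\d+"));   // True
        Console.WriteLine(Regex.IsMatch("abc12345def", @"^\d+$")); // False
    }
}
```

Choose the quantifier and anchors for the match you need first, and measure performance only between patterns that produce the same result.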
4.5.5.4 - Alternatives to Regex
For simple string operations, using the built-in string methods can be more efficient than regular expressions:
Example:
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = "The quick brown fox jumps over the lazy dog";
// Measure performance
int iterations = 100000;
Console.WriteLine($"Running {iterations} iterations...");
// Using regex to check if a string contains a substring
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.IsMatch(input, "fox");
}
sw1.Stop();
// Using string.Contains
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
input.Contains("fox");
}
sw2.Stop();
Console.WriteLine($"Regex.IsMatch: {sw1.ElapsedMilliseconds} ms");
Console.WriteLine($"string.Contains: {sw2.ElapsedMilliseconds} ms");
// Using regex to split a string
Stopwatch sw3 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
Regex.Split(input, @"\s+");
}
sw3.Stop();
// Using string.Split
Stopwatch sw4 = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
input.Split(' ');
}
sw4.Stop();
Console.WriteLine($"Regex.Split: {sw3.ElapsedMilliseconds} ms");
Console.WriteLine($"string.Split: {sw4.ElapsedMilliseconds} ms");
}
}
Regular expressions are a powerful tool for pattern matching and text manipulation in C#. By understanding the basics of regex patterns, options, capture groups, and performance considerations, you can effectively use regular expressions to solve a wide range of text processing problems.