### INTRODUCTION TO REGULAR EXPRESSION MATCHING

# INTRODUCTION TO REGULAR EXPRESSION MATCHING

Regular expression matching (often referred to as regex or regexp matching) is a powerful and flexible way to search, match, and manipulate text based on patterns. A regular expression is a sequence of characters that defines a search pattern. These patterns can include a variety of elements, such as literals, metacharacters, and quantifiers, allowing for complex and flexible text matching..

# OVERVIEW OF REGULAR EXPRESSION MATCHING

The Regular Expression Matching problem involves determining if a given string matches a specified pattern defined by a regular expression. This problem is commonly encountered in string matching, text processing, and pattern recognition tasks. The regular expression specifies a set of rules that the input string must follow for a match to occur.

Given two strings **S** and **P** where **S** consists of only lowercase English alphabets while **P** consists of lowercase English alphabets as well as special characters ‘**.’** and ‘***’,** the task is to implement a function to test regular expression such that:

`'.'`

Matches any single character.`'*'`

Matches zero or more of the preceding element.

**Here's an example to illustrate the Regular Expression Matching problem:**

Let's consider the regular expression ** a*b**:

: Zero or more occurrences of the character 'a'.`a*`

: The character 'b'.`b`

Now, suppose we have the following strings:

- Input:
`"b"`

- Does
match the pattern`"b"`

?`a*b`

- No, because there are no 'a' characters before 'b'.

- Does
- Input:
`"aaab"`

- Does
match the pattern`"aaab"`

?`a*b`

- Yes, because there are zero or more 'a' characters followed by 'b'.

- Does

This problem can be solved using dynamic programming or recursion with memoization. The idea is to build a table or use memoization to store the results of subproblems, avoiding redundant computations.

# CODE

### PYTHON

```
# Copyrights to venkys.io
# For more information, visit venkys.io
# time Complexity : O(m * n),
# Space Complexity : O(m * n),
def is_match(user_s, user_p):
# Initialize a 2D table to store results of subproblems
dp = [[False] * (len(user_p) + 1) for _ in range(len(user_s) + 1)]
# Empty pattern matches empty string
dp[0][0] = True
# Handle patterns with '*' at the beginning
for j in range(1, len(user_p) + 1):
if user_p[j - 1] == '*':
dp[0][j] = dp[0][j - 2]
# Build the table using dynamic programming
for i in range(1, len(user_s) + 1):
for j in range(1, len(user_p) + 1):
if user_p[j - 1] == user_s[i - 1] or user_p[j - 1] == '.':
dp[i][j] = dp[i - 1][j - 1]
elif user_p[j - 1] == '*':
dp[i][j] = dp[i][j - 2] or (dp[i - 1][j] if user_s[i - 1] == user_p[j - 2] or user_p[j - 2] == '.' else False)
return dp[len(user_s)][len(user_p)]
if __name__ == "__main__":
# Get input from the user
user_s = input()
user_p = input()
# Test the function with user input
result = is_match(user_s, user_p)
# Display the result
print(result)
```

## Step-by-Step Explanation of the Code

- The code defines a function is_match that takes two parameters: user_s (the user's input string) and user_p (the user's input pattern).
- It initializes a 2D table, dp, with dimensions (len(user_s) + 1) x (len(user_p) + 1). This table will store the results of subproblems.
- It sets dp[0][0] to True, indicating that an empty pattern matches an empty string.
- It handles patterns with '
*' at the beginning by setting dp[0][j] to dp[0][j - 2] if user_p[j - 1] == '*'. - It builds the table using dynamic programming. It iterates through user_s and user_p to fill in the table.
- If user_p[j - 1] matches user_s[i - 1] or user_p[j - 1] is '.', it sets dp[i][j] to dp[i - 1][j - 1].
- If user_p[j - 1] is '*', it sets dp[i][j] to dp[i][j - 2] or (dp[i - 1][j] if user_s[i - 1] == user_p[j - 2] or user_p[j - 2] == '.' else False).
- Finally, it returns dp[len(user_s)][len(user_p)], which represents whether the entire string matches the pattern.
- The code also includes a main block that prompts the user to enter a string and a pattern. It tests the is_match function with the user's input and prints the result.

### JAVA

```
// Copyrights to venkys.io
// For more information, visit venkys.io
// time Complexity : O(m * n),
// Space Complexity : O(m * n),
import java.util.Scanner;
public class RegularExpressionMatching {
public static boolean isMatch(String s, String p) {
// Initialize a 2D table to store results of subproblems
boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
// Empty pattern matches empty string
dp[0][0] = true;
// Handle patterns with '*' at the beginning
for (int j = 1; j <= p.length(); j++) {
if (p.charAt(j - 1) == '*') {
dp[0][j] = dp[0][j - 2];
}
}
// Build the table using dynamic programming
for (int i = 1; i <= s.length(); i++) {
for (int j = 1; j <= p.length(); j++) {
if (p.charAt(j - 1) == s.charAt(i - 1) || p.charAt(j - 1) == '.') {
dp[i][j] = dp[i - 1][j - 1];
} else if (p.charAt(j - 1) == '*') {
dp[i][j] = dp[i][j - 2] || (dp[i - 1][j] && (s.charAt(i - 1) == p.charAt(j - 2) || p.charAt(j - 2) == '.'));
}
}
}
return dp[s.length()][p.length()];
}
public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
// Get user input for the string
// System.out.print("Enter a string: ");
String userString = scanner.nextLine();
// Get user input for the pattern
// System.out.print("Enter a pattern: ");
String userPattern = scanner.nextLine();
// Test the function with user input
boolean result = isMatch(userString, userPattern);
// Display the result
System.out.println(result);
}
}
```

## Step-by-Step Explanation of the Code

- Import the required libraries and classes. In this case, we are importing the Scanner class from the java.util package.
- Define a public class named RegularExpressionMatching.
- Create a static method named isMatch that takes two parameters: s (representing the input string) and p (representing the pattern to match against).
- Initialize a 2D table named dp (dynamic programming) with dimensions s.length() + 1 and p.length() + 1. This table will store the results of subproblems.
- Set the value of dp[0][0] to true, as an empty pattern matches an empty string.
- Handle patterns with '
*' at the beginning. Iterate through the characters of the pattern. If the current character is '*', set dp[0][j] to dp[0][j - 2]. - Build the table using dynamic programming. Iterate through the characters of the string and pattern. For each character, check the following conditions:
- If the current pattern character is equal to the current string character or it is '.', set dp[i][j] to dp[i - 1][j - 1].
- If the current pattern character is '
*', set dp[i][j] to dp[i][j - 2] (considering the '*' as zero occurrence) or check if the previous character in the pattern matches the current character in the string (s.charAt(i - 1) == p.charAt(j - 2)) or the previous character in the pattern is '.' (p.charAt(j - 2) == '.'), and if so, set dp[i][j] to dp[i - 1][j].

- Return the value of dp[s.length()][p.length()], which represents whether the entire string matches the pattern.
- Define a main method. Create a Scanner object named scanner to read user input.
- Prompt the user to enter a string and store the input in the userString variable.
- Prompt the user to enter a pattern and store the input in the userPattern variable.
- Call the isMatch method with userString and userPattern as arguments and store the result in the result variable.
- Print the value of result, which represents whether the string matches the pattern.

### c++

```
// Copyrights to venkys.io
// For more information, visit venkys.io
// time Complexity : O(m * n),
// Space Complexity : O(m * n),
#include <iostream>
#include <vector>
using namespace std;
bool isMatch(const string& s, const string& p) {
// Initialize a 2D table to store results of subproblems
vector<vector<bool>> dp(s.length() + 1, vector<bool>(p.length() + 1, false));
// Empty pattern matches empty string
dp[0][0] = true;
// Handle patterns with '*' at the beginning
for (int j = 1; j <= p.length(); j++) {
if (p[j - 1] == '*') {
dp[0][j] = dp[0][j - 2];
}
}
// Build the table using dynamic programming
for (int i = 1; i <= s.length(); i++) {
for (int j = 1; j <= p.length(); j++) {
if (p[j - 1] == s[i - 1] || p[j - 1] == '.') {
dp[i][j] = dp[i - 1][j - 1];
} else if (p[j - 1] == '*') {
dp[i][j] = dp[i][j - 2] || (dp[i - 1][j] && (s[i - 1] == p[j - 2] || p[j - 2] == '.'));
}
}
}
return dp[s.length()][p.length()];
}
int main() {
// Get input from the user
string userString, userPattern;
// cout << "Enter a string: ";
getline(cin, userString);
// cout << "Enter a pattern: ";
getline(cin, userPattern);
// Test the function with user input
bool result = isMatch(userString, userPattern);
// Display the result
cout << result << endl;
return 0;
}
```

## Step-by-Step Explanation of the Code

- Import necessary libraries: The code begins by importing the required libraries, including iostream and vector. These libraries will be used for input/output operations and for creating dynamic arrays.
- Define the isMatch function: The isMatch function takes two input parameters, s and p, which represent the given string and pattern, respectively. It returns a boolean value indicating whether the string matches the pattern.
- Initialize the dynamic programming table: A 2D vector, dp, is initialized to store the results of subproblems. The dimensions of the table are set to s.length() + 1 and p.length() + 1, and all values are initially set to false.
- Handle empty pattern and empty string case: The table entry dp[0][0] is set to true since an empty pattern matches an empty string.
- Handle patterns with '
*' at the beginning: A loop iterates through each character of the pattern starting from the second character. If the current character is '*', the table entry dp[0][j] is set to the value of dp[0][j - 2]. This accounts for the case where '*' matches zero occurrences of the preceding character. - Build the table using dynamic programming: Nested loops iterate through each character of the string and pattern. If the current characters match or if the pattern character is '.', the table entry dp[i][j] is set to the value of dp[i - 1][j - 1]. If the pattern character is '
*', the table entry dp[i][j] is set to the result of logical OR operations between dp[i][j - 2] and (dp[i - 1][j] && (s[i - 1] == p[j - 2] || p[j - 2] == '.')). This accounts for the case where '*' matches zero or more occurrences of the preceding character. - Return the result: The function returns the final value of dp[s.length()][p.length()].
- Main function: The main function prompts the user to enter a string and a pattern. It then calls the isMatch function with the user input as arguments. The resulting boolean value is stored in the result variable.
- Display the result: Finally, the result is displayed by outputting the value of result.

# TEST CASES

### Example 1 :

**Input :**

Enter a string: "abb"

Enter a pattern: "a.*"

**Output :**

```
True
```

**Explanation :**

replace **.** with b then p becomes ab* now replace * with one preceeding character hence p becomes abb.

### Example 2 :

**Input :**

Enter a string: aaaaaa

Enter a pattern: aaaa

**Output :**

```
False
```

**Explanation :**

" aaaa" does not match the entire string "aaaaaa".

### Example 3 :

**Input :**

Enter a string: apple123

Enter a pattern: "[^0-9]+

**Output :**

```
False
```

**Explanation :**

The regular expression "[^0-9]+" matches sequences without digits, but "apple123" contains numeric characters.

# TIME AND SPACE COMPLEXITY :

The time complexity is `O(m * n)`

, where m is the length of the input string **s**, and n is the length of the pattern string **p**. This is because there is a nested loop iterating through the lengths of both strings.

- The outer loop runs for
**m + 1**iterations. - The inner loop runs for
**n + 1**iterations.

Each iteration involves constant-time operations, so the overall time complexity is O(m * n).

The space complexity is also `O(m * n)`

due to the space used by the 2D table .

- The table has dimensions
**(m + 1) x (n + 1)**, where**m**is the length of string**s**, and**n**is the length of pattern**p**. - Each entry in the table stores a boolean value.

Therefore, the space complexity is O(m * n).

# REAL-WORLD APPLICATION FOR REGULAR EXPRESSION MATCHING

Regular expression matching has numerous real-world applications across various domains due to its ability to define and search for complex patterns in text data. Here are some common real-world applications :

**Text Search and Validation:****Search Engines:**Search engines use regular expressions to match user queries against a vast amount of text data efficiently.**Text Editors and IDEs:**Text editors and integrated development environments (IDEs) often provide find-and-replace functionality using regular expressions.

**Data Validation and Extraction:****Form Validation:**Regular expressions are commonly used to validate user input in forms, such as email addresses, phone numbers, or ZIP codes.**Log Analysis:**Regular expressions help extract specific information from log files, enabling analysis and troubleshooting.

**String Manipulation and Parsing:****Data Cleaning:**Regular expressions are employed to clean and preprocess textual data by removing or replacing specific patterns.**URL Parsing:**Regular expressions can be used to parse and extract components from URLs, such as extracting the domain or parameters.

**Lexical Analysis and Compilers:****Programming Languages:**Regular expressions play a vital role in lexical analysis, where they are used to define tokens in the source code of programming languages.**Compiler Construction:**Regular expressions are part of the toolkit used in the construction of compilers for parsing and tokenizing code.

**Natural Language Processing (NLP):****Named Entity Recognition:**Regular expressions can be used to define patterns for identifying named entities (e.g., names, locations) in text data.**Text Pattern Matching:**In NLP tasks, regular expressions are applied to match specific linguistic patterns or structures.

**Network Security:****Intrusion Detection Systems (IDS):**Regular expressions are used to define patterns of known attack signatures or suspicious network activities in security systems.**Log Analysis for Security:**Regular expressions aid in extracting relevant information from security logs for analysis and threat detection.

**Web Scraping and Data Extraction:****Web Scraping:**Regular expressions are utilized to extract specific data patterns from HTML or other markup languages.**Data Extraction from Documents:**Regular expressions can be employed to extract structured information from documents in various formats.

**Configuration File Parsing:****Parsing Configuration Files:**Regular expressions are used to parse and extract information from configuration files in software applications.