Saturday, November 29, 2014

[LeetCode] Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.
'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true


思路1: DP

关键在于如何处理这个'*'号。

状态:和Mininum Edit Distance这类题目一样。
dp[i][j]表示s[0:i-1]是否能和p[0:j-1]匹配。

递推公式:由于只有p中会含有regular expression,所以以p[j-1]来进行分类。
p[j-1] != '.' && p[j-1] != '*':dp[i][j] = dp[i-1][j-1] && (s[i-1] == p[j-1])
p[j-1] == '.':dp[i][j] = dp[i-1][j-1]

而关键的难点在于 p[j-1] = '*'。由于星号可以匹配0,1,乃至多个p[j-2]。
1. 匹配0个元素,即消去p[j-2],此时p[0: j-1] = p[0: j-3]
dp[i][j] = dp[i][j-2]

2. 匹配1个元素,此时p[0: j-1] = p[0: j-2]
dp[i][j] = dp[i][j-1]

3. 匹配多个元素,此时p[0: j-1] = { p[0: j-2], p[j-2], ... , p[j-2] }
dp[i][j] = dp[i-1][j] && (p[j-2]=='.' || s[i-2]==p[j-2])

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        int m = strlen(s), n = strlen(p);
        vector<vector<bool>> dp(m+1, vector<bool>(n+1,false));
        dp[0][0] = true;
        
        for(int i=0; i<=m; i++) {
            for(int j=1; j<=n; j++) {
                if(p[j-1]!='.' && p[j-1]!='*') {
                    if(i>0 && s[i-1]==p[j-1] && dp[i-1][j-1])
                        dp[i][j] = true;
                }
                else if(p[j-1]=='.') {
                    if(i>0 && dp[i-1][j-1])
                        dp[i][j] = true;
                }
                else if(j>1) {  //'*' cannot be the 1st element
                    if(dp[i][j-1] || dp[i][j-2])  // match 0 or 1 preceding element
                        dp[i][j] = true;
                    else if(i>0 && (p[j-2]==s[i-1] || p[j-2]=='.') && dp[i-1][j]) // match multiple preceding elements
                        dp[i][j] = true;
                }
            }
        }
        return dp[m][n];
    }
};


思路2: 双指针扫描

LeetCode作者给的解法,非常巧妙:

http://leetcode.com/2011/09/regular-expression-matching.html

8 comments:

  1. 楼主您好!不太理解星号多匹配时的状态方程,您能仔细说说吗?谢谢!

    ReplyDelete
  2. 第三点 “3. 匹配多个元素” 有错误。“s[i-2]==p[j-2]” 应该是s[i-1]==p[j-2]。 是s当前的字符等于p当前(就是*号)的前面一个字符

    ReplyDelete
    Replies
    1. I think `if(dp[i][j-1] || dp[i][j-2])` is wrong, it should be `if(dp[i][j-2])`, which means p[j-1]p[j-2] is treated as ''.

      Delete
  3. This comment has been removed by the author.

    ReplyDelete
  4. 其实匹配多个元素的转换方程里面包含了匹配1个元素的情况,不需要分开处理。如果当前p[j-1]是‘*’,我们可以先将dp[i][j]的值设为dp[i][j-2]然后去判断是否可以匹配一个或者多个。因为匹配0个或者多个(包括1个)是一个或的关系。

    ReplyDelete
  5. 我觉得递归比DP的方法更难理解一点

    ReplyDelete