喜刷刷: string

Showing posts with label string. Show all posts

Saturday, November 29, 2014

[LeetCode] Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

思路1: DP

关键在于如何处理这个'*'号。

状态：和Mininum Edit Distance这类题目一样。

dp[i][j]表示s[0:i-1]是否能和p[0:j-1]匹配。

递推公式：由于只有p中会含有regular expression，所以以p[j-1]来进行分类。

p[j-1] != '.' && p[j-1] != '*'：dp[i][j] = dp[i-1][j-1] && (s[i-1] == p[j-1])

p[j-1] == '.'：dp[i][j] = dp[i-1][j-1]

而关键的难点在于 p[j-1] = '*'。由于星号可以匹配0，1，乃至多个p[j-2]。

1. 匹配0个元素，即消去p[j-2]，此时p[0: j-1] = p[0: j-3]

dp[i][j] = dp[i][j-2]

2. 匹配1个元素，此时p[0: j-1] = p[0: j-2]

dp[i][j] = dp[i][j-1]

3. 匹配多个元素，此时p[0: j-1] = { p[0: j-2], p[j-2], ... , p[j-2] }

dp[i][j] = dp[i-1][j] && (p[j-2]=='.' || s[i-2]==p[j-2])

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        int m = strlen(s), n = strlen(p);
        vector<vector<bool>> dp(m+1, vector<bool>(n+1,false));
        dp[0][0] = true;
        
        for(int i=0; i<=m; i++) {
            for(int j=1; j<=n; j++) {
                if(p[j-1]!='.' && p[j-1]!='*') {
                    if(i>0 && s[i-1]==p[j-1] && dp[i-1][j-1])
                        dp[i][j] = true;
                }
                else if(p[j-1]=='.') {
                    if(i>0 && dp[i-1][j-1])
                        dp[i][j] = true;
                }
                else if(j>1) {  //'*' cannot be the 1st element
                    if(dp[i][j-1] || dp[i][j-2])  // match 0 or 1 preceding element
                        dp[i][j] = true;
                    else if(i>0 && (p[j-2]==s[i-1] || p[j-2]=='.') && dp[i-1][j]) // match multiple preceding elements
                        dp[i][j] = true;
                }
            }
        }
        return dp[m][n];
    }
};

思路2: 双指针扫描

LeetCode作者给的解法，非常巧妙：

http://leetcode.com/2011/09/regular-expression-matching.html

Thursday, November 27, 2014

[LeetCode] Word Ladder I, II

Word Ladder I

Given two words (start and end), and a dictionary, find the length of shortest transformation sequence from start to end, such that:

Only one letter can be changed at a time
Each intermediate word must exist in the dictionary

For example,

Given:
start = "hit"
end = "cog"
dict = ["hot","dot","dog","lot","log"]

As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog",
return its length 5.

Note:

Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.

Word Ladder II

Given two words (start and end), and a dictionary, find all shortest transformation sequence(s) from start to end, such that:

Only one letter can be changed at a time
Each intermediate word must exist in the dictionary

For example,

Given:
start = "hit"
end = "cog"
dict = ["hot","dot","dog","lot","log"]

Return

  [
    ["hit","hot","dot","dog","cog"],
    ["hit","hot","lot","log","cog"]
  ]

Note:

All words have the same length.
All words contain only lowercase alphabetic characters.

思路：

LeetCode中为数不多的考图的难题。尽管题目看上去像字符串匹配题，但从“shortest transformation sequence from start to end”还是能透露出一点图论中最短路径题的味道。如何转化？

1. 将每个单词看成图的一个节点。
2. 当单词s1改变一个字符可以变成存在于字典的单词s2时，则s1与s2之间有连接。
3. 给定s1和s2，问题I转化成了求在图中从s1->s2的最短路径长度。而问题II转化为了求所有s1->s2的最短路径。

无论是求最短路径长度还是求所有最短路径，都是用BFS。在BFS中有三个关键步骤需要实现:

1. 如何找到与当前节点相邻的所有节点。
这里可以有两个策略：
(1) 遍历整个字典，将其中每个单词与当前单词比较，判断是否只差一个字符。复杂度为：n*w，n为字典中的单词数量，w为单词长度。
(2) 遍历当前单词的每个字符x，将其改变成a~z中除x外的任意一个，形成一个新的单词，在字典中判断是否存在。复杂度为：26*w，w为单词长度。
这里可以和面试官讨论两种策略的取舍。对于通常的英语单词来说，长度大多小于100，而字典中的单词数则往往是成千上万，所以策略2相对较优。

2. 如何标记一个节点已经被访问过，以避免重复访问。
可以将访问过的单词从字典中删除。

3. 一旦BFS找到目标单词，如何backtracking找回路径？

Word Ladder I

class Solution {
public:
    int ladderLength(string start, string end, unordered_set<string> &dict) {
        dict.insert(end);
        queue<pair<string,int>> q;
        q.push(make_pair(start,1));
        while(!q.empty()) {
            string s = q.front().first;
            int len = q.front().second;
            if(s==end) return len;
            q.pop();
            vector<string> neighbors = findNeighbors(s, dict);
            for(int i=0; i<neighbors.size(); i++) 
                q.push(make_pair(neighbors[i],len+1));
        }
        return 0;
    }
    
    vector<string> findNeighbors(string s, unordered_set<string> &dict) {
        vector<string> ret;
        for(int i=0; i<s.size(); i++) {
            char c = s[i];
            for(int j=0; j<26; j++) {
                if(c=='a'+j) continue;
                s[i] = 'a'+j;
                if(dict.count(s)) {
                    ret.push_back(s);    
                    dict.erase(s);    
                }
            }
            s[i] = c;
        }
        return ret;
    }
};

Word Ladder II

[LeetCode] Simplify Path

Given an absolute path for a file (Unix-style), simplify it.

For example,
path = "/home/", => "/home"
path = "/a/./b/../../c/", => "/c"

click to show corner cases.

Corner Cases:

Did you consider the case where path = "/../"?
In this case, you should return "/".
Another corner case is the path might contain multiple slashes '/' together, such as "/home//foo/".
In this case, you should ignore redundant slashes and return "/home/foo".

思路：

Unix的path规则可以在这里了解：

http://en.wikipedia.org/wiki/Path_(computing)

归下类的话，有四种字符串：

1. "/"：为目录分隔符，用来分隔两个目录。

2. "."：当前目录

3. ".."：上层目录

4. 其他字符串：目录名

简化的核心是要找出所有的目录，并且如果遇到".."，需要删除上一个目录。

class Solution {
public:
    string simplifyPath(string path) {
        string ret, curDir;
        vector<string> allDir;
        
        path.push_back('/');
        for(int i=0; i<path.size(); i++) {
            if(path[i]=='/') {
                if(curDir.empty()) {
                    continue;
                }
                else if(curDir==".") {
                    curDir.clear();
                }
                else if(curDir=="..") {
                    if(!allDir.empty())
                        allDir.pop_back();
                    curDir.clear();
                }
                else {
                    allDir.push_back(curDir);
                    curDir.clear();
                }
            }
            else {
                curDir.push_back(path[i]);
            }
        }
        
        
        for(int i=0; i<allDir.size(); i++) 
            ret.append("/"+allDir[i]);
        if(ret.empty()) ret = "/";
        return ret;
    }
};

Wednesday, November 26, 2014

[LeetCode] Longest Valid Parentheses

Given a string containing just the characters '(' and ')', find the length of the longest valid (well-formed) parentheses substring.

For "(()", the longest valid parentheses substring is "()", which has length = 2.

Another example is ")()())", where the longest valid parentheses substring is "()()", which has length = 4.

思路1：DP

求极值问题一般想到DP或Greedy，显然Greedy在这里不太适用，只有用DP了。

1. 状态：
DP[i]：以s[i-1]为结尾的longest valid parentheses substring的长度。

2. 通项公式：
s[i] = '('：
DP[i] = 0

s[i] = ')'：找i前一个字符的最长括号串DP[i]的前一个字符j = i-2-DP[i-1]
DP[i] = DP[i-1] + 2 + DP[j]，如果j >=0，且s[j] = '('
DP[i] = 0，如果j<0，或s[j] = ')'

......... ( x x x x )
j i-2 i-1

证明：不存在j' < j，且s[j' : i]为valid parentheses substring。
假如存在这样的j'，则s[j'+1 : i-1]也valid。那么对于i-1来说：
( x x x x x
j' j'+1 i-1
这种情况下，i-1是不可能有比S[j'+1 : i-1]更长的valid parentheses substring的。

3. 计算方向
显然自左向右，且DP[0] = 0

class Solution {
public:
    int longestValidParentheses(string s) {
        int n = s.size(), maxLen = 0;
        vector<int> dp(n+1,0);
        for(int i=1; i<=n; i++) {
            int j = i-2-dp[i-1];
            if(s[i-1]=='(' || j<0 || s[j]==')') 
                dp[i] = 0;
            else {
                dp[i] = dp[i-1]+2+dp[j];
                maxLen = max(maxLen, dp[i]);
            }
        }
        return maxLen;
    }
};

思路2：stack

括号题自然又想到了stack。仔细想下，这题目其实是Valid Parentheses的升级版。一个valid parentheses substring必然以'('开头以')'结尾，并且中间一定也是一个valid parentheses substring。这也意味着依照Valid Parentheses解法那样，用stack来存储没有配对的'('和')'，并用相应的')'来消去stack top的'('，最长substring的首尾必然会在这个过程中相消。

1. 由于我们需要知道它们之间的长度，所以stack里可以存储各个'('的坐标。
2. 为了能正确计算类似“ ) ( ) ( ) ”这种valid substring连接而组成的valid substring。我们必须也插入')'进stack，作为边界来计算长度。
3. 为了能在stack中区分左右括号，每个插入item定义为pair<int,int>。first为坐标，second表示左还是右括号。

class Solution {
public:
    int longestValidParentheses(string s) {
        stack<pair<int,int>> stk;   // first: index, second: 0:'(', 1:')'
        int maxLen = 0, curLen = 0;
        for(int i=0; i<s.size(); i++) {
            if(s[i]=='(')   // left parenthesis
                stk.push(make_pair(i,0));
            else {          // right parenthesis
                if(stk.empty() || stk.top().second==1)
                    stk.push(make_pair(i,1));
                else {
                    stk.pop();
                    if(stk.empty())
                        curLen = i+1;
                    else
                        curLen = i-stk.top().first;    
                    maxLen = max(maxLen, curLen);
                }
            }
        }
        return maxLen;
    }
};

[LeetCode] Distinct Subsequences

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.

思路：

一看要求解的个数，然后又是string匹配，而且形式上和Minimum Edit Distance那题很像，基本就是DP题跑不了了。DP题惯例的三步走：定义状态，推导递推公式，确定状态计算方向和起始状态。

1. 状态：i, j分别表示T中长度为i的prefix：T[0:i-1]，和S中长度为j的prefix：S[0:j-1]。

DP[i][j]：S[0:j-1]中存在T[0:i-1]作为distinct subsequence的个数。显然如果j<i，DP[i][j] = 0。

2. 递推公式：

(a) T[i]!=s[j]:

T = r a b

S = r c a c b c

DP[i+1][j+1] = DP[i+1][j]

(b) T[i] = s[j]:

T = r a b b

S = r a b b b - DP[i+1][j] = 1

S = r a b b b - DP[i][j] = 2

S = r a b b b /

DP[i+1][j+1] = DP[i][j] + DP[i+1][j]

公式总结：
S[j-1]!= T[i-1]：DP[i][j] = DP[i][j-1]
S[j-1]==T[i-1]：DP[i][j] = DP[i-1][j-1] + DP[i][j-1]

3. 计算方向和起始状态：

DP[i][j]

DP[i+1][j] DP[i+1][j+1]

所以从上向下，从左到右的顺序可以计算。

根据计算顺序，需要先设置第0行、0列的值。
第0列：DP[i][0] = 0，i>0。因为T的长度大于S的长度，不可能成为S的subsequence。
第0行：DP[0][j] = 1，j>=0。这是为了保证第1行的计算正确：

T = r
S = r a r b r c

r a r b r c
1 1 1 1 1 1 1
r 0 1 1 2 2 3 3

4. 计算优化：用滚动数组减少内存消耗。

class Solution {
public:
    int numDistinct(string S, string T) {
        int n = S.size(), m = T.size();
        vector<int> dp(n+1, 1);
        
        for(int i=1; i<=m; i++) {
            int upLeft = dp[0];
            dp[0] = 0;
            for(int j=1; j<=n; j++) {
                int temp = dp[j];
                dp[j] = dp[j-1];
                if(S[j-1]==T[i-1]) dp[j] += upLeft;
                upLeft = temp;
            }
        }
        
        return dp[n];
    }
};

Tuesday, November 25, 2014

[LeetCode] Palindrome Partitioning I, II

Palindrome Partitioning I

Given a string s, partition s such that every substring of the partition is a palindrome.

Return all possible palindrome partitioning of s.

For example, given s = "aab",
Return

  [
    ["aa","b"],
    ["a","a","b"]
  ]

Palindrome Partitioning II

Given a string s, partition s such that every substring of the partition is a palindrome.

Return the minimum cuts needed for a palindrome partitioning of s.

For example, given s = "aab",
Return 1 since the palindrome partitioning ["aa","b"] could be produced using 1 cut.

思路：Palindrome Partitioning I

遇到要求所有组合、可能、排列等解集的题目，一般都是用DFS + backtracking来做。要分割回文的前提是能够判断回文。在做DFS的时候，如果每次从start出发查找s[start:end]是否是回文，会使算法复杂度大大增加。而在Longest Palindromic Substring这题中已经知道如何用DP来计算任意s[i:j]是否是回文。因此可以先计算该判断矩阵，在DFS的时候用来剪枝，大大提高效率。

class Solution {
public:
    vector<vector<string>> partition(string s) {
        int n = s.size();
        vector<vector<string>> ret;
        vector<string> sol;
        vector<vector<bool>> isPal(n, vector<bool>(n,false));
        for(int i=n-1; i>=0; i--) {
            for(int j=i; j<n; j++) {
                if((i+1>=j-1 || isPal[i+1][j-1]) && s[i]==s[j])
                    isPal[i][j] = true;
            }
        }
        findPartitions(s, 0, isPal, sol, ret);
        return ret; 
    }
    
    void findPartitions(string &s, int start, vector<vector<bool>> &isPal, vector<string> &sol, vector<vector<string>> &ret) {
        if(start==s.size()) {
            ret.push_back(sol);
            return;
        }
        
        for(int i=start; i<s.size(); i++) {
            if(isPal[start][i]) {
                int len = i-start+1;
                sol.push_back(s.substr(start, len));
                findPartitions(s, i+1, isPal, sol, ret);
                sol.pop_back();
            }
        }
    }
};

思路：Palindrome Partitioning II

整体的思路是一维DP。DP[i]表示长度为i的prefix：s[0: i-1]的min cut数量。
DP[i] = min (DP[j] + 1) ，对所有 0<=j<i，且s[j: i-1]为palindrome。
和I同样的技巧，用DP先计算存储一个palindrome判断矩阵isPal[i][j]，便于后面一维DP计算中能迅速判断s[j: i-1]是否为palindrome。

有一个更优的解法参考：
https://oj.leetcode.com/discuss/9476/solution-does-not-need-table-palindrome-right-uses-only-space

class Solution {
public:
    int minCut(string s) {
        int n = s.size();
        if(n<=1) return 0;
        vector<vector<bool>> isPal(n, vector<bool>(n, false));
        for(int i=n-1; i>=0; i--) {
            for(int j=i; j<n; j++) {
                if((i+1>j-1 || isPal[i+1][j-1]) && s[i]==s[j])
                    isPal[i][j] = true;
            }
        }
        
        vector<int> dp(n+1,INT_MAX);
        dp[0] = -1;
        for(int i=1; i<=n; i++) {
            for(int j=i-1; j>=0; j--) {
                if(isPal[j][i-1]) {
                    dp[i] = min(dp[i], dp[j]+1);
                }
            }
        }
        return dp[n];
    }
};

[LeetCode] Text Justification

Given an array of words and a length L, format the text such that each line has exactly L characters and is fully (left and right) justified.

You should pack your words in a greedy approach; that is, pack as many words as you can in each line. Pad extra spaces ' ' when necessary so that each line has exactlyL characters.

Extra spaces between words should be distributed as evenly as possible. If the number of spaces on a line do not divide evenly between words, the empty slots on the left will be assigned more spaces than the slots on the right.

For the last line of text, it should be left justified and no extra space is inserted between words.

For example,
words: ["This", "is", "an", "example", "of", "text", "justification."]
L: 16.

Return the formatted lines as:

[
   "This    is    an",
   "example  of text",
   "justification.  "
]

Note: Each word is guaranteed not to exceed L in length.

click to show corner cases.

Corner Cases:

A line other than the last line might contain only one word. What should you do in this case?
In this case, that line should be left-justified.

思路：

比较麻烦的字符串细节实现题。需要解决以下几个问题：

1. 首先要能判断多少个word组成一行：

这里统计读入的所有words的总长curLen，并需要计算空格的长度。假如已经读入words[0:i]。当curLen + i <=L 且加curLen + 1 + word[i+1].size() > L时，一行结束。

2. 知道一行的所有n个words，以及总长curLen之后要决定空格分配：

平均空格数：k = (L - curLen) / (n-1)

前m组每组有空格数k+1：m = (L - curLen) % (n-1)

例子：L = 21，curLen = 14，n = 4

k = (21 - 14) / (4-1) = 2

m = (21 - 14) % (4-1) = 1

A---B--C--D

3. 特殊情况：

(a) 最后一行：当读入到第i = words.size()-1 个word时为最后一行。该行k = 1，m = 0

(b) 一行只有一个word：此时n-1 = 0，计算(L - curLen)/(n-1)会出错。该行k = L-curLen, m = 0

class Solution {
public:
    vector<string> fullJustify(vector<string> &words, int L) {
        int start = 0, end = -1, totLen = 0;
        bool isLast = false;
        vector<string> ret;
        int i=0;
        while(i<words.size()) {
            if(words[i].size()>L) return ret;
            int newLen = totLen + (end-start+1) + words[i].size();
            if(newLen<=L) { // words[i] can be in current line
                end  = i;
                totLen += words[i].size();
                i++;
            }
            else {  // word[i-1] is the end of a line
                string line = createLine(words, L, start, end, totLen, false);
                ret.push_back(line);
                start = i;
                end = i-1;
                totLen = 0;
            }
        }
        
        string lastLine = createLine(words, L, start, end, totLen, true);
        ret.push_back(lastLine);
        return ret;
    }
    
    
    string createLine(vector<string> &words, int L, int start, int end, int totLen, bool isLast) {
        string ret;
        if(start<0 || end>=words.size() || start>end) return ret;
        ret.append(words[start]);
        int n = end-start+1;
        
        // special case: one word or last line - left justified
        if(n==1 || isLast) {
            for(int i=start+1; i<=end; i++) 
                ret.append(" " + words[i]);
            int j = L-totLen-(n-1);
            ret.append(j, ' ');
            return ret;
        }
        
        // normal case: fully justified
        int k = (L-totLen)/(n-1), m = (L-totLen)%(n-1);
        for(int i=start+1; i<=end; i++) {
            int nspace = i-start<=m ? k+1 : k;
            ret.append(nspace,' ');
            ret.append(words[i]);
        }
        return ret;
    }
};

[LeetCode] Edit Distance

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character

思路：

很多算法教科书上都有的经典二维DP问题。

1. 状态：

DP[i+1][j+1]：word1[0:i] -> word2[0:j]的edit distance。

2. 通项公式：

考虑word1[0:i] -> word2[0:j]的最后一次edit。无非题目中给出的三种方式：

a) 插入一个字符：word1[0:i] -> word2[0:j-1]，然后在word1[0:i]后插入word2[j]

DP[i+1][j+1] = DP[i+1][j]+1

b) 删除一个字符：word1[0:i-1] -> word2[0:j]，然后删除word1[i]

DP[i+1][j+1] = DP[i][j+1]+1

c) 替换一个字符：word1[0:i-1] -> word2[0:j-1]

word1[i] != word2[j]时，word1[i] -> word2[j]：DP[i+1][j+1] = DP[i][j] + 1

word1[i] == word2[j]时：DP[i+1][j+1] = DP[i][j]

所以min editor distance应该为：

DP[i+1][j+1] = min(DP[i][j] + k, DP[i+1][j]+1, DP[i][j+1]+1)

word1[i]==word2[j] -> k = 0, 否则k = 1

3. 计算方向：

replace (i, j) delete (i, j+1)

insert (i+1, j) (i+1, j+1)

可见要求DP[i+1][j+1]，必须要知道二维矩阵中左上，上方和下方的3个值。所以当我们确定第0行和第0列的值后，就可以从上到下、从左到右的计算了。

4. 起始、边界值

DP[0][i] = i： word1为空，要转化到word2[0:i-1]，需要添加i个字符。

DP[i][0] = i： word2为空，要从word1转化到空字符串，需要删除i个字符。

class Solution {
public:
    int minDistance(string word1, string word2) {
        int m = word1.size(), n = word2.size();
        vector<vector<int>> dp(m+1, vector<int>(n+1, 0));
        for(int j=1; j<=n; j++)
            dp[0][j] = j;
        
        for(int i=1; i<=m; i++) {
            dp[i][0] = i;
            for(int j=1; j<=n; j++) {
                dp[i][j] = dp[i-1][j-1];
                if(word1[i-1]!=word2[j-1]) dp[i][j]++;
                dp[i][j] = min(min(dp[i-1][j]+1, dp[i][j-1]+1), dp[i][j]);
            }
        }
        
        return dp[m][n];
    }
};