LeetCode题目:Edit Distance,字符串之间的编辑距离,动态规划

By | 2012 年 9 月 19 日

这个题目花了我太长的时间,方向是对的,不过一直没有找到关键的递推。

题目描述
Edit Distance
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character

简单的说,就是将一个字符串变成另外一个字符串所用的最少操作数,每次只能增加、删除或者替换一个字符。


做这个题过程挺曲折

      一开始考虑的是,只要找到能重用的最大子串不就行了么,比如将字符:
      abc变成xxxxxaxxxxxbxxxxxcxxxxx,只要abc都可以重用,就只需要20步。
      但是写完了这个方法之后(见下面失败方法1),被一个简单的case击败了:
      将ykyyyab变成abxxxxx,
      这种情况下,如果利用了最长子串ab的话,编辑距离实际上是10,但是不用的话只需要7。也就是说,最长子串不一定能降低编辑距离。


      然后就又想,上面这种失败case的原因在于为了利用最长子串,导致的头尾一删一添,两次操作,还不如只需一次的替换操作。
      于是就限制了一下匹配的区间,比如abxxxxx中的b在匹配的时候,只在kyyya这个范围内才有实际作用,超过这个区间再匹配就会造成得不偿失的一删一添两次操作,得不偿失。
      于是写完代码,悲剧的是,又发现问题:
      sea变成eat,
      按这个算法,编辑距离是3,但实际上是2(删s添t)。


      最后花了太长时间忍不了了,百度之,发现正确的解出来方法是用二维的dp。
      假如我们要将字符串str1变成str2
      sstr1(i)是str1的子串,范围[0到i),sstr1(0)是空串
      sstr2(j)是str2的子串,同上
      d(i,j)表示将sstr1(i)变成sstr2(j)的编辑距离

      首先d(0,t),0<=t<=str1.size()和d(k,0)是很显然的。 当我们要计算d(i,j)时,即计算sstr1(i)到sstr2(j)之间的编辑距离, 此时,设sstr1(i)形式是somestr1c;sstr2(i)形如somestr2d的话, 将somestr1变成somestr2的编辑距离已知是d(i-1,j-1) 将somestr1c变成somestr2的编辑距离已知是d(i,j-1) 将somestr1变成somestr2d的编辑距离已知是d(i-1,j) 那么利用这三个变量,就可以递推出d(i,j)了: 如果c==d,显然编辑距离和d(i-1,j-1)是一样的 如果c!=d,情况稍微复杂一点,

      1. 如果将c替换成d,编辑距离是somestr1变成somestr2的编辑距离 + 1,也就是d(i-1,j-1) + 1
      2. 如果在c后面添加一个字d,编辑距离就应该是somestr1c变成somestr2的编辑距离 + 1,也就是d(i,j-1) + 1
      3. 如果将c删除了,那就是要将somestr1编辑成somestr2d,距离就是d(i-1,j) + 1
      4. 那最后只需要看着三种谁最小,就采用对应的编辑方案了。

      递推公式出来了,程序也就出来了。



终于过了,72ms过大集合

class Solution {
public:
    int minDistance(string word1, string word2) {
        int rows = word1.size() + 1;
        int cols = word2.size() + 1;
        int ** d = (int**)malloc(rows * sizeof(int*));
        for(int i = 0; i < rows; ++i){
            d[i] = (int*)malloc(cols * sizeof(int));
            d[i][0] = i;//sub string in word1 range [0,i) edit to ""
        }
        for(int j = 0; j < cols; ++j)
            d[0][j] = j;//sub string in word2 range [0,j) edit to ""
        for(int i = 1; i < rows; ++i) {
            char ci = word1[i-1];
            for(int j = 1; j < cols; ++j) {
                char cj = word2[j-1];
                //we will edit str1:word1[0,i) to str2:word2[0,j)
                if (ci == cj) {
                    //if ci equal to cj, then the edit ditance of word1[0,i) to word2[0,j)
                    // is the same as word1[0,i-1) to word2[0,j-1)
                    d[i][j] = d[i-1][j-1];
                } else {
                    //if we modify letter ci to cj, there will be 1 operation
                    int dEdit = d[i-1][j-1] + 1;
                    //if we add cj to the end of word1[0,i), then from the edit distance of 
                    // word1[0,i) to word2[0, j -1), we can conclude follow dist
                    int dAdd = d[i][j-1] + 1;
                    //if we delete ci from word1[0,i), and we know the dist
                    // from word1[0,i-1) to word2[0,j), the things done:
                    int dDel = d[i-1][j] + 1;
                    //the minimum one will be the final distance for str1 to str2
                    int min = dEdit < dAdd ? dEdit : dAdd;
                    min = min < dDel ? min : dDel;
                    d[i][j] = min;
                }
            }
        }
        int result = d[rows - 1][cols - 1];
        //delete dist;
        return result;
    }
};



这是两次错误的尝试

  1. 通过寻找最长公共子序列(可间断序列)来解决
    class Solution {
    public:
        class datum{
            public: 
                int lastIndex;
                int len;
                datum(){
                    lastIndex = -1;
                    len = 0;
                }
        };
        inline int locationOfCharInStr(char c, string &str, int startIndex) {
            for( int i = startIndex; i < str.size() ; ++ i ) {
                if (c == str[i])
                    return i;
            }
            return -1;
        }
        int longestCommonArray(string &str1, string &str2) {
            if(str1.size() == 0 || 0 == str2.size())
                return 0;
            vector maxLen(str1.size());
            for( int i = 0; i <  str1.size(); ++i ) {
                char ci = str1[i];
                datum &di = maxLen[i];
                for (int pre = 0 ; pre < i; ++pre ) {
                    datum dpre = maxLen[pre];
                    if (dpre.len == 0 ) continue;
                    int cie = locationOfCharInStr(ci,str2,dpre.lastIndex + 1);
                    if( cie != -1 && di.len < dpre.len + 1 ) {
                        di.len = dpre.len + 1;
                        di.lastIndex = cie;
                    }
                }
                if(di.len == 0) {
                    int cie = locationOfCharInStr(ci,str2,0);
                    if( cie != -1 ) {
                        di.len = 1;
                        di.lastIndex = cie;
                    }
                }
            }
            int maxlen = 0;
            for(int i = 0 ; i < maxLen.size(); ++i) {
                if (maxlen < maxLen[i].len)
                    maxlen = maxLen[i].len;
            }
            return maxlen;
        }
        int minDistance(string word1, string word2) {
            int lcsl = longestCommonArray(word1,word2);
            int maxlen = word1.size();
            if(maxlen < word2.size())
                maxlen = word2.size();
            return maxlen - lcsl;
        }
    };
    
  2. 通过限制查找范围来解决
    class Solution {
    public:
        class datum{
            public: 
                int loc;
                int len;
                datum(){
                    loc = -1;
                    len = 0;
                }
        };
        inline int locationOfCharInStr(char c, string &str, int startIndex, int endIndex) {
            if(endIndex > str.size() - 1)
                endIndex = str.size() - 1;
            for( int i = startIndex; i <= endIndex ; ++ i ) {
                if (c == str[i])
                    return i;
            }
            return -1;
        }
        int minDist(string &str1, string &str2) {
            vector minLen(str2.size());
            for( int i = 0; i <  str2.size(); ++i ) {
                char ci = str2[i];
                datum &di = minLen[i];
                di.len = str1.size();
                di.loc = i;
                int endIndex = str1.size() - str2.size() + i;
                for (int pre = 0 ; pre < i; ++pre ) {
                    datum dpre = minLen[pre];
                    int startIndex = dpre.loc + i - pre;
                    int cie = locationOfCharInStr(ci,str1,startIndex,endIndex);
                    if( cie != -1 && (di.len >= dpre.len ||
                                      di.loc > cie)) {
                        di.len = dpre.len - 1;
                        di.loc = cie;
                    }
                }
                if(di.len == str1.size()) {
                    int cie = locationOfCharInStr(ci,str1,i,endIndex);
                    if( cie != -1 ) {
                        di.len = str1.size() - 1;
                        di.loc = cie;
                    }
                }
            }
            int result = str1.size();
            for(int i = 0 ; i < minLen.size(); ++i) {
                if (result > minLen[i].len)
                    result = minLen[i].len;
            }
            return result;
        }
        int minDistance(string word1, string word2) {
            // Start typing your C/C++ solution below
            // DO NOT write int main() function
            string *pSLen, *pSShort;
            if(word1.size() >= word2.size()){
                pSLen = &word1;
                pSShort = &word2;
            } else {
                pSLen = &word2;
                pSShort = &word1;
            }
            if (pSShort->size() <= 0)
                return pSLen->size();
            int result = minDist(*pSLen,*pSShort);
            return result;
        }
    };
    



Code rewrite at 2012-12-29


// // main.cpp // EditDistance // // Created by Qiu Xiangyu on 12-12-29. // Copyright (c) 2012年 Qiu Xiangyu. All rights reserved. // #include <iostream> #include <string> #include <vector> using namespace std; int editDist(string &str0, string &str1) { if (str0.size() == 0) { return (int)str1.size(); } else if (str1.size() == 0) { return (int)str0.size(); } vector<vector<int> > dists(str0.size() + 1, vector<int>(str1.size() + 1,0)); for(int i0 = 0; i0 <= str0.size(); ++ i0) { dists[i0][0] = i0; } for (int i1 = 0; i1 <= str1.size(); ++i1) { dists[0][i1] = i1; } for (int l0 = 1; l0 <= str0.size(); ++l0) { for (int l1 = 1; l1 <= str1.size(); ++l1) { if (str0[l0 - 1] == str1[l1-1]) { //the same ending of this length pair for strings dists[l0][l1] = dists[l0-1][l1-1]; } else { //for replace int replaceDist = dists[l0-1][l1-1] + 1; int mindist = replaceDist; //insert the last char into substr1 int insertDist = dists[l0][l1-1] + 1; if (mindist > insertDist) { mindist = insertDist; } //delete the last char from substr1 int deleteDist = dists[l0-1][l1] + 1; if (mindist > deleteDist) { mindist = deleteDist; } dists[l0][l1] = mindist; } } } return dists[str0.size()][str1.size()]; } int main(int argc, const char * argv[]) { string str0 = "abc"; string str1 = "bcd"; int editdist = editDist(str0, str1); cout<<"edit distance is : "<<editdist<<endl; return 0; }



Code rewrite 2012-12-30

class Solution {
public:
    int minDistance(string word1, string word2) {
        // Start typing your C/C++ solution below
        // DO NOT write int main() function
        vector > dists(word1.size() + 1, vector(word2.size() + 1, 0));
        for(int l1 = 1; l1 <= word1.size(); ++l1) {
            //by adding chars in sub-word1 to sub-word2 with zero length
            dists[l1][0] = l1;
        }
        for(int l2 = 1; l2 <= word2.size(); ++l2) {
            dists[0][l2] = l2;
        }
        for(int l1 = 1; l1 <= word1.size(); ++l1) {
            for(int l2 = 1; l2 <= word2.size(); ++l2) {
                if(word1[l1-1] == word2[l2-1]) {
                    dists[l1][l2] = dists[l1-1][l2-1];
                } else {
                    //if we can form sub-word1 with length l1-1 to sub-word2 with length l2-1, 
                    //with steps of dists[l1-1][l2-1]
                    //then we can replace the char sub-word2[l2] to sub-word1[l1], 
                    //then we can edit sub-word2 with length l2 to sub-word1 with length l1 
                    //with steps of dists[l1-1][l2-1] + 1
                    int replacedist = dists[l1-1][l2-1] + 1;
                    //if we can form sub-word1 with length l1-1 to sub-word2 with length l2, 
                    //with steps of dists[l1-1][l2]
                    //then we can add the char sub-word1[l1] after sub-word2[l2],
                    //or se can delete the char sub-word1[l1]
                    //then we can edit sub-word2 with length l2 to sub-word1 with length l1 
                    //with steps of dists[l1-1][l2] + 1
                    int adddist = dists[l1-1][l2] + 1;
                    //or if we can form sub-word1 with length l1 to sub-word2 with length l2-1,
                    //with steps of dists[l1][l2-1]
                    //then we can add the char sub-word2[l2] after sub-word1[l1],
                    //or we can delete the char sub-word2[l2]
                    //then we can edit sub-word2 with length l2 to sub-word1 with length l1
                    //with steps of dists[l1][l2-1] + 1
                    int deldist = dists[l1][l2-1] + 1;
                    
                    int mindist = replacedist;
                    if(mindist>adddist) mindist = adddist;
                    if(mindist>deldist) mindist = deldist;
                    dists[l1][l2] = mindist;
                }
            }
        }
        return dists[word1.size()][word2.size()];
    }
};

6 thoughts on “LeetCode题目:Edit Distance,字符串之间的编辑距离,动态规划

  1. weikan

    非常支持把思考过程写完整的算法博客!!看到很多博客都是直接摆上公式,写了等于白写,看都不想看啊

    Reply
  2. Pingback: Edit Distance — leetcode – 剑客|关注科技互联网

  3. Pingback: Edit Distance | luckworld

  4. Pingback: Edit Distance | cloris1000

发表评论

电子邮件地址不会被公开。 必填项已用*标注