LeetCode: Longest Substring Without Repeating Characters

By | 2012 年 9 月 30 日

Given a string, find the length of the longest substring without repeating characters. For example, the longest substring without repeating letters for “abcabcbb” is “abc”, which the length is 3. For “bbbbb” the longest substring is “b”, with the length of 1.

Analyze

贪心法,从头扫描到尾,O(n)搞定。
用一个map[256]来记录出现过的字符位置。
遇到没有出现过的字符,将map对应位置标记,并且子串长度+1;
遇到出现过的字符,将子串自上一次出现该字符位置前面的截断,对这部分截断的字符,标记map为notFound,重新计算子串长度。

The idea is that scan from beginning to the end, in the process, keep tracking the current sub-string without duplication.
When duplication involved, hash set or hash map is the powerful tool we can use.
So, we can setup a hash map here, key is character, value is the index of this char in the string.

Thus, while we scan the string, when the current char is absence from the map, record the index of current char by setting the map with map[c] = i, and also updating the current valid substring length and the longest length if necessary;
When the current char is present in the map, then we should truncate the valid sub-string to the last position of current character.
e.g. if string is “abcb”, when we at the position of the second “b”, the map should be: {'a' => 0, 'b' => 1, 'c' => 2}, the current valid string is “abc”. Then, the new truncated valid sub-string should be “cb”. Thus, we have to nullify the keys before previous “b”, e.g. “a” and “b” here. To change the map to {'b' => 3, 'c' => 2}. This is the intuitive Ruby Solution #1.

A better idea is not to track the length of the current valid sub-string. Instead, keep tracking the starting index of the valid sub-string. The good part is that when met an existing char in the valid sub-string, we can just update this starting index and leave the map untouched for the keys before the last index of this char. e.g. for the above example, if we proceed in this way, we can skip nullify the value for key “a”. This results into Ruby Solution #2.

Solutions

C++, 44ms过大集合

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int map[256];
        const int notFound = -1;
        for(int i = 0 ; i < 256; ++i){
            map[i] = notFound;
        }
        int maxlen = 0;
        int len = 0;
        for(int i = 0 ; i < s.size(); ++i) {
            char si = s[i];
            int lastSi = map[si];
            if(lastSi == notFound) {
                ++len;
                map[si] = i;
                if(maxlen < len)
                    maxlen = len;
            } else {
                int curStart = i - len;
                for(int j = curStart; j < lastSi; ++j) {
                    map[s[j]] = notFound;
                }
                curStart = lastSi + 1;
                len = i - curStart + 1;
                map[si] = i;
            }
        }
        return maxlen;
    }
};

Ruby

Solution 1:

# @param {String} s
# @return {Integer}
def length_of_longest_substring(s)
    map = {}
    longest_length = 0
    current_length = 0
    s.each_char.with_index do |c, i|
        # puts "#{c}:#{i}"
        if last_index = map[c]
            map.each do |k,v|
                map[k] = nil if v && v < last_index
            end
            # p map
            current_length = i - last_index
        else
            # p "+1"
            current_length += 1
            longest_length = current_length if longest_length < current_length
        end
        map[c] = i
    end
    longest_length 
end

Solution 2: 2016-04-22

# @param {String} s
# @return {Integer}
def length_of_longest_substring(s)
    max = 0
    c2imap = {}
    si = -1
    s.chars.each_with_index do |c, i|
        if ci = c2imap[c] and ci > si
            si = ci
        end
        c2imap[c] = i
        new_length = i - si
        max = new_length if max < new_length
    end
    max
end

15 thoughts on “LeetCode: Longest Substring Without Repeating Characters

  1. 潘双庆

    不能说计算lenth总次数不超过n就说时间复杂度为为n,但思想挺好滴

    Reply
    1. uniEagle Post author

      用大O记法应该是O(n)的,因为追踪整个算法的执行过程,可以看到我们关注的字串前端是从0走到n-1,后端总是递增,所以最差情况下就是2n,O(2n) = O(n)

      Reply
    1. uniEagle Post author

      而且,你的算法貌似是O(n^2)的吧?在最坏的情况下,比如一个完全没有重复的字符串,需要n^2才能运算完。

      Reply
        1. uniEagle Post author

          当然,如果已知字符集是a-z + A-Z的话,可以减小map的大小。我这里是偷个懒直接开256。
          map[si]是s[i]这个字符在s中的出现位置(上一次出现),如果map[si] == NotFound,说明这个字符s[i]还没有出现过。

          Reply
  2. shyJohnnie

    这个是o(n2)的,因为最快情况每次都要对N个前面的字符的出现记录重新复制。

    Reply
    1. uniEagle Post author

      不对,虽然最坏情况下会往后倒退n-1个位置,但是从头到尾,后退计算lenth的总次数不会超过n。所以整体上还是O(n+n)=O(n)的。
      比如”bbbbbbbb”,每次都退1,但是最多也就倒推了n-1次。
      又比如“abcabc”,在过程中,倒退的执行次数是”000111″ = 3次也不会超过n。

      Reply
      1. shyJohnnie

        明白了,简单来说就是curStart也是递增的,而每次需要更新的[curStart, lastSi]区间是不覆盖的。
        能用正式点的语言证明这个平摊分析吗?

        Reply

发表评论

电子邮件地址不会被公开。 必填项已用*标注