LeetCode: Median of Two Sorted Arrays

By | 2012 年 10 月 4 日

Updated at 2016-04-24

Question

Median of Two Sorted Arrays
There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Analyze

This is a hard question, when I met this question again after 4 years, it still takes much time to solve. But this time, with a more clear thought on it.

This question needs find 1 or 2 number(s) in the middle of 2 sorted array and has strict time limit on it. At the first thought, 2-way divide will be the solution. But how to apply it, is a big challenge.

First transform this question to another more general question will help a lot: find the k-th number in two sorted array. Then for this Leetcode question, k is (nums1.size + nums2.size) / 2 for odd total. For even total, need find k-th and (k+1)-th number.

Then there are too many options on this question: divide arrays by their own middle number (too complex go from here, have 4 parts, conditions to determine which part to drop is very hard with edge cases); Or divide k to k/2 and k-k/2, more simple, less edge cases, but need consider how to handle the situation when one array is shorter than k/2; for even total number, find n/2 and n/2 + 1 separately or at a time is another consideration.

At last, I worked out the solution by divide k and find k and k+1 at once (for even total numbers). For this solution, some facts are important:
Let’s say those 2 arrays are A and B. A has n numbers and B has m numbers.
So:
A = a0 , a1 , … , an-1
B = b0 , b1 , … , bm-1
And for simplicity, let’s only consider total number is odd, so k = (n + m)/2 .

Then divide k as equally as we can to ka = k/2 and kb = k - k/2, it’s apparent that kb >= ka (you can get this conclusion by check k from 0). Then if we use ka and kb to segment those two arrays we got:
a0 , a1 , … , aka-1 | aka , … , an-1
b0 , b1 , … , bkb-1 | bkb , … , bm-1

Then we can either drop a0 , a1 , … , aka-1 or b0 , b1 , … , bkb-1 from our search. Try my best to explain this idea as clear as possible in the following analyze:

From here, compare the aka-1 and bkb-1.
If aka-1 <= bkb-1 , then we can make sure the beginning of the array A is included in the first k elements of the whole sorted array.

Why? Since bkb-1 is equal or bigger than aka-1, then the tail of Array B and the tail of Array A are behind aka-1 for sure. The only possible items may be placed before aka-1 is the items in the front of Array B: b0 , b1 , … , bkb-2. But even in that case, there are only ka + kb – 1 < k numbers. There is not enough number can push aka-1 out of the first k numbers of the whole sorted array.

Then, with this conclusion that a0 , a1 , … , aka-1 are included in the first k elements of the whole sorted array, and we are looking for the k and maybe k+1 elements. So they are unrelated to our task, thus we can exclude them from our search. Then we can further solve the problem by finding (k-ka)-th element in array aka , … , an-1 and b0 , … , bm-1

And if in other case that aka-1 > bkb-1, we can exclude the beginning of B b0 , b1 , … , bkb-1 from our search. And we can find the (k-kb)-th element in the arrays a0 , … , an-1 and bkb , … , bm-1

In this manner we can find the target in O(m + n) time limit.

And for the case that total numbers are even, when we found k-th element, just do another compare of the heads of array, then we can get the (k+1)-th number. It’s simple, but will confuse a lot if we take this into consideration of the whole process at the beginning.

And the edge cases are simple, won’t explain them here.

Codes

C++ version

class Solution {
public:
    inline int findFirstAndPop(vector<int>& nums1, vector<int>& nums2, int& s1, int& s2) {
        if(nums2.size() <= s2 || nums1.size() > s1 && nums1[s1] <= nums2[s2]) {
            return nums1[s1++];
        } else {
            return nums2[s2++];
        }
    }
    double findK(int k, int kn, vector<int>& nums1, vector<int>& nums2, int s1 = 0, int s2 = 0) {
        // find the short array
        vector<int> *pShort, *pLong;
        int sShort, sLong;
        if (nums1.size() - s1 <= nums2.size() - s2) {
            pShort = &nums1;
            pLong = &nums2;
            sShort = s1;
            sLong = s2;
        } else {
            pShort = &nums2;
            pLong = &nums1;
            sShort = s2;
            sLong = s1;
        }
        vector<int> &arrShort = *pShort;
        vector<int> &arrLong = *pLong;
        // edge case 1, empty short array
        if (arrShort.size() == sShort) {
            return (arrLong[k + sLong] + arrLong[kn + sLong]) / 2.0;
        }
        // edge case 2
        if (k == 0) {
            double sum = findFirstAndPop(arrShort, arrLong, sShort, sLong);
            if (k == kn) {
                return sum;
            }
            sum += findFirstAndPop(arrShort, arrLong, sShort, sLong);
            return sum / 2.0;
        }

        // divide k
        int ks = k / 2;
        int kl = k - ks;
        if (ks > arrShort.size() - sShort) {
            ks = arrShort.size() - sShort;
            kl = k - ks;
        } else if (ks == 0) {
            ks = 1;
        }

        // compare pivots
        if (arrShort[sShort + ks - 1] <= arrLong[sLong + kl - 1]) {
            // remove head of short array
            return findK(k - ks, kn - ks, arrShort, arrLong, sShort + ks, sLong);
        } else {
            // remove the head of long array
            return findK(k - kl, kn - kl, arrShort, arrLong, sShort, sLong + kl);
        }
    }
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        int total = nums1.size() + nums2.size();
        if (total == 0) return 0;
        int k = total / 2;
        if (total % 2) {
            return findK(k, k, nums1, nums2);
        } else {
            return findK(k - 1, k, nums1, nums2);
        }
    }
};

Ruby version

# @param {Integer[]} nums1
# @param {Integer[]} nums2
# @return {Float}
def remove_first(nums1, nums2)
    (nums2.empty? || nums1.size > 0 && nums1.first <= nums2.first) ? nums1.shift : nums2.shift
end

def find_kth(k, knext, nums1, nums2)
    # got short and long arr
    short_arr, long_arr = (nums1.size <= nums2.size ? [nums1, nums2] : [nums2, nums1])

    # edge case 1, has an empty array
    if short_arr.empty?
        return (long_arr[k] + long_arr[knext]) / 2.0
    end

    # edge case 2, find the first 1 or 2 number(s)
    if k == 0
        sum = remove_first(nums1, nums2)
        return sum if knext == k
        sum += remove_first(nums1, nums2)
        return sum / 2.0
    end

    # edge case 3, add this to void kl becomes 0.
    if k == 1
        remove_first(nums1, nums2)
        return find_kth(k-1, knext-1, nums1, nums2)
    end

    # devide k by 2
    kl = k/2
    ks = k - kl
    # kl will always <= ks, try apply ks to short arr, apply large length on short arr has most possiblity to empty one arr.
    if ks > short_arr.size
        ks = short_arr.size
        kl = k - ks
    end
    # decide which arr to cut:
    if short_arr[ks-1] <= long_arr[kl - 1]
        # cut short
        find_kth(k-ks, knext-ks, short_arr[ks..-1], long_arr)
    else
        # cut long
        find_kth(k-kl, knext-kl, short_arr, long_arr[kl..-1])
    end
end

def find_median_sorted_arrays(nums1, nums2)
    total_size = nums1.size + nums2.size
    return 0 if total_size == 0
    k = total_size / 2
    if total_size % 2 == 0
        find_kth(k - 1, k, nums1, nums2)
    else
        find_kth(k, k, nums1, nums2)
    end
end



Old version at 2012

这题费了好大的功夫才做对啊,得好好记一记。
题目让求解两个排好序的数组的中位数。
假设总数为C,如果C是奇数,那么返回位于正中央的那个数即可(index是(C-1)/2);如果是偶数,那么要返回中间两个数的平均数,也就是下标是(C-1)/2和(C-1)/2 + 1的两个数的平均数。

题目自然转换成在两个排好序的数组中,寻找第k大的数(这里指在排好序的数组中下标为k),k = (C-1)/2。
这个问题可以用二分来解,算法有点复杂。

    先说临界情况

  1. A为空或者B为空
    直接在非空数组中找第k大的数即可。O(1)
  2. 找最小的数,k==0的情况,也简单,比较两个数组最开头的元素,谁小就是谁
    然后就是比较复杂的情况,假设寻找目标target是下标为k的数。
    那么意味着在排好的数组中,在目标数之前,一共有k个比目标更小的数。
    将k分成两份,一份在A的前端,一份在B的前端。这里其实将k怎么分配是一个可以讨论的问题,但是平分k可以得到平均最快的效果。
    设k = ka + kb,(k是偶数简单,k是奇数的话,剩下那一个随便放在两个数组中哪一个中都可以)

    这里可以列出来我们想要的效果:
    k=1 —-> ka = 1, kb = 1
    k=2 —-> ka = 1, kb = 1
    k=3 —-> ka = 1, kb = 1. [+1,表示还有一个元素,可以随意分配在ka或者kb中,只要不越界]
    k=4 —-> ka = 2, kb = 2
    k=5 —-> ka = 2, kb = 2. [+1]
    已经可以看出来规律了,这个造成了下面代码中比较复杂的部分,这些细节消耗的时间不少啊。

    然后就是主要逻辑

  1. 如果A[ka-1] >= B[kb-1]
    说明在B前端的kb个数中,不可能出现我们要寻找的目标。
    为什么呢?
    假如A一共有m个数,B一共有n个数。
    那么target(下标是k)后面有且只有n + m – 1 – k个数;
    但是B[kb-1]在B中已经排在n – kb个数之前,加上A中的m – ka + 1(A[ka-a]),也就是说在排序后的整体数组中排在B[kb-1]之后的数,至少有n – kb + m – ka + 1 = n + m – k + 1个。
    由于n + m – k + 1 > n + m – k – 1,所以B前端kb个数都不可能是target。
    所以此时可以将问题转化为,在A[0,…,m-1]和B[kb,…,n-1]中,寻找下标为k – kb的数。
  2. 否则,A[ka-1] < B[kb-1] 同上,可以剔除A前端的ka个数。

这样循环下去,就能以二分的速度找到目标。

这个问题不仅要找到第k大的数,当C是偶数的时候,还要找到第k+1个数,取两者均值。



代码:176ms过大集合

class Solution {
public:
    double midianValueOnArray(int Arr[], int si, bool isOdd) {
        if (isOdd)
            return Arr[si];
        else {
            return (Arr[si] + Arr[si + 1]) / 2.0;
        }
    }
    double midianValueOnArrays(int A[], int m, int ms, int B[], int n, int ns, bool isOdd){
        int sorted[2];
        for(int i = 0 ; i < 2; ++i) {
            if(ms < m) {
                if (ns < n) {
                    sorted[i] = (A[ms] <= B[ns] ? A[ms++] : B[ns++]);
                } else {
                    sorted[i] = A[ms++];
                }
            } else {
                sorted[i] = B[ns++];
            }
            if (isOdd) {
                return sorted[0];
            }
        }
        return 0.5 * (sorted[0] + sorted[1]);
    }
    double findMedianSortedArrays(int A[], int m, int B[], int n) {
        if(m + n <= 0) return 0;
        //find the number at this index from ms and ns
        int target = (m + n - 1) / 2;
        //if isOdd return val[target] is ok
        // else return (val[target] + val[target + 1]) / 2
        bool isOdd = (m + n) % 2;

        int ms = 0;//current start index in A
        int ns = 0;//current start index in B
        while (true) {
            //if A or B is empty just anotherArray[target] is the target
            if(m - ms <= 0) {//A is empty
                return midianValueOnArray(B, ns + target, isOdd);
            } else if(n - ns <= 0) {//B is empty
                return midianValueOnArray(A, ms + target, isOdd);
            }
            //A and B are not empty
            if(target <= 0) {
                return midianValueOnArrays(A,m,ms,B,n,ns,isOdd);
            }
            //divide the numbers smaller than val[target]
            int divideCount = target > 1 ? (target - 2) / 2 : 0;
            int plus = target > 2 ? target % 2 : 0;
            int mm = ms + divideCount;
            if(mm >= m) mm = m - 1;
            int nm = ns + divideCount;
            if(nm >= n) nm = n - 1;
            if(n - nm >= m - mm)
                nm += plus;
            else
                mm += plus;
            if(A[mm] >= B[nm]) {
                //in this case, the numbers at the beginning of B is impossiblely contain val[target]
                target -= (nm - ns + 1);
                ns = nm + 1;
            } else {
                target -= (mm - ms + 1);
                ms = mm + 1;
            }
        }
    }
};

3 thoughts on “LeetCode: Median of Two Sorted Arrays

  1. Pingback: Median of Two Sorted Arrays | ZhongYin Zhang

  2. traveller

    方法很不错~感觉比leetcode上面从两头减的方法要好点,特别是在有个数组很小的时候。

    ps,标题打错了,medium….

    Reply

发表评论

电子邮件地址不会被公开。 必填项已用*标注