Step By Step of Creating a Ruby Gem

Step By Step of Creating a Ruby Gem

I’m going to create a ruby gem to handle the accents in a string. The gem will replace all the accents to it’s corresponding ascii character. And will achieve this by extending the class String to provide an instance method #accent_to_ascii.

For example:

require 'accent_to_ascii'
"Thìs ìs ā strìng wïth āccēnts".accent_to_ascii
#=> "This is a string with accents"

The initialtive is that those non-ascii codes in the string sometime is not compatible with systems, we need clear them up. But it’s too crude to just remove them with something like gsub(/[^\x20-\x7e]/, ''). And there are a lot of names are with accents in American. So replace those accents with the corresponding ascii characters is a better solution. ( Will make them happy 🙂 )

So in this article will create a gem accent_to_ascii to do this specific task.

Let’s get started:

Steps (story in short)

  1. Create the files for the gem. It’s as simple as just running: bundle gem accent_to_ascii. This will create the gem files from templates into directory ./accent_to_ascii. I prefer the rspec option for testing.
  2. Develop the gem functionality.
    1. Update the gemspec file accent_to_ascii.gemspec with information of the gem.
    2. Write tests in the spec/accent_to_ascii_spec.rb (Test first right?)
    3. Write the core methods in the lib/accent_to_ascii.rb to make tests green
  3. Publish it into rubygems.org, if you already have rubygems account, this is simple as just run rake release

Now let’s diving into the details of the above steps:

1. Gem Creation

Create gem is as simple as just a line of command:

$ bundle gem accent_to_ascii

By running this command, the bundler will create the gem in the directory accent_to_ascii as:

$ tree accent_to_ascii
accent_to_ascii
├── CODE_OF_CONDUCT.md
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── accent_to_ascii.gemspec
├── bin
│   ├── console
│   └── setup
├── lib
│   ├── accent_to_ascii
│   │   └── version.rb
│   └── accent_to_ascii.rb
├── pkg
│   └── accent_to_ascii-0.1.0.gem
└── spec
    ├── accent_to_ascii_spec.rb
    └── spec_helper.rb

accent_to_ascii.gemspec

The accent_to_ascii.gemspec is the description file of this gem, we can modify it first.

lib/accent_to_ascii/version.rb

The lib/accent_to_ascii/version.rb defined the version of the gem.

module AccentToAscii
  VERSION = "0.1.0"
end

The version is defined with a Major.Minor.Patch style:

  • The Major version is for updates will break the backward compatibility;
  • The Minor version is for updates with backward compatibility;
  • The Patch version is for bug fixes with backward compatibility.

spec/accent_to_ascii_spec.rb

This is the test file, we modify this file with the expected behaviors of the gem and make it green by developing the code.

lib/accent_to_ascii.rb

This is the main code file.

2.1 Update the gemspec file

Just provide the informations in the TODO section, and I make it as:

# coding: utf-8
lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'accent_to_ascii/version'

Gem::Specification.new do |spec|
  spec.name          = "accent_to_ascii"
  spec.version       = AccentToAscii::VERSION
  spec.authors       = ["uniEagle"]
  spec.email         = ["unieagle@gmail.com"]

  spec.summary       = %q{Rubygem to replace accents in string into the corresponding ascii chars}
  spec.description   = %q{Replace the non ascii accents in a string}
  spec.homepage      = "https://github.com/unieagle/accent_to_ascii"
  spec.license       = "MIT"

  spec.files         = `git ls-files -z`.split("\x0").reject do |f|
    f.match(%r{^(test|spec|features)/})
  end
  spec.bindir        = "exe"
  spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]

  spec.add_development_dependency "bundler", "~> 1.13"
  spec.add_development_dependency "rake", "~> 10.0"
  spec.add_development_dependency "rspec", "~> 3.0"
end

2.2 Write the tests

The in the test file /spec/accent_to_ascii_spec.rb, added several tests to reflect the expected behavior of this gem:

require "spec_helper"

describe AccentToAscii do
  it "has a version number" do
    expect(AccentToAscii::VERSION).not_to be nil
  end

  it { expect("São Paulo".accent_to_ascii).to eq "Sao Paulo" }
  it { expect("Tẽst".accent_to_ascii).to eq "Test" }

  it "does not change original string" do
    str = "São Paulo"
    expect(str.accent_to_ascii).to eq "Sao Paulo"
    expect(str).to eq "São Paulo"
  end

  context "hungarian accents" do
    specify { expect("fejlődő".accent_to_ascii).to eq "fejlodo" }
    specify { expect("FEJLŐDŐ".accent_to_ascii).to eq "FEJLODO" }
    specify { expect("fű".accent_to_ascii).to eq "fu" }
    specify { expect("FŰ".accent_to_ascii).to eq "FU" }
  end

  context "accent_to_ascii!" do
    let(:str) { "São Paulo" }

    it "changes original string" do
      expect(str.accent_to_ascii!).to eq "Sao Paulo"
      expect(str).to eq "Sao Paulo"
    end
  end
end

And of course, if you run rake, almost all the tests will fail:

$ rake
... some test errors ...
Finished in 0.0024 seconds
9 examples, 8 failures

2.3 Write some real code and make the tests green

The file /lib/accent_to_ascii.rb is the target file. I just made it as the following:

require "accent_to_ascii/version"

module AccentToAscii
  ACCENTS_MAPPING = {
    'E' => [200,201,202,203],
    'e' => [232,233,234,235,7869],
    'A' => [192,193,194,195,196,197],
    'a' => [224,225,226,227,228,229,230],
    'C' => [199],
    'c' => [231],
    'O' => [210,211,212,213,214,216,336],
    'o' => [242,243,244,245,246,248,337],
    'I' => [204,205,206,207],
    'i' => [236,237,238,239],
    'U' => [217,218,219,220,368],
    'u' => [249,250,251,252,369],
    'N' => [209],
    'n' => [241],
    'Y' => [221],
    'y' => [253,255],
    'AE' => [306],
    'ae' => [346],
    'OE' => [188],
    'oe' => [189]
  }

  def self.accent_to_ascii(string)
    string.tap do |s|
      ACCENTS_MAPPING.each { |letter, accents| replace(s, letter, accents) }
    end
  end

  private

  def self.replace(string, letter, accents)
    packed = accents.pack('U*')
    regex = Regexp.new("[#{packed}]", nil)
    string.gsub!(regex, letter)
  end

end

class String
  def accent_to_ascii(string = String.new(self))
    AccentToAscii.accent_to_ascii(string)
  end

  def accent_to_ascii!
    accent_to_ascii(self)
  end
end

Special thanks to the Github project accentless, reused the ACCENTS_MAPPING in that project.

Then the tests are green now:

$ rake
AccentToAscii
  has a version number
  should eq "Sao Paulo"
  should eq "Test"
  does not change original string
  hungarian accents
    should eq "fejlodo"
    should eq "FEJLODO"
    should eq "fu"
    should eq "FU"
  accent_to_ascii!
    changes original string

Finished in 0.00283 seconds
9 examples, 0 failures

3. Package and Publish

If you already have rubygems account registered and configured well, it’s just as simple as a single command:

$ rake release

Or, just sign up in the RubyGems.org, then run the following command:

$ gem push pkg/accent_to_ascii-0.1.0.gem

The first time run this command will ask you input your account information for RubyGems.org.

And for the following up changes, just update the version and run rake release.

References

[Engineering Lunch Series] Step-by-Step Guide to Building Your First Ruby Gem

Official Guide of Publish Gems onto RubyGems.org

LeetCode: Reverse Integer

Previous Post

This is the previous post on same question.

Question

LeetCode Link
Reverse digits of an integer.
Example1: x = 123, return 321
Example2: x = -123, return -321

Analyze

Careful about the overflow. For ruby, it’s not a problem. For c++, need more consideration.

Code

Ruby Code

# @param {Integer} x
# @return {Integer}
def reverse(x)
    sign = x >= 0 ? 1 : -1
    x = x.abs
    y = 0
    int_max = 2 ** 31 - 1
    while x > 0 do
        digit = x % 10
        y = y * 10 + digit
        if y > int_max
            y = 0
            break
        end
        x /= 10
    end
    y * sign
end

C++ Code

class Solution {
public:
    int reverse(int x) {
        int sign = 1;
        if(x < 0) {
            sign = -1;
            x = -x;
        }
        int int_max = (1 << 31) - 1; //notice the priority of << is lower than -
        int dime = int_max / 10;
        int tail = int_max % 10 + (sign == 1 ? 0 : 1);

        int y = 0;
        while(x > 0){
            int digit = x % 10;
            if(y > dime || (y == dime && digit > tail)){
                return 0;
            }
            y = y * 10 + digit;
            x /= 10;
        }
        return y * sign;
    }
};

LeetCode: ZigZag Conversion

Previous post

This is the old post on this question, using another method.

Question

Leetcode Link
The string “PAYPALISHIRING” is written in a zigzag pattern on a given number of rows like this: (you may want to display this pattern in a fixed font for better legibility)

P   A   H   N
A P L S I I G
Y   I   R

And then read line by line: “PAHNAPLSIIGYIR”
Write the code that will take a string and make this conversion given a number of rows:

string convert(string text, int nRows);
convert("PAYPALISHIRING", 3) //should return "PAHNAPLSIIGYIR".

Analyze

This question is simple, one method is that build strings for each row, and for each char in the string, determine the target row of the position based on the total row count, and append it to the target string. Then concat all the strings together in the last.

Code

The fast ruby version

# @param {String} s
# @param {Integer} num_rows
# @return {String}
def convert(s, num_rows)
    result = []
    tail_size = num_rows > 2 ? num_rows - 2 : 0
    group_size = num_rows + tail_size

    s.length.times do |i|
        in_group_position = i % group_size
        target_row = if in_group_position < num_rows
            in_group_position
        else
            num_rows - 1 - (in_group_position - num_rows)
        end
        result[target_row] ||= ""
        result[target_row] << s[i]
    end

    result
end

Screen Shot 2016-05-04 at 16.20.51

Formatting DateTime in Rails

  • There are some useful params with Time#to_s
[14] pry(main)> time = u.created_at
=> Thu, 13 Nov 2014 14:21:01 UTC +00:00
[15] pry(main)> time.to_s(:long)
=> "November 13, 2014 14:21"
[16] pry(main)> time.to_s(:short)
=> "13 Nov 14:21"
[17] pry(main)> time.to_s(:default)
=> "2014-11-13 14:21:01 UTC"
[18] pry(main)> time.to_s(:db) #this will put it into the UTC time zone
=> "2014-11-13 14:21:01"
[19] pry(main)> time.to_s(:number)
=> "20141113142101"
[20] pry(main)> time.to_s(:long_ordinal)
=> "November 13th, 2014 14:21"
[21] pry(main)> time.to_s(:rfc822)
=> "Thu, 13 Nov 2014 14:21:01 +0000"
> time.utc.strftime('%FT%TZ')
=> "2016-03-18T18:33:09Z"
  • And for custom formatting, there is a very good post for it:
    https://hackhands.com/format-datetime-ruby/

[Solved] Install Nokogiri on Yosemite with Ruby 2.1.3

Updated to Yosemite and got the following problem while run bundle install with ruby 2.1.3:

An error occurred while installing nokogiri (1.6.1), and Bundler cannot continue.
Make sure that `gem install nokogiri -v '1.6.1'` succeeds before bundling.

The solution is:

gem install nokogiri -v '1.6.1' -- --use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/libxml2
Fetching: nokogiri-1.6.1.gem (100%)
Building native extensions with: '--use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/libxml2'
This could take a while...
Successfully installed nokogiri-1.6.1
Parsing documentation for nokogiri-1.6.1
Installing ri documentation for nokogiri-1.6.1
Done installing documentation for nokogiri after 3 seconds
1 gem installed

Using VCR with WebMock in Rails Rspec Tests

If the application need interaction with another web service, it’s convenience to using VCR with WebMock to test the application, rather than doing the request to another web service every time.

Setup

Modify Gemfile

group :test do
  gem 'webmock'
  gem 'vcr'
end

and don’t forget to run

bundle install

Config VCR

The following code can be placed in the spec_helper.rb

VCR.configure do |c|
  c.allow_http_connections_when_no_cassette = true
  c.cassette_library_dir = 'spec/cassettes' #this specified the directory for placing the record files
  c.hook_into :webmock
  c.configure_rspec_metadata!
  c.default_cassette_options = {
    :record => :once,
    :erb => true,
    :match_requests_on => [:method, :uri, :host, :path, :headers]
  }
end

Using VCR

Then you can using VCR when write the rspec test like:

describe 'Some api call tests', :vcr => true do
#write normal api call tests here
end

Re-record

Sometimes we need to re-record the cassettes, because maybe the request params changed in our app or the response is updated from third-party server.

It’s very easy to do that, just modify your VCR.config block, set record: all instead of :once.

:record => :all

and run your rspec tests related to that cassettes, then the cassettes files will be updated.

And you can specify the matchers per test:

describe 'Something', :vcr => {match_requests_on:[:method, :uri]} do
  # Some tests
end

The following method can tell whether vcr is recording, it’s very useful in some scenarios:

VCR.current_cassette.recording?

[Repost] Here Document in Ruby

This is repost from http://log.gmarik.info/2007/12/rubys-here-document-heredoc-mini.html , good post!

Basics

Here Document(or HereDoc) is a way to declare String literal in Ruby programming language:

some_text = <<END_OF_STRING
 You can write any text here for your document that's why such
 statement is called - HereDoc
END_OF_STRING

That’s it! Now some_text variable points to the string object containing the text going between END_OF_STRING
As you may know HereDoc isn’t a unique Ruby feature, rather it’s a common construct(with some distinctions) for scripting languages “brewed” in Unix environment.

The terminator

By Ruby convention a variable starting with capital letter is a constant. But that’s not a case for the END_OF_STRING from previous example, because terminator is just a string which parser treats as the end of HereDoc.
Well if a terminator is a string then can i use arbitrary(say put spaces within) string as the terminator like end of string? The answer is – yes you can!
<<heredoc is interpreted same as <<"heredoc" (please note double quotes around latter heredoc).
But explicit notation(with quotes) is a bit more powerful.

String interpolation

By explicitly enclosing terminator with quotes you may have:

a_text = <<"Ruby, please end my HereDoc once you find this terminator string"
Hello world!
Ruby, please end my HereDoc once you find this terminator string

or

fuzzy_names = <<">>"
foo, baz, bar
>>

Wow, if i can use double quoted string, then probably i can use single quoted string as well:

puts <<'end of string'
 1+1=#{1+1}
end of string
prints
1+1=#{1+1}

as single quoted strings aren’t interpolated unlike double quoted:

puts <<"end of string"
 1+1=#{1+1}
end of string
prints
1+1=2

Indent modifier

By default HereDoc terminator is expected to be placed on the very beginning of the separate line
By using – on HereDoc declaration, you may indent end terminator arbitrary:

greeting = <<-"here document ends"
                 Hello world
               here document ends

Keep in mind that leading spaces are kept.

Advanced usages

a, b = <<'EOA', <<-EOB
string_a
EOA
string_b
   EOB
is equal to
a, b = "string_a\n", "string_b\n"

At this point i’m thinking about HereDoc as “placeholder” that gets substituted with the string itself. Why is that important? Because you may then treat HereDoc declaration as the actual string and send messages(call methods):

<<'heredoc'.reverse == "\n!dlrow olleH"
Hello world!
heredoc

is a true statement!

STI of ActiveRecord in Rails

STI, Single Table Inheritance, supported by ActiveRecord in rails to let us build the models’ hierarchy on a single data table.

Model description

For example, we have a User model, and have an Administrator model, which inherit from User. we can build just one table named ‘user’, and make sure there is a column named ‘type’ and of string type in the User table. like below:

mysql>describe 'user';
+--------------------------+--------------+------+-----+----------+----------------+
| Field                    | Type         | Null | Key | Default  | Extra          |
+--------------------------+--------------+------+-----+----------+----------------+
| id                       | int(11)      | NO   | PRI | NULL     | auto_increment |
| type                     | varchar(255) | YES  |     | NULL    |                |
| name                     | varchar(255) | YES  |     | NULL     |                |
+--------------------------+--------------+------+-----+----------+----------------+

Code

then, we can writing following code to have a look of STI:

class User < ActiveRecord::Base
    validates_presence_of :name
end
class Administrator < Post
end

just this, and we can verify it in console:

admin = Administrator.create( :name => "admin")
admin.type # "Administrator"
admin.id # 1

Concerns

the STI is enabled by default, and so the 'type' in table is likely reserved for STI and we can not using it for other purpose;
and another problem is that if the shared columns are not so much in the model hierarchy, it's a waste using STI, we prefer to setup another table for the child model.
Luckily, we can disable this feature in a table that not using this feature. by doing this:

class User < ActiveRecord::Base
    self.abstract_class = true
    validates_presence_of :name
end
class Administrator < Post
end

by doing this, the 'type' columns in User can be used at our will and there must another table for Administrator alone.

Referrence

http://ihower.tw/rails3/activerecord-others.html

The difference from %w to %W in Ruby

Difference between %w and %W

In short, %w act as ', while %W act as "
In long words, %w not convert the #{} clause, but %W do, like the codes below:

Code

irb(main):001:0> foo="hello"
=> "hello"
irb(main):002:0> %W(foo bar baz #{foo})
=> ["foo", "bar", "baz", "hello"]
irb(main):003:0> %w(foo bar baz #{foo})
=> ["foo", "bar", "baz", "\#{foo}"]

Reference

http://stackoverflow.com/questions/690794/ruby-arrays-w-vs-w

LeetCode: Median of Two Sorted Arrays

Updated at 2016-04-24

Question

Median of Two Sorted Arrays
There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Analyze

This is a hard question, when I met this question again after 4 years, it still takes much time to solve. But this time, with a more clear thought on it.

This question needs find 1 or 2 number(s) in the middle of 2 sorted array and has strict time limit on it. At the first thought, 2-way divide will be the solution. But how to apply it, is a big challenge.

First transform this question to another more general question will help a lot: find the k-th number in two sorted array. Then for this Leetcode question, k is (nums1.size + nums2.size) / 2 for odd total. For even total, need find k-th and (k+1)-th number.

Then there are too many options on this question: divide arrays by their own middle number (too complex go from here, have 4 parts, conditions to determine which part to drop is very hard with edge cases); Or divide k to k/2 and k-k/2, more simple, less edge cases, but need consider how to handle the situation when one array is shorter than k/2; for even total number, find n/2 and n/2 + 1 separately or at a time is another consideration.

At last, I worked out the solution by divide k and find k and k+1 at once (for even total numbers). For this solution, some facts are important:
Let’s say those 2 arrays are A and B. A has n numbers and B has m numbers.
So:
A = a0 , a1 , … , an-1
B = b0 , b1 , … , bm-1
And for simplicity, let’s only consider total number is odd, so k = (n + m)/2 .

Then divide k as equally as we can to ka = k/2 and kb = k - k/2, it’s apparent that kb >= ka (you can get this conclusion by check k from 0). Then if we use ka and kb to segment those two arrays we got:
a0 , a1 , … , aka-1 | aka , … , an-1
b0 , b1 , … , bkb-1 | bkb , … , bm-1

Then we can either drop a0 , a1 , … , aka-1 or b0 , b1 , … , bkb-1 from our search. Try my best to explain this idea as clear as possible in the following analyze:

From here, compare the aka-1 and bkb-1.
If aka-1 <= bkb-1 , then we can make sure the beginning of the array A is included in the first k elements of the whole sorted array.

Why? Since bkb-1 is equal or bigger than aka-1, then the tail of Array B and the tail of Array A are behind aka-1 for sure. The only possible items may be placed before aka-1 is the items in the front of Array B: b0 , b1 , … , bkb-2. But even in that case, there are only ka + kb – 1 < k numbers. There is not enough number can push aka-1 out of the first k numbers of the whole sorted array.

Then, with this conclusion that a0 , a1 , … , aka-1 are included in the first k elements of the whole sorted array, and we are looking for the k and maybe k+1 elements. So they are unrelated to our task, thus we can exclude them from our search. Then we can further solve the problem by finding (k-ka)-th element in array aka , … , an-1 and b0 , … , bm-1

And if in other case that aka-1 > bkb-1, we can exclude the beginning of B b0 , b1 , … , bkb-1 from our search. And we can find the (k-kb)-th element in the arrays a0 , … , an-1 and bkb , … , bm-1

In this manner we can find the target in O(m + n) time limit.

And for the case that total numbers are even, when we found k-th element, just do another compare of the heads of array, then we can get the (k+1)-th number. It’s simple, but will confuse a lot if we take this into consideration of the whole process at the beginning.

And the edge cases are simple, won’t explain them here.

Codes

C++ version

class Solution {
public:
    inline int findFirstAndPop(vector<int>& nums1, vector<int>& nums2, int& s1, int& s2) {
        if(nums2.size() <= s2 || nums1.size() > s1 && nums1[s1] <= nums2[s2]) {
            return nums1[s1++];
        } else {
            return nums2[s2++];
        }
    }
    double findK(int k, int kn, vector<int>& nums1, vector<int>& nums2, int s1 = 0, int s2 = 0) {
        // find the short array
        vector<int> *pShort, *pLong;
        int sShort, sLong;
        if (nums1.size() - s1 <= nums2.size() - s2) {
            pShort = &nums1;
            pLong = &nums2;
            sShort = s1;
            sLong = s2;
        } else {
            pShort = &nums2;
            pLong = &nums1;
            sShort = s2;
            sLong = s1;
        }
        vector<int> &arrShort = *pShort;
        vector<int> &arrLong = *pLong;
        // edge case 1, empty short array
        if (arrShort.size() == sShort) {
            return (arrLong[k + sLong] + arrLong[kn + sLong]) / 2.0;
        }
        // edge case 2
        if (k == 0) {
            double sum = findFirstAndPop(arrShort, arrLong, sShort, sLong);
            if (k == kn) {
                return sum;
            }
            sum += findFirstAndPop(arrShort, arrLong, sShort, sLong);
            return sum / 2.0;
        }

        // divide k
        int ks = k / 2;
        int kl = k - ks;
        if (ks > arrShort.size() - sShort) {
            ks = arrShort.size() - sShort;
            kl = k - ks;
        } else if (ks == 0) {
            ks = 1;
        }

        // compare pivots
        if (arrShort[sShort + ks - 1] <= arrLong[sLong + kl - 1]) {
            // remove head of short array
            return findK(k - ks, kn - ks, arrShort, arrLong, sShort + ks, sLong);
        } else {
            // remove the head of long array
            return findK(k - kl, kn - kl, arrShort, arrLong, sShort, sLong + kl);
        }
    }
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        int total = nums1.size() + nums2.size();
        if (total == 0) return 0;
        int k = total / 2;
        if (total % 2) {
            return findK(k, k, nums1, nums2);
        } else {
            return findK(k - 1, k, nums1, nums2);
        }
    }
};

Ruby version

# @param {Integer[]} nums1
# @param {Integer[]} nums2
# @return {Float}
def remove_first(nums1, nums2)
    (nums2.empty? || nums1.size > 0 && nums1.first <= nums2.first) ? nums1.shift : nums2.shift
end

def find_kth(k, knext, nums1, nums2)
    # got short and long arr
    short_arr, long_arr = (nums1.size <= nums2.size ? [nums1, nums2] : [nums2, nums1])

    # edge case 1, has an empty array
    if short_arr.empty?
        return (long_arr[k] + long_arr[knext]) / 2.0
    end

    # edge case 2, find the first 1 or 2 number(s)
    if k == 0
        sum = remove_first(nums1, nums2)
        return sum if knext == k
        sum += remove_first(nums1, nums2)
        return sum / 2.0
    end

    # edge case 3, add this to void kl becomes 0.
    if k == 1
        remove_first(nums1, nums2)
        return find_kth(k-1, knext-1, nums1, nums2)
    end

    # devide k by 2
    kl = k/2
    ks = k - kl
    # kl will always <= ks, try apply ks to short arr, apply large length on short arr has most possiblity to empty one arr.
    if ks > short_arr.size
        ks = short_arr.size
        kl = k - ks
    end
    # decide which arr to cut:
    if short_arr[ks-1] <= long_arr[kl - 1]
        # cut short
        find_kth(k-ks, knext-ks, short_arr[ks..-1], long_arr)
    else
        # cut long
        find_kth(k-kl, knext-kl, short_arr, long_arr[kl..-1])
    end
end

def find_median_sorted_arrays(nums1, nums2)
    total_size = nums1.size + nums2.size
    return 0 if total_size == 0
    k = total_size / 2
    if total_size % 2 == 0
        find_kth(k - 1, k, nums1, nums2)
    else
        find_kth(k, k, nums1, nums2)
    end
end



Old version at 2012

这题费了好大的功夫才做对啊,得好好记一记。
题目让求解两个排好序的数组的中位数。
假设总数为C,如果C是奇数,那么返回位于正中央的那个数即可(index是(C-1)/2);如果是偶数,那么要返回中间两个数的平均数,也就是下标是(C-1)/2和(C-1)/2 + 1的两个数的平均数。

题目自然转换成在两个排好序的数组中,寻找第k大的数(这里指在排好序的数组中下标为k),k = (C-1)/2。
这个问题可以用二分来解,算法有点复杂。

    先说临界情况

  1. A为空或者B为空
    直接在非空数组中找第k大的数即可。O(1)
  2. 找最小的数,k==0的情况,也简单,比较两个数组最开头的元素,谁小就是谁
    然后就是比较复杂的情况,假设寻找目标target是下标为k的数。
    那么意味着在排好的数组中,在目标数之前,一共有k个比目标更小的数。
    将k分成两份,一份在A的前端,一份在B的前端。这里其实将k怎么分配是一个可以讨论的问题,但是平分k可以得到平均最快的效果。
    设k = ka + kb,(k是偶数简单,k是奇数的话,剩下那一个随便放在两个数组中哪一个中都可以)

    这里可以列出来我们想要的效果:
    k=1 —-> ka = 1, kb = 1
    k=2 —-> ka = 1, kb = 1
    k=3 —-> ka = 1, kb = 1. [+1,表示还有一个元素,可以随意分配在ka或者kb中,只要不越界]
    k=4 —-> ka = 2, kb = 2
    k=5 —-> ka = 2, kb = 2. [+1]
    已经可以看出来规律了,这个造成了下面代码中比较复杂的部分,这些细节消耗的时间不少啊。

    然后就是主要逻辑

  1. 如果A[ka-1] >= B[kb-1]
    说明在B前端的kb个数中,不可能出现我们要寻找的目标。
    为什么呢?
    假如A一共有m个数,B一共有n个数。
    那么target(下标是k)后面有且只有n + m – 1 – k个数;
    但是B[kb-1]在B中已经排在n – kb个数之前,加上A中的m – ka + 1(A[ka-a]),也就是说在排序后的整体数组中排在B[kb-1]之后的数,至少有n – kb + m – ka + 1 = n + m – k + 1个。
    由于n + m – k + 1 > n + m – k – 1,所以B前端kb个数都不可能是target。
    所以此时可以将问题转化为,在A[0,…,m-1]和B[kb,…,n-1]中,寻找下标为k – kb的数。
  2. 否则,A[ka-1] < B[kb-1] 同上,可以剔除A前端的ka个数。

这样循环下去,就能以二分的速度找到目标。

这个问题不仅要找到第k大的数,当C是偶数的时候,还要找到第k+1个数,取两者均值。



代码:176ms过大集合

class Solution {
public:
    double midianValueOnArray(int Arr[], int si, bool isOdd) {
        if (isOdd)
            return Arr[si];
        else {
            return (Arr[si] + Arr[si + 1]) / 2.0;
        }
    }
    double midianValueOnArrays(int A[], int m, int ms, int B[], int n, int ns, bool isOdd){
        int sorted[2];
        for(int i = 0 ; i < 2; ++i) {
            if(ms < m) {
                if (ns < n) {
                    sorted[i] = (A[ms] <= B[ns] ? A[ms++] : B[ns++]);
                } else {
                    sorted[i] = A[ms++];
                }
            } else {
                sorted[i] = B[ns++];
            }
            if (isOdd) {
                return sorted[0];
            }
        }
        return 0.5 * (sorted[0] + sorted[1]);
    }
    double findMedianSortedArrays(int A[], int m, int B[], int n) {
        if(m + n <= 0) return 0;
        //find the number at this index from ms and ns
        int target = (m + n - 1) / 2;
        //if isOdd return val[target] is ok
        // else return (val[target] + val[target + 1]) / 2
        bool isOdd = (m + n) % 2;

        int ms = 0;//current start index in A
        int ns = 0;//current start index in B
        while (true) {
            //if A or B is empty just anotherArray[target] is the target
            if(m - ms <= 0) {//A is empty
                return midianValueOnArray(B, ns + target, isOdd);
            } else if(n - ns <= 0) {//B is empty
                return midianValueOnArray(A, ms + target, isOdd);
            }
            //A and B are not empty
            if(target <= 0) {
                return midianValueOnArrays(A,m,ms,B,n,ns,isOdd);
            }
            //divide the numbers smaller than val[target]
            int divideCount = target > 1 ? (target - 2) / 2 : 0;
            int plus = target > 2 ? target % 2 : 0;
            int mm = ms + divideCount;
            if(mm >= m) mm = m - 1;
            int nm = ns + divideCount;
            if(nm >= n) nm = n - 1;
            if(n - nm >= m - mm)
                nm += plus;
            else
                mm += plus;
            if(A[mm] >= B[nm]) {
                //in this case, the numbers at the beginning of B is impossiblely contain val[target]
                target -= (nm - ns + 1);
                ns = nm + 1;
            } else {
                target -= (mm - ms + 1);
                ms = mm + 1;
            }
        }
    }
};

LeetCode题目:Longest Substring Without Repeating Characters

贪心法,从头扫描到尾,O(n)搞定。
用一个map[256]来记录出现过的字符位置。
遇到没有出现过的字符,将map对应位置标记,并且子串长度+1;
遇到出现过的字符,将子串自上一次出现该字符位置前面的截断,对这部分截断的字符,标记map为notFound,重新计算子串长度。



Longest Substring Without Repeating Characters
Given a string, find the length of the longest substring without repeating characters. For example, the longest substring without repeating letters for “abcabcbb” is “abc”, which the length is 3. For “bbbbb” the longest substring is “b”, with the length of 1.



代码:44ms过大集合

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int map[256];
        const int notFound = -1;
        for(int i = 0 ; i < 256; ++i){
            map[i] = notFound;
        }
        int maxlen = 0;
        int len = 0;
        for(int i = 0 ; i < s.size(); ++i) {
            char si = s[i];
            int lastSi = map[si];
            if(lastSi == notFound) {
                ++len;
                map[si] = i;
                if(maxlen < len)
                    maxlen = len;
            } else {
                int curStart = i - len;
                for(int j = curStart; j < lastSi; ++j) {
                    map[s[j]] = notFound;
                }
                curStart = lastSi + 1;
                len = i - curStart + 1;
                map[si] = i;
            }
        }
        return maxlen;
    }
};

New ruby solution at 2016-04-22

# @param {String} s
# @return {Integer}
def length_of_longest_substring(s)
    max = 0
    c2imap = {}
    si = -1
    s.chars.each_with_index do |c, i|
        if ci = c2imap[c] and ci > si
            si = ci
        end
        c2imap[c] = i
        new_length = i - si
        max = new_length if max < new_length
    end
    max
end

LeetCode题目:Longest Palindromic Substring,二维动态规划

Updated at 2016-04-28

Question

Longest Palindromic Substring
Given a string S, find the longest palindromic substring in S. You may assume that the maximum length of S is 1000, and there exists one unique longest palindromic substring.

Analyze

The brute force method will take O(n3) time, and apparently, it will exceed the time limit.

To reduce the time, as we noticed there are a lot of duplicated comparisons in the brute force method, the first thought is dynamic programming. But I was too aggressive at the begging, want to solve this with 1-D dynamic programming in O(n) time.

After find out several exceptional cases with either focus on the start of substring or the end of substring, I realized that it can not fit into a 1-D dynamic programming. But 2-D, can easily beat this question.

2-D Dynamic Programming

Let’s focus on the start index si and end index ei of the substring. When we want answer the question: “Is the substring s[si..ei] palindromic?”, we just check if s[si + 1 .. ei – 1] is palindromic and if s[si] == s[ei]. If both conditions are true, then we can get the positive answer. Then if we solve the problem in this order:

for(int ei = 0; ei < s.length(); ei ++)
  for(int si = ei; si >= 0; si --)
  { // implement here }

Then we can make sure when we resolve sub problem on (si, ei), we know the answer of (si+1, ei-1). So, just build a 2-D matrix and we can beat this question in O(n2) time.

Expand from center

That’s the DP solution. And since it’s O(n2), then there is a more straight forward method with the same time level:
Just at each point in the string S, on the positions of it’s characters like 0,1,2,… or between characters like 0.5, 1.5, … . Then from this position, just expand the substring if it is still palindromic. Then there are O(n) centers and at each center, we need expand O(n) times. Thus, this is also a O(n2) time solution. And simpler.

Codes

Ruby version DP

def longest_palindrome(s)
    map = {}
    max_si, max_ei, max_length = nil, nil, 0
    s.length.times do |ei|
        si = ei
        while si >= 0 do
            if si == ei
                map[ei] = { si => true }
            else
                map[ei][si] = s[ei] == s[si] && (si + 1 >= ei || map[ei - 1][si + 1] == true)
            end

            len = (ei - si + 1)
            if map[ei][si] && len > max_length
                max_length = len
                max_si = si
                max_ei = ei
            end

            si -= 1
        end
    end

    s[max_si..max_ei]
end

Ruby version expand from center

def expand(s, center)
  li, ri = center.floor, center.ceil
  if li == ri
      li -= 1
      ri += 1
  end
  left_space = li
  right_space = s.length - ri - 1
  max_space = [left_space, right_space].min
  most_left = li - max_space
  while li >= most_left do
    if s[li] == s[ri]
      li -= 1
      ri += 1
    else
      break
    end
  end
  #p [ri - li - 1, li + 1, ri - 1]
  [ri - li - 1, li + 1, ri - 1]
end

def longest_palindrome(s)
  i = 0
  max_si, max_ei, max_len = nil, nil, 0
  while i <= s.length - 1 do
    len, si, ei = expand(s, i)
    if max_len < len
        max_si, max_ei, max_len = si, ei, len
    end
    i += 0.5
  end
  s[max_si..max_ei]
end

C++ version DP

class Solution {
public:
    inline void setIsPalindrome(bool * dpmap, int si, int ei, int total_length, bool value) {
        *(dpmap + total_length * ei + si) = value;
    }
    inline bool isPalindrome(bool * dpmap, int si, int ei, int total_length) {
        return *(dpmap + total_length * ei + si);
    }
    string longestPalindrome(string s) {
        int total_length = s.length();
        int max_len = 0, max_si, max_ei;
        bool *dpmap = new bool[total_length * total_length];
        for(int ei = 0; ei < total_length; ei++) {
            for(int si = ei; si >= 0; si--) {
                bool positive = 
                    (si == ei) ||
                    ((s[si] == s[ei]) && ((si == ei - 1) || isPalindrome(dpmap, si + 1, ei - 1, total_length)));
                setIsPalindrome(dpmap, si, ei, total_length, positive);
                if(positive){
                    int len = ei - si + 1;
                    if(max_len < len){
                        max_len = len;
                        max_si = si;
                        max_ei = ei;
                    }
                }
            }
        }
        delete dpmap; dpmap = NULL;
        return s.substr(max_si, max_len);
    }
};

旧版,有误,奇怪的是当初竟然通过了测试…

问题要求是寻找一个字符串中的最长可逆子串,可以转化为求它和自身的逆序串的最长公共子串问题。
关键的思路是:
用一个matrix记录最长公共子串的长度,mat[i][j]表示以a串的i-1字符结束且以b串的j-1字符结束的最长子串长度。
当stringa[i-1] == stringb[j-1]时,mat[i][j] == mat[i-1][j-1] + 1

其中遇到了两个问题:
1.在子函数longestCommonSub中,如果先new,在得到最长公共串之后delete申请的空间mat的话,运行到大概第5个test case的时候会报runtime error。
2.在代码段

if(max <= len)

中,如果把小于等于改成小于,在大测试集合中会有一个莫名其妙的test case结果错误。
这两个问题都不知道原因。



代码:1488ms过大集合

class Solution {
public:
    string longestCommonSub(string &sa, string &sb) {
        int rows = sa.size() + 1;
        int cols = sb.size() + 1;
        static int *mat = NULL;
        if(NULL == mat) {
            mat = new int[1001 * 1001];
        }
        for(int ir = 0 ; ir < rows; ++ ir) {
            for(int ic = 0 ; ic < cols; ++ ic) {
                *(mat + ir * cols + ic) = 0;
            }
        }
        int max = 0;
        int maxIr;
        for(int ir = 1 ; ir < rows; ++ ir) {
            char sai = sa[ir - 1];
            for(int ic = 1 ; ic < cols; ++ ic) {
                if(sai == sb[ic - 1]){
                    int len = 1 + *(mat + (ir-1) * cols + ic - 1);
                    *(mat + ir * cols + ic) = len;
                    if(max <= len) {
                        max = len;
                        maxIr = ir;
                    }
                }
            }
        }
        string result;
        if(max > 0) {
            result = sa.substr(maxIr - max, max);
        }
        return result;
    }
    string longestPalindrome(string s) {
        string rs = s;
        for( int i = 0 ; i < s.size(); ++ i) {
            rs[i] = s[s.size() - 1 - i];
        }
        return longestCommonSub(s,rs);
    }
};