Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

match? should be faster than match #5256

Open
ahorek opened this issue Jul 20, 2018 · 10 comments
Open

match? should be faster than match #5256

ahorek opened this issue Jul 20, 2018 · 10 comments

Comments

@ahorek
Copy link
Contributor

ahorek commented Jul 20, 2018

Environment

jruby 9.2.1.0-SNAPSHOT (2.5.0) 2018-07-20 a2bc9a3 Java HotSpot(TM) 64-Bit Server VM 9.0.4+11 on 9.0.4+11 +jit [linux-x86_64]
ruby 2.6.0dev (2018-07-20 trunk 64004) [x86_64-linux]
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]

require 'benchmark/ips'

pattern = /^([0-1]?[0-9]|2[0-3])([:\-,\.])?([0-5][0-9])$/
string = '9:30'

Benchmark.ips do |x|
  x.report('match') { string.match(pattern) }
  x.report('match?') { string.match?(pattern) }
end

ruby 2.5.0

               match    926.635k (± 2.6%) i/s -      4.634M in   5.004093s
              match?      3.395M (± 5.5%) i/s -     16.945M in   5.013444s

ruby master

               match      1.024M (± 4.4%) i/s -      5.123M in   5.013162s
              match?      4.809M (± 6.9%) i/s -     23.905M in   5.001929s

ruby master + jit

               match    984.690k (± 7.8%) i/s -      4.911M in   5.018228s
              match?      5.348M (±11.2%) i/s -     26.410M in   5.009641s

jruby master

               match      1.158M (± 9.2%) i/s -      5.712M in   4.995932s
              match?      1.222M (± 5.6%) i/s -      6.092M in   5.005920s

jruby master + indy

               match      1.211M (±11.0%) i/s -      5.912M in   4.985816s
              match?      1.362M (± 4.1%) i/s -      6.782M in   4.989386s

match? should perform better https://bugs.ruby-lang.org/issues/8110
but jruby implement it just like match, without setting a back-reference on a ruby level

ruby has a specialized method for it and it looks like there has been some improvements recently
https://github.com/ruby/ruby/blob/trunk/re.c#L3324

I found match? pattern very useful on many places, so I think it's worth investigating if there's a way to improve it.

@kares
Copy link
Member

kares commented Jul 23, 2018

good find, definitely makes sense to look into "optimizing" match? at least the way MRI does ...
shouldn't be too hard - split out a method or introduce a boolean parameter on the internals to not set $~

@lopex
Copy link
Member

lopex commented Jul 23, 2018

MRI has separate internal implementation for that which passes region as NULL so we'd have to make something similar in joni.

@enebo
Copy link
Member

enebo commented Jul 24, 2018

@lopex and I looked at this one today. Removing creating internal regions in joni and then deleting a bunch of code on JRuby side (since we don't need to deal with captures) MRI (2.5.1) was like 5.2M i/s and we were like 4.7-4.9M i/s from like 1.9-2.1M i/s before patching. If I removed interruptible support we went up to about 5.5M i/s. If I tried bench with those changes using graal ce rc4 we get about 7.2M i/s.

@lopex plans on doing a cleaner version of my changes to joni and we will at least close the gap quite a bit.

Even without removing regions we discovered we were calling search instead of match (which we had to call based on the current version of what match? calls through; but we ended up finding two bottlenecks overall so this was somewhat satisfying.

@headius
Copy link
Member

headius commented Aug 16, 2018

We are closer but still not quite there (numbers from my MBP):

[] ~/projects/jruby $ jruby blah.rb
Warming up --------------------------------------
               match    93.104k i/100ms
              match?   132.259k i/100ms
Calculating -------------------------------------
               match      2.204M (± 9.7%) i/s -     10.893M in   5.009938s
              match?      3.104M (± 6.3%) i/s -     15.474M in   5.009289s

[] ~/projects/jruby $ rvm ruby-2.5 do ruby blah.rb
Warming up --------------------------------------
               match    71.941k i/100ms
              match?   222.292k i/100ms
Calculating -------------------------------------
               match    917.639k (± 5.0%) i/s -      4.604M in   5.032212s
              match?      4.885M (± 2.1%) i/s -     24.452M in   5.007808s

[] ~/projects/jruby $ jruby -Xcompile.invokedynamic blah.rb
Warming up --------------------------------------
               match   106.672k i/100ms
              match?   152.679k i/100ms
Calculating -------------------------------------
               match      2.565M (±10.5%) i/s -     12.694M in   5.030781s
              match?      3.712M (± 6.3%) i/s -     18.474M in   5.003471s

@headius
Copy link
Member

headius commented Aug 28, 2018

FWIW here's an allocation profile of the match? call from the above benchmark run an arbitrary number of times. The vast majority of allocations are the Joni engine and its int[] state.

          percent          live          alloc'ed  stack class
 rank   self  accum     bytes objs     bytes  objs trace name
    1 61.37% 61.37%  38246880 239043 2637534560 16484591 377572 org.joni.ByteCodeMachine
    2 18.41% 79.79%  11474016 239042 791260272 16484589 377574 int[]
    3  0.90% 80.69%    562256 10216    562256 10216 300000 char[]
    4  0.39% 81.08%    245464 10206    245464 10206 300000 java.lang.String
    5  0.31% 81.39%    193472 4233    193472  4233 300000 java.lang.Object[]
    6  0.16% 81.55%     99016  387     99016   387 307282 byte[]
    7  0.15% 81.70%     94464 1968     94560  1970 343568 org.jruby.RubySymbol
    8  0.13% 81.83%     80512  629     80512   629 345011 org.jruby.ir.IRMethod
    9  0.13% 81.96%     78920 1973     79000  1975 343569 org.jruby.util.ByteList
   10  0.12% 82.08%     77600 1940     77600  1940 343562 org.jruby.util.ByteList

@lopex
Copy link
Member

lopex commented Oct 10, 2018

Since we might be beaten by GC pressure here and MRI just mallocs/frees stack and repeat stack (we're allocating matcher and repeat stack) Is there possibly a mix of thread local / dynamic scope place where we could reuse the matcher instance ?

@enebo
Copy link
Member

enebo commented Oct 10, 2018

@lopex it would be interesting as a test to see how much difference that makes...like just cheat with the impl to cache matcher and see if we get considerably faster.

@headius
Copy link
Member

headius commented Oct 10, 2018

I have experimented with resetting and reusing the matcher, but the machine itself still constructed a lot of state. If we could reuse the entire thing I'm sure it would have an impact, but I was never able to measure that.

@lopex
Copy link
Member

lopex commented Oct 11, 2018

But it shouldnt though, Matcher should be all state there is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants