Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obtaining and releasing a Mutex lock on JRuby 9000 is 5 times slower than on MRI. #3255

Closed
boazsegev opened this issue Aug 15, 2015 · 4 comments

Comments

@boazsegev
Copy link

Obtaining and releasing a Mutex lock on JRuby 9000 is 5 times slower than on MRI.

This is an important issue, especially considering that JRuby's advantage is the lack of a GIL, which allows true multithreading.

replicating the issue

require 'benchmark'
l = Mutex.new
puts Benchmark.measure { 100_000.times { l.synchronize { true } unless l.locked? } }

# => MRI:        0.020000   0.000000   0.020000 (  0.025405)
# => JRuby:    0.540000   0.010000   0.550000 (  0.133114)
@Freaky
Copy link
Contributor

Freaky commented Aug 15, 2015

I don't see that here:

require 'benchmark/ips'

l = Mutex.new
Benchmark.ips do |b|
  b.report { l.synchronize { true } unless l.locked? }
end

jruby 9.0.0.0 (2.2.2) 2015-07-21 e10ec96 OpenJDK 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [FreeBSD-amd64]

Calculating -------------------------------------
                        53.542k i/100ms
-------------------------------------------------
                          2.611M (±12.4%) i/s -     12.743M

ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-freebsd10.2]

 Calculating -------------------------------------
                        62.050k i/100ms
-------------------------------------------------
                          1.987M (± 1.1%) i/s -      9.990M

@boazsegev
Copy link
Author

The problem is there - it's just hidden by another layer of the interpreter that let's it speed up when the same short piece of code repeats itself.

In other words: the difference shrinks with repetitive operations (interpreter own code optimization perhaps?), but would be heavily experienced in a real life application.

Allow me to demonstrate this change using the following benchmarks.

Please observe how the results in MRI are consistent factors of 10, whereas the results in JRuby improve with each time the repetition increase and do not represent a consistent factor.

Also, you will get different results if you repeat the test (unless you restart jirb), since optimization already took place.

Here are the results on my machine (MacBook Pro 2.5 GHz Intel Core i7):

require 'benchmark'
require 'benchmark/ips'
def test_both
    l = Mutex.new
    Benchmark.bm do |b|
      b.report('non-ips 10K') { 10_000.times { l.synchronize { true } unless l.locked? } }
      b.report('non-ips 100K') { 100_000.times { l.synchronize { true } unless l.locked? } }
      b.report('non-ips 1M') { 1_000_000.times { l.synchronize { true } unless l.locked? } }
    end
    l = Mutex.new
    Benchmark.ips do |b|
      b.report('ips-report') { l.synchronize { true } unless l.locked? }
    end
    nil
end

test_both

JRuby 9000

      user     system      total        real
non-ips 10K  0.070000   0.000000   0.070000 (  0.016453)
non-ips 100K  0.330000   0.020000   0.350000 (  0.091444)
non-ips 1M  0.630000   0.020000   0.650000 (  0.508916)
Calculating -------------------------------------
          ips-report    94.207k i/100ms
-------------------------------------------------
          ips-report      2.004M (± 3.5%) i/s -     10.080M

ruby 2.2.2p95

      user     system      total        real
non-ips 10K  0.000000   0.000000   0.000000 (  0.002434)
non-ips 100K  0.030000   0.000000   0.030000 (  0.024393)
non-ips 1M  0.280000   0.000000   0.280000 (  0.276850)
Calculating -------------------------------------
          ips-report   118.545k i/100ms
-------------------------------------------------
          ips-report      3.326M (±10.1%) i/s -     16.478M

@Freaky
Copy link
Contributor

Freaky commented Aug 15, 2015

Yep, that's the JIT warming up. It takes time for JRuby and the JVM to reach steady-state - basically everything starts off interpreted, and it compiles stuff as it goes along if it feels the need to. The price we pay for that is ropey startup performance with just about anything.

Using Benchmark#bmbm will give you more consistent results:

2.2.2p95:

Rehearsal ------------------------------------------------
non-ips 10K    0.000000   0.000000   0.000000 (  0.004350)
non-ips 100K   0.046875   0.000000   0.046875 (  0.042396)
non-ips 1M     0.421875   0.000000   0.421875 (  0.424676)
--------------------------------------- total: 0.468750sec

                   user     system      total        real
non-ips 10K    0.007812   0.000000   0.007812 (  0.004349)
non-ips 100K   0.039062   0.000000   0.039062 (  0.042142)
non-ips 1M     0.414062   0.000000   0.414062 (  0.419855)

9.0.0.0:

Rehearsal ------------------------------------------------
non-ips 10K    0.265625   0.000000   0.265625 (  0.032104)
non-ips 100K   0.312500   0.023438   0.335938 (  0.064529)
non-ips 1M     0.468750   0.093750   0.562500 (  0.445830)
--------------------------------------- total: 1.164063sec

                   user     system      total        real
non-ips 10K    0.007813   0.000000   0.007813 (  0.005175)
non-ips 100K   0.070313   0.023438   0.093750 (  0.046860)
non-ips 1M     0.335938   0.101563   0.437500 (  0.437393)

If you replace Mutex with a stub class like:

class Foo
  def synchronize() yield ; end
  def locked?() false ; end
end

You'll find you get basically identical behaviour.

@boazsegev
Copy link
Author

Seems to be my bad.... sorry.

Thank you for your time.

@enebo enebo added this to the Invalid or Duplicate milestone Aug 20, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants