Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jruby 9.0.0.0.rc1 performance regressions #3121

Closed
jzakiya opened this issue Jul 9, 2015 · 12 comments
Closed

jruby 9.0.0.0.rc1 performance regressions #3121

jzakiya opened this issue Jul 9, 2015 · 12 comments

Comments

@jzakiya
Copy link

jzakiya commented Jul 9, 2015

In running a cpu heavy math benchmark against 1.7.20.1 and 9.0.0.0.rc1 I've observed a significant performance regression. The 9.0.0.0.rc1 times are 1.5 to 2.5 slower than for 1.7.20.1.

Here is a gist of the benchmark, justdo load 'primalitytest.rb' and it will run.
https://gist.github.com/jzakiya/9befd7eb5ed9640f9358

This code became the precursor to a gem I released on April 1, 2015 -- primes-utils.
I see the same performance degradation with its methods as with the above benchmark methods.

I know 9.0.0.0 is still a first release candidate but this seems very extreme to me.

jz

@jzakiya
Copy link
Author

jzakiya commented Jul 9, 2015

I forgot to state my reference system is a Lenovo laptop, I5 cpu, 6GB ram, using PCLinuxOS 32-bit Linux distro. But I see similar performance in other 64-bit distros using Virtual Box.

jz

@enebo enebo added this to the JRuby 9.0.0.0 milestone Jul 9, 2015
@chrisseaton
Copy link
Contributor

Wow this is a beast of a benchmark! JRuby performance regression aside, I would imagine your current benchmark is running this in the interpreter because you are only calling each method twice. Unless a Ruby implementation has on-stack-replacement in their JIT (most do not), or some kind of tracing, there isn't really an opportunity to optimise here.

@chrisseaton
Copy link
Contributor

I tried a much smaller prime, and benchmark-ips with a generous 20s warmup and 10s timing, so the methods are executed much more frequently and are more likely to be optimised.

I hope the smaller prime is still correct for what you are trying to achieve?

1.7

Calculating -------------------------------------
prime tests for P = 982451653
                       301.636k i/100ms
       Miller-Rabin      2.448k i/100ms
       primz7?         181.000  i/100ms
       primz7a?        277.000  i/100ms
       primz7b?        253.000  i/100ms
       primep? 13       31.000  i/100ms
       primep? 17        1.000  i/100ms
       primepa? 13      27.000  i/100ms
       primepa? 17       1.000  i/100ms
       factorp 13       31.000  i/100ms
       factorp 17        1.000  i/100ms
  prime? [ruby lib]      7.000  i/100ms
prime_division [ruby lib]
                         7.000  i/100ms
-------------------------------------------------
prime tests for P = 982451653
                         22.588M (±10.4%) i/s -    445.818M
       Miller-Rabin      25.410k (± 5.2%) i/s -    509.184k
       primz7?            1.871k (± 4.9%) i/s -     37.467k
       primz7a?           2.834k (± 5.0%) i/s -     56.508k
       primz7b?           2.662k (± 3.3%) i/s -     53.383k
       primep? 13       311.965  (± 3.8%) i/s -      6.231k
       primep? 17        18.217  (± 5.5%) i/s -    364.000 
       primepa? 13      271.729  (± 4.0%) i/s -      5.427k
       primepa? 17       16.342  (± 6.1%) i/s -    326.000 
       factorp 13       310.819  (± 3.9%) i/s -      6.231k
       factorp 17        18.859  (± 5.3%) i/s -    376.000 
  prime? [ruby lib]      79.177  (± 5.1%) i/s -      1.582k
prime_division [ruby lib]
                         71.363  (± 4.2%) i/s -      1.428k

9k

Calculating -------------------------------------
prime tests for P = 982451653
                       159.008k i/100ms
       Miller-Rabin      2.085k i/100ms
       primz7?         145.000  i/100ms
       primz7a?        173.000  i/100ms
       primz7b?        139.000  i/100ms
       primep? 13       28.000  i/100ms
       primep? 17        1.000  i/100ms
       primepa? 13      23.000  i/100ms
       primepa? 17       1.000  i/100ms
       factorp 13       26.000  i/100ms
       factorp 17        1.000  i/100ms
  prime? [ruby lib]     37.000  i/100ms
prime_division [ruby lib]
                        32.000  i/100ms
-------------------------------------------------
prime tests for P = 982451653
                         19.528M (±12.4%) i/s -    382.573M
       Miller-Rabin      22.314k (± 5.2%) i/s -    446.190k
       primz7?            1.507k (± 5.1%) i/s -     30.160k
       primz7a?           1.820k (± 4.3%) i/s -     36.330k
       primz7b?           1.505k (± 4.2%) i/s -     30.163k
       primep? 13       286.737  (± 3.1%) i/s -      5.740k
       primep? 17        17.017  (± 5.9%) i/s -    340.000 
       primepa? 13      242.588  (± 3.7%) i/s -      4.853k
       primepa? 17       14.733  (± 6.8%) i/s -    294.000 
       factorp 13       273.193  (± 3.7%) i/s -      5.460k
       factorp 17        17.162  (± 5.8%) i/s -    343.000 
  prime? [ruby lib]     383.873  (± 3.6%) i/s -      7.696k
prime_division [ruby lib]
                        331.196  (± 3.6%) i/s -      6.624k

Note that 9k is a lot better on prime? and prime_division, but worse on everything else (I think - hard to scan data like this).

@chrisseaton
Copy link
Contributor

I'd post Truffle results btw but we have a compiler bug that stops us compiling benchmark-ips right now :(

@chrisseaton
Copy link
Contributor

Ah, I can get Truffle working.

Truffle

Calculating -------------------------------------
prime tests for P = 982451653
                       102.015k i/100ms
       Miller-Rabin      2.137k i/100ms
       primz7?           2.112k i/100ms
       primz7a?          2.861k i/100ms
       primz7b?          3.166k i/100ms
       primep? 13       33.000  i/100ms
       primep? 17        2.000  i/100ms
       primepa? 13      32.000  i/100ms
       primepa? 17       2.000  i/100ms
       factorp 13       39.000  i/100ms
       factorp 17        2.000  i/100ms
  prime? [ruby lib]    150.000  i/100ms
prime_division [ruby lib]
                       139.000  i/100ms
-------------------------------------------------
prime tests for P = 982451653
                        101.988M (± 1.4%) i/s -      1.995B
       Miller-Rabin      31.611k (± 9.2%) i/s -    621.867k
       primz7?           24.722k (± 4.2%) i/s -    494.208k
       primz7a?          50.190k (± 6.5%) i/s -      1.001M
       primz7b?          50.372k (± 5.0%) i/s -      1.007M
       primep? 13       416.814  (± 7.7%) i/s -      8.283k
       primep? 17        24.323  (± 4.1%) i/s -    486.000 
       primepa? 13      413.689  (± 4.8%) i/s -      8.256k
       primepa? 17       24.063  (± 4.2%) i/s -    482.000 
       factorp 13       440.572  (± 3.9%) i/s -      8.814k
       factorp 17        24.520  (± 4.1%) i/s -    490.000 
  prime? [ruby lib]       1.577k (± 3.6%) i/s -     31.500k
prime_division [ruby lib]
                          1.535k (± 3.2%) i/s -     30.719k

Between 1.2x (Miller-Rabin) and 21x (prime_division) faster.

@chrisseaton
Copy link
Contributor

prime_division is written in pure Ruby, and is 4.5x on 9k compared to 1.7, so the IR is definitely doing something right and hopefully the other cases are just tweaking.

@ianks
Copy link

ianks commented Jul 11, 2015

Would there be interest in adding this to the benchmark suite?

@jzakiya
Copy link
Author

jzakiya commented Jul 12, 2015

The Miller-Rabin implementation for this benchmark was old code.
MR is way faster with the newer implementation code, which I include.
When you get a chance, run it again with this new code.
It uses the openssl library, so using it may help you find any regressions in it too.

require 'openssl'
def primemr?(k=20)  # increase k for more reliability
  n = self.abs
  return true  if n == 2 or n == 3
  return false if n % 6 != 1 && n % 6 != 5 or n == 1

  d = n - 1
  s = 0
  (d >>= 1; s += 1) while d.even?
  k.times do
    a = 2 + rand(n-4)
    x = a.to_bn.mod_exp(d,n)      # x = (a**d) mod n
    next if x == 1 or x == n-1
    (s-1).times do
      x = x.mod_exp(2,n)          # x = (x**2) mod n
      return false if x == 1
      break if x == n-1
    end
    return false if x != n-1
  end
  true  # with high probability
end

@enebo enebo modified the milestone: JRuby 9.0.0.0 Jul 14, 2015
@headius headius added this to the JRuby 9.0.1.0 milestone Jul 16, 2015
@enebo enebo modified the milestones: JRuby 9.0.2.0, JRuby 9.0.1.0 Aug 18, 2015
@enebo enebo modified the milestones: JRuby 9.0.5.0, JRuby 9.0.4.0 Nov 12, 2015
@headius
Copy link
Member

headius commented Jan 20, 2016

I'll attempt to get updated numbers today.

@headius
Copy link
Member

headius commented Jan 20, 2016

Here's current numbers with JRuby 1.7.21 and 9k master (9.0.5.0):

1.7.21:

prime tests for P = 982451653
                         16.569M (±10.8%) i/s -     81.711M
       Miller-Rabin      18.751k (± 4.7%) i/s -     93.555k
       primz7?            1.397k (± 5.4%) i/s -      7.072k
       primz7a?           2.306k (± 7.5%) i/s -     11.514k
       primz7b?           2.121k (± 8.0%) i/s -     10.584k
       primep? 13       266.916  (± 4.5%) i/s -      1.334k
       primep? 17        15.511  (± 6.4%) i/s -     78.000 
       primepa? 13      220.959  (± 5.9%) i/s -      1.113k
       primepa? 17       13.241  (± 7.6%) i/s -     66.000 
       factorp 13       270.233  (± 4.1%) i/s -      1.368k
       factorp 17        15.394  (± 6.5%) i/s -     77.000 
  prime? [ruby lib]      65.363  (± 4.6%) i/s -    330.000 
prime_division [ruby lib]
                         62.487  (± 4.8%) i/s -    315.000 

9.0.5.0

prime tests for P = 982451653
                         16.149M (±13.1%) i/s -     78.755M
       Miller-Rabin      16.959k (± 5.6%) i/s -     84.622k
       primz7?            1.341k (± 4.7%) i/s -      6.786k
       primz7a?           1.711k (± 6.1%) i/s -      8.600k
       primz7b?           1.338k (± 5.5%) i/s -      6.708k
       primep? 13       257.417  (± 4.7%) i/s -      1.298k
       primep? 17        15.328  (± 6.5%) i/s -     77.000 
       primepa? 13      226.209  (± 4.9%) i/s -      1.134k
       primepa? 17       13.540  (± 7.4%) i/s -     68.000 
       factorp 13       260.059  (± 4.6%) i/s -      1.298k
       factorp 17        15.104  (± 6.6%) i/s -     76.000 
  prime? [ruby lib]     331.101  (± 6.6%) i/s -      1.649k
prime_division [ruby lib]
                        251.822  (± 8.3%) i/s -      1.260k

Most of the numbers are still slower on master. I'll poke at it a little today.

@headius
Copy link
Member

headius commented Jan 20, 2016

Numbers after a few rounds of this bench do improve.

1.7.21, with invokedynamic this time, 3rd iteration:

prime tests for P = 982451653
                         10.345M (±10.0%) i/s -     51.078M
       Miller-Rabin      26.281k (± 5.0%) i/s -    132.080k
       primz7?            1.654k (± 5.5%) i/s -      8.320k
       primz7a?           4.088k (± 4.5%) i/s -     20.628k
       primz7b?           3.338k (± 5.1%) i/s -     16.881k
       primep? 13       271.459  (± 5.9%) i/s -      1.352k
       primep? 17        15.266  (± 6.6%) i/s -     76.000 
       primepa? 13      227.191  (± 4.4%) i/s -      1.150k
       primepa? 17       13.390  (± 7.5%) i/s -     67.000 
       factorp 13       271.856  (± 4.4%) i/s -      1.375k
       factorp 17        15.901  (± 6.3%) i/s -     80.000 
  prime? [ruby lib]     117.186  (± 5.1%) i/s -    594.000 
prime_division [ruby lib]
                        115.230  (± 5.2%) i/s -    583.000 

9.0.5.0, with invokedynamic this time, 4th iteration:

prime tests for P = 982451653
                         21.162M (±12.2%) i/s -    103.438M
       Miller-Rabin      26.968k (± 3.9%) i/s -    136.296k
       primz7?            1.767k (± 3.7%) i/s -      8.904k
       primz7a?           3.730k (± 4.3%) i/s -     18.615k
       primz7b?           3.549k (± 4.9%) i/s -     17.850k
       primep? 13       248.643  (± 4.0%) i/s -      1.248k
       primep? 17        15.127  (± 6.6%) i/s -     76.000 
       primepa? 13      236.778  (± 4.6%) i/s -      1.188k
       primepa? 17       13.179  (± 7.6%) i/s -     66.000 
       factorp 13       244.663  (± 4.9%) i/s -      1.225k
       factorp 17        14.869  (± 6.7%) i/s -     75.000 
  prime? [ruby lib]     513.712  (± 4.7%) i/s -      2.601k
prime_division [ruby lib]
                        342.977  (± 5.8%) i/s -      1.728k

After warmup, it's kinda a toss-up between the two.

I suspect the final two may be faster because JRuby ships a newer version of the prime stdlib, and they should probably not be considered interesting here.

@headius
Copy link
Member

headius commented Jan 20, 2016

Since the original report was that JRuby 9k was 1.5 to 2.5x slower than 1.7, and I've shown above that it's more like 1.1x slower without warmup, and possibly the same or faster with warmup, I'm going to call this one fixed.

Obviously there's more we can do to improve performance, but at this point, for this example, I would say our regression is largely gone.

@headius headius closed this as completed Jan 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants