prime? very slow #3649

giuan · 2016-02-06T21:50:37Z

The method prime? is 20x slower then MRI.
But a method implemented by myself give the same time as MRI prime? in MRI and jruby.
I think the problem is in the library prime.
Try

require 'prime'
require 'benchmark'
n = 1090109110921093
Benchmark.bm do |r|
  4.times do
    r.report {puts n.prime?}
  end
end

Tiziano Merzi

P.S.
jruby 9.0.5.0 on MAC

The text was updated successfully, but these errors were encountered:

headius · 2016-02-06T22:23:23Z

Wow. Ok, I'll look into this. Definitely something wrong.

headius · 2016-02-06T22:26:49Z

It looks like this is never getting out of the (simple) interpreter:

     Compiled + native   Method                        
 28.9%  1649  +     8    org.jruby.ir.interpreter.StartupInterpreterEngine.interpret
 25.5%  1460  +     0    org.jruby.ir.interpreter.StartupInterpreterEngine.processOtherOp
 10.8%   620  +     0    org.jruby.ir.IRScope.getFileName
  6.8%   390  +     0    org.jruby.runtime.BlockBody.yield
  6.1%   351  +     0    org.jruby.runtime.callsite.CachingCallSite.call
  6.1%   348  +     0    org.jruby.ir.interpreter.InterpreterEngine.processCall
  4.4%   251  +     0    org.jruby.ir.instructions.BFalseInstr.interpretAndGetNewIPC
  3.9%   224  +     0    java.lang.invoke.LambdaForm$DMH.invokeStatic_L6_L
  1.4%    77  +     1    org.jruby.ir.operands.LocalVariable.retrieve
  1.2%    67  +     1    org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath
  0.6%    36  +     0    org.jruby.RubyKernel.loop

headius · 2016-02-06T22:32:59Z

It appears that at least part of this bad benchmark result is due to the fact that most of the hot call path is not compiling before the benchmark completes. If I force JIT to happen sooner, the numbers improve (but still don't beat MRI).

It's not clear to me yet which methods are getting called.

headius · 2016-02-06T22:36:25Z

Ok, it looks like Ruby 2.3 improved the performance of prime.rb. JRuby 9.0.5.0 roughly matches MRI 2.2 performance, and JRuby with 2.3 stdlib roughly matches MRI 2.3 performance:

Ruby 2.2 stdlib:

[] ~/projects/jruby $ jruby -Xjit.threshold=0 -Xcompile.invokedynamic bench_prime.rb 
       user     system      total        real
 true
  3.170000   0.110000   3.280000 (  2.220189)
 true
  2.520000   0.030000   2.550000 (  2.033816)
 true
  2.100000   0.010000   2.110000 (  1.926965)
 true
  1.930000   0.010000   1.940000 (  1.879152)

[] ~/projects/jruby $ ruby22 bench_prime.rb 
       user     system      total        real
 true
  3.530000   0.010000   3.540000 (  3.533079)
 true
  3.530000   0.000000   3.530000 (  3.530615)
 true
  3.590000   0.010000   3.600000 (  3.596020)
 true
  3.430000   0.000000   3.430000 (  3.436481)

Ruby 2.3 stdlib

[] ~/projects/jruby $ jruby -I ../ruby/lib -Xjit.threshold=0 -Xcompile.invokedynamic bench_prime.rb 
       user     system      total        real
 true
  1.910000   0.100000   2.010000 (  0.934381)
 true
  0.700000   0.010000   0.710000 (  0.631105)
 true
  1.030000   0.010000   1.040000 (  0.748262)
 true
  0.630000   0.010000   0.640000 (  0.610905)

[] ~/projects/jruby $ ruby23 bench_prime.rb 
       user     system      total        real
 true
  0.640000   0.000000   0.640000 (  0.637603)
 true
  0.630000   0.000000   0.630000 (  0.629570)
 true
  0.640000   0.000000   0.640000 (  0.637175)
 true
  0.630000   0.000000   0.630000 (  0.636287)

giuan · 2016-02-06T22:43:51Z

This is the link
https://gist.github.com/giuan/f2176226e253d8620e0a
In jruby prime? is 20x slower then my method

headius · 2016-02-06T22:46:14Z

Ok, so other than the 2.2/2.3 stdlib issue, the remaining problem here is that a single long-running method does not trigger JIT.

We don't have a good way to do on-stack replacement, so the best we could do would be to trigger JIT of a method that ran longer than some expectation.

Below is a patch for this, but it's not really accurate because it doesn't exclude child calls.

diff --git a/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java b/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
index 2cdeaf0..7aeee01 100644
--- a/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
+++ b/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
@@ -97,7 +97,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, args, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, args, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, args, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -132,7 +139,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -167,7 +181,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -202,7 +223,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, arg1, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -237,7 +265,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, arg1, arg2, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, arg2, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, arg2, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

I'm not sure what priority to give this.

giuan · 2016-02-06T22:52:07Z

But if try https://gist.github.com/giuan/f2176226e253d8620e0a
my method is compiled. I have 0.9 sec. for MRI 2.3 and 1.1 for jruby

headius · 2016-02-07T10:54:32Z

0.9s versus 1.1s is not bad. And for me, the difference is greater. JRuby wins for both your prime? implementation and for the stdlib version (I set JIT threshold to 0 here to force it to compile right away).

[] ~/projects/jruby $ jruby -Xcompile.invokedynamic -Xjit.threshold=0 -I ../ruby/lib bench_prime2.rb 
       user     system      total        real
 true
  1.560000   0.090000   1.650000 (  0.861703)
 true
  1.110000   0.020000   1.130000 (  0.754386)
 true
  0.700000   0.010000   0.710000 (  0.660860)
 true
  0.960000   0.010000   0.970000 (  0.738980)
 true
  0.780000   0.010000   0.790000 (  0.650650)
 true
  0.530000   0.010000   0.540000 (  0.426186)
 true
  0.650000   0.000000   0.650000 (  0.618836)
 true
  0.420000   0.010000   0.430000 (  0.395621)
 true
  0.640000   0.010000   0.650000 (  0.608281)
 true
  0.420000   0.000000   0.420000 (  0.392981)
 true
  0.650000   0.010000   0.660000 (  0.617567)
 true
  0.430000   0.010000   0.440000 (  0.401770)
 true
  0.660000   0.000000   0.660000 (  0.640085)
 true
  0.430000   0.010000   0.440000 (  0.397853)
 true
  0.750000   0.010000   0.760000 (  0.607031)
 true
  0.450000   0.010000   0.460000 (  0.432919)
 true
  0.670000   0.000000   0.670000 (  0.637849)
 true
  0.410000   0.010000   0.420000 (  0.381010)
 true
  0.620000   0.000000   0.620000 (  0.596011)
 true
  0.500000   0.010000   0.510000 (  0.407787)

[] ~/projects/jruby $ ruby23 -I ../ruby/lib bench_prime2.rb 
       user     system      total        real
 true
  0.730000   0.000000   0.730000 (  0.733017)
 true
  0.710000   0.000000   0.710000 (  0.706568)
 true
  0.690000   0.000000   0.690000 (  0.691615)
 true
  0.680000   0.010000   0.690000 (  0.682329)
 true
  0.620000   0.000000   0.620000 (  0.619963)
 true
  0.670000   0.000000   0.670000 (  0.666560)
 true
  0.650000   0.000000   0.650000 (  0.652076)
 true
  0.680000   0.000000   0.680000 (  0.682871)
 true
  0.640000   0.000000   0.640000 (  0.640788)
 true
  0.700000   0.000000   0.700000 (  0.702633)
 true
  0.740000   0.000000   0.740000 (  0.741795)
 true
  0.700000   0.000000   0.700000 (  0.700159)
 true
  0.680000   0.000000   0.680000 (  0.684462)
 true
  0.730000   0.010000   0.740000 (  0.727324)
 true
  0.670000   0.000000   0.670000 (  0.676201)
 true
  0.710000   0.000000   0.710000 (  0.709593)
 true
  0.680000   0.000000   0.680000 (  0.681053)
 true
  0.720000   0.000000   0.720000 (  0.719188)
 true
  0.670000   0.000000   0.670000 (  0.677244)
 true
  0.720000   0.000000   0.720000 (  0.719593)

headius · 2016-02-08T08:25:11Z

@enebo @subbuss Any thoughts on triggering JIT sooner than 50 calls when we have a hot loop? I know this falls into the "let's add OSR because benchmarks look better" world, but I wonder if we can at least accelerate the JIT promotion in these cases.

enebo · 2016-02-08T14:59:53Z

Well nothing in inlining branch changes when/how we JIT methods (read till end :) ), but we do keep track of checking for hot callsites via thread_polling stats. This could also be used for methods which trigger the most thread_polls. Will that be faster than 50 calls? I don't know, but it seems reasonable to me. fwiw, when inlining+profiling is enabled if we detect a site in that loopy method that needs to be inlined we actually compile the surrounding method. In that sense, how profiling works now might end up compiling earlier...assuming there is a hot ruby-only method in the loop.

headius · 2016-03-08T00:03:43Z

Ok, I did find a regression for #3626 that prevented blocks from jitting. This was the source of the performance drop in #3626 going from 9.0.4.0 to 9.0.5.0 and it is now fixed. I believe the benchmarks here are back where they should be.

headius added jit performance ir JRuby 9000 labels Mar 3, 2016

headius closed this as completed Mar 8, 2016

headius added this to the JRuby 9.1.0.0 milestone Mar 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

prime? very slow #3649

prime? very slow #3649

giuan commented Feb 6, 2016

headius commented Feb 6, 2016

headius commented Feb 6, 2016

headius commented Feb 6, 2016

headius commented Feb 6, 2016

giuan commented Feb 6, 2016

headius commented Feb 6, 2016

giuan commented Feb 6, 2016

headius commented Feb 7, 2016

headius commented Feb 8, 2016

enebo commented Feb 8, 2016

headius commented Mar 8, 2016

prime? very slow #3649

prime? very slow #3649

Comments

giuan commented Feb 6, 2016

headius commented Feb 6, 2016

headius commented Feb 6, 2016

headius commented Feb 6, 2016

headius commented Feb 6, 2016

giuan commented Feb 6, 2016

headius commented Feb 6, 2016

giuan commented Feb 6, 2016

headius commented Feb 7, 2016

headius commented Feb 8, 2016

enebo commented Feb 8, 2016

headius commented Mar 8, 2016