JRuby 9k: simple script ~80% as fast as jruby 1.7.19 #2761
Looks like it's limited to the JVM6 JIT. The indy JIT is actually much faster than 1.7:
I've spent some time playing with this and found a few improvements, but nothing that fixes the benchmark. The first improvement came from discovering that literal fixnum operations (like
When I modify the benchmark to have a 1M while loop inside the ips report section, we see a similar ratio:
And if I put the 1M loop inside the method being benchmarked, the performance difference goes away:
My current theory is that the problem is the cost of calling through a MethodHandle for every jitted method. To avoid generating invoker stubs for compiled methods, CompiledIRMethod always dispatches through a MethodHandle. This simplifies codegen but appears to add enough overhead to tilt the benchmark. A final version doing 1M calls to an empty method (returning self, which is basically free) shows how much of a hit this is:
Minus the cost of the control loop (which is obviously quite fast), dispatching to a jitted method via a MethodHandle seems to be around 2x slower than the generated invokers we used before. So we may need to explore generating a full DynamicMethod wrapper around jitted bodies for the JVM6 mode. This is unfortunate, but doubling the dispatch cost of jitted methods relative to 1.7 is not really acceptable. Note that none of this affects invokedynamic mode (generally speaking), because indy call sites bind the handle directly and the JVM should optimize it properly.
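A minimal Java sketch of the three dispatch shapes being compared, assuming a trivial jitted body that just returns self. Every class name here (`JittedBody`, `Invoker`, `EmptyMethodInvoker`, `HandleWrapper`, `DispatchSketch`) is hypothetical and only mirrors the structure described above; it is not JRuby's `CompiledIRMethod` or its 1.7 invoker generator. `EmptyMethodInvoker` plays the role of a generated per-method invoker, `HandleWrapper` dispatches through a `MethodHandle` held in an instance field (which HotSpot generally does not treat as a constant), and `CONSTANT_HANDLE` is a `static final` handle that the JIT can constant-fold and inline through, roughly what an invokedynamic call site has once it binds its target.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-in for a jitted Ruby method body that just returns self.
final class JittedBody {
    static Object emptyMethod(Object self) {
        return self;
    }
}

// 1.7-style shape: an abstract invoker plus a generated subclass per method
// whose call() invokes the jitted body directly.
abstract class Invoker {
    abstract Object call(Object self);
}

final class EmptyMethodInvoker extends Invoker {
    @Override
    Object call(Object self) {
        return JittedBody.emptyMethod(self);
    }
}

// Non-indy shape: one generic wrapper that dispatches through a MethodHandle
// stored in an instance field, so HotSpot generally cannot treat the handle
// as a constant and inline through it.
final class HandleWrapper {
    private final MethodHandle handle;

    HandleWrapper(MethodHandle handle) {
        this.handle = handle;
    }

    Object call(Object self) {
        try {
            return (Object) handle.invokeExact(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}

final class DispatchSketch {
    // A static final handle is a JIT-time constant, roughly what an
    // invokedynamic call site has once it binds its target.
    static final MethodHandle CONSTANT_HANDLE = lookupBody();

    static MethodHandle lookupBody() {
        try {
            return MethodHandles.lookup().findStatic(
                    JittedBody.class, "emptyMethod",
                    MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object viaConstantHandle(Object self) {
        try {
            return (Object) CONSTANT_HANDLE.invokeExact(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Object self = new Object();
        Invoker invoker = new EmptyMethodInvoker();
        HandleWrapper wrapper = new HandleWrapper(lookupBody());
        // All three paths reach the same body and return the receiver.
        System.out.println(invoker.call(self) == self);      // true
        System.out.println(wrapper.call(self) == self);      // true
        System.out.println(viaConstantHandle(self) == self); // true
    }
}
```

The sketch only shows where the handle sits on the call path; measuring the actual overhead would need a proper harness such as JMH rather than a hand-rolled loop.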
Current ideas for eliminating or improving MethodHandle dispatch in the non-indy JIT:
This would allow the handle to optimize in-place, but it would require generating quite a bit more bytecode. CachingCallSite would still be used to cache, but instead of dispatching through it or through DynamicMethod.call, we would dispatch directly to the handle.
This would be roughly equivalent to what 1.7 does, but perhaps with a bit less bytecode because only a single class would be generated. It does not address block bodies, which are normally generated into the same class as the method body; that class can only be a CompiledIRMethod or a CompiledIRBlockBody.
This won't solve it in the short term, but I'm going to let the JVM folks responsible for MethodHandle performance have a crack at improving it.
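The first idea above, where CachingCallSite still caches but the call site dispatches straight to the handle instead of through DynamicMethod.call, might look roughly like this Java sketch. `DirectHandleSite`, `Point`, and `Box` are hypothetical; the cache is keyed on the receiver's Java class purely for illustration, whereas JRuby's real call sites cache against Ruby classes and handle invalidation, which this ignores. In generated code, this miss-check-and-invoke logic would be emitted inline at every call site, which is where the extra bytecode would come from.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical receiver types standing in for Ruby classes with jitted methods.
class Point {
    Object width() { return 10; }
}

class Box {
    Object width() { return 20; }
}

// Hypothetical monomorphic cache for a single call site. Rather than going
// through a DynamicMethod-style call(), the site invokes the cached handle
// itself, so every site has its own invokeExact for the JVM to profile.
// Not thread-safe; invalidation is ignored.
final class DirectHandleSite {
    private static final MethodType GENERIC =
            MethodType.methodType(Object.class, Object.class);

    private final String name;
    private Class<?> cachedClass;
    private MethodHandle cachedHandle;
    int misses; // counted only so the sketch can be checked

    DirectHandleSite(String name) {
        this.name = name;
    }

    Object call(Object self) {
        Class<?> cls = self.getClass();
        if (cls != cachedClass) {
            // Cache miss: look up the target and adapt it to (Object)Object.
            cachedHandle = lookup(cls);
            cachedClass = cls;
            misses++;
        }
        try {
            return (Object) cachedHandle.invokeExact(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    private MethodHandle lookup(Class<?> cls) {
        try {
            return MethodHandles.lookup()
                    .findVirtual(cls, name, MethodType.methodType(Object.class))
                    .asType(GENERIC);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DirectHandleSite site = new DirectHandleSite("width");
        System.out.println(site.call(new Point())); // 10 (miss, then cached)
        System.out.println(site.call(new Point())); // 10 (hit)
        System.out.println(site.call(new Box()));   // 20 (miss, cache replaced)
        System.out.println(site.misses);            // 2
    }
}
```

Whether per-site invokeExact actually lets HotSpot optimize the handle "in place" is the open question in the idea itself; the sketch only shows the shape of the dispatch.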
Fascinating to follow your investigation, Charlie. Thanks :) Was invokedynamic disabled for Java 7? Just wondering why my JRuby doesn't seem to use indy by default :) Cheers,
No version of JRuby uses indy by default right now due to longer startup times.
John Rose asked me to add assembly output to this bug report, so here it is: https://gist.github.com/headius/836ed0d73647fce6da15 I believe I have captured the code for the generated and compiled MethodHandle, the target method, and the actual call to MethodHandle.invokeExact. The invokeExact call appears to inline up to invokeBasic, at which point it does a virtual invocation. The jitted handle also has several call instructions in it and does not inline the target method body. The calls also do not appear to go directly to the target method body, so there's at least one more layer here I'm not seeing.
For what it's worth, this is still slower on 9.1.6.0, also with JDK 8.
I just tested this with JRuby master (9.1.7.0) on JDK 8 (8u111 or so) and the warmed-up results barely differ from 1.7.25:
JRuby 1.7 appears to warm up a little faster, or perhaps it's just faster in the non-jitted form (which would make sense...the IR interpreter has remained a bit slower than the AST interpreter), but the warmed-up results are very close...perhaps 3-5% faster in 1.7. Interestingly, the results with invokedynamic enabled are significantly faster in JRuby 9.1.7.0 than in 1.7.25:
I think we can close this. Seems like we've made enough small improvements in the past few months to largely eliminate the gap.
Cool, thanks Charlie! :)
This might be related to #2544. It's based on a benchmark @jasonrclark wrote for shoes4; I abstracted everything away, so what we're left with is fairly simple, a value assignment and returning it:
And that script is down to 80% of the 1.7.19 performance on 9k:
Simple scripts like this are important to us, as we do a ton of small calculations around the dimensions of elements to position them. :)
As always thanks for all your work on JRuby ❤️
Tobi