JRuby 9.0.1.0 has slower $global_var usage #3350
Comments
The perf impact here seems to be mostly the writes; I split your benchmark into reads and writes and global var reads are about the same perf as all the others (and probably lost in the noise of the loop): 9k:
1.7.21:
Note, however, that 1.7 also has some invokedynamic optimizations for reads to make them faster. I suspect that's lost in the loop noise too, but here's those numbers: 1.7.21 with indy, first iteration
However, globals that are written repeatedly destroy this optimization (so we don't constantly try to keep rebinding it) and the second iteration goes back to the same perf: 1.7.21 with indy, second iteration
Ultimately 9k does not perform much differently than 1.7 (other than the missing indy optimization) but I think it's worth taking a few minutes to improve this. |
This is refreshing to hear. I had been worried when I first ran this test; not sure why I didn't think to check that. Thank you!
|
This improves global read perf for #3350, but global variable writes still seem to be slower than MRI. I'm not exactly sure why, but it probably has to do with doing a full lookup for our too-abstract wrapper around these values.
With f2612a2, 9k should have the same perf for globals as 1.7.x. Why our globals are slower to write than MRI's...that's still an open question. |
Note to self: include modified benchmarks in the original report. I suspect the cost of writing is that even after we've given up on caching the global value, we continue to invalidate a SwitchPoint for the indy logic in the global variable. This is certainly not free, because invalidating it means we also construct a new one. I think the right way to fix this would be to also mark the variable as "no longer cacheable" which in turn means "don't bother invalidating". I'll try to fix this tomorrow. |
Split up read/write benchmark. Note that the reads of some of these types of variables will be optimized completely away by either our IR or by the JVM, so they're really just measuring the block dispatch overhead.
require 'benchmark'
LOOPS = 30000000
$global = 0
@outer_inst = 0
outer_scope = 0
loop { Benchmark.bmbm {|test|
@inner_inst = 0
inner_scope = 0
test.report('$global w'){ LOOPS.times { $global = 1 } }
test.report('@outer_inst w'){ LOOPS.times { @outer_inst = 1 } }
test.report('outer_scope w'){ LOOPS.times { outer_scope = 1 } }
test.report('@inner_inst w'){ LOOPS.times { @inner_inst = 1 } }
test.report('inner_scope w'){ LOOPS.times { inner_scope = 1 } }
test.report('local_scope w'){ local_scope = 0; LOOPS.times { local_scope = 1 } }
test.report('$global r'){ LOOPS.times { $global } }
test.report('@outer_inst r'){ LOOPS.times { @outer_inst } }
test.report('outer_scope r'){ LOOPS.times { outer_scope } }
test.report('@inner_inst r'){ LOOPS.times { @inner_inst } }
test.report('inner_scope r'){ LOOPS.times { inner_scope } }
test.report('local_scope r'){ local_scope = 0; LOOPS.times { local_scope } }
} } |
I was wrong; the overhead is not from constantly re-invalidating -- we have a failover point at which we just leave it permanently invalid -- but from doing a slow hash lookup of the variable every time. Since this affects all globals, including some that are expected to be updated if you're using them at all, I'll make appropriate modifications to avoid this lookup. |
Calling this one done. Global variable reads cache their value unless invalidated too many times (set by -Xinvokedynamic.global.maxfail=100) and cache the wrapper object after that to avoid hash lookup overhead. Global variable sets cache the wrapper object. That's about as cheap as I can make these right now. |
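For readers following along, here is a rough toy model of the caching scheme described above, written in plain Ruby. It is only a sketch under assumptions: JRuby's real implementation is Java and uses invokedynamic call sites, and every class and method name below (GlobalVariable, GlobalTable, ReadSite, WriteSite) is hypothetical. A write counter on the wrapper stands in for the real invalidation mechanism, and MAX_FAIL mirrors the value quoted for -Xinvokedynamic.global.maxfail.

```ruby
# Toy model only; all names here are hypothetical, not JRuby internals.

# Wrapper object holding a global's value. The write counter stands in
# for whatever mechanism tells read sites their cached value is stale.
class GlobalVariable
  attr_reader :value, :writes

  def initialize
    @value = nil
    @writes = 0
  end

  def set(value)
    @value = value
    @writes += 1
  end
end

# Name => wrapper table. Going through #fetch models the slow hash lookup.
class GlobalTable
  attr_reader :lookups

  def initialize
    @vars = {}
    @lookups = 0
  end

  def fetch(name)
    @lookups += 1
    @vars[name] ||= GlobalVariable.new
  end
end

# One place in the program that reads a given global.
class ReadSite
  MAX_FAIL = 100 # assumed to mirror -Xinvokedynamic.global.maxfail=100

  attr_reader :failures

  def initialize(table, name)
    @table = table
    @name = name
    @failures = 0
    @seen_writes = nil
  end

  def read
    @wrapper ||= @table.fetch(@name) # hash lookup happens once per site
    # After too many invalidations, stop caching the value and just read
    # through the cached wrapper.
    return @wrapper.value if @failures >= MAX_FAIL

    if @seen_writes != @wrapper.writes
      @failures += 1 unless @seen_writes.nil? # first fill is not a failure
      @cached_value = @wrapper.value
      @seen_writes = @wrapper.writes
    end
    @cached_value
  end
end

# One place in the program that writes a given global.
class WriteSite
  def initialize(table, name)
    @table = table
    @name = name
  end

  def write(value)
    @wrapper ||= @table.fetch(@name) # cache the wrapper, skip later lookups
    @wrapper.set(value)
  end
end

table  = GlobalTable.new
writer = WriteSite.new(table, :$global)
reader = ReadSite.new(table, :$global)
1000.times { |i| writer.write(i); reader.read }
puts "hash lookups: #{table.lookups}, read-site failures: #{reader.failures}"
```

In this model a repeatedly written global costs one hash lookup per site rather than one per access, which is the overhead the fix targets.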
Thanks for this! Just saw your progress and I'm glad to hear there's a good resolution. |
Using a (very simple) benchmark to test how JRuby handles differing scopes for variables, I found that, unlike MRI 2.2.2, JRuby 9.0.1.0 handles globals far more slowly than the other variable types and scopes.
The test code is as follows:
The results for JRuby:
jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [Windows 7-amd64]
And MRI:
ruby 2.2.2p95 (2015-04-13 revision 50295) [x64-mingw32]
It seems that while MRI is fairly normalized across the board, JRuby only shows relatively equal times for local, inner, and instance variable scopes. In particular, JRuby's time for globals is ~226% of the average of all other tested scopes.
The good news, however, is that JRuby's jitting is definitely resulting in faster overall performance.
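For clarity on how a figure like the ~226% above is derived: it is the global-variable time divided by the mean of the other scopes' times, expressed as a percentage. A minimal sketch follows; the helper name is hypothetical and the timings are placeholders chosen for easy arithmetic, not the measurements from this report.

```ruby
# Hypothetical helper: express one timing as a percentage of the mean of others.
def percent_of_average(target, others)
  average = others.inject(:+) / others.size.to_f
  (target / average * 100).round
end

# Placeholder timings in seconds (NOT the reported results).
other_scopes = [1.0, 1.0, 1.0, 1.0, 1.0]
global_time  = 2.0

puts percent_of_average(global_time, other_scopes) # prints 200
```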