JRuby 9.x slower than 1.7.21 when running Brakeman #3735
Looking into it.
My own numbers: 9.1 --dev: 63.542s. JRuby 9.1 with indy took much longer (105.145s), indicating there's probably a bug in it somewhere; that should be investigated as well. With the --dev numbers so close together, it looks like the JIT may be producing less-optimized code in 9k than in 1.7 for something.
Ah yes, my own experiments with invokedynamic haven't ever shown any improvement.
Unsurprisingly this shows most hot spots in the Ruby parser, especially the giant case/when method. It also uses a lot of memory, but bumping the heap beyond 500MB does not seem to change the timings much.
I was able to get some improvement by modifying the non-indy JIT to do a better job of caching constants, but it's still off from 1.7 (35s vs 30s in most recent runs). I think it would be useful to audit the non-indy JIT to see where it's falling down on optimizations compared to 1.7.
After a quick review I have found one culprit: dynamic regular expressions. For the following code, 9k is 3-5x slower than 1.7:

```ruby
A, B, C, D, E, F = %w[1 2 3 4 5 1]

loop do
  t = Time.now
  i = 0
  while i < 1_000_000
    i += 1
    case A
    when /#{B}/o
    when /#{C}/o
    when /#{D}/o
    when /#{E}/o
    when /#{F}/o
    end
  end
  puts Time.now - t
end
```
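For context, the /o flag makes a dynamic regexp literal interpolate its operands only once; every later evaluation of that literal returns the cached Regexp. A minimal illustration of this standard Ruby behavior (not code from the issue):

```ruby
# With /o, the interpolation in a regexp literal runs only the first
# time that literal is evaluated; the resulting Regexp is cached and
# reused on every subsequent evaluation of the same literal site.
def pattern(src)
  /#{src}/o
end

a = pattern("foo")
b = pattern("bar")   # src is ignored here: the cached regexp is reused

puts a.source        # => "foo"
puts b.source        # => "foo" (not "bar")
puts a.equal?(b)     # => true, the very same Regexp object
```

This is why the benchmark above exercises the once-regexp caching path so heavily: after the first loop iteration, no new Regexp should ever be constructed.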
So a few updates...
We're talking through some ideas for further improvements.
* Ensure only one regexp is ever cached for //o. This is done using an AtomicReference in the compiled method for JVM6, and a field plus atomic updater in the indy call site. In both cases we may end up evaluating the operands twice, and the code that produced them may still run after caching (a bug, #2798), but we at least guarantee to return exactly one regexp.
* Add non-boxed paths to construct a dregexp with up to 5 elements.
* Add a ThreadContext-local Encoding[1] to use for encoding negotiation when preprocessing the dregexp elements.
* If, at JIT time, a once-dregexp has already been encountered and cached in the instr, just emit that regexp directly into the bytecode.

This new logic is faster than what we had before, likely because the locking I put in place for JVM6 was preventing the JVM from jitting (it punted out with "COMPILE SKIPPED: invalid parsing" due to a flaw in my code). The new logic is lighter-weight and JITs fine.

Given the benchmark from #3735:

* 9.0.5: 3.87s
* 9.1: 0.70s
* 1.7.24: 0.72s
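The caching rule in the first bullet can be sketched in Ruby. This is a hand-written illustration of the semantics only: the real implementation is Java using AtomicReference, while here a Mutex stands in for the compare-and-set and the builder block stands in for evaluating the interpolated operands.

```ruby
# Illustrative only: callers always see exactly one Regexp, even though
# the builder (operand evaluation) may run more than once under a race,
# matching the semantics described in the first bullet above.
class OnceRegexpSite
  def initialize(&build)
    @build  = build
    @cached = nil
    @lock   = Mutex.new
  end

  def get
    return @cached if @cached       # fast path, no locking
    fresh = @build.call             # may happen twice in a race
    @lock.synchronize { @cached ||= fresh }
    @cached
  end
end

site = OnceRegexpSite.new { /#{"ab"}/ }
r1 = site.get
r2 = site.get
puts r1.equal?(r2)   # => true: exactly one Regexp is ever returned
```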
For #3735, though, this has only a minor impact on a scan bench.
Still more work to do. With my recent tweaks, 9.1 runs brakeman against redmine in 32s and 1.7.24 runs it in 31s. More to come.

Wow, thanks @headius!

@presidentbeef Thanks for reporting it! 9k still has some rough edges, perf-wise, but having real-world use cases helps us smooth them out sooner.
If you want an even more intense test case, try scanning https://github.com/ManageIQ/manageiq |
Some of this is fixed in 9.1, which should now be about the same perf as 1.7. Ideally, though, it should be faster, so we'll leave this open and do some additional optimization passes in 9.1.x. @presidentbeef I'd be interested to see if you get the same improvement in 9.1 that I do, if you feel like re-running some of your benchmarks.
I'm seeing roughly one second slower in general, which is totally awesome 👍 |
A few recent discoveries. The largest issue definitely appears to be the parser.

My first attempt to improve performance was to work on StringScanner, by creating a patch to cache Matcher objects. This required two patches: https://gist.github.com/headius/d30b0db212d89e81726dc5f336b9e7be Unfortunately this only had a small impact on perf, if any at all.

The next thing I tried was hand-restructuring the parser code. This brought the time to scan Redmine from 40s down to 34s on my oddly fickle machine.

The biggest gain I found was not quite kosher (the "unsafe" patch mentioned below).

So there was one other discovery I made. I modified the brakeman bin script to keep running in a loop, reprocessing args and everything. JRuby with stock brakeman improved from 40s on the first iteration to 34s on the second and 30s on the third. JRuby with all the safe patches improved to mid-20s, and JRuby with the unsafe patch got down below 20s. That's at least good news for longer-running processes.
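A looping driver of the kind described can be sketched as follows (hypothetical: run_scan is a placeholder standing in for invoking Brakeman's scanner in-process):

```ruby
require "benchmark"

# Placeholder for the real work, e.g. kicking off a Brakeman scan.
def run_scan
  50_000.times.reduce(0) { |sum, i| sum + i }
end

# Re-run the same workload in one process so the JIT can warm up;
# later iterations should get faster on JRuby as hot code compiles.
timings = 3.times.map do |iteration|
  elapsed = Benchmark.realtime { run_scan }
  puts format("iteration %d: %.3fs", iteration + 1, elapsed)
  elapsed
end
```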
Here's the safe part of the patches, the hand-massaging of the parser code.
Thank you for continuing to look into this! However, I am not able to reproduce your performance improvement with the patches.

Just FYI, you are right that it's the parsing that is under-performing. I broke down the timing for a scan of ManageIQ:
Once it gets going, JRuby meets and then beats MRI performance.

@presidentbeef Looping back to this again finally. So you say once JRuby gets going it meets and beats MRI performance. Does that mean we can call this fixed? Of course JRuby has a longer warm-up time, but there's only so much we can do about that on the JVM. I'm going to punt this to 9.1.4.0 for further discussion about what we need to improve. Parsing is obviously still an issue, and the size of the methods created by racc prevents us from optimizing well.

Perf has not changed much since 9.1.3.0... my ratios are still about the same. Since this is a 1.7 migration issue, I think we can safely move this to 9.2 and plan to do a pass over performance bugs for that release.
Seems to have regressed in perf? 9.2 pre with JIT: 50s.

Bumping.
Apparently I have a slow machine... I see:
Running on newer JVMs, different machine...
Thank you for the update! It's still a bit of a mystery what's slowest here, but it's still likely an issue with a key parsing method not compiling. @enebo has been working on the compiler more lately; he may have some insight on how we could get some of these larger methods to compile in chunks, so we can at least get them out of the interpreter.
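One way to attack the problem of methods too large for the JIT, sketched by hand (illustrative only, not JRuby's or racc's actual mechanism): split one oversized dispatch method into several smaller ones so each piece stays under a JIT's per-method size limit.

```ruby
# Illustrative only: a giant case/when split into smaller methods so a
# JIT with a per-method size limit can still compile each piece.
def dispatch(op)
  op < 3 ? dispatch_low(op) : dispatch_high(op)
end

def dispatch_low(op)
  case op
  when 0 then :push
  when 1 then :pop
  when 2 then :dup
  end
end

def dispatch_high(op)
  case op
  when 3 then :add
  when 4 then :mul
  else        :unknown
  end
end

puts dispatch(1)  # => pop
puts dispatch(4)  # => mul
```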
Update with JRuby 9.2.8 master as of this moment (642a1e2)... Unfortunately Brakeman 3.2.1 appears to error on JRuby now. I'm not sure if this is due to Ruby language changes or due to some regression in JRuby, but we cannot test 3.2.1 on JRuby 9.2.8:
Also, Brakeman 4.6.1 depends on Ruby 2.3+, which means we can't test that on JRuby 1.7.27. So I have numbers comparing 4.6.1 on JRuby 9.2.8 with 3.2.1 on JRuby 1.7.27: JRuby 9.2.8 + Brakeman 4.6.1:
JRuby 1.7.27 + Brakeman 3.2.1:
I am not sure if this is a valid comparison, but if most of the critical/hot code paths are the same, JRuby 9.2.8 finally appears to be faster.
Ok I tried Brakeman 3.2.1 on MRI 2.5.3 and it fails the same way. That issue, at least, seems to be in Brakeman 3.2.1. MRI 2.5.3 with Brakeman 4.6.1 is still faster than JRuby, but the ratio is better than the last time we updated this bug:
So with JRuby 9.2.8 at 28s and MRI at 15s, we're less than 2x slower. No combination has been that close before.

I think we can safely close this, since it seems like JRuby's ratio compared to MRI has improved well beyond where it was with JRuby 1.7. @presidentbeef If you would like to work with us on pursuing the MRI performance gap, I'd say open a new bug and we can start to break down the work necessary. It may come down to a critical method not getting jitted, as before, but we may be able to address it better now.
The error is an incompatibility with sexp_processor 4.12.1. If you pin to (I think) 4.11, it should be okay. But in any case, thank you for continuing to follow up on this issue! Always glad to see improvements in JRuby.
Just to round this out, I ran a couple of longer scans (> 2 minutes) with Brakeman 4.6.1 using MRI 2.6.3 and JRuby 9.2.8.0. The timings were only a few seconds apart! Basically within margin of error. 👍
@presidentbeef thank you for following up! That's great to hear, since we have steadily been improving cold performance over the last year (and continue to do so!).
Per https://twitter.com/headius/status/709447899902578689
Environment
Brakeman 3.2.1
Linux 4.1.15-desktop-2.mga5 SMP Wed Jan 20 17:05:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
These Ruby versions:
Expected Behavior
I expect (and hope) performance will improve with each release, or at least stay the same.
Actual Behavior
Scan times for Brakeman (and Brakeman Pro) have gotten slower.
I tested by scanning Redmine, which is decent size but not gigantic. Here I am reporting scan times (wall time) as reported by Brakeman, which excludes startup and report generation. This helps JRuby's numbers.
Command (after cloning Redmine):

```
brakeman -q --summary redmine
```
I averaged over five runs (they were pretty consistent).
Some things to consider:
I have played with different GCs to no real effect.