JRuby 9.x slower than 1.7.21 when running Brakeman #3735

presidentbeef · 2016-03-14T21:33:25Z

Per https://twitter.com/headius/status/709447899902578689

Environment

Brakeman 3.2.1
Linux 4.1.15-desktop-2.mga5 SMP Wed Jan 20 17:05:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

These Ruby versions:

jruby 1.7.21 (1.9.3p551) 2015-07-07 a741a82 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 +jit [linux-amd64]
jruby 9.0.5.0 (2.2.3) 2016-01-26 7bee00d Java HotSpot(TM) 64-Bit Server VM 25.65-b01 on 1.8.0_65-b17 +jit [linux-amd64]
jruby 9.1.0.0-SNAPSHOT (2.3.0) 2016-03-14 79cfa58 Java HotSpot(TM) 64-Bit Server VM 25.65-b01 on 1.8.0_65-b17 +jit [linux-amd64]
MRI ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-linux]

Expected Behavior

I expect (and hope) performance will improve with each release, or at least stay the same.

Actual Behavior

Scan times for Brakeman (and Brakeman Pro) have gotten slower.

I tested by scanning Redmine, which is decent size but not gigantic. Here I am reporting scan times (wall time) as reported by Brakeman, which excludes startup and report generation. This helps JRuby's numbers.

Command (after cloning Redmine):

brakeman -q --summary redmine

I averaged over five runs (they were pretty consistent).

Version	Time (seconds)	% slower than MRI
9.1	36.823	80.2%
9.0.5.0	35.303	72.8%
1.7.21	28.570	39.8%
MRI 2.3.0	20.430	0

Some things to consider:

Brakeman tends to use a lot of memory
Brakeman allocates a lot of small objects
GC time is usually significant % of run time
There is some I/O going on

I have played with different GCs to no real effect.

The text was updated successfully, but these errors were encountered:

headius · 2016-03-14T22:56:59Z

Looking into it.

headius · 2016-03-14T23:07:20Z

My own numbers:

9.1 --dev: 63.542s
1.7.24 --dev: 60.829s
9.1 w/ JIT: 35.205s
1.7.24 w/ JIT: 27.164s

JRuby 9.1 with indy took much longer (105.145s), indicating that there's probably a bug in it somewhere. It should be investigated as well.

With the --dev numbers so close together, it's looking like it may be JIT producing less-optimized code in 9k than 1.7 for something.

presidentbeef · 2016-03-14T23:23:35Z

Ah yes, my own experiments with invoke dynamic haven't ever shown any improvement.

headius · 2016-03-14T23:36:02Z

Unsurprisingly this shows most hot spots in the Ruby parser, especially the giant case/when method next_token.

It also uses a lot of memory but bumping it beyond 500MB does not seem to change the timings much.

headius · 2016-03-15T00:15:32Z

I was able to get some improvement by modifying the non-indy JIT to do a better job of caching constants, but it's still off from 1.7 (35s to 30s in most recent runs). I think it might be useful to audit the non-indy JIT to see where it's falling down on optimizing like 1.7.

headius · 2016-03-15T00:34:56Z

After a quick review I have found one culprit: dynamic regular expressions. For the following code, 9k is 3-5x slower than 1.7:

A, B, C, D, E, F = %w[1 2 3 4 5 1]
loop do
  t = Time.now
  i = 0; while i < 1_000_000; i+=1
    case A
    when /#{B}/o
    when /#{C}/o
    when /#{D}/o
    when /#{E}/o
    when /#{F}/o
    end
  end
  puts Time.now - t
end

headius · 2016-03-15T16:45:50Z

So a few updates...

With locking in place (Dynamic "once" regexps are not as atomic as in MRI #2798) hotspot does not appear to like the code I'm emitting for //o regexps. However the locking I do have is not really working like it should.
The next_token method from parser is really big, and although we can emit JVM bytecode for it, Hotspot does not actually JIT it to native code. My best perf improvement so far has been from preventing next_token's compilation, since our interpreter appears to be faster than the JVM's at this code.
I'm not seeing much evidence of any bottlenecks in brakeman itself, and currently I'm focusing on the performance of parser.

We're talking through some ideas to improve //o execution.

* Ensure only one regexp is ever cached for //o This is done using an AtomicReference in the compiled method for JVM6 and a field + atomic updater in the indy call site. In both cases, we may end up evaluating the operands twice, and the code that produced them may still run after caching (a bug, #2798), but we will at least guarantee to return exactly one regexp. * Add non-boxed paths to construct dregexp with up to 5 elements. * Add a ThreadContext-local Encoding[1] to use for encoding negotiation when preprocessing the dregexp elements. * If, at JIT time, a once-dregexp has already been encountered and cached in the instr, just emit that regexp directly into the bytecode. This new logic is faster than what we had before, likely because the locking I put in place for JVM6 was preventing the JVM from jitting (punted out with "COMPILE SKIPPED: invalid parsing" due to a flaw in my code). This new logic is lighter-weight and JITs fine. Given the benchmark from #3735: 9.0.5: 3.87s 9.1: 0.70s 1.7.24: 0.72s

For #3735, though it has only a minor impact on a scan bench.

headius · 2016-03-16T06:53:38Z

Still more work to do. With my recent tweaks, 9.1 runs brakeman against redmine in 32s and 1.7.24 runs it in 31s. More to come.

presidentbeef · 2016-03-16T14:44:07Z

Wow, thanks @headius!

headius · 2016-03-17T14:03:46Z

@presidentbeef Thanks for reporting it! 9k still has some rough edges, perf-wise, but having real-world use cases helps us smooth them out sooner.

presidentbeef · 2016-03-17T15:34:08Z

If you want an even more intense test case, try scanning https://github.com/ManageIQ/manageiq

headius · 2016-04-20T20:07:26Z

Some of this is fixed in 9.1, and it should be about the same perf as 1.7. However it should be faster. We'll leave this open and do some additional optimization passes in 9.1.x.

@presidentbeef I'd be interested to see if you get the same improvement in 9.1 that I do, if you feel like re-running some of your benchmarks.

presidentbeef · 2016-05-06T22:15:12Z

I'm seeing roughly one second slower in general, which is totally awesome 👍

headius · 2016-05-20T07:16:53Z

A few recent discoveries.

The largest issue definitely appears to be next_token and downstream calls within ruby_parser (ruby_lexer.rex.rb). This method has a very large case/when with almost all the cases being StringScanner#scan or match? calls.

My first attempt to improve performance was to work on StringScanner, by creating a patch to cache Matcher objects. This required two patches: https://gist.github.com/headius/d30b0db212d89e81726dc5f336b9e7be

Unfortunately this only had as small impact in perf, if any at all.

The next thing I tried was hand-restructuring the next_token method. I made the following changes:

The smaller blocks of case/when moved to their own methods.
All the case/whens were converted to if/else.

This brought the time to scan Redmine from 40s down to 34s on my oddly fickle machine.

The biggest gain I found was not quite kosher. I modified the process_symbol method in ruby_lexer.rb to not unescape the symbol tokens. Performance dropped down to 24s, matching MRI on my machine. Now this was MRI with unmodified ruby_parser, but I see this as very illustrative...the unescaping loop is costing lots of time.

So there was one other discovery I made. I modified the brakeman bin script to keep running in a loop, reprocessing args and everything. JRuby with stock brakeman improved from 40s in the first iteration to 34s in the second and 30 in the third. JRuby with all the safe patches improved to mid-20s, and JRuby with the unsafe patch got down below 20s. That's at least good news for longer-running processes.

headius · 2016-05-20T07:18:37Z

Here's the safe part of the patches, the hand-massaging of ruby_lexer.rex.rb: https://gist.github.com/headius/a403d56a68d6c733f76b9ec539a23289

presidentbeef · 2016-05-20T19:26:51Z

Thank you for continuing to look into this!

However, I am not able to reproduce your performance improvement with the unescape call. I'm only seeing a couple seconds difference when unescape just returns its output.

Just FYI, you are right that it's the parsing that is under-performing. I broke down the timing for a scan of ManageIQ:

Operation	MRI 2.3.0	JRuby 9.1.1.0
gems	0.022	0.321
config	0.035	0.458
parsing	43.509	89.603
inits	0.070	0.29
libs	4.051	5.171
routes	0.216	0.249
templates	2.892	2.127
data_templates	10.411	8.822
models	26.387	20.568
controllers	13.591	10.313
data_controllers	207.698	137.969
indexing	14.378	10.932

Once it gets going JRuby meets and then beats MRI performance.

headius · 2016-08-22T16:38:00Z

@presidentbeef Looping back to this again finally.

So you say once JRuby gets going it meets and beats MRI performance. Does that mean we can call this fixed? Of course JRuby has a longer warm-up time, but there's only so much we can do about that on the JVM.

I'm going to punt this to 9.1.4.0 for further discussion about what we need to improve. Parsing is obviously still an issue, and the size of the methods created by racc prevents us from optimizing well.

headius · 2016-11-08T06:24:16Z

Perf has not changed much since 9.1.3.0...my ratios are still about the same. Since this is a 1.7 migration issue I think we can safely move this to 9.2 and plan to do a pass over performance bugs for that release.

headius · 2018-05-15T18:57:41Z

Seems to have regressed in perf?

9.2 pre with jit: 50s
1.7.26 with jit: 40s

Bumping.

presidentbeef · 2018-05-25T19:01:21Z

Apparently I have a slow machine...I see:

Version	Time
9.2.0.0	83s
1.7.26	74s
MRI 2.5.1	30s

jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 OpenJDK 64-Bit Server VM 25.171-b10 on 1.8.0_171-b10 +jit [linux-x86_64]

jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on OpenJDK 64-Bit Server VM 1.8.0_171-b10 +jit [linux-amd64]

presidentbeef · 2018-11-01T21:11:09Z

Running on newer JVMs, different machine...

Version	JVM	Time
9.2.0.0	HotSpot	54s
9.2.0.0	GraalVM	47s
1.7.26	HotSpot	49s
MRI 2.5.1		23s

Note: -Xcompile.invokedynamic made JRuby way slower (like 2-3x as long) on both VMs.

jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 Java HotSpot(TM) 64-Bit Server VM 10.0.1+10 on 10.0.1+10 +jit [darwin-x86_64]

jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 GraalVM 1.0.0-rc8 25.71-b01-internal-jvmci-0.48 on 1.8.0_172-20180626105433.graaluser.jdk8u-src-tar-g-b11 +jit [darwin-x86_64]

headius · 2018-11-03T06:17:30Z

Thank you for the update! Still a bit of a mystery what's slowest here but it's still likely an issue with a key parsing method not compiling. @enebo has been working on the compiler more lately, he may have some insight on how we could get some of these larger methods to compile in chunks, so we can at least get it out of the interpreter.

headius · 2019-08-08T22:03:38Z

Update with JRuby 9.2.8 master as of this moment (642a1e2)...

Unfortunately Brakeman 3.2.1 appears to error on JRuby now. I'm not sure if this is due to Ruby language changes or due to some regression in JRuby, but we cannot test 3.2.1 on JRuby 9.2.8:

[] ~/projects/jruby $ time jruby -v -S brakeman -q --summary ../redmine
jruby 9.2.8.0-SNAPSHOT (2.5.3) 2019-08-08 642a1e2 OpenJDK 64-Bit Server VM 25.202-b08 on 1.8.0_202-b08 +jit [darwin-x86_64]
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/brakeman/checks/check_dynamic_finders.rb:47: warning: Ambiguous first argument; make sure.
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/brakeman/processors/lib/processor_helper.rb:35: warning: instance variable @current_class not initialized
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/ruby_parser-3.8.4/lib/ruby_parser_extras.rb:819: warning: Unknown escape \A is ignored
ArgumentError: setting s(:args).line nil
                         line at /Users/headius/projects/jruby/lib/ruby/gems/shared/gems/sexp_processor-4.12.1/lib/sexp.rb:214
                   deep_clone at /Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/ruby_parser/bm_sexp.rb:42
                   deep_clone at /Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/ruby_parser/bm_sexp.rb:31
                         each at org/jruby/RubyArray.java:1800

Also, Brakeman 4.6.1 depends on Ruby 2.3+, which means we can't test that on JRuby 1.7.27.

So I have numbers comparing 4.6.1 on JRuby 9.2.8 with 3.2.1 on JRuby 1.7.27:

JRuby 9.2.8 + Brakeman 4.6.1:

real	0m28.334s
user	1m24.947s
sys	0m2.825s

JRuby 1.7.27 + Brakeman 3.2.1:

real	0m50.628s
user	1m37.504s
sys	0m2.414s

I am not sure if this is a valid comparison, but if most of the critical/hot code paths are the same, JRuby 9.2.8 finally appears to be faster.

headius · 2019-08-08T22:10:30Z

Ok I tried Brakeman 3.2.1 on MRI 2.5.3 and it fails the same way. That issue, at least, seems to be in Brakeman 3.2.1.

MRI 2.5.3 with Brakeman 4.6.1 is still faster than JRuby, but the ratio is better than the last time we updated this bug:

real	0m15.865s
user	0m15.081s
sys	0m0.518s

So with JRuby 9.2.8 at 28s and MRI at 15s, we're less than 2x slower. No combination has been that close.

headius · 2019-08-08T22:17:16Z

I think we can safely close this, since it seems like JRuby's ratio compared to MRI has improved well beyond where it was with JRuby 1.7.

@presidentbeef If you would like to work with us in pursuing the MRI performance gap, I'd say open a new bug and we can start to break down the work necessary. It may come down to a critical method not getting jitted, as before, but we may be able to address it better now.

presidentbeef · 2019-08-08T22:35:12Z

The error is an incompatibility with sexp_processor 4.12.1. If you pin to I think 4.11 it should be okay.

But in any case - thank you for continuing to follow up on this issue! Always glad to see improvements in JRuby.

presidentbeef · 2019-08-19T16:57:16Z

Just to round this out, I ran a couple longer scans (> 2 minutes) with Brakeman 4.6.1 using MRI 2.6.3 and JRuby 9.2.8.0. The timings were only a few seconds apart! Basically within margin of error. 👍

headius · 2019-08-19T17:04:40Z

@presidentbeef thank you for following up! That's great to hear, since we have steadily been improving cold performance over the last year (and continue to do so!)

headius added a commit that referenced this issue Mar 16, 2016

Preallocate blocking task for strscan.

a1dbe82

For #3735, though it has only a minor impact on a scan bench.

kares added performance JRuby 9000 labels Mar 17, 2016

headius added this to the JRuby 9.1.1.0 milestone Apr 20, 2016

headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.2.0 May 11, 2016

enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016

headius modified the milestones: JRuby 9.2.0.0, JRuby 9.1.6.0 Nov 8, 2016

headius modified the milestones: JRuby 9.2.0.0, JRuby 9.2.1.0 May 15, 2018

headius modified the milestones: JRuby 9.2.1.0, JRuby 9.2.2.0 Oct 26, 2018

enebo modified the milestones: JRuby 9.2.2.0, JRuby 9.2.3.0, JRuby 9.2.4.0, JRuby 9.2.5.0 Nov 7, 2018

enebo modified the milestones: JRuby 9.2.5.0, JRuby 9.2.6.0 Dec 6, 2018

enebo modified the milestones: JRuby 9.2.6.0, JRuby 9.2.7.0 Dec 19, 2018

enebo modified the milestones: JRuby 9.2.7.0, JRuby 9.2.8.0 Jan 9, 2019

headius closed this as completed Aug 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

presidentbeef commented Mar 14, 2016

headius commented Mar 14, 2016

headius commented Mar 14, 2016

presidentbeef commented Mar 14, 2016

headius commented Mar 14, 2016

headius commented Mar 15, 2016

headius commented Mar 15, 2016

headius commented Mar 15, 2016

headius commented Mar 16, 2016

presidentbeef commented Mar 16, 2016

headius commented Mar 17, 2016

presidentbeef commented Mar 17, 2016

headius commented Apr 20, 2016

presidentbeef commented May 6, 2016

headius commented May 20, 2016

headius commented May 20, 2016

presidentbeef commented May 20, 2016

headius commented Aug 22, 2016

headius commented Nov 8, 2016

headius commented May 15, 2018

presidentbeef commented May 25, 2018

presidentbeef commented Nov 1, 2018

headius commented Nov 3, 2018

headius commented Aug 8, 2019

headius commented Aug 8, 2019

headius commented Aug 8, 2019

presidentbeef commented Aug 8, 2019

presidentbeef commented Aug 19, 2019

headius commented Aug 19, 2019

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

Comments

presidentbeef commented Mar 14, 2016

Environment

Expected Behavior

Actual Behavior

headius commented Mar 14, 2016

headius commented Mar 14, 2016

presidentbeef commented Mar 14, 2016

headius commented Mar 14, 2016

headius commented Mar 15, 2016

headius commented Mar 15, 2016

headius commented Mar 15, 2016

headius commented Mar 16, 2016

presidentbeef commented Mar 16, 2016

headius commented Mar 17, 2016

presidentbeef commented Mar 17, 2016

headius commented Apr 20, 2016

presidentbeef commented May 6, 2016

headius commented May 20, 2016

headius commented May 20, 2016

presidentbeef commented May 20, 2016

headius commented Aug 22, 2016

headius commented Nov 8, 2016

headius commented May 15, 2018

presidentbeef commented May 25, 2018

presidentbeef commented Nov 1, 2018

headius commented Nov 3, 2018

headius commented Aug 8, 2019

headius commented Aug 8, 2019

headius commented Aug 8, 2019

presidentbeef commented Aug 8, 2019

presidentbeef commented Aug 19, 2019

headius commented Aug 19, 2019