Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

Closed
presidentbeef opened this issue Mar 14, 2016 · 33 comments
Closed

JRuby 9.x slower than 1.7.21 when running Brakeman #3735

presidentbeef opened this issue Mar 14, 2016 · 33 comments

Comments

@presidentbeef
Copy link

Per https://twitter.com/headius/status/709447899902578689

Environment

Brakeman 3.2.1
Linux 4.1.15-desktop-2.mga5 SMP Wed Jan 20 17:05:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

These Ruby versions:

  • jruby 1.7.21 (1.9.3p551) 2015-07-07 a741a82 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 +jit [linux-amd64]
  • jruby 9.0.5.0 (2.2.3) 2016-01-26 7bee00d Java HotSpot(TM) 64-Bit Server VM 25.65-b01 on 1.8.0_65-b17 +jit [linux-amd64]
  • jruby 9.1.0.0-SNAPSHOT (2.3.0) 2016-03-14 79cfa58 Java HotSpot(TM) 64-Bit Server VM 25.65-b01 on 1.8.0_65-b17 +jit [linux-amd64]
  • MRI ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-linux]

Expected Behavior

I expect (and hope) performance will improve with each release, or at least stay the same.

Actual Behavior

Scan times for Brakeman (and Brakeman Pro) have gotten slower.

I tested by scanning Redmine, which is decent size but not gigantic. Here I am reporting scan times (wall time) as reported by Brakeman, which excludes startup and report generation. This helps JRuby's numbers.

Command (after cloning Redmine):

brakeman -q --summary redmine

I averaged over five runs (they were pretty consistent).

Version Time (seconds) % slower than MRI
9.1 36.823 80.2%
9.0.5.0 35.303 72.8%
1.7.21 28.570 39.8%
MRI 2.3.0 20.430 0

Some things to consider:

  • Brakeman tends to use a lot of memory
  • Brakeman allocates a lot of small objects
  • GC time is usually significant % of run time
  • There is some I/O going on

I have played with different GCs to no real effect.

@headius
Copy link
Member

headius commented Mar 14, 2016

Looking into it.

@headius
Copy link
Member

headius commented Mar 14, 2016

My own numbers:

9.1 --dev: 63.542s
1.7.24 --dev: 60.829s
9.1 w/ JIT: 35.205s
1.7.24 w/ JIT: 27.164s

JRuby 9.1 with indy took much longer (105.145s), indicating that there's probably a bug in it somewhere. It should be investigated as well.

With the --dev numbers so close together, it's looking like it may be JIT producing less-optimized code in 9k than 1.7 for something.

@presidentbeef
Copy link
Author

Ah yes, my own experiments with invoke dynamic haven't ever shown any improvement.

@headius
Copy link
Member

headius commented Mar 14, 2016

Unsurprisingly this shows most hot spots in the Ruby parser, especially the giant case/when method next_token.

It also uses a lot of memory but bumping it beyond 500MB does not seem to change the timings much.

@headius
Copy link
Member

headius commented Mar 15, 2016

I was able to get some improvement by modifying the non-indy JIT to do a better job of caching constants, but it's still off from 1.7 (35s to 30s in most recent runs). I think it might be useful to audit the non-indy JIT to see where it's falling down on optimizing like 1.7.

@headius
Copy link
Member

headius commented Mar 15, 2016

After a quick review I have found one culprit: dynamic regular expressions. For the following code, 9k is 3-5x slower than 1.7:

A, B, C, D, E, F = %w[1 2 3 4 5 1]
loop do
  t = Time.now
  i = 0; while i < 1_000_000; i+=1
    case A
    when /#{B}/o
    when /#{C}/o
    when /#{D}/o
    when /#{E}/o
    when /#{F}/o
    end
  end
  puts Time.now - t
end

@headius
Copy link
Member

headius commented Mar 15, 2016

So a few updates...

  • With locking in place (Dynamic "once" regexps are not as atomic as in MRI #2798) hotspot does not appear to like the code I'm emitting for //o regexps. However the locking I do have is not really working like it should.
  • The next_token method from parser is really big, and although we can emit JVM bytecode for it, Hotspot does not actually JIT it to native code. My best perf improvement so far has been from preventing next_token's compilation, since our interpreter appears to be faster than the JVM's at this code.
  • I'm not seeing much evidence of any bottlenecks in brakeman itself, and currently I'm focusing on the performance of parser.

We're talking through some ideas to improve //o execution.

headius added a commit that referenced this issue Mar 16, 2016
* Ensure only one regexp is ever cached for //o

This is done using an AtomicReference in the compiled method for
JVM6 and a field + atomic updater in the indy call site. In both
cases, we may end up evaluating the operands twice, and the code
that produced them may still run after caching (a bug,
#2798), but we will at least guarantee to return
exactly one regexp.

* Add non-boxed paths to construct dregexp with up to 5 elements.

* Add a ThreadContext-local Encoding[1] to use for encoding
  negotiation when preprocessing the dregexp elements.

* If, at JIT time, a once-dregexp has already been encountered and
  cached in the instr, just emit that regexp directly into the
  bytecode.

This new logic is faster than what we had before, likely because
the locking I put in place for JVM6 was preventing the JVM from
jitting (punted out with "COMPILE SKIPPED: invalid parsing" due
to a flaw in my code). This new logic is lighter-weight and JITs
fine. Given the benchmark from #3735:

9.0.5:  3.87s
9.1:    0.70s
1.7.24: 0.72s
headius added a commit that referenced this issue Mar 16, 2016
For #3735, though it has only a minor impact on a scan bench.
@headius
Copy link
Member

headius commented Mar 16, 2016

Still more work to do. With my recent tweaks, 9.1 runs brakeman against redmine in 32s and 1.7.24 runs it in 31s. More to come.

@presidentbeef
Copy link
Author

Wow, thanks @headius!

@headius
Copy link
Member

headius commented Mar 17, 2016

@presidentbeef Thanks for reporting it! 9k still has some rough edges, perf-wise, but having real-world use cases helps us smooth them out sooner.

@presidentbeef
Copy link
Author

If you want an even more intense test case, try scanning https://github.com/ManageIQ/manageiq

@headius headius added this to the JRuby 9.1.1.0 milestone Apr 20, 2016
@headius
Copy link
Member

headius commented Apr 20, 2016

Some of this is fixed in 9.1, and it should be about the same perf as 1.7. However it should be faster. We'll leave this open and do some additional optimization passes in 9.1.x.

@presidentbeef I'd be interested to see if you get the same improvement in 9.1 that I do, if you feel like re-running some of your benchmarks.

@presidentbeef
Copy link
Author

I'm seeing roughly one second slower in general, which is totally awesome 👍

@headius headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.2.0 May 11, 2016
@headius
Copy link
Member

headius commented May 20, 2016

A few recent discoveries.

The largest issue definitely appears to be next_token and downstream calls within ruby_parser (ruby_lexer.rex.rb). This method has a very large case/when with almost all the cases being StringScanner#scan or match? calls.

My first attempt to improve performance was to work on StringScanner, by creating a patch to cache Matcher objects. This required two patches: https://gist.github.com/headius/d30b0db212d89e81726dc5f336b9e7be

Unfortunately this only had as small impact in perf, if any at all.

The next thing I tried was hand-restructuring the next_token method. I made the following changes:

  • The smaller blocks of case/when moved to their own methods.
  • All the case/whens were converted to if/else.

This brought the time to scan Redmine from 40s down to 34s on my oddly fickle machine.

The biggest gain I found was not quite kosher. I modified the process_symbol method in ruby_lexer.rb to not unescape the symbol tokens. Performance dropped down to 24s, matching MRI on my machine. Now this was MRI with unmodified ruby_parser, but I see this as very illustrative...the unescaping loop is costing lots of time.

So there was one other discovery I made. I modified the brakeman bin script to keep running in a loop, reprocessing args and everything. JRuby with stock brakeman improved from 40s in the first iteration to 34s in the second and 30 in the third. JRuby with all the safe patches improved to mid-20s, and JRuby with the unsafe patch got down below 20s. That's at least good news for longer-running processes.

@headius
Copy link
Member

headius commented May 20, 2016

Here's the safe part of the patches, the hand-massaging of ruby_lexer.rex.rb: https://gist.github.com/headius/a403d56a68d6c733f76b9ec539a23289

@presidentbeef
Copy link
Author

Thank you for continuing to look into this!

However, I am not able to reproduce your performance improvement with the unescape call. I'm only seeing a couple seconds difference when unescape just returns its output.

Just FYI, you are right that it's the parsing that is under-performing. I broke down the timing for a scan of ManageIQ:

Operation MRI 2.3.0 JRuby 9.1.1.0
gems 0.022 0.321
config 0.035 0.458
parsing 43.509 89.603
inits 0.070 0.29
libs 4.051 5.171
routes 0.216 0.249
templates 2.892 2.127
data_templates 10.411 8.822
models 26.387 20.568
controllers 13.591 10.313
data_controllers 207.698 137.969
indexing 14.378 10.932

Once it gets going JRuby meets and then beats MRI performance.

@enebo enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016
@headius
Copy link
Member

headius commented Aug 22, 2016

@presidentbeef Looping back to this again finally.

So you say once JRuby gets going it meets and beats MRI performance. Does that mean we can call this fixed? Of course JRuby has a longer warm-up time, but there's only so much we can do about that on the JVM.

I'm going to punt this to 9.1.4.0 for further discussion about what we need to improve. Parsing is obviously still an issue, and the size of the methods created by racc prevents us from optimizing well.

@headius
Copy link
Member

headius commented Nov 8, 2016

Perf has not changed much since 9.1.3.0...my ratios are still about the same. Since this is a 1.7 migration issue I think we can safely move this to 9.2 and plan to do a pass over performance bugs for that release.

@headius headius modified the milestones: JRuby 9.2.0.0, JRuby 9.1.6.0 Nov 8, 2016
@headius
Copy link
Member

headius commented May 15, 2018

Seems to have regressed in perf?

9.2 pre with jit: 50s
1.7.26 with jit: 40s

Bumping.

@headius headius modified the milestones: JRuby 9.2.0.0, JRuby 9.2.1.0 May 15, 2018
@presidentbeef
Copy link
Author

Apparently I have a slow machine...I see:

Version Time
9.2.0.0 83s
1.7.26 74s
MRI 2.5.1 30s
jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 OpenJDK 64-Bit Server VM 25.171-b10 on 1.8.0_171-b10 +jit [linux-x86_64]

jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on OpenJDK 64-Bit Server VM 1.8.0_171-b10 +jit [linux-amd64]

@headius headius modified the milestones: JRuby 9.2.1.0, JRuby 9.2.2.0 Oct 26, 2018
@presidentbeef
Copy link
Author

Running on newer JVMs, different machine...

Version JVM Time
9.2.0.0 HotSpot 54s
9.2.0.0 GraalVM 47s
1.7.26 HotSpot 49s
MRI 2.5.1 23s

Note: -Xcompile.invokedynamic made JRuby way slower (like 2-3x as long) on both VMs.

jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 Java HotSpot(TM) 64-Bit Server VM 10.0.1+10 on 10.0.1+10 +jit [darwin-x86_64]

jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 GraalVM 1.0.0-rc8 25.71-b01-internal-jvmci-0.48 on 1.8.0_172-20180626105433.graaluser.jdk8u-src-tar-g-b11 +jit [darwin-x86_64]

@headius
Copy link
Member

headius commented Nov 3, 2018

Thank you for the update! Still a bit of a mystery what's slowest here but it's still likely an issue with a key parsing method not compiling. @enebo has been working on the compiler more lately, he may have some insight on how we could get some of these larger methods to compile in chunks, so we can at least get it out of the interpreter.

@headius
Copy link
Member

headius commented Aug 8, 2019

Update with JRuby 9.2.8 master as of this moment (642a1e2)...

Unfortunately Brakeman 3.2.1 appears to error on JRuby now. I'm not sure if this is due to Ruby language changes or due to some regression in JRuby, but we cannot test 3.2.1 on JRuby 9.2.8:

[] ~/projects/jruby $ time jruby -v -S brakeman -q --summary ../redmine
jruby 9.2.8.0-SNAPSHOT (2.5.3) 2019-08-08 642a1e2 OpenJDK 64-Bit Server VM 25.202-b08 on 1.8.0_202-b08 +jit [darwin-x86_64]
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/brakeman/checks/check_dynamic_finders.rb:47: warning: Ambiguous first argument; make sure.
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/brakeman/processors/lib/processor_helper.rb:35: warning: instance variable @current_class not initialized
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/ruby_parser-3.8.4/lib/ruby_parser_extras.rb:819: warning: Unknown escape \A is ignored
ArgumentError: setting s(:args).line nil
                         line at /Users/headius/projects/jruby/lib/ruby/gems/shared/gems/sexp_processor-4.12.1/lib/sexp.rb:214
                   deep_clone at /Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/ruby_parser/bm_sexp.rb:42
                   deep_clone at /Users/headius/projects/jruby/lib/ruby/gems/shared/gems/brakeman-3.2.1/lib/ruby_parser/bm_sexp.rb:31
                         each at org/jruby/RubyArray.java:1800

Also, Brakeman 4.6.1 depends on Ruby 2.3+, which means we can't test that on JRuby 1.7.27.

So I have numbers comparing 4.6.1 on JRuby 9.2.8 with 3.2.1 on JRuby 1.7.27:

JRuby 9.2.8 + Brakeman 4.6.1:

real	0m28.334s
user	1m24.947s
sys	0m2.825s

JRuby 1.7.27 + Brakeman 3.2.1:

real	0m50.628s
user	1m37.504s
sys	0m2.414s

I am not sure if this is a valid comparison, but if most of the critical/hot code paths are the same, JRuby 9.2.8 finally appears to be faster.

@headius
Copy link
Member

headius commented Aug 8, 2019

Ok I tried Brakeman 3.2.1 on MRI 2.5.3 and it fails the same way. That issue, at least, seems to be in Brakeman 3.2.1.

MRI 2.5.3 with Brakeman 4.6.1 is still faster than JRuby, but the ratio is better than the last time we updated this bug:

real	0m15.865s
user	0m15.081s
sys	0m0.518s

So with JRuby 9.2.8 at 28s and MRI at 15s, we're less than 2x slower. No combination has been that close.

@headius
Copy link
Member

headius commented Aug 8, 2019

I think we can safely close this, since it seems like JRuby's ratio compared to MRI has improved well beyond where it was with JRuby 1.7.

@presidentbeef If you would like to work with us in pursuing the MRI performance gap, I'd say open a new bug and we can start to break down the work necessary. It may come down to a critical method not getting jitted, as before, but we may be able to address it better now.

@headius headius closed this as completed Aug 8, 2019
@presidentbeef
Copy link
Author

The error is an incompatibility with sexp_processor 4.12.1. If you pin to I think 4.11 it should be okay.

But in any case - thank you for continuing to follow up on this issue! Always glad to see improvements in JRuby.

@presidentbeef
Copy link
Author

Just to round this out, I ran a couple longer scans (> 2 minutes) with Brakeman 4.6.1 using MRI 2.6.3 and JRuby 9.2.8.0. The timings were only a few seconds apart! Basically within margin of error. 👍

@headius
Copy link
Member

headius commented Aug 19, 2019

@presidentbeef thank you for following up! That's great to hear, since we have steadily been improving cold performance over the last year (and continue to do so!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants