Unexpected performance with GraphQL gem: JRuby indy < JRuby < MRI #4623
Comments
At a quick glance, this suffers from the Ruby lexer generating a nearly 500-line Ruby method which, once JITted, will not actually compile down to native code. This is a known issue with JRuby, but it is a difficult problem to solve, hence no solution...yet. With that said, this only accounts for about half of the profile, but since everything is dispatched from this method, I am unsure what other issues arise from calling through a non-native method.
For fun, here is the un-JITted (`-X-C`, interpreter-only) perf:

This is vs. about 255 i/s with the JIT enabled on my machine.
Just looking at how this generates, I forgot that there is a lambda to capture a couple of vars:

```ruby
emit_token = ->(name) {
  emit(name, ts, te, meta)
}
```

This is not free either.
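To make that cost concrete, here is a minimal, hypothetical sketch (the `LexerSketch` class and its method names are invented for illustration, not the gem's actual code) contrasting dispatch through a captured lambda with a direct method call:

```ruby
# Hypothetical sketch: contrasts the lambda-indirection style used by the
# generated lexer with a plain method call that JRuby can inline.
class LexerSketch
  attr_reader :tokens

  def initialize
    @tokens = []
    # Generated-lexer style: a lambda capturing surrounding state.
    @emit_token = ->(name) { emit(name, 0, 1, nil) }
  end

  def emit(name, ts, te, meta)
    @tokens << name
  end

  # Dispatch through the closure: closure allocation plus an indirect call.
  def lex_via_lambda(n)
    n.times { @emit_token.call(:INT) }
  end

  # Direct call: a plain monomorphic call site on the hot path.
  def lex_via_method(n)
    n.times { emit(:INT, 0, 1, nil) }
  end
end

lexer = LexerSketch.new
lexer.lex_via_lambda(3)
lexer.lex_via_method(3)
puts lexer.tokens.size # 6
```

Both paths produce identical tokens; the difference is purely in how the call is dispatched, which is exactly what the later benchmarks in this thread measure.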
I manually removed the begin/ends (we might be able to reduce their cost internally; I will examine this) and also consolidated a line. I replaced emit_token with emit, which put the method back over the too-big-to-compile size, but then I replaced one case/when with just returning the name, so I had something like:
Crazy: I just mentioned removing excess begin/ends, but I see they already only cost us extra line-number instructions. I guess this method will have hundreds of those. Perhaps I should try to prune adjacent ones?
Thanks @enebo for the great analysis! I also spent the day digging a bit into this code. The lexer itself is written in the Ragel state machine language, so most of the code is Ragel's output. One interesting detail is that the compiled version of the lexer in the project looks like it was generated with Ragel 6, the "stable" Ragel version, whereas Ragel 7 has a few variations in the way it can output Ruby code. Unfortunately, Ragel 7 has also decided to drop support for Ruby (amongst other languages), so to change it we'll need to fork version 7.0.0.9, the last version that supported Ruby. The good news is that one of the code generation options (-F1, which they describe as "Flat table with expanded actions") generates code that is faster than the current version in the gem on both MRI and JRuby. (For completeness, it's also the highest-performing version w/ invokedynamic, but it still doesn't match running with indy off.)
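As a rough intuition for the difference between the two table styles (a toy sketch, not Ragel's actual output): a binary-search lexer looks each transition up in a sorted key table, while a flat-table lexer indexes directly by character, trading table size for fewer branches:

```ruby
# Toy single-state machine over 'a'..'z' that classifies vowels (1) vs
# consonants (0), written in both dispatch styles.

# Binary-search style: sorted transition keys, bsearch per character.
KEYS = %w[a e i o u].freeze

def step_binary_search(ch)
  KEYS.bsearch { |k| k >= ch } == ch ? 1 : 0
end

# Flat-table style: one slot per possible character, direct array index.
FLAT = Array.new(26, 0).tap do |t|
  %w[a e i o u].each { |v| t[v.ord - 'a'.ord] = 1 }
end.freeze

def step_flat_table(ch)
  FLAT[ch.ord - 'a'.ord]
end

"graphql".each_char do |ch|
  raise "mismatch on #{ch}" unless step_binary_search(ch) == step_flat_table(ch)
end
puts "both dispatch styles agree"
```

The "expanded actions" variants additionally inline action bodies at each transition instead of looking them up through action tables, which is why the generated code gets larger but faster.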
Quick gist with this code: https://gist.github.com/ivoanjo/041cfdb4fb07457c40a12e77bc738b00

My plan is to validate this version a bit and then submit it as a PR to the GraphQL gem. Funnily enough, the generated code is actually even larger. Other than that, you mentioned a few optimizations that can be done manually in the generated code. Do you think it is useful to go in and change this in Ragel (or even just do it manually on its output), or do you believe it would be easier to just change JRuby instead?
@ivoanjo I think the only one I see which is not great for us is the lambda. I think we can eventually improve that perf on our side, but indirecting through a closure is not a win for us (past the fact that it is less bytecode overall). I did just commit b6254e843c3d5a8be8e256601f58c88e7f3bdd66, which did not have quite as much impact in this case but removed a number of unneeded line-number instrs. I can also see an extra copy instr per case, but I am not sure both changes are enough to reduce the size. Obviously, this is not a general solution, just shaving stuff down here and there.

I was wondering about a Ragel mode which indirected bodies through methods and set instance variables instead of local variables. That would likely be slower on MRI, but on JRuby it could be faster, since it would pretty much guarantee everything became native and the most common paths would likely inline. We do lose some optimizations, like lvars becoming tempvars, but compiling to native is nice :) Just some grist for the mill.

In truth, what we really need is a generic solution in our JIT generation itself (the most popular idea is to make each BasicBlock of code its own synthetic method, even though it means a level of extra indirection for temp vars).
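The "indirect bodies through methods, share state via instance variables" idea could look roughly like this (a hand-written sketch with invented names such as `SplitLexer`; it is not actual Ragel or JRuby output, and uses a trivial regex scan in place of a real state machine):

```ruby
# Sketch: instead of one huge method with all action bodies inlined (which
# can exceed the JIT's method-size limit), each action becomes a small
# method and shared lexer state lives in instance variables. Small methods
# compile to native code easily and hot paths can inline.
class SplitLexer
  def initialize(input)
    @data   = input
    @ts     = 0   # token start offset (ivar instead of a local)
    @te     = 0   # token end offset
    @tokens = []
  end

  # Each action body is its own small, easily JIT-compilable method.
  def action_emit_int
    @tokens << [:INT, @data[@ts...@te]]
  end

  def action_emit_ident
    @tokens << [:IDENT, @data[@ts...@te]]
  end

  # The driver loop only dispatches; action bodies never bloat it.
  def run
    @data.scan(/\d+|[a-z]+/) do |tok|
      @ts = Regexp.last_match.begin(0)
      @te = Regexp.last_match.end(0)
      tok =~ /\d/ ? action_emit_int : action_emit_ident
    end
    @tokens
  end
end

p SplitLexer.new("abc 123 def").run
```

As noted above, the extra method call per action would likely cost something on MRI, so this is a JRuby-leaning trade-off rather than a universally better layout.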
I think I can refactor out that lambda. I had a branch for that once, but I didn't see a perf improvement on MRI, so I gave up. Let me give it a try over on that graphql-ruby PR!
Thanks @enebo, it looks like you were right on the money 🎯. By using Ragel 7 AND getting rid of the lambda, performance is now as expected: JRuby + indy > JRuby > MRI. I've opened a PR with those fixes: rmosolgo/graphql-ruby#737.
The fix has been merged and released, so I believe the issue can be marked as closed. |
While investigating why query lexing was measurably slower in JRuby than with MRI (see jruby/jruby#4623), I noticed that GraphQL's lexer seemed to be generated with Ragel 6. I then noticed that this version was rather old and that a newer Ragel 7 seemed to have a lot of improvements in the Ruby output backend, even adding four different output options for generated lexers:

* `-T0` Binary search (default)
* `-T1` Binary search with expanded actions
* `-F0` Flat table
* `-F1` Flat table with expanded actions

Testing all four of these combinations and comparing with the current GraphQL lexer on both MRI and JRuby showed that, for a lexing-only benchmark (included below), Ragel 7's `-F1` improved MRI performance by 87% and JRuby performance by 263%:

* MRI before:
```
Running with ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
Warming up --------------------------------------
               Lexer   111.000  i/100ms
Calculating -------------------------------------
               Lexer      1.124k (± 3.9%) i/s -     33.744k in  30.069500s
```
* MRI after:
```
Running with ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
Warming up --------------------------------------
               Lexer   209.000  i/100ms
Calculating -------------------------------------
               Lexer      2.102k (± 3.0%) i/s -     63.118k in  30.052143s
```
* JRuby before:
```
Running with jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [linux-x86_64]
Warming up --------------------------------------
               Lexer    24.000  i/100ms
Calculating -------------------------------------
               Lexer    250.435  (± 4.8%) i/s -      7.512k in  30.080584s
```
* JRuby after:
```
Running with jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [linux-x86_64]
Warming up --------------------------------------
               Lexer    88.000  i/100ms
Calculating -------------------------------------
               Lexer    910.242  (± 3.0%) i/s -     27.280k in  29.999337s
```

Unfortunately, this does not yet bring JRuby performance to MRI's level, and I plan to continue looking at how to improve this.

An important note is that the latest Ragel 7 release drops support for a lot of languages, unfortunately including Ruby (see https://www.colm.net/news/2016/07/31/ragel-target-langs.html for the announcement), so the last currently-available Ragel 7 version that can be used is 7.0.0.9, which was the version I used.

Lexing benchmark used to obtain the numbers above:

```ruby
require 'graphql'
require 'benchmark/ips'

# https://githubengineering.com/the-github-graphql-api/
input_string = "{ viewer { login starredRepositories { totalCount } repositories(first: 3) { edges { node { name stargazers { totalCount } forks { totalCount } watchers { totalCount } issues(states:[OPEN]) { totalCount } } } } } } "

puts "Running with #{RUBY_DESCRIPTION}"

Benchmark.ips do |benchmark|
  benchmark.time = 30
  benchmark.warmup = 30

  benchmark.report('Lexer') { GraphQL::Language::Lexer.tokenize(input_string) }

  benchmark.compare!
end
```
As @enebo suggested in jruby/jruby#4623, there is a non-trivial performance impact from using the `emit_token` lambda instead of directly calling the `emit` method during lexing. By changing this to a normal method call, performance on MRI is mostly unaffected, but it finally enables JRuby to optimize the lexing code, putting it ahead of and beyond MRI.

* MRI before (i.e., after the previous commit):
```
Running with ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
Warming up --------------------------------------
               Lexer   209.000  i/100ms
Calculating -------------------------------------
               Lexer      2.102k (± 3.0%) i/s -     63.118k in  30.052143s
```
* MRI after (negligible impact):
```
Running with ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
Warming up --------------------------------------
               Lexer   211.000  i/100ms
Calculating -------------------------------------
               Lexer      2.121k (± 3.3%) i/s -     63.722k in  30.080212s
```
* JRuby before (i.e., after the previous commit):
```
Running with jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [linux-x86_64]
Warming up --------------------------------------
               Lexer    88.000  i/100ms
Calculating -------------------------------------
               Lexer    910.242  (± 3.0%) i/s -     27.280k in  29.999337s
```
* JRuby after:
```
Running with jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [linux-x86_64]
Warming up --------------------------------------
               Lexer   372.000  i/100ms
Calculating -------------------------------------
               Lexer      4.078k (± 5.0%) i/s -    122.016k in  30.008956s
```
* JRuby after (with -Xcompile.invokedynamic=true):
```
Running with jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +indy +jit [linux-x86_64]
Warming up --------------------------------------
               Lexer   800.000  i/100ms
Calculating -------------------------------------
               Lexer      9.649k (± 7.9%) i/s -    287.200k in  30.009197s
```

Finally, note that on the master branch, JRuby with -Xcompile.invokedynamic=true was the worst-performing option (clocking in at about 141 i/s), so it went from being the slowest to being the fastest by far, with a speedup of about 68x.
Environment
Rubies:
jruby 9.1.9.0 (2.3.3) 2017-05-15 28aa830 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 [linux-x86_64]
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
Gems:
graphql 1.15.13
Linux:
Linux maruchan 4.11.0-041100-generic #201705041534 SMP Thu May 4 19:36:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 17.04
Expected Behavior

Performance when running the GraphQL gem's query lexer on JRuby should be at least comparable to MRI's.

Actual Behavior

Performance when running the GraphQL gem's query lexer on JRuby is 1/5th of MRI's, and 1/10th when using JRuby with invokedynamic. It's possible that the code is suboptimal and there's just nothing that can be improved without changing it, but I find the part where it's even slower with invokedynamic especially weird, and thus decided to open this issue.

This has a big impact on our system, since every GraphQL web request goes through this app (and I identified this issue when profiling showed that about half of request time on JRuby was spent lexing the GraphQL query).

I'm using an example GraphQL query from GitHub's documentation here, but the performance behavior is the same as with the queries I was actually using.

Benchmark: