Oat gem slower at steady-state with invokedynamic than without it #4596
Yeah something's definitely wrong here.
It looks like something's getting endlessly invalidated, which is very costly. Most places where we use indy should already be failing over to a less-optimized path rather than constantly invalidating, so I'll poke around a bit.
Ahh I see. The indy logic in 9k does not take into consideration code that calls method_missing.
Ok, elaborating on this a bit more. In any call site (using indy) we wrap the cached method in a SwitchPoint from the original class. This acts as an "off" switch for the cache, flipped when the original class is modified. We lazily create that SwitchPoint only when requested. In the case of method_missing, nothing is ever cached, but we still acquire the SwitchPoint up front, before the initial method lookup.

However, the oat library is constantly generating new classes. This in itself is a very bad practice, since it also affects other Ruby implementations that cache methods based on the last-seen class. In JRuby, however, it has the additional wrinkle that because it's a new class every time, it's a new SwitchPoint every time. So we go through all the effort of setting up the class and the SwitchPoint only to throw it away and start again the next iteration. This is the main source of the performance degradation. It's marginally indy-related, since there are call sites and SwitchPoints and so on, but it's more a symptom of class churn in oat.

The flaw in the JRuby logic is that if we keep failing to cache a method, because we're supposed to call through method_missing, the call site never notices those misses and never gives up on caching.

The simple fix will be to "fail" the call site after some number of method_missing dispatches, reverting it to non-indy dispatch.

I'm not sure what a more comprehensive fix would be. The classes entering your benchmark's call sites are different on every iteration, so there's nothing stable to cache.
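A small Ruby sketch (not from the original thread) of the class-churn problem described above: defining methods via instance_eval on fresh objects yields a brand-new singleton class each time, so any cache keyed on the receiver's class (and, in JRuby, its associated SwitchPoint) is built and then discarded on every iteration.

```ruby
# Each fresh object that gets a method via instance_eval acquires its
# own singleton class; a class-keyed inline cache never sees the same
# class twice, so every lookup is a miss. Names here are illustrative,
# not oat's actual API.
def make_serializer
  obj = Object.new
  obj.instance_eval do
    def serialize
      { name: "foo" }
    end
  end
  obj
end

a = make_serializer
b = make_serializer

# Two distinct singleton classes, even though the code is identical.
p a.singleton_class == b.singleton_class  # => false
```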
Ok, so with a quick fix that immediately fails the call site over to non-indy dispatch after repeated method_missing misses, the regression on this benchmark goes away.
I am not familiar with this library. If this pattern you are benchmarking is typical of hot-path runtime execution, then the library author may want to reconsider creating so many singleton objects and doing instance_eval against them.

I will put a more thoughtful fix into 9.1.9.0 for this.
Fixes jruby#4596.

JRuby currently does not cache anything for method_missing calls (i.e. calls to methods that do not exist on the target class's method table). Instead, after discovering the missing method, it proceeds down a slow lookup path, dispatching to method_missing with a symbolic method name. However, this logic still attempts to acquire a SwitchPoint from the target class, since that SwitchPoint must be acquired before the initial method lookup to ensure proper invalidation ordering.

Normally, this isn't a problem. If the class is still relatively static, the SwitchPoint will be created once, and then the cost of calling method_missing is the same as for non-indy. However, the benchmark in question churns through new classes very rapidly. This led to a large number of unused SwitchPoints being created, one for each class. And since the method_missing path never cached anything, it never updated call site thresholds, resulting in call sites that would continue this logic forever. Normal calls, if they fail repeatedly, will eventually fail the entire site and revert to non-indy method dispatch (currently a simple monomorphic cache).

This patch lifts the threshold-bumping out of the caching logic so it can also be used by method_missing dispatches. This allows repeated method_missing calls against different classes to also fail the site, reverting it to non-indy behavior.

With this patch, the benchmark in jruby#4596 runs faster with indy than without, as expected.
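The threshold-and-fail behavior described in the patch can be sketched as a simplified model in plain Ruby (not JRuby's actual Java implementation; the class name and threshold value are made up for illustration):

```ruby
# Simplified model of an indy-style call site that gives up on caching
# after repeated cache misses and permanently reverts to a slow path.
class ModelCallSite
  FAIL_THRESHOLD = 8  # illustrative; JRuby's real threshold differs

  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @misses = 0
    @failed = false
  end

  def failed?
    @failed
  end

  def call(receiver, *args)
    # Once failed, skip all cache bookkeeping (the "non-indy" path).
    return receiver.send(@name, *args) if @failed

    klass = receiver.singleton_class
    unless klass.equal?(@cached_class)
      # Cache miss: bump the threshold counter, fail the site if we
      # keep missing, otherwise re-cache for the newly seen class.
      @misses += 1
      @failed = true if @misses > FAIL_THRESHOLD
      @cached_class = klass
    end
    receiver.send(@name, *args)
  end
end
```

Driving this with a fresh singleton-classed object on every call (as the benchmarked pattern effectively does) trips the threshold after a handful of iterations, after which each call pays only the plain dispatch cost instead of rebuilding cache state.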
I have pushed #4602 which contains a reasonable fix for this issue. I will again point out, however, that the pattern being benchmarked will never perform "well" due to the large number of singleton classes being created.
Wow @headius, thanks for the lightning-fast answer and awesomely detailed analysis! I started looking at this as I was benchmarking a test application with invokedynamic enabled, and in that configuration the serialization of responses (this was an HTTP JSON HAL API) was spending a noticeable amount of time on this part of the code. With invokedynamic off, the impact became rather small compared to other parts of request processing, but as I improve those parts I think I'll get to the point where this will become a lot more noticeable. I'll report this to the oat developers and perhaps I'll have some time to pick it up and try to fix it myself.
Just one question regarding your comment: are the singleton objects (and the constant generation of new classes) a result of the calls to instance_eval? I don't see any other place that could be causing this in this test case.
Environment
Rubies:
jruby 9.1.8.0 (2.3.1) 2017-03-06 90fc7ab Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [linux-x86_64]
Linux:
Linux maruchan 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Expected Behavior
Performance on the benchmark below consistently regresses when running JRuby with -Xcompile.invokedynamic=true. I was under the impression that after warmup, running JRuby with invokedynamic should never be slower than without it, but that does not seem to be the case on this benchmark.
Actual Behavior
FooSerializer 251.627k (±23.9%) i/s - 28.264M in 120.011879s
FooSerializer 33.785k (± 6.9%) i/s - 4.032M in 120.030513s
I've also tested with latest master, and while the numbers go up slightly in either case, the general relation between the two results stays the same.