Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to subclass management and method cache invalidation #4886

Merged
merged 2 commits into from
Jan 10, 2018

Conversation

headius
Copy link
Member

@headius headius commented Dec 7, 2017

In #4879, we got a report of heavy thread contention on the addSubclasses method of RubyClass. After investigating the locking involved (at @kares's recommendation), it seems as though much of this locking is excessively restrictive, using a runtime-global lock to prevent multiple threads from modifying any class's list of subclasses at the same time.

The subclasses list is also used heavily by method invalidation, and exploring that end of things showed me many places where we could improve performance.

  • The global synchronization may be unnecessary most of these places, so long as the collections being modified are themselves thread-safe.
    I have removed most of this but we need much more testing, review, analysis to confirm it's safe.
  • The subclass aggregation logic creates many intermediate lists, which can be reduced to a single list.
    I do not believe this code is used in any hot paths (it's mainly there for ObjectSpace) but I modified the logic to only create a single list for all subclasses and descendants in a given subhierarchy.
  • The invalidation logic must aggregate invalidators from all subclasses.
    It does this by creating a list of invalidators and adding every invalidator from that class on down, recursively. However, this list is usually all GenerationAndSwitchPointInvalidator, and the SwitchPoints are later reprocessed into another array for bulk invalidation. These collection could be cached, thrown out when any descendant is added or removed by calling up-hierarchy. This would reduce the cost of invalidation, but have a higher in-memory cost. It would also provide no benefits (but no detriments) running on a system that creates many classes.
    This work has yet to be done, and I have not investigated the actual cost of the invalidator gathering.

This is a high-risk change but if successful we could remove a
great deal of overhead at boot time and for libraries that spin
a lot of classes at runtime.

First in a series of optimizations for class hierarchy management
and method cache invalidation.
@headius
Copy link
Member Author

headius commented Dec 7, 2017

@kares @mkristian @lopex @enebo This is a work in progress, but I'll definitely need help reviewing and testing these changes. Anything you can suggest or do to help will be appreciated :-)

@ketan
Copy link
Member

ketan commented Dec 7, 2017

Perhaps I'm wrong — but looking at this PR, a lot of it would likely simplify if this.subclasses were initialized to an empty concurrent set.

@ketan
Copy link
Member

ketan commented Dec 7, 2017

Perhaps I'm wrong — but looking at this PR, a lot of it would likely simplify if this.subclasses were initialized to an empty concurrent set.

Ignore me. I think I understand why it can't be.

@headius
Copy link
Member Author

headius commented Dec 7, 2017

I'm not sure what you were thinking, but I guess it would be no better than initializing to null. And the reason we leave such fields blank is so there's no overhead for classes that never have subclasses. It's worth some thought though; elsewhere in the codebase we separate read and write usage so reads can be mostly noops when there's nothing to do.

@kares
Copy link
Member

kares commented Dec 7, 2017

LGTM

@GaneshSPatil
Copy link

Hi,

We managed to build our app with this PR, the current bottleneck seems to have been resolved, but we have discovered another similar bottleneck with another synchronized method, which is blocking about 278 threads and grinding the server to a halt as we speak.

We have attached the thread dumps at http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTcvMTIvOC8tLWZpbGVfMi0tNS00My0zODs7LS1maWxlXzItLTUtNDUtMTE= if you'd like to take a look.

qtp951880373-287636 - priority:5 - threadId:0x00007fc7a4150800 - nativeId:0x5a88 - state:RUNNABLE
stackTrace:
java.lang.Thread.State: RUNNABLE
at java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Native Method)
at java.lang.invoke.CallSite.setTargetNormal(CallSite.java:258)
at java.lang.invoke.MutableCallSite.setTarget(MutableCallSite.java:154)
at java.lang.invoke.SwitchPoint.invalidateAll(SwitchPoint.java:224)
at org.jruby.runtime.opto.SwitchPointInvalidator.invalidate(SwitchPointInvalidator.java:42)
- locked <0x00000006c9c95ef8> (a org.jruby.runtime.opto.ConstantInvalidator)
at org.jruby.runtime.opto.ConstantInvalidator.invalidate(ConstantInvalidator.java:20)
at org.jruby.RubyModule.invalidateConstantCache(RubyModule.java:1554)
at org.jruby.RubyModule.invalidateConstantCacheForModuleInclusion(RubyModule.java:1358)
at org.jruby.RubyModule.includeModule(RubyModule.java:889)
- locked <0x00000006996b50a8> (a org.jruby.MetaClass)
at org.jruby.RubyModule.extend_object(RubyModule.java:2581)
at org.jruby.RubyModule$INVOKER$i$1$0$extend_object.call(RubyModule$INVOKER$i$1$0$extend_object.gen)
at org.jruby.RubyClass.finvoke(RubyClass.java:908)
at org.jruby.runtime.Helpers.invoke(Helpers.java:421)
at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:363)
at org.jruby.RubyBasicObject.extend(RubyBasicObject.java:2634)
at org.jruby.RubyKernel.extend(RubyKernel.java:1977)
at org.jruby.RubyKernel$INVOKER$s$0$0$extend.call(RubyKernel$INVOKER$s$0$0$extend.gen)
at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:721)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:153)
at var.go.server.go_minus_server.go_minus_server_minus_17_dot_12_dot_0.work.jetty_minus_0_dot_0_dot_0_dot_0_minus_8153_minus_cruise_dot_war_minus__go_minus_any_minus_.webapp.WEB_minus_INF.rails_dot_new.vendor.bundle.jruby.$2_dot_3_dot_0.gems.representable_minus_2_dot_1_dot_7.lib.representable.invokeOther0:extend(/var/go/server/go-server/go-server-17.12.0/work/jetty-0.0.0.0-8153-cruise.war-_go-any-/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/representable-2.1.7/lib/representable.rb:105)

@kares
Copy link
Member

kares commented Dec 8, 2017

@GaneshSPatil that seems to be the extend explained in your original report at #4879
constant cache invalidation comes with a cost, however changes in this PR still make sense for others ...

p.s. me walking in your shoes ... would definitely be looking into refactoring/patching that extend piece

Copy link
Member

@mkristian mkristian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@enebo
Copy link
Member

enebo commented Dec 8, 2017

LGTM but since 9.1.15.0 is out lets put this in today to give it a month to break something.

@headius
Copy link
Member Author

headius commented Dec 11, 2017

@GaneshSPatil This additional case appears to be due to the cost of walking all constants neededing to be invalidated and invalidating them individually. There will be one for every name, but the code (invalidateConstantCacheForModuleInclusion) here does a couple things wrong:

  • It walks all constants in all supermodules/superclasses of the incoming module, which will almost certainly invalidate some constants more than once.
  • It invalidates each name individually, which comes with locking and JVM JIT impact each time.
  • Because it walks all the way up to Kernel, it will invalidate all constants in the system every time, regardless of whether they're redefined or not.

I have a fix for the first two items that should reduce how much locking and JIT impact this has.

The third item is a bit trickier. We will need to experiment a bit to figure out which constants really need to be invalidated.

@headius
Copy link
Member Author

headius commented Dec 11, 2017

@GaneshSPatil Please retest.

@GaneshSPatil
Copy link

Hello @headius,
Apology for the delayed response.

We tested the changes with the same performance setup we have. We don't any blocked threads as of now, but the performance has degraded going from 1.7.26 to 9.1.15.

One of our APIs which does JSON serialization used to take 12s to 15s with JRuby v1.7.26, with the latest JRuby v9.1.15 (including this PR patch) takes around 45s to a minute to respond.
The timings are captured on an idle server. On a heavily loaded server, we expect the APIs to take more time to respond.

We'll investigate more into the performance degradation and will provide you an update.

@GaneshSPatil
Copy link

@headius -- We didn't see any blocked threads around addSubClass method with these changes. So we think this PR is good to merge which resolves the original issue #4879 .

We'll investigate more into the performance degradation and will provide you an update.

We have upgraded rails from 4.x.x to 5.x.x, and JRuby from 1.7.26 to 9.1.15. We'll investigate what exactly has caused a performance hit and provide you an update.

@headius
Copy link
Member Author

headius commented Dec 19, 2017

@GaneshSPatil Ok thank you. You may get some good info about the performance issue by using --sample flag to JRuby. This uses a JVM-provided sampling profiler which outputs results on exit. These results may show if there's an obvious bottleneck.

@enebo enebo merged commit 56db7a9 into jruby-9.1 Jan 10, 2018
@enebo
Copy link
Member

enebo commented Jan 10, 2018

whoops we planned on landing this much earlier but we should have some time left before 9.1.16 so it is in!

@enebo enebo added this to the JRuby 9.1.16.0 milestone Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants