Metaspace fills with java.lang.invoke.LambdaForms$ over time #4391

hydrogen18 · 2016-12-15T21:34:43Z

Environment

JRuby version: 9.1.6.0

java -version:
openjdk version "1.8.0_91" OpenJDK Runtime Environment (build 1.8.0_91-b14) OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

OS: Linux community-app2.onr.spiceworks.com 2.6.32-573.18.1.el6.x86_64 #1 SMP Tue Feb 9 22:46:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

JRUBY_OPTS=-J-XX:+UseCompressedOops -J-XX:-TieredCompilation -J-Xms6144m -J-Xmx6144m -J-XX:MaxMetaspaceSize=768m -J-XX:ReservedCodeCacheSize=512m -J-XX:InitialCodeCacheSize=256m -J-Djava.net.preferIPv4Stack=true -J-Dsun.rmi.dgc.server.gcInterval=3600000 -J-Djava.awt.headless=true -J-XX:+PerfDisableSharedMem -J-XX:+UseConcMarkSweepGC -J-XX:+CMSParallelRemarkEnabled -J-XX:+CMSClassUnloadingEnabled -J-XX:+UseParNewGC -J-XX:+CMSConcurrentMTEnabled -J-XX:ConcGCThreads=4 -J-XX:NewSize=2000m -J-XX:MaxNewSize=2000m -J-XX:-UseAdaptiveSizePolicy -J-XX:SurvivorRatio=5

Running a Rails 4.1 application.

The application starts and works correctly. Over time, the metaspace in use steadily goes up at a constant rate. Eventually performance starts to suffer and full GCs seem to start taking place, degrading performance. A heap dump at this point reveals many thousands of classes named java.lang.invoke.LambdaForm$..., where the part after the dollar sign is DMH or BMH. The rest of the heap is typical.

This same application works without issue on 1.7.22.

My understanding is that these classes are somehow related to java.lang.invoke.MethodHandle and are actually anonymous classes. Looking at the code, the most prevalent usage of method handles is under org.jruby.ir.targets, which appears to have been extensively refactored since 1.7.22.

Please let me know what additional steps I can take to troubleshoot the cause of this issue. I cannot provide the heap dump, but I could provide further statistics from it.


headius · 2016-12-19T17:15:36Z

First confirm for me whether or not you are enabling invokedynamic. 9k still uses it for a few things even when off, which would narrow things down a bit.

I'm not sure whether "1000s" is a lot. Every chain of LambdaForms used internally by method handles could eventually become a class, and given that we use method handles for constants, globals, and a few other things, it is easy to see how thousands of such sites might exist in a Rails app.

I would suggest a few things:

Disable invokedynamic if you have enabled it.
Try the most recent Java 8, though yours is pretty recent.
Take a heap dump so we can see what is holding references to those LambdaForms and figure out whether that is correct.

hydrogen18 · 2016-12-19T21:15:53Z

Thanks for the feedback.

We have invokedynamic off; it is too problematic.

I will consider updating to a newer OpenJDK version. Is there a difference between OpenJDK & Oracle here?

I have the heap dump; I will do more investigation and try to determine what holds references to those classes.

hydrogen18 · 2016-12-21T21:30:05Z

I can open the heap dump with Eclipse MAT. The issue is that, from what I can tell, all the heap dump tools are designed to show you what holds references to the instances of a class, and each of these classes has zero instances. Something can still hold a reference to the class itself, though, which I suspect is happening. On JRuby 1.7.22 (same app) I can see the unloaded class count go up when the metaspace size goes back down, which is the expected behavior with CMS class unloading enabled. This never happens on 9k.

I ended up parsing the HPROF output myself and running it against a heap dump from a trivial irb session on JRuby 9.1.6.0. I observe the same thing: no shortage of class definitions whose names start with java.lang.invoke.LambdaForm. The heap dump has a class loader ID of zero for each class, which I guess makes sense if the classes are dynamically defined. There are no instances, but I can find object instances that reference the classes, and each of those instances has the type java.lang.invoke.MemberName.

Do I just need to produce some sort of tree structure tracing those references back to an object in the org.jruby namespace? Should I expect to trace it back to a java thread or some global value?
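For anyone wanting to reproduce the class count described above, here is a minimal Java sketch of scanning an HPROF file's top-level records. It assumes the HotSpot HPROF layout (a NUL-terminated version string, a 4-byte ID size, an 8-byte timestamp, then records of tag, time offset, and length) and decodes only string (0x01) and LOAD CLASS (0x02) records. HprofClassCounter is a hypothetical name, not the parser used in this thread, and it does not follow object references, so it counts class definitions rather than answering the reference-chain question.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: count LOAD CLASS records whose class name starts with a given prefix.
class HprofClassCounter {
    static final int TAG_UTF8 = 0x01;        // string record: ID + UTF-8 bytes
    static final int TAG_LOAD_CLASS = 0x02;  // class serial, class ID, stack serial, name ID

    static int countClasses(InputStream raw, String prefix) {
        try {
            return scan(new DataInputStream(new BufferedInputStream(raw)), prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int scan(DataInputStream in, String prefix) throws IOException {
        while (in.readUnsignedByte() != 0) { }  // skip NUL-terminated "JAVA PROFILE 1.0.x"
        int idSize = in.readInt();              // object ID size: 4 or 8 bytes
        in.readLong();                          // dump timestamp in ms (unused here)

        Map<Long, String> strings = new HashMap<>();
        List<Long> classNameIds = new ArrayList<>();
        int tag;
        while ((tag = in.read()) != -1) {                  // -1 means a clean end of file
            in.readInt();                                  // time offset in microseconds
            long length = in.readInt() & 0xFFFFFFFFL;      // body length, unsigned u4
            if (tag == TAG_UTF8) {
                long id = readId(in, idSize);
                byte[] utf8 = new byte[(int) (length - idSize)];
                in.readFully(utf8);
                strings.put(id, new String(utf8, StandardCharsets.UTF_8));
            } else if (tag == TAG_LOAD_CLASS) {
                in.readInt();                              // class serial number
                readId(in, idSize);                        // class object ID
                in.readInt();                              // stack trace serial number
                classNameIds.add(readId(in, idSize));      // ID of the class name string
            } else {
                skipFully(in, length);                     // heap dump segments, traces, etc.
            }
        }
        int count = 0;
        for (long nameId : classNameIds) {
            // Class names may use '/' separators; normalize to dots before matching.
            String name = strings.getOrDefault(nameId, "").replace('/', '.');
            if (name.startsWith(prefix)) count++;
        }
        return count;
    }

    private static long readId(DataInputStream in, int idSize) throws IOException {
        return idSize == 8 ? in.readLong() : in.readInt() & 0xFFFFFFFFL;
    }

    private static void skipFully(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {  // skip() may return 0; readByte() forces progress or EOF
                in.readByte();
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```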

headius · 2017-01-06T17:29:14Z

OracleJDK and OpenJDK are 99% the same thing.

LambdaForm classes should still be heap objects of type java.lang.Class, so that's how you'd track their reference chain. If they are not loaded into a normal classloader (I believe this is the case; they use an "AnonymousClassloader"), the problem may actually be an OpenJDK bug. This is especially likely if you're seeing large numbers of LambdaForm classes with no instances. JRuby itself never even touches LambdaForm objects directly; we hold references to MethodHandle objects, which themselves should hold a LambdaForm instance. There's no way to call a method against a Class reference.

Is there sensitive information in your heap dump, or would you be able to share it with us and with OpenJDK engineers? It's starting to sound like it's not our bug.

hydrogen18 · 2017-01-06T20:05:11Z

Philosophically, I agree that the class object should be an instance of java.lang.Class. However, I've found no tool that can show me anything interesting about a java.lang.Class or search for the java.lang.Class instances that define a certain class.

Inspecting any java heap dump with jvisualvm shows me only 9 instances of java.lang.Class. These correspond to the primitive types in the Java language.

So I'm not sure that is helpful.

headius · 2017-01-06T23:16:32Z

Ok, I tossed an email to some JVM friends, and they don't know why you'd see LF class counts this high. One of them thought perhaps there's a lot of caching happening and not enough GC pressure to force the soft references to clear. You might try passing -XX:SoftRefLRUPolicyMSPerMB=0 to the JVM to see if that helps clear these out.

You do mention that the metaspace expands enough that full GCs start happening so I wouldn't be surprised if that flag doesn't help. It would be very helpful if I could look at that heap, even if you just host a jhat instance I can poke around in for a while.
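For context on what that flag controls: in HotSpot's server VM, a softly reachable object is, roughly, eligible for clearing once it has gone unaccessed for longer than the free heap at the last GC (max heap minus heap used, in MB) multiplied by SoftRefLRUPolicyMSPerMB, which defaults to 1000. The sketch below models only that arithmetic; SoftRefInterval is a hypothetical name and this is an approximation, not HotSpot's code.

```java
// Rough model of HotSpot's server-VM soft reference policy (LRUMaxHeapPolicy):
// a softly reachable object unaccessed for longer than this interval may be cleared.
class SoftRefInterval {
    static long maxIntervalMs(long maxHeapMb, long usedAtLastGcMb, long softRefLruPolicyMsPerMb) {
        long freeMb = Math.max(0, maxHeapMb - usedAtLastGcMb);  // heap headroom at the last GC
        return freeMb * softRefLruPolicyMsPerMb;
    }
}
```

With -Xmx6144m and a hypothetical 4000 MB in use after the last GC, the default gives 2144 * 1000 = 2,144,000 ms (about 36 minutes). With -XX:SoftRefLRUPolicyMSPerMB=0 the interval drops to 0, so roughly any soft reference not touched since the last collection becomes eligible for clearing.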

hydrogen18 · 2017-01-06T23:19:06Z

I will put that on my list of options to try next time we run the application in production.

headius · 2017-01-06T23:42:09Z

Thanks for your help with this. Don't worry, if there's a problem we'll find it :-)

hydrogen18 · 2017-01-10T19:17:12Z

We have options configured on the JVM to force hourly full GCs. Additionally, the JVM is literally out of metaspace, so full GCs wind up happening frequently.

I've had a chance to do some more investigation into this. Since I've established that there are no instances of the problematic classes, I've done the following:

Found all the instances that are the class definitions
Found all the instances that reference those class definitions
Repeated step 2 for each set of objects identified
Never searched for references to the same instance twice
If JRuby class instances were included in an iteration, searched only from those in the next iteration
Stopped searching when all I could find were Java finalizer instances

Obviously there is more to it with various optimizations to permit this to run in minutes on a multi-gigabyte heap dump rather than days.
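The steps above amount to a level-by-level walk over reverse references. Here is a minimal Java sketch under stated assumptions: the heap dump has already been parsed into a map from each object ID to the IDs of the objects referencing it, and the two predicates (standing in for checks such as an org.jruby class-name prefix or a java.lang.ref.Finalizer type) are supplied by the caller. ReferrerWalk and its parameters are hypothetical names, and none of the optimizations mentioned above are included.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Sketch: walk reverse references outward from the class-definition objects.
class ReferrerWalk {
    // Returns each visited frontier, level by level, starting with the roots.
    static List<Set<Long>> walk(Set<Long> roots,
                                Map<Long, List<Long>> referrers,
                                Predicate<Long> isJRuby,
                                Predicate<Long> isFinalizer) {
        List<Set<Long>> levels = new ArrayList<>();
        Set<Long> seen = new HashSet<>(roots);             // never search the same instance twice
        Set<Long> frontier = new LinkedHashSet<>(roots);
        while (!frontier.isEmpty()) {
            levels.add(frontier);
            if (frontier.stream().allMatch(isFinalizer)) {
                break;                                      // only finalizer instances remain
            }
            Set<Long> sources = new LinkedHashSet<>();
            for (Long id : frontier) {
                if (isJRuby.test(id)) sources.add(id);     // follow only JRuby objects if any
            }
            if (sources.isEmpty()) sources = frontier;
            Set<Long> next = new LinkedHashSet<>();
            for (Long id : sources) {
                for (Long ref : referrers.getOrDefault(id, Collections.emptyList())) {
                    if (seen.add(ref)) next.add(ref);
                }
            }
            frontier = next;
        }
        return levels;
    }
}
```

Narrowing to JRuby objects as soon as one appears keeps the frontier small, at the cost of possibly missing other JVM-internal paths to the same classes.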

The output of this is a tree that, as far as I can tell, traces back to JRuby objects that are thread contexts. Unfortunately the tree is too large to inspect without additional processing. So somehow the class definitions are still referenced. Notably, in the middle of this tree I found references to a JRuby class ConstantEntry, lending support to your earlier notion that Ruby constants are somehow involved.

I did conclude that the only problematic class is java.lang.invoke.LambdaForm$DMH, with over 20000 class definitions in my application. This is apparently an anonymous inner class of LambdaForm for a Direct Method Handle. I found a bunch of documentation about what a DMH is and why you would want one here: https://wiki.openjdk.java.net/display/HotSpot/Direct+method+handles . Without spending too much time trying to understand the nature of this thing, my guess is that it lets the virtual lookup performed when calling a method on a Java object instance happen once, so its cost is reduced or eliminated in future calls. So it seems that using this class is a very obvious optimization for JRuby.

I am leaning towards this being an OpenJDK issue, because the first 3 levels of the tree I produced consist entirely of JVM-internal class instances. In other words, it appears that JRuby never interacts with this class directly. But I feel it could still be caused by the breadth of our application. Is there a Java method inside JRuby that can be called to flush the optimized code paths? It could be called at a long interval to possibly prevent these class definitions from accumulating. I think the JVM has a similar concept for its JIT, which allows JIT'd code to be flushed.

The first JRuby class I find in the tree is org/jruby/internal/runtime/methods/CompiledIRMethod with a field of specific. I can't figure out what IR stands for in this class, do you know?

Are there further options in JRuby I could alter to troubleshoot this? If we think this is an OpenJDK problem, do we suspect the same problem exists in Oracle's JDK?

headius · 2017-01-10T21:40:23Z

Thank you for the very thorough investigation!

I am also leaning toward this being a bug in OpenJDK, though I won't rule out something being amiss in JRuby. I'd actually prefer the latter, since I can fix JRuby :-)

The first JRuby class I find in the tree is org/jruby/internal/runtime/methods/CompiledIRMethod with a field of specific. I can't figure out what IR stands for in this class, do you know?

CompiledIRMethod is basically what gets stored in a class's method table when we JIT a method defined in Ruby. You'll probably see a MixedModeIRMethod above that, which triggers the JIT after 50 calls. Most code in a typical application will never JIT because it's never called (or not called enough to matter). Only code that JITs will produce a CompiledIRMethod (unless you're running in a mode that forces JIT to run earlier or immediately), and for a recent Rails app we investigated that was only a couple thousand methods.

IR refers to JRuby 9k's compiler, which has its own Intermediate Representation.
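To illustrate the mixed-mode behavior described above, here is a hypothetical Java sketch in which a method runs an interpreted body until a call-count threshold (50, per the description above) and then switches to a compiled body. MixedModeMethod and its fields are illustrative names; this is not JRuby's MixedModeIRMethod, which differs in many ways.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of mixed-mode dispatch; not JRuby's MixedModeIRMethod.
// Not thread-safe, and it "compiles" synchronously; a real runtime may JIT in the background.
class MixedModeMethod {
    private final IntUnaryOperator interpreted;
    private final IntUnaryOperator compiled;
    private final int threshold;
    private int callCount;
    private boolean jitted;

    MixedModeMethod(IntUnaryOperator interpreted, IntUnaryOperator compiled, int threshold) {
        this.interpreted = interpreted;
        this.compiled = compiled;
        this.threshold = threshold;
    }

    int call(int arg) {
        // Count calls until the threshold is reached; from then on, use the compiled body.
        if (!jitted && ++callCount >= threshold) {
            jitted = true;  // roughly where a CompiledIRMethod would be installed
        }
        return (jitted ? compiled : interpreted).applyAsInt(arg);
    }

    boolean isJitted() {
        return jitted;
    }
}
```

Methods that never reach the threshold never get a compiled body, which is why only hot code ends up with a CompiledIRMethod and the handles that go with it.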

DMH handles are basically the last stop before the target method; they're the direct function pointer. So in theory you should only ever see 2 * the number of jitted methods in JRuby (2x because many methods will have both a variable-arity and a specific-arity entry point; that's the "specific" you saw). So again, unless your system is huge AND all those methods are getting hit, I would not expect tens of thousands of DMH.
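As a sketch of where the "two per jitted method" estimate comes from, the Java below uses MethodHandles.Lookup.findStatic to obtain one handle for a variable-arity entry point and one for a specific-arity entry point; in OpenJDK, handles returned by findStatic are direct method handles. JittedFoo and its methods are hypothetical stand-ins for a jitted Ruby method, not JRuby's generated code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-in for a jitted Ruby method with two compiled entry points.
class JittedFoo {
    // Variable-arity entry point: arguments arrive packed in an array.
    static Object variable(Object self, Object[] args) {
        return args.length;
    }

    // Specific-arity entry point for exactly one argument: no array allocation.
    static Object specific(Object self, Object arg) {
        return 1;
    }

    // Looks up one direct handle per entry point, i.e. two per jitted method.
    static MethodHandle[] entryPoints() {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            MethodHandle variable = lookup.findStatic(JittedFoo.class, "variable",
                    MethodType.methodType(Object.class, Object.class, Object[].class));
            MethodHandle specific = lookup.findStatic(JittedFoo.class, "specific",
                    MethodType.methodType(Object.class, Object.class, Object.class));
            return new MethodHandle[] {variable, specific};
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

At two handles per jitted method, the couple thousand jitted methods mentioned earlier would account for a few thousand DMH, well short of the 20,000+ reported.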

Are there further options in JRuby I could alter to troubleshoot this?

Well, you could try disabling JIT altogether. That would reduce our use of invokedynamic to almost nothing. But that's not really what you want, since it will run many times slower without JIT. You're also not running with invokedynamic "on" in JRuby, so we can't do any more there. JRuby will still use method handles for some types of variables, frame construction, and constants, but these should all resolve to a handful of back-end method handles that rarely change.

So I guess I'm not sure.

If we think this is an OpenJDK problem, do we suspect the same problem exists in Oracle's JDK?

My OpenJDK friends would like to know more about this. It has been suggested that this could be "method handle caching gone rogue" (unexplored) or that soft references are keeping too much alive (unlikely since you have tried full GCs). Outside that, we'd need to be able to provide them some way to reproduce, which may simply be providing them a heap dump or may require runnable code.

I suggest you join the MLVM mailing list here: http://openjdk.java.net/projects/mlvm/

Jump on my thread about this issue and share whatever you can. The folks listening are JVM guys at Oracle who work on exactly this stuff.

We'll leave this open for now.

hydrogen18 · 2017-01-19T14:21:44Z

We ran with -XX:SoftRefLRUPolicyMSPerMB=0 as you suggested and saw no change. The metaspace still fills up and stays full.

hydrogen18 · 2017-01-23T22:04:02Z

I'm still looking into this. Does it make sense that each use of string interpolation could emit one of these? I see lots of org/jruby/ast/EvStrNode being the root instance ultimately holding a reference to the DMH classes.

enebo · 2017-01-23T22:32:08Z

@hydrogen18 we don't use EvStrNode for execution in JRuby 9k. We translate to our own bytecode and then interpret or JIT that generated code. We will have ast nodes in memory in 9k but only because they have never been executed (we lazily generate IR on demand).

headius · 2017-01-24T02:46:05Z

@hydrogen18 @enebo Yeah that is surprising. I'll look around the area.

headius · 2017-01-24T02:50:39Z

I don't see anything special about EvStr or how it is used. It does not reference any method handles directly, but perhaps something downstream from it does.

If it's at all possible for us to poke around a heap dump we might be able to find this more quickly. I'm willing to SSH or VPN to your server if you'd rather keep the data locally (a heap dump could contain sensitive information from your app).

Alternatively if you can somehow provide examples of root reference chains that are keeping DMH alive, we could at least see a path.

hydrogen18 · 2017-01-24T03:54:25Z

@headius I'm pretty close to being able to generate those chains. I need to parse the thread stacks and track down a few bugs I can't explain.

headius · 2017-01-24T04:01:38Z

Ok, we'll be standing by then.

hydrogen18 · 2017-01-25T16:26:11Z

I got permission to share the heap dumps with you @headius for analysis purposes.

What email can I send you the details at? You can contact me at ericu@spiceworks.com

headius · 2017-01-25T22:56:31Z

@hydrogen18 I have the dumps and will inspect them tonight or tomorrow.

headius · 2017-01-26T17:16:46Z

Tell me, do you make a lot of singleton classes in your application?

What I'm seeing so far are a large number of DMH being held by MetaClass objects (our singleton class representation) for the "idTest" used to type-check a target module when caching a constant retrieved from it. Or at least, that's what it would be used for if that code were live. Instead, the handle is never used and we create one for every RubyModule, which includes all MetaClass.

I am modifying the code to make idTest lazy but since it is not actually used by any live code all those instances should just disappear. In case we decide to use it again (it might be useful for other checks) I also modified it to reuse the same DirectMethodHandle rather than a new one for each RubyModule instance.

I'll commit this and hopefully you will be able to test it. Investigation continues but these references seem to dominate the list of method handles in heap.

The handle here was originally intended for use in checking that the target module for a constant cache is the same as when the value was cached. However the code that used it is no longer active, but we still created these handles for every module, class, singleton class, included module, etc in the system. This commit makes the acquisition of this handle lazy, and also modifies it to reuse the direct handle reference to the test method rather than recreating that every time. This should reduce the number of method handles in flight. For #4391.
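The commit message describes two changes: acquire the handle lazily, and reuse one direct handle to the test method instead of creating one per module. A hypothetical Java sketch of that pattern follows; SketchModule, testModuleId, and the holder class are illustrative names, not JRuby's RubyModule code. The shared direct handle is built once via the lazy holder idiom, and each instance binds its own ID onto it only if idTest() is ever called.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch of the "lazy + shared direct handle" pattern; not JRuby's RubyModule.
class SketchModule {
    // Lazy holder idiom: ID_TEST is created on first access, once per JVM, not per module.
    private static final class IdTestHolder {
        static final MethodHandle ID_TEST = lookupIdTest();
    }

    private final int id;
    private MethodHandle idTest;  // stays null unless idTest() is called

    SketchModule(int id) {
        this.id = id;
    }

    // The check the handle points at: is the target this module (by ID)?
    static boolean testModuleId(Object target, int expectedId) {
        return target instanceof SketchModule && ((SketchModule) target).id == expectedId;
    }

    static MethodHandle sharedIdTest() {
        return IdTestHolder.ID_TEST;
    }

    MethodHandle idTest() {
        // Unsynchronized: concurrent callers may each bind a handle, which is harmless here.
        if (idTest == null) {
            // Bind this module's ID onto the shared direct handle; type becomes (Object)boolean.
            idTest = MethodHandles.insertArguments(IdTestHolder.ID_TEST, 1, id);
        }
        return idTest;
    }

    private static MethodHandle lookupIdTest() {
        try {
            return MethodHandles.lookup().findStatic(SketchModule.class, "testModuleId",
                    MethodType.methodType(boolean.class, Object.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under this pattern, short-lived singleton classes that never need the check never create a handle at all, and those that do share a single direct handle.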

hydrogen18 · 2017-01-26T17:25:39Z

When you say singleton class, do you mean Ruby's Singleton or something else?

headius · 2017-01-26T17:41:54Z

I mean lots of class << some_object or some_object.extend some_module and so on. Anything that creates a custom class for an individual object.

Also candidates: lots of anonymous modules or classes.

Is it easy for you to test the change I made?

hydrogen18 · 2017-01-26T17:50:02Z

As for class << some_object or some_object.extend: yes to all of the above. Virtually every language feature is present in this codebase and used in every possible manner. Particularly egregious is what I have called the dependency-injected controller pattern, where a before_filter on the controller extends the instance with the behavior required for the action. This results in every single request calling extend on an instance of a Rails controller.

We also call Class.new to define a bunch of classes based off values retrieved from our DB, but that is a one and done thing mostly.

The only thing I have not found widespread usage of is Ruby's throw/catch.

In any case, testing the change is not particularly easy but I should be able to figure out a way to do it. Previously someone linked me to a complete one-off build of JRuby that I basically just dropped over my existing one. Can we do that?

headius · 2017-01-26T17:58:08Z

Ugh, ok, so this is a very good candidate fix then. The dependency-injection you speak of is a particularly bad case, since it creates thousands of one-off throw-away classes that are each quite heavy (and in this case, referring to other things that are heavy).

Eliminating this handle by making it lazy should also reduce the cost of those classes, at least a little bit.

It should be possible to take a JRuby build from master and just drop lib/jruby.jar into any other installation of JRuby. Depending on how you're packaging it, the jruby-jars gem might need to be monkeyed with as well.

If you're able to build master (need maven 3.1+, then clone, mvn package -Pbootstrap and you should have a build...see BUILDING) and swap the jars, you should be able to see if it has helped. Let me know if you need more help with that or stop by #jruby on Freenode IRC.

hydrogen18 · 2017-01-26T18:01:58Z

Also, thank you and the rest of the developers for taking time to investigate this particular issue. This is one of our blocking issues on the path to getting on a more modern Ruby implementation.

headius · 2017-01-26T18:14:02Z

@hydrogen18 No problem! This is a good find, especially since it doesn't appear that we are using this field at all. I'm auditing that now but may simply remove the field until we actually need it.

hydrogen18 · 2017-01-27T21:51:44Z

I checked out the commit you mentioned here. It did not work; we got NullPointerExceptions from the IR classes.

I checked out the 9.1.7.0 tag, cherry-picked your commit onto it, then built that. It appears to be working. We hope to try this out Monday.

headius · 2017-01-30T04:50:40Z

@hydrogen18 Ok that's good and bad to hear :-) If you haven't already, please file bugs for the NullPointerException. Master should always work, even when things are in flux as they are now.

Here's hoping the fix sticks! 🤞

hydrogen18 · 2017-02-01T20:09:08Z

I've got the patched build in prod presently. The performance seems a bit better. The class count is continuing to increase (which I expect). It should hit a large enough value for GC to start unloading old classes by tomorrow morning.

headius · 2017-02-02T04:54:50Z

@hydrogen18 Ok, thanks for the update. Hopefully when GC fires it cleans everything up that it's supposed to.

hydrogen18 · 2017-02-02T17:10:59Z

We ran our app for approx. 24 hrs. It did not max out its metaspace this time.

From the heap dump, I see the following anomalies

461 classes named $_28_eval_29_, no instances
20,330 classes named java.lang.invoke.LambdaForm$DMH, no instances
534 classes matching org.jruby.parser.RubyParser$.*, each with 1 instance
1554 classes matching org.jruby.Ruby.*$INVOKER$.*, each with < 10 instances

It does seem to be a bit better behaved since it did not grow to the maximum metaspace size. If you think inspecting the heap dumps again would be beneficial let me know.

There is also a huge number of classes corresponding to RubyJIT methods, but I expect that given our configuration.

headius · 2017-02-23T15:21:50Z

So we still have a lot of LambdaForm classes floating around. If you want to pull another couple of heap dumps, I'd like to take a look at them. I'm glad we've resolved your base issue, at least for now.

If there are no instances of the eval classes, they shouldn't survive a full GC (or two). If they do, that's something worth looking into.

The RubyParser classes are known; it's a very large parser and each production gets its own little function class to avoid the base code being too big for the JVM to load. 😆

The invoker classes are also expected; they're used to bind the Java-based core class methods.

hydrogen18 · 2017-02-23T17:06:03Z

Thanks,

I'll contact you via email with the heap dumps

headius · 2017-03-01T19:14:02Z

Ok, I'm looking at app4 now and I see the 20k various LambdaForm classes. But the retained heap is under 5MB, so I'm not too worried about them. I surveyed the first few thousand instances and they all appear to be from jitted methods, which is to be expected since that's how we bind jitted methods back into our class structure.

Given that we've drastically reduced the number of handles and LambdaForm floating around the system, and your situation seems to be resolved, I'm going to call this one fixed.

BTW, there was a small additional fix to my fix that should reduce duplicate handles further: 24a14e4

Let us know if you have further metaspace problems.

headius added this to the JRuby 9.1.8.0 milestone Feb 23, 2017

headius added invokedynamic JRuby 9000 labels Feb 23, 2017

headius closed this as completed Mar 1, 2017

abrandoned mentioned this issue Jan 13, 2019

add benchmark-ips and remove the need to write a new method on an obj… liveh2o/protobuf-activerecord#39

Merged
