Unexpected memory usage when running with native.enabled=true #4651

Closed

ivoanjo opened this issue Jun 7, 2017 · 7 comments

ivoanjo (Contributor) commented Jun 7, 2017

Environment

  • JRuby: jruby 9.1.10.0 (2.3.3) 2017-05-25 fffffff OpenJDK 64-Bit Server VM 25.121-b13 on 1.8.0_121-cedar14-b13 +indy +jit [linux-x86_64]
  • OS: Linux d85df954-171a-4c14-ae8f-60cfaf0e9aec 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Expected behavior

I'm trying to configure one of Talkdesk's internal web services that uses JRuby to fit on a 1GB Heroku 2x dyno.

My current JAVA_OPTS are:

  • -Xss300k -Xms450m -Xmx450m -XX:+PrintCodeCache -XX:+CrashOnOutOfMemoryError -XX:ReservedCodeCacheSize=200m -XX:InitialCodeCacheSize=200m -XX:+PrintGCDetails -XX:NativeMemoryTracking=detail -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics -XX:MaxMetaspaceSize=120m -XX:CompressedClassSpaceSize=32m

and JRUBY_OPTS:

  • -Xcompile.invokedynamic=true -Xmanagement.enabled=true

My expectation is that this would keep total memory usage at or below roughly 1 GB of RAM.
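
Since -XX:NativeMemoryTracking=detail is already enabled above, one way to read its counters over time is via jcmd; a minimal sketch (the process lookup is an assumption about how the JVM shows up on the dyno):

    # Assumes a single JVM process on the dyno; adjust the lookup if needed
    PID=$(pgrep -f java | head -n1)

    # Record a baseline of the Native Memory Tracking counters...
    jcmd "$PID" VM.native_memory baseline

    # ...then, once the load test has been running for a while, print the delta
    jcmd "$PID" VM.native_memory summary.diff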

Actual behavior

[Screenshot: system memory usage graph, 2017-06-07 17:49]

The v253 label indicates a reboot of the system. A few minutes after a load test starts, system memory usage jumps abnormally, hitting more than 2 GB of RAM. After a few minutes it lowers to around 1.5 GB and stays there.

After trying to check everything I could think of (malloc_arena_max, heap dumps, native memory tracking, buffer pools, etc.) I ended up trying -Xnative.enabled=false, and the difference is noticeable. In the graph below I toggle between the two settings: it starts disabled and stable at 1 GB, then I enable it and memory jumps up, then I disable it for a while and it goes down, and then I enable it again and it jumps up once more.

[Screenshot: memory usage while toggling native.enabled, 2017-06-07 17:57]
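
For reference, the malloc_arena_max experiment above was just capping glibc's per-thread malloc arenas via an environment variable, roughly as follows (the app name is a placeholder):

    # Limit the number of glibc malloc arenas; 2 is a common starting value for this experiment
    heroku config:set MALLOC_ARENA_MAX=2 --app my-jruby-service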

The app itself is a rather simple web app using puma/grape/mongoid/mongo and it seems to work fine with either setting for native.enabled.

I've been trying to debug and create a simpler test case for this but I'm coming up short:

  • I've tried reproducing this on my local machine and the JVM RSS memory usage ends up at about 1.2gb, but never much more than that
  • I've tried the latest Oracle JVM on heroku, same behavior
  • I've tried with JRuby master (which includes the fix to FFI memory leak regression #4625) on heroku, same behavior
  • I've tried jemalloc (roughly as sketched below) but didn't get a lot out of it (may also be my noobness in using it)
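
For completeness, the jemalloc attempt looked roughly like this; the library path is from a local Ubuntu install and the puma invocation is just a placeholder for however the app boots:

    # Preload jemalloc under the JVM and have it print allocator stats on exit
    export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1   # path may differ per distro/package
    export MALLOC_CONF=stats_print:true
    bundle exec puma -C config/puma.rb                             # placeholder for the actual start command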

Can you help me debug this further?

headius (Member) commented Jun 9, 2017

So you can't reproduce this anywhere but on Heroku?

If you don't see a difference on the same app locally, then I would suspect there's something peculiar about the environment there and native library support.

I might expect that native=true would use some additional memory, since it will have a larger pool of native resources waiting to be collected, but this seems like too much.

Next step would be to acquire some information from the larger instance, like a full heap dump and OS-level memory stats, so we can see if there's something visible in the heap taking up this additional memory. I'm not sure how to do that on Heroku, so pinging @hone and @jkutner.

ivoanjo (Contributor) commented Jun 10, 2017

> So you can't reproduce this anywhere but on Heroku?

When I try it on my local (Linux) machine the results are a bit more inconclusive. I usually see 100-200 MB more when using native.enabled=true, but I never see the temporary jump to 2 GB, nor the steady ~1.6 GB I get on Heroku.

> Next step would be to acquire some information from the larger instance, like a full heap dump and OS-level memory stats, so we can see if there's something visible in the heap taking up this additional memory. I'm not sure how to do that on Heroku, so pinging @hone and @jkutner.

I had previously opened a ticket with Heroku and @jkutner was helping out on that side, but we also didn't reach any conclusions, so he suggested I report what I saw here.

Since I can reproduce this at will on Heroku with a test workload in an isolated environment, I can share any dumps needed to help pin this issue down.

What is the best way to get a relevant heap dump for this issue? I'm assuming one taken with java debugging tools such as VisualVM will probably not be enough as this really does not seem to be related to the regular Java heap.

Similarly, which OS stats would be helpful? A regular snapshot of smaps from proc? Something else? (It's trivial to get these now that heroku allows ssh'ing into a dyno).
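
To make that concrete, getting a resident-memory snapshot from inside the dyno could look roughly like this (the process lookup is an assumption):

    # Inside a `heroku ps:exec` session on the dyno: sum resident memory from smaps
    PID=$(pidof java)   # assumes a single java process
    awk '/^Rss:/ { sum += $2 } END { printf "total RSS: %d kB\n", sum }' /proc/$PID/smaps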


As an extra datapoint, I actually believe this issue may have existed for a while now.

At Talkdesk we've had a JRuby app in production for about 9 months, and I had already attempted to battle JVM memory usage on our staging and QA environments, where memory seemed to randomly jump up and never come down, even though we were using every JVM tunable I knew of to tightly control memory.

But this happened more rarely, and since we run in production with larger dynos we were always able to make the JVM fit, and thus we never spent a lot of time looking into it.

For this new system the plan is to use several smaller dynos in production (2x dynos with 1gb of ram) and thus the issue became relevant again.

kares (Member) commented Jun 12, 2017

@ivoanjo I would generate dumps (on the dyno) with and without native.enabled, especially at peak usage, and compare them ...
assuming the difference shows up in heap memory usage - which is not clear, maybe it's all native memory.
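
For example (pid and file path are placeholders), a live heap dump taken at peak with each setting, then compare the two:

    # one dump with native.enabled=true, one with native.enabled=false
    jmap -dump:live,format=b,file=/tmp/heap-native-on.hprof <pid>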

ivoanjo (Contributor) commented Jun 13, 2017

Sounds good, I'll try to see if there's something unexpected on the heap. I'll be off for a few days, but I'll get back with the info so we can get to the bottom of this asap :)

ivoanjo (Contributor) commented Jul 7, 2017

Thanks @kares and @headius for your input. I've picked this investigation up again this week, and I made some breakthroughs.

As described above, I'm testing a puma/grape/mongoid/mongo web application with a load tester, fully loading JRuby and the test machine.

First, I finally was able to reproduce this on my machine, so it's not something heroku-specific.

Furthermore, this is clearly a heisenbug, and I think that's what threw me onto the "it's the native code!" path. I have now been able to observe it with the native code disabled; it only became rarer, so I think we can scratch that one off.

After being able to reproduce the issue locally, I tried to generate dumps and look at all the JVM memory segments I knew of, but no dice: the extra memory does not show up anywhere other than in the RSS as measured by my OS (Linux).

Then, I'm not sure how I made the jump, but I tried disabling invokedynamic. And boom, I was no longer able to reproduce it.

I wondered if getting extra pressure on invokedynamic would make it trigger more often, and so I tried running with -Xcompile.invokedynamic=true -Xcompile.mode=FORCE and indeed this makes it happen every time.
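
Concretely, the repro setup is roughly this (the puma line is just a placeholder for however the app is normally started); once the load test starts, the RSS jump shows up every time:

    # Force JIT compilation plus invokedynamic to maximize MethodHandle churn
    export JRUBY_OPTS="-Xcompile.invokedynamic=true -Xcompile.mode=FORCE"
    bundle exec puma -C config/puma.rb   # placeholder start command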

So my working theory is: perhaps this is an invokedynamic/MethodHandle race/bug/leak? I searched around and such issues do seem to happen from time to time. For instance, the current stable Oracle JDK release still suffers from http://bugs.java.com/view_bug.do?bug_id=8162795. I've tried the latest Java 8 beta (1.8.0_152-ea-b05), which no longer reproduces the testcase in that bug, but my issue still remains, so it's not that one (that would've been too easy, right?).

So the good news is that disabling invokedynamic, while a bummer, is an acceptable workaround for the affected apps at Talkdesk.

The bad news is that I wasn't very successful in trimming this down to a nice small testcase, and the evidence seems to suggest it only shows up visibly in a codebase of moderate size.

Any suggestions on how we can further debug this? 😄

ivoanjo (Contributor) commented Sep 28, 2017

I'm closing this issue for now: I've left Talkdesk, so I won't really be able to debug it or try any more steps, and I'd rather not leave it rotting on the bug tracker :)

Hopefully, with increased use of invokedynamic, either this gets fixed (perhaps it already is in JDK 9; I did not get a chance to test), or eventually someone bumps into it again in a more easily debuggable state.

ivoanjo closed this as completed Sep 28, 2017
headius (Member) commented Nov 8, 2017

@ivoanjo Thanks for closing the loop here. We've improved indy support, including some memory use and potential leaks, since you originally reported this. Hopefully we managed to find the right fixes, or this has healed on its own.

headius added this to the Invalid or Duplicate milestone Nov 8, 2017