Unexpected memory usage when running with native.enabled=true #4651
Comments
So you can't reproduce this anywhere but on Heroku? If you don't see a difference on the same app locally, then I would suspect there's something peculiar about the environment there and native library support. I might expect that native=true would use some additional memory, since it will have a larger pool of native resources waiting to be collected, but this seems like too much. Next step would be to acquire some information from the larger instance, like a full heap dump and OS-level memory stats, so we can see if there's something visible in the heap taking up this additional memory. I'm not sure how to do that on Heroku, so pinging @hone and @jkutner.
When I try doing it on my local (Linux) machine the results are a bit more inconclusive. I usually see 100-200mb more memory used when running with native.enabled=true.
I had previously opened a ticket with Heroku and @jkutner was helping out on that side, but we also didn't reach any conclusions, so he suggested I report what I saw here. Since I can reproduce this at will on Heroku with a test workload in an isolated environment, I can share any dumps needed to help pin this issue down.

What is the best way to get a relevant heap dump for this issue? I'm assuming one taken with Java debugging tools such as VisualVM will probably not be enough, as this really does not seem to be related to the regular Java heap. Similarly, which OS stats would be helpful? A regular snapshot of the process's memory usage?

As an extra datapoint, I actually believe this issue may have existed for a while now. At Talkdesk we've had a JRuby app in production for about 9 months, and I had already attempted to battle JVM memory usage on our staging and qa environments, where memory randomly seemed to jump up and never go down, even though we were using every JVM tunable I knew of to tightly control memory. But this happened more rarely, and since we run in production with larger dynos we were always able to make the JVM fit, so we never spent a lot of time looking into it. For this new system the plan is to use several smaller dynos in production (2x dynos with 1gb of ram), and thus the issue became relevant again.
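For reference, a minimal sketch of the kind of dumps and OS-level stats being discussed here, assuming standard JDK tools are available on a Linux host; `<pid>` stands in for the JRuby process:

```sh
# Full heap dump (hprof), to rule the regular Java heap in or out:
jmap -dump:live,format=b,file=heap.hprof <pid>

# OS-level memory for the same process: overall RSS plus a per-mapping
# breakdown, which catches memory the JVM-side tools don't report:
grep VmRSS /proc/<pid>/status
pmap -x <pid> | tail -n 3

# JVM-side accounting of native allocations (NativeMemoryTracking is
# already enabled in the JAVA_OPTS shown further down):
jcmd <pid> VM.native_memory summary
```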
@ivoanjo would generate dumps (on dyno) with/w-out native.enabled esp. at peak usage and compare ...
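One possible way to run that comparison, sketched with standard JDK/Linux tooling (an assumption on my part, not a prescribed procedure):

```sh
# Run the same load test twice, once with -Xnative.enabled=true and
# once with it false. In each run, baseline before the test starts...
jcmd <pid> VM.native_memory baseline
grep VmRSS /proc/<pid>/status

# ...then diff at peak usage. If RSS grows but the NMT diff stays flat,
# the extra memory comes from something the JVM does not track itself
# (for example malloc arenas or other native allocations).
jcmd <pid> VM.native_memory summary.diff
grep VmRSS /proc/<pid>/status
```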
Sounds good, I'll try to see if there's something unexpected on the heap. I'll be off for a few days, but I'll get back with the info so we can get to the bottom of this asap :)
Thanks @kares and @headius for your input. I've picked this investigation up again this week, and I made some breakthroughs. As described in the issue, I'm testing a rather simple puma/grape/mongoid web app.

First, I was finally able to reproduce this on my machine, so it's not something Heroku-specific. Furthermore, this is clearly a heisenbug, and I think that's what threw me onto the "it's the native code!" path. I have now been able to observe this with the native code disabled; it only became rarer, so I think we can scratch that one off.

After being able to reproduce the issue locally, I tried to generate dumps and look at all the JVM memory segments I knew of, but no dice: the extra memory does not show up anywhere other than in the memory (RSS) consumption as measured by my OS (Linux).

Then, I'm not sure how I made the jump, but I tried disabling invokedynamic, and boom, I was no longer able to reproduce this. I wondered if putting extra pressure on invokedynamic would make it trigger more often, so I experimented with that as well.

So my working theory is: perhaps this is an invokedynamic/MethodHandle race/bug/leak? I searched around and it does seem to happen from time to time. For instance, the current stable Oracle JDK release still suffers from http://bugs.java.com/view_bug.do?bug_id=8162795. I tried the latest Java 8 beta (1.8.0_152-ea-b05), which no longer suffers from the testcase in that bug, but my issue still remains, so it's not that one (that would've been too easy, right?).

The good news is that disabling invokedynamic, while a bummer, is an acceptable workaround for the affected apps at Talkdesk. The bad news is that I wasn't very successful in trimming this down to a nice small testcase, and the evidence seems to suggest only a moderately sized codebase is visibly affected by this.

Any suggestions on how we can further debug this? 😄
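For anyone landing here, the workaround amounts to inverting the invokedynamic flag shown in the Environment section below (a sketch; the app entry point is illustrative):

```sh
# Run without invokedynamic; the other options stay as before.
export JRUBY_OPTS="-Xcompile.invokedynamic=false -Xmanagement.enabled=true"

# Or per invocation (puma/config.ru as an example entry point):
jruby -Xcompile.invokedynamic=false -S puma config.ru
```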
I'm closing down this issue for now, as I've left Talkdesk and thus won't really be able to debug or try any more steps, so let's not leave it rotting on the bugtracker :) Hopefully with increased use of invokedynamic, either this gets fixed (perhaps in JDK 9 it already is --- I did not get a chance to test), or eventually someone bumps into it again in an easier state.
@ivoanjo Thanks for closing the loop here. We've improved indy support, including some memory use and potential leaks, since you originally reported this. Hopefully we managed to find the right fixes, or this has healed on its own.
Environment
jruby 9.1.10.0 (2.3.3) 2017-05-25 fffffff OpenJDK 64-Bit Server VM 25.121-b13 on 1.8.0_121-cedar14-b13 +indy +jit [linux-x86_64]
Linux d85df954-171a-4c14-ae8f-60cfaf0e9aec 3.13.0-112-generic #159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Expected behavior
I'm trying to configure one of Talkdesk's internal web services that uses JRuby to fit on a 1GB Heroku 2x dyno.
My current JAVA_OPTS are:

-Xss300k -Xms450m -Xmx450m -XX:+PrintCodeCache -XX:+CrashOnOutOfMemoryError -XX:ReservedCodeCacheSize=200m -XX:InitialCodeCacheSize=200m -XX:+PrintGCDetails -XX:NativeMemoryTracking=detail -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics -XX:MaxMetaspaceSize=120m -XX:CompressedClassSpaceSize=32m

and my JRUBY_OPTS:

-Xcompile.invokedynamic=true -Xmanagement.enabled=true
My expectation is that this would keep memory usage at or below roughly 1gb of ram.
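As a rough back-of-the-envelope check (my own numbers, assuming on the order of 50 threads and ignoring smaller overheads such as GC structures and NIO buffers), those flags add up to well under 1gb:

```sh
# -Xmx450m                         heap              450 MB
# -XX:ReservedCodeCacheSize=200m   code cache        200 MB
# -XX:MaxMetaspaceSize=120m        metaspace         120 MB
# -Xss300k x ~50 threads           thread stacks     ~15 MB
#                                  ----------------------
#                                  total            ~785 MB
```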
Actual behavior
The v253 label (in the memory graph) indicates a reboot of the system. A few minutes after a load test starts, system memory usage jumps abnormally, hitting more than 2gb of ram. After a few minutes it lowers to around 1.5gb of ram and stays there.
After trying to check everything I could think of (malloc_arena_max, heap dumps, native memory tracking, buffer pools, etc.) I ended up trying -Xnative.enabled=false, and the difference is noticeable -- in the graph below I toggle between enabled and disabled, and it's pretty visible: it starts disabled and stable at 1gb, then I enable it and memory jumps up, then I disable it for a while and it goes down, and then I enable it again and it jumps up once more.

The app itself is a rather simple web app using puma/grape/mongoid/mongo, and it seems to work fine with either setting for native.enabled.

I've been trying to debug and create a simpler test case for this, but I'm coming up short: I also tried jemalloc but didn't get a lot out of it (may also be my noobness in using it).

Can you help me debug this further?
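In case it helps others reproduce the allocator-side checks mentioned above, this is roughly what those two experiments look like (the jemalloc library path is an assumption and will differ per system):

```sh
# Cap the number of glibc malloc arenas, which can inflate RSS on
# heavily threaded JVMs:
export MALLOC_ARENA_MAX=2

# Alternatively, preload jemalloc and have it print allocator stats at
# exit, to see whether the extra RSS lives inside the native allocator:
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so   # path varies
export MALLOC_CONF=stats_print:true
```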