-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New generation exhausted rapidly #4535
Comments
Wow, this is very peculiar! I would recommend hooking up a sampling profiler. You can do this with IntelliJ, and I believe also with NetBeans or VisualVM. It should not be too invasive but will give an idea of what's causing all this allocation. One possibility is that you've got some code somewhere that's forcing GC to run, so you might audit for that. Another is that some code somewhere is allocating huge objects, like very large arrays, which are consuming all of eden space and forcing another GC. Given the size of your new generation, we'd have to be doing a massive amount of allocation for force young GC to run this often. |
We do have some logic that can result in the allocation of large arrays of small objects. However, those did not appear to the cause of the issue. We log stuff when those run and I couldn't find any indication of them running. One possible idea I had was moving the new generation in the opposite direction, and manually setting the tenuring threshold to 1. Making the new generation smaller would result in the ParNew GC being faster. Setting the tenuring threshold down would result in a higher promotion rate into the Old Gen. Once in the old gen the concurrent garbage collectors can deal with stuff. I've measured garbage generation on both 1.7.22 & 9.1.8.0. Our application generates significantly more garbage on 1.7 in terms of megabytes per second. So in theory we should have much better performance on JRuby 9.1.8.0 if GC was our bottleneck. It really isn't when we run on 1.7.22 however. It's also really not until JRuby 9.1.8.0 goes into whatever crazy state that causes the eden space to exhaust so quickly. One observation is that I am seeing the class loader get hammered on each request to the Ruby on Rails app. This seems strange, especially since the classes are all https://gist.github.com/hydrogen18/b73ff5d79669805ed7feca595cd5877c Also I see lots of JIT activity as our application warms up. Suprisingly some gems go from ruby code -> JRuby IR -> Java Bytecode -> JIT pipeline -> native code just during the startup process. We're also JIT'ing endless variants of I used visualvm to profile our memory allocation. It seems to be dominated by what I would expect, |
Looking at our logs some more, it is possible that some of our cache population logic does align with the issues we saw. However, what I can't figure out is why this is well behaved on JRuby 1.7 but so problematic on JRuby 9.1.8.0 |
I was dreaming up some elaborate benchmarking schemes for this, but realized I can just run the code with GC logging enabled in both versions and compare. The code in question takes the an SQL query that returns a single column. The rows get put into a list in Redis so they can be paginated quickly. Obviously when we do this to ~100 rows its a non issue, but at ~1 million rows its a case where neither Rails nor Ruby excels. In any case this hasn't been problematic in 1.7. So in both JRuby versions I started our app, then called java.lang.system.gc, then flush our cache, then run the cache population logic. I visualized the GC log from doing this with both 1.7.22 & 9.1.8.0. As I previously noted, JRuby 9k seems to have much better performance from a memory standpoint. So I am leaning towards my GC issue being cause by buggy code, not by overaggressive allocation of memory. I will have to try and reproduce the problem with NetBeans attached. My one question is, is it possible that the JIT engine of JRuby could produce code that has memory performance that is inferior to the interpreted Ruby code? This could explain a sudden performance degradation in our production environment, that I can't measure in my development environment. |
I have been able to collect some additional interesting observations on this. We have numerous code paths that allocate many objects(100000s) on a request thread. The average request thread completes in 100ms. So typical object lifetime is something like 100ms. There are long lived objects that are part of the framework but I am assuming that those move to the old gen and stay there. In JRuby 1.7 we sometimes have those requests allocating 100000+ objects. This can cause a spike in the number of GC events and a spike in the GC pause. However, the application always recovers. Sometimes it can take 5 minutes, but it always recovers. In JRuby 9000 we see the same requests, the spike in GC frequency, and the spike in GC pause time. However, JRuby never really recovers until we restart. This is what bugs me, I can't explain why JRuby 9k acts differently than 1.7. |
@hydrogen18 I am seeing something similar when going from jruby 9.1.5.0 to jruby 9.1.8.0+. Do you see the same issue in jruby 9.1.7.0 or below? Just curious if it is something that was changed recently. |
Hello, as I've mentioned before we are trying to get our Ruby on Rails 4.1 app running on JRuby 9000.
When we initially ran on JRuby 9000 we found we had problems with new generation exhaustion. I set the new generation to 3000 megabytes with a survivor ratio of 5. This helped tremendously. Next I noticed promotion failures in the GC log, so I set the initiating occupancy fraction to 70%. This seems to have solved the problem with promotion failures.
The issue now is somehow we go from our new gen humming along nice with the ParNew GC cleaning it out almost entirely, to the ParNew GC running like mad. It still frees memory, but somehow the eden space seems to be exhausted again almost instantly. This does not happen on JRuby 1.7. Here is an example of normal running
Here we have seconds between allocation failures. The actual pause time is very low as well, so this indicates acceptable performance.
Here is what happens when it goes south:
Here we are seeing allocation failure and ParNew GC run multiple times per second. Performance degrades rapidly. Once it is in this state it never goes back to normal. Interestingly during this time period the old gen doesn't start filling up. So it is apparently able to actually free memory, somehow it is just getting exhausted almost instantly.
This doesn't happen on JRuby 1.7, but I don't know if this is a problem we've created in our app code or not. Is there a way I can profile object allocation while running? I would like to know what classes are being allocated and what sizes of objects are being allocated by JRuby. I can run a custom patched version of JRuby if needed. Any other suggestions for troubleshooting this would be appreciated.
Here is the gc log file: https://gist.github.com/hydrogen18/0ecf8aa94d7d5d73b25ff5a911d3466e
Environment
JRuby 9.1.8.0
JRUBY_OPTS:
OS
The text was updated successfully, but these errors were encountered: