-
-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OOM due to unbounded rescuePCs growth #4865
Comments
Wow this one is pretty odd. It does not happen without the second file??? I really expected this to be something where the next somehow is missing hitting a matching push to our pop but it looks like it works ok if called from a lexical parent but not if it is not. |
It doesn’t seem related to lexical scope, but to compiling. If I disable compilation, a second file isn’t needed:
|
@andrewdotn ah yeah we compile main file by default. Thanks for that update. |
I also see leaks with
and
I’ve been trying to make |
This seems like a basic VM-level issue in our IR interpreter. The logic as compiled into JVM bytecode works just fine, because the management of try/catch regions is handled properly by the JVM. Any branch out of the "try" area automatically leaves that area, while for us it must be done manually at each exit point. |
At first glance we create begin/end exc range which is great for building our CFG but then we try to leverage these as actual usable instrs in our startup interp, Their design is not really intended to pair up (e.g. start called then end called) as they are really for marking regions. This is really only a problem in these cases that are infinite. It is also mildly ironic that we do not compile methods like this because they never exit so it is sort of an unhappy combination. |
More notes. I tried pushing a new start exc only if it was different than top of stack (based on idea that lexically you could never raise to the same place twice) but this had issues with us prematurely popping too many elements. I cannot just ignore empty stack because I might have prematurely nuked a rescue ip we need. @subbuss suggested another hack would be for start/end to both record the label they pair with and make sure to only pop if the push label matches. |
Deleted my previous comment because I don't think that simplified logic is right ... but yes, this is a problem of retrofitting the startup interpreter on top of the IR meant for CFG construction. The hacky soln should work for now. Will let @enebo test it. If there is a cleaner non-hacky soln that we can think of, we can use it instead. |
The comment in this commit hopefully spells out how the solution works but one tl;dr is we cannot guarantee start and end exception region instructions will occur in the presence of branches or jumps. The workaround is to clean up the stack whenever we decide to add a new exception region. This had a small side-benefit of removing a hack we must have added when seeing a jump-like instr just doing a pop and hoping for the best.
@andrewdotn since you were digging in last week I take it you can test this and give it the gold seal of approval :) Looking at this problem in retrospect it seems obvious our solution was pretty lacking but we would only be able to observe this in an endless loop function which was not the main program file (which we compile by default). Even a huge finite loop I bet took a long time to OOM. |
@andrewdotn Can you provide a test or spec that exercises this? Probably as part of either spec/compiler/general_spec.rb or something in test/jruby. |
I don't think this is amenable to testing. Even with 32m heap it takes like 40 seconds to OOME |
@enebo We can write a test that checks the stack size, though. Maybe it should be in Java. |
@headius it is a local variable. |
@enebo Ahh I didn't notice it was not part of the interpreter's instance state. I tweaked the given script a bit to reduce allocation and eliminate backtrace generation, and it only takes about 5s to OOM with args = [Exception, "foo", []]
while true
begin
raise *args
rescue java::lang::OutOfMemoryError
exit 1
rescue Exception
next
end
end It's still a bit clunky for a test. |
Wow, thanks for the amazing turnaround time on this! I can confirm that your fix works. When I was debugging this, I added a check inside If I was adding a test for my own code here, and I wasn’t worried about performance, I’d add a |
…understand in original fix is that GEB and exception region for exceptions raised in ensures is that the all push the same label and nest. The fix is simple we now capture the instr itself since it is unique and continue using the pruning technique. The original solutions only mistake was not realizing we would nest regions to the same destination.
…understand in original fix is that GEB and exception region for exceptions raised in ensures is that the all push the same label and nest. The fix is simple we now capture the instr itself since it is unique and continue using the pruning technique. The original solutions only mistake was not realizing we would nest regions to the same destination.
Ok so I found another issue but my confidence is quite a bit higher now. The reason I see no value in testing the stack size for unbounded growth is the algorithm on insertion will look to see if it is already on the stack and if so potentially reduce the size of the stack (but only grow if not present). So there should never be unbounded growth from that alone. However, we only bother to add these stack elements when we traverse a lexical section of code. I am super confident we will never be more than n lexical nestings of stack size. |
Environment
Provide at least:
JRuby version (
jruby -v
) and command line (flags, JRUBY_OPTS, etc)Operating system and platform (e.g.
uname -a
)Observed on both CentOS 7 and Mac OS 10.13.1
Actual Behavior
The following short program, extracted from a much larger application, runs out of memory due to unbounded growth in
StartupInterpreterEngine.rescuePCs
.The text was updated successfully, but these errors were encountered: