Blowing Java stack can leave ObjectProxyCache deadlocked #3857
Comments
Thanks, would it be possible to have a reproduction test case (as I already asked for on IRC)? |
StackOverflowError is generally considered to be a fatal error, since it can do all sorts of nasty things to runtime state. In general we don't make any guarantees about the robustness of JRuby after a stack overflow. However, it seems that in this case we could at least try to unlock the lock, since without that we might have other threads stuck forever. |
Ahh, I see you mentioned that the SOE is raised in the finally. I'm not sure there's anything we can do about this. There are similar locks all over JRuby, and this is exactly why SOE is usually considered fatal. |
I guess this is a choice between using |
Monitor-based locking is also likely heavier than the ReentrantLock implementation. However I'm not sure it matters as much now that we only use ObjectProxyCache for objects that actually need idempotence. We might not be hitting it hard enough to matter anymore. I also wish the code in ObjectProxyCache was a bit more approachable. Switching this all to monitors would be an interesting job. |
We've encountered this live, so this is not a totally theoretical problem. However, in cases where we've blown the stack, there seem to be some odd things happening. We've had the stack blown with just ~70 Ruby frames which, based on this estimate should take around 70 kB ( We have Thus, I don't think this should happen in normal usage. We have something odd going on either in Rails or in how JRuby reports the stack in the backtrace. |
Sounds like this could use some detailed examination. Knowing more about the problematic stack, and about the app in general, could help resolve the issue. |
May I ask how you are deploying? JRuby's launcher at startup bumps the default JVM thread stack size up to 2MB. If you are deploying in an embedded scenario or launching JRuby without one of our launchers (e.g. We bump it up because yes, JRuby does consume a fair bit of stack. We work to reduce this periodically, but having an interpreter means we'll always use more stack than other JVM languages. The answer to this bug is most likely one of the following:
I did ask around on Twitter and received pointers to OpenJDK9 JEP-270, which seeks to provide reserved stack space for operations like locking and unlocking, to help ensure they never blow the stack. However stack overflow is always going to be an issue, and as an asynchronous exception it's always going to be considered fatal. We're happy to help you investigate the actual stack overflow. That's the problem we should be chasing here. Open an issue for that, please :-) |
I did think of one possible way we could improve this that's kinda silly and probably very fragile: make sure that I will look at that for a moment. |
Such a fix would be fragile because we don't control what happens at the JDK level, and they may not balance these methods properly at some point. It's silly because it would work pretty well despite being fragile :-) |
Yeah, no dice I'm afraid. The best we can hope for is that the JDK implementation of this logic is balanced (it appears to be, but there are many different paths) and that the JVM JITs them so they use balanced amounts of stack (which may be unrealistic to ever expect). In any case I am back to thinking there's nothing we can do but look at the original SOE. Toss what you know in another issue and we'll look into it. |
Yeah, balancing the But still, is my arithmetic on the used stack totally off? My estimate (~70 KB) is an order of magnitude smaller than the amount of memory that should be available. |
Your math isn't wrong, but the Ruby stack trace you see is likely
misleading you.
Normally we don't try to rethrow Java's StackOverflowError as a Ruby
SystemStackError because it requires catching SOE all over the place. There
are, I believe, a few places that still do it. What is likely happening is
that you're getting an SOE fairly deep into Ruby, but it doesn't get
reported until much further up the stack.
Feel free to add more info about your env. Without getting a full Java
trace from the SOE it is hard to know how deep it actually was getting. I
can say that I was able to recurse a single-variable Ruby method about 1000
times with 2MB stack, and with a 5MB stack it shoots up to 10k recursion
(probably because more JIT starts kicking in).
|
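For anyone wanting to repeat that kind of measurement at the plain Java level, here is a minimal sketch. It is not JRuby code and it counts Java frames, not Ruby frames, so the numbers will not match the Ruby recursion counts above. The class name `StackDepthProbe` and its methods are made up for illustration. It relies on the `Thread` constructor that takes a stack size, which the JVM is allowed to treat as a hint only.

```java
public class StackDepthProbe {
    // Shared counter; fine here because probes run one at a time.
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses on a fresh thread created with the requested stack size and
    // returns how many frames were entered before StackOverflowError.
    static int maxDepth(long stackBytes) {
        final int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                // The stack has unwound to this frame, so there is room again.
                result[0] = depth;
            }
        }, "stack-probe", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("2 MB stack: " + maxDepth(2L * 1024 * 1024) + " frames");
        System.out.println("5 MB stack: " + maxDepth(5L * 1024 * 1024) + " frames");
    }
}
```

The same constructor is one way to give JRuby more stack in an embedded deployment where the JRuby launcher is not setting the thread stack size for you.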
Environment
All JRuby versions on GitHub
Expected Behavior
Running out of Java stack does not deadlock JRuby-Java interop
Actual Behavior
When a proxied object is needed while the call stack is deep, the locks for ObjectProxyCache segments can stay unreleased, because running the finally parts throws StackOverflowError. This eventually deadlocks an application that uses proxied objects.
Using monitor-based locking does not have this problem.
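The two locking styles being compared can be sketched roughly as below. This is an illustration only, not the actual ObjectProxyCache code; the class `LockStyles` and its methods are hypothetical names. The point is that with an explicit lock the release is an ordinary method call in a finally block, which itself needs stack space, while a synchronized block is released by the JVM when the block exits.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockStyles {
    private final ReentrantLock lock = new ReentrantLock();
    private final Object monitor = new Object();
    private int counter = 0;

    // Explicit lock: unlock() is a regular Java call. If the stack is already
    // exhausted, that call can itself throw StackOverflowError, and the lock
    // then stays held by this thread.
    int incrementWithLock() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }

    // Monitor: the JVM releases the monitor when the block exits, whether
    // normally or by exception, without a further Java-level call.
    int incrementWithMonitor() {
        synchronized (monitor) {
            return ++counter;
        }
    }

    // True while any thread holds the explicit lock.
    boolean isLockHeld() {
        return lock.isLocked();
    }

    public static void main(String[] args) {
        LockStyles s = new LockStyles();
        System.out.println(s.incrementWithLock());
        System.out.println(s.incrementWithMonitor());
        System.out.println("lock still held: " + s.isLockHeld());
    }
}
```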