Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blowing Java stack can leave ObjectProxyCache deadlocked #3857

Closed
jmiettinen opened this issue May 6, 2016 · 13 comments
Closed

Blowing Java stack can leave ObjectProxyCache deadlocked #3857

jmiettinen opened this issue May 6, 2016 · 13 comments

Comments

@jmiettinen
Copy link
Contributor

Environment

All JRuby versions in Github

Expected Behavior

Running out of Java stack does not deadlock JRuby-Java interop

Actual Behavior

Needing a proxied object when there's a deep call stack, locks for ObjectProxyCache segments can stay unreleased as calling the finally parts throw StackOverflowError.

This leads eventual deadlocking of application that uses proxied objects.

Using monitor-based locking does not have this problem.

@kares
Copy link
Member

kares commented May 6, 2016

thanks, would it be possible to have a reproduction test-case (as I was asking for on IRC already).

@headius
Copy link
Member

headius commented May 9, 2016

StackOverflowError is generally considered to be a fatal error, since it can do all sorts of nasty things to runtime state. In general we don't make any guarantees about the robustness of JRuby after a stack overflow.

However, seems in this case we could at least try to unlock the lock, since without that we might have other threads stuck forever.

@headius
Copy link
Member

headius commented May 9, 2016

Ahh, I see you mentioned that the SOE is raised in the finally. I'm not sure there's anything we can do about this. There are similar locks all over JRuby, and this is exactly why SOE is usually considered fatal.

@jmiettinen
Copy link
Contributor Author

I guess this is a choice between using ReentrantLocks and monitor-based locking. The latter are not suspectible to this, but the runtime might still be broken by other stuff not being run in finally-blocks.

@headius
Copy link
Member

headius commented May 10, 2016

Monitor-based locking is also likely heavier than the ReentrantLock implementation. However I'm not sure it matters as much now that we only use ObjectProxyCache for objects that actually need idempotence. We might not be hitting it hard enough to matter anymore.

I also wish the code in ObjectProxyCache was a bit more approachable. Switching this all to monitors would be an interesting job.

@jmiettinen
Copy link
Contributor Author

We've encountered this live, so this is not a totally theoretical problem.

However, in cases where we've blown the stack, there seems to be some odd things happening.
Is there a way a JRuby internals stack frame would consume more than ~200 bytes (say, 16 * 8 bytes for registers and then still 10 variables more)?

We've had stack blown with just ~70 Ruby frames which, based on this estimate should take around 70 kB (70 Ruby frames * 5 Java Frames / Ruby Frame * 200 bytes / Java Frame).

We have -Djruby.compile.invokedynamic=false so that should not play a role here.

Thus, I don't think this should happen in normal usage. We have something odd going on either in Rails or in how JRuby reports the stack in back trace.

@kares
Copy link
Member

kares commented May 12, 2016

sounds like this could use some detailed examination. knowing more about the problematic stack and in general about the app could help resolvingthe issue.

@headius
Copy link
Member

headius commented May 12, 2016

May I ask how you are deploying? JRuby's launcher at startup bumps the default JVM thread stack size up to 2MB. If you are deploying in an embedded scenario or launching JRuby without one of our launchers (e.g. java -jar ...), it will use the JVM's default (1MB on my system).

We bump it up because yes, JRuby does consume a fair bit of stack. We work to reduce this periodically, but having an interpreter means we'll always use more stack than other JVM languages.

The answer to this bug is most likely one of the following:

  • Some library is recursing too deeply (in error). Fix it.
  • JRuby itself is consuming more stack than the JVM provides. Bump up the JVM value.

I did ask around on Twitter and received pointers to OpenJDK9 JEP-270, which seeks to provide reserved stack space for operations like locking and unlocking, to help ensure they never blow the stack. However stack overflow is always going to be an issue, and as an asynchronous exception it's always going to be considered fatal.

We're happy to help you investigate the actual stack overflow. That's the problem we should be chasing here. Open an issue for that, please :-)

@headius headius closed this as completed May 12, 2016
@headius headius added this to the Won't Fix milestone May 12, 2016
@headius
Copy link
Member

headius commented May 12, 2016

I did think of one possible way we could improve this that's kinda silly and probably very fragile: make sure that lock and unlock deepen the stack exactly the same amount. That way, if we're at the end of the stack in the current method, lock will SOE and we'll never get to the deadlock.

I will look at that for a moment.

@headius headius reopened this May 12, 2016
@headius
Copy link
Member

headius commented May 12, 2016

Such a fix would be fragile because we don't control what happens at the JDK level, and they may not balance these methods properly at some point.

It's silly because it would work pretty well despite being fragile :-)

@headius
Copy link
Member

headius commented May 12, 2016

Yeah no dice I'm afraid. The best we can hope is that the JDK implementation of this logic is balanced (it appears to be, but there's many different paths) and that the JVM JITs them so they use balanced amounts of stack (which may be unrealistic to ever expect).

In any case I am back to thinking there's nothing we can do but look at the original SOE. Toss what you know in another issue and we'll look into it.

@headius headius closed this as completed May 12, 2016
@jmiettinen
Copy link
Contributor Author

Yeah, balancing the ReentrantLocks isn't something that's very doable, only monitors work there.
And yeah, we're deploying a Warbler package to Tomcat, and I just noticed that we have default stack size (1 MB) there.

But still, is my arithmetic totally off on the used stack as my estimates are one magnitude smaller (~70 KB) than what amount of memory should be available.

@headius
Copy link
Member

headius commented May 13, 2016 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants