Segfault in jnr-ffi finalizer #4506
Comments
I have not found a solution for this, but I have a hunch. I believe the jffi library is getting unloaded by classloader finalization before all the memory allocated by it has been cleaned up. Evidence based on my local dump:
I suspect that the jffi library is getting unloaded by the classloader before the finalizers run to clear up data it has allocated, like the CallContext in my case and potentially any number of other memory addresses since finalization can happen in any order. I've seen a couple different Java frames before the faulting frame:
And we're operating under a situation where jffi could easily be getting unloaded before all finalizers have run, since there's no way to guarantee finalization ordering. I've pinged a JVM friend to see if I'm on the right track with this. If I am, we have a bit of a conundrum:
I'm researching. Any thoughts you might have are welcome. |
My JVM friend says my theory is "highly plausible". And I think I have proof now. I modified your example to extend URLClassLoader and log when it gets finalized, along with logging when these jffi methods attempt to free resources. The last three lines before the segfault:
So it does indeed seem like the jffi structures are attempting to free, via JNI functions, after those JNI functions have been unloaded. |
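The experiment described above can be approximated with a small class loader subclass. This is only a sketch: the class name `LoggingClassLoader` and the log format are made up here, and the thread does not show the code that was actually used.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper: a URLClassLoader that reports when it is finalized,
// so the message can be compared against the log lines from native free calls.
class LoggingClassLoader extends URLClassLoader {
    LoggingClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    // finalize() is deprecated on newer JDKs, but it is the hook this
    // kind of ordering experiment relies on.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            System.err.println("[finalize] class loader "
                    + System.identityHashCode(this) + " finalized");
        } finally {
            super.finalize();
        }
    }
}
```

Loading the application through such a loader, dropping every reference to it, and forcing a few garbage collections should produce the finalize message at some point. If a native free call is logged after that message, the ordering matches the theory in this thread. Finalization timing is not deterministic, so several runs may be needed.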
Ok, so I think there is some silver lining here. Your particular example is an extreme case, in which JRuby and all its dependencies are getting unloaded, including jffi. If you were to have JRuby and friends in a higher-level classloader, so they don't unload, this would not be a problem. Typically if people are embedding JRuby, they don't also isolate it in its own classloader.
That's the key issue here: by allowing JRuby to completely unload between runs, you are also forcing the JNI library to get unloaded while there are still resources from jffi remaining to be finalized. The latter problem is a tricky one to solve, since there are very few hooks we can use to know that the JNI library has been unloaded. Even JNI_OnUnload is problematic, since we'd need to track all those native resources in the C library, and we'd need to do a release of jffi that rebuilds all our jffi backends across the dozen+ platforms we support.
This is still a serious problem, however, since a WAR-file deployment should also be unloading jffi and causing the native library to unmap, as in your example (which is the best reproduction of this complex problem I've seen, so kudos to you). I'm going to have to pow-wow with @enebo and weigh some options.
If feasible, a solid workaround for you would be to disable native library support in JRuby via the JVM property jruby.native.enabled=false, which will avoid loading (or at least using) the native backend of jffi, avoiding the chicken/egg problem we have here. Actually solving the problem is going to require some creativity. |
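For embedders, the property mentioned above would normally be passed on the command line as `-Djruby.native.enabled=false`. A minimal sketch of setting it programmatically follows. It assumes the property is read when JRuby's classes initialize, so it has to be set before any JRuby class is touched. The class name `DisableJRubyNative` is hypothetical.

```java
// Hypothetical launcher helper; only the property name comes from this thread.
class DisableJRubyNative {
    static final String NATIVE_ENABLED = "jruby.native.enabled";

    // Assumption: must be called before the first JRuby class is loaded,
    // otherwise the setting may already have been read.
    static void disableNative() {
        System.setProperty(NATIVE_ENABLED, "false");
    }

    public static void main(String[] args) {
        disableNative();
        // Create the embedded JRuby runtime only after this point (not shown).
        System.out.println(NATIVE_ENABLED + "=" + System.getProperty(NATIVE_ENABLED));
    }
}
```

The command-line form is the safer choice when it is available, since it does not depend on class initialization order.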
This is not new and not easy to fix. It won't be in 9.1.8.0. |
According to the JLS, the Java language imposes no ordering on the execution of finalize methods, so the actual order is an implementation detail. |
@zhanhb Yes...unfortunately I have found no way to work around this problem. |
Register the finalizer with the native code, to run when the native lib gets finalized in method |
@zhanhb In order to do that we'd need to track in the jffi native code all memory allocated via any function so that it can be freed during unload rather than in a finalizer. The root issue here is that we use some native functions to free allocated memory, we call those functions in a finalizer, and the finalizer might run after the JNI library has already been unloaded. This leads us to try to call a JNI function that's no longer there. |
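The bookkeeping described above can be modeled in a few lines. The comment is about doing this tracking inside jffi's native code, so the Java class below is only an illustration of the idea. `NativeAllocationRegistry` and all of its methods are hypothetical, and `nativeFree` stands in for a real JNI call.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: every allocation is tracked, unload frees whatever is
// still live, and late finalizers become no-ops instead of calling into
// native code that is no longer mapped.
class NativeAllocationRegistry {
    private final Set<Long> live = new HashSet<>();
    private boolean unloaded = false;
    private int freeCalls = 0;

    synchronized void track(long address) {
        if (unloaded) throw new IllegalStateException("library already unloaded");
        live.add(address);
    }

    // Called from a finalizer. Returns false when there is nothing to do,
    // including the case where the library is already gone.
    synchronized boolean release(long address) {
        if (unloaded || !live.remove(address)) return false;
        nativeFree(address);
        return true;
    }

    // Called once when the library is about to unload; frees everything left.
    synchronized int unloadAll() {
        int remaining = live.size();
        for (long address : live) nativeFree(address);
        live.clear();
        unloaded = true;
        return remaining;
    }

    // Stand-in for the real native free; it only counts invocations.
    private void nativeFree(long address) { freeCalls++; }

    synchronized int freeCalls() { return freeCalls; }
}
```

The key property is that `release` checks the `unloaded` flag under the same lock that `unloadAll` holds, so a finalizer that runs late can never reach the native free.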
@zhanhb Very nice! Perhaps we can merge our efforts? We'd be happy to cooperate with you. Meanwhile I will look at utilizing your change in jnr-ffi. |
What you have seems feasible; however, there are a few improvements I might make. I do hope we can work together to bring your changes into jnr-ffi!
Otherwise, I thank you for coming up with an elegant solution! Please let me know how we can work together to improve jnr-ffi! |
This likely applies to any other Runnable de-allocators you are registering. |
We've also hit this issue several times per week. Here's one of the heap dumps:
Is it possible to run |
It looks like this issue popped up in the Jython tracker as well: http://bugs.jython.org/issue2701 @zhanhb I wanted to look at your links posted in #4506 (comment) but they are broken (404). I was able to kind of repair them, at least to find the files. If someone else wants to have a look, here are the restored links: |
Link updated, it's ok now. |
We need help here, like a PR for jffi or jnr-ffi :-( Going to have to bump. |
There's no issue with keeping a strong reference to the cleaner. I recently rewrote the onUnload methods; maybe this code can be brought into the jffi project. |
I'm having a possibly similar issue in a JRuby project running on Java 11. After upgrading to Java 11 I am getting SIGSEGV in several different native I/O routines during JRuby startup ~90% of the time, so it takes 10x+ as long to deploy the app.
Here are some of the locations of the crash:
In each case it kind of looks like libc was loaded at the address in question, but somehow unloaded before the crashing method was called. I tried putting the latest versions of jnr-ffi, jffi, jnr-posix, and jnr-enxio in the classpath with no improvement. I opened this issue on jnr-ffi: jnr/jnr-ffi#194
Some crash logs are here: https://gist.github.com/mike-bourgeous/c24c225d86eb26db629e8fb09f57d6a2
Workaround
My final workaround was passing
Some oddities:
|
@mike-bourgeous I suspect you are right in thinking your issue is linked to this one. Unfortunately I'm not sure we have a preferred solution for this yet. The problem is core to how the JVM does classloading and native libraries. I believe another workaround is to make sure jffi is in a higher-up classloader, so it doesn't go away and unload the library...but I have not confirmed that. |
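The classloader layout suggested above might look roughly like the sketch below. It is unconfirmed in this thread, and the class and method names are hypothetical. It relies on the JVM unloading a JNI library only when the class loader that loaded it is collected, so a long-lived parent loader keeps the library mapped while short-lived child loaders come and go.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical layout: jffi lives in a shared, long-lived loader, and each
// application run gets its own child loader that can be discarded safely.
class SharedNativeLoaderLayout {
    // Created once and kept strongly reachable for the life of the process.
    static URLClassLoader newSharedLoader(URL[] nativeBridgeJars) {
        return new URLClassLoader(nativeBridgeJars, ClassLoader.getSystemClassLoader());
    }

    // Created per run; classes from the shared loader resolve through the
    // parent, so discarding this loader does not unload the native library.
    static URLClassLoader newAppLoader(URL[] appJars, ClassLoader shared) {
        return new URLClassLoader(appJars, shared);
    }
}
```

With the default parent-first delegation, the jars handed to the shared loader must not also appear in the per-run loader's classpath in a way that bypasses the parent, or the library could end up loaded by the short-lived loader after all.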
This is copied from #4312 where @shirosaki discovered jnr-ffi crashes in the finalizer for its generated assembly code stubs. Repro and output follow.
hs_err_pid11608.log contains: