Uncaught USR2 signal kills the whole JVM with SIGSEGV #5049

Closed
perlun opened this issue Feb 16, 2018 · 8 comments

Comments

@perlun
Contributor

perlun commented Feb 16, 2018

Environment

  • JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc): 9.1.15.0, no command line flags or JRUBY_OPTS needed to reproduce.
  • Operating system and platform (e.g. uname -a): macOS 17.4.0 and Linux 4.14.12.

Expected Behavior

Raising the USR2 signal with no USR2 signal handler defined should output a message, like on MRI.

Actual Behavior

It kills the whole JVM with a SIGSEGV:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f3d8583680e, pid=5306, tid=0x00007f3d86b14740
#
# JRE version: Java(TM) SE Runtime Environment (8.0_151-b12) (build 1.8.0_151-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x92680e]  SR_handler(int, siginfo*, ucontext*)+0x3e
#
# Core dump written. Default location: /home/travis/build/perlun/jruby-core-dump/core or core.5306
#
# An error report file with more information is saved as:
# /home/travis/build/perlun/jruby-core-dump/hs_err_pid5306.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
/home/travis/.travis/job_stages: line 57:  5306 Aborted                 (core dumped) bundle exec rspec foo.rb

I set up this repro repo which also documents my findings a bit: https://github.com/perlun/jruby-core-dump. Here is the failing Travis job: https://travis-ci.org/perlun/jruby-core-dump/jobs/342244129

(I didn't include the coredump in this case since it's so easily reproducible, but let me know if you want it and I can post it somewhere.)


The easiest way to trigger this is to run ruby -e "Process.kill('USR2', Process.pid)". On MRI 2.3.6 and 2.5.0 this works fine: it prints a harmless message saying "User defined signal 2: 31". On JRuby, on the other hand, it kills the whole process.
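For comparison, here is a minimal sketch (assuming MRI on a Unix-like system) that runs the same one-liner as a child process and reports how the child exited. As far as I can tell, MRI turns an untrapped USR2 into a SignalException and, when nothing rescues it, re-raises the signal on exit, so the child is terminated by USR2. The "User defined signal 2: 31" text then appears to be the shell (e.g. bash) reporting that termination, 31 being USR2's number on macOS. The helper name `child_termination` is hypothetical.

```ruby
require 'rbconfig'

# Hypothetical helper: run a Ruby one-liner in a child process using the
# current interpreter (RbConfig.ruby) and describe how the child exited.
def child_termination(script)
  pid = Process.spawn(RbConfig.ruby, '-e', script)
  _, status = Process.wait2(pid)
  if status.signaled?
    # Terminated by a signal: report its name, e.g. "USR2".
    "signal #{Signal.signame(status.termsig)}"
  else
    "exit #{status.exitstatus}"
  end
end

puts child_termination("Process.kill('USR2', Process.pid)")
```

On MRI this should print `signal USR2`. Running the same script under JRuby should make `RbConfig.ruby` point at the JRuby launcher, which would allow a side-by-side comparison.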

I tried handling the signal to see whether that made a difference (I know it normally works, since Puma uses USR2 and it works fine on JRuby), but my code below didn't behave as expected:

Signal.trap('USR2') do
  puts 'USR2 raised'
end

Process.kill('USR2', Process.pid)

The "USR2 raised" message was printed on MRI, but on JRuby nothing was printed. It didn't kill the process, though, so it seems to be the unhandled-signal case that triggers this bug (or difference in behavior).
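Signal handlers run asynchronously with respect to the code that sends the signal, so a script that exits right after `Process.kill` may not always give a trap handler the chance to run. Below is a minimal sketch, assuming MRI semantics, that waits for the handler using the self-pipe pattern with a timeout, so it reports "no handler ran" instead of hanging. The method name `usr2_handled?` is hypothetical, and this is not a claim about why JRuby printed nothing above.

```ruby
# Hypothetical helper: install a USR2 trap that writes to a pipe, send USR2
# to ourselves, and wait up to `timeout` seconds for the handler to run.
def usr2_handled?(timeout: 2.0)
  reader, writer = IO.pipe
  # Writing to a pipe is allowed inside a trap handler (taking a Mutex is not).
  previous = Signal.trap('USR2') { writer.write('.') }
  begin
    Process.kill('USR2', Process.pid)
    # IO.select returns nil if the pipe did not become readable in time.
    !IO.select([reader], nil, nil, timeout).nil?
  ensure
    # Put back the previous handler (MRI reports its built-in one as "DEFAULT").
    Signal.trap('USR2', previous || 'DEFAULT')
  end
ensure
  # Close the pipe only after the trap has been restored.
  reader&.close
  writer&.close
end

puts usr2_handled? ? 'USR2 raised' : 'no USR2 handler ran'
```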

@olleolleolle
Member

@perlun Maybe this wiki article adds detail to "USR2 is... taken" https://github.com/jruby/jruby/wiki/Signal-Handling#jvm-occupied-signals

@perlun
Contributor Author

perlun commented Feb 16, 2018

@olleolleolle

> @perlun Maybe this wiki article adds detail to "USR2 is... taken" https://github.com/jruby/jruby/wiki/Signal-Handling#jvm-occupied-signals

Yeah, I noticed that detail too. But on this JVM, catching USR2 typically works (in the Puma use case described above), so I don't think this is the root cause of the problem here.

@headius @enebo or others, any ideas?

@perlun
Contributor Author

perlun commented Feb 16, 2018

@olleolleolle mentioned the _JAVA_SR_SIGNUM setting to me privately; thank you for this. 👍 When I override it, the behavior changes:

$ _JAVA_SR_SIGNUM=30 ruby -e "Process.kill('USR2', Process.pid)"
User defined signal 2: 31

(Note: I don't know what a suitable setting for _JAVA_SR_SIGNUM would be. 30 is USR1 on macOS, so this effectively moves the JVM's suspend/resume handling to USR1. I don't know whether that works reliably or causes other issues.)
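Signal numbers are platform-specific: USR1 and USR2 are 30 and 31 on macOS but 10 and 12 on x86 Linux, so a hard-coded 30 only means USR1 on macOS. A minimal sketch that looks the number up for the current platform via `Signal.list` and builds an environment for a child process; `sr_signum_env` is a hypothetical name. HotSpot may reject or ignore values it does not consider usable, so it is worth checking the JVM's startup output when trying this.

```ruby
# Hypothetical helper: build an environment hash asking a HotSpot JVM to use
# the named signal (numbered for the current platform) for suspend/resume.
def sr_signum_env(signal_name)
  signum = Signal.list.fetch(signal_name) # KeyError if the name is unknown here
  { '_JAVA_SR_SIGNUM' => signum.to_s }
end

# e.g. "30" for USR1 on macOS, "10" on x86 Linux
p sr_signum_env('USR1')
```

The resulting hash could then be passed to a child process, for example `system(sr_signum_env('USR1'), 'jruby', '-e', "Process.kill('USR2', Process.pid)")`, assuming a `jruby` executable on PATH. MRI itself does not look at this variable.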

@headius
Member

headius commented Feb 16, 2018

Confirmed on macOS too. Looking at the crash dump, it does not look like JRuby is involved in the crash itself:

Stack: [0x00007ffeed6fd000,0x00007ffeedefd000],  sp=0x00007ffeedef9db0,  free space=8179k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x4884d4]
C  [libsystem_platform.dylib+0x1f5a]  _sigtramp+0x1a
C  [libobjc.A.dylib+0x85b1]  _objc_fetch_pthread_data+0x22
C  [CoreFoundation+0x86575]  __CFRunLoopServiceMachPort+0x155
C  [CoreFoundation+0x858c7]  __CFRunLoopRun+0x6f7
C  [CoreFoundation+0x84f43]  CFRunLoopRunSpecific+0x1e3
C  [java+0x6465]  CreateExecutionEnvironment+0x367
C  [java+0x218c]  JLI_Launch+0x7a0
C  [java+0x84c2]  main+0x65
C  [java+0x19e4]  start+0x34

And if I send the signal with kill from the command line, it appears to have the same effect.

And if I just launch a Java-only app, it also has the same effect.

$ cat Foo.java
public class Foo {
  public static void main(String[] args) {
    java.util.concurrent.locks.LockSupport.park();
  }
}

$ javac Foo.java

$ java Foo &
[1] 17256

$ kill -USR2 17256
#
# A fatal error has been detected by the Java Runtime Environment:
#
...
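The Java-only reproduction above can also be scripted. A minimal sketch, assuming a Unix-like system; `termination_after_signal` is a hypothetical helper, and the fixed delay is only a crude stand-in for "wait until the JVM has started".

```ruby
# Hypothetical helper: start `command`, send it `signal` after `delay`
# seconds, then wait for it and describe how it ended.
def termination_after_signal(command, signal, delay: 2.0)
  pid = Process.spawn(*command)
  sleep delay # crude: give the child (e.g. a JVM) time to start up
  Process.kill(signal, pid)
  _, status = Process.wait2(pid)
  if status.signaled?
    "signal #{Signal.signame(status.termsig)}"
  else
    "exit #{status.exitstatus}"
  end
end
```

With `Foo.class` compiled in the current directory and `java` on PATH, `termination_after_signal(%w[java Foo], 'USR2')` automates the `java Foo &` / `kill -USR2` steps. Judging by the "Aborted" and "Abort trap: 6" lines in this thread, a crashing JVM would presumably show up as `signal ABRT`.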

So I'm afraid you'll have to take this one up with JVM folks. Of course that kinda includes me.

Check out the OpenJDK bug tracker and see if anyone else has reported this. If not, I'll file an issue.

@headius headius closed this as completed Feb 16, 2018
@headius headius added this to the Invalid or Duplicate milestone Feb 17, 2018
@perlun
Contributor Author

perlun commented Feb 20, 2018

Thanks @headius. Unfortunately, the OpenJDK bug tracker doesn't accept reports from the public; I cannot enter a bug there without becoming a "project author" or similar. Since I'm not working on the JDK (at least not at the moment 😄), this is not easily doable for me.

I could file it via https://bugreport.java.com though.

Would you mind filing this via the OpenJDK bug tracker for me? You have already done some of the important research here so it shouldn't be so much work; we could probably just copy-paste the relevant parts from here.

@headius
Member

headius commented Feb 20, 2018

@perlun Yes, I know you can't file anything, but I thought you might poke around and see if there's any existing issue. I'm happy to file a proper report once we determine nobody else has done so.

@perlun
Contributor Author

perlun commented Feb 21, 2018

> Yes, I know you can't file anything, but I thought you might poke around and see if there's any existing issue.

Alright. I will check a bit more first, thanks. Will ping you again if needed here.

@perlun
Contributor Author

perlun commented May 14, 2018

@headius Long time no see, but I finally got around to cleaning out my email inbox and revisited this issue. I looked at https://bugs.openjdk.java.net/projects/JDK/issues/ and searched for USR2, but couldn't find any open issue about it.

I also revisited your Foo.java example and verified that the problem persists with Java 10.0.1+10:

$ java Foo &
[1] 90729
$ kill -USR2 90729
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000010c43621f, pid=90729, tid=775
#
# JRE version: Java(TM) SE Runtime Environment (10.0.1+10) (build 10.0.1+10)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0.1+10, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# V  [libjvm.dylib+0x63621f]  SR_handler(int, __siginfo*, __darwin_ucontext*)+0x2f
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/plundberg/tmp/hs_err_pid90729.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

[1]+  Abort trap: 6           java Foo

Please submit this to the OpenJDK bug tracker so it can be reported upstream. Thanks a lot in advance.
