spawn+Process.wait2 sometimes raises ECHILD (on OSX) #3274

Closed
e2 opened this issue Aug 22, 2015 · 5 comments

@e2
e2 commented Aug 22, 2015

Example overview (from guard/guard-rspec#341):

  1. use spawn to run RSpec with parameters (returns pid)
  2. after a few seconds, command succeeds (output is present)
  3. use Process.wait2(pid) -> raises Errno::ECHILD
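The three steps above can be sketched roughly as follows. This is a minimal sketch, not the guard-rspec code: the child command is a hypothetical stand-in for the RSpec run, and the `outcome` variable is only there to make the result visible. On MRI (and reportedly on JRuby 9.0.0.0) it should take the `:ok` path; on affected JRuby 1.7 versions it may take the `:echild` path instead.

```ruby
require "rbconfig"

# Hypothetical stand-in for the RSpec invocation: any short-lived child will do.
command = [RbConfig.ruby, "-e", "puts '1 example, 0 failures'"]

pid = Process.spawn(*command) # step 1: spawn returns the child's pid
sleep 0.5                     # step 2: give the child time to finish

begin
  # step 3: collect the exit status of the child
  _, status = Process.wait2(pid)
  outcome = status.success? ? :ok : :failed
rescue Errno::ECHILD
  # Something else already waited for this pid, so there is nothing left to collect.
  outcome = :echild
end

puts "outcome: #{outcome}"
```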

Expected: shouldn't raise Errno::ECHILD

Actual:

  • output from RSpec (succeeds):
Finished in 0.126 seconds (files took 1.51 seconds to load)
1 example, 0 failures
  • Stack trace (given current JRuby master):
10:23:05 - ERROR - Guard::RSpec failed to achieve its <run_on_modifications>, exception was:
> [#] Errno::ECHILD: No child processes - No child processes
> [#] org/jruby/RubyProcess.java:536:in `waitpid'
> [#] org/jruby/RubyProcess.java:521:in `waitpid'
> [#] org/jruby/RubyProcess.java:716:in `waitpid2'
> [#] org/jruby/RubyProcess.java:724:in `waitpid2'
> [#] org/jruby/RubyProcess.java:771:in `wait2'
> [#] /Users/jclark/.rbenv/versions/jruby-1.7.22/lib/ruby/gems/shared/gems/guard-rspec-4.6.4/lib/guard/rspec/rspec_process.rb:39:in `_really_run'

JRuby versions:

  • jruby 1.7.20.1 (1.9.3p551) 2015-06-10 d7c8c27 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26 [darwin-x86_64]
  • current master (?)

Notes

@jasonrclark

This happened for me on both 1.7.20.1 and 1.7.22. (Running OS X 10.10.5 FWIW)

Fortunately, I don't see it with my repro case with JRuby 9.0.0.0. ✨

@headius
Member

headius commented Sep 4, 2015

This is a problem due to our using the JDK process APIs. All processes started via java.lang.ProcessBuilder or java.lang.Runtime work the following way (at least in OpenJDK-based impls):

  1. A native call is made to fork+exec the new process
  2. All streams are closed in the child. Stdio streams are connected to the other end of pipes from parent.
  3. The parent immediately starts a new thread that waits on the child pid.

It's this last step that interferes with use of wait* on JRuby 1.7. Most of these functions can be called exactly once by the parent process for a given child (the exception being calls with WNOHANG), and most of the time the JDK thread gets to it before we do...often before we even finish the spawn logic and return a pid. This causes your subsequent call to wait* to raise ECHILD.

As documented in the man page:

ERRORS
     The wait() system call will fail and return immediately if:

     [ECHILD]           The calling process has no existing unwaited-for child processes.

The "unwaited-for" bit is what I'm talking about here.
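To make the "waited for exactly once" behaviour concrete, here is a small sketch in plain Ruby. It does not use the JDK process APIs at all: the `reaper` thread is only a stand-in, under my own assumptions, for the JDK's waiter thread, and the child command is hypothetical. It shows a WNOHANG poll leaving the child untouched, one blocking wait consuming the status, and a second wait raising ECHILD.

```ruby
require "rbconfig"

# Child that stays alive briefly so the first poll sees it still running.
pid = Process.spawn(RbConfig.ruby, "-e", "sleep 0.5")

# With WNOHANG the call returns nil while the child is running and consumes nothing.
first_poll = Process.wait2(pid, Process::WNOHANG)

# A separate thread does the blocking wait, standing in for the JDK's waiter thread.
reaper = Thread.new { Process.wait2(pid) }
_, status = reaper.value

# The child has now been waited for, so a second wait has nothing left to find.
second_wait =
  begin
    Process.wait2(pid)
  rescue Errno::ECHILD => e
    e
  end

puts "first poll:  #{first_poll.inspect}"
puts "reaped:      exit status #{status.exitstatus}"
puts "second wait: #{second_wait.class}"
```

Whichever caller gets to the blocking wait first wins the exit status; everyone after that sees ECHILD.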

It works on 9k because 9k reimplemented process spawning entirely using native calls, so the pid you get back is not touched by any other threads. This also fixes many issues with inherited streams, redirected streams, and so on.

Unfortunately the 9k logic is extensive and we probably will not be backporting it into the 1.7 codebase, so I'm going to close this as fixed in 9.0.0.0.
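For callers that have to stay on 1.7, one possible way to cope is to treat ECHILD as "already reaped" rather than as a failure. This is only a sketch under that assumption: `wait2_or_nil` is a hypothetical helper name, the child command is a placeholder, and the exit status cannot be recovered once another waiter has taken it.

```ruby
require "rbconfig"

# Hypothetical helper: behaves like Process.wait2, but returns nil instead of
# raising when the child has already been waited for elsewhere.
def wait2_or_nil(pid)
  Process.wait2(pid)
rescue Errno::ECHILD
  nil # the exit status is gone at this point; the caller must decide what that means
end

pid = Process.spawn(RbConfig.ruby, "-e", "exit 0")

first  = wait2_or_nil(pid) # normally [pid, Process::Status]
second = wait2_or_nil(pid) # child already reaped, so nil

puts "first:  #{first.inspect}"
puts "second: #{second.inspect}"
```

The caller still has to decide whether a nil result counts as success, for example by checking the command's output as in the original report.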

@headius headius closed this as completed Sep 4, 2015
@headius
Member

headius commented Sep 4, 2015

FWIW, a blog post I wrote about this JDK problem a few years ago: http://blog.headius.com/2013/06/the-pain-of-broken-subprocess.html

@headius headius added the core label Sep 4, 2015
@headius headius added this to the JRuby 9.0.0.0 milestone Sep 4, 2015
@e2
Author

e2 commented Sep 4, 2015

Great material for a Halloween horror story, where ECHILD popping up out of
nowhere is still not as terrifying as the Java implementation behind it.

(Oversimplifying a POSIX layer will always force the rest of the world to
implement a truckload of ridiculous hacks on top of hacks at every corner -
I'm thrilled to know JRuby went native with this).

Answers a ton of questions - thanks for what you do.

@jasonrclark

Thanks again @headius for the detailed explanation! I learned a lot from it, and echo @e2 that I hugely appreciate what you do with JRuby! ✨
