Fix: fork and signal child handlers #6426

ysbaddaden · 2018-07-22T13:09:33Z

Fixes an issue where a parent process could receive signals directed to the child or vice-versa.

RX14 · 2018-07-22T17:25:02Z

Thanks for doing the debugging! It really shows how dangerous fork is with these subtle ordering and error handling issues.

forksaber · 2018-07-23T11:16:16Z

I am not sure if this should be handled as part of this PR, but I think there still exists a subtle race condition in the signal handler implementation. Consider the following order of events:

Parent  : fork()
Child   : spawns
Child   : receives a signal
Child   : writes the signal to parent's pipe
Child   : Signal.after_fork is called
Parent  : processes the signal meant for child (**BUG**)

I haven't actually tried writing down a fix but something like this should work:

#Pseudo code

# block all the signals for the current thread
pthread_sigmask(block all signals) 
pid =LibC.fork
if pid > 0
  #Parent
  pthread_sigmask(unblock all signals)
else
 #child
  if run_hooks       # (Process.fork)
    Signal.after_fork
    pthread_sigmask(unblock all signals)
    run_remaining_hooks
  else                    #(Process.exec)
    # reset all signal handlers  to SIG_DFL or  call Signal.after_fork  
    (signals.each { |signal| signal.reset } ) OR (Signal.after_fork) 
    pthread_sigmask(unblock all signals)
    execvp
  end
end

References:
https://www.redhat.com/archives/libvir-list/2008-August/msg00303.html
https://github.com/libvirt/libvirt/blob/e83da1990cfb75ae9a83559b6b702bb9fc4bc61e/src/util/vircommand.c#L297

ysbaddaden · 2018-07-24T17:35:46Z

You're right again!

I added a commit to disable signals then reenable them after fork (parent and child). Now we should be fine 🤞

I chose to continue to not reset signal handlers in the child process. Either developers don't care about signals and we'll keep the default handlers (i.e. SIGSEGV, SIGCHLD) or she cares about them and she's responsible for resetting them if needed.

forksaber · 2018-07-25T02:26:24Z

Thank you @ysbaddaden for working on this PR. Everything is fine except that the race condition still exists in fork+exec scenario as signals are not blocked before forking when run_hooks is false. So the following changes should make this patch perfect:

module Crystal::Signal
  def self.clear_signal_handlers
    @@handlers.keys.each { |sig| sig.reset }
  end
end

-if run_hooks
-  LibC.sigfillset(pointerof(newmask))
-  ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
-  raise Errno.new("pthread_sigmask") unless ret == 0
-end
+LibC.sigfillset(pointerof(newmask))
+ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
+raise Errno.new("pthread_sigmask") unless ret == 0
 
 pid = LibC.fork
 errno = Errno.value
 
 case pid
 when 0
   # child:
   pid = nil
   if run_hooks
     Process.after_fork_child_callbacks.each(&.call)
-    LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
+  else
+    # exec scenario
+    Crystal::Signal.clear_signal_handlers
   end
+  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
 when -1
   # error:
-  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) if run_hooks
+  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
   raise Errno.new("fork", errno)
 else
   # parent:
-  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) if run_hooks
+  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
 end

Someone experienced a similar in go:
golang/go@df0892c

PS: Ideally, the method clear_signal_handlers should be implemented using an array of atomic flags for registered handlers as when multiple threads are present the @@handlers hash could be in an inconsistent state and might cause segfaults. But that can be tackled in a later PR.

# Psuedo code for thread-safe  clear_signal_handlers
module Crystal::Signal
  @@handled = StaticArray(AtomicFlag, 32).new(AtomicFlag.new(0_u8))
  
  def self.trap(signal, handler) : Nil
    @@handled[signal].set(1_u8)
    previous_def
  end
       
  def self.clear_signal_handlers
    @@handled.each_with_index do |flag, signal|
      next if flag.get == 0_u8
      signal.reset
     end
   end
end

bararchy · 2018-07-26T13:09:55Z

@RX14 or @jhass
Two approves here, LGTM?

Sija · 2018-07-26T14:38:12Z

@bararchy Hold your horses, let's wait for conclusions regarding newest @forksaber's comment.

bararchy · 2018-07-26T15:01:40Z

@Sija sure, I was under the impression this has already been reviewed.

Sija · 2018-07-26T15:05:29Z

Well, it was. Before that comment ;)

ysbaddaden · 2018-07-27T08:27:30Z

Yes, @forksaber is right again! I'm fixing this.

ysbaddaden · 2018-07-27T09:30:45Z

src/process.cr

+      if run_hooks
+        Process.after_fork_child_callbacks.each(&.call)
+        LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
+      end


In chose to not reenable signals in a fork/exec situation. I just leave the handlers and mask in place, and let execve(2) override that for the new process.

I think if you don't reenable the signals, they will remain blocked across the exec. So, it is necessary to reenable the signals and reset them.

The new process shall inherit at least the following attributes from the calling process image: Nice value (see nice()) semadj values (see semop()) Process ID Parent process ID Process group ID Session membership Real user ID Real group ID Supplementary group IDs Time left until an alarm clock signal (see alarm()) Current working directory Root directory File mode creation mask (see umask()) File size limit (see ulimit()) Process signal mask (see sigprocmask()) Pending signal (see sigpending()) tms_utime, tms_stime, tms_cutime, and tms_cstime (see times()) Resource limits Controlling terminal Interval timers

Reference:
http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html

ysbaddaden · 2018-07-27T19:45:56Z

Sigh... Okay.

forksaber · 2018-07-28T10:05:47Z

Sorry for being such an annoying person/reviewer :(
I can't help but point out that the race condition(fork/exec scenario) has shifted from

# run_hooks: false
pid = LibC.fork
case pid
when 0
  # child
  # RACE here
  exec
end

to

# run_hooks: false
block_signals
pid = LibC.fork
case pid
when 0
  # child
  unblock_signals
  # RACE here : since parent's pipe exists, any signal 
  # delivered here will end up in parent process
  exec
end

A crucial step is missing to fix this race condition is to reset the signal handlers to SIG_DFL before unblocking the signals:

case pid
when 0
  # child
+  Crystal::Signal.reset_signal_handlers!
  unblock_signals
  # RACE here : since parent's pipe exists, any signal 
  # delivered here will end up in parent process
  exec
end

An attempt to fix this:

module Crystal::Signal
  def self.mutex
    @@mutex
  end

  # safe to invoke only in post-fork/pre-exec state
  def self.reset_signal_handlers!
    @@handlers.keys.each { |sig| sig.reset }
  end
end

 protected def self.fork_internal(run_hooks : Bool = true)
   newmask = uninitialized LibC::SigsetT
   oldmask = uninitialized LibC::SigsetT
 
   LibC.sigfillset(pointerof(newmask))
   ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
   raise Errno.new("Failed to disable signals") unless ret == 0
 
+  # lock signal mutex before forking to ensure @@handlers is in
+  # a consistent state
+  mutex = Crystal::Signal.mutex
+  mutex.lock if ! run_hooks
+
   case pid = LibC.fork
   when 0
     # child:
     pid = nil
     if run_hooks
       Process.after_fork_child_callbacks.each(&.call)
       LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
     else
+      # no need to unlock mutex as we won't be registering
+      # any custom signal handlers before exec.
+      # Also unlocking mutex not safe here as scheduler is not reinitialized
+      Crystal::Signal.reset_signal_handlers!
       # sigmask is inherited: must reset it
       LibC.sigemptyset(pointerof(newmask))
       LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), nil)
     end
   when -1
+    # unlock mutex immediately after fork in parent
+    mutex.unlock if ! run_hooks
     # error:
     errno = Errno.value
     LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
     raise Errno.new("fork", errno)
   else
+    # unlock mutex immediately after fork in parent
+    mutex.unlock if ! run_hooks
     # parent:
     LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
   end
 
   pid
 end

RX14 · 2018-07-28T22:03:03Z

@forksaber honestly we really appreciate your comments! It's much nicer to sort this out now instead of debugging an intermittent race condition in production years later.

ysbaddaden · 2018-07-29T08:06:46Z

Yes, you're right again! Sigh me, too focused on the child state than on the child -> parent leak.

ysbaddaden · 2018-07-29T09:09:04Z

Now I hope it's fine 🤞

forksaber · 2018-07-29T09:51:48Z

The patch LGTM 👍 when there exists only a single thread (which is currently the case).

However, consider this multi-threaded case

Thread-1 : locks Crystal::Signal mutex

Thread-1 : is in the middle of modifying @@handlers hash....execution state could be 
           inside any method of the Hash implementation ( hash's internal state might 
           be inconsistent depending on the implementation)

Thread-2 : fork

Child : now accesses said inconsistent @@handlers hash possibly 
        causing segfault or undefined behaviour

Thread-1 : finishes modifying @@handlers

So, mutex.lock is needed before fork to ensure that no other thread modifies the @@handlers hash while the current thread forks.

I can understand that you may not want to include the mutex.lock since Crystal is currently single threaded.
An alternative is to add some TODO comment related to this, so we can revisit this once thread-support lands.

ysbaddaden · 2018-07-30T10:16:52Z

Let's reset all signals and be done with it now.

forksaber · 2018-07-30T10:40:52Z

@ysbaddaden Agreed

forksaber · 2018-07-30T12:05:21Z

Assumption: An average crystal program defines 3 custom signal handlers

To make this decision easier, I am adding a benchmark that compares the following approaches:

hash: @@handlers.keys is used to find signals and reset them
reset all: reset all signal handlers irrespective of whether they have non default values
array of atomic flags: used alongside @@handlers hash to register and query handler state change

require "benchmark"
class AtomicFlag
  @flag = Atomic(UInt8).new(0_u8)

  def get : Bool
    @flag.get == 1_u8
  end

  def set(val : Bool)
    if val
      @flag.set(1_u8)
    else
      @flag.set(0_u8)
    end
  end
end

arr = StaticArray(AtomicFlag, 32).new { AtomicFlag.new }
handlers = Hash(Int32, Proc(Signal, Nil)).new

n = 3
(1..n).each do |i|
  arr[i].set true
  handlers[i] = ->(sig : Signal) { nil }
end

Benchmark.ips do |x|
  x.report("reset changed - hash  ") do
    handlers.each_key do |i|
      LibC.signal(i, LibC::SIG_DFL)
    end
  end

  x.report("reset changed - atomic") do
    (1..31).each do |i|
      flag = arr[i-1].get
      LibC.signal(i, LibC::SIG_DFL) if flag
    end
  end

  x.report("reset all             ") do
    (1..31).each { |i| LibC.signal(i, LibC::SIG_DFL) }
  end
end

Output

reset changed - hash     1.47M (678.12ns) (± 0.42%)  0 B/op        fastest
reset changed - atomic   1.42M (705.08ns) (± 0.29%)  0 B/op   1.04× slower
reset all              144.33k (  6.93µs) (± 0.42%)  0 B/op  10.22× slower

I am fine with either solution....using atomics to reset select signals or resetting all the signals.

ysbaddaden · 2018-08-01T09:48:42Z

Thanks for the benchmark!

I maintain a sigset_t, because signals are tricky: they usually range 1-32, but there ain't no guarantee about that. In addition the POSIX functions are fast enough. With this patch it's only 1.07× slower than the unsafe @@handlers.each_key, so Process.run won't see a significant performance loss.

The set is only mutated within a mutex so mutation is thread safe. It's also modified before the actual trap and after the actual reset. I assume it's fine to use without the mutex in fork/exec. It's just a fixed set of longs, and either a thread is adding a signal to the set and we don't care (the signal isn't trapped, yet) or it's being removed and we'll just have a useless reset.

All cases look fine to me. I hope you agree @forksaber 🤞

bcardiff · 2018-08-04T20:18:38Z

@ysbaddaden unless the last message from @forksaber push you to do something else in this PR we are good to go and include this PR as is.

@forksaber you probably know this already, but you input was of great great value. Thanks! 🙇

ysbaddaden · 2018-08-05T07:09:16Z

It condos my understanding: Go is using an atomic to avoid a mutex that would block.

The race is quite edge case: it requires that a thread would modify a signal while another is exactly calling fork.

I naively assume that a program does setup a few signals at startup and won't change them much afterwards.

Maybe we can just close the pipe in fork/exec and add a "write byte unless closed" in the signal handler. That would avoid passing child signal to parent and exec will reset any signals we missed to SIG_DFL anyway.

forksaber · 2018-08-05T08:49:23Z

Nice idea @ysbaddaden
Building on your idea....something like this should work great.

module Crystal::Signal
  @@after_fork_before_exec = false

  def self.after_fork_before_exec
    @@after_fork_before_exec = true
    ::Signal.each do |signal|
      LibC.signal(signal, LibC::SIG_DFL) if signal.set?
    end
  end

  def self.trap(signal, handler) : Nil
    @@mutex.synchronize do
      unless @@handlers[signal]?
        signal.set_add
####
        LibC.signal(signal.value, ->(value : Int32) {
          if @@after_fork_before_exec
             # signal handler was not reset due to the above race...lets reset it
             LibC.signal(value, LibC::SIG_DFL)
             # re raise the signal so that its handled with SIG_DFL semantics
             LibC.raise(value)
          else
             writer.write_bytes(value)
          end
        })
####
      end
      @@handlers[signal] = handler
    end
  end

end

We can even double down on this idea and forget about maintaining a sigset. Then, every signal handled in after_fork_before_exec state will be reset dynamically. 😁

Thanks @bcardiff, I appreciate it 😃

sdogruyol

Some really great improvements here 🌮 Thank you @ysbaddaden @forksaber 👍

bcardiff · 2018-08-06T20:55:06Z

@ysbaddaden given that we should release soon, maybe either we merge as is this pr and defer future work, or we release 0.26 without this. Your call.

ysbaddaden · 2018-08-08T20:12:28Z

I pushed a commit to close the pipe anyway, and to check if it's closed when writing the signal. ~~I'm fixing~~ I fixed the conflict.

ysbaddaden · 2018-08-08T20:20:21Z

I think it's okay for now. An atomic sigset would allow to avoid a mutex, but that can wait for multithreaded fibers. We can probably avoid more race conditions, but that's really an edge case, and we figure something better someday if it proves to really be an issue —I doubt an app would trap/reset signals in loops.

ghastmask · 2018-08-21T12:40:54Z

Hi it sounds like you guys already got everything figured out but I wanted to point you to the facebook folly Subprocess c++ implementation which is pretty well documented in case you hadn't seen it:

https://github.com/facebook/folly/blob/master/folly/Subprocess.cpp

bcardiff · 2018-08-22T17:03:25Z

I would rather include this in 0.27. It is a bit sensible and I would prefer to minimize the risk of needing a 0.26.2 for this.

bcardiff · 2018-09-06T02:47:23Z

@ysbaddaden would you rebase on master so we merge this without squashing?

Fixes read operation after fork, forcing the signal handler fiber to retry on the new pipe. Also avoids the child from receiving a parent signal from the pipe. refs crystal-lang#6416

This will avoid a future issue where a thread could process signal in parallel to the signal resets.

Always block signal reception before forking, even in fork/exec cases, so the child will never receive a signal and wrongly forward it to the parent process. Always restores the sigmask after fork. During fork/exec the signal pipe isn't replaced but signal handlers are resetted to `SIG_DFL` so they won't be received anymore (and won't be forwarded to the parent) before we restore the sigmask (inherited after exec).

Improves fork/exec executation by limiting the number of signals to reset to SIG_DFL to the signals that have actually been trapped. We now reset only 1 signal by default, instead of 32.

ysbaddaden · 2018-09-06T08:13:44Z

I just did. There was no conflicts, thought.

RX14 approved these changes Jul 22, 2018

View reviewed changes

RX14 added kind:bug topic:stdlib labels Jul 22, 2018

jhass approved these changes Jul 24, 2018

View reviewed changes

ysbaddaden commented Jul 27, 2018

View reviewed changes

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from 87e6430 to 5d101bf Compare July 27, 2018 20:30

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from 5d101bf to a146666 Compare July 29, 2018 09:08

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch 2 times, most recently from 81d93d5 to 9aa18c2 Compare July 30, 2018 10:16

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from 1905a26 to 17c7fee Compare August 1, 2018 09:30

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from 17c7fee to 7d17917 Compare August 1, 2018 09:56

bcardiff approved these changes Aug 4, 2018

View reviewed changes

bcardiff added this to the 0.26.0 milestone Aug 4, 2018

sdogruyol approved these changes Aug 6, 2018

View reviewed changes

bcardiff removed this from the 0.26.0 milestone Aug 8, 2018

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from 7d17917 to fa84884 Compare August 8, 2018 20:10

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from fa84884 to 8184d6e Compare August 8, 2018 20:15

RX14 approved these changes Aug 10, 2018

View reviewed changes

RX14 added this to the 0.26.1 milestone Aug 21, 2018

RX14 requested a review from bcardiff August 21, 2018 23:31

bcardiff modified the milestones: 0.26.1, 0.27.0 Aug 22, 2018

ysbaddaden added 5 commits September 6, 2018 10:11

Fix: always close parent signal pipe after fork

d195adf

Fixes read operation after fork, forcing the signal handler fiber to retry on the new pipe. Also avoids the child from receiving a parent signal from the pipe. refs crystal-lang#6416

Fix: always reset signal handlers before scheduler reset on fork

dc2695c

This will avoid a future issue where a thread could process signal in parallel to the signal resets.

Use fork_internal(will_exec) instead of run_hooks

e4651cb

Only reset trapped signals

e5b401f

Improves fork/exec executation by limiting the number of signals to reset to SIG_DFL to the signals that have actually been trapped. We now reset only 1 signal by default, instead of 32.

ysbaddaden force-pushed the fix-fork-and-signal-child-handlers branch from 8184d6e to e5b401f Compare September 6, 2018 08:11

bcardiff merged commit a350db8 into crystal-lang:master Sep 6, 2018

ysbaddaden deleted the fix-fork-and-signal-child-handlers branch September 7, 2018 09:31

oprypin added a commit to oprypin/crystal that referenced this pull request Apr 7, 2020

Restore crystal-lang#6426

16875f8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: fork and signal child handlers #6426

Fix: fork and signal child handlers #6426

ysbaddaden commented Jul 22, 2018 •

edited

Loading

RX14 commented Jul 22, 2018

forksaber commented Jul 23, 2018

ysbaddaden commented Jul 24, 2018

forksaber commented Jul 25, 2018

bararchy commented Jul 26, 2018

Sija commented Jul 26, 2018

bararchy commented Jul 26, 2018

Sija commented Jul 26, 2018

ysbaddaden commented Jul 27, 2018

ysbaddaden Jul 27, 2018

forksaber Jul 27, 2018 •

edited

Loading

ysbaddaden commented Jul 27, 2018

forksaber commented Jul 28, 2018 •

edited

Loading

RX14 commented Jul 28, 2018

ysbaddaden commented Jul 29, 2018

ysbaddaden commented Jul 29, 2018

forksaber commented Jul 29, 2018

ysbaddaden commented Jul 30, 2018

forksaber commented Jul 30, 2018

forksaber commented Jul 30, 2018

ysbaddaden commented Aug 1, 2018

bcardiff commented Aug 4, 2018

ysbaddaden commented Aug 5, 2018

forksaber commented Aug 5, 2018 •

edited

Loading

sdogruyol left a comment

bcardiff commented Aug 6, 2018

ysbaddaden commented Aug 8, 2018 •

edited

Loading

ysbaddaden commented Aug 8, 2018

ghastmask commented Aug 21, 2018

bcardiff commented Aug 22, 2018

bcardiff commented Sep 6, 2018

ysbaddaden commented Sep 6, 2018

Fix: fork and signal child handlers #6426

Fix: fork and signal child handlers #6426

Conversation

ysbaddaden commented Jul 22, 2018 • edited Loading

RX14 commented Jul 22, 2018

forksaber commented Jul 23, 2018

ysbaddaden commented Jul 24, 2018

forksaber commented Jul 25, 2018

bararchy commented Jul 26, 2018

Sija commented Jul 26, 2018

bararchy commented Jul 26, 2018

Sija commented Jul 26, 2018

ysbaddaden commented Jul 27, 2018

ysbaddaden Jul 27, 2018

Choose a reason for hiding this comment

forksaber Jul 27, 2018 • edited Loading

Choose a reason for hiding this comment

ysbaddaden commented Jul 27, 2018

forksaber commented Jul 28, 2018 • edited Loading

RX14 commented Jul 28, 2018

ysbaddaden commented Jul 29, 2018

ysbaddaden commented Jul 29, 2018

forksaber commented Jul 29, 2018

ysbaddaden commented Jul 30, 2018

forksaber commented Jul 30, 2018

forksaber commented Jul 30, 2018

ysbaddaden commented Aug 1, 2018

bcardiff commented Aug 4, 2018

ysbaddaden commented Aug 5, 2018

forksaber commented Aug 5, 2018 • edited Loading

sdogruyol left a comment

Choose a reason for hiding this comment

bcardiff commented Aug 6, 2018

ysbaddaden commented Aug 8, 2018 • edited Loading

ysbaddaden commented Aug 8, 2018

ghastmask commented Aug 21, 2018

bcardiff commented Aug 22, 2018

bcardiff commented Sep 6, 2018

ysbaddaden commented Sep 6, 2018

ysbaddaden commented Jul 22, 2018 •

edited

Loading

forksaber Jul 27, 2018 •

edited

Loading

forksaber commented Jul 28, 2018 •

edited

Loading

forksaber commented Aug 5, 2018 •

edited

Loading

ysbaddaden commented Aug 8, 2018 •

edited

Loading