-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: fork and signal child handlers #6426
Fix: fork and signal child handlers #6426
Conversation
Thanks for doing the debugging! It really shows how dangerous fork is with these subtle ordering and error handling issues. |
I am not sure if this should be handled as part of this PR, but I think there still exists a subtle race condition in the signal handler implementation. Consider the following order of events: Parent : fork()
Child : spawns
Child : receives a signal
Child : writes the signal to parent's pipe
Child : Signal.after_fork is called
Parent : processes the signal meant for child (**BUG**) I haven't actually tried writing down a fix but something like this should work: #Pseudo code
# block all the signals for the current thread
pthread_sigmask(block all signals)
pid =LibC.fork
if pid > 0
#Parent
pthread_sigmask(unblock all signals)
else
#child
if run_hooks # (Process.fork)
Signal.after_fork
pthread_sigmask(unblock all signals)
run_remaining_hooks
else #(Process.exec)
# reset all signal handlers to SIG_DFL or call Signal.after_fork
(signals.each { |signal| signal.reset } ) OR (Signal.after_fork)
pthread_sigmask(unblock all signals)
execvp
end
end
References: |
You're right again! I added a commit to disable signals then reenable them after fork (parent and child). Now we should be fine 🤞 I chose to continue to not reset signal handlers in the child process. Either developers don't care about signals and we'll keep the default handlers (i.e. SIGSEGV, SIGCHLD) or she cares about them and she's responsible for resetting them if needed. |
Thank you @ysbaddaden for working on this PR. Everything is fine except that the race condition still exists in fork+exec scenario as signals are not blocked before forking when module Crystal::Signal
def self.clear_signal_handlers
@@handlers.keys.each { |sig| sig.reset }
end
end -if run_hooks
- LibC.sigfillset(pointerof(newmask))
- ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
- raise Errno.new("pthread_sigmask") unless ret == 0
-end
+LibC.sigfillset(pointerof(newmask))
+ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
+raise Errno.new("pthread_sigmask") unless ret == 0
pid = LibC.fork
errno = Errno.value
case pid
when 0
# child:
pid = nil
if run_hooks
Process.after_fork_child_callbacks.each(&.call)
- LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
+ else
+ # exec scenario
+ Crystal::Signal.clear_signal_handlers
end
+ LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
when -1
# error:
- LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) if run_hooks
+ LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
raise Errno.new("fork", errno)
else
# parent:
- LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) if run_hooks
+ LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
end Someone experienced a similar in go: PS: Ideally, the method # Psuedo code for thread-safe clear_signal_handlers
module Crystal::Signal
@@handled = StaticArray(AtomicFlag, 32).new(AtomicFlag.new(0_u8))
def self.trap(signal, handler) : Nil
@@handled[signal].set(1_u8)
previous_def
end
def self.clear_signal_handlers
@@handled.each_with_index do |flag, signal|
next if flag.get == 0_u8
signal.reset
end
end
end |
@bararchy Hold your horses, let's wait for conclusions regarding newest @forksaber's comment. |
@Sija sure, I was under the impression this has already been reviewed. |
Well, it was. Before that comment ;) |
Yes, @forksaber is right again! I'm fixing this. |
src/process.cr
Outdated
if run_hooks | ||
Process.after_fork_child_callbacks.each(&.call) | ||
LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In chose to not reenable signals in a fork/exec situation. I just leave the handlers and mask in place, and let execve(2)
override that for the new process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you don't reenable the signals, they will remain blocked across the exec. So, it is necessary to reenable the signals and reset them.
The new process shall inherit at least the following attributes from the calling process image: Nice value (see nice()) semadj values (see semop()) Process ID Parent process ID Process group ID Session membership Real user ID Real group ID Supplementary group IDs Time left until an alarm clock signal (see alarm()) Current working directory Root directory File mode creation mask (see umask()) File size limit (see ulimit()) Process signal mask (see sigprocmask()) Pending signal (see sigpending()) tms_utime, tms_stime, tms_cutime, and tms_cstime (see times()) Resource limits Controlling terminal Interval timers
Reference:
http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html
Sigh... Okay. |
87e6430
to
5d101bf
Compare
Sorry for being such an annoying person/reviewer :( # run_hooks: false
pid = LibC.fork
case pid
when 0
# child
# RACE here
exec
end to # run_hooks: false
block_signals
pid = LibC.fork
case pid
when 0
# child
unblock_signals
# RACE here : since parent's pipe exists, any signal
# delivered here will end up in parent process
exec
end
A crucial step is missing to fix this race condition is to reset the signal handlers to SIG_DFL before unblocking the signals: case pid
when 0
# child
+ Crystal::Signal.reset_signal_handlers!
unblock_signals
# RACE here : since parent's pipe exists, any signal
# delivered here will end up in parent process
exec
end
An attempt to fix this: module Crystal::Signal
def self.mutex
@@mutex
end
# safe to invoke only in post-fork/pre-exec state
def self.reset_signal_handlers!
@@handlers.keys.each { |sig| sig.reset }
end
end protected def self.fork_internal(run_hooks : Bool = true)
newmask = uninitialized LibC::SigsetT
oldmask = uninitialized LibC::SigsetT
LibC.sigfillset(pointerof(newmask))
ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
raise Errno.new("Failed to disable signals") unless ret == 0
+ # lock signal mutex before forking to ensure @@handlers is in
+ # a consistent state
+ mutex = Crystal::Signal.mutex
+ mutex.lock if ! run_hooks
+
case pid = LibC.fork
when 0
# child:
pid = nil
if run_hooks
Process.after_fork_child_callbacks.each(&.call)
LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
else
+ # no need to unlock mutex as we won't be registering
+ # any custom signal handlers before exec.
+ # Also unlocking mutex not safe here as scheduler is not reinitialized
+ Crystal::Signal.reset_signal_handlers!
# sigmask is inherited: must reset it
LibC.sigemptyset(pointerof(newmask))
LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), nil)
end
when -1
+ # unlock mutex immediately after fork in parent
+ mutex.unlock if ! run_hooks
# error:
errno = Errno.value
LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
raise Errno.new("fork", errno)
else
+ # unlock mutex immediately after fork in parent
+ mutex.unlock if ! run_hooks
# parent:
LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
end
pid
end |
@forksaber honestly we really appreciate your comments! It's much nicer to sort this out now instead of debugging an intermittent race condition in production years later. |
Yes, you're right again! Sigh me, too focused on the child state than on the child -> parent leak. |
5d101bf
to
a146666
Compare
Now I hope it's fine 🤞 |
The patch LGTM 👍 when there exists only a single thread (which is currently the case). However, consider this multi-threaded case
So, I can understand that you may not want to include the |
81d93d5
to
9aa18c2
Compare
Let's reset all signals and be done with it now. |
@ysbaddaden Agreed |
Assumption: An average crystal program defines 3 custom signal handlers To make this decision easier, I am adding a benchmark that compares the following approaches:
require "benchmark"
class AtomicFlag
@flag = Atomic(UInt8).new(0_u8)
def get : Bool
@flag.get == 1_u8
end
def set(val : Bool)
if val
@flag.set(1_u8)
else
@flag.set(0_u8)
end
end
end
arr = StaticArray(AtomicFlag, 32).new { AtomicFlag.new }
handlers = Hash(Int32, Proc(Signal, Nil)).new
n = 3
(1..n).each do |i|
arr[i].set true
handlers[i] = ->(sig : Signal) { nil }
end
Benchmark.ips do |x|
x.report("reset changed - hash ") do
handlers.each_key do |i|
LibC.signal(i, LibC::SIG_DFL)
end
end
x.report("reset changed - atomic") do
(1..31).each do |i|
flag = arr[i-1].get
LibC.signal(i, LibC::SIG_DFL) if flag
end
end
x.report("reset all ") do
(1..31).each { |i| LibC.signal(i, LibC::SIG_DFL) }
end
end Output
I am fine with either solution....using atomics to reset select signals or resetting all the signals. |
1905a26
to
17c7fee
Compare
Thanks for the benchmark! I maintain a The set is only mutated within a mutex so mutation is thread safe. It's also modified before the actual trap and after the actual reset. I assume it's fine to use without the mutex in fork/exec. It's just a fixed set of longs, and either a thread is adding a signal to the set and we don't care (the signal isn't trapped, yet) or it's being removed and we'll just have a useless reset. All cases look fine to me. I hope you agree @forksaber 🤞 |
17c7fee
to
7d17917
Compare
@ysbaddaden unless the last message from @forksaber push you to do something else in this PR we are good to go and include this PR as is. @forksaber you probably know this already, but you input was of great great value. Thanks! 🙇 |
It condos my understanding: Go is using an atomic to avoid a mutex that would block. The race is quite edge case: it requires that a thread would modify a signal while another is exactly calling fork. I naively assume that a program does setup a few signals at startup and won't change them much afterwards. Maybe we can just close the pipe in fork/exec and add a "write byte unless closed" in the signal handler. That would avoid passing child signal to parent and exec will reset any signals we missed to SIG_DFL anyway. |
Nice idea @ysbaddaden module Crystal::Signal
@@after_fork_before_exec = false
def self.after_fork_before_exec
@@after_fork_before_exec = true
::Signal.each do |signal|
LibC.signal(signal, LibC::SIG_DFL) if signal.set?
end
end
def self.trap(signal, handler) : Nil
@@mutex.synchronize do
unless @@handlers[signal]?
signal.set_add
####
LibC.signal(signal.value, ->(value : Int32) {
if @@after_fork_before_exec
# signal handler was not reset due to the above race...lets reset it
LibC.signal(value, LibC::SIG_DFL)
# re raise the signal so that its handled with SIG_DFL semantics
LibC.raise(value)
else
writer.write_bytes(value)
end
})
####
end
@@handlers[signal] = handler
end
end
end We can even double down on this idea and forget about maintaining a Thanks @bcardiff, I appreciate it 😃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some really great improvements here 🌮 Thank you @ysbaddaden @forksaber 👍
@ysbaddaden given that we should release soon, maybe either we merge as is this pr and defer future work, or we release 0.26 without this. Your call. |
7d17917
to
fa84884
Compare
I pushed a commit to close the pipe anyway, and to check if it's closed when writing the signal. |
fa84884
to
8184d6e
Compare
I think it's okay for now. An atomic sigset would allow to avoid a mutex, but that can wait for multithreaded fibers. We can probably avoid more race conditions, but that's really an edge case, and we figure something better someday if it proves to really be an issue —I doubt an app would trap/reset signals in loops. |
Hi it sounds like you guys already got everything figured out but I wanted to point you to the facebook folly Subprocess c++ implementation which is pretty well documented in case you hadn't seen it: https://github.com/facebook/folly/blob/master/folly/Subprocess.cpp |
I would rather include this in 0.27. It is a bit sensible and I would prefer to minimize the risk of needing a 0.26.2 for this. |
@ysbaddaden would you rebase on master so we merge this without squashing? |
Fixes read operation after fork, forcing the signal handler fiber to retry on the new pipe. Also avoids the child from receiving a parent signal from the pipe. refs crystal-lang#6416
This will avoid a future issue where a thread could process signal in parallel to the signal resets.
Always block signal reception before forking, even in fork/exec cases, so the child will never receive a signal and wrongly forward it to the parent process. Always restores the sigmask after fork. During fork/exec the signal pipe isn't replaced but signal handlers are resetted to `SIG_DFL` so they won't be received anymore (and won't be forwarded to the parent) before we restore the sigmask (inherited after exec).
Improves fork/exec executation by limiting the number of signals to reset to SIG_DFL to the signals that have actually been trapped. We now reset only 1 signal by default, instead of 32.
8184d6e
to
e5b401f
Compare
I just did. There was no conflicts, thought. |
Fixes an issue where a parent process could receive signals directed to the child or vice-versa.
fixes #6416