-
Notifications
You must be signed in to change notification settings - Fork 201
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[WIP] subprocesses: unify termination logic
Showing
2 changed files
with
55 additions
and
38 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there drivers that actually do something useful with SIGTERM on Linux?
34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if there is not, that would not be a reason not to use it here. If I was writing a controller, I would always want to handle SIGTERM as it is the only thing that I can resonably expect to see before being killed, e.g. if the system is shutting down.
On windows we could (and maybe should)
send_signal(CTRL_BREAK_EVENT)
.34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you typically want to handle it? Install a signal handler that raises an exception (similar to
KeyboardInterrupt
) so that e.g.finally
clauses are properly executed?This adds boilerplate to every controller, and it's still an approximate solution as asynchronous exceptions in Python have races (http://dept.cs.williams.edu/~freund/papers/python.pdf).
34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that clean shutdown is already possible with the current mechanism. Call
terminate
on the controller manager usingartiq_rpctool
in your system's shutdown scripts.34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who will write those scripts?
SIGTERM
is already there and is something that you see without any further work and without having to invent any further buggy scripts.Also remember that
terminate()
relies on the entire networking stack working and the interface being up.34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARTIQ servers are always bound on localhost now in addition to any specified network address, so I do not think the network setup should be a problem.
SIGTERM
is generally difficult to handle reliably (without race conditions) as it is asynchronous. Python messed it up. In addition Unix signals have subtle issues wrt the restarting of system calls. Do you really want to useSIGTERM
as the recommended graceful process termination mechanism?34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I will concede that the ability to interrupt a blocking system call at all is an advantage for signals.
34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is
--no-localhost-bind
... Why was that added anyway? I seem to have needed it because of some ITAC networking problems.I would not use
SIGTERM
as the primary way (among other things it does not exist on Windows) but as a (non-ideal) fallback for the cases where I don't get aterminate()
.34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, it makes it easier to use clients when you are logged on the same machine as the server, as they work without any network options then.
34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting
address already in use
from the server without that option. Even though there was nothing using that port.34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh. Seems my usual pessimism about computers that gave me the idea to add that
--no-localhost-bind
option was totally justified. How did that happen though? It tries to bind on both 127.0.0.1 and ::1 by default, maybe there are systems on which binding a on v4 address also binds on the v6?34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it if I see it again. My guess would be IPv6 related in some way.
34403b4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the problem, we could bind on v6 only.