New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos-container: Fix destroy
terminating before it's done. Fixes #32545
#32992
Conversation
CC @domenkozar |
CC @johbo @kevincox via NixOS/nixops#759 |
@GrahamcOfBorg eval (sorry for the noise, master was broken by a merge) |
system("machinectl", "terminate", $containerName) == 0 | ||
or die "$0: failed to terminate container\n"; | ||
# Wait for the leader process to exit | ||
while ( kill 0, $leader ) { sleep 1 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep 0.1
would probably be preferred, as it is less noticeable.
This solution is unsafe because leader PID may be freed and reused by an unrelated process while terminateContainer
sleeps. machinectl
(or systemctl
) should be used for querying the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep 0.1 would probably be preferred, as it is less noticeable.
@orivej I agree and I'd also prefer a shorter duration, but as far as I understand from here, in perl you can't sleep less than 1 second, unless you use the Time::HiRes
module, which I don't know if it's available here. Improvements welcome.
This solution is unsafe because leader PID may be freed and reused by an unrelated process while terminateContainer sleeps.
Yes
machinectl (or systemctl) should be used for querying the status.
@orivej Do you know if there's a UUID or something like that provided by machinectl
, or do we have to wait based on the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I understand from here, in perl you can't sleep less than 1 second, unless you use the
Time::HiRes
module, which I don't know if it's available here. Improvements welcome.
That page says that this module is shipped with Perl 5.8 and later, so you can easily use it.
Do you know if there's a UUID or something like that provided by
machinectl
, or do we have to wait based on the name?
I don't know, I have not looked at machinectl
before this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we add a comment to improve reliability, this is an improvement on it's own compared to current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have switched to Time::HiRes
as @orivej suggested, and added a TODO about switching away from PIDs in nixos-container
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've filed a separate issue to track the removal of PID based logic, and found that machinectl show
already provides an Id
field we could use: #33171
…xOS#32545. This also fixes the race condition found in NixOS#32551. And it fixes nixops's repeated destroy/deploy being broken (NixOS/nixops#809).
f914b53
to
5d83988
Compare
@GrahamcOfBorg test containers-imperative |
No attempt on x86_64-linux The following builds were skipped because they don't evaluate on x86_64-linux: tests.containers-imperative No log is available. |
Failure on aarch64-linux (full log) Attempted: tests.containers-imperative Partial log (click to expand)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NixOS tests runs and code looks good. Thanks!
nixos-container: Fix
destroy
terminating before it's done. Fixes #32545This also fixes the race condition found in #32551.
And it fixes nixops's repeated destroy/deploy being broken
(NixOS/nixops#809).
Things done
build-use-sandbox
innix.conf
on non-NixOS)nix-shell -p nox --run "nox-review wip"
./result/bin/
)