Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos-container: Fix destroy terminating before it's done. Fixes #32545 #32992

Merged
merged 1 commit into from Mar 12, 2018

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Dec 23, 2017

nixos-container: Fix destroy terminating before it's done. Fixes #32545

This also fixes the race condition found in #32551.

And it fixes nixops's repeated destroy/deploy being broken
(NixOS/nixops#809).

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@nh2
Copy link
Contributor Author

nh2 commented Dec 23, 2017

CC @domenkozar

@nh2
Copy link
Contributor Author

nh2 commented Dec 23, 2017

CC @johbo @kevincox via NixOS/nixops#759

@grahamc
Copy link
Member

grahamc commented Dec 23, 2017

@GrahamcOfBorg eval

(sorry for the noise, master was broken by a merge)

system("machinectl", "terminate", $containerName) == 0
or die "$0: failed to terminate container\n";
# Wait for the leader process to exit
while ( kill 0, $leader ) { sleep 1 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep 0.1 would probably be preferred, as it is less noticeable.

This solution is unsafe because leader PID may be freed and reused by an unrelated process while terminateContainer sleeps. machinectl (or systemctl) should be used for querying the status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep 0.1 would probably be preferred, as it is less noticeable.

@orivej I agree and I'd also prefer a shorter duration, but as far as I understand from here, in perl you can't sleep less than 1 second, unless you use the Time::HiRes module, which I don't know if it's available here. Improvements welcome.

This solution is unsafe because leader PID may be freed and reused by an unrelated process while terminateContainer sleeps.

Yes

machinectl (or systemctl) should be used for querying the status.

@orivej Do you know if there's a UUID or something like that provided by machinectl, or do we have to wait based on the name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I understand from here, in perl you can't sleep less than 1 second, unless you use the Time::HiRes module, which I don't know if it's available here. Improvements welcome.

That page says that this module is shipped with Perl 5.8 and later, so you can easily use it.

Do you know if there's a UUID or something like that provided by machinectl, or do we have to wait based on the name?

I don't know, I have not looked at machinectl before this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add a comment to improve reliability, this is an improvement on it's own compared to current implementation.

Copy link
Contributor Author

@nh2 nh2 Dec 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have switched to Time::HiRes as @orivej suggested, and added a TODO about switching away from PIDs in nixos-container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed a separate issue to track the removal of PID based logic, and found that machinectl show already provides an Id field we could use: #33171

…xOS#32545.

This also fixes the race condition found in NixOS#32551.

And it fixes nixops's repeated destroy/deploy being broken
(NixOS/nixops#809).
@garbas
Copy link
Member

garbas commented Mar 12, 2018

@GrahamcOfBorg test containers-imperative

@GrahamcOfBorg
Copy link

No attempt on x86_64-linux

The following builds were skipped because they don't evaluate on x86_64-linux: tests.containers-imperative

No log is available.

@GrahamcOfBorg
Copy link

Failure on aarch64-linux (full log)

Attempted: tests.containers-imperative

Partial log (click to expand)

machine: exit status 1
machine: output:
error: command `nixos-container create foo --ensure-unique-name' did not succeed (exit code 1)
command `nixos-container create foo --ensure-unique-name' did not succeed (exit code 1)
cleaning up
killing machine (pid 627)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
builder for '/nix/store/ww1ca1y5xhmlqx6j3x8k3h3bacyg1pc5-vm-test-run-containers-imperative.drv' failed with exit code 255
�[31;1merror:�[0m build of '/nix/store/ww1ca1y5xhmlqx6j3x8k3h3bacyg1pc5-vm-test-run-containers-imperative.drv' failed

@fpletz fpletz added this to the 18.03 milestone Mar 12, 2018
Copy link
Member

@fpletz fpletz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NixOS tests runs and code looks good. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nixos-container destroy terminates before container is destroyed
7 participants