Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix imperative containers #47917

Merged
merged 3 commits into from Oct 8, 2018
Merged

Conversation

arianvp
Copy link
Member

@arianvp arianvp commented Oct 5, 2018

Motivation for this change

When logging into a container by using
nixos-container root-login
all nix-related commands in the container would fail, as they
tried to modify the nix db and nix store, which are mounted
read-only in the container. We want nixos-container to not
try to modify the nix store at all, but instead delegate
any build commands to the nix daemon of the host operating system.

This already works for non-root users inside a nixos-container,
as it doesn't 'own' the nix-store, and thus defaults
to talking to the daemon socket at /nix/var/nix/daemon-socket/,
which is bind-mounted to the host daemon-socket, causing all nix
commands to be delegated to the host.

However, when we are the root user inside the container, we have the
same uid as the nix store owner, eventhough it's not actually
the same root user (due to user namespaces). Nix gets confused,
and is convinced it's running in single-user mode, and tries
to modify the nix store directly instead.

By setting NIX_REMOTE=daemon in /etc/profile, we force nix
to operate in multi-user mode, so that it will talk to the host
daemon instead, which will modify the nix store for the container.

This fixes #40355

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

@arianvp
Copy link
Member Author

arianvp commented Oct 5, 2018

cc @flokli

Also it would be great to have this backported to 18.09

When logging into a container by using
  nixos-container root-login
all nix-related commands in the container would fail, as they
tried to modify the nix db and nix store, which are mounted
read-only in the container.  We want nixos-container to not
try to modify the nix store at all, but instead delegate
any build commands to the nix daemon of the host operating system.

This already works for non-root users inside a nixos-container,
as it doesn't 'own' the nix-store, and thus defaults
to talking to the daemon socket at /nix/var/nix/daemon-socket/,
which is bind-mounted to the host daemon-socket, causing all nix
commands to be delegated to the host.

However, when we are the root user inside the container, we have the
same uid as the nix store owner, eventhough it's not actually
the same root user (due to user namespaces). Nix gets confused,
and is convinced it's running in single-user mode, and tries
to modify the nix store directly instead.

By setting `NIX_REMOTE=daemon` in `/etc/profile`, we force nix
to operate in multi-user mode, so that it will talk to the host
daemon instead, which will modify the nix store for the container.

This fixes NixOS#40355
…rs"""

nixos-container can now execute nix commands again inside the container

This reverts commit 9622cd3.
@samueldr
Copy link
Member

samueldr commented Oct 5, 2018

👍 for the backport; this is something that could sting some users.

@flokli
Copy link
Contributor

flokli commented Oct 7, 2018

@arianvp, @samueldr verified this works. It's a bit tricky, as nixos-container wanted to use nixpkgs from my current channel, not from the currently generation I switched to, but I simply abused a nixos test vm to verify manually the steps outlined in #40355 (comment) now work.

LGTM, should be merged and backported to 18.09 as well.

@samueldr
Copy link
Member

samueldr commented Oct 8, 2018

Hi!

I am verifying this PR, and it looks like the regression test could be failing to test the regression?

git revert 3624bb536244ea99f9f9a6d18ff00bbe4a5204af
time nix-build nixos/tests/containers-imperative.nix

And part of the log:

- machine: must succeed: nix-instantiate -E 'derivation { name = "empty"; builder = "false"; system = "false"; }'
    machine# [31;1mwarning:[0m you did not specify '--add-root'; the result might be removed by the garbage collector
    machine: exit status 0

image

@arianvp
Copy link
Member Author

arianvp commented Oct 8, 2018

Odd.. works for me now too. Investigating...

@arianvp
Copy link
Member Author

arianvp commented Oct 8, 2018

Okay @samueldr I fixed it.. --amend'ed the last commit with the fix, so'll you'll have to force-pull.

I ran the regression test in the vm instead of the container inside the vm. The regression test should now properly trigger a failure when you revert 3624bb536244ea99f9f9a6d18ff00bbe4a5204af

@arianvp
Copy link
Member Author

arianvp commented Oct 8, 2018

P.s. how did you get such pretty log outputs? Mine are a lot more ... verbose

@flokli
Copy link
Contributor

flokli commented Oct 8, 2018 via email

@samueldr
Copy link
Member

samueldr commented Oct 8, 2018

@GrahamcOfBorg test containers-imperative

@GrahamcOfBorg
Copy link

Success on aarch64-linux (full log)

Attempted: tests.containers-imperative

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 52.81s
cleaning up
killing machine (pid 631)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/d243gwdmx9sqdds1gxx9w52q8fm1511j-vm-test-run-containers-imperative

@GrahamcOfBorg
Copy link

Success on x86_64-linux (full log)

Attempted: tests.containers-imperative

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 528.76s
cleaning up
killing machine (pid 600)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/nhz9ngn8ipfk351ad68ag7wh94darikp-vm-test-run-containers-imperative

@samueldr samueldr merged commit 7fb4527 into NixOS:master Oct 8, 2018
@samueldr
Copy link
Member

samueldr commented Oct 8, 2018

Backported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Imperative containers broken on 18.03/nixos-unstable
4 participants