[WIP] nixos/containers: add unprivileged option #67336

Draft
wants to merge 1 commit into
base: master

uvNikita
Contributor

Motivation for this change

Depends on #67332. Fixes #57087.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @mmahut @danbst @Mic92 @fpletz @arianvp

@uvNikita
Contributor Author

Here are some errors we should address before merging this. Any input is welcomed :)

nscd.service inside unprivileged container fails with:

nscd.service: Failed to set up mount namespacing; Operation not permitted
nscd.service: Failed at step NAMESPACE spawning

Starting or reloading the container results in the following log output:

machine# [   24.515687] container webserver[809]: running activation script...
machine# [   24.661600] container webserver[809]: setting up /etc...
machine# [   24.858055] container webserver[809]: install: cannot change permissions of '/nix/var/nix/temproots': No such file or directory
machine# [   24.861177] container webserver[809]: install: cannot change permissions of '/nix/var/nix/userpool': No such file or directory
machine# [   24.863719] container webserver[809]: install: cannot create directory '/nix/var/log': Permission denied
machine# [   24.871737] container webserver[809]: Activation script snippet 'nix' failed (1)
machine# [   24.888967] container webserver[809]: mount: /dev: permission denied.
machine# [   24.905158] container webserver[809]: mount: /dev/pts: permission denied.
machine# [   24.925329] container webserver[809]: mount: /dev/shm: permission denied.
machine# [   24.951388] container webserver[809]: mount: /run: permission denied.
machine# [   24.988424] container webserver[809]: Activation script snippet 'specialfs' failed (32)

Those errors do not seem critical since I've been successfully running and reloading unprivileged containers for more than half a year now. My understanding is that we should just skip the mount step when running inside a container.
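
A minimal sketch of that idea, assuming the `specialfs` snippet named in the log is defined under `system.activationScripts` and that `boot.isContainer` is set for the container. This is only an illustration, not necessarily the change made in this PR:

```nix
{ config, lib, ... }:
{
  # Assumption: emptying the snippet (rather than removing it) keeps other
  # activation snippets that list "specialfs" in their deps working.
  system.activationScripts.specialfs =
    lib.mkIf config.boot.isContainer (lib.mkForce "");
}
```

On a regular (non-container) system the `mkIf` condition is false, so the stock snippet stays untouched.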

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-container-limitations/1835/7

@arianvp
Member

arianvp commented Sep 20, 2019

Would it help to disable DynamicUser on nscd? We recently added that and we could revert it again. Why this is not working is beyond me though. Franz and I will ask around with the systemd guys this weekend as we're meeting up with them.

@uvNikita
Contributor Author

@arianvp Yes, I guess reverting DynamicUser will help. However, we have this option set in other services too, so we will probably have to wait for the fix in upstream.

@fpletz
Member

fpletz commented Sep 25, 2019

Just as a reminder: If we can't make this work for 20.03 we have to fix the documentation from #67232.

@uvNikita
Contributor Author

I disabled special-fs mounts inside nixos containers which fixes mount errors. Let me know if you are aware of cases when it might break things.

@uvNikita
Contributor Author

Also, creating $root/nix/var/nix folder fixes nix errors on container startup, but reloading still fails with:

machine# [  138.248116] systemd[1]: Reloading Container 'webserver'.
machine# [  138.437982] container webserver[1292]: mkdir: cannot create directory ‘/nix/var/nix/profiles/per-user’: Permission denied
machine# [  138.451486] container webserver[1292]: stat: cannot stat '/nix/var/nix/profiles/per-user/root': No such file or directory
machine# [  138.477357] container webserver[1292]: WARNING: the per-user profile dir /nix/var/nix/profiles/per-user/root should belong to user id 0
machine# [  138.488463] container webserver[1292]: mkdir: cannot create directory ‘/nix/var/nix/gcroots/per-user’: Permission denied
machine# [  138.505600] container webserver[1292]: stat: cannot stat '/nix/var/nix/gcroots/per-user/root': No such file or directory
machine# [  138.531320] container webserver[1292]: WARNING: the per-user gcroots dir /nix/var/nix/gcroots/per-user/root should belong to user id 0
machine# [  138.578720] container webserver[1292]: mkdir: cannot create directory ‘/nix/var/nix/profiles/per-user’: Permission denied
machine# [  138.590044] container webserver[1292]: stat: cannot stat '/nix/var/nix/profiles/per-user/root': No such file or directory
machine# [  138.613625] container webserver[1292]: WARNING: the per-user profile dir /nix/var/nix/profiles/per-user/root should belong to user id 0
machine# [  138.622335] container webserver[1292]: mkdir: cannot create directory ‘/nix/var/nix/gcroots/per-user’: Permission denied
machine# [  138.634071] container webserver[1292]: stat: cannot stat '/nix/var/nix/gcroots/per-user/root': No such file or directory
machine# [  138.657775] container webserver[1292]: WARNING: the per-user gcroots dir /nix/var/nix/gcroots/per-user/root should belong to user id 0
machine# [  139.298607] container webserver[1292]: activating the configuration...
machine# [  139.468912] container webserver[1292]: setting up /etc...
machine# [  139.632658] container webserver[1292]: install: cannot change permissions of ‘/nix/var/nix/gcroots/per-user’: No such file or directory
machine# [  139.634923] container webserver[1292]: install: cannot change permissions of ‘/nix/var/nix/profiles/per-user’: No such file or directory
machine# [  139.637108] container webserver[1292]: install: cannot change permissions of ‘/nix/var/nix/gcroots/tmp’: No such file or directory
machine# [  139.639835] container webserver[1292]: Activation script snippet 'nix' failed (1)
machine# [  140.230131] container webserver[1292]: ln: failed to create symbolic link '/nix/var/nix/gcroots/current-system': Permission denied
machine# [  140.693565] container webserver[1292]: setting up tmpfiles
machine# [  140.854959] systemd[1]: container@webserver.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
machine# [  140.863708] systemd[1]: Reload failed for Container 'webserver'.

This is because /nix/var/nix/{gcroots,profiles} are not owned by the container's root user. I guess we will have to either chown them, or not mount them at all if the container is running in unprivileged mode.

@disassembler disassembler modified the milestones: 20.03, 20.09 Feb 5, 2020
@uvNikita
Contributor Author

Seems like the blocking issue has been fixed (systemd/systemd#13622), so once we get the new systemd version we can continue to work on this :)

@stale

stale bot commented Aug 21, 2020

Hello, I'm a bot and I thank you in the name of the community for your contributions.

Nixpkgs is a busy repository, and unfortunately sometimes PRs get left behind for too long. Nevertheless, we'd like to help committers reach the PRs that are still important. This PR has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

If this is still important to you and you'd like to remove the stale label, we ask that you leave a comment. Your comment can be as simple as "still important to me". But there's a bit more you can do:

If you received an approval by an unprivileged maintainer and you are just waiting for a merge, you can @ mention someone with merge permissions and ask them to help. You might be able to find someone relevant by using Git blame on the relevant files, or via GitHub's web interface. You can see if someone's a member of the nixpkgs-committers team, by hovering with the mouse over their username on the web interface, or by searching them directly on the list.

If your PR wasn't reviewed at all, it might help to find someone who's perhaps a user of the package or module you are changing, or alternatively, ask once more for a review by the maintainer of the package/module this is about. If you don't know any, you can use Git blame on the relevant files, or GitHub's web interface to find someone who touched the relevant files in the past.

If your PR has had reviews and nevertheless got stale, make sure you've responded to all of the reviewer's requests / questions. Usually when PR authors show responsibility and dedication, reviewers (privileged or not) show dedication as well. If you've pushed a change, it's possible the reviewer wasn't notified about your push via email, so you can always officially request them for a review, or just @ mention them and say you've addressed their comments.

Lastly, you can always ask for help at our Discourse Forum, or more specifically, at this thread or at #nixos' IRC channel.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 21, 2020
@davidak
Member

davidak commented Aug 22, 2020

The corresponding PR was merged in January, so we probably have the right systemd version now?
@uvNikita have you looked into it again?

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 22, 2020
@uvNikita
Contributor Author

@davidak yes, I think we will have the right systemd version in 20.09. However, #67332 was reverted, so we need to fix that first now. Ideally, we would also refactor the whole nixos-containers module to use .nspawn files which simplify things a lot.

@ryantm ryantm marked this pull request as draft October 23, 2020 03:05
@FRidh FRidh modified the milestones: 20.09, 21.03 Dec 20, 2020
@aanderse
Member

@uvNikita any news on this?

@uvNikita
Contributor Author

@aanderse I think the best path would be to implement containers module v2.0 (see #69414) where we would use systemd-networkd and nspawn files, which would reduce the amount of scripts and workarounds necessary.

Adding unprivileged and ephemeral options support there should be a trivial task I think.

In fact, this is exactly the way I'm currently using unprivileged, ephemeral containers -- a custom stripped-down nixos containers module similar to the one developed in #69414.

@aanderse
Member

@uvNikita great. I'm looking forward to it. Thanks for the reply!

@stale

stale bot commented Sep 10, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 10, 2021
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Jan 4, 2022
Now we're doing correct user-namespacing here as well; for that, a few
filesystem-fixes had to be applied.

For more context, please refer to NixOS#67336
Also credits go to the author of the aforementioned PR, I basically
pulled these changes into this branch.
m1cr0man pushed a commit to m1cr0man/nixpkgs that referenced this pull request Dec 6, 2022
Sometimes it's needed to build a configuration within a `nix-build` for
systemd units. While this is fairly easy for .service-units (where you
can easily define overrides), it's not possible for `systemd-nspawn(1)`.

This is mostly a hack to get dedicated bind-mounts of store paths from
`pkgs.closureInfo` into the configuration without IFD.

In the long term we either want to fix this in systemd or find a more
suited solution though.

nixos/containers-next: initialize first draft for new NixOS containers w/networkd

This is the first batch of changes for a new container-module replacing
the current `nixos-container`-subsystem in the long term.

The state in here is still strongly inspired by the
`containers`[1]-module to declare declarative nspawn-instances by using
NixOS config for the host and the container itself.

For now, this module uses the tentative namespace `nixos.containers`,
but that's subject to change.

This new module will also contain the following key-differences:

* Rather than writing a big abstraction-layer on top, we'll rely on
  `.nspawn`-units[2]. This has the benefits that (1) we can stop adding
  options for each new nspawn-feature (such as MACVLANs, ephemeral
  instances, etc.) because it can be directly written into the
  `.nspawn`-unit using the module system like

      systemd.nspawn.foo.filesConfig = {
        BindReadOnly = /* ... */
      };

  Also, administrators don't need to learn too much about our
  abstractions, they only need to know a few basics about the
  module-system and how to write systemd units.

* This feature strictly enforces `systemd-networkd` on both the
  container & the host. It can be turned off for containers in the
  host-namespace without a private network though.

  The reason for this is that the current `nixos-container`
  implementation has the long-standing bug that the container's uplink
  is broken *until* the container has booted since the host-side of the
  veth-pair is configured in `ExecStartPost=`[3]. This is because
  there's no proper way to take care of it in an earlier stage since
  `systemd-nspawn` creates the interface itself.

  This has e.g. the implication that services inside the container
  wrongly assume that they connect to e.g. an external database via
  network (since `network{,-online}.target` was reached), however this
  is not the case due to the unconfigured host-side veth interface.

  However, when using `systemd-networkd(8)` on both sides, this is not
  the case anymore since systemd will automatically take care of
  configuring the network correctly when an nspawn unit starts and
  `networkd` is active.
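
To illustrate the first point, here is a hedged sketch that uses only the stock `systemd.nspawn` options of NixOS rather than the new module. The container name `foo` and the chosen settings are placeholders picked for illustration; they are not taken from this branch:

```nix
{
  # Host side: let networkd configure the host end of the veth pair.
  networking.useNetworkd = true;

  # "foo" is a placeholder container name.
  systemd.nspawn.foo = {
    execConfig = {
      Ephemeral = true;       # rendered as Ephemeral= in [Exec]
      PrivateUsers = "pick";  # rendered as PrivateUsers= in [Exec]
    };
    # Rendered as BindReadOnly= in [Files].
    filesConfig.BindReadOnly = [ "/nix/store" ];
    # Rendered as VirtualEthernet= in [Network].
    networkConfig.VirtualEthernet = true;
  };
}
```

Each attribute maps to a key of the same name in the generated `.nspawn` unit, so a new nspawn feature needs no dedicated NixOS option.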

Apart from a basic draft, this also contains support for RFC1918
IPv4-addresses configured via DHCP and ULA-IPv6 addresses configured via
SLAAC and `radvd(8)` including support for ephemeral containers.

Further additions such as a better config-activation mechanism
and a tool to manage containers imperatively will follow.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-containers
[2] https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#
[3] https://github.com/NixOS/nixpkgs/blob/8b0f315b7691adcee291b2ff139a1beed7c50d94/nixos/modules/virtualisation/nixos-containers.nix#L189-L240

nixos/containers-next: implement small wrapper for nspawn port-forwards

This exposes a given `containerPort` to the host address. So if port 80
from the container is forwarded to the host's port 8080 and the
container uses `2001:DB8::42` and the host-side uses `2001:DB8::23` on
the veth-interface, then `[2001:DB8::42]:80` will be available on the
host as `[2001:DB8::23]:8080`.

nixos/containers-next: implement more advanced networking tests

This change tests various combinations of static & dynamic addressing
and also fixes a bug where `radvd(8)` was erroneously configured for
veth-pairs where it's actually not needed.

This test is also supposed to show how to use `systemd`-configuration to
implement most of the features (for instance there's no custom set of
options to implement MACVLANs) and serves as regression-test for future
`systemd`-updates in NixOS.

Please note that the `ndppd`-hack is only here because QEMU doesn't do
proper IPv6 neighbour resolution. In fact, I left comments whenever some
workarounds were needed for the testing-facility.

nixos/tests/container-migration: init

This test is supposed to demonstrate how to migrate a single container
to the new subsystem. Of course, docs on how to rewrite config aren't
written yet, this is mainly a POC to show that it's generally possible
by

* Deploying a new configuration (using `nixos.containers`) that is
  equivalent to the old one.
* Moving the state from `/var/lib/containers` to `/var/lib/machines`.
* Rebooting the host - unfortunately - because otherwise
  `systemd-networkd` will reach an inconsistent state - at least with
  v247.

For the reboot-part I also had to change the QEMU vm-builder a bit to
actually support persistent boot-disks.

nixos/containers-next: allow static configuration for a virtual zone as well

This is already the case for dynamically assigned addresses (e.g. via
SLAAC or DHCPv4) where `0.0.0.0/24` and `::/64` provide a pool of
private IPs. However if such a zone is supposed to be fully static, the
same should be possible as well.

nixos/switch-to-configuration: import old config activation changes

This is basically what I tried in NixOS#84608 at first - being able to reload
or restart a container based on the NixOS-specific
`re{load,start}IfChanged` options for systemd units, but with a few
differences:

* I switched back to using `nsenter(1)` from util-linux for the same
  rationale as in ebb6e38: without
  this, the activation would hang until a timeout is exceeded if the
  service-manager inside the container is reloaded.

* I also disabled `systemd-networkd-wait-online.service` inside the
  container because it'd also hang even if the interfaces are configured
  properly. We should investigate how to fix it / if it was already
  fixed at some point.

Also implemented a small test to ensure that a config-activation works
fine, even with networking.

nixos/containers-next: fix broken machinectl reboot and probably more

It seems that systemd ignores `systemd-nspawn@` (the template unit) if
both an override and a custom unit for the service (i.e.
`systemd-nspawn@containername.service`) exist:

    [root@server:~]# systemctl status systemd-nspawn@ldap
    ● systemd-nspawn@ldap.service
         Loaded: loaded (/nix/store/rm4kigdbzl78iai8jfbgxbslvalk8bwa-unit-systemd-nspawn-ldap.service/systemd-nspawn@ldap.service; linked; vendor preset: enabled)
        Drop-In: /nix/store/fr9zabpvp3077cbb6jnpxm42qxqw9yk2-system-units/systemd-nspawn@.service.d
                 └─overrides.conf
         Active: active (running) since Tue 2021-03-16 15:01:32 UTC; 23min ago

This breaks at least `machinectl reboot` which needs
`RestartForceExitStatus = 133` as a setting. For now, I've added all
settings to the module itself.
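
As a rough sketch of what carrying such a setting in the per-container unit could look like with the stock `systemd.services` options (the unit name `systemd-nspawn@ldap` is taken from the status output above; how the module actually sets it is not shown here and may differ):

```nix
{
  # Illustration only: set the exit-status handling directly on the
  # instance unit instead of relying on the template unit.
  systemd.services."systemd-nspawn@ldap".serviceConfig = {
    RestartForceExitStatus = 133;
  };
}
```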

nixos/switch-to-configuration: Implement more generic decisions for config activations in containers

Actually, using `re{load,start}IfChanged` isn't the best decision for
containers because some containers have to be reloaded or restarted
depending on what has changed. For instance, a new bind-mount requires a
`machinectl reboot`, but a change in the NixOS config only needs a
`systemctl reload` (which runs `switch-to-configuration` inside the
container).

To model this, I decided to add four keywords and an option
`activation.strategy` to declarative containers:

* `strategy = "none"` means that the container will be entirely ignored
  by `switch-to-configuration`.

* `strategy = "restart"` will always `machinectl reboot` the container
  if a change was detected.

* `strategy = "reload"` will always `systemctl reload` the container if
  a change was detected.

* `strategy = "dynamic"` will check what has changed inside the
  container. If only the NixOS config inside the container has changed,
  a reload will be scheduled, otherwise a restart.
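
The four keywords can be summarized in a small, self-contained Nix sketch. The `decide` helper and its argument names are hypothetical and exist only for illustration; the real decision is made by `switch-to-configuration`, not by Nix code:

```nix
# Evaluate with: nix-instantiate --eval --strict strategy-sketch.nix
let
  # Hypothetical helper. `hostSideChanged` stands for changes such as a new
  # bind-mount, `nixosConfigChanged` for a changed NixOS config inside the
  # container.
  decide = { strategy, hostSideChanged, nixosConfigChanged }:
    if strategy == "none" || !(hostSideChanged || nixosConfigChanged) then
      "ignore"
    else if strategy == "restart" then
      "machinectl reboot"
    else if strategy == "reload" then
      "systemctl reload"
    else if strategy == "dynamic" then
      # Only a config change inside the container allows a plain reload.
      (if hostSideChanged then "machinectl reboot" else "systemctl reload")
    else
      throw "unknown strategy: ${strategy}";
in
{
  newBindMount = decide {
    strategy = "dynamic";
    hostSideChanged = true;
    nixosConfigChanged = false;
  };
  configOnly = decide {
    strategy = "dynamic";
    hostSideChanged = false;
    nixosConfigChanged = true;
  };
}
```

Assuming the option lives below `nixos.containers.instances.<name>`, selecting a strategy for a container `foo` would presumably read `nixos.containers.instances.foo.activation.strategy = "dynamic";`.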

Also did a nearly full rewrite of the activation test to cover several
corner-cases and combinations of such settings.

nixos/containers-next: add read-only `nixos.containers.rendered` option

This option is an attr-set that maps containers to their NixOS
configuration since `nixos.containers.instances` directly transforms the
config to a NixOS derivation. Also, the raw `nixos.containers.instances`
isn't really usable since it usually contains a list of chunks that are
evaluated by the module-system.

This is actually useful to introspect the configuration just as it's
done with e.g. `resources.machines`[1] in nixops. For instance, I'm
configuring my Prometheus scraping targets like this by gathering all active
exporters in my machines and their containers:

    { config, lib, ... }: with lib;
    let
      containers = flip mapAttrsToList machine.nixos.containers.rendered (const (x: x.config));
    in
      flip concatMap (attrValues containers)
        (c: flip concatMap (attrValues c.services.prometheus.exporters)
          (exporter:
            (optional exporter.enable "${config.networking.fqdn}:${toString exporter.port}")))

[1] https://nixos.mayflower.consulting/blog/2018/10/26/nixops-machine-configs/

nixos/all-tests: register tests

Also add a `jobset.nix` to test this on my self-hosted Hydra (which btw
uses this feature already :p).

nixos/containers-next: make sure that the module works fine with `restrictedEval` being active

This is necessary to get it running on my Hydra.

nixos/containers-next: add test for SSH inside a nspawn machine

Just another small testcase to confirm that the container's network
works fine.

nixos/containers-next: enable private users by default

nixos/systemd-nspawn: make `/etc/systemd/nspawn` mutable

Now only `/etc/systemd/nspawn/<name>.nspawn` will be a symlink rather
than having the full directory as a symlink. This is actually consistent
with `networkd` (both don't have alternate locations for transient units)
and will become necessary when implementing imperative containers since
these should also use nspawn units.

nixos/containers-next: fix eval after 21.05 breaking changes

`stdenv.lib` and `pkgs.utillinux` are deprecated now and cause an
error when disallowing aliases (which is the default when evaluating
nixpkgs).

nixos-nspawn: init

This is a first draft for imperative containers - basically a
replacement for `nixos-container` - based on Python. It's still missing
a few features, but is actually a working POC with the following
key-differences:

* Rather than Perl, Python is used now. While the choice of a language
  is always debatable, I'm pretty convinced that Python is easier to
  access than Perl and a lot more people are willing to write Python
  code (that's for instance the reason why the test-driver was
  eventually ported to Python).

* Similar to `extra-container`[1], this also contains way more features
  than the stock `nixos-container` implementation. This is because we
  basically provide all options from `nixos.containers` and evaluate
  them after that. The additional configs (such as
  `activation`/`network`/etc) are rendered into JSON and can be read by
  the script to imperatively create `.nspawn` & `.network` units.

[1] https://github.com/erikarvstedt/extra-container

nixos/containers-next: implement proper user-namespacing support

Now we're doing correct user-namespacing here as well; for that, a few
filesystem-fixes had to be applied.

For more context, please refer to NixOS#67336
Also credits go to the author of the aforementioned PR, I basically
pulled these changes into this branch.

nixos/containers-next: add support for `LoadCredential=`

With user-namespacing set to `pick`[1], bind-mounts will always be owned
by `nobody:nogroup`. This is a problem for secrets since these shouldn't
be world-readable and with a `nobody:nogroup` from another
user-namespace (the `root` inside the container isn't an actual `root`
anymore) the secrets would be unreadable.

To work around this, `LoadCredential=` can be used. In fact, using
`--load-credential` - unfortunately there's no switch for
`.nspawn`-units - passes a secret into a container where it can be
re-used by using the host's credential-ID as `path` in a `.service`-file
inside the container.

So basically

    {
      nixos.containers.instances.foo.credentials = [
        { id = "foo"; path = "/run/secrets/foo";}
      ];
    }

makes the secret available as `/run/host/credentials/foo` and by
specifying

    LoadCredential=foo:foo

in `example.service`, the credential will be readable by the `ExecStart=`
inside `example.service` from `/run/credentials/example.service/foo`.
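
For the container side, a minimal sketch of such an `example.service` written as NixOS config could look as follows. The `oneshot` type and the `wc` call are assumptions added for illustration; only the `LoadCredential` line comes from the description above:

```nix
{
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      # Left side: name inside this service; right side: the ID passed in
      # from the host.
      LoadCredential = "foo:foo";
    };
    script = ''
      # systemd exports the per-service credential directory in this variable.
      wc -c < "$CREDENTIALS_DIRECTORY/foo"
    '';
  };
}
```

The script only prints the size of the secret, so nothing sensitive ends up in the journal.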

[1] https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#--private-users=

nixos/containers-next-imperative: init

sudo-nspawn: init

This is a slightly modified sudo enabling `--enable-static-sudoers`
which ensures that `sudoers.so` is linked statically into the
executable[1]:

>  --enable-static-sudoers
>        By default, the sudoers plugin is built and installed as a
>        dynamic shared object.  When the --enable-static-sudoers
>        option is specified, the sudoers plugin is compiled directly
>        into the sudo binary.  Unlike --disable-shared, this does
>        not prevent other plugins from being used and the intercept
>        and noexec options will continue to function.

This is necessary here because of user-namespaced `nspawn`-instances:
these have their own UID/GID-range. If a container called `ldap` has
`PrivateUsers=pick` enabled, this may look like this:

    $ ls /var/lib/machines
    drwxr-xr-x 15 vu-ldap-0  vg-ldap-0  15 Mar 11  2021 ldap
    -rw-------  1 root       root        0 Sep 12 16:13 .#ldap.lck
    $ id vu-ldap-0
    uid=1758003200(vu-ldap-0) gid=65534(nogroup) groups=65534(nogroup)

However, this means that bind-mounts (such as `/nix/store`) will be
owned by `nobody:nogroup` which is a problem for `sudo(8)` which expects
`sudoers.so` being owned by `root`.

To work around this, the aforementioned configure-flag will be used to
ensure that this library is statically linked into `bin/sudo` itself. We
cannot do a full static build though since `sudo(8)` still needs to
`dlopen(3)` various other libraries to function properly with PAM.
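
A hedged sketch of how such a variant could be expressed as an overlay, assuming the regular `sudo` package from nixpkgs as the base (the actual package definition in this branch may be written differently):

```nix
# Overlay sketch: derive `sudo-nspawn` from the stock `sudo` package.
final: prev: {
  sudo-nspawn = prev.sudo.overrideAttrs (old: {
    # Compile the sudoers plugin into the sudo binary instead of
    # loading `sudoers.so` at runtime.
    configureFlags = (old.configureFlags or [ ]) ++ [ "--enable-static-sudoers" ];
  });
}
```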

[1] https://www.sudo.ws/install.html

nixos/switch-to-configuration: fix a few problems with nspawn instances

Config activation of declarative containers used to be error-prone in
some cases:

* If a machine was powered off and had its config changed, the
  activation broke like this:

      systemd-nspawn@ldap.service is not active, cannot reload.

  The easiest workaround is to just skip inactive containers. The
  host-side configuration - i.e. the `nspawn`-unit and (optionally) the
  network configuration - is still activated and will be used on the
  next start.

* Sometimes, `systemd-nspawn@`-instances are marked to be started by the
  diffing-code. This should not happen since `systemd-nspawn@`-instances
  are now treated specially which means that these will only be started
  if they're newly added.

* If both `dbus.service` and an arbitrary container will be reloaded in
  the same transaction (i.e. in the same `systemctl reload`-call) this
  will freeze the system making it unreachable even via `ssh(1)` for
  about two minutes and leaving the following errors in the log:

      Sep 11 21:32:16 roflmayr systemd[1]: Reloading D-Bus System Message Bus.
      Sep 11 21:32:41 roflmayr dbus-send[1868379]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
      Sep 11 21:32:41 roflmayr systemd[1]: dbus.service: Control process exited, code=exited, status=1/FAILURE
      Sep 11 21:32:41 roflmayr systemd[1]: Reload failed for D-Bus System Message Bus.

  While I'm not entirely sure what's going on here, I realized that this
  issue disappears if all services that are scheduled for reload are
  processed before the containers. I guess that this avoids host-side
  system-services interfering with a container's system-manager.

nixos-nspawn: misc improvements & cleanups

This enhances the test-coverage of the script significantly and also
adds fixes for a few existing problems such as

* missing call-traces
* a spurious error when invoking the command without arguments

and cleans the code up a bit.

nixos/containers-next: move to subdir and factor out defaults for containers

This was done because imperative & declarative containers have a common
base configuration that was duplicated before, so moving it into a file
used by both facilities is better here.

To avoid cluttering the `virtualisation/`-subtree of NixOS too much, I
decided to create a new subdir for this.

nixos-nspawn: implement activation & networking

However only in a simplified manner - my main intention was to write a
replacement for the `containers`-module and this was just a side-effect,
so further features should be implemented by the community.

Basically, `nixos-nspawn update` now activates the config on its own,
but without support for `strategy = "dynamic";` to avoid having to
duplicate the Perl implementation here. Instead, one of
`reload`/`restart`/`none` is the default and can be overridden with
`nixos-nspawn --reload` / `nixos-nspawn --restart`. Since this is a
completely manual change anyway, this is IMHO good enough for now. The
same applies to `nixos-nspawn rollback`.
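
A minimal sketch of how such an override could be resolved. The helper name `resolve_strategy` and its parameters are hypothetical and not taken from the actual `nixos-nspawn` code; treating `--reload` and `--restart` as mutually exclusive is likewise an assumption:

```python
# Strategies the imperative tool supports according to the text above;
# "dynamic" is intentionally absent.
SUPPORTED = {"reload", "restart", "none"}


def resolve_strategy(default: str, reload: bool = False, restart: bool = False) -> str:
    """Return the activation strategy for `update`/`rollback` (hypothetical helper)."""
    if reload and restart:
        # Assumption: both flags at once are rejected rather than prioritised.
        raise ValueError("--reload and --restart are mutually exclusive")
    if reload:
        return "reload"
    if restart:
        return "restart"
    if default not in SUPPORTED:
        raise ValueError(f"unsupported strategy: {default!r}")
    return default
```

A command-line flag wins over the default; without a flag the default is returned unchanged, and `dynamic` is rejected.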

Also, the rendered `.network`-units now support addresses just like
declarative containers do with the exception of IPv6 SLAAC because I'd
have to imperatively change `radvd` for this which is out of scope[1].

Finally, the test was enhanced to cover more cases related to the new
features.

[1] Actually, this would introduce too much impurity anyways. Instead,
    `networkd` should implement IPv6 SLAAC for nspawn on its own so we
    can remove `radvd` and properly implement this here.

nixos/activation-scripts: turn off `var`-script for containers

It's already taken care of and only causes `permission denied`-errors
that make config activations seem failed even though they aren't.

Revert "nixos/activation-scripts: turn off `var`-script for containers"

This reverts commit 6f281b9ad31cf6d9ef396de788d06ea4e35f8112.

This is not a good idea since the `var`-activation-script is
actually the component that ensures that `/var/empty` exists, which is
`$HOME` for quite a number of services.

nixos/containers-next: only create OS structure in `/var/lib/machines` if it doesn't exist

Re-creating it afterwards can mess up permissions if the container is
using a private user-namespace. This actually solves the activation
issues and the `var`-script can still be used in here.
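
A minimal sketch of the "create only if missing" idea. The helper `ensure_os_structure` and the directory names are assumptions for illustration, not the module's actual implementation:

```python
import os


def ensure_os_structure(root: str, subdirs=("etc", "var/lib")) -> bool:
    """Create a skeleton below `root` only if `root` does not exist yet.

    Returns True if the skeleton was created and False if `root` was
    already present and therefore left untouched (including its ownership
    and permissions).
    """
    if os.path.exists(root):
        return False
    os.makedirs(root, mode=0o755)
    for sub in subdirs:
        os.makedirs(os.path.join(root, sub), mode=0o755, exist_ok=True)
    return True
```

The point of the early return is that an existing tree, whose ownership may already have been shifted for a private user-namespace, is never modified again.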

nixos/tests/containers-next: add testcase for custom `ResolvConf`-setting

nixos/container-migration-test: confirm that nixos-container is still usable after switching to the new API

nixos/containers-next: assert that networkd is used

nixos/tests/containers-next-imperative: ensure that imperative containers can be powered off without state issues

nixos/tests/container-migration: fix eval

nixos/containers-next: fix eval

nixos/qemu-vm: increase /boot to 120M

Otherwise test-cases that install several NixOS generations into `/boot`
will fail with `No space left on device`.

nixos/container-migration: actually move state of containers

nixos/containers-next: fix test

nixos/containers-next: s/literalExample/literalExpression/g

nixos/useHostResolvConf: deprecate option

nixos/containers-next-imperative: fix test

* Don't use underscores in hostnames, this appears to break
  systemd-resolved now.
* Minor fixes for the test.
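
To illustrate the hostname constraint, here is a small sketch of a generic RFC 1123-style check that rejects underscores. It is not what `systemd-resolved` itself runs, and the function name `is_valid_hostname` is hypothetical:

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen. Underscores are not allowed.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label of `name` is a valid label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) is not None for label in name.split("."))
```

With this check a name such as `web-1` passes while `web_1` is rejected.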

nixos/containers-next: fix `systemd-networkd-wait-online.service` hanging indefinitely

See NixOS#140669 (comment)
for further context.

Co-authored-by: Franz Pletz <fpletz@fnordicwalking.de>
Co-authored-by: zseri <zseri.devel@ytrizja.de>

nixos/containers-next: config -> system-config

nixos/containers-next: confirm that exposed hostnames also work for services like nginx

nixos/containers-next: review fixes

* Fix naming of migration test.
* Explain why `persistentBookDisk` is needed.
* Document that `jobset.nix` is only temporary and should be removed
  before merging.
* Remove superfluous `touch $out`.

sudo-nspawn: merge with `pkgs.sudo`

The feature can now be activated via `withStaticSudoers`. Also, the
patches aren't needed anymore since these are part of the current
`sudo`-release that's also in `nixpkgs`.

nixos-nspawn: refactor python setup

* Simplify shebangs
* Fix `python3`-inclusion on `nix-shell`-shebang
* Don't `flake8` the code on build.

Co-authored-by: Sandro <sandro.jaeckel@gmail.com>

nixos/qemu-vm: fix manual evaluation

containers-next: Support independent use of container-options.nix

containers-next: Add bindMounts option

containers-next: Don't shut down imperative containers during rebuild
@Artturin Artturin modified the milestones: 21.05, 23.05 Dec 31, 2022
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Dec 31, 2022
Princemachiavelli pushed a commit to Princemachiavelli/nixpkgs that referenced this pull request May 10, 2023
Now we're doing correct user-namespacing here as well; for that, a few
filesystem-fixes had to be applied.

For more context, please refer to NixOS#67336
Credits also go to the author of the aforementioned PR; I basically
pulled these changes into this branch.
@RaitoBezarius RaitoBezarius modified the milestones: 23.05, 23.11 May 31, 2023
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024
Development

Successfully merging this pull request may close these issues.

containers: reload fails with user namespace enabled