nixos/systemd-nspawn: reload or restart machines on config change #84608

Closed. Ma27 wanted to merge 1 commit.

Ma27 (Member) commented Apr 7, 2020

Motivation for this change

After the networkd hackathon[1], I started working on a draft for
improved nixos-containers (using `networkd` and `.nspawn` units), which
isn't published yet.

Recently, I realized that `switch-to-configuration.pl` doesn't activate
changes to a `.nspawn` unit. This patch takes care of it with the
following changes:

  • It's possible to declare whether to restart or reload such a unit;
    restarting is the default. In either case, the corresponding
    `systemd-nspawn@<machine-name>.service` unit[2] is restarted or reloaded.

  • By default, all `.nspawn` units are part of `machines.target`.

  • A VM test covers all of these cases, including a custom reload script
    that activates a new configuration inside the machine.

  • I had to remove the `--keep-unit` flag on startup to fix restarting
    the unit. This is a known issue[3].

A reload can also be used to activate a new configuration inside an
nspawn machine, with a config like this:

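    { pkgs, ... }:
    let
      # Reload script: runs `switch-to-configuration test` inside the running
      # machine. `containerCfg` (the container's built NixOS system) is not
      # defined in this snippet and has to come from the surrounding config.
      activate = pkgs.writeScriptBin "activate" ''
        #! ${pkgs.runtimeShell} -xe
        systemd-run --machine test-container --pty --quiet -- /bin/sh --login -c \
          '${containerCfg}/bin/switch-to-configuration test'
      '';
    in {
      # Reload instead of restarting the machine when its .nspawn unit changes.
      systemd.nspawn.test-container.reloadOnChange = true;
      systemd.nspawn.test-container.restartOnChange = false;
      systemd.services."systemd-nspawn@test-container".serviceConfig.ExecReload =
        "${activate}/bin/activate";
    }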

[1] https://discourse.nixos.org/t/networkd-sprint-2019-11-23-24-in-munich/4578
[2] https://github.com/systemd/systemd/blob/v243/units/systemd-nspawn@.service.in
[3] #80169

    @@ -7,6 +7,7 @@
     use Net::DBus;
     use Sys::Syslog qw(:standard :macros);
     use Cwd 'abs_path';
    +use experimental 'smartmatch';

Ma27 (Member Author):

Not sure if that's a good idea.

    @@ -44,6 +44,23 @@ let

      instanceOptions = {
        options = sharedOptions // {
          restartOnChange = mkOption {
            default = true;

Ma27 (Member Author):

Would it be better to make this false by default as well?

Contributor:

We deliberately flipped the default for this: part of the network initialization unfortunately happens in a shell script outside the container, and it isn't applied if we only activate the config inside the container.

Resolved review thread on nixos/modules/system/boot/systemd-nspawn.nix
    @@ -126,7 +153,7 @@ in {
       systemd.services."systemd-nspawn@".serviceConfig.ExecStart = [
         "" # deliberately empty. signals systemd to override the ExecStart
         # Only difference between upstream is that we do not pass the -U flag
    -    "${config.systemd.package}/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth --settings=override --machine=%i"
    +    "${config.systemd.package}/bin/systemd-nspawn --quiet --boot --link-journal=try-guest --network-veth --settings=override --machine=%i"

Member:

Removing `--keep-unit` here shouldn't be necessary anymore since v245, given that you already have `KillSignal = "SIGRTMIN+3";`. We really ought to test whether these issues are fixed in 245 and, if not, update the upstream systemd ticket.

flokli (Contributor), Apr 7, 2020:

This is blocked on the systemd bump, which in turn is blocked on some currently failing tests (more context in #nixos-systemd).

Contributor:

@Ma27, can you check this again now that systemd 245.3 has been merged to staging?

arianvp (Member) commented Apr 7, 2020

One other remark: I remember that in the past, touching `systemd.nspawn.<name>` options didn't trigger a restart or reload of the service. Is that something we want to have? Perhaps by using `systemd.services.<name>.restartTriggers = environment.etc."systemd/nspawn".source`?

One problem I see with that now is that a change to one nspawn container would trigger a restart of the other nspawn containers. Perhaps we want each nspawn file to be its own derivation to support this properly.
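A minimal sketch of a per-machine variant of this idea, assuming the `systemd.nspawn` module from this PR: rather than triggering on the whole `/etc/systemd/nspawn` directory, each machine's unit is triggered only by a hash of its own settings. The machine name `test-container` is illustrative. Note that declaring `systemd.services."systemd-nspawn@<name>"` creates an instance-specific unit, which may shadow settings from the upstream template unit.

```nix
{ config, ... }:
let
  # Settings of this one machine; nothing else feeds into its trigger.
  cfg = config.systemd.nspawn.test-container;
in {
  # Hypothetical machine, declared here so that `cfg` exists.
  systemd.nspawn.test-container.execConfig.Boot = true;

  # The hash only changes when test-container's own .nspawn settings
  # change, so editing another machine leaves this unit untouched.
  systemd.services."systemd-nspawn@test-container".restartTriggers = [
    (builtins.hashString "sha256"
      (builtins.toJSON { inherit (cfg) execConfig filesConfig networkConfig; }))
  ];
}
```

Only the three section attrsets are serialized, because the full submodule value may carry internal attributes that `builtins.toJSON` cannot handle. As pointed out in the reply, a restart trigger like this covers restarts but not reloads.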

Ma27 (Member Author) commented Apr 7, 2020

> One problem with that I see now is that change of one nspawn container will trigger restart

Can you explain why? This doesn't restart every instance of `systemd-nspawn@.service`, right?

> Using systemd.services.<name>.restartTriggers = environment.etc."systemd/nspawn".source ?

But that wouldn't cover the reload, right?

> This is blocked on the systemd bump, which is blocked on some of the tests failing currently (more context in #nixos-systemd)

That's fine, I'll change it accordingly as soon as the bump is out. I don't expect this to get merged too soon; I mainly wanted to get some feedback for now :)

flokli (Contributor) commented Apr 7, 2020

> That's fine, I'll change it accordingly as soon as the bump is out. I don't expect this to get merged too soon, I mainly wanted to get some feedback for now :)

Makes sense, thanks. Let's pair soon on IRC on picking up the systemd bump again :)

Ma27 (Member Author) commented Apr 25, 2020

Closing for now. To function properly, this needs much more work. As soon as I have time, I'll publish a basic draft for improved nixos-containers that provides a solution similar to what I did here.

Ma27 closed this Apr 25, 2020
Ma27 deleted the nspawn-reload branch Apr 25, 2020 22:48
Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Jan 4, 2022
This is basically what I tried in NixOS#84608 at first - being able to reload
or restart a container based on the NixOS-specific
`re{load,start}IfChanged` options for systemd units, but with a few
differences:

* I switched back to using `nsenter(1)` from util-linux for the same
  rationale as in ebb6e38: without
  this, the activation would hang until a timeout is exceeded if the
  service-manager inside the container is reloaded.

* I also disabled `systemd-networkd-wait-online.service` inside the
  container because it'd also hang even if the interfaces are configured
  properly. We should investigate how to fix it / if it was already
  fixed at some point.

Also implemented a small test to ensure that a config-activation works
fine, even with networking.
m1cr0man pushed a commit to m1cr0man/nixpkgs that referenced this pull request Dec 6, 2022
Sometimes it's needed to build a configuration within a `nix-build` for
systemd units. While this is fairly easy for .service-units (where you
can easily define overrides), it's not possible for `systemd-nspawn(1)`.

This is mostly a hack to get dedicated bind-mounts of store paths from
`pkgs.closureInfo` into the configuration without IFD.

In the long term, we want to either fix this in systemd or find a
better-suited solution.

nixos/containers-next: initialize first draft for new NixOS containers w/networkd

This is the first batch of changes for a new container module replacing
the current `nixos-container` subsystem in the long term.

The state in here is still strongly inspired by the
`containers`[1]-module to declare declarative nspawn-instances by using
NixOS config for the host and the container itself.

For now, this module uses the tentative namespace `nixos.containers`,
but that's subject to change.

This new module will also contain the following key-differences:

* Rather than writing a big abstraction-layer on top, we'll rely on
  `.nspawn`-units[2]. This has the benefits that (1) we can stop adding
  options for each new nspawn-feature (such as MACVLANs, ephemeral
  instances, etc.) because it can be directly written into the
  `.nspawn`-unit using the module system like

      systemd.nspawn.foo.filesConfig = {
        BindReadOnly = /* ... */;
      };

  Also, administrators don't need to learn too much about our
  abstractions, they only need to know a few basics about the
  module-system and how to write systemd units.

* This feature strictly enforces `systemd-networkd` on both the
  container & the host. It can be turned off for containers in the
  host-namespace without a private network though.

  The reason for this is that the current `nixos-container`
  implementation has the long-standing bug that the container's uplink
  is broken *until* the container has booted, since the host side of the
  veth pair is configured in `ExecStartPost=`[3]. This is because
  there's no proper way to take care of it at an earlier stage, since
  `systemd-nspawn` creates the interface itself.

  As a consequence, services inside the container may wrongly assume
  that they can reach, e.g., an external database over the network
  (since `network{,-online}.target` was reached), although they can't
  because the host-side veth interface isn't configured yet.

  However, when using `systemd-networkd(8)` on both sides, this is no
  longer the case, since systemd automatically takes care of
  configuring the network correctly when an nspawn unit starts and
  `networkd` is active.
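As a rough illustration of both points, here is a sketch of a raw nspawn feature written straight into a `.nspawn` unit through the existing `systemd.nspawn` options, with networkd enabled on the host. The machine name `foo` and the host interface `eth0` are hypothetical, and this is not the module's actual output.

```nix
{ ... }: {
  # Host: systemd-networkd manages the links (the strict requirement
  # described above).
  networking.useNetworkd = true;

  # A raw [Network] setting written straight into foo.nspawn: give the
  # container a MACVLAN interface on the host's eth0. No dedicated
  # container option is needed for this.
  systemd.nspawn.foo.networkConfig.MACVLAN = "eth0";
}
```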

Apart from a basic draft, this also contains support for RFC1918
IPv4-addresses configured via DHCP and ULA-IPv6 addresses configured via
SLAAC and `radvd(8)` including support for ephemeral containers.

Further additions such as a better config-activation mechanism
and a tool to manage containers imperatively will follow.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-containers
[2] https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#
[3] https://github.com/NixOS/nixpkgs/blob/8b0f315b7691adcee291b2ff139a1beed7c50d94/nixos/modules/virtualisation/nixos-containers.nix#L189-L240

nixos/containers-next: implement small wrapper for nspawn port-forwards

This exposes a given `containerPort` on the host address. So if port 80
of the container is forwarded to the host's port 8080, the container
uses `2001:DB8::42` and the host side uses `2001:DB8::23` on the
veth interface, then `[2001:DB8::42]:80` will be available on the
host as `[2001:DB8::23]:8080`.
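Underneath, `systemd-nspawn` has its own `Port=` setting in the `[Network]` section. It takes `protocol:host-port:container-port` and only works with private networking. Here is a sketch of the 8080 → 80 example expressed that way, assuming the wrapper renders to `Port=`. The machine name `foo` is hypothetical, and the wrapper's own option set is not reproduced here.

```nix
{ ... }: {
  systemd.nspawn.foo.networkConfig = {
    # Port forwarding needs a private network, e.g. a veth pair.
    VirtualEthernet = true;
    # Forward TCP port 8080 on the host to port 80 in the container.
    Port = [ "tcp:8080:80" ];
  };
}
```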

nixos/containers-next: implement more advanced networking tests

This change tests various combinations of static & dynamic addressing
and also fixes a bug where `radvd(8)` was erroneously configured for
veth-pairs where it's actually not needed.

This test is also supposed to show how to use `systemd`-configuration to
implement most of the features (for instance there's no custom set of
options to implement MACVLANs) and serves as regression-test for future
`systemd`-updates in NixOS.

Please note that the `ndppd` hack is only here because QEMU doesn't do
proper IPv6 neighbour resolution. In general, I left comments wherever
workarounds were needed for the testing facility.

nixos/tests/container-migration: init

This test is supposed to demonstrate how to migrate a single container
to the new subsystem. Docs on how to rewrite the config aren't written
yet; this is mainly a POC to show that it's generally possible
by

* Deploying a new configuration (using `nixos.containers`) being
  equivalent to the old one.
* Moving the state from `/var/lib/containers` to `/var/lib/machines`.
* Rebooting the host - unfortunately - because otherwise
  `systemd-networkd` will reach an inconsistent state - at least with
  v247.

For the reboot-part I also had to change the QEMU vm-builder a bit to
actually support persistent boot-disks.

nixos/containers-next: allow static configuration for a virtual zone as well

This is already the case for dynamically assigned addresses (e.g. via
SLAAC or DHCPv4), where `0.0.0.0/24` and `::/64` provide a pool of
private IPs. However, if such a zone is supposed to be fully static,
that should be possible as well.
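nspawn's `Zone=` setting puts containers of the same zone on a shared host bridge named `vz-<zone>`. Below is a minimal sketch of a fully static zone in plain networkd terms. It assumes a zone called `internal` and a machine called `foo`, and it uses documentation addresses chosen purely for illustration. The containers-next option interface for this is not shown in the commit.

```nix
{ ... }: {
  networking.useNetworkd = true;

  # Put container `foo` into the zone "internal" (host bridge vz-internal).
  systemd.nspawn.foo.networkConfig.Zone = "internal";

  # Static addresses on the zone bridge instead of a 0.0.0.0/24 or ::/64 pool.
  systemd.network.networks."20-vz-internal" = {
    matchConfig.Name = "vz-internal";
    address = [ "192.0.2.1/24" "2001:db8::1/64" ];
  };
}
```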

nixos/switch-to-configuration: import old config activation changes

This is basically what I tried in NixOS#84608 at first - being able to reload
or restart a container based on the NixOS-specific
`re{load,start}IfChanged` options for systemd units, but with a few
differences:

* I switched back to using `nsenter(1)` from util-linux for the same
  rationale as in ebb6e38: without
  this, the activation would hang until a timeout is exceeded if the
  service-manager inside the container is reloaded.

* I also disabled `systemd-networkd-wait-online.service` inside the
  container because it'd also hang even if the interfaces are configured
  properly. We should investigate how to fix it / if it was already
  fixed at some point.

Also implemented a small test to ensure that a config-activation works
fine, even with networking.

nixos/containers-next: fix broken machinectl reboot and probably more

It seems that systemd ignores `systemd-nspawn@` (the template unit) if
both an override and a custom unit for the instance (i.e.
`systemd-nspawn@containername.service`) exist:

    [root@server:~]# systemctl status systemd-nspawn@ldap
    ● systemd-nspawn@ldap.service
         Loaded: loaded (/nix/store/rm4kigdbzl78iai8jfbgxbslvalk8bwa-unit-systemd-nspawn-ldap.service/systemd-nspawn@ldap.service; linked; vendor preset: enabled)
        Drop-In: /nix/store/fr9zabpvp3077cbb6jnpxm42qxqw9yk2-system-units/systemd-nspawn@.service.d
                 └─overrides.conf
         Active: active (running) since Tue 2021-03-16 15:01:32 UTC; 23min ago

This breaks at least `machinectl reboot` which needs
`RestartForceExitStatus = 133` as setting. For now, I've added all
settings to the module itself.
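Here is a minimal sketch of the setting this commit mentions, applied to the `ldap` instance from the output above. It assumes that `systemd-nspawn` exits with status 133 when the container requests a reboot, which is why the upstream template unit forces a restart on that status. `SuccessExitStatus` is included on the assumption that the upstream template sets it as well.

```nix
{ ... }: {
  systemd.services."systemd-nspawn@ldap".serviceConfig = {
    # systemd-nspawn exits with 133 when the container asks to reboot
    # (e.g. via `machinectl reboot`); treat that as "start it again".
    RestartForceExitStatus = 133;
    # ...and don't record that exit as a failure.
    SuccessExitStatus = 133;
  };
}
```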

nixos/switch-to-configuration: Implement more generic decisions for config activations in containers

Actually, using `re{load,start}IfChanged` isn't the best decision for
containers because some containers have to be reloaded or restarted
depending on what has changed. For instance, a new bind-mount requires a
`machinectl reboot`, but a change in the NixOS config only needs a
`systemctl reload` (which runs `switch-to-configuration` inside the
container).

To model this, I added an option `activation.strategy` with four
possible values to declarative containers:

* `strategy = "none"` means that the container will be entirely ignored
  by `switch-to-configuration`.

* `strategy = "restart"` will always `machinectl reboot` the container
  if a change was detected.

* `strategy = "reload"` will always `systemctl reload` the container if
  a change was detected.

* `strategy = "dynamic"` will check what has changed inside the
  container. If only the NixOS config inside the container has changed,
  a reload will be scheduled, otherwise a restart.

Also did a nearly full rewrite of the activation test to cover several
corner cases and combinations of such settings.
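For illustration, here is a sketch of selecting these strategies for declarative containers. The option path combines `nixos.containers.instances` with the `activation.strategy` option described above. The container names `ldap` and `scratch` are hypothetical.

```nix
{ ... }: {
  # Reload when only the container's NixOS config changed; restart
  # (`machinectl reboot`) for anything else, e.g. a new bind mount.
  nixos.containers.instances.ldap.activation.strategy = "dynamic";

  # Leave this container alone during switch-to-configuration.
  nixos.containers.instances.scratch.activation.strategy = "none";
}
```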

nixos/containers-next: add read-only `nixos.containers.rendered` option

This option is an attr-set that maps containers to their NixOS
configuration since `nixos.containers.instances` directly transforms the
config to a NixOS derivation. Also, the raw `nixos.containers.instances`
isn't really usable since it usually contains a list of chunks that are
evaluated by the module-system.

This is actually useful to introspect the configuration just as it's
done with e.g. `resources.machines`[1] in nixops. For instance, I'm
configuring my Prometheus scraping targets like this by gathering all active
exporters in my machines and their containers:

    { config, lib, ... }: with lib;
    # `machine` is assumed to be an evaluated machine (e.g. from nixops'
    # `resources.machines`); it is not bound in this snippet.
    let
      # List of the containers' evaluated NixOS configs.
      containers = mapAttrsToList (_: x: x.config) machine.nixos.containers.rendered;
    in
      flip concatMap containers
        (c: flip concatMap (attrValues c.services.prometheus.exporters)
          (exporter:
            (optional exporter.enable "${config.networking.fqdn}:${toString exporter.port}")))

[1] https://nixos.mayflower.consulting/blog/2018/10/26/nixops-machine-configs/

nixos/all-tests: register tests

Also add a `jobset.nix` to test this on my self-hosted Hydra (which btw
uses this feature already :p).

nixos/containers-next: make sure that the module works fine with `restrictedEval` being active

This is necessary to get it running on my Hydra.

nixos/containers-next: add test for SSH inside a nspawn machine

Just another small testcase to confirm that the container's network
works fine.

nixos/containers-next: enable private users by default

nixos/systemd-nspawn: make `/etc/systemd/nspawn` mutable

Now only `/etc/systemd/nspawn/<name>.nspawn` will be a symlink rather
than having the full directory as a symlink. This is actually consistent
with `networkd` (both don't have alternate locations for transient units)
and will become necessary when implementing imperative containers since
these should also use nspawn units.

nixos/containers-next: fix eval after 21.05 breaking changes

`stdenv.lib` and `pkgs.utillinux` are deprecated now and cause an
error when disallowing aliases (which is the default when evaluating
nixpkgs).

nixos-nspawn: init

This is a first draft for imperative containers - basically a
replacement for `nixos-container` - based on Python. It's still missing
a few features, but is actually a working POC with the following
key-differences:

* Rather than Perl, Python is used now. While the choice of a language
  is always debatable, I'm pretty convinced that Python is easier to
  access than Perl and a lot more people are willing to write Python
  code (that's for instance the reason why the test-driver was
  eventually ported to Python).

* Similar to `extra-container`[1], this also contains way more features
  than the stock `nixos-container` implementation. This is because we
  basically provide all options from `nixos.containers` and evaluate
  them after that. The additional configs (such as
  `activation`/`network`/etc) are rendered into JSON and can be read by
  the script to imperatively create `.nspawn` & `.network` units.

[1] https://github.com/erikarvstedt/extra-container

nixos/containers-next: implement proper user-namespacing support

Now we're doing user-namespacing correctly here as well; for that, a
few filesystem fixes had to be applied.

For more context, please refer to NixOS#67336.
Also, credit goes to the author of the aforementioned PR; I basically
pulled these changes into this branch.

nixos/containers-next: add support for `LoadCredential=`

With user-namespacing set to `pick`[1], bind-mounts will always be owned
by `nouser:nogroup`. This is a problem for secrets since these shouldn't
be world-readable and with a `nouser:nogroup` from another
user-namespace (the `root` inside container isn't an actual `root`
anymore) the secrets would be unreadable.

To work around this, `LoadCredential=` can be used. Passing
`--load-credential` to `systemd-nspawn` (unfortunately there's no
equivalent setting for `.nspawn` units) hands a secret to the container,
where a `.service` file can pick it up again by using the host's
credential ID as the `path` of its own `LoadCredential=`.

So basically

    {
      nixos.containers.instances.foo.credentials = [
        { id = "foo"; path = "/run/secrets/foo";}
      ];
    }

makes the secret available as `/run/host/credentials/foo` and by
specifying

    LoadCredential=foo:foo

in `example.service`, the credential will be readable by the `ExecStart=`
inside `example.service` from `/run/credentials/example.service/foo`.
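On the container side, here is a minimal sketch of such an `example.service`, declared in the container's NixOS configuration. It only checks that the credential is readable. It relies on systemd exporting the unit's credential directory as `$CREDENTIALS_DIRECTORY`. The `oneshot` setup is illustrative.

```nix
{ pkgs, ... }: {
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      # Copy the credential "foo" handed in by systemd-nspawn into this
      # unit's own credential directory under the same ID.
      LoadCredential = "foo:foo";
      # systemd expands ${CREDENTIALS_DIRECTORY} to
      # /run/credentials/example.service; the backslash stops Nix from
      # interpolating it.
      ExecStart = "${pkgs.coreutils}/bin/test -r \${CREDENTIALS_DIRECTORY}/foo";
    };
  };
}
```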

[1] https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#--private-users=

nixos/containers-next-imperative: init

sudo-nspawn: init

This is a slightly modified sudo enabling `--enable-static-sudoers`
which ensures that `sudoers.so` is linked statically into the
executable[1]:

>  --enable-static-sudoers
>        By default, the sudoers plugin is built and installed as a
>        dynamic shared object.  When the --enable-static-sudoers
>        option is specified, the sudoers plugin is compiled directly
>        into the sudo binary.  Unlike --disable-shared, this does
>        not prevent other plugins from being used and the intercept
>        and noexec options will continue to function.

This is necessary here because of user-namespaced `nspawn`-instances:
these have their own UID/GID-range. If a container called `ldap` has
`PrivateUsers=pick` enabled, this may look like this:

    $ ls /var/lib/machines
    drwxr-xr-x 15 vu-ldap-0  vg-ldap-0  15 Mar 11  2021 ldap
    -rw-------  1 root       root        0 Sep 12 16:13 .#ldap.lck
    $ id vu-ldap-0
    uid=1758003200(vu-ldap-0) gid=65534(nogroup) groups=65534(nogroup)

However, this means that bind-mounts (such as `/nix/store`) will be
owned by `nobody:nogroup` which is a problem for `sudo(8)` which expects
`sudoers.so` being owned by `root`.

To work around this, the aforementioned configure-flag will be used to
ensure that this library is statically linked into `bin/sudo` itself. We
cannot do a full static build though since `sudo(8)` still needs to
`dlopen(3)` various other libraries to function properly with PAM.

[1] https://www.sudo.ws/install.html

nixos/switch-to-configuration: fix a few problems with nspawn instances

Config activation of declarative containers used to be error-prone in
some cases:

* If a machine was powered off and had its config changed, the
  activation broke like this:

      systemd-nspawn@ldap.service is not active, cannot reload.

  The easiest workaround is to just skip inactive containers. The
  host-side configuration - i.e. the `nspawn`-unit and (optionally) the
  network configuration - is still activated and will be used on the
  next start.

* Sometimes, `systemd-nspawn@`-instances are marked to be started by the
  diffing-code. This should not happen since `systemd-nspawn@`-instances
  are now treated specially which means that these will only be started
  if they're newly added.

* If both `dbus.service` and an arbitrary container will be reloaded in
  the same transaction (i.e. in the same `systemctl reload`-call) this
  will freeze the system making it unreachable even via `ssh(1)` for
  about two minutes and leaving the following errors in the log:

      Sep 11 21:32:16 roflmayr systemd[1]: Reloading D-Bus System Message Bus.
      Sep 11 21:32:41 roflmayr dbus-send[1868379]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
      Sep 11 21:32:41 roflmayr systemd[1]: dbus.service: Control process exited, code=exited, status=1/FAILURE
      Sep 11 21:32:41 roflmayr systemd[1]: Reload failed for D-Bus System Message Bus.

  While I'm not entirely sure what's going on here, I realized that this
  issue disappears if all services that are scheduled for reload are
  processed before the containers. I guess that this avoids host-side
  system-services interfering with a container's system-manager.

nixos-nspawn: misc improvements & cleanups

This enhances the test-coverage of the script significantly and also
adds fixes for a few existing problems such as

* missing call-traces
* a spurious error when invoking the command without arguments

and cleans the code up a bit.

nixos/containers-next: move to subdir and factor out defaults for containers

This was done because imperative & declarative containers have a common
base configuration that was duplicated before, so moving it into a file
used by both facilities is better here.

To avoid cluttering the `virtualisation/`-subtree of NixOS too much, I
decided to create a new subdir for this.

nixos-nspawn: implement activation & networking

However only in a simplified manner - my main intention was to write a
replacement for the `containers`-module and this was just a side-effect,
so further features should be implemented by the community.

Basically, `nixos-nspawn update` now activates the config on its own,
but without support for `strategy = "dynamic";` to avoid having to
duplicate the Perl implementation here. Instead, one of
`reload`/`restart`/`none` is used by default, which can be overridden
with `nixos-nspawn --reload` / `nixos-nspawn --restart`. Since this is a
completely manual change anyway, this is IMHO good enough for now. The
same applies to `nixos-nspawn rollback`.

Also, the rendered `.network`-units now support addresses just like
declarative containers do with the exception of IPv6 SLAAC because I'd
have to imperatively change `radvd` for this which is out of scope[1].

Finally, the test was enhanced to cover more cases related to the new
features.

[1] Actually, this would introduce too much impurity anyway. Instead,
    `networkd` should implement IPv6 SLAAC for nspawn on its own so we
    can remove `radvd` and properly implement this here.

nixos/activation-scripts: turn off `var`-script for containers

It's already taken care of and only causes `permission denied`-errors
that make config activations seem failed even though they aren't.

Revert "nixos/activation-scripts: turn off `var`-script for containers"

This reverts commit 6f281b9ad31cf6d9ef396de788d06ea4e35f8112.

This is not a good idea, since the `var` activation script is the
component that ensures that `/var/empty` exists, which is `$HOME` for
quite a number of services.

nixos/containers-next: only create OS structure in `/var/lib/machines` if it doesn't exist

Recreating it after that can mess up permissions if the container uses
a private user namespace. This actually solves the activation issues,
and the `var` script can still be used here.

nixos/tests/containers-next: add testcase for custom `ResolvConf`-setting

nixos/container-migration-test: confirm that nixos-container is still usable after switching to the new API

nixos/containers-next: assert that networkd is used

nixos/tests/containers-next-imperative: ensure that imperative containers can be powered off without state issues

nixos/tests/container-migration: fix eval

nixos/containers-next: fix eval

nixos/qemu-vm: increase /boot to 120M

Otherwise test-cases that install several NixOS generations into `/boot`
will fail with `No space left on device`.

nixos/container-migration: actually move state of containers

nixos/containers-next: fix test

nixos/containers-next: s/literalExample/literalExpression/g

nixos/useHostResolvConf: deprecate option

nixos/containers-next-imperative: fix test

* Don't use underscores in hostnames, this appears to break
  systemd-resolved now.
* Minor fixes for the test.

nixos/containers-next: fix `systemd-networkd-wait-online.service` hanging indefinitely

See NixOS#140669 (comment)
for further context.

Co-authored-by: Franz Pletz <fpletz@fnordicwalking.de>
Co-authored-by: zseri <zseri.devel@ytrizja.de>

nixos/containers-next: config -> system-config

nixos/containers-next: confirm that exposed hostnames also work for services like nginx

nixos/containers-next: review fixes

* Fix naming of migration test.
* Explain why `persistentBookDisk` is needed.
* Document that `jobset.nix` is only temporary and should be removed
  before merging.
* Remove superfluous `touch $out`.

sudo-nspawn: merge with `pkgs.sudo`

The feature can now be activated via `withStaticSudoers`. Also, the
patches aren't needed anymore since these are part of the current
`sudo`-release that's also in `nixpkgs`.
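Going by this commit, enabling the feature inside a user-namespaced container's configuration could look like the sketch below. `withStaticSudoers` is the override argument named above. `security.sudo.package` is the NixOS option for choosing the `sudo` build.

```nix
{ pkgs, ... }: {
  # Link the sudoers plugin into the sudo binary itself, so sudo no longer
  # loads a sudoers.so from a bind-mounted (nobody-owned) /nix/store.
  security.sudo.package = pkgs.sudo.override { withStaticSudoers = true; };
}
```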

nixos-nspawn: refactor python setup

* Simplify shebangs
* Fix `python3`-inclusion on `nix-shell`-shebang
* Don't `flake8` the code on build.

Co-authored-by: Sandro <sandro.jaeckel@gmail.com>

nixos/qemu-vm: fix manual evaluation

containers-next: Support independent use of container-options.nix

containers-next: Add bindMounts option

containers-next: Don't shut down imperative containers during rebuild
Princemachiavelli pushed a commit to Princemachiavelli/nixpkgs that referenced this pull request May 10, 2023

3 participants