nixos/systemd-nspawn: reload or restart machines on config change #84608
I started working on a draft for improved nixos-containers (using `networkd` and `.nspawn` units) after the networkd hackathon[1]; it isn't published yet. Quite recently I realized that when a `.nspawn` unit changes, `switch-to-configuration.pl` doesn't activate those changes. This patch takes care of that with the following changes:

* It's possible to declare whether to restart or reload such a unit; restart is the default. In either case the `systemd-nspawn@<machine-name>.service`[2] unit will be restarted or reloaded.
* By default, all `.nspawn` units are part of `machines.target`.
* A VM test covers all of those cases, including a custom reload script that activates a new configuration in the machine.
* I had to remove the `--keep-unit` flag on startup to fix restarting the unit. This is a known issue[3].

It's also possible to use a reload to activate a new configuration inside an nspawn machine with a config like this:

```nix
{ pkgs, ... }: {
  systemd.nspawn.test-container.reloadOnChange = true;
  systemd.nspawn.test-container.restartOnChange = false;
  systemd.services."systemd-nspawn@test-container".serviceConfig.ExecReload = "${pkgs.writeScriptBin "activate" ''
    #! ${pkgs.runtimeShell} -xe
    systemd-run --machine test-container --pty --quiet -- /bin/sh --login -c \
      '${containerCfg}/bin/switch-to-configuration test'
  ''}/bin/activate";
}
```

[1] https://discourse.nixos.org/t/networkd-sprint-2019-11-23-24-in-munich/4578
[2] https://github.com/systemd/systemd/blob/v243/units/systemd-nspawn@.service.in
[3] NixOS#80169
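For context, the new `restartOnChange`/`reloadOnChange` options sit next to the existing `systemd.nspawn.<name>` sections (`execConfig`, `filesConfig`, `networkConfig`), which map directly to `systemd.nspawn(5)` keys. A minimal sketch of a declared machine keeping the default restart behaviour (the machine name and bind path are illustrative, not from the patch):

```nix
{
  # Renders /etc/systemd/nspawn/test-container.nspawn. With this patch,
  # a change to the rendered unit restarts
  # systemd-nspawn@test-container.service (the default behaviour).
  systemd.nspawn.test-container = {
    execConfig.Boot = true;                       # [Exec] Boot=yes
    filesConfig.BindReadOnly = [ "/nix/store" ];  # [Files] BindReadOnly=
    networkConfig.VirtualEthernet = true;         # [Network] VirtualEthernet=yes
    restartOnChange = true;                       # option added by this patch
  };
}
```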
```diff
@@ -7,6 +7,7 @@
 use Net::DBus;
 use Sys::Syslog qw(:standard :macros);
 use Cwd 'abs_path';
+use experimental 'smartmatch';
```
Not sure if that's a good idea.
```diff
@@ -44,6 +44,23 @@ let
+  instanceOptions = {
+    options = sharedOptions // {
+      restartOnChange = mkOption {
+        default = true;
```
Would it be better to make this `false` by default as well?
We actively flipped the default for this: part of the network initialization unfortunately happens inside a shell script outside the container, and it doesn't get applied if we only apply the config inside the container.
```diff
@@ -126,7 +153,7 @@ in {
   systemd.services."systemd-nspawn@".serviceConfig.ExecStart = [
     "" # deliberately empty. signals systemd to override the ExecStart
     # Only difference between upstream is that we do not pass the -U flag
-    "${config.systemd.package}/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth --settings=override --machine=%i"
+    "${config.systemd.package}/bin/systemd-nspawn --quiet --boot --link-journal=try-guest --network-veth --settings=override --machine=%i"
```
Removing `--keep-unit` here shouldn't be necessary anymore since v245, given you already have `KillSignal = "SIGRTMIN+3";`. We really ought to test whether these issues are fixed in 245 and otherwise update the upstream systemd ticket.
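The `KillSignal` setting referenced here would be a plain service-level override on the nspawn template unit; a minimal sketch (the source confirms the value, the placement on the template unit is an assumption):

```nix
{
  # SIGRTMIN+3 requests a clean shutdown from the systemd running as
  # PID 1 inside the container (see systemd(1), "Signals"), so the
  # service manager can stop the unit without --keep-unit.
  systemd.services."systemd-nspawn@".serviceConfig.KillSignal = "SIGRTMIN+3";
}
```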
This is blocked on the systemd bump, which is blocked on some of the tests currently failing (more context in #nixos-systemd).
@Ma27 can you check this again now that systemd 245.3 has been merged to staging?
One other remark: one problem that I see now is that a change to one nspawn container will trigger a restart of other nspawn containers.
Can you explain why? This doesn't restart everything from
But that wouldn't cover the reload, right?
That's fine, I'll change it accordingly as soon as the bump is out. I don't expect this to get merged too soon; I mainly wanted to get some feedback for now :)

Makes sense, thanks. Let's pair soon on IRC on picking up the systemd bump again :)

Closing for now. In order to function properly, this needs way more work. As soon as I have time, I'll publish a basic draft for improved nixos-containers where I'll provide a solution similar to what I did here.
This is basically what I tried in NixOS#84608 at first - being able to reload or restart a container based on the NixOS-specific `re{load,start}IfChanged` options for systemd units, but with a few differences:

* I switched back to using `nsenter(1)` from util-linux for the same rationale as in ebb6e38: without this, the activation would hang until a timeout was exceeded if the service manager inside the container is reloaded.
* I also disabled `systemd-networkd-wait-online.service` inside the container because it'd also hang even if the interfaces are configured properly. We should investigate how to fix it / whether it was already fixed at some point.

Also implemented a small test to ensure that a config activation works fine, even with networking.
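The wait-online workaround described above can be sketched as container-side NixOS config. `systemd.network.wait-online.enable` is the dedicated option that later NixOS releases expose for this; whether this exact mechanism matches the draft is an assumption:

```nix
{
  # Keep network-online.target inside the container from hanging on
  # systemd-networkd-wait-online.service, even though networkd on the
  # host has already configured the veth interface.
  systemd.network.wait-online.enable = false;
}
```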
Sometimes it's needed to build a configuration within a `nix-build` for systemd units. While this is fairly easy for `.service` units (where you can easily define overrides), it's not possible for `systemd-nspawn(1)`. This is mostly a hack to get dedicated bind-mounts of store paths from `pkgs.closureInfo` into the configuration without IFD. In the long term, we either want to fix this in systemd or find a more suited solution though.

nixos/containers-next: initialize first draft for new NixOS containers w/networkd

This is the first batch of changes for a new container module replacing the current `nixos-container` subsystem in the long term. The state in here is still strongly inspired by the `containers`[1] module to declare declarative nspawn instances by using NixOS config for the host and the container itself. For now, this module uses the tentative namespace `nixos.containers`, but that's subject to change.

This new module will also contain the following key differences:

* Rather than writing a big abstraction layer on top, we'll rely on `.nspawn` units[2]. This has the benefit that we can stop adding options for each new nspawn feature (such as MACVLANs, ephemeral instances, etc.) because it can be directly written into the `.nspawn` unit using the module system, like

  ```nix
  systemd.nspawn.foo.filesConfig = { BindReadOnly = /* ... */ };
  ```

  Also, administrators don't need to learn too much about our abstractions; they only need to know a few basics about the module system and how to write systemd units.

* This feature strictly enforces `systemd-networkd` on both the container & the host. It can be turned off for containers in the host namespace without a private network though. The reason for this is that the current `nixos-container` implementation has the long-standing bug that the container's uplink is broken *until* the container has booted, since the host side of the veth pair is configured in `ExecStartPost=`[3].
This is because there's no proper way to take care of it in an earlier stage, since `systemd-nspawn` creates the interface itself. This has e.g. the implication that services inside the container wrongly assume that they can connect to e.g. an external database via network (since `network{,-online}.target` was reached); however, this is not the case due to the unconfigured host-side veth interface. When using `systemd-networkd(8)` on both sides, this is not the case anymore, since systemd will automatically take care of configuring the network correctly when an nspawn unit starts and `networkd` is active.

Apart from a basic draft, this also contains support for RFC1918 IPv4 addresses configured via DHCP and ULA IPv6 addresses configured via SLAAC and `radvd(8)`, including support for ephemeral containers. Further additions such as a better config-activation mechanism and a tool to manage containers imperatively will follow.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-containers
[2] https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#
[3] https://github.com/NixOS/nixpkgs/blob/8b0f315b7691adcee291b2ff139a1beed7c50d94/nixos/modules/virtualisation/nixos-containers.nix#L189-L240
nixos/containers-next: implement small wrapper for nspawn port-forwards

This exposes a given `containerPort` on the host address. So if port 80 from the container is forwarded to the host's port 8080, and the container uses `2001:DB8::42` and the host side uses `2001:DB8::23` on the veth interface, then `[2001:DB8::42]:80` will be available on the host as `[2001:DB8::23]:8080`.

nixos/containers-next: implement more advanced networking tests

This change tests various combinations of static & dynamic addressing and also fixes a bug where `radvd(8)` was erroneously configured for veth pairs where it's actually not needed. This test is also supposed to show how to use `systemd` configuration to implement most of the features (for instance, there's no custom set of options to implement MACVLANs) and serves as a regression test for future `systemd` updates in NixOS. Please note that the `ndppd` hack is only here because QEMU doesn't do proper IPv6 neighbour resolution. In fact, I left comments wherever workarounds were needed for the testing facility.

nixos/tests/container-migration: init

This test is supposed to demonstrate how to migrate a single container to the new subsystem. Of course, docs on how to rewrite the config aren't written yet; this is mainly a POC to show that it's generally possible by:

* Deploying a new configuration (using `nixos.containers`) equivalent to the old one.
* Moving the state from `/var/lib/containers` to `/var/lib/machines`.
* Rebooting the host - unfortunately - because otherwise `systemd-networkd` will reach an inconsistent state, at least with v247.
For the reboot part, I also had to change the QEMU vm-builder a bit to actually support persistent boot disks.

nixos/containers-next: allow static configuration for a virtual zone as well

This is already the case for dynamically assigned addresses (e.g. via SLAAC or DHCPv4), where `0.0.0.0/24` and `::/64` provide a pool of private IPs. However, if such a zone is supposed to be fully static, the same should be possible as well.

nixos/switch-to-configuration: import old config activation changes

This is basically what I tried in NixOS#84608 at first - being able to reload or restart a container based on the NixOS-specific `re{load,start}IfChanged` options for systemd units, but with a few differences:

* I switched back to using `nsenter(1)` from util-linux for the same rationale as in ebb6e38: without this, the activation would hang until a timeout was exceeded if the service manager inside the container is reloaded.
* I also disabled `systemd-networkd-wait-online.service` inside the container because it'd also hang even if the interfaces are configured properly. We should investigate how to fix it / whether it was already fixed at some point.

Also implemented a small test to ensure that a config activation works fine, even with networking.

nixos/containers-next: fix broken machinectl reboot and probably more

It seems as if systemd ignores `systemd-nspawn@` (the template unit) if an override exists alongside a custom unit for the service (i.e.
`systemd-nspawn@containername.service`):

```
[root@server:~]# systemctl status systemd-nspawn@ldap
● systemd-nspawn@ldap.service
     Loaded: loaded (/nix/store/rm4kigdbzl78iai8jfbgxbslvalk8bwa-unit-systemd-nspawn-ldap.service/systemd-nspawn@ldap.service; linked; vendor preset: enabled)
    Drop-In: /nix/store/fr9zabpvp3077cbb6jnpxm42qxqw9yk2-system-units/systemd-nspawn@.service.d
             └─overrides.conf
     Active: active (running) since Tue 2021-03-16 15:01:32 UTC; 23min ago
```

This breaks at least `machinectl reboot`, which needs `RestartForceExitStatus = 133` as a setting. For now, I've added all settings to the module itself.

nixos/switch-to-configuration: implement more generic decisions for config activations in containers

Actually, using `re{load,start}IfChanged` isn't the best decision for containers, because some containers have to be reloaded or restarted depending on what has changed. For instance, a new bind mount requires a `machinectl reboot`, but a change in the NixOS config only needs a `systemctl reload` (which runs `switch-to-configuration` inside the container). To model this, I decided to add four keywords and an option `activation.strategy` to declarative containers:

* `strategy = "none"` means that the container will be entirely ignored by `switch-to-configuration`.
* `strategy = "restart"` will always `machinectl reboot` the container if a change was detected.
* `strategy = "reload"` will always `systemctl reload` the container if a change was detected.
* `strategy = "dynamic"` will check what has changed inside the container. If only the NixOS config inside the container has changed, a reload will be scheduled, otherwise a restart.

Also did a nearly full rewrite of the activation test to cover several corner cases and combinations of such settings.
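Under the tentative `nixos.containers` namespace, the activation strategy would be set per instance; a minimal sketch (the instance name is illustrative, and the exact option path is an assumption based on this draft):

```nix
{
  nixos.containers.instances.test-container = {
    # "dynamic": schedule a reload if only the inner NixOS config
    # changed, otherwise fall back to a machinectl reboot.
    activation.strategy = "dynamic";
  };
}
```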
nixos/containers-next: add read-only `nixos.containers.rendered` option

This option is an attrset that maps containers to their NixOS configuration, since `nixos.containers.instances` directly transforms the config into a NixOS derivation. Also, the raw `nixos.containers.instances` isn't really usable, since it usually contains a list of chunks that are evaluated by the module system. This is actually useful to introspect the configuration, just as it's done with e.g. `resources.machines`[1] in nixops. For instance, I'm configuring my Prometheus scraping targets like this, by gathering all active exporters on my machines and in their containers:

```nix
{ config, lib, ... }:

with lib;

let
  containers = flip mapAttrsToList machine.nixos.containers.rendered
    (const (x: x.config));
in
  flip concatMap (attrValues containers) (c:
    flip concatMap (attrValues c.services.prometheus.exporters) (exporter:
      (optional exporter.enable
        "${config.networking.fqdn}:${toString exporter.port}")))
```

[1] https://nixos.mayflower.consulting/blog/2018/10/26/nixops-machine-configs/

nixos/all-tests: register tests

Also add a `jobset.nix` to test this on my self-hosted Hydra (which btw uses this feature already :p).

nixos/containers-next: make sure that the module works fine with `restrictedEval` being active

This is necessary to get it running on my Hydra.

nixos/containers-next: add test for SSH inside a nspawn machine

Just another small testcase to confirm that the container's network works fine.

nixos/containers-next: enable private users by default

nixos/systemd-nspawn: make `/etc/systemd/nspawn` mutable

Now only `/etc/systemd/nspawn/<name>.nspawn` will be a symlink, rather than having the full directory as a symlink. This is actually consistent with `networkd` (both don't have alternate locations for transient units) and will become necessary when implementing imperative containers, since these should also use nspawn units.
nixos/containers-next: fix eval after 21.05 breaking changes

`stdenv.lib` and `pkgs.utillinux` are deprecated now and cause an error when disallowing aliases (which is the default when evaluating nixpkgs).

nixos-nspawn: init

This is a first draft for imperative containers - basically a replacement for `nixos-container` - based on Python. It's still missing a few features, but is actually a working POC with the following key differences:

* Rather than Perl, Python is used now. While the choice of a language is always debatable, I'm pretty convinced that Python is easier to access than Perl, and a lot more people are willing to write Python code (that's, for instance, the reason why the test driver was eventually ported to Python).
* Similar to `extra-container`[1], this also contains way more features than the stock `nixos-container` implementation. This is because we basically provide all options from `nixos.containers` and evaluate them after that. The additional configs (such as `activation`/`network`/etc.) are rendered into JSON and can be read by the script to imperatively create `.nspawn` & `.network` units.

[1] https://github.com/erikarvstedt/extra-container

nixos/containers-next: implement proper user-namespacing support

Now we're doing user-namespacing correctly here as well; for that, a few filesystem fixes had to be applied. For more context, please refer to NixOS#67336. Also, credits go to the author of the aforementioned PR; I basically pulled these changes into this branch.

nixos/containers-next: add support for `LoadCredential=`

With user-namespacing set to `pick`[1], bind-mounts will always be owned by `nouser:nogroup`. This is a problem for secrets, since these shouldn't be world-readable, and with a `nouser:nogroup` from another user-namespace (the `root` inside the container isn't an actual `root` anymore) the secrets would be unreadable. To work around this, `LoadCredential=` can be used.
In fact, using `--load-credential` - unfortunately there's no switch for `.nspawn` units - passes a secret into a container, where it can be re-used by using the host's credential ID as `path` in a `.service` file inside the container. So basically

```nix
{
  nixos.containers.instances.foo.credentials = [
    { id = "foo"; path = "/run/secrets/foo"; }
  ];
}
```

makes the secret available as `/run/host/credentials/foo`, and by specifying

```
LoadCredential=foo:foo
```

in `example.service`, the credential will be readable by the `ExecStart=` inside `example.service` from `/run/credentials/example.service/foo`.

[1] https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#--private-users=

nixos/containers-next-imperative: init

sudo-nspawn: init

This is a slightly modified sudo enabling `--enable-static-sudoers`, which ensures that `sudoers.so` is linked statically into the executable[1]:

> --enable-static-sudoers
> By default, the sudoers plugin is built and installed as a dynamic shared object. When the --enable-static-sudoers option is specified, the sudoers plugin is compiled directly into the sudo binary. Unlike --disable-shared, this does not prevent other plugins from being used, and the intercept and noexec options will continue to function.

This is necessary here because of user-namespaced `nspawn` instances: these have their own UID/GID range. If a container called `ldap` has `PrivateUsers=pick` enabled, this may look like this:

```
$ ls /var/lib/machines
drwxr-xr-x 15 vu-ldap-0 vg-ldap-0 15 Mar 11  2021 ldap
-rw-------  1 root      root       0 Sep 12 16:13 .#ldap.lck

$ id vu-ldap-0
uid=1758003200(vu-ldap-0) gid=65534(nogroup) groups=65534(nogroup)
```

However, this means that bind-mounts (such as `/nix/store`) will be owned by `nobody:nogroup`, which is a problem for `sudo(8)`, which expects `sudoers.so` to be owned by `root`. To work around this, the aforementioned configure flag will be used to ensure that this library is statically linked into `bin/sudo` itself.
We cannot do a full static build though, since `sudo(8)` still needs to `dlopen(3)` various other libraries to function properly with PAM.

[1] https://www.sudo.ws/install.html

nixos/switch-to-configuration: fix a few problems with nspawn instances

Config activation of declarative containers used to be error-prone in some cases:

* If a machine was powered off and had its config changed, the activation broke like this:

  ```
  systemd-nspawn@ldap.service is not active, cannot reload.
  ```

  The easiest workaround is to just skip inactive containers. The host-side configuration - i.e. the `.nspawn` unit and (optionally) the network configuration - is still activated and will be used on the next start.

* Sometimes, `systemd-nspawn@` instances are marked to be started by the diffing code. This should not happen, since `systemd-nspawn@` instances are now treated specially, which means that these will only be started if they're newly added.

* If both `dbus.service` and an arbitrary container are reloaded in the same transaction (i.e. in the same `systemctl reload` call), this will freeze the system, making it unreachable even via `ssh(1)` for about two minutes and leaving the following errors in the log:

  ```
  Sep 11 21:32:16 roflmayr systemd[1]: Reloading D-Bus System Message Bus.
  Sep 11 21:32:41 roflmayr dbus-send[1868379]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
  Sep 11 21:32:41 roflmayr systemd[1]: dbus.service: Control process exited, code=exited, status=1/FAILURE
  Sep 11 21:32:41 roflmayr systemd[1]: Reload failed for D-Bus System Message Bus.
  ```

  While I'm not entirely sure what's going on here, I realized that this issue disappears if all services that are scheduled for reload are processed before the containers.
I guess that this avoids host-side system services interfering with a container's system manager.

nixos-nspawn: misc improvements & cleanups

This enhances the test coverage of the script significantly, adds fixes for a few existing problems such as

* missing call traces
* a spurious error when invoking the command without arguments

and cleans the code up a bit.

nixos/containers-next: move to subdir and factor out defaults for containers

This was done because imperative & declarative containers have a common base configuration that was duplicated before, so moving it into a file used by both facilities is better here. To avoid cluttering the `virtualisation/` subtree of NixOS too much, I decided to create a new subdir for this.

nixos-nspawn: implement activation & networking

However, only in a simplified manner - my main intention was to write a replacement for the `containers` module and this was just a side effect, so further features should be implemented by the community. Basically, `nixos-nspawn update` now activates the config on its own, but without support for `strategy = "dynamic";` to avoid having to duplicate the Perl implementation here. Instead, one of `reload`/`restart`/`none` is the default and can be overridden with `nixos-nspawn --reload` / `nixos-nspawn --restart`. Since this is a completely manual change anyways, this is IMHO good enough for now. The same applies to `nixos-nspawn rollback`.

Also, the rendered `.network` units now support addresses just like declarative containers do, with the exception of IPv6 SLAAC, because I'd have to imperatively change `radvd` for this, which is out of scope[1]. Finally, the test was enhanced to cover more cases related to the new features.

[1] Actually, this would introduce too much impurity anyways. Instead, `networkd` should implement IPv6 SLAAC for nspawn on its own, so we can remove `radvd` and properly implement this here.
nixos/activation-scripts: turn off `var` script for containers

It's already taken care of and only causes `permission denied` errors that make config activations seem failed even though they aren't.

Revert "nixos/activation-scripts: turn off `var`-script for containers"

This reverts commit 6f281b9ad31cf6d9ef396de788d06ea4e35f8112. This is actually not a good idea, since the `var` activation script is actually the component that ensures that `/var/empty` exists, which is `$HOME` for quite a number of services.

nixos/containers-next: only create OS structure in `/var/lib/machines` if it doesn't exist

Because after that, this can screw with permissions if the container is using a private user-namespace. This actually solves the activation issues, and the `var` script can still be used here.

nixos/tests/containers-next: add testcase for custom `ResolvConf` setting

nixos/container-migration-test: confirm that nixos-container is still usable after switching to the new API

nixos/containers-next: assert that networkd is used

nixos/tests/containers-next-imperative: ensure that imperative containers can be powered off without state issues

nixos/tests/container-migration: fix eval

nixos/containers-next: fix eval

nixos/qemu-vm: increase /boot to 120M

Otherwise, test cases that install several NixOS generations into `/boot` will fail with `No space left on device`.

nixos/container-migration: actually move state of containers

nixos/containers-next: fix test

nixos/containers-next: s/literalExample/literalExpression/g

nixos/useHostResolvConf: deprecate option

nixos/containers-next-imperative: fix test

* Don't use underscores in hostnames; this appears to break systemd-resolved now.
* Minor fixes for the test.

nixos/containers-next: fix `systemd-networkd-wait-online.service` hanging indefinitely

See NixOS#140669 (comment) for further context.
Co-authored-by: Franz Pletz <fpletz@fnordicwalking.de>
Co-authored-by: zseri <zseri.devel@ytrizja.de>

nixos/containers-next: config -> system-config

nixos/containers-next: confirm that exposed hostnames also work for services like nginx

nixos/containers-next: review fixes

* Fix naming of migration test.
* Explain why `persistentBootDisk` is needed.
* Document that `jobset.nix` is only temporary and should be removed before merging.
* Remove superfluous `touch $out`.

sudo-nspawn: merge with `pkgs.sudo`

The feature can now be activated via `withStaticSudoers`. Also, the patches aren't needed anymore, since these are part of the current `sudo` release that's also in `nixpkgs`.

nixos-nspawn: refactor python setup

* Simplify shebangs
* Fix `python3` inclusion in the `nix-shell` shebang
* Don't `flake8` the code on build.

Co-authored-by: Sandro <sandro.jaeckel@gmail.com>

nixos/qemu-vm: fix manual evaluation

containers-next: support independent use of container-options.nix

containers-next: add bindMounts option

containers-next: don't shut down imperative containers during rebuild