Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd: switch to unified cgroup hierarchy by default #104094

Merged
merged 4 commits into from Nov 22, 2020

Conversation

flokli
Copy link
Contributor

@flokli flokli commented Nov 17, 2020

Other distros, like Fedora 32 have already switched to it.

Users that need to keep using the old cgroup hierarchy (namely because
they want to run docker, and not some of the alternatives supporting
cgroupsv2 can re-enable it by setting systemd.enableUnifiedCgroupHierarchy = false;.

Fixes #73800

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@flokli flokli added this to To Do in systemd via automation Nov 17, 2020
Copy link
Member

@grahamc grahamc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this in my config:

   boot.kernelParams = (if config.virtualisation.docker.enable then [] else [ "cgroup_no_v1=all" "systemd.unified_cgroup_hierarchy=yes" ]);

maybe we should automatically specify this for the docker services?

nixos/doc/manual/release-notes/rl-2103.xml Outdated Show resolved Hide resolved
Copy link
Contributor Author

@flokli flokli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this in my config:

boot.kernelParams = (if config.virtualisation.docker.enable then [] else [ "cgroup_no_v1=all" "systemd.unified_cgroup_hierarchy=yes" ]);

maybe we should automatically specify this for the docker services?

Yes, adding "systemd.unified_cgroup_hierarchy=0 if docker is enabled might make sense, at least as long as docker doesn't support cgroupv2.
We probably should add link to the upstream tracking issues.

It seems it's supported in runc: opencontainers/runc#2315 (comment) and should be supported in docker 20, which was "not officially released yet, and its ETA still unpublished" in September 2020: moby/moby#40360 (comment)

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Nov 18, 2020

Docker 20.10.0-rc1 (https://github.com/moby/moby/releases) was released today which should add cgroupsv2 and apparmor 3.0-beta support. Maybe we can wait until it is released?

@flokli
Copy link
Contributor Author

flokli commented Nov 18, 2020

@SuperSandro2000 are you aware of any timelines on docker 20.* getting released? Would you be up to packaging their rc, to see how it behaves? ;-)

I surely want to add/extend some more tests about this before merging, and see how config options for other container runtimes might need to look like (if it requires special configuration).

@ajs124
Copy link
Member

ajs124 commented Nov 18, 2020

This will make user@ services fail when using with the hardened profile. Specifically this line of the hardened profile and this issue marked as "not supported".

There are several solutions to this and it definitely should not block this, but I thought it was worth noting.

cc @joachifm (because of the hardened profile).

@flokli flokli force-pushed the systemd-unified-cgroup-hierarchy branch from 26541b9 to aa64c18 Compare November 18, 2020 23:18
@flokli
Copy link
Contributor Author

flokli commented Nov 18, 2020

I pushed a new version exposing a systemd.unifiedCgroupHierarchy option (defaulting to true, adding the kernel cmdline if set to false), that is automatically set to false by hidepid and when docker is enabled.

I still want to some checks for IOAccounting and IPAccounting to our systemd tests.

@flokli
Copy link
Contributor Author

flokli commented Nov 19, 2020

I also did some more experiments:

  • cri-o works fine with unified cgroups.
  • k3s with docker as container runtime obviously still needs unified cgroups disabled
  • k3s with their bundled containerd doesn't work with cgroupsv2 either

I made sure the unified cgroup hierarchy is disabled in these cases and the tests succeed again.

@flokli flokli marked this pull request as ready for review November 19, 2020 01:33
@flokli
Copy link
Contributor Author

flokli commented Nov 19, 2020

I added some more tests on the accounting parts. I couldn't reliably get IO Accounting on individual units tested, but the test ensures system.slice has non-zero IO accounting values :-)

This is ready for review, PTAL.

@SuperSandro2000
Copy link
Member

are you aware of any timelines on docker 20.* getting released?

No but I assume it could take a month.

Would you be up to packaging their rc, to see how it behaves?

I can get it to build but for testing I miss experience.

@flokli
Copy link
Contributor Author

flokli commented Nov 19, 2020

are you aware of any timelines on docker 20.* getting released?

No but I assume it could take a month.

Okay. So I propose we keep this PR as-is (disable unified cgroup when you enable docker), and we can remove this once docker has landed and supports it.

Would you be up to packaging their rc, to see how it behaves?

I can get it to build but for testing I miss experience.

nixosTests.docker should probably be enough to verify basic functionality, and once that works, people could take a more closer look in that new PR :-)

@SuperSandro2000
Copy link
Member

nixosTests.docker should probably be enough to verify basic functionality, and once that works, people could take a more closer look in that new PR :-)

KVM does not want to work with nix tests on my machine. Not sure why.

Copy link
Member

@ajs124 ajs124 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, in general. Some notes, although they're not too important.

nixos/modules/security/hidepid.nix Outdated Show resolved Hide resolved
nixos/tests/systemd.nix Show resolved Hide resolved
@joachifm joachifm removed their request for review November 19, 2020 15:43
See https://www.redhat.com/sysadmin/fedora-31-control-group-v2 for
details on why this is desirable, and how it impacts containers.

Users that need to keep using the old cgroup hierarchy can re-enable it
by setting `systemd.unifiedCgroupHierarchy` to `false`.

Well-known candidates not supporting that hierarchy, like docker and
hidepid=… will disable it automatically.

Fixes NixOS#73800
This gets automatically disabled by docker if the docker backend is
used, but the bundled containerd also doesn't seem to support cgroupsv2,
so disable it explicitly here, too.
For now, testing IO Accounting is skipped, as it seems to be either
broken, or hard to reproduce in a VM.
@flokli flokli force-pushed the systemd-unified-cgroup-hierarchy branch from d5d53d6 to f683297 Compare November 19, 2020 16:00
@flokli flokli requested a review from zowoq as a code owner November 19, 2020 23:18
podman.succeed(
su_cmd(
"podman run --runtime=crun -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
"podman run -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the explicit --runtime=crun/ Run container rootless with crun please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why? podman seems to default to it with cgroupsv2, and if we view this test from a user perspective, they just want to podman run as non-root, without explicitly configuring a specific runtime.

also cc @adisbladis

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to say I agree with @flokli on this one. It's better that we test the users default config.
Also I seem to recall seeing some intent to switch back to runc by default at some point once it's gained the required cgroupsv2 support (don't quote me on this one).

Copy link
Contributor

@zowoq zowoq Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this instead?

Edit: just realised that the original diff was incomplete.

diff --git a/nixos/tests/podman.nix b/nixos/tests/podman.nix
index cd8c2b4308c..3b54d793b1b 100644
--- a/nixos/tests/podman.nix
+++ b/nixos/tests/podman.nix
@@ -53,15 +53,24 @@ import ./make-test-python.nix (
           podman.succeed("podman stop sleeping")
           podman.succeed("podman rm sleeping")
 
+      with subtest("Run container as root with default runtime"):
+          podman.succeed("tar cv --files-from /dev/null | podman import - scratchimg")
+          podman.succeed(
+              "podman run -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
+          )
+          podman.succeed("podman ps | grep sleeping")
+          podman.succeed("podman stop sleeping")
+          podman.succeed("podman rm sleeping")
+
       with subtest("Run container rootless with runc"):
           podman.succeed(su_cmd("tar cv --files-from /dev/null | podman import - scratchimg"))
-          podman.succeed(
+          podman.fail(
               su_cmd(
                   "podman run --runtime=runc -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
               )
           )
-          podman.succeed(su_cmd("podman ps | grep sleeping"))
-          podman.succeed(su_cmd("podman stop sleeping"))
+          podman.fail(su_cmd("podman ps | grep sleeping"))
+          podman.fail(su_cmd("podman stop sleeping"))
           podman.succeed(su_cmd("podman rm sleeping"))
 
       with subtest("Run container rootless with crun"):
@@ -74,6 +83,17 @@ import ./make-test-python.nix (
           podman.succeed(su_cmd("podman ps | grep sleeping"))
           podman.succeed(su_cmd("podman stop sleeping"))
           podman.succeed(su_cmd("podman rm sleeping"))
+
+      with subtest("Run container rootless with default runtime"):
+          podman.succeed(su_cmd("tar cv --files-from /dev/null | podman import - scratchimg"))
+          podman.succeed(
+              su_cmd(
+                  "podman run -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
+              )
+          )
+          podman.succeed(su_cmd("podman ps | grep sleeping"))
+          podman.succeed(su_cmd("podman stop sleeping"))
+          podman.succeed(su_cmd("podman rm sleeping"))
     '';
   }
 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think our test should suddenly fail if podmans runc backend starts working in rootless scenarios again.

I'm fine adding a test for the default backend in as-root scenarios, but made the rootless runc subtest a comment, which we can re-add once it gets implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think our test should suddenly fail if podmans runc backend starts working in rootless scenarios again.

I'm fine adding a test for the default backend in as-root scenarios, but made the rootless runc subtest a comment, which we can re-add once it gets implemented.

All of these packages have passthru.tests so the failure will be noticed when it does start working.

Can you move the default tests after the explict runtime tests please? If it does fail because of a runtime problem having it fail on the explict test would be clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move the test order, sure, but I still don't think the test should suddenly fail if more things work. podman is still functional if a backend that was known to be broken starts working again. We shouldn't then fail the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the maintainers of this package to read the release notes while bumping, and updating/extending the test accordingly once new features are added.

@flokli flokli force-pushed the systemd-unified-cgroup-hierarchy branch from 16ef7f1 to 8de38fa Compare November 20, 2020 11:16
The runc backend doesn't work with unified cgroup hierarchy, and it
failing is a known issue.

However, the default backends should work in both rootless and as-root
scenarios, so make sure we test these.
@flokli flokli force-pushed the systemd-unified-cgroup-hierarchy branch from 8de38fa to 90d5bdb Compare November 20, 2020 15:24
@@ -155,6 +155,9 @@ in
users.groups.docker.gid = config.ids.gids.docker;
systemd.packages = [ cfg.package ];

# TODO: remove once docker 20.10 is released
systemd.enableUnifiedCgroupHierarchy = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make this conditional on the docker version? Users can set the docker package at runtime and defaulting this to the right version automatically might be a nice perk that we can implement rather quickly in NixOS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know yet if docker 20.10 really works. I'm happy to take a closer look at this once it is in Nixpkgs. There currently is no docker working with enabled unified cgroup hierarchy, so I'm not much in favor of code for the eventuality of it being there ;-)

@flokli
Copy link
Contributor Author

flokli commented Nov 22, 2020

Let's get this in. If there's other applications that don't really work with cgroupsv2, disabling it for them is pretty trivial by setting systemd.enableUnifiedCgroupHierarchy = false;.

@flokli flokli merged commit c768913 into NixOS:staging Nov 22, 2020
systemd automation moved this from To Do to Done Nov 22, 2020
@flokli flokli deleted the systemd-unified-cgroup-hierarchy branch November 22, 2020 21:35
lukegb added a commit to lukegb/nixpkgs that referenced this pull request Dec 1, 2020
Since NixOS#104094 (d22b3ed), NixOS is
using the unified cgroup hierarchy by default (aka cgroupv2).

This means the blkio controller isn't there, so we should test for
something else (e.g. the presence of the io controller).

Fixes NixOS#105581.
FRidh pushed a commit that referenced this pull request Dec 1, 2020
Since #104094 (d22b3ed), NixOS is
using the unified cgroup hierarchy by default (aka cgroupv2).

This means the blkio controller isn't there, so we should test for
something else (e.g. the presence of the io controller).

Fixes #105581.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
systemd
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

8 participants