Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to make sandbox work in Docker with its default seccomp filter #2719

Closed
wants to merge 1 commit into from

Conversation

roberth
Copy link
Member

@roberth roberth commented Mar 11, 2019

  • Fall back to bind-mounting proc (required because of MNT_LOCKED)
  • Allow skipping pivot_root, which is not allowed in containers

@roberth roberth marked this pull request as ready for review March 12, 2019 10:54
@@ -700,6 +700,20 @@ password <replaceable>my-password</replaceable>
</varlistentry>


<varlistentry xml:id="conf-sandbox-use-pivot_root">
<term><literal>sandbox-use-pivot_root</literal></term>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this kind of option. Is there any way we can detect whether pivot_root works / we're in a container?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can detect specific, but it depends on how the container is implemented. See this answer + comment for example. Also it seems that we'd be checking the wrong thing. The failure seems more likely caused by mount namespaces rather than cgroups, or it can be caused by a security module.

What we can do instead is make the pivot_root feature degrade gracefully, but then it may go unnoticed when it needs fixing.

@catern
Copy link
Contributor

catern commented Mar 16, 2019

Why does pivot_root not work in Docker? The sandbox works fine inside at least some other container technologies.

@roberth
Copy link
Member Author

roberth commented Mar 18, 2019

@catern It seems to work fine in NixOS containers, but that's because those reuse the host nix-daemon, so the sandbox isn't actually containerized. I did run the sandbox in LXC without the pivot_root problem, but that did suffer from the /proc issue which is also addressed in this PR.

Looking at the pivot_root source, we can have problems due to MNT_LOCKED which is used to secure /proc, for example, or due to shared mounts. I don't think those kernel features can or should be avoided in general.

@roberth
Copy link
Member Author

roberth commented Mar 18, 2019

It seems to me that graceful degradation is the right direction for this PR.
@edolstra Do you need a way to test whether pivot_root is active or do we just trust that it keeps working? I don't want to overengineer this; you know where the balance is.

@edolstra
Copy link
Member

What do you mean with "trust that it keeps working"? Maybe it's easiest to just fall back if pivot_root fails.

However, the commit message of 5451b8d suggests that if we don't use pivot_root, it might be possible to break out of the chroot. So we have to check what the security implications are...

@roberth
Copy link
Member Author

roberth commented Mar 18, 2019

I meant that if for whatever future reason our implementation breaks, we may not notice the security downgrade because we have a fallback in place.

So either we remove the option and take a security risk, or we leave the option in so the user can decide for themselves whether deploying in a container is a good idea or not.
I think the extra option is worth having.
I'll improve the documentation wrt security if you agree.

(For future reference, this solution came from our internal needs and we were not planning to use this in a multi-tenant way)

@catern
Copy link
Contributor

catern commented Mar 18, 2019

I wasn't thinking of NixOS containers, but rather other container systems I've developed myself, within which the Nix sandbox works fine. But if the Nix sandbox also works fine in LXC without the pivot_root problem, perhaps the issue is with Docker or your configuration of Docker, not with Nix. You should find out concretely what the issue is with Docker first.

Also: I'm not sure this /proc change is correct. Won't that give us a /proc instance for the parent pid namespace, not the child pid namespace we created for the builder? The pids found in /proc will be wrong (though /proc/self will be right). /proc doesn't magically handle this AFAIK, you have to mount a new instance for new pid namespaces.

@edolstra
Copy link
Member

Does it even make sense to use the sandbox in a container? Doesn't the container already provide enough isolation?

@roberth
Copy link
Member Author

roberth commented Mar 18, 2019

Containers are not a great security device, particularly if you have to open them up to do something like this. So that's not why we're using a container. We're using the container as a method of deployment and for practical isolation.
The sandbox can serve two purposes:

  • security
  • purity

Our use case requires the purity but not security.
We're not dealing with an adversary and I don't think having an unsandboxed /proc is much of a problem in that case.
So for us it makes sense.

@catern
Copy link
Contributor

catern commented Mar 18, 2019

The issue is not that /proc is unsandboxed. The issue is that an incorrectly mounted /proc will cause bugs in the software you run. Anything that gets a PID (from, say, getpid or fork) and looks it up in /proc will break.

@roberth
Copy link
Member Author

roberth commented Mar 18, 2019

Thanks @catern, I didn't think this may be an issue. I'll have to do some more research. One way to make procfs mounting work is to unmount all the bind mounts on /proc, but I hope there's a better solution.

@roberth roberth changed the title Make the sandbox work in docker [WIP] Make the sandbox work in docker Mar 18, 2019
@edolstra
Copy link
Member

Maybe we can have a new, weaker sandbox mode (rather than a special boolean flag) that doesn't use PID and user namespaces. That way we don't need to mount a new /proc in the sandbox.

@cstrahan
Copy link
Contributor

cstrahan commented Mar 21, 2019

This blog helped me understand the procfs problem: https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/#what-about-procfs

The author talks about efforts to change the Linux kernel such that mount_too_revealing() doesn't forbid the new procfs mount:

Alternatively, @jessfraz's blog talks about her adding support in Docker and Kubernetes to leave the procfs mount alone (as opposed to hiding/mounting over various paths):

@roberth, I suspect you'll want to use one of those options in Docker/Kubernetes to side step the procfs issues.

@cstrahan
Copy link
Contributor

Note that the aforementioned support in Docker hasn't been surfaced through the CLI yet: docker/cli#1347

@roberth
Copy link
Member Author

roberth commented Mar 21, 2019

@cstrahan Thanks for the pointers.

We seem to have a working solution now, based on the following

  • disabling docker's seccomp to allow pivot_root
  • unmounting everything on top of /proc, which we can do with the SYS_ADMIN capability

That's good enough for our use case, but it will be interesting to see how these permissions can be reduced in the future.

Here's our arion service module that makes it work. I haven't decided yet whether such a module should be part of arion or a separate repo.

{ pkgs, lib, ... }:
{
  service.useHostStore = lib.mkForce false;

  nixos.configuration = { pkgs, lib, ...}: {
    boot.postBootCommands = ''
        # Assert dominance, so nix-daemon can mount procfs for the sandbox
        # Background: https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/#the-exception-of-procfs-and-sysfs
        # Code: https://serverfault.com/a/897476
        for dir in $(${pkgs.gawk}/bin/awk '/\/proc\// { print $5; }' /proc/1/mountinfo); do
          echo "Exposing $dir"
          umount "$dir";
        done
    '';
    systemd.sockets.nix-daemon.enable = true;
    systemd.services.nix-daemon.enable = true;
    # Use a non-default range in order to decrease the likelyhood of getting killed by the host nix-daemon
    ids.uids.nixbld = 9000;
  };
  service.devices = [ "/dev/kvm" ];
  service.capabilities.SYS_ADMIN = true;
  build.service.security_opt = [ "seccomp=unconfined" ]; # pivot_root for nix sandbox, TODO turn security_opt into proper service option
}

@roberth
Copy link
Member Author

roberth commented Mar 21, 2019

Regarding the pivot_root patch, it seems to deserve to exist because you can use chroot with the default seccomp filter on. The /proc patch is clearly broken.

This is required for running the sandbox in a privileged
container.
@roberth roberth changed the title [WIP] Make the sandbox work in docker Option to make sandbox work in Docker with its default seccomp filter Jun 4, 2019
@stale
Copy link

stale bot commented Feb 13, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Feb 13, 2021
@roberth
Copy link
Member Author

roberth commented Jun 2, 2021

I haven't need this for some time now.

@roberth roberth closed this Jun 2, 2021
@hlolli
Copy link
Member

hlolli commented Oct 29, 2022

Your diff helped me to install nixos from a rescue environment from a cloud provider that doesn't support pivot_root. Worked like a charm. Probably super rare situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants