Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chroot-user: rewrite in C, drop CHROOTENV_EXTRA_BINDS #31182

Merged
merged 2 commits into from Dec 10, 2017

Conversation

lukateras
Copy link
Member

@lukateras lukateras commented Nov 3, 2017

Motivation for this change

Resolves #19047, fixes #31104. Related: #16030

Rebuilt several small derivations that use buildFHSUserEnv, currently in the process of rebuilding Steam runtime, will tell how it goes. All good so far.

Runs ~70ms (4 times) faster than chroot-user.rb on my machine. Basically the same LOC count.

/cc @abbradar @bjornfor @Mic92 @spacekitteh

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@lukateras
Copy link
Member Author

lukateras commented Nov 3, 2017

Fixed a few issues thanks to Valgrind, cleaned up.


// If we are root, no need to create new user namespace.
if (uid == 0) {
unshare(CLONE_NEWNS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to check result.

unshare(CLONE_NEWNS);
// Mark all mounted filesystems as slave so changes
// don't propagate to the parent mount namespace.
mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to check result.

} else {
// Create new mount and user namespaces. CLONE_NEWUSER
// requires a program to be non-threaded.
unshare(CLONE_NEWNS | CLONE_NEWUSER);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to check result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added checks, fixed errno shadowing after getgid().

gid_t gid = getgid();

if (uid == -1)
errorf(EX_OSERR, "getuid");
Copy link
Member

@Mic92 Mic92 Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errno might be shadowed here by getgid call. The return value must be check immediately.

errorf(EX_OSERR, "execvpe");
}

while (wait(NULL) > 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only wait for our child process here and exit with the same exit status:

I have here some code, I adapted from nsenter:
https://github.com/Mic92/nsattach/blob/master/nsattach.c#L376

Copy link
Member Author

@lukateras lukateras Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now doing that.

I've added signal handling code but no SIGSTOP handling (so no WUNTRACED flag). I don't think that is the right thing to do here because it means that sending SIGCONT to child process without repeating it to parent would result in parent being suspended indefinitely.

@Mic92 Mic92 requested a review from globin November 3, 2017 15:15
exit(EX_USAGE);
}

char *root = mkdtemp(strdup("/tmp/chrootenvXXXXXX"));
Copy link
Member

@Mic92 Mic92 Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: memory never freed. Should stack allocation not be sufficient here?

Copy link
Member Author

@lukateras lukateras Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack allocation here causes segfault, mkdtemp template should be on the heap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the following works for me:

#include <stdio.h>
#include <stdlib.h>
int main() {
  char template[] = "/tmp/chrootenvXXXXXX";
  char *root = mkdtemp(template);
  if (!root) {
    perror("mkdtemp()");
  } else {
    printf("see %s\n", root);
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, if I swap template[] with *template it results in segfault.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

char *template = "string" is a pointer to a constant which resides in the read only memory. char template[] = "string" is a mutable 7 bytes array (initialized with a string literal).

@Mic92 Mic92 requested a review from abbradar November 3, 2017 15:29
@lukateras lukateras force-pushed the chroot-user/c-rewrite branch 5 times, most recently from b06a5ad to 0ac795a Compare November 3, 2017 17:42
@lukateras
Copy link
Member Author

lukateras commented Nov 3, 2017

@abbradar By the way, why do we bind-mount only certain dirs in /? We could just bind-mount every dir, which would also make /host unnecessary...

@lukateras lukateras force-pushed the chroot-user/c-rewrite branch 2 times, most recently from 59eaf86 to 9dd5342 Compare November 4, 2017 08:20
@lukateras
Copy link
Member Author

lukateras commented Nov 4, 2017

I've fixed /proc/self/{g,u}id_map writes, Steam works. No leaks, correct exit code handling:

$ valgrind ./chrootenv false
==2317== Memcheck, a memory error detector
==2317== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2317== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==2317== Command: ./chrootenv false
==2317== 
==2317== 
==2317== HEAP SUMMARY:
==2317==     in use at exit: 0 bytes in 0 blocks
==2317==   total heap usage: 11 allocs, 11 frees, 332,256 bytes allocated
==2317== 
==2317== All heap blocks were freed -- no leaks are possible
==2317== 
==2317== For counts of detected and suppressed errors, rerun with: -v
==2317== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
$ echo $?
1

@lukateras
Copy link
Member Author

lukateras commented Nov 4, 2017

And on dropping CHROOTENV_EXTRA_BINDS: I'm normally against breaking reverse compatibility, but considering this is a rewrite, the scope of packages that use buildFHSUserEnv is very small (currently 5 in tree), it is deprecated in favor of /host and would take around 40 lines of code to implement...

I'm pretty sure we don't break anything by doing that. And @bjornfor who requested that feature initially seems to be OK with it, see #16030 (comment).

@lukateras
Copy link
Member Author

@abbradar On #16030 (comment):

I would really like to see experimental branch that uses /etc/ld-nix.so.cache to find libraries. This would fix #21109, RPG Maker MV on Steam, and probably quite a few other games too: it is common for games to have a start script that overrides LD_LIBRARY_PATH. Nagging game developers to fix the problem usually doesn't help.

@abbradar
Copy link
Member

abbradar commented Nov 4, 2017

First of all, thank you for this rewrite! I have been delaying this for too much time ;)

We can indeed just bind-mount everything to /, good idea. Actually I planned to do something similar (link everything from /host) and also blacklist instead of whitelisting environment variables. This way we would be even closer to the surrounding environment (given that our aim is not isolation but providing FHS-compliant packages installed).

About CHROOTENV_EXTRA_BINDS -- if we move to this direction it could be OK to drop it but I'm not sure if there are not any clever usages of that in the wild (given that one can add arbitrary bind-mounts). Let's see what @bjornfor has to say on that.

On /etc/ld-nix.so.cache -- this won't actually help because one still needs LD_LIBRARY_PATH for things like /run/opengl-driver. I tried to add NIX_LD_LIBRARY_PATH some time ago but ld.so's codebase is awful and I lost my motivation along the way. I have even mentioned problems with LD_LIBRARY_PATH during NixCon but not sure how to tackle that yet.

@lukateras
Copy link
Member Author

lukateras commented Nov 4, 2017

@abbradar I can implement a blacklist (probably *envp[] filtered with strncmp, don't know a better approach, also don't know which environment variables should be in that list) and bind-mounting every dir in / (if /host/host exists, it probably should take precedence?).

I'd prefer this PR to be mostly a feature parity rewrite, once done I'll open a new one :-)

On /etc/ld-nix.so.cache -- this won't actually help because one still needs LD_LIBRARY_PATH for things like /run/opengl-driver. I tried to add NIX_LD_LIBRARY_PATH some time ago but ld.so's codebase is awful and I lost my motivation along the way. I have even mentioned problems with LD_LIBRARY_PATH during NixCon but not sure how to tackle that yet.

I see... I'll look into the linker. This fragment of emultempl/elf32.em looks promising:

if [ "x${NATIVE}" = xyes ] ; then
fragment <<EOF
	  if (command_line.rpath_link == NULL
	      && command_line.rpath == NULL)
	    {
	      lib_path = (const char *) getenv ("LD_RUN_PATH");
	      if (gld${EMULATION_NAME}_search_needed (lib_path, &n,
						      force))
		break;
	    }
	  lib_path = (const char *) getenv ("LD_LIBRARY_PATH");
	  if (gld${EMULATION_NAME}_search_needed (lib_path, &n, force))
	    break;
EOF
fi

@abbradar
Copy link
Member

abbradar commented Nov 4, 2017

Let's proceed with feature parity first then and change semantics later, sounds good to me.

I can't look at the source right now but in my memory it was completely different -- something like using a global static buffer to read and then work with LD_LIBRATY_PATH, so it couldn't be called two times with two different variables without substantial rewrite or hacks. It would be very good if I was mistaken!

@bjornfor
Copy link
Contributor

bjornfor commented Nov 4, 2017

Let's see what @bjornfor has to say on that.

As per my comment in #16030 (comment) (thanks for the reminder!), I'm OK with removing CHROOTENV_EXTRA_BINDS.

@lukateras lukateras mentioned this pull request Nov 4, 2017
8 tasks
@lukateras
Copy link
Member Author

@abbradar I've got it mixed up, the patch above is for binutils ld, not glibc ld.so. However, good news! I have hacked in NIX_LD_LIBRARY_PATH support for ld.so, check it out: #31263

@abbradar
Copy link
Member

abbradar commented Nov 9, 2017

Sorry, I've gotten ill so I won't be able to proceed with this and other PRs for a couple of days more at least (just a heads up).

@lukateras
Copy link
Member Author

Get well! <( ̄︶ ̄)>

I'll clean this up in the meantime.

// Create new mount and user namespaces. CLONE_NEWUSER
// requires a program to be non-threaded.
if (unshare(CLONE_NEWNS | CLONE_NEWUSER) < 0)
errorf(EX_OSERR, "unshare");
Copy link
Member

@Mic92 Mic92 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unshare as user is not allowed on all linux distributions by default. On redhat and clones, it is required to set kernel.unprivileged_userns_clone = 1. Maybe the error message can reflect that if the return value of unshare is EPERM.

Copy link
Member Author

@lukateras lukateras Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, good catch. Also Arch Linux has it disabled by default in kernel (requires linux-userns kernel from AUR), but Debian variants should be OK. Do you think it's appropriate to check for distro-specific files (like /etc/redhat-release) and print distro-specific instructions?

Copy link
Member

@Mic92 Mic92 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the error message generic to keep the code simple:

No permission to create username spaces as unprivileged user.
Please make sure that the kernel has usernamespace support enabled (`CONFIG_USER_NS=y`)
and is enabled for unprivileged users (sysctl -w kernel.unprivileged_userns_clone=1).

Distributions should have enough documentation on how to persist sysctl configuration and installing different kernels and we would most likely not keep the information in our code base up-to-date.

Copy link
Member Author

@lukateras lukateras Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that kernel.unprivileged_userns_clone isn't in Torvalds' tree and most likely in some RHEL-specific patch on top of it:

$ find /proc/sys/kernel -name unprivileged_\*
/proc/sys/kernel/unprivileged_bpf_disabled
/proc/sys/kernel/unprivileged_userns_apparmor_policy

I guess I'll stat /proc/sys/kernel/unprivileged_userns_clone and if it exists, print that it should be enabled.

Copy link
Member

@Mic92 Mic92 Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also fail on old Linux kernels. We may want to have a nice generic error message, in lines of "You may have an old kernel or have CLONE_NEWUSER disabled by your distribution security settings."

Copy link
Member Author

@lukateras lukateras Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abbradar, @Mic92: Got you. Hope they enable it by default at some point (and old kernels will be phased out of use), but for now going to print what you've suggested, supplemented by sysctl instructions if /proc/sys/kernel/unprivileged_userns_clone is present.

Arch Linux should have adopted this patch instead of disabling the option entirely...

@FRidh
Copy link
Member

FRidh commented Nov 23, 2017

Is this ready?

@Mic92
Copy link
Member

Mic92 commented Nov 23, 2017

afaik only the error message is missing.

@lukateras
Copy link
Member Author

lukateras commented Dec 10, 2017

Thank you a lot, @Mic92! This should now be ready to merge. No fancy /proc/sys/kernel/unprivileged_ns_clone handling, but it might be added later.

@Mic92 Mic92 merged commit 8bdbb21 into NixOS:master Dec 10, 2017
@vcunat
Copy link
Member

vcunat commented Dec 10, 2017

BTW (hijacking this thread a little), the steam sandbox blocks network for me. Even the old chroot binaries that used to work won't work anymore (and this PR doesn't fix it). I suspect it's triggered by newer kernel (4.14 ATM but was broken with 4.13 as well), but I haven't confirmed that yet. When I run all the sandbox except for steam command, I get

$ ping nixos.org   
ping: socket: Operation not permitted

@abbradar
Copy link
Member

abbradar commented Dec 10, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chrootenv: buildFHSUserEnv doesn't work inside sandbox Rewrite chroot-user.rb in C/C++
9 participants