New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chroot-user: rewrite in C, drop CHROOTENV_EXTRA_BINDS #31182
Conversation
6f1f500
to
e706ea3
Compare
Fixed a few issues thanks to Valgrind, cleaned up. |
|
||
// If we are root, no need to create new user namespace. | ||
if (uid == 0) { | ||
unshare(CLONE_NEWNS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to check result.
unshare(CLONE_NEWNS); | ||
// Mark all mounted filesystems as slave so changes | ||
// don't propagate to the parent mount namespace. | ||
mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to check result.
} else { | ||
// Create new mount and user namespaces. CLONE_NEWUSER | ||
// requires a program to be non-threaded. | ||
unshare(CLONE_NEWNS | CLONE_NEWUSER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to check result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added checks, fixed errno
shadowing after getgid()
.
gid_t gid = getgid(); | ||
|
||
if (uid == -1) | ||
errorf(EX_OSERR, "getuid"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errno might be shadowed here by getgid call. The return value must be check immediately.
errorf(EX_OSERR, "execvpe"); | ||
} | ||
|
||
while (wait(NULL) > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only wait for our child process here and exit with the same exit status:
I have here some code, I adapted from nsenter
:
https://github.com/Mic92/nsattach/blob/master/nsattach.c#L376
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, now doing that.
I've added signal handling code but no SIGSTOP
handling (so no WUNTRACED
flag). I don't think that is the right thing to do here because it means that sending SIGCONT
to child process without repeating it to parent would result in parent being suspended indefinitely.
exit(EX_USAGE); | ||
} | ||
|
||
char *root = mkdtemp(strdup("/tmp/chrootenvXXXXXX")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: memory never freed. Should stack allocation not be sufficient here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stack allocation here causes segfault, mkdtemp
template should be on the heap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the following works for me:
#include <stdio.h>
#include <stdlib.h>
int main() {
char template[] = "/tmp/chrootenvXXXXXX";
char *root = mkdtemp(template);
if (!root) {
perror("mkdtemp()");
} else {
printf("see %s\n", root);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, if I swap template[]
with *template
it results in segfault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char *template = "string"
is a pointer to a constant which resides in the read only memory. char template[] = "string"
is a mutable 7 bytes array (initialized with a string literal).
b06a5ad
to
0ac795a
Compare
@abbradar By the way, why do we bind-mount only certain dirs in |
59eaf86
to
9dd5342
Compare
I've fixed
|
And on dropping I'm pretty sure we don't break anything by doing that. And @bjornfor who requested that feature initially seems to be OK with it, see #16030 (comment). |
@abbradar On #16030 (comment): I would really like to see experimental branch that uses |
First of all, thank you for this rewrite! I have been delaying this for too much time ;) We can indeed just bind-mount everything to About On |
@abbradar I can implement a blacklist (probably I'd prefer this PR to be mostly a feature parity rewrite, once done I'll open a new one :-)
I see... I'll look into the linker. This fragment of
|
Let's proceed with feature parity first then and change semantics later, sounds good to me. I can't look at the source right now but in my memory it was completely different -- something like using a global static buffer to read and then work with |
As per my comment in #16030 (comment) (thanks for the reminder!), I'm OK with removing CHROOTENV_EXTRA_BINDS. |
Sorry, I've gotten ill so I won't be able to proceed with this and other PRs for a couple of days more at least (just a heads up). |
Get well! <( ̄︶ ̄)> I'll clean this up in the meantime. |
Formatted via clang-format.
9dd5342
to
edb59ee
Compare
// Create new mount and user namespaces. CLONE_NEWUSER | ||
// requires a program to be non-threaded. | ||
if (unshare(CLONE_NEWNS | CLONE_NEWUSER) < 0) | ||
errorf(EX_OSERR, "unshare"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unshare
as user is not allowed on all linux distributions by default. On redhat and clones, it is required to set kernel.unprivileged_userns_clone = 1
. Maybe the error message can reflect that if the return value of unshare
is EPERM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, good catch. Also Arch Linux has it disabled by default in kernel (requires linux-userns
kernel from AUR), but Debian variants should be OK. Do you think it's appropriate to check for distro-specific files (like /etc/redhat-release
) and print distro-specific instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the error message generic to keep the code simple:
No permission to create username spaces as unprivileged user.
Please make sure that the kernel has usernamespace support enabled (`CONFIG_USER_NS=y`)
and is enabled for unprivileged users (sysctl -w kernel.unprivileged_userns_clone=1).
Distributions should have enough documentation on how to persist sysctl configuration and installing different kernels and we would most likely not keep the information in our code base up-to-date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that kernel.unprivileged_userns_clone
isn't in Torvalds' tree and most likely in some RHEL-specific patch on top of it:
$ find /proc/sys/kernel -name unprivileged_\*
/proc/sys/kernel/unprivileged_bpf_disabled
/proc/sys/kernel/unprivileged_userns_apparmor_policy
I guess I'll stat /proc/sys/kernel/unprivileged_userns_clone
and if it exists, print that it should be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be actually debian/ubuntu specific and not on redhat/fedora: https://anonscm.debian.org/git/kernel/linux.git/tree/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also fail on old Linux kernels. We may want to have a nice generic error message, in lines of "You may have an old kernel or have CLONE_NEWUSER disabled by your distribution security settings."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abbradar, @Mic92: Got you. Hope they enable it by default at some point (and old kernels will be phased out of use), but for now going to print what you've suggested, supplemented by sysctl
instructions if /proc/sys/kernel/unprivileged_userns_clone
is present.
Arch Linux should have adopted this patch instead of disabling the option entirely...
Is this ready? |
afaik only the error message is missing. |
Thank you a lot, @Mic92! This should now be ready to merge. No fancy |
BTW (hijacking this thread a little), the steam sandbox blocks network for me. Even the old chroot binaries that used to work won't work anymore (and this PR doesn't fix it). I suspect it's triggered by newer kernel (4.14 ATM but was broken with 4.13 as well), but I haven't confirmed that yet. When I run all the sandbox except for steam command, I get
|
A heads up (I'm not able to read the complete thread right now) - I'm in a hospital and won't be able to work on this or other Nix things for a while. Feel free to merge if consensus is that it's ready - I can always review it and make changes later.
--
Nikolay.
|
Motivation for this change
Resolves #19047, fixes #31104. Related: #16030
Rebuilt several small derivations that use
buildFHSUserEnv
, currently in the process of rebuilding Steam runtime, will tell how it goes. All good so far.Runs ~70ms (4 times) faster than
chroot-user.rb
on my machine. Basically the same LOC count./cc @abbradar @bjornfor @Mic92 @spacekitteh
Things done
build-use-sandbox
innix.conf
on non-NixOS)nix-shell -p nox --run "nox-review wip"
./result/bin/
)