gvisor: init at 2018-11-10 #50218

Closed

andrew-d
Contributor

Motivation for this change

Add a package for the gvisor container runtime sandbox. This was requested in #39889, but there were some problems with Bazel at the time. I've managed to get this working, but I'd appreciate feedback on how I've done so. In short, there are two derivations here: a fixed-output derivation that runs bazel sync to download all dependencies and then makes them deterministic, and a second derivation that uses the first along with the source to build the actual output binary. At the end of the whole process, gvisor is runnable:

$ /nix/store/iag5vgl51alqmirabvz5ij9yfp6kwmby-gvisor-2018-11-10/bin/runsc --help
Usage: runsc <flags> <subcommand> <subcommand args>

Subcommands:
	checkpoint       checkpoint current state of container (experimental)
	create           create a secure container
	delete           delete resources held by a container
	events           display container events such as OOM notifications, cpu, memory, and IO usage statistics

I haven't yet tested this with Docker, so I'll try to do that shortly.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

cc @dtzWill and @q3k (on the original issue)
cc @mboes (Bazel maintainer - feedback appreciated!)

@andrew-d
Contributor Author

I tried testing this with Docker, but it looks like the CONFIG_CGROUP_PERF kernel option isn't enabled in the default Nixpkgs kernel, leading to some form of incompatibility between gvisor and Docker. I initially get the following error:

error creating container: error configuring cgroup: mkdir /sys/fs/cgroup/perf_event: read-only file system

I can remount the cgroup filesystem as rw (sudo mount -o remount,rw /sys/fs/cgroup), but when doing that or patching gvisor to remove that cgroup from the controller set, I get the error:

unable to find "perf_event" in controller set: unknown.

I don't have time to rebuild my kernel with that cgroup option right now, but I'll try to get to it soon, unless someone else wants to give it a try!
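
A quick way to confirm whether the running kernel exposes the perf_event controller is to read /proc/cgroups, which lists the cgroup controllers compiled into the kernel. The following is a minimal sketch, not part of this PR; the function names are made up for illustration, and it assumes the usual four-column layout of that file (subsys_name, hierarchy, num_cgroups, enabled).

```python
def parse_proc_cgroups(text):
    """Parse the contents of /proc/cgroups into {controller_name: enabled}."""
    controllers = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#subsys_name ..." header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, _hierarchy, _num_cgroups, enabled = fields[:4]
        controllers[name] = enabled == "1"
    return controllers


def has_controller(text, name="perf_event"):
    """Return True if the named controller is listed and enabled."""
    return parse_proc_cgroups(text).get(name, False)
```

On a Linux machine this could be called as has_controller(open("/proc/cgroups").read()). A False result would be consistent with a kernel built without CONFIG_CGROUP_PERF, though it does not by itself explain the read-only mount error above.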

@andrew-d
Contributor Author

@orivej - After applying #50225, and setting virtualisation.docker.extraOptions = "--add-runtime=runsc=/nix/store/[...]-gvisor-2018-11-10/bin/runsc";, I'm able to run docker run --runtime=runsc -it ubuntu /bin/bash, apt-get install things from within the container, and generally do things. I also confirmed via ps that gvisor was running! 🎉

find "$out" -name '*.sh' -exec \
sed -i 's|#!/bin/bash|#!${bash}/bin/bash|g' {} \;

find "$out" -name '*.go' -exec \
Member

It seems this is only required by tests. Could you try with '*_test.go'?
Also, it would be nice to patch this upstream :/

Contributor Author

I'll look into patching this upstream at some point, sure. For now, fixed this, the merge conflict, and force-pushed.

@nlewo
Member

nlewo commented Nov 18, 2018

@Profpatsch Do you know if there is a simpler way to prefetch dependencies for Bazel builds in nixpkgs? The goal is to not have to download dependencies with Bazel (bazel sync) at application build time.

@andrew-d
Contributor Author

> The goal is to not have to download dependencies with Bazel (bazel sync) at application build time.

I tried a couple of ways to do this, but wasn't successful. You can use native.existing_rules() to iterate over all repository rules in a Bazel workspace, but turning those into things that Nix can fetch is pretty tricky, especially since you can't just search for a standard set of rules that access the network: rules_go, for example, has some repository rules that run custom commands to fetch dependencies. I suspect that bazel sync is probably the best we're going to get, honestly.

@Profpatsch
Member

Profpatsch commented Nov 19, 2018

> Do you know if there is a simpler way to prefetch dependencies for Bazel builds in nixpkgs?

bazel sync --experimental_repository_resolved_file <filename> can produce a kind of lock file, but it’s fairly verbose and written in Skylark rather than a well-known format. It might be possible to eval it with a Python interpreter and spew out some JSON.
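
As a rough sketch of that idea: if the resolved file is a single top-level assignment whose right-hand side contains only literals (lists, dicts, strings, numbers, booleans), Python's ast module can read it without executing anything. The SAMPLE text below is a hypothetical, heavily trimmed stand-in rather than real Bazel output, and the field names in it are assumptions made for illustration.

```python
import ast
import json

# Hypothetical, trimmed stand-in for a resolved file; real files are far
# larger and their exact fields may differ.
SAMPLE = '''
resolved = [
    {
        "original_rule_class": "@bazel_tools//tools/build_defs/repo:http.bzl%http_archive",
        "original_attributes": {
            "name": "example_dep",
            "urls": ["https://example.invalid/example_dep.tar.gz"],
            "sha256": "0000000000000000000000000000000000000000000000000000000000000000",
        },
    },
]
'''


def load_resolved(source, variable="resolved"):
    """Return the literal value assigned to `variable` at the top level."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == variable:
                    # literal_eval only accepts literals, so no code from
                    # the file is ever executed.
                    return ast.literal_eval(node.value)
    raise ValueError(f"no top-level assignment to {variable!r} found")


def to_json(source):
    """Serialize the resolved data as stable, pretty-printed JSON."""
    return json.dumps(load_resolved(source), indent=2, sort_keys=True)
```

If the real file turns out to contain function calls or other non-literal expressions, literal_eval will raise an error, and the "eval with all symbols stubbed out" route would be needed instead.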

@nlewo
Member

nlewo commented Nov 24, 2018

This looks good to me.
But it's really tricky to build a Bazel project in nixpkgs. It would be nice to have a bazel2nix tool! Moreover, I don't know how robust this build will be across Bazel upgrades.

This is not required, but it would be nice to have a NixOS test that uses this container runtime. I could help with that.

@GrahamcOfBorg build gvisor

@GrahamcOfBorg

No attempt on aarch64-linux (full log)

The following builds were skipped because they don't evaluate on aarch64-linux: gvisor

Partial log:


a) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnsupportedSystem = true; }
in configuration.nix to override this.

b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnsupportedSystem = true; }
to ~/.config/nixpkgs/config.nix.


@GrahamcOfBorg

No attempt on x86_64-darwin (full log)

The following builds were skipped because they don't evaluate on x86_64-darwin: gvisor

Partial log:


a) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnsupportedSystem = true; }
in configuration.nix to override this.

b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnsupportedSystem = true; }
to ~/.config/nixpkgs/config.nix.


@GrahamcOfBorg

Unexpected error: command failed with exit code 1 on x86_64-linux (full log)

Attempted: gvisor

Partial log:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   155    0   155    0     0    734      0 --:--:-- --:--:-- --:--:--   734
100 1735k    0 1735k    0     0  1521k      0 --:--:--  0:00:01 --:--:-- 3916k
unpacking source archive /build/d97ccfa346d23d99dcbe634a10fa5d81b089100d.tar.gz
cannot link '/nix/store/.links/1cnjyagqg3s6b6v6j675ryhzk9q9f6fhd88v0dyqw8w6s5b07r7x' to '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source/pkg/sentry/kernel/g3doc/run_states.dot': No space left on device
cannot link '/nix/store/.links/1s37c4s9a74nv7j6xxydif20a7ljydlj8y4z3p69n1dahnn6m7gq' to '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source/pkg/sentry/mm/mm.go': No space left on device
cannot link '/nix/store/.links/0wymhb94n99vs4yf3vjb4gmlva43hcfnbrvac9wldjd85aaznhxp' to '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source/pkg/sentry/fs/proc/uptime.go': No space left on device
warning: path '/nix/store/zwzcdh5x9wr3pq0n6vdzcpgrjcnixr7f-source' claims to be content-addressed but isn't
error: unexpected end-of-file

@andrew-d
Contributor Author

@nlewo - This should be pretty reliable through Bazel upgrades; the bazel sync command is the newly-recommended way of doing reproducible builds, and the only other thing that could change is the $TEST_TMPDIR variable, which is currently documented here. Just about everything else is essentially independent of the way Bazel works. I suspect this approach is substantially more reliable than using the Bazel --experimental_repository_resolved_file flag, which could change at any point.

I'll try to get a NixOS test added today or tomorrow; it'll be my first one, so I'll happily take any advice!

In the meantime, do you mind kicking off another build? Looks like the Linux build failed due to disk space, which doesn't look related to this PR.

# NOTE: this is the output of the whole fixed-output derivation, so
# `nix-prefetch-git` won't work to obtain this. The easiest way is to just
# change it and see what breaks :)
sha256 = "1bcnq7kazbf6l5j0g82x2lvg1nbp7z70klk139dxi0jkw0j8dh3r";
Member

This is not the expected hash. Maybe you forgot to update it when you changed line 63.

Contributor Author

Ahhhhhhhh, I know what this is. The fixed-output derivation has the paths to bash and coreutils as part of the derivation, so any change to those results in a new hash here too. That's annoying 😒

I think I can fix that by changing the bash calls to /bin/sh, and just dropping the test fixes since we're not running them anyway.

Member

If possible, it would be better to patch shebangs in the patch phase of the bazelDependencies derivation.


outputHashMode = "recursive";
outputHashAlgo = "sha256";
outputHash = "0430pn3q71r6pyxq32k2n1zhnp9hvs5mizvw3zy6zwrsv3fchdb6";
Member

This hash is also not the expected one when I build it locally. But this could be related to the update of the hash of patchedSource.

Contributor Author

I force-pushed the fix for the comment above. If this hash still doesn't work, mind running this command for me (with the correct store path) and uploading the results to Gist / Pastebin / something?

(cd /nix/store/35j7izc656kyppz5nqci9c6rivp2zi9s-gvisor-build-dependencies-2018-11-10 && find . -type f | xargs shasum | sort -k 2,2)
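
For comparing two such trees directly, the same listing can be sketched in Python. This is only an illustration with made-up function names: it hashes every regular file with SHA-1 (the default algorithm of shasum), keys the result by relative path, and reports the paths whose hashes differ or that exist on only one side.

```python
import hashlib
import os


def tree_hashes(root):
    """Map each regular file under root (by relative path) to its SHA-1."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Mirror `find . -type f`: ignore symlinks and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            hashes[os.path.relpath(path, root)] = digest.hexdigest()
    return hashes


def diff_trees(left, right):
    """Return sorted relative paths that differ or are missing on one side."""
    a, b = tree_hashes(left), tree_hashes(right)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Running diff_trees on the dependency output from two machines would narrow the mismatch down to specific files, which is the information the command above is meant to collect.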

Member

@andrew-d The hashes are not correct :(
Let me know if https://gist.github.com/nlewo/bbd43f6a7c985e6d70402ac55a439116 helps you: it contains the hashes of the resulting temporary build directory (nix-store -rK ...).

Contributor Author

Okay, this is going to be substantially more annoying than I'd expected. After a bunch of digging, here's what I've found:

  • rules_go uses some helper tools, which it builds with go install into a synthetic repository.
  • Bazel creates .marker files to track whether repositories are up to date; these account for the vast majority of what differs between our two systems, and they appear to include the hash of the underlying files in the working tree.
  • Since the helper tools are built using the regular Go tooling, with inconsistent paths, they don't have a consistent output hash.
  • Bazel verifies the marker files as part of the build process, so we can't patch these tools post hoc: the hashes in the marker files would no longer match, and Bazel would try to re-download everything.

I'm honestly at a bit of a loss; here are my thoughts:

  1. Try to get these tools building in a reproducible fashion (requires an upstream patch in rules_go)
  2. Do something to fix these specific files; the inconsistency comes from a specific debug section in the output binaries (.note.go.buildid), so we could try to zero out that section.

Of the two, I'm going to try to do #1, since the second feels fragile to me. But overall, yeah, this is pretty annoying 😒
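
For reference, the second option could look roughly like the sketch below. It is an illustration only, not something this PR does: it assumes a 64-bit little-endian ELF file with an ordinary section header table, the function name is made up, and it does nothing about the marker-file hashes described above.

```python
import struct


def zero_elf_section(data, section_name=".note.go.buildid"):
    """Return a copy of an ELF64 little-endian image with one section zeroed."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")

    # ELF64 header: e_shoff is at byte 40; e_shentsize, e_shnum and
    # e_shstrndx are the three 16-bit fields starting at byte 58.
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)

    def section(index):
        # Section header: sh_name at offset 0, sh_offset at 24, sh_size at 32.
        base = e_shoff + index * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, base)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 24)
        return sh_name, sh_offset, sh_size

    # The section-name string table holds NUL-terminated names.
    _, strtab_offset, strtab_size = section(e_shstrndx)
    strtab = data[strtab_offset:strtab_offset + strtab_size]

    out = bytearray(data)
    for index in range(e_shnum):
        sh_name, sh_offset, sh_size = section(index)
        end = strtab.index(b"\x00", sh_name)
        if strtab[sh_name:end].decode() == section_name:
            # Overwrite the section contents in place; the file size and
            # all offsets stay unchanged.
            out[sh_offset:sh_offset + sh_size] = bytes(sh_size)
            return bytes(out)
    raise KeyError(section_name)
```

Because it depends on the exact section name and binary layout, this illustrates why the approach feels fragile compared with making the tools build reproducibly upstream.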

(also, I thought about trying to build things with Nix itself, but unlike the *tonix utilities that other languages' package managers use, Bazel repository rules allow running arbitrary shell scripts, so I think we'll always have to run Bazel itself to fetch dependencies)

@andrew-d
Contributor Author

Current state: I've submitted a patch to bazel-gazelle to make the helper build tools deterministic (bazelbuild/bazel-gazelle#382) which has been merged, but that's not sufficient; I'm currently chasing down some Nix paths in the dependency output. Most of them are local configuration from the environment, and we can just remove them (rm -rf $out/local_config*), but there's one particular problem that I'm running into:

Our Go compiler has patches[1][2] that replace the absolute /etc/services, /etc/protocols, and /usr/share/zoneinfo paths with Nix store paths. This, however, means that we cannot use a Go binary in a fixed-output derivation, since the binary will contain paths from the Nix store and thus the fixed-output hash will change if those paths ever do. Anyone have any idea what we normally do in cases like this? Or should we just assume that this particular problem is a lost cause, and find some other way of building these binaries?
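
For anyone wanting to reproduce the search for embedded store paths, here is a minimal sketch of one way to scan an output tree. The names are made up for illustration, and the pattern is an approximation: it assumes store paths look like /nix/store/ followed by 32 lowercase alphanumeric characters, a dash, and a name.

```python
import os
import re

# Approximate pattern for a store path: a 32-character hash, a dash, and
# a name. Matching stops at the first character that cannot be part of
# the name, such as "/".
STORE_PATH = re.compile(rb"/nix/store/[0-9a-z]{32}-[0-9A-Za-z+._?=-]+")


def find_store_paths(root):
    """Map relative file paths under root to the store paths they embed."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Read as bytes so that compiled binaries are scanned too.
            with open(path, "rb") as f:
                matches = set(STORE_PATH.findall(f.read()))
            if matches:
                found[os.path.relpath(path, root)] = sorted(
                    m.decode() for m in matches)
    return found
```

Any file this reports is one whose contents, and therefore the fixed-output hash, will change whenever the referenced store path does.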

(also, holy hell is this rapidly turning into something more complicated than I'd originally expected 😛)

@Profpatsch
Member

Thanks for putting in the work to research bazel builds inside of nix.

> This, however, means that we cannot use a Go binary in a fixed-output derivation, since the binary will contain paths from the Nix store and thus the fixed-output hash will change if those paths ever do. Anyone have any idea what we normally do in cases like this? Or should we just assume that this particular problem is a lost cause, and find some other way of building these binaries?

I haven’t seen fixed-output hashes used for anything other than implementing fetchers, since they require absolute determinism. Especially with a semi-hermetic build tool like bazel which uses build rules written by imperative programmers (cough rules_go cough) that’s tough to achieve.

Best strategy I can see right now is using their lock file to parse out all hashes and check those hashes into nixpkgs (plus an update script that can update the hashes). Since the output format is skylark, you should be able to parse it as valid python syntax (or eval with all symbols stubbed out).

@andrew-d
Contributor Author

This was super annoying, but: I've just force-pushed an update that successfully builds gvisor by manually fetching all dependencies with Nix. It's especially annoying since rules_go applies patches to some third-party libraries, so we have to manually apply those ourselves too, or the build will fail. However: this builds properly for me, now.

@Profpatsch and @nlewo - thoughts on this new approach?

@andrew-d
Contributor Author

Just pushed an alternate version; this is now generated by a very WIP script that will parse the Bazel resolved-dependencies file and attempt to convert it to a Nix file. It's pretty hacky, but I'm heading to bed and figured I'd drop it here for now!
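
To give a flavor of the conversion step (this is not the actual script in the branch), the sketch below renders a set of already-extracted pins as a Nix attribute set of fetchurl calls. The input shape, a dict of name to {"urls": [...], "sha256": "..."}, and the function names are assumptions for illustration; it also ignores the patches that rules_go applies to some third-party libraries.

```python
def nix_string(value):
    """Quote a Python string as a Nix double-quoted string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("${", "\\${"))
    return f'"{escaped}"'


def render_fetchers(pins):
    """Render {name: {"urls": [...], "sha256": "..."}} as Nix source text."""
    lines = ["{ fetchurl }:", "{"]
    # Sort by name so that regenerating the file gives a stable diff.
    for name in sorted(pins):
        pin = pins[name]
        urls = " ".join(nix_string(u) for u in pin["urls"])
        lines.append(f"  {nix_string(name)} = fetchurl {{")
        lines.append(f"    urls = [ {urls} ];")
        lines.append(f"    sha256 = {nix_string(pin['sha256'])};")
        lines.append("  };")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Keeping the generated file a plain function of fetchurl would let it be imported with callPackage, though how the real generated code is structured may differ.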

@Profpatsch
Member

> thoughts on this new approach?

I really like it. It would be nice to split out the parser to get a generic transformation from a Bazel lockfile to a nixpkgs package. Of course, the generated code must then be overridable; I can help with that if you want. See https://github.com/Profpatsch/yarn2nix/tree/master/nix-lib for an example of how that can be done (there might be some code reuse possible).

@benpye
Contributor

benpye commented Feb 12, 2019

@andrew-d Wondered if you ever continued with this? I really like the idea of running services on my NixOS machine within gVisor KVM containers, especially for things like the Unifi controller I run, a big Java behemoth.

@andrew-d
Contributor Author

@benpye - The short answer is "not yet"; I'm currently trying my hand at a slightly more generic "Bazel to Nix" translator, and once I get that working will update this PR with the generated code. This branch does work, though, if you want to apply it to a local fork!

@ghuntley
Member

@andrew-d anything you need help with? This is rad.

andrew-d mentioned this pull request on Nov 9, 2019
flokli closed this in #73097 on Dec 5, 2019