Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor nixos-install to separate out filesystem build logic #23026

Merged
merged 1 commit into from Apr 17, 2017

Conversation

copumpkin
Copy link
Member

@copumpkin copumpkin commented Feb 20, 2017

The key distinction I'm drawing is that there's a component that deals with the store of the machine being built, and another component for the store building it. The inner part of it assumes nothing from the builder (doesn't need chroot or root powers) so it can run comfortably inside a Nix build, as well as nixos-rebuild. I have some upcoming work that will use that to significantly speed up and streamline image builds for NixOS, especially on virtualized hosts like EC2, but it's also a
reasonable speedup on native hosts.

This still doesn't quite work (the installer tests don't pass) but this is the general shape of what I'd like to use, so would appreciate any early feedback anyone has.

Diff-wise, it might be productive to also diff nixos-prepare-root.sh against the old nixos-install.sh, since much of the stuff that I deleted from the latter ended up in the former, often unchanged. I don't know how to make GitHub do that for me, but if it can, please post a comment.

cc @edolstra @domenkozar @shlevy @obadz @cleverca22

Motivation for this change

The current NixOS image building machinery needs to jump through quite a few hoops to use the current nixos-install inside a build, so it fires up a VM and behaves impurely inside it. While that's convenient, in practice it's a bit finicky, doing some work on the host, some work on nix database inside the VM, and other work on the nix store/db inside the filesystem being built.

This PR is the first step to fixing that, with another PR afterwards that will use nixos-prepare-root from the image building machinery. See also #21943 (comment) for more about how it works.

Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@mention-bot
Copy link

@copumpkin, thanks for your PR! By analyzing the history of the files in this pull request, we identified @edolstra, @obadz and @Shados to be potential reviewers.

ln -s /run $mountPoint/var/run
for f in /etc/resolv.conf /etc/hosts; do rm -f $mountPoint/$f; [ -f "$f" ] && cp -Lf $f $mountPoint/etc/; done
for f in /etc/passwd /etc/group; do touch $mountPoint/$f; [ -f "$f" ] && mount --rbind -o ro $f $mountPoint/$f; done
set -x
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I'd obviously take this out later 😄


set -e

mountPoint="$1"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to pretend this has a user-friendly CLI. It's mostly intended to be used by nixos-install.sh and the upcoming new image building machinery right now. If someone has a suggestion on how to remove the duplicate closure vs. path stuff in this CLI, I'm all ears.

# inside one. See https://github.com/NixOS/nix/issues/1242 for more discussion.
if [[ "$i" =~ \.closure$ ]]; then
echo "importing serialized closure $i to $mountPoint..."
fakechroot -- chroot $mountPoint nix-store --import < $i
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure I could use unshare for a user namespace here but fakechroot will work on grsecurity hosts, whereas the user namespaces have been turned off. fakechroot seemed pretty harmless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, with 1.12 I think you can just set the store to be a store at a different root without fakechroot

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is there a strong reason not to just hard-code this to use 1.12 and lose the fakechroot dep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, trying that out now 😄

@copumpkin
Copy link
Member Author

copumpkin commented Feb 21, 2017

I made a few changes that make the installer tests get farther, but I'm still noticing some weirdness that I haven't figured out yet. I successfully install NixOS and then fail here, on nixos-rebuild switch on the built machine, because it can't hit the internet and appears to be missing some build dependencies to rebuild.

machine# download-from-binary-cache.pl: could not download ‘https://cache.nixos.org/nix-cache-info’ (Curl error 6)
machine# these derivations will be built:
machine#   /nix/store/1lfgp13jfn183vpb01ib009iminy68w8-xz-5.2.2.tar.bz2.drv
machine#   /nix/store/9a07qfy74aap61kvn2gwyw2i6gq82hdn-bootstrap-tools.tar.xz.drv
machine#   /nix/store/d5kl9r7v9a8hkccyrfachyqw2a37d5dp-busybox.drv
machine#   /nix/store/jkwjgpjzk2x8bsc38vhwvkkz7hd97gkg-bootstrap-tools.drv
machine#   /nix/store/2rggcvlx8dr6jmbpjrjjbd6lbzw0a9zq-stdenv-linux-boot.drv
machine#   /nix/store/9v0pyazk4d36bh2q3x3yslzrqizlypz9-binutils-2.27.tar.bz2.drv
machine#   /nix/store/whpn4xwnvhbcb921gn327z5ia0lranzn-bootstrap-glibc.drv
machine#   /nix/store/584q03x4rmb0xkh70xf8zd2vcy1bzdxj-bootstrap-gcc-wrapper.drv
machine#   /nix/store/if1la35hs9xrzr87rwr3l37dkn5z2hrl-stdenv-linux-boot.drv
machine#   /nix/store/0jk4wzg11sa6cqyw8g7w5lb35axji969-bison-3.0.4.tar.gz.drv
machine#   /nix/store/msyjjsa94dwb4fpgr7bz5hb8iyv1b69m-perl-5.22.2.tar.gz.drv
machine#   /nix/store/lmikn95hbyv1nq8rgq5l389ynpy4j4hh-perl-5.22.2.drv
machine#   /nix/store/qbg663xar2lvp8fgiv45dil429vmjhka-m4-1.4.18.tar.bz2.drv
machine#   /nix/store/vfj4lsk2789f1bqvvb6p6kawf3w217pc-gnum4-1.4.18.drv
machine#   /nix/store/nv9lfpqnc7x2rjhjw7wq66kfrm48rjlg-bison-3.0.4.drv
machine#   /nix/store/wnz60v3mvzqnxdf8jvpj9imxliqf3d3z-zlib-1.2.11.tar.gz.drv
machine#   /nix/store/yflvx1wh9cjy0f7pv5g7413109xmyffz-zlib-1.2.11.drv
machine#   /nix/store/lyfmkmpp2dvz1rg37kbiihz7z9snv0f1-binutils-2.27.drv
machine#   /nix/store/ng11v2xfm747cyinnc5jxpx73nlv24cv-glibc-2.24.tar.gz.drv
machine#   /nix/store/gg4aprv1apwqjszqyl5ikb96xhc46nrc-bootstrap-gcc-wrapper.drv
machine#   /nix/store/wr99dvzilk56dwiw9g7fv7v6flqldjy2-stdenv-linux-boot.drv
machine#   /nix/store/ya8fvzagm3iv4vym8p86fxf3ghsqdbkd-linux-4.4.10.tar.xz.drv
machine#   /nix/store/s3vbfcw1cqc5jncmkflmfa4v47v7d41a-linux-headers-4.4.10.drv
machine#   /nix/store/n20xrh5s3zk5sdq7h4jj60z0r0v69l5y-glibc-2.24.drv
machine#   /nix/store/2xbwfb52c9ifrfk6h1iqry360awbcz0n-bootstrap-gcc-wrapper.drv
machine#   /nix/store/cbv0pbv41zi58w0gamwqg6apff18b8yw-patchelf-0.9.tar.bz2.drv
machine#   /nix/store/b9xnr59rs8ffqdii8yh2jmldca0va1kd-patchelf-0.9.drv
machine#   /nix/store/vsrhhza557w6ji9w81z1xfd9pnxq1x8c-paxctl-0.9.tar.gz.drv
machine#   /nix/store/jdvvmxc5nyyvqs1p4g2cx3g89za3b54r-paxctl-0.9.drv
machine#   /nix/store/xa13ssgxmpb448ysbp6jpv9w9578b4yv-stdenv-linux-boot.drv
machine#   /nix/store/6lx8d3yfq6ckjmp180p928pwcszawsjm-xz-5.2.2.drv
machine#   /nix/store/pb2510i8mylgwslh0yb3x32b132f6vhf-gettext-0.19.8.tar.gz.drv
machine#   /nix/store/0a873a7aw2cny37r253vraac1pr0qxv4-gettext-0.19.8.drv
machine#   /nix/store/1arn4rhjxgwxfnah66vqnmwldp0vcgry-libelf-0.8.13.tar.gz.drv
machine#   /nix/store/8hwq087hb244zzg0aj4rf60vkjiyl0pg-make-4.2.1.tar.bz2.drv
machine#   /nix/store/34i4zfc3advw6a6kjx2bmrfycyvrv4wc-libelf-0.8.13.drv
machine#   /nix/store/kmirmhxw429d0qnyb6g0mkvfnn7fxgk2-which-2.21.tar.gz.drv
machine#   /nix/store/9pj6ksycmfn3f2j352fl521v0knazxh4-which-2.21.drv
machine#   /nix/store/czh06mhrvyry9iv2m8kc9i2d6isaykif-zlib-1.2.11.drv
machine#   /nix/store/r29bqz3f94qvczlh453db1d4fd6rhqld-texinfo-6.3.tar.xz.drv
machine#   /nix/store/hns594qk3ah3rynsjisxxs5i430wqklr-texinfo-6.3.drv
machine#   /nix/store/990piqwwf6sigdx6bvk07vlcmvcn1c27-mpc-1.0.3.tar.gz.drv
machine#   /nix/store/5nrb9rc65gjpy98ppic6m6imfwi7dlgl-gmp-6.1.1.tar.bz2.drv
machine#   /nix/store/mwsm9970prcyjc0n7pwzqz3pm8c6hyib-gmp-6.1.1.drv
machine#   /nix/store/nbd14rg37xizix6hjjz9y6f0np9l76bc-mpfr-3.1.3.tar.bz2.drv
machine#   /nix/store/zy92zssjzqhpibm02fbw6vn73svl2fyn-mpfr-3.1.3.drv
machine#   /nix/store/jb2riy0py5wdvwg8xi0i9s444hhydr3i-libmpc-1.0.3.drv
machine#   /nix/store/ncvpy15854bmmh4jpgcz304w1c27mc83-gcc-5.4.0.tar.bz2.drv
machine#   /nix/store/pb3p5akfd6qnx44mv05w9qk6v1axb66d-isl-0.14.1.tar.xz.drv
machine#   /nix/store/szvv8jjcnasqzx2z14nlg04xw909g9mx-isl-0.14.1.drv
machine#   /nix/store/vv4ikfk1hk5n29yz9q658162ix5483jp-gcc-5.4.0.drv
machine#   /nix/store/3lasd081kr10ax9dya8kqrsj8z3gca0n-gcc-wrapper-5.4.0.drv
machine#   /nix/store/hk6b9vwwn9fb3rvf8lx8c6i72146lq0h-stdenv-linux-boot.drv
machine#   /nix/store/53hb333pdsrzqyfy2yjp14n9xb93rg39-gnumake-4.2.1.drv
machine#   /nix/store/944cg60y38c00js3hsx3f4vbhp5q0n07-paxctl-0.9.drv
machine#   /nix/store/gm4y5ldqzk7z1syb7v7brviq15jnfa70-patchelf-0.9.drv
machine#   /nix/store/pbpan36lq46am4jrhxvqbk9il01l9idp-stdenv.drv
machine#   /nix/store/7mq4f9jx7lhs6q9zi525lvaqfz1b17ix-grub-config.xml.drv
machine#   /nix/store/2kyc6h9gidd2pdff314jnsz0cc5mknlm-install-grub.sh.drv
machine#   /nix/store/6hwmsjg8jplyv5fqji11izavqczgripx-stage-2-init.sh.drv
machine#   /nix/store/7c81n3151pfav1h3zllyxr9y68dpmxf7-nixos-system-nixos-17.03.git.d9d1eb5.drv
machine# building path(s) ‘/nix/store/420qx5p8cc7w4jakspms2c73430xwnyn-make-4.2.1.tar.bz2’
machine# downloading ‘http://ftpmirror.gnu.org/make/make-4.2.1.tar.bz2’... 
machine# error: unable to download ‘http://ftpmirror.gnu.org/make/make-4.2.1.tar.bz2’: Couldn't resolve host name (6)
machine# builder for ‘/nix/store/8hwq087hb244zzg0aj4rf60vkjiyl0pg-make-4.2.1.tar.bz2.drv’ failed with exit code 1
machine# building path(s) ‘/nix/store/rxf5acfv3r1w5ax06dsffsnz5hxrr294-binutils-2.27.tar.bz2’
machine# cannot build derivation ‘/nix/store/53hb333pdsrzqyfy2yjp14n9xb93rg39-gnumake-4.2.1.drv’: 1 dependencies couldn't be built
machine# cannot build derivation ‘/nix/store/pbpan36lq46am4jrhxvqbk9il01l9idp-stdenv.drv’: 1 dependencies couldn't be built
machine# cannot build derivation ‘/nix/store/7c81n3151pfav1h3zllyxr9y68dpmxf7-nixos-system-nixos-17.03.git.d9d1eb5.drv’: 1 dependencies couldn't be built
machine# error: build of ‘/nix/store/7c81n3151pfav1h3zllyxr9y68dpmxf7-nixos-system-nixos-17.03.git.d9d1eb5.drv’ failed

I've been studying the current nixos-install logic to figure out how it manages to get build-time dependencies into its closure but can't see anything obvious that would do it, yet it works. One thing you'll notice is that I threw $machine->succeed("nix-build '<nixpkgs/nixos>' -A system >&2"); into the test script right before we edit configuration.nix and it does no rebuilding, so the channel configuration appears to be fine. It's only when the test modifies configuration.nix and forces it to rebuild some basic stuff that everything falls apart. Normally I'd check system.extraDependencies, but given that the test worked before this PR, it's obviously something I'm doing wrong. Does anyone know how the build tools for nixos-rebuild switch end up in the store normally?

P.S: please excuse the messy PR right now; I'm trying to post my current state to aid in debugging, and will obviously clean up and improve things (like e.g., the if true) once I get it all working smoothly.

@copumpkin
Copy link
Member Author

Aha, @7c6f434c points out that I need the build-time closure of the system for that part of the test to pass, which ends up on the system in the current nixos-install.sh because we build it on the target system. So now I'm at NixOS/nix#1245

@dezgeg
Copy link
Contributor

dezgeg commented Feb 27, 2017

I don't think it's feasible to have nixos-install build the closure outside of the target - consider that on the actual LiveCD anything you download/build to the LiveCD's /nix/store needs to fit to RAM.

@copumpkin
Copy link
Member Author

@dezgeg good point, but it seems like we could probably make the "build inside target" thing happen more explicitly, rather than the current approach which feels sort of accidental and pollutes images with build-time crap that you don't necessarily want on small images

@copumpkin
Copy link
Member Author

Okay, amazingly I was able to get all the installer tests passing. Cleaning up the change now and will update soon!!

@copumpkin
Copy link
Member Author

copumpkin commented Apr 13, 2017

I think the latest commit works for everything, but doesn't address @dezgeg's issue yet. I'll think about how best to deal with that now. But anyone who's feeling in the mood for a review, I'd welcome it!

@shlevy
Copy link
Member

shlevy commented Apr 13, 2017

@copumpkin Is the commit message also up to date?

@copumpkin
Copy link
Member Author

@shlevy yeah. I figured I'd leave the [WIP] to think about @dezgeg's point. If we're okay with temporarily requiring a bit more memory on a live CD, we can probably merge in this state (pending feedback of course)

# Inherit binary caches from the host
# TODO: will this still work with Nix 1.12 now that it has no perl? Probably not...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copumpkin Have you tested this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with 1.12, nope. I guess there's no downside now; when I first wrote it I was trying to avoid 1.12 due to the schema version issues. I'll point it at nixUnstable and try out the new stuff!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I've switched nixos-prepare-root to nixUnstable, but still haven't tested using nixUnstable on nixos-install. I don't think that hurts this and I don't want to force more nixUnstable than we need.

binary_caches="$(@perl@/bin/perl -I @nix@/lib/perl5/site_perl/*/* -e 'use Nix::Config; Nix::Config::readConfig; print $Nix::Config::config{"binary-caches"};')"
extraBuildFlags+=(--option "binary-caches" "$binary_caches")

nixpkgs="$(readlink -f $(nix-instantiate --find-file nixpkgs))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly speaking should probably have quotes around the inner $()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir!

unset NIX_BUILD_HOOK
# A place to drop temporary closures
tmpdir="$(mktemp -d)"
trap "rm -rf $tmpdir" EXIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I prefer to put the trap before the mktemp call to handle the (admittedly silly) case of an interrupt between the mktemp and the trap

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dammit you thwarted my evil attempts to subvert everything. Fine, I'll add quotes there too!

system_root=$closure
# Create a temporary file ending in .closure (so nixos-prepare-root knows to --import it) to transport the store closure
# to the filesytem we're preparing. Also delete it on exit!
nix-store --export $closure > $system_closure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may want nix-store --export $(nix-store -qR $closure) here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good point. Hadn't tested that branch

# inside one. See https://github.com/NixOS/nix/issues/1242 for more discussion.
if [[ "$i" =~ \.closure$ ]]; then
echo "importing serialized closure $i to $mountPoint..."
fakechroot -- chroot $mountPoint nix-store --import < $i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, with 1.12 I think you can just set the store to be a store at a different root without fakechroot

# valid, delete it to get it out of the way, but as a result nothing
# will work anymore.)

fakechroot -- chroot $mountPoint nix-store --register-validity < $i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

# inside one. See https://github.com/NixOS/nix/issues/1242 for more discussion.
if [[ "$i" =~ \.closure$ ]]; then
echo "importing serialized closure $i to $mountPoint..."
fakechroot -- chroot $mountPoint nix-store --import < $i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is there a strong reason not to just hard-code this to use 1.12 and lose the fakechroot dep?

# Create the required /bin/sh symlink; otherwise lots of things
# (notably the system() function) won't work.
mkdir -m 0755 -p $mountPoint/bin
# !!! assuming that @shell@ is in the closure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just add a shell.closure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

if [ -z "$noChannelCopy" ] && [ -n "$channel" ]; then
echo "copying channel..."
fakechroot -- chroot $mountPoint nix-env --option build-use-substitutes false \
"${extraBuildFlags[@]}" -p /nix/var/nix/profiles/per-user/root/channels --set "$channel" --quiet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will entail actual building, right? Or is there a channel.closure I missed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a channel_closure a little farther down, but you're right, I probably screwed something up here.

ln -sfn /nix/var/nix/profiles/per-user/root/channels $mountPoint/root/.nix-defexpr/channels

# Grub needs an mtab.
ln -sfn /proc/mounts $mountPoint/etc/mtab
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it belongs a bit better to nixos-install

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

@edolstra
Copy link
Member

I really, really don't want to complicate the installer with fakechroot stuff or a separate "prepare root" script.

Also, Nix 1.12 should enable us to simplify the installer quite a bit. We will be able to use nix-build with a store URI of local?root=/mnt, and local as a substituter to reuse paths from the installation CD. This will remove the need to copy a Nix closure to /mnt. In fact a lot of the /mnt initialisation can be omitted since it happens automatically.

@copumpkin
Copy link
Member Author

copumpkin commented Apr 13, 2017

@edolstra can you explain how this is complicating anything, if I get rid of the fakechroot? I'd call it better separation of concerns, and was quite clear about why I'm doing this and what functionality it enables.

Just to reiterate, I literally cannot build disk images on several platforms. Either they take half an hour on EC2, or simply do not work under at least two local virtualization solutions. I understand wanting to keep few moving parts, but this seems like a refactor that splits the logic out with a sensible separation, so the whole "this is more complicated because it's different" thing just feels vague and unactionable. I'm "complicating" the installer (by removing 75% of its code?) precisely because you told me that I shouldn't duplicate the logic in it in my last PR attempting to build images in a sensible way. You seem to be leaving me in a conundrum where you don't want me to "complicate" the installer, but don't want me to duplicate its logic, but technology prevents me from actually using it to build images. Can you tell me what you expect me to do?

Edit: just TBC, most of the "diff" lines in this commit just arise from moving logic from one script to another (which has explicit comments as to its purpose and contract) so I can better reuse it. So I really don't get what's more complicated than before, outside of the fakechroot.

@copumpkin
Copy link
Member Author

Just to reiterate, I literally cannot build disk images on several platforms.

To put this in perspective, I've been paying packet.net $0.40 an hour for the past several days just so I could test this and fix the installer tests, because I have no Linux boxes of my own, I can't run this inside a VM on my Mac, and it's completely untenable to do any of this on EC2 (if I want to get it done this year at least).

@shlevy
Copy link
Member

shlevy commented Apr 13, 2017

@copumpkin I think you wanted to ping @edolstra 😄

@copumpkin
Copy link
Member Author

copumpkin commented Apr 13, 2017

For anyone following, @edolstra and I hashed out our concerns on IRC, and I'd summarize them as follows:

  • @edolstra (understandably) wants the image build process to be as similar to the normal install process as possible, and this PR is my first step towards making them diverge more than they do today.
  • My concern is speed, efficiency, and portability of the image build process (i.e., I want it to work in more places, and do so efficiently). I'm happy to have slight discrepancies (although I'll try to minimize them) if I can get image builds working efficiently on virtualized platforms (major examples being EC2, other cloud providers, and macOS local VMs). I too strive for the two processes to be as similar as possible but contend that "as possible" should scale back a bit if it's actively breaking the process (or making it slow to the point of being impractical) on some platforms.

I actually think that in the slightly longer term, there's a good resolution that will make nixos-install look more like an "image" (not literally) build rather than today's approach of making an image build look like nixos-install.

Until then, @edolstra has said he won't veto this PR if others think this is worthwhile. I'm going to post to the mailing list and see if we can get a sample of opinions. I know of at least a few people who have been bitten by the virtualization issues, but it'd be nice to hear more. If you have thoughts on the matter, please chime in here, either with 👍 (if you support merging this PR) or 👎 (if you don't) on this comment, and if you have thoughts to add in support of your thumb, please do.

@7c6f434c
Copy link
Member

My position: separating phases in multiple scripts that do almost the same as the current script but have clearer functionality is a nice thing to have.

My personal interests: I am currently using something that reuses parts of NixOS code but has significant differences (NixPkgs is still the main package source). Personally, I am obviously interested in maximal separation of concern in NixOS code. Basically, I can help people on IRC with parts of NixOS that are useful or at least runnable outside literal NixOS, and being able to help people is nice…

[Ideally I would also hope that eventually there can be explicit support for convenient booting NixOS and non-NixOS Nix-based systems sharing a store and updating different profiles. This would allow to create handcoded proof-of-concept bootscripts to show off a single feature and then integrate them into NixOS if anyone cares. This is not in scope of the current PR, though.]

@obadz
Copy link
Contributor

obadz commented Apr 13, 2017

To be fair this script didn't even invoke nixos-install before my 24f8cf0. I think it's an OK state to revert to if there's a good reason for it and we specifically design to make this relatively painless. So 👍

@cstrahan
Copy link
Contributor

I've been (intermittently) following this PR with much anticipation.

Just to reiterate, I literally cannot build disk images on several platforms. Either they take half an hour on EC2, or simply do not work under at least two local virtualization solutions.

This is huge to me. I think the costs in terms of discrepancies can be minimized, and what's left is made up for in speed and portability.

In concept, I'm a big 👍 on this. It looks like we have a path forward without using fakechroot, and I'm sure we'll come up with a solution to the problem of bootloader installation.

(Where "we" mostly means @copumpkin, I suspect -- I'll mostly be cheering from the sidelines on this one)

@@ -123,6 +123,9 @@ let
$machine->succeed("type -tP ps | tee /dev/stderr") =~ /.nix-profile/
or die "nix-env failed";

# This should do nothing
$machine->succeed("nixos-rebuild switch >&2");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, nixos-rebuild switch does not do nothing. In all cases, even if this is the same as the previous system, it will still create a new generation to the system profile, update the symbolic link, and run the activation script.

Maybe, what you should be testing is the the current-system still targets the same path in the /nix/store, to check the the output is the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up taking this out since it doesn't really do much, and I don't want to test basic functionality of nixos-rebuild switch in this test.

@copumpkin
Copy link
Member Author

Thanks for all the support!

Anyway, almost all feedback is now incorporated as of my push just now. Will make a few more tweaks soon (to do with which closures I pass around and how) but I think we're pretty close.

@shlevy
Copy link
Member

shlevy commented Apr 14, 2017

@copumpkin Can you re-request review from me when you think this is merge ready?

@copumpkin
Copy link
Member Author

Will do, thanks!

@copumpkin copumpkin changed the title [WIP] Refactor nixos-install to separate out filesystem build logic Refactor nixos-install to separate out filesystem build logic Apr 14, 2017
@copumpkin copumpkin requested a review from shlevy April 14, 2017 23:50
@copumpkin
Copy link
Member Author

@shlevy requested another review. I think this is basically ready, except for @dezgeg's RAM concern which I'd like to address in a separate commit, and will write another VM test for limited RAM to check it. All installer tests currently pass.

One thing I noticed after rebasing onto master (perhaps it was happening before too?) is that the grub1 test prints out a bunch of disconcerting segfaults, but then passes anyway: http://lpaste.net/7158756514360983552. I don't think it has anything to do with this commit, but it doesn't make me feel great.

@copumpkin
Copy link
Member Author

copumpkin commented Apr 14, 2017

I guess the grub1 test prints out a bunch of segfaults without failing on master too... 🤔🤢 http://hydra.nixos.org/build/51598743/log

mkdir -m 0755 -p $mountPoint/bin
# !!! assuming that @shell@ is in the closure
ln -sf @shell@ $mountPoint/bin/sh
# TODO: do I need to set NIX_SUBSTITUTERS here or is the --option binary-caches above enough?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone know what this does and whether it's necessary? My understanding is that we only used it before to handle the copy-from-other-stores.pl substituter, but I'm not really sure.

mkdir -m 1775 -p $mountPoint/nix/store

# TODO: move this back
# chown @root_uid@:@nixbld_gid@ $mountPoint/nix/store
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I welcome thoughts about this. My sense is that it should go into nixos-install, but I'm not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sandboxing is enabled, is root:root more sensible ownership for the store than root:nixbld?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but we can't assume it's always enabled. I wish we could!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that all the bind mounts will give the desired effect even with sandboxing enabled, mind you. If you don't assume sandboxing, then nixos-prepare-root.sh should create a non-broken Nix situation that includes performing chown, so moving chown back from the specific installation script makes a lot of sense…

@copumpkin
Copy link
Member Author

This should be ready for review if anyone's feeling adventurous 😄

The key distinction I'm drawing is that there's a component that deals
with the store of the machine being built, and another component for
the store building it. The inner part of it assumes nothing from the
builder (doesn't need chroot or root powers) so it can run comfortably
inside a Nix build, as well as nixos-rebuild. I have some upcoming work
that will use that to significantly speed up and streamline image builds
for NixOS, especially on virtualized hosts like EC2, but it's also a
reasonable speedup on native hosts.
@copumpkin
Copy link
Member Author

copumpkin commented Apr 16, 2017

I've moved the chown of the Nix store back to nixos-install.sh for now. My hesitation comes from the recent user namespace work in Nix builders and I think we use seccomp to prevent chown/chgrp calls from inside builders. I have another way to change ownership inside LKL for the image builder, but until we resolve whether builders should be able to chown (in temporary build folders; obviously the store has one owner) I'm going to leave the chown call in nixos-install and the upcoming image script.

I also added some other checks to the installer tests to make sure other things are being done properly. For a while we generated images with invalid Nix registration data, so I added a verify, and I also added a non-root user to make sure that the daemon is set up correctly and that users can use it to run builds.

patchPhase = ''
substituteInPlace src/libexpr/json-to-value.cc \
--replace 'std::less<Symbol>, gc_allocator<Value *>' \
'std::less<Symbol>, gc_allocator<std::pair<const Symbol, Value *> >'

sed -i '/if (settings.readOnlyMode) {/a curSchema = getSchema();' src/libstore/local-store.cc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copumpkin I was curious about this, too. Perhaps a comment here pointing at a specific Nix issue, or just a short description, would help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping for a 1.11.9 to go out, but I'll add a comment until then

@copumpkin
Copy link
Member Author

copumpkin commented Apr 16, 2017 via email

@copumpkin
Copy link
Member Author

copumpkin commented Apr 16, 2017 via email

@copumpkin
Copy link
Member Author

Okay, it looks there's a lot of support for merging this. I'm going to go ahead and merge, and if there are any outstanding objections after this we can revert the commit.

The follow-up PR is #24964

@copumpkin copumpkin merged commit e9f1d86 into NixOS:master Apr 17, 2017
@copumpkin copumpkin added this to Improve speed & portability in Amazing images Apr 17, 2017
@vcunat
Copy link
Member

vcunat commented Apr 23, 2017

@copumpkin: it seems very likely that this merge is what breaks the build of ova and thus blocks nixos-unstable.

nixos-install: line 109: nix-instantiate: command not found

I haven't really read this PR nor the followup.

@vcunat
Copy link
Member

vcunat commented Apr 23, 2017

BTW, the followup PR fixes the build.

@copumpkin
Copy link
Member Author

copumpkin commented Apr 23, 2017 via email

@vcunat
Copy link
Member

vcunat commented Apr 23, 2017

OK to me as long as it's not going to block the channel for days.

@copumpkin
Copy link
Member Author

copumpkin commented Apr 23, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Amazing images
Improve speed & portability
Development

Successfully merging this pull request may close these issues.

None yet