[WIP, don't merge] make-disk-image: change to be less VM-centric #21943
@copumpkin, thanks for your PR! By analyzing the history of the files in this pull request, we identified @edolstra, @obadz and @domenkozar to be potential reviewers. |
21faa8c to 6cb547f
cc @rbvermaa who might also be interested, since my main motivator is to use EC2 for image building (and eventually, revamping "VM tests" to work nicely on there too) |
Nice! |
Does Amazon support UEFI? With it it might be simple enough to just manually do the bootloader installation, which AFAICT is the only thing the VM is used for. |
Not really in favor of this. The VM implementation works fine and doesn't require tricks like fakechroot or cptofs. |
@edolstra "Works fine" until we try to use any of the cloud providers out
there? I want to expand use of NixOS; it makes it hard for me to argue to
do that when building a simple image takes half an hour because it assumes
things that force us to use a different compute provider just because
nobody thought that avoiding incidental global resource use was worth
slightly more cautious code? You may be the only Hydra out there today but
I'm trying to change that and it'll be hard if you fight changes that make
it easier for others to spin them up...
|
The slowness comes from the fact that EC2 doesn't use nested virtualization. This is hurting me as well; having 1h cycles to deploy a change is painful. With this and some improvements to image generation we could get down to a few minutes. I wonder what @obadz thinks about this :) |
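For anyone wanting to check whether a given build host is affected, here is a minimal sketch, assuming a Linux host, of testing whether hardware-accelerated KVM looks usable. Without it, QEMU typically has to fall back to much slower software emulation, which is consistent with the build times described in this thread. The helper name `kvm_available` and its two overridable path arguments are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: report whether hardware-accelerated KVM looks usable on this host.
# kvm_available is a hypothetical helper; both arguments default to the
# standard Linux paths and are overridable only to make the check testable.
kvm_available() {
    kvm_dev="${1:-/dev/kvm}"
    cpuinfo="${2:-/proc/cpuinfo}"

    # The CPU must advertise Intel VT-x (vmx) or AMD-V (svm)...
    grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null || return 1
    # ...and the KVM device node must be readable and writable by this user.
    [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]
}

if kvm_available; then
    echo "KVM available: VM-based image builds can use hardware acceleration"
else
    echo "KVM unavailable: VM-based image builds will likely use slow emulation"
fi
```

On a typical cloud instance without nested virtualization, the second branch is the one to expect.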
It's not just EC2. None of the major cloud providers (GCE, Azure, etc.) do,
and I can't really argue with a straight face to companies that they should
send their compute loads to another country (Hetzner) because NixOS decided
to use VMs for simplicity where everything else they run works fine on
their current provider.
|
Hell, this doesn't even seem much more complicated in concept than the old
way; it's just different and needs a few refactors to support it nicely,
that I'm willing to do.
|
By the way, you may want to look into using Packet.net for some of these workloads, as they provide bare metal servers via an API. I'm working on NixOps integration, and can also share the temporary tooling I've developed to deploy to them. They run the qemu tests mighty fast. |
@grahamc we know that running VM builds on non-virtualized machines works. We already have Hetzner support. But it's REALLY inconvenient to build VMs on other machines when everything else you do is based on Amazon. |
I know the pain :( |
@grahamc that's good to know, but it's still a major complication to datacenter design for stuff like this. Networking within a VPC is a world apart from opening up a VPN connection across providers and so on. It also just feels weird that we normally go to great lengths to make arbitrary builds "pure" and not rely on unnecessary global things, unless it's code that we control that makes unnecessary global assumptions about owning a filesystem; in that case we wrap the whole thing in a VM and call it a day. It feels like a neat trick, but in many ways |
@shlevy just suggested that I might be able to use user namespaces to do a proper native |
@dezgeg you mean using systemd-boot? I don't think EC2 supports UEFI yet, unfortunately, but we probably also don't need a full grub (since a boot menu is useless on EC2). Perhaps something like |
6cb547f to 7eb69de
@copumpkin The main issue is that I don't want |
@edolstra fully agree that duplicating As you say, the bootloader will probably be more of a pain, as will the activation script. I need to look into whether grub-install can be put directly into an image file, but if not, I might split up the logic a bit and e.g., not pass My main goal is to make sure that you're roughly on the same page before putting more time in this thing. I'm happy to be responsible for making it good (and not duplicating logic, and keeping it simple, and so on) but it'll be frustrating if I put a bunch of time into it only to have you disagree with the overall effort. |
This changes much of the make-disk-image.nix logic (and thus most NixOS image building) to use LKL to set up the target directory structure rather than a Linux VM. The only work we still do in a VM is less IO-heavy stuff that, while still time-consuming, is less of the overall load. The goal is to kill more of that stuff, but that will require deeper changes to NixOS activation scripts and switch-to-configuration.pl, and I don't want to bite off too much at once.
7eb69de to 4755858
I'm also in the position to speed up these ami builds, but I see that it's going to be harder to maintain this code compared to current state. Could we just have |
@domenkozar eh, the VM-based stuff also breaks inside VMWare and other things. None of today's cloud environments support nested virtualization and they're becoming more and more mainstream. I haven't had a chance to refactor this to share code across |
Scenario: I want to enable somewhat seamless Linux Nix builds inside macOS by running a VM in the background, the same way we run Docker containers on macOS today. Keeping the VM-centric path by default will just be confusing and off-putting to folks who want to try this sort of thing out and who are tempted to press one of the big EC2 buttons on http://nixos.org/nixos/download.html, which is easier by far (if you're used to EC2) than any of the other "try it out" mechanisms on that page. |
@edolstra @domenkozar I'm almost ready to push my refactoring of this and
In this world, the new This image builder, on the other hand, does the same thing but uses Nix's internal build machinery (mostly just The end result is that we've factored out the "pure" aspects of Thoughts? I actually think it makes more sense than the current installer, should be more maintainable (i.e., testable in isolation, clearer separation of concerns), and allows for this image builder to work with almost no logic duplication. Edit: I'm not married to the name |
For anyone following this thread, I've created PR #23026 with the |
This has been on my back burner ever since I heard about xhyve (and my interest was subsequently renewed after Docker's HyperKit work). Sadly, I lack the bandwidth to take such a project on at the moment. Nonetheless, I'm a strong proponent for any work that gets us closer to achieving that goal. It would go a long ways towards making a case for Nix/OS adoption at |
This changes much of the make-disk-image.nix logic (and thus most NixOS image building) to use LKL to set up the target directory structure rather than a Linux VM. The only work we still do in a VM is less IO-heavy stuff that, while still time-consuming, is less of the overall load. The goal is to kill more of that stuff, but that will require deeper changes to NixOS activation scripts and switch-to-configuration.pl, and I don't want to bite off too much at once.
On my test EC2 instance, the old image building code took about 25 minutes (see #20471), and this takes a little less than a minute. I've been testing it as follows:
Things I'm unsure about and would appreciate comments on:
- Whether fakechroot + nix-env is actually necessary. If I'm just doing --set with no previous generations, is nix-env doing anything beyond making two symlinks? Not doing it myself future-proofs this code a bit if the link scheme changes someday, but that also seems unlikely.

Things I still need to do before I take the WIP marker off:

- Support qcow2, which I'd probably do as a post-VM conversion with qemu-img now that I touch image files directly and LKL doesn't understand anything but a flat image
- Refactor the nixos-install.sh logic to share the core bits and pieces with this work, so we don't risk getting out of sync. I'd however rather do that as a follow-up PR to keep moving parts to a minimum.
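
The qcow2 conversion mentioned in the to-do list could look roughly like the following. This is a minimal sketch, not the code in this PR, assuming `qemu-img` is on `PATH` and the builder has already produced a flat raw image. The function name `raw_to_qcow2`, the `DRY_RUN` variable, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: convert a flat (raw) disk image to qcow2 after it has been built.
# raw_to_qcow2, DRY_RUN and the file names below are hypothetical.
raw_to_qcow2() {
    src="$1"
    dst="$2"
    # -f names the input format, -O the output format.
    set -- qemu-img convert -f raw -O qcow2 "$src" "$dst"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

# Usage: show the command that would run for a hypothetical nixos.img.
DRY_RUN=1
raw_to_qcow2 nixos.img nixos.qcow2
```

Keeping the conversion as a separate final step means the LKL-based tooling only ever has to deal with the flat image.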