nixos/digital-ocean-image: init #58464

Closed
wants to merge 7 commits into from

Conversation

eamsden
Contributor

@eamsden eamsden commented Mar 28, 2019

Motivation for this change

A just-works NixOS image that can be uploaded to Digital Ocean, hopefully to be used as a basis for better Digital Ocean support in NixOps.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@colemickens
Member

Can you share expected example user data?

@eamsden
Contributor Author

eamsden commented Mar 28, 2019

@colemickens if you mean username/password: it's just nixos/nixos. It should be able to sudo without a password.

I got into a console and looked at the systemd journal. I can't copy-paste from the console so I apologize for the screenshot of text instead of copy-paste:
[screenshot: no hosts template]

It looks like cloud-init isn't very robust and doesn't like having "nixos" as the distro. I'll look into that more tomorrow.

@colemickens
Member

@eamsden I meant example user-data you're passing into the Droplet config, or are you not specifying any and just expecting to see the SSH keys?

The problem isn't there. If you scroll further, you see that the network portion of the "vendor-data" config fails to apply. It then doesn't process the user-data or ssh keys, I suspect.

But this just makes me question all the more if it's worth the fun of cloud-init just to get ssh keys when they're accessible in at least one or two easy ways.

@colemickens
Member

I may have been looking at the wrong log, this is the most recent -u cloud-init log I got: https://gist.github.com/colemickens/98dee80a8f5d08188d0e932cdc1004a1

@eamsden
Contributor Author

eamsden commented Mar 28, 2019

I added the template file it was complaining about (see 37d3a52) and now it's not able to write /etc/hosts because NixOS makes it read-only. Is there an idiomatic way to override that?

Log:
https://gist.github.com/eamsden/bfb5153844f800241549695b48606f11

@eamsden eamsden changed the title digital-ocean-image: init nixos/digital-ocean-image: init Mar 28, 2019
@colemickens
Member

I don't know. I'm still stuck on the last line of the log, which remains the same: handlers.py[DEBUG]: finish: init-network: FAIL: searching for network datasources. I'm not an expert, but the previous lines make me think the /etc/hosts issue is non-fatal, since util.py[WARNING]: Running module update_etc_hosts (<module 'cloudinit.config.cc_update_etc_hosts' from '/nix/store/lfb4vx0fl1v2n30hs79bsy5pwfk622ip-cloud-init-0.7.9/lib/python2.7/site-packages/cloudinit/config/cc_update_etc_hosts.pyc'>) failed only says WARNING. Again, cloud-init could have provided slightly more useful logging and this would be less of a guessing exercise.

To be honest, the time already spent reading cloud-init logs reminded me why I've gone to lengths to avoid this in the past, especially on NixOS where its only real utility is better served with a few lines of bash.

In case it's useful -- I did write up a nixos/maintainers script that will auto-build and upload the image to Digital Ocean. I got the iteration cycle down to a few minutes. You can find the script here: https://github.com/colemickens/nixpkgs/blob/digitalocean/nixos/maintainers/scripts/digitalocean/upload-image.sh. I also opened another PR that reduces the size of the image by about 230MB by changing the cloud-init->cloud-utils packages.

@eamsden
Contributor Author

eamsden commented Mar 28, 2019

@colemickens Perhaps I should go ahead and build a utility that can parse the Digital Ocean metadata and set things properly. The only thing this wouldn't work for is resetting the root password, since (for obvious reasons) they don't expose that in the metadata, but it isn't too hard to do from the nix config anyway if you really want a root password set.

Member

@nlewo nlewo left a comment

Oh, I just realized this is a draft PR. So, my comments might not be relevant at all!

@eamsden
Contributor Author

eamsden commented Mar 29, 2019

@nlewo the stuff you mentioned in the review is to help me debug the nixos image on digital ocean, and will most certainly be removed before the draft tag comes off the PR

@eamsden
Contributor Author

eamsden commented Apr 2, 2019

It looks like the best approach is going to be a very optional module that can load bits and pieces of configuration from the config drive. Digital Ocean uses an OpenStack config drive so maybe we should just have a generic OpenStack service module? This would let us set (optionally according to the NixOS config)

  • root password
  • root SSH keys
  • network interface configurations
  • DNS resolvers
  • hostname
  • entropy seed

from the Digital Ocean metadata

Further, we could do as the Amazon AMI image does and allow loading of a NixOS module from the user data, so users of the NixOS image could quickly spin up a custom NixOS system directly from a cloud console.

The goal with this is not to enforce compliance with the host provider's metadata, but to make starting with NixOS in a Digital Ocean droplet as painless as possible, while also not interfering with the operation of e.g. NixOps or other deployment tools.

@arianvp
Member

arianvp commented Apr 10, 2019

Note that you can just directly curl the digitalocean metadata API for public keys and userdata. No need for bringing in cloud-init and openstack

https://developers.digitalocean.com/documentation/metadata/

You could put this in a systemd unit that starts up at boot:

curl http://169.254.169.254/metadata/v1/public-keys > /root/.ssh/authorized_keys

and:

curl http://169.254.169.254/metadata/v1/user-data > /etc/nixos/configuration.nix
nixos-rebuild switch

This is similar to what the NixOS amazon, GCE and Azure images do
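
A slightly more defensive sketch of the public-keys step, in case it is useful: it creates the .ssh directory, writes the keys to a temporary file, and only moves the file into place if the download actually produced something. The helper name `install_authorized_keys` and its argument are made up for illustration; only the metadata URL comes from the documentation linked above.

```shell
#!/bin/sh
# Sketch only: install_authorized_keys is a hypothetical helper, not part of any image.
# Reads public keys on stdin and installs them as <ssh_dir>/authorized_keys.
install_authorized_keys() {
    ssh_dir=$1
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # mktemp creates the file with owner-only permissions.
    tmp=$(mktemp "$ssh_dir/authorized_keys.XXXXXX") || return 1
    cat > "$tmp"
    # Refuse to replace existing keys with an empty download.
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        return 1
    fi
    chmod 600 "$tmp" && mv "$tmp" "$ssh_dir/authorized_keys"
}

# Usage on a droplet (not executed here):
#   curl -sf http://169.254.169.254/metadata/v1/public-keys | install_authorized_keys /root/.ssh
```

Reading from stdin keeps the helper independent of how the keys are fetched, so the same function would work with the config drive if that route were taken instead.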

@eamsden
Contributor Author

eamsden commented Apr 10, 2019

@arianvp I am working on something similar to this. The issue is that we cannot depend on networking being up when we start to configure the system.

Fortunately, there is a config drive available in a standard location, with the same JSON metadata available. I'm working on making sure the instance can configure its networking etc from this.

@arianvp
Member

arianvp commented Apr 10, 2019

Why can't we depend on the network being up?
Because networkd / networking isn't configured yet? You can make that part of the base image config. Then the metadata retrieval systemd unit just blocks until it can reach the metadata server, and you give it RequiredBy=metadata.target so that components that wait on the metadata start in the right order.

This is how the coreos digital ocean image works too.
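
The "block until the metadata server is reachable" part could be sketched as a small retry loop that the unit's script runs before doing anything else. The helper name `retry` and the attempt and delay values in the usage line are assumptions for illustration, not something taken from the CoreOS image.

```shell
#!/bin/sh
# Sketch only: retry is a hypothetical helper name.
# Runs a command until it succeeds, at most <attempts> times,
# sleeping <delay> seconds between tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage on a droplet (not executed here):
#   retry 30 2 curl -sf -o /dev/null http://169.254.169.254/metadata/v1/public-keys
```

With a bounded number of attempts the unit eventually fails visibly instead of hanging forever if the metadata service never comes up.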

@eamsden
Contributor Author

eamsden commented Apr 11, 2019

Interesting. Thank you! It wasn't clear from the Digital Ocean documentation that we could depend on DHCP being enabled and configure networking that way. Since it appears that we can, I'll go with your suggestion as soon as I get time to hack on it again.

@arianvp
Member

arianvp commented Apr 14, 2019

I was able to successfully boot an image, set up ssh, and hostname, with this config:
https://github.com/arianvp/nixos-stuff/blob/master/modules/digitalocean/config.nix

Feel free to take inspiration from it, or else I'm also happy to provide a patchset myself on top of this PR

@eamsden
Contributor Author

eamsden commented Apr 15, 2019

@arianvp I'm certainly not going to complain if someone else does work. :)

That said, my plan this week is to take the config you wrote and wrap the systemd services in options, so a user can turn them off in an updated system config, and add something similar to the Amazon/GCE setup where a user can put a NixOS configuration module in the user data and have the machine rebuild itself to that config on startup. I'd also like to make sure the default config is present on the system so that the user can use /etc/nixos/configuration.nix to configure the system without having to manually replicate everything the image does for Digital Ocean compatibility.

@arianvp
Member

arianvp commented Apr 15, 2019

Yeah, good idea. We should probably also disable the hostname fetching when someone sets the hostname in the NixOS config, so that nixos-rebuild doesn't race with the metadata daemon.

@eamsden
Contributor Author

eamsden commented Apr 16, 2019

@arianvp I'm testing an image now and then I'll push, then I have to go to $DAYJOB. Some things I'd like to do before I take off the draft label:

  • Duplicate the support from Amazon/GCE images for putting a NixOS config in the user data.
  • MIME-decode the vendor data and run the RNG initialization script therein.
  • Figure out why we get a complete GNOME install in the image (wallpapers, Pango/Cairo, kitchen sink, etc) and stop that.

@eamsden
Contributor Author

eamsden commented Apr 16, 2019

@arianvp (In the latest push I did make systemd set the hostname from metadata only if it is not set in the NixOS config)

@eamsden
Contributor Author

eamsden commented Apr 16, 2019

A bit of hunting reveals that cloud-utils is pulled in by nixos/modules/system/boot/grow-partition.nix, via the option boot.growPartition. That seemingly innocuous option pulls in cloud-utils, but just for the growpart utility, which in turn pulls in qemu, which pulls in GTK3. Which means we really need #58469 to de-bloat the image.
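
For anyone who wants to repeat that hunt, a rough sketch: list the closure of the built image and filter it for desktop-looking packages. `nix-store --query --requisites` is the standard way to list a closure; the helper name `suspicious_paths` and its pattern list are only a guess at what counts as bloat here.

```shell
#!/bin/sh
# Sketch only: suspicious_paths is a hypothetical helper, and the pattern list
# is an assumption about which package names indicate desktop bloat.
# Reads store paths on stdin, one per line, and prints the ones that match.
suspicious_paths() {
    grep -E -i 'qemu|gtk|gnome|cairo|pango'
}

# Usage on a machine with Nix (not executed here):
#   nix-store --query --requisites ./result | suspicious_paths
# and then, for one of the printed paths:
#   nix why-depends ./result /nix/store/<printed-path>
```

The second command should print the chain of references that drags the path in, which is how a chain like growPartition -> cloud-utils -> qemu -> GTK3 becomes visible.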

@eamsden
Contributor Author

eamsden commented Apr 16, 2019

I'm hoping the bloat will be fixed separately when #58471 is merged, so I'm going to make this into a real PR now.

@arianvp @colemickens do either of you want to be in the maintainers list for this image? Or should I leave it as just me?

@eamsden eamsden marked this pull request as ready for review April 16, 2019 23:36
@eamsden eamsden requested a review from infinisil as a code owner April 16, 2019 23:36
eamsden pushed a commit to eamsden/website that referenced this pull request Apr 17, 2019
@arianvp
Member

arianvp commented Apr 17, 2019

You can add me to the maintainers list. I'll be actively using this module so I don't mind maintaining it. Good job on the size hunting.

@eamsden
Contributor Author

eamsden commented Apr 17, 2019

@arianvp I wasn't able to find you in https://github.com/NixOS/nixpkgs/blob/master/maintainers/maintainer-list.nix. Am I searching the wrong handle perhaps? I was grepping for 'arianvp'.

@arianvp
Member

arianvp commented Apr 17, 2019

@eamsden I don't maintain any packages so far, so I'm not in that list yet =)
You can add this entry if you want to:

"arianvp" = {
    email = "arian.vanputten@gmail.com";
    name = "Arian van Putten";
    github = "arianvp";
};

set -e
TEMPDIR=$(mktemp -d)
curl --retry-connrefused http://169.254.169.254/metadata/v1/vendor-data | munpack -C $TEMPDIR
$TEMPDIR/entropy-seed
Contributor Author

There is an error sometimes when this script runs, which causes issues in config switching when importing this module.

Member

is it the curl part that is failing or is it the $TEMPDIR/entropy-seed part that is failing?

Contributor Author

I think it is the curl not getting data, which is causing munpack to fail. But I'll have to try it again after work.

It didn't seem to be stopping the base system from coming up from the image, but if I build a system closure that imports digital-ocean-config.nix (I was trying this for my website) and I don't turn off the entropy seed fetcher, it gives an error when switching to it.
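
One way to tell the two failure modes apart would be to download to a file first and only hand it to munpack when something actually arrived. The sketch below assumes a hypothetical helper `fetch_nonempty`. Also, as far as I can tell from the curl manual, `--retry-connrefused` only has an effect together with `--retry <n>`, so the line under review may not be retrying at all; the `--retry 5` in the usage line is an assumed value.

```shell
#!/bin/sh
# Sketch only: fetch_nonempty is a hypothetical helper.
# Runs a fetch command, captures its stdout in <out_file>, and fails
# if the command fails or produces no data.
fetch_nonempty() {
    out_file=$1
    shift
    if ! "$@" > "$out_file"; then
        echo "fetch failed: $*" >&2
        return 1
    fi
    if [ ! -s "$out_file" ]; then
        echo "fetch returned no data: $*" >&2
        return 1
    fi
}

# Usage on a droplet (not executed here):
#   fetch_nonempty "$TEMPDIR/vendor-data" \
#       curl -sf --retry 5 --retry-connrefused http://169.254.169.254/metadata/v1/vendor-data \
#     && munpack -C "$TEMPDIR" < "$TEMPDIR/vendor-data"
```

If the fetch step reports "no data" the problem is on the curl side; if it succeeds and munpack still fails, the problem is in the unpacking.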

Contributor Author

Actually this is a known issue with munpack, since 2005 at least! https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=211472

That bug report from 2005 is talking about the same version of munpack packaged by nixpkgs now. I think we need an alternate solution, if we are going to grab the entropy data at all.

Member

For what it's worth, I'm using the following variant:

curl http://169.254.169.254/metadata/v1/vendor-data | munpack -tC $TEMPDIR
ENTROPY_SEED=$(grep -rl "DigitalOcean Entropy Seed script" $TEMPDIR)
${pkgs.runtimeShell} $ENTROPY_SEED
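
The lookup-by-header step in this variant could be wrapped so that a missing script produces a clear error rather than running the shell with an empty argument. A sketch, assuming a hypothetical helper `find_entropy_script`; the marker string is the one grepped for above.

```shell
#!/bin/sh
# Sketch only: find_entropy_script is a hypothetical helper.
# Prints the path of the first unpacked file that contains the marker comment.
find_entropy_script() {
    dir=$1
    # -r: recurse into the directory, -l: print matching file names only.
    match=$(grep -rl "DigitalOcean Entropy Seed script" "$dir" | head -n 1)
    if [ -z "$match" ]; then
        echo "no entropy seed script found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$match"
}

# Usage on a droplet (not executed here):
#   script=$(find_entropy_script "$TEMPDIR") && sh "$script"
```

Taking only the first match also avoids passing several file names to the shell if more than one MIME part happens to contain the marker.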

echo "attempting to fetch configuration from Digital Ocean user data..."
export HOME=/root
export NIX_PATH=/nix/var/nix/profiles/per-user/root/channels/nixos:nixos-config=/etc/nixos/configuration.nix:/nix/var/nix/profiles/per-user/root/channels
userData=$(mktemp)
Contributor Author

I copied this from the Amazon init module and changed obvious things, but I haven't tested it yet.

fi

echo "setting configuration from Digital Ocean user data"
cp "$userData" /etc/nixos/configuration.nix
Contributor Author

I'm not entirely happy about this part. I'd like to find a way to make sure that digital-ocean-config is in the module list without having to explicitly include it. That way if a user provides a config in the user data they can assume still that the digital ocean stuff is in place, and disable it via the options if necessary.

cc @arianvp thoughts?

Member

Usually the pattern that people are used to is:

# configuration.nix
{ imports = [ ./hardware-configuration.nix ];
}

right? At least, that is the standard /etc/nixos/configuration.nix that is being generated.

then just put the DO specific things in /etc/nixos/hardware-configuration.nix

It's not ideal because if people forget, the config won't work... but it is the same as on their laptops.

Another thing we could do is:

cp $userData /etc/nixos/do-userdata.nix

And then hardcode /etc/nixos/configuration.nix to be:

{ imports = [./do-userdata.nix  <nixpkgs....blah../do-config.nix> ]; }

Then people don't have to remember to include the hardware-configuration manually
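
A rough sketch of what that second option could look like in the init script. The helper name `write_wrapper_config` is made up, and the module expression is passed in as an argument because the exact path is still open; the value in the usage line is only an assumption.

```shell
#!/bin/sh
# Sketch only: write_wrapper_config is a hypothetical helper.
# Copies the user data next to a generated configuration.nix that imports
# both the user data and the Digital Ocean module given as $3.
write_wrapper_config() {
    user_data=$1
    etc_nixos=$2
    module_expr=$3
    cp "$user_data" "$etc_nixos/do-userdata.nix" || return 1
    cat > "$etc_nixos/configuration.nix" <<EOF
{ ... }: {
  imports = [
    ./do-userdata.nix
    $module_expr
  ];
}
EOF
}

# Usage on a droplet (not executed here; the module path is an assumption):
#   write_wrapper_config "$userData" /etc/nixos \
#       '<nixpkgs/nixos/modules/virtualisation/digital-ocean-config.nix>'
```

Because the user data lives in its own file, a later metadata refresh can overwrite do-userdata.nix without touching the wrapper.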

Contributor Author

I was thinking along those lines too. But maybe we should stick to what the AWS/GCE images do, which is to put the user data in /etc/nixos/configuration.nix, and then make an issue to discuss this. Or we could just lead the way in doing things in a bit nicer way.

Contributor Author

Actually it would be nice to get @infinisil's take as the code owner on this question.

Member

What @arianvp suggested sounds much better, I don't think there's much to discuss there. I'd just go ahead and do it this way, which can serve as an example for others.

Member

@eamsden shall we go with my suggestion? I'd love to see this merged. Been using it for quite a while already on my own nixpkgs fork

Contributor Author

@arianvp Yeah go with your suggestion. I too really want to see this merged

…sh keys when users.mutableUsers is disabled
@arcnmx
Member

arcnmx commented Jul 7, 2019

A few notes from trying this out.

  • a switch fails because /dev/sda doesn't exist, root appears to be /dev/vda instead
  • digitalocean-entropy-seed service seems to fail due to munpack tempdesc.txt: File exists

Besides these minor issues it seems to be working pretty well though!

};
};

/* Fetch the ssh keys for root from Digital Ocean */
Contributor

Can you reuse nixos/modules/virtualisation/{openstack-config.nix,ec2-data,amazon-init.nix} here? There's already a lot of metadata server fetching and applying…

Member

No not really. They all are very specific to their platforms and I'm not sure how I could get a lot of code reuse out of them

Contributor

The openstack-init from openstack-config.nix uses ec2-metadata-fetcher.nix, which creates a service fetching metadata, which is picked up by the service defined in ec2-data.nix (which is imported).

This looks pretty similar to what we're trying to do here (and what we should do in brightbox-image.nix too, btw.)

Contributor Author

I made more than one attempt to use openstack to fetch metadata, and eventually settled on just using DigitalOcean's documented metadata service.

Member

@arcnmx arcnmx left a comment

I've been using this for the past month or so and would like to see or help it get pushed to completion if possible! So, I'm using this review to organize what I believe are the remaining three issues that have been brought up regarding this PR, and I think it should be in a pretty good shape once they're addressed?

script = ''
set -e
TEMPDIR=$(mktemp -d)
curl --retry-connrefused http://169.254.169.254/metadata/v1/vendor-data | munpack -C $TEMPDIR
Member

As has been noted, this fails due to issues with munpack being unable to name the files properly; we can avoid this with the -t argument and find the file by its header comment:

Suggested change
- curl --retry-connrefused http://169.254.169.254/metadata/v1/vendor-data | munpack -C $TEMPDIR
+ curl --retry-connrefused http://169.254.169.254/metadata/v1/vendor-data | munpack -tC $TEMPDIR
+ ENTROPY_SEED=$(grep -rl "DigitalOcean Entropy Seed script" $TEMPDIR)
+ ${pkgs.runtimeShell} $ENTROPY_SEED

(also remove the following line that executes $TEMPDIR/entropy-seed)

initrd.kernelModules = [ "virtio_scsi" ];
kernelModules = [ "virtio_pci" "virtio_net" ];
loader = {
grub.device = "/dev/sda";
Member

Suggested change
- grub.device = "/dev/sda";
+ grub.device = "/dev/vda";

All of my droplets have vda rather than sda so this is required for switch to succeed when installing bootloader updates. Not sure if there are machines where sda exists on DO or if this just wasn't tested?
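
A quick way to check which device name a given droplet exposes, as a sketch: print the first candidate path that exists. The helper name `first_existing` is made up, and the candidate list in the usage line is just the two names discussed here.

```shell
#!/bin/sh
# Sketch only: first_existing is a hypothetical helper.
# Prints the first argument that exists as a path; fails if none do.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage on a droplet (not executed here):
#   first_existing /dev/vda /dev/sda
```

Running this on a few droplet sizes and regions would show whether /dev/sda ever appears, without having to attempt a bootloader install to find out.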

Member

it probably just wasn't tested

fi

echo "setting configuration from Digital Ocean user data"
cp "$userData" /etc/nixos/configuration.nix
Member

@arcnmx arcnmx Jul 31, 2019

As was previously suggested by arianvp:

Suggested change
- cp "$userData" /etc/nixos/configuration.nix
+ cp "$userData" /etc/nixos/do-userdata.nix
+ echo '{ modulesPath, ... }: {
+ imports = [
+ ./do-userdata.nix
+ (modulesPath + "/virtualisation/digital-ocean-config.nix")
+ ];
+ }' > /etc/nixos/configuration.nix

(not necessarily suggesting that the template config be inlined like that but within the constraints of github review ui...)

@arianvp
Member

arianvp commented Aug 6, 2019

@arcnmx would you want to make a PR based on this PR with your suggested changes? I think this will then be ready to merge

@arcnmx
Member

arcnmx commented Aug 19, 2019

@arianvp sorry was away for a bit but can do!

@infinisil
Member

Closing in favor of #66978

@infinisil infinisil closed this Oct 27, 2019
@eamsden eamsden deleted the init_digital_ocean_image branch March 6, 2020 18:36