nixos/digital-ocean-image: init (rebase) #66978
Conversation
Please feel free to squash and rebase your commit to make it tidy. Also, is it possible to include a test with it?
No idea how to do this. I guess we could build the image; is there somewhere appropriate to introduce that? All the services require DO's internal network environment, so I imagine testing any other parts would be a pretty involved process.
I don't think a test is needed. I also don't know what we'd test here in the first place; we're just building an image. I don't think any of the other images have tests either; if they do, they're probably in the NixOps repo and not the NixOS one.
To be fair, there are some valid points to testing here:
So I imagine something that builds the image (or at least computing the …). Testing the services themselves via mocking the metadata network service or something similar seems like overkill, however, so yes; I agree that integration tests make more sense to run against DO via nixops.
No worries, was just asking. :)
networking = {
  firewall.allowedTCPPorts = [ 22 ];
  hostName = mkDefault ""; # use Digital Ocean metadata server
  useNetworkd = mkDefault true;
Why have networkd by default? This is an experimental feature that still has some known bugs. I'd prefer not enabling it by default on installations, especially because it behaves differently from what people expect.
Turns out the answer is: because I added it there, which was a stupid mistake on my side! :D We should remove it.
Hm, it's been working well for me... From initial testing, turning it off causes problems for the metadata init services, and neither the hostname nor the SSH keys make it to the VM :(
That sounds like a (serious) bug in the NixOS networking module. By default the networking module should listen for DHCP on all interfaces. Investigating...
New theory: network-online.target works well with systemd-networkd, but maybe not with the default networking module. Workaround: add a Restart=always to digitalocean-init so that it keeps restarting until the network is available instead of failing immediately... This is purely speculation though; I'll spin up a DigitalOcean box and do some actual debugging later tonight.
I just did this on my DigitalOcean image and it seems to work fine! curl will then do the retries with exponential backoff and whatnot.
Weird, depending on which network target? (Link a diff if possible.) I tried --retry-connrefused but it did not work for me, because curl gives up if the error specifically isn't ECONNREFUSED, which is generally the case when the network isn't available or the metadata server isn't reachable/routable.
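For reference, a minimal sketch of the kind of manual loop being discussed, assuming a POSIX shell; unlike curl's built-in --retry/--retry-connrefused, a loop like this retries regardless of which error curl reports:

```sh
# Poll the metadata endpoint until it answers, whatever the failure mode
# (no route yet, connection refused, timeout), with a bounded attempt count.
attempts=0
until curl -fsSLI http://169.254.169.254/metadata/v1.json > /dev/null; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge 10 ]; then
    echo "metadata server unreachable after $attempts attempts" >&2
    exit 1
  fi
  sleep 2
done
```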
Perhaps we could use it to unify all our cloud image metadata fetching steps? (Just a thought; and probably for a later PR).
Definitely worth looking into! But yeah, not necessarily for this PR.
They have a service called coreos-metadata.service
Cool, yeah, I think in general that's the way forward here. Whether built on top of afterburn, a bash loop around curl, or anything else, it gives us a central point to place the init/detection logic. I'll at least restructure the units around this approach; then, even if we keep the network-online dependency for now, it should be easy to make follow-up improvements. I'll also look into writing a retry loop; I'm just slightly wary about making sure I choose the appropriate retry logic.
Oh, I'm running this on networkd, I just realised. That's probably why it works.
If you're wary about the retry-loop, we can also let systemd handle that, which is maybe more elegant:
[Service]
Restart=on-failure
RestartSec=zzz
StartLimitIntervalSec=yyy
StartLimitBurst=xxx
I think either way works. systemd doesn't do exponential backoff, though, but I'm not sure we require that; the link-local server isn't going to crash from hammering. Restarting every 2 seconds for ~10 times is probably sufficient.
I'll see if I can package afterburn tonight.
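For reference, the systemd-level suggestion above rendered as a NixOS module snippet, with the "every 2 seconds, ~10 times" values filled in (the values and unit name are assumptions, not code from this PR):

```nix
{
  systemd.services.digitalocean-init = {
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = 2;
    };
    # Modern systemd expects the StartLimit* settings in [Unit],
    # hence unitConfig rather than serviceConfig.
    unitConfig = {
      StartLimitIntervalSec = 60;
      StartLimitBurst = 10;
    };
  };
}
```

As the follow-up below notes, Restart= only takes effect for non-oneshot service types.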
> If you're wary about the retry-loop, we can also let systemd handle that, which is maybe more elegant.
Though I wanted to, it requires that the service type be changed to simple, which then makes it unusable as a dependency for other units. It does work okay for the individual units, but yeah, the lack of backoff is slightly annoying... A loop will work fine.
> the link-local server isn't gonna crash from hammering. Restarting every 2 seconds for like 10 times is probably sufficient
Probably. My concern is more that we don't want it to spam journald forever, but we do want to make sure it doesn't give up too early if the network is slow to start, because these services are important for making sure you can access the machine! It should also be fairly responsive, because if you wait 2 seconds, a bunch of services will start in the meantime.
Yeah, not having it a oneshot is annoying. I didn't realise oneshot didn't support Restart=.
So, pushed a few things:
- metadata.service added, with a retry loop, usable as a dependency so that services that need metadata can just wait on it before doing their thing (see the sketch below). The service is ordered After dhcpcd or systemd-networkd (if you have them enabled), so the retry loop isn't likely to be needed except in unusual circumstances; the service should start as soon as an IP/route is assigned to the interface.
- The sethostname and entropy services are wantedBy and Before network.target, so that any services ordered After network.target or network-online.target will wait for them. This does have the effect of delaying the startup of certain services until the droplet has an IP, but that's also kind of the point. networkd is pretty much instantaneous; dhcpcd takes about 6 seconds.
- Some fixes to the userdata init service. It remains part of / After network-online.
- The password/sshkey services are just part of multi-user, but ordered to take place before sshd, so you can't have racy scenarios where the server is online and reachable but you are unable to log in.
- Some changes to image/init to allow them to share the config file, because it seemed kind of silly to duplicate it.
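As a rough illustration of the ordering described above, here is a sketch in NixOS module form (unit names and exact dependency lists are assumptions for illustration, not the PR's verbatim code):

```nix
{
  # Metadata fetcher: other units can wait on this instead of each
  # implementing their own retry logic.
  systemd.services.digitalocean-metadata = {
    after = [ "dhcpcd.service" "systemd-networkd.service" ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
  };
  # Hostname setup completes before network.target is considered reached,
  # so anything ordered After network.target sees the final hostname.
  systemd.services.digitalocean-sethostname = {
    wantedBy = [ "network.target" ];
    before = [ "network.target" ];
    requires = [ "digitalocean-metadata.service" ];
    after = [ "digitalocean-metadata.service" ];
  };
}
```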
(Force-pushed from 02d54ba to 20877e1.)
options.virtualisation.digitalOcean = with types; {
  setRootPassword = mkOption {
    type = bool;
    default = true;
I'm considering changing this default to false because:
- It's a pretty insecure option, setting the password to something accessible from the metadata server by any unauthenticated user on the system that can access the network.
- It currently resets the root password on every boot rather than only on first boot. I should maybe add a path condition to make sure it only ever runs once (see the sketch below)?
- DO always tells me my custom images don't support root password setting, so I can't even get it to email me the password to test this. Maybe I'm using the wrong Distribution setting when uploading it?
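A common way to get that run-once behaviour is a stamp file checked via a systemd unit condition; a minimal sketch, with an assumed stamp path:

```nix
{
  systemd.services.digitalocean-set-root-password = {
    # Skip the whole unit once the stamp file exists ("!" negates).
    unitConfig.ConditionPathExists = "!/var/lib/digitalocean/root-password-set";
    script = ''
      # ... fetch the password from the metadata server and apply it ...
      mkdir -p /var/lib/digitalocean
      touch /var/lib/digitalocean/root-password-set
    '';
  };
}
```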
}
/* Give nice error messages if a user tries to turn off mutable users and enable
 * root password or SSH key fetching from Digital Ocean
I want to remove these defaults+assertions, because mutableUsers = false does not imply that /root/.ssh is read-only, nor even /etc/shadow. In fact, enabling these settings is a likely motivation for a server to use mutableUsers = false in the first place, so that it can provide auth details in some way other than via the NixOS config.
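In other words, the combination the assertion forbids is arguably a legitimate configuration; a sketch (setSshKeys is assumed here to follow the naming of setRootPassword above):

```nix
{
  # Users are fully declarative...
  users.mutableUsers = false;
  # ...yet root's SSH keys still arrive at runtime from the metadata server.
  virtualisation.digitalOcean.setSshKeys = true;
}
```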
(Force-pushed from 4701648 to 07dcf84.)
script = ''
  set -eu
  DO_DELAY_ATTEMPTS=0
  while ! curl -fsSLI http://169.254.169.254/metadata/v1.json > /dev/null; do
Should I be worried that this isn't using HTTPS? This is used for root passwords and such...
Also it would be nice to download all files in this unit and store them somewhere to be accessed by the others (in a location only readable by root).
I'm more worried that, unless you explicitly firewall it, any process can retrieve the same data. The root password service in particular I think needs some changes, and it should probably also be disabled by default.
> Also it would be nice to download all files in this unit and store them somewhere to be accessed by the others (in a location only readable by root).
I'd considered that but decided against it just because I didn't feel like deciding on a proper path for it or dealing with the corner cases:
- The hostname can be changed while the server is running, so you can restart that service to update it. Making it depend on data from the first makes that dependency relationship (and the meaning of the systemd service states) a bit more complicated?
- Each service was interested in a different endpoint, so downloading them all when most services never even run seemed like an unnecessary extra second of boot time.
Though, I believe the JSON blob contains everything... so each service can probably just use jq to pick out the pieces it needs from that one file (see the sketch below). I believe the hostname is the only mutable piece of data, so that service could just continue to use the network to simplify things.
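For illustration, roughly what that could look like; the field names are assumptions based on DigitalOcean's metadata v1 JSON, not taken from this PR:

```sh
# Fetch the whole blob once, readable only by root, then let each
# consumer pick out just the field it needs with jq.
umask 077
curl -fsSL http://169.254.169.254/metadata/v1.json > /run/do-metadata.json
jq -r '.hostname'      /run/do-metadata.json  # for the sethostname unit
jq -r '.public_keys[]' /run/do-metadata.json  # for the sshkey unit
```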
Nope, this is a link-local address, so it should be safe.
"should be safe" doesn't make me feel safe! Can we check that it's a link-local address?
It is by definition a link-local address, because it's in the link-local IP range. IPv4 mandates that this range isn't next-hopped (https://tools.ietf.org/html/rfc3927); only hosts on the same physical link can talk over the link-local range.
Oh nice! I feel safe now
set -eo pipefail
TEMPDIR=$(mktemp -d)
curl -fsSL http://169.254.169.254/metadata/v1/vendor-data | munpack -tC $TEMPDIR
ENTROPY_SEED=$(grep -rl "DigitalOcean Entropy Seed script" $TEMPDIR)
I'd be interested to see an example of all these downloaded files to see what they look like and what other info they contain.
if [[ -s $userData ]]; then
  # If the user-data looks like it could be a nix expression,
  # copy it over. Also, look for a magic three-hash comment and set
  # that as the channel.
How about doing something like nix-instantiate --eval '<nixpkgs/nixos>' -A config.system.stateVersion to find the desired channel too?
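i.e. evaluating the user's configuration to recover the state version; a sketch of what that looks like (the output shown is illustrative):

```sh
$ nix-instantiate --eval '<nixpkgs/nixos>' -A config.system.stateVersion
"19.09"
```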
I'm a bit wary of making significant changes to the logic here, and it comes straight from other modules. I guess ideally there should be some common module or place for this to go, where cleanup can be applied to NixOS on all relevant cloud providers?
Very cool PR!
(Force-pushed from 4cd97ee to 954b082.)
I just merged #69186, so you could rebase this PR on top of master and enable …
Nice, done, but unfortunately master has been broken for a week due to VM test failures, which means this is now semi-broken. Anyone looking to try image generation will currently need to rebase this onto nixos-unstable instead and cherry-pick the #69186 merge on top of it. EDIT: or apply #70027, which unblocks this and nixos-unstable.
#69302 seems to be relevant in some way, particularly if you want to use networkd for faster droplet boot. I'm not quite sure how or what needs to change though?
It means you have to explicitly set which network interfaces you want DHCP on if networkd is enabled. I think this is something we can fix after merging, to be honest, as it's only a problem for now if networkd is enabled. Let's file a new issue for it. I'm pretty familiar with the networkd changes and can probably create a PR for it, but it's not worth blocking what we already have.
Yeah, fair, it's easy enough to add (though I'm at least not confident enough to include a change like that without going with a more flexible …). EDIT: just pushed a rebase, because the last time I did so master was broken.
Is this ready for testing? How do I build the image?
Yeah, it works and can be built with something like:

```
nix build '(with import <nixpkgs> { }; nixos { imports = [<nixpkgs/nixos/modules/virtualisation/digital-ocean-image.nix>]; }).digitalOceanImage'
```

Then the resulting file can be uploaded as a custom image via the DO web UI and used to create a droplet. Beware that if you plan on using the userdata feature to configure the image (or just doing a rebuild/switch in general), you will need to point it at a channel URL that has these changes merged in, otherwise it will fail when it tries to use a nixpkgs without the DO modules. I'd recommend managing the resulting droplets with nixops or something else that makes it easier to pin the channel, but other than that complication it should work fine.
Looking pretty good. I tested it and it works very well (except the do-agent, which didn't start for some reason). Some final remarks, and this is ready to merge imo.
};
services.do-agent.enable = mkDefault true;
networking = {
  firewall.allowedTCPPorts = [ 22 ];
This should be done by default already, or is there another reason this is added? In addition, it would have to be services.openssh.listenAddresses.*.port.
A path to a configuration file which will be placed at `/etc/nixos/configuration.nix`
and be used when switching to a new configuration.
If set to `null`, a default configuration is used that imports
`(modulesPath + "/virtualisation/digital-ocean-config.nix")`.
This needs to use the XML syntax <literal>...</literal>; same for the other option descriptions.
maintainers/maintainer-list.nix (outdated)
arianvp = {
  email = "arian.vanputten@gmail.com";
  name = "Arian van Putten";
  github = "arianvp";
Needs githubId since recently.
LGTM! Can you squash commits a bit?
I tried.
Personally I'd like it if every commit is one that won't trip up anybody when bisecting, so that in all of them eval still succeeds, the manual builds successfully, and things like that. So I'd do one commit for adding the maintainer, one for the …
with lib;
let
  cfg = config.virtualisation.digitalOceanImage;
Hi! @arcnmx, I am planning to use this custom image, but I am confused about where this digitalOceanImage is located, because in the file digital-ocean-image.nix I only found digitalOcean.
You'll have to manually import the module file before you can use it. This seems to be a convention all images use; not sure if it's a good one. E.g. https://github.com/nix-community/nixos-generators/blob/master/formats/gce.nix
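Concretely, something like this minimal sketch, following the modulesPath style quoted earlier in this thread:

```nix
# configuration.nix: import the image module so its options (and the
# image build output used in the nix build invocation above) exist.
{ modulesPath, ... }:
{
  imports = [ (modulesPath + "/virtualisation/digital-ocean-image.nix") ];
}
```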
@arianvp thank you! Yes, I think I had a misunderstanding about the use of cfg = config.virtualisation.digitalOceanImage here.
That does kind of just sound like a single commit for the whole thing, I guess, though I can't quite make sense of applying that logic here. If someone isn't using the module, it won't trip them up. If they are, they would necessarily start bisecting from a known good commit that's at least at the head of this branch (or the point at which this is merged), regardless of whether it's one commit or many, because the module wouldn't exist in a usable state otherwise.
To my knowledge they are currently like that, though each commit adds a particular feature (excluding the commits from the prior PR).
That does seem simplest, though I'm not very comfortable with messing with someone else's history...
I would honestly just squash everything into one commit and keep the descriptions of each of the original commits. I'm also OK getting it merged as-is...
Please rebase on latest master, and squash into two commits (one editing maintainer-list, the other one adding the other files).
(Force-pushed from a74a353 to 8bba282.)
Thank you for this work! I came across it via this very nice tutorial. What would it take to get this image built on a macOS device? I want to spin up a DigitalOcean instance of NixOS so I can use a remote builder to build a Docker image with Nix... and that has landed me here! :D
Feel free to ignore my above question. I found a workaround.
Motivation for this change
A fork of the stalled #58464 made at @arianvp's suggestion, containing various fixes and changes that were brought up in review of the original PR (is a merge preferred here or is rebase fine?)
Things done
- Tested using sandboxing (`sandbox` in `nix.conf` on non-NixOS)
- Tested compilation of all packages that depend on this change using `nix-shell -p nix-review --run "nix-review wip"`
- Tested execution of all binary files (usually in `./result/bin/`)
- Determined the impact on package closure size (by running `nix path-info -S` before and after)

Notify maintainers
cc @eamsden @infinisil