Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nomad: init #105739

Merged
merged 1 commit into from Jan 18, 2021
Merged

nixos/nomad: init #105739

merged 1 commit into from Jan 18, 2021

Conversation

lovesegfault
Copy link
Member

Motivation for this change

Noticed that while consul has a NixOS module, nomad does not. Decided to add
it since it's needed at work.

cc. @cpcloud for testing
cc. @flokli for review

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Copy link
Member

@aanderse aanderse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments I hope you find useful. Please let me know if you have any questions about my comments, or if you need a hand on anything here.

nixos/modules/misc/ids.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
Copy link
Member

@aanderse aanderse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module looks really good 👍

If anyone was interested a future PR could utilize the upstream unit via systemd.packages. Full bonus points would be awarded.

serviceConfig = {
DynamicUser = cfg.dropPrivileges;
ExecReload = "${pkgs.coreutils}/bin/kill -HUP $MAINPID";
ExecStart = "${cfg.package}/bin/nomad agent -config=/etc/nomad.json";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the config directory option as well here so that an application can add additional config without having to change the unit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the switch-to-configuration.pl script will automatically handle the for you...

ping @flokli to discuss this over winter break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I would like to see here is a settingsFiles option that is a list of strings or paths. This allows ad-hoc configuration when enabling the service.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only request that I haven't had time to implement/test yet.

Would you mind giving a shot at it @cpcloud, doing it as in your consul PR?

Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of changes are in order here IMO to make this unit usable out of the box.

RestartSec = 2;
StartLimitBurst = 3;
StartLimitIntervalSec = 10;
StateDirectory = "nomad";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this will materialize as /var/lib/nomad, nomad does not set data_dir by default so I think the unit should set data_dir in cfg.settings to "/var/lib/nomad".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

A question though, @cpcloud:

Is what you do in https://github.com/cpcloud/nixpkgs/blob/f84c234a4f328b5c181cb9c955798cdfb5f57240/nixos/modules/services/networking/consul.nix#L7-L9 better:

  settings = {
    data_dir = "/var/lib/consul";
  } // cfg.settings;
in
      settings = mkOption {
        type = format.type;
        default = { };

or what I did with

      settings = mkOption {
        type = format.type;
        default = {
          data_dir = "/var/lib/nomad";
        };

? Does one have benefits vs the other? The let-bout one seems to have the drawback of not being obvious from documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question. I think that what you did is probably more obvious from a documentation perspective.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably set it this way:

config = mkIf cfg.enable {
  services.nomad.settings = {
    data_dir = "/var/lib/nomad";
  };
};

@nh2 if you do that then data_dir will be overridden if the user puts anything in settings. By setting it the way I have mentioned it gives the sysadmin an escape hatch where they can set services.nomad.settings.data_dir = lib.mkForce "/something/else"; if they absolutely need to.

serviceConfig = {
DynamicUser = cfg.dropPrivileges;
ExecReload = "${pkgs.coreutils}/bin/kill -HUP $MAINPID";
ExecStart = "${cfg.package}/bin/nomad agent -config=/etc/nomad.json";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I would like to see here is a settingsFiles option that is a list of strings or paths. This allows ad-hoc configuration when enabling the service.

nh2
nh2 previously requested changes Jan 13, 2021
Copy link
Contributor

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, it is very useful. Some feedback as I explore it:

  • Also add the module to nixos/modules/module-list.nix
    • Edit: I did this.

nixos/modules/services/networking/nomad.nix Outdated Show resolved Hide resolved
@nh2
Copy link
Contributor

nh2 commented Jan 13, 2021

@lovesegfault I've taken the liberty to push some fixup commits into this PR that address many of the issues found above.

Please have a look at the changes; then we can squash them.

Using the fixed module, I've managed to get nomad started and it runs docker images fine, with this config:

{
  services.nomad = {

    enable = true;
    settings = {
      data_dir = "/var/lib/${config.systemd.services.nomad.serviceConfig.StateDirectory}";
      server = {
        enabled = true;
        bootstrap_expect = 1; # for demo; no fault tolerance
      };
      client = {
        enabled = true;
        network_speed = 1000; # Mbit/s
      };
    };
  };
}

I believe we should default enableDocker = true; as I've done here now, because the official Nomad tutorial uses docker as an example.

The example above includes network_speed = 1000; # Mbit/s because that's also in the official tutorial, but it seems to be either no longer used, or undocumented; I've pointed that out upstream: hashicorp/nomad#9385 (comment)

nh2 added a commit to nh2/nixos-vm-building that referenced this pull request Jan 13, 2021
nh2 added a commit to nh2/nixos-vm-building that referenced this pull request Jan 13, 2021
nh2 added a commit to nh2/nixos-vm-building that referenced this pull request Jan 13, 2021
@@ -34,6 +34,18 @@ in
Configuration for Nomad. See the <link xlink:href="https://www.nomadproject.io/docs/configuration">documentation</link>
for supported values.
'';
example = literalExample ''
# A minimal config example:
data_dir = "/var/lib/''${config.systemd.services.nomad.serviceConfig.StateDirectory}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a slight chicken-egg problem here. If user configures data_dir = "/my/nomad/data"; we'd still create StateDirectory not at that location. What should we do with that? Only set StateDirectory if the dirname of data_dir is /var/lib?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred way is to only provision the data/state directory if the user hasn't "customized" it. In this example there is a dataDir option, but the same idea can apply here:

serviceConfig = mkMerge [
  { # standard options
  }
  (mkIf (cfg.settings.data_dir == "/var/lib/nomad") { StateDirectory = "nomad"; })
];

I take special care to explain to the sysadmin their responsibilities, if any:

The directory or NFS/SMB network share where MPD reads music from. If left
as the default value this directory will automatically be created before
the MPD server starts, otherwise the sysadmin is responsible for ensuring
the directory exists with appropriate ownership and permissions.

Why? See the motivation section of the mentioned PR.

@@ -85,10 +96,16 @@ in
StateDirectory = "nomad";
TasksMax = "infinity";
User = optionalString cfg.dropPrivileges "nomad";
} // (if !cfg.enableDocker then { } else {
Copy link
Contributor

@cpcloud cpcloud Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use lib.optionalAttrs here:

lib.optionalAttrs cfg.enableDocker { SupplementaryGroups = "docker"; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or

serviceConfig = mkMerge [
  { DynamicUser = cfg.dropPrivileges;
    ...
  }
  (mkIf cfg.enableDocker { SupplementaryGrousp = "docker"; })
];

Also, isn't SupplementaryGroups a list (conceptually)?

default = { };
default = {
# Agrees with `StateDirectory = "nomad"` set below.
data_dir = "/var/lib/nomad";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does data_dir need to be configurable? If so, then we should make StateDirectory match, and not set StateDirectory in cases where its dirname isn't /var/lib

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment.

Additional to this you should add an assertion corresponding to dropPrivileges - the only possibility for data_dir is under /var/lib if DynamicUser is used.

@cpcloud
Copy link
Contributor

cpcloud commented Jan 14, 2021

@nh2 Thanks for taking this on!

@lovesegfault
Copy link
Member Author

Rebased this branch on the latest master, also pushed some additional fixups.

@nh2 Thanks for all the help :)

@@ -40,7 +40,7 @@ in

enableDocker = mkOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure that having this open is that useful. It seems to give the illusion that a user only need set this to true to allow nomad to launch containers, and that isn't the case. You have to enable to docker task driver plugin.

I think we either remove this option altogether, or we have to add a client.enabled = true option (and then probably a server.enabled = true option) and then we have to enable the plugin.

Unfortunately, that still probably won't be enough. Users will want to be able to launch containers using images from a private registry, in which case we'd basically have to replicate the nomad client config structure here. Of course, when people see that option they might start expecting that other config options can be set in the same way and before long we've replicated the entire nomad client and server configuration structs here!

I propose that we leave this option out to avoid confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to give the illusion that a user only need set this to true to allow nomad to launch containers, and that isn't the case. You have to enable to docker task driver plugin.

@cpcloud For me it worked out of the box, I didn't have to enable a task driver plugin (as stated in #105739 (comment) with the config shown in there).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting! Ok then, I guess I'm wrong here. I agree then that we should keep this option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lovesegfault @aanderse @cpcloud Given this, what's your opinion on the default?

Mine is that we should default to true because docker is the most common way to use Nomad, and one can only play through the official tutorial when it's enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh2 I agree with you. enableDocker = true seems like the right default.

@lovesegfault
Copy link
Member Author

Are there any objections before I squash these commits and merge?

cc. @aanderse @cpcloud @nh2

@aanderse
Copy link
Member

@lovesegfault yeah this looks great 👍 Awesome work 🎉

@cpcloud
Copy link
Contributor

cpcloud commented Jan 17, 2021

LGTM!

@nh2
Copy link
Contributor

nh2 commented Jan 17, 2021

The example above includes network_speed = 1000; # Mbit/s because that's also in the official tutorial, but it seems to be either no longer used, or undocumented; I've pointed that out upstream: hashicorp/nomad#9385 (comment)

Upstream replied; it's truly obsolete so we should remove it:

@lovesegfault
Copy link
Member Author

The example above includes network_speed = 1000; # Mbit/s because that's also in the official tutorial, but it seems to be either no longer used, or undocumented; I've pointed that out upstream: hashicorp/nomad#9385 (comment)

Upstream replied; it's truly obsolete so we should remove it:

* [hashicorp/nomad#9385 (comment)](https://github.com/hashicorp/nomad/issues/9385#issuecomment-759443116)

* [hashicorp/nomad#9793](https://github.com/hashicorp/nomad/pull/9793)

Removed :)

Co-authored-by: Niklas Hambüchen <mail@nh2.me>
@lovesegfault
Copy link
Member Author

Will merge this tonight

Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Ship it.

@lovesegfault lovesegfault merged commit f353f24 into NixOS:master Jan 18, 2021
@lovesegfault lovesegfault deleted the nomad-module branch January 18, 2021 02:20
@lovesegfault
Copy link
Member Author

Thanks everyone for the help and reviews!

@cpcloud
Copy link
Contributor

cpcloud commented Jan 18, 2021

Great work, y'all!

@nh2
Copy link
Contributor

nh2 commented Jan 18, 2021

I think turning docker back on by default was forgotten:

#105739 (comment)

Also @aanderse's comment from https://github.com/NixOS/nixpkgs/pull/105739/files#r557693150 (especially the one linked in there), which is a good point.

@lovesegfault
Copy link
Member Author

I think turning docker back on by default was forgotten:

#105739 (comment)

Also @aanderse's comment from https://github.com/NixOS/nixpkgs/pull/105739/files#r557693150 (especially the one linked in there), which is a good point.

I'll push a fix to enable docker by default again right away.

The other comment I think deserves a follow up PR.

@lovesegfault
Copy link
Member Author

Addressed the default issue: e134019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants