Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/certmgr: init #44556

Merged
merged 1 commit into from Aug 10, 2018
Merged

nixos/certmgr: init #44556

merged 1 commit into from Aug 10, 2018

Conversation

johanot
Copy link
Contributor

@johanot johanot commented Aug 6, 2018

Motivation for this change

Added nixos module for easy configuration of certmgr daemon service.
See #44406 for reference.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am again :)

description = "The interval before a certificate expires to start attempting to renew it.";
};

interval = mkOption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these 2 options (interval and before), the acme module already has options that probably do the same: security.acme.{renewInterval,validMin}. So I suggest using the acme names for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in: a94946e

};
}
'';
type = with types; (attrsOf (submodule ({ ... }: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be just

type = with types; attrsOf (submodule {

And therefore remove 2 parens at the end of this section.

And it might work well to use attrsOf (either path (submodule { here instead, this would be able to replace impSpecs (so this should probably be renamed to just specs then).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed parens as part of: a94946e .. Will look into the possibility (and potential problems) of merging impSpecs and declSpecs now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined impSpecs and declSpecs in e77155c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the intention behind combining the specs, I think it'll lead to a poorer understanding of the intended separation. It's important that impSpecs are never imported into the store, since they'll most likely (always?) contain secrets. The either machinery necessary to facilitate a joined specs introduces a lot of opportunities for future maintainers to accidentally import the paths -- and even if it's safe (which it is in the current state) the error messages if you muck up the value are not great.

In contrast, impSpecs might look non-dry, but it requires very trivial machinery and it'll be easier to ensure that none of its values are accidentally leaked into the store.

service = mkOption {
type = nullOr str;
default = null;
description = "The service on which to perform <action> after fetching.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The > should also be escaped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a94946e

mkTargetDirs = mkOption {
default = false;
type = types.bool;
description = "Determines whether target directories (including parents) should be created for keys and certs in declarative specs.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a problem with always setting this to true? Why is it false by default? I would like to have this handled automatically, an option for this doesn't seem justified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed option mkTargetDirs as part of e77155c

type = types.enum [ "circus" "command" "dummy" "openrc" "systemd" "sysv" ];
description = ''
The behavior to use when executing actions after fetching certs
See: <link xlink:href="https://github.com/cloudflare/certmgr#certmgryaml" />.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this description correct? The linked page says "this specifies the service manager to use for restarting or reloading services."

Also, can this ever be anything other than systemd? We are on NixOS after all, which doesn't support anything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops.. c/p error. as for the description. It is changed as part of a94946e.

Regarding other init-systems, yes. We probably won't need them in NixOS. But the command service manager is useful, when you just want to execute a verbatim command or script instead of invoking systemd, and dummy can be useful for debugging.

I'm personally ok with leaving the list as-is (vendor provided). But if you think so, I'm also happy to remove circus openrc and sysv. As long as default remains to be systemd.

(["mkdir -p"] ++ (uniqueFilter (catDirsRecursive (n: v: n == "path") cfg.declSpecs)))
)
)
("${pkgs.certmgr}/bin/certmgr -f ${certmgrYaml} check")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary parens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a94946e

machine = { config, lib, pkgs, ... }:
{
networking.firewall.allowedTCPPorts = with config.services; [ cfssl.port certmgr.metricsPort ];
networking.extraHosts = lib.concatStringsSep "\n" [ "127.0.0.1 imp.example.org decl.example.org" ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the same as the one string itself, no need for concatStringsSep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a94946e

Type = "oneshot";
WorkingDirectory = config.services.cfssl.dataDir;
};
script = with pkgs; ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer without with pkgs; here, it's only needed 3 times and it's rather short.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. a94946e

'';
};

services.nginx = with pkgs; {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, pkgs is only needed once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. a94946e

];
};
service = "nginx";
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file could need a big nice let in at the start to declare everything and make the code much more readable, so many brackets right now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. e77155c

@infinisil
Copy link
Member

Also: Can this in any way interact with ACME? NixOS has a great ACME module, maybe something can be done with that.

@johanot
Copy link
Contributor Author

johanot commented Aug 7, 2018

@infinisil Let me start with the general question about acme. I like acme and Let's Encrypt very much myself, and I use it everywhere I can. I can think of only one (maybe two) scenarios where using acme is not possible.

  1. When you don't have access to the internet at all, or you just don't have any means of responding to acme challenges.

  2. When you need x509 certificates with non-DNS names in the subject section.

It's no secret that I plan to use cfssl and certmgr for a refactoring of the Kubernetes module. The Kubernetes PKI-setup in particular, requires certificates with rolenames like system:controller-manager and system:scheduler stamped in the certificate subject section, common name (CN) field. As far as I know, this is not something you can request with Let's Encrypt?

.. and even if you could change the default system role names to something like controller-manager.kubernetes.domain.org, you would not be able to use Kubernetes without opening your cluster to the internet. IMHO, cluster-components should be able to communicate via TLS without depending on external services.

All that said; I really wish products like cfssl or Vault PKI had support for the acme protocol. That though, would also require a change of the NixOS acme module, AFAIK? It's hardcoded for Let's Encrypt as issuer :).

otherCert = "/var/certmgr/specs/other-cert.json";
}
'';
type = with types; attrsOf (either (submodule ({ ... }: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The { ... }: here is still unnecessary, can drop another pair of () then.

Is there a problem with having the path type in front of the submodule as originally recommended? I'd rather not go scrolling down many lines to find the other part of the either. So I suggest using either path (submodule { ... as previously mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove the { ... }:, no worries :)

I'm so happy you ask about the order of the types, because yes there is a problem actually. IMO it's a bug in the module system. You see, the type check for types.path is defined as builtins.substring 0 1 (toString x) == "/"; . This means that if x is of type attrs, then (toString x) will fail with cannot coerce set to string.

types.either is defined as t1.check || t2.check. This means that if the submodule is t1, then the t1 type check will guard the failing path check, which thus will never be executed for sets. If, on the other hand, the submodule becomes t2 in the either-clause, then the path type-check will fail.

I think this should be reported as a bug. The path type-check should somehow check if the input is coercible to a string (and if not return false) before treating it as a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ ... } removed in 9082a61

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the check for path really be (builtins.tryEval (builtins.substring 0 1 (toString x) == "/")).value? It feels icky to use tryEval here, but I certainly do expect either to not care about the order of its arguments, and path does break this as described.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard my comment here, it was pointed out to me that it's complete nonsense, since string coercion errors aren't catchable with tryEval.

allSpecs = pkgs.buildEnv {
name = "certmgr.d";
paths = optional (specs != []) (pkgs.linkFarm "certmgr-specs" specs);
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the same as just allSpecs = pkgs.linkFarm "certmgr.d" specs;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. I just forgot to reduce it after all my testing. Fixed in 9082a61

name = n + ".json";
path = if isAttrs v then pkgs.writeText name (builtins.toJSON v) else v;
in
{ inherit name; inherit path; })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also use this which looks a bit cleaner imo:

specs = mapAttrsToList (n: v: rec {
  name = n + ".json";
  path = if isAttrs v then pkgs.writeText name (builtins.toJSON v) else v;
}) cfg.specs;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to match your suggestion in: 9082a61

if isAttrs v then
collect isString (filterAttrsRecursive (n: v: isAttrs v || n == "path") v)
else
v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple things: n isn't used -> concatMap/map instead. More descriptive name instead of v (can be confusing with later v). Named this variable properly, and removed the flexibility of set and using cfg.specs always instead, that's all we need. I don't think using unique is worth it, it's an expensive operation (O(n^2)) and mkdir handles duplicates just fine. And some more things. All in all I think it should look like this:

specPaths = map dirOf (concatMap (spec:
  if isAttrs spec then
    collect isString (filterAttrsRecursive (n: v: isAttrs v || n == "path") spec)
  else
    [ spec ]
) (attrValues cfg.specs)));

preStart = ''
  ${concatStringsSep " \\\n" (["mkdir -p"] ++ map escapeShellArg specPaths)}
  ...
''

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to match your suggestion in: 9082a61


config = {
assertions = with lib; singleton {
assertion = !any (hasAttrByPath [ "authority" "auth_key" ]) (mapAttrsToList (_: value: value) cfg.specs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapAttrsToList (_: value: value) cfg.specs = attrValues cfg.specs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9082a61 .. And I added an extra assertion to check if the list of specs is empty.

$machine->waitUntilSucceeds('ls /tmp/imp.example.org-key.pem');
$machine->waitUntilSucceeds('ls /tmp/imp.example.org-cert.pem');
$machine->waitForUnit('nginx.service');
$machine->succeed('[ "1" -lt "$(journalctl -u nginx | grep "Starting Nginx" | wc -l)" ]');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this weird check? Maybe a $machine->sleep(3) will work instead if the waitForUnit really isn't enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is there to confirm that nginx is in fact restarted by certmgr after certificate issue. Starting nginx ... would be present in the journal more than once if nginx was restarted. waitForUnit will terminate just fine even if certmgr restart action is set to no-op/dummy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see

@infinisil
Copy link
Member

Regarding the thing mentioned by @srhb in #44556 (comment) we had a small discussion about it: https://logs.nix.samueldr.com/nixos-dev/2018-08-09#1533829782-1533831340;

I personally like to have this additional complexity of having path and option specs combined. I feel it makes more sense, since a spec of a specific name can be one of either, which corresponds to this name -> path/attrset mapping.

But it also has the potential for messing up the implementation (the isAttrs distinguishing), which I think is also important to get right, especially with security related things, nothing sensitive should land in the store and such.

An actual good solution would be to have ADT's in the module system (see @shlevy's https://github.com/shlevy/nix-adt, which I think doesn't fully fit into the module system however), but that's probably a bit off as of now.

@johanot What do you think of this? I'm good with either, keeping the options for path/attrsets together or splitting them.

When that's cleared up, this PR looks about ready for merging to me :)

serviceConfig = {
Restart = "always";
RestartSec = "10s";
ExecStart = "${pkgs.certmgr}/bin/certmgr -f ${certmgrYaml}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost missed this, but very important:

Like this, this service is running as root, which should be avoided (also see #41092). Check out the DynamicUser and RuntimeDirectory systemd options, they should probably suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it is not as simple as DynamicUser.. Ok.. I looked at the acme module, curious to see how it handles private key file-permissions, when running as non-root. Certmgr has to be able to issue certificates, and generate private keys for daemon A, B and C - each running with different users - and still private key files must be protected and readable for the exact daemon it is intended to only.

It looks like the acme module creates a systemd service for each cert, allowing for different users per cert. @infinisil .. Do you think I should rewrite the certmgr module along the same lines, or do you have other suggestions?

};

config = mkIf cfg.enable {
assertions = with lib; [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib is already with;'d above, can be removed.

};

action = mkOption {
type = addCheck str (x: cfg.svcManager == "command" || (any (e: e == x) ["restart" "reload" "nop"]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any (e: e == x) [ ... ] = elem x [ ... ]

the parens around that can also be removed, function application binds very strongly

@johanot
Copy link
Contributor Author

johanot commented Aug 9, 2018

@infinisil I understand the pros and cons of combined vs. separate imp and decl-specs. I'm fine with leaving it as is (combined) for now if you are. I'll do the rest of the minor changes tomorrow. But then there is the question of running as non-root. Please let me know whether you agree on the acme-approach (one service per cert), if yes, I'll start implementing that tomorrow morning as well.

@infinisil
Copy link
Member

Oh I see, well if that's the case then it's probably fine to use root, acme uses root by default as well. I'm fine with leaving it like that. No need for additional complexity, unless somebody really needs it.

@johanot
Copy link
Contributor Author

johanot commented Aug 10, 2018

@infinisil I removed the with lib and replaced any (e == x) with elem x. And then squashed. Feel free to merge if you have no more comments.

@srhb
Copy link
Contributor

srhb commented Aug 10, 2018

@GrahamcOfBorg test certmgr.systemd certmgr.command

@GrahamcOfBorg
Copy link

Timed out, unknown build status on x86_64-linux (full log)

Attempted: tests.certmgr.systemd, tests.certmgr.command

Partial log (click to expand)

cannot build derivation '/nix/store/g7zdr4scn3d2z2a50xmb0dglrkr4xp1j-closure-info.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/k1brark7avhlyfa9wjgfd8wyv2ypx3kf-run-nixos-vm.drv': 2 dependencies couldn't be built
cannot build derivation '/nix/store/z447zilzdqr64zxymx2jzcdwxdzxf1k7-run-nixos-vm.drv': 2 dependencies couldn't be built
cannot build derivation '/nix/store/8fpgfhvy64dr0mgfqhscyfpfags1hhlf-nixos-vm.drv': 2 dependencies couldn't be built
cannot build derivation '/nix/store/qi3f68ci4qnxdq8vd89gp6vv6jn0m25z-nixos-vm.drv': 2 dependencies couldn't be built
cannot build derivation '/nix/store/d1ad60c27qqhw5rvazdsm574j87vyyyh-nixos-test-driver-certmgr-command.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/zw43nr06v25nzkagmb835lziy7640hxa-nixos-test-driver-certmgr-systemd.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/slmynqbg7gmfwyhqaa7ixximq5c7bnd2-vm-test-run-certmgr-command.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/y07n52mifs6rl412h6cqvsqgp28i2az0-vm-test-run-certmgr-systemd.drv': 1 dependencies couldn't be built
error: build of '/nix/store/slmynqbg7gmfwyhqaa7ixximq5c7bnd2-vm-test-run-certmgr-command.drv', '/nix/store/y07n52mifs6rl412h6cqvsqgp28i2az0-vm-test-run-certmgr-systemd.drv' failed

@GrahamcOfBorg
Copy link

Success on aarch64-linux (full log)

Attempted: tests.certmgr.systemd, tests.certmgr.command

Partial log (click to expand)

syncing
machine: running command: sync
machine: exit status 0
test script finished in 180.68s
cleaning up
killing machine (pid 631)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/kpscp4py6bq4b5ivy8vak1ignaq2zx3i-vm-test-run-certmgr-systemd
/nix/store/n9p88lj8l3wrbp8hggmga216dgw726h2-vm-test-run-certmgr-command

@infinisil
Copy link
Member

Ran both tests locally, both successfully. Thanks for the PR! I'll merge in a bit.

@infinisil infinisil merged commit 1a3b9e1 into NixOS:master Aug 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants