Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/acme: Fix ordering of certificate requests (#81482) #85223

Merged
merged 5 commits into from Jun 29, 2020

Conversation

arianvp
Copy link
Member

@arianvp arianvp commented Apr 14, 2020

Motivation for this change

Fixes #81842

Given ACME is considered blocker for 20.03 I think we should backport this change if possible.

Things done

Fixes #81842 by ordering certificate requests after the web service has started up
Adds a regression test too

There are no tests for the apache module. I didn't even know it existed before this PR. @aanderse I also ported the fix to that, but I have not written any tests for it. I'd love to see this module tested. Keeping both in sync is a bit painful if one of the two is only tested and the other isn't. We already have quite a few problems we need to solve with the acme module. It being used by two webservers for me is a bit of a maintenance burden personally, but I'm fine if someone else keeps it up to feature parity

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@arianvp
Copy link
Member Author

arianvp commented Apr 14, 2020

self-review: The regression test isn't actually added to all-tests file so it isn't being run by CI. should add it

@m1cr0man
Copy link
Contributor

It will likely be necessary to add the selfsigned certs services to the before rules for nginx and apache too. I know at least for apache that if no certificate files exist for a domain specified in the config it will fail to start up. Usually I get around this on new machines by starting the selfsigned services by hand but that doesn't scale well.

@arianvp
Copy link
Member Author

arianvp commented Apr 14, 2020

That before rule already exists as far as I see... At least for nginx

@m1cr0man
Copy link
Contributor

Oh..really? I did a systemctl list-dependencies on my own host and I couldn't see it. What depends on it?

@arianvp
Copy link
Member Author

arianvp commented Apr 14, 2020

after = [ "network.target" ] ++ map (vhostConfig: "acme-selfsigned-${vhostConfig.serverName}.service") acmeEnabledVhosts;

HEre ngnix is ordered after the self signed cert, as you'd expect. (Nginx will fail to start if the self-signed cert is not yet there)

On my server it also shows up as a dependency for nginx, and nginx is ordered after the selfsigned correctly.

[root@arianvp:~]# systemd-analyze dot nginx.service
digraph systemd {
	"multi-user.target"->"nginx.service" [color="green"];
	"multi-user.target"->"nginx.service" [color="grey66"];
	"nginx.service"->"acme-selfsigned-arianvp.me.service" [color="green"];
	"nginx.service"->"acme-selfsigned-arianvp.me.service" [color="grey66"];
	"nginx.service"->"acme-arianvp.me.service" [color="grey66"];
        (...sic..)
}
   Color legend: black     = Requires
                 dark blue = Requisite
                 dark grey = Wants
                 red       = Conflicts
                 green     = After
[root@arianvp:~]# systemctl show nginx -p After
After=acme-selfsigned-arianvp.me.service

@arianvp
Copy link
Member Author

arianvp commented Apr 14, 2020

@GrahamcOfBorg test acme

@m1cr0man
Copy link
Contributor

Here ngnix is ordered after the self signed cert, as you'd expect. (Nginx will fail to start if the self-signed cert is not yet there)

Ah ok. The same does not exist for httpd it seems.

@arianvp
Copy link
Member Author

arianvp commented Apr 14, 2020

@m1cr0man that also doesn't seem to be true. It seems to be there. Though ACME support for apache httpd wasn't there yet in 19.09 so perhaps you're referring to the old situation before this was merged?

after = [ "network.target" "fs.target" ] ++ map (hostOpts: "acme-selfsigned-${hostOpts.hostName}.service") vhostsACME;

@m1cr0man
Copy link
Contributor

@m1cr0man that also doesn't seem to be true. It seems to be there. Though ACME support for apache httpd wasn't there yet in 19.09 so perhaps you're referring to the old situation before this was merged?

Ah, looking at this now I believe it's not showing up on my system because I'm not using enableACME with my vhosts (doing it manually because I have a wildcard certificate). Never mind me so! 😅

@arianvp
Copy link
Member Author

arianvp commented Apr 15, 2020

@GrahamcOfBorg test acme

The test seems flakey :/ https://logs.nix.ci/?key=nixos/nixpkgs.85223&attempt_id=207f0be0-8ba1-41a8-93e4-96fee6d90a24

This seems to be because the certificate issuer server's clock is out of sync with the webserver. UGH.

client # curl: (60) SSL certificate problem: certificate is not yet valid
client # More details here: https://curl.haxx.se/docs/sslcerts.html
client #
client # curl failed to verify the legitimacy of the server and therefore could not
client # establish a secure connection to it. To learn more about this situation and
client # how to fix it, please visit the web page mentioned above.

@arianvp
Copy link
Member Author

arianvp commented Apr 15, 2020

hmm test failed again, but I can't reproduce it locally. It actually fails for two reasons.

 - Test "Can request wildcard certificates using DNS-01 challenge" failed with error: "command `curl --cacert /tmp/ca.crt https://c.example.com/ | grep -qF 'hello world'` failed (exit code 1)"

fails because of time-travelling. however it is much more flakey on CI then it seems locally. I can't reproduce

client: must succeed: curl --cacert /tmp/ca.crt https://d.example.com/
client # curl: (60) SSL certificate problem: self signed certificate in certificate chain
client # More details here: https://curl.haxx.se/docs/sslcerts.html
client #
client # curl failed to verify the legitimacy of the server and therefore could not
client # establish a secure connection to it. To learn more about this situation and
client # how to fix it, please visit the web page mentioned above.
client: output:
error: command `curl --cacert /tmp/ca.crt https://d.example.com/` failed (exit code 60)
(15.79 seconds)

seems to be a legitimate issue; but I cant reproduce locally either

@arianvp
Copy link
Member Author

arianvp commented Apr 15, 2020

The tests now passed. but we should probably open an issue about the flakey tests

@flokli
Copy link
Contributor

flokli commented Apr 21, 2020

With #85503 merged, can this be rebased on top of master?

@datafoo
Copy link
Contributor

datafoo commented May 22, 2020

ping

@arianvp arianvp self-assigned this May 22, 2020
@arianvp
Copy link
Member Author

arianvp commented May 22, 2020

I'll rebase this onto master this weekend.

@datafoo
Copy link
Contributor

datafoo commented Jun 15, 2020

I'll rebase this onto master this weekend.

Sorry to bother again. Do you think you will have the time to work on it in a near future?

Reads a bit more naturally, and now the changes to the
acme-${cert}.service actually reflect what would be needed were you to
do the same in production.

e.g.  "for dns-01, your service that needs the cert needs to pull in the
cert"
This fixes NixOS#81842

We should probably also fix this for Apache, which recently also learned
to use ACME.
@arianvp
Copy link
Member Author

arianvp commented Jun 15, 2020

Rebased. @NixOS/acme could you have another look?

@arianvp arianvp requested a review from a team June 15, 2020 09:05
@arianvp
Copy link
Member Author

arianvp commented Jun 15, 2020

@GrahamcOfBorg test acme

Copy link
Contributor

@m1cr0man m1cr0man left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for implementing it for both web servers ;)

@flokli flokli merged commit aed85b7 into NixOS:master Jun 29, 2020
@flokli
Copy link
Contributor

flokli commented Jun 29, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

security.acme timeout because services.nginx takes time to start
4 participants