New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/acme: Fixes for account creation and remove tmpfiles usage #106857
Conversation
Heads up @NixOS/acme |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/migrating-letsencrypt-acme-certificates-from-another-server/9925/2 |
This means that all systems running from master will trigger new certificate creation on next rebuild. Race conditions around multiple account creation are fixed in NixOS#106857, not this commit.
I didn't read this in detail, but I wonder if using a systemd target is the right thing. Starting a target twice will not start the dependent services twice if I understand correctly. |
Closes NixOS#106565 When generating multiple certificates which all share the same server + email, lego will attempt to create an account multiple times. By adding an account creation target certificates which share an account will wait for one service (chosen at config build time) to complete first.
systemd-tmpfiles is no longer required for most of the critical paths in the module. The only one that remains is the webroot acme-challenge directory since there's no other good place for this to live and forcing users to do the right thing alone will only create more issues.
Closes NixOS#106603 Some webservers (lighttpd) require that the files they are serving are world readable. We do our own chmods in the scripts anyway, and lego has sensible permissions on its output files, so this change is safe enough.
The instructions on recreating the cert were missing --what=state. Also added a note on ensuring the group of manual certs is correct.
Rebased |
Nvm, I confused this with #101482. |
Uh, this isn't a backport? These PRs are fresh. I accidently cherry-picked and pushed your commit for #107769 in here, then reverted it. |
I found a logical error in the bash script, but during debugging I enabled command echoing and realised it would be a good idea to have it enabled all the time for ease of bug reporting.
This worked for me. |
Anything that needs to be done before this is merged? |
Nope, I don't have any more changes to make to this. |
I would also like to highlight the other smaller acme PRs I've had open for months. They too need to be merged for some essential fixes. At this point, I would prefer to see this PR merged first then I can work on rebasing and adjusting the other PRs as necessary. |
Let's get this in. |
EDIT: I am not sure whether or not the issue I describe below is related, as I am using Referring to #106603, that this pull request addresses among others, I still get the 403, using nixops on an AWS EC2 instance.
This is my
And I can confirm that, upon visiting http://staging.rubenmoor.net/.well-known/acme-challenge/gLk9Zwwlz1RTXqPBWyOC2i0xuQTDaLtd7nUk89Fl6WA in a browser, that there is 403. My nixops deployment configuration:
|
Assuming you're using the default webroot for certs, can you check the permissions and ownership of |
Owner and group are
|
Ah of course, I'm just after reading your config thoroughly. Because you've manually configured your certs through nginx instead of using the builtins (like services.nginx.virtualHosts..enableACME), you have the wrong group configured for access. The fix is simple: Set security.acme.certs..group to |
Oh, also, you may need to manually set /var/lib/acme/.challenges to "nginx" group afterwards (since now it already exists, it won't get created with the right permissions). Alternatively delete the folder before you fix the config and the module will recreate it with the right permissions :) |
Following your suggestion, now the folders all have owner/group set to Also, the configuration option Instead it says
Also, permissions are set to 754 throughout for all the folders mentioned above. |
Update: the timeout error is based on a different problem in my configuration. During the deployments I have changed the IP at least once, having my domain pointing nowhere. There might have been yet another issue in my security group where I was missing the
By the looks of it, I am stuck with one last issue and about to sort that out myself: I use nginx for ssl certs and my custom server called "backend" to run my app. This custom server is run as a service, thus run by the root user. However, |
Finally got my setup to work. Thanks for your advice, @m1cr0man . Indeed, I needed to set Unrelated, but ultimately, I had configure |
Glad you got sorted! This error in the docs is fixed in #100356 which hopefully I'll get merged soon. |
FYI: Backport of this PR for NixOS 20.09: #112145 (needs reviewers / not yet merged). |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/trying-to-get-a-lets-encrypt-cert-to-renew-connection-refused/11009/2 |
@@ -248,13 +272,18 @@ let | |||
|
|||
# Working directory will be /tmp | |||
script = '' | |||
set -euo pipefail | |||
set -euxo pipefail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh man, is this annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so? I added it because debugging people's issues without the shell commands in the logs is quite difficult. Also so much functionality of cert renewal happens in bash. Personally I've found it quite useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't debugging output only be enabled when you're, well, debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well with acme, you know you want to debug only after the fact. besides, the amount of logs this generates is not that big...
Motivation for this change
Closes #106565
Possibly closes some other ACME tickets related to JWS token errors too. A new systemd target named
acme-account-$hash.target
has been added, which is depended on (requiredBy + after) by all but 1 cert service associated with the same account. The one remainder is depended on (requires + before) to do the account creation.Note that I changed the hash data for the account directory which will trigger new account creation on all systems. Given that rate limit issues should now be resolved, this will simply test that I did a good enough job of it ;)
Closes #106603
Umask is changed to 0023 to accommodate for lighttpd
Closes #103121
Remove systemd-tmpfiles dependency follows from this discussion and
may solvehas been reported to solve #103121. I was tempted to totally remove the use of tmpfiles but I couldn't find a good home for the webroot directory creation. Ideally, it should already be created by the user but since most users don't set a webroot at all and expect it to "just work" through httpd/nginx I left it in.Docs update are primarily to cover follow-up in #81634 and fix the
systemctl clean
command.In the test suite, I added a wait for "network-online.target" on the pebble server due to a one time test failure I had because the host wasn't reachable. I hope this resolves it, but it's hard to test.
Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)