New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/acme: adjust renewal timer options #80856
Conversation
4379912
to
6e404aa
Compare
Confirmed that this still passes the acme tests for me. |
@emilazy I like how the first commit is nice and clean, simple, documented, and will be easy to test + validate. I like how in your backport you only include this commit. Thanks for doing the research and reaching out to the Let's Encrypt team! I think it will be pretty easy to have the collective of relevant committers agree to merge the first commit. The other 2 commits might require some more discussion, and could potentially hold this PR up, which would hold up the backport PR. Given the stage we're at (with regards to the |
The first two look good to me! I probably need this on my own setup with the 28 certs I'm trying to validate there, to avoid rate limits 😛 As for the additionalLegoRenewFlags, I actually had something similar ( I am not against the idea either, for power users there is certainly desire to allow for custom flags to the service, but they run the above risk of it breaking some day. Would love to hear some other opinions on the matter. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think these changes are good 👍 My comment on code complexity here isn't really because of your code, but I think here and now might be the time to consider if we can simplify the command line argument logic at all... if anyone is up to the challenge, or has any great idea.
nixos/modules/security/acme.nix
Outdated
runOpts = escapeShellArgs (globalOpts ++ [ "run" ] ++ certOpts); | ||
renewOpts = escapeShellArgs (globalOpts ++ | ||
[ "renew" "--days" (toString cfg.validMinDays) ] ++ | ||
certOpts ++ data.additionalLegoRenewFlags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globalOpts
, certOpts
, runOpts
, and renewOpts
- each of which is determined by a load of module options. I worry the code is becoming too complicated to properly test against all potential regressions on future changes.
Anyone have any ideas to simplify this, all while adding the flexibility @emilazy has added here? 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent an unreasonable amount of time thinking about this already. There's just a lot of different ways to configure your certs (multiple places for email, implicit selection between dns and http for example). Maybe using some of the lib.optional* functions would help clean it up a bit. The fact that you need 2 separate strings for "run" and "renew" is a bit annoying too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have good ideas here, other than for lego to support proper configuration files itself or something. I currently have to do
{
# TODO: DNS validation
security.acme = {
email = "...";
acceptTerms = true;
certs = let cfg = {
user = "nginx";
group = "nginx";
mustStaple = true;
}; in {
"emily.moe" = cfg // { extraDomains = { "patchouli.emily.moe" = null; }; };
"mew.build" = cfg;
};
};
}
to share options between my certificates, so the current configuration model certainly isn't ideal.
Updated to split out the Must-Staple/extra flags commits into #80900, and also divide the nervously eyes https://github.com/NixOS/nixpkgs/pulls/emilazy, waiting for it to grow pagination... |
An explanation for the AccuracySec change, in case you find the timers as confusing as I do:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. This is one heck of a module now, I'd love to try my hand at simplifying it at some point.
At least we have a good chunk of people who are interested in this module now. I'm happy to set up some brainstorming session (Perhaps we can create an IRC channel; #nixos-acme) to think how we can tame this code a bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave it. I'm also not convinced that assert will live forever
…On Mon, Feb 24, 2020, 16:40 Emily ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In nixos/modules/security/acme.nix
<#80856 (comment)>:
> @@ -399,7 +399,17 @@ in
systemd.tmpfiles.rules =
map (data: "d ${data.webroot}/.well-known/acme-challenge - ${data.user} ${data.group}") (filter (data: data.webroot != null) (attrValues cfg.certs));
- systemd.timers = flip mapAttrs' cfg.certs (cert: data: nameValuePair
+ systemd.timers = let
+ # Allow systemd to pick a convenient time within the day
+ # to run the check.
+ # This allows the coalescing of multiple timer jobs.
+ # We divide by the number of certificates so that if you
+ # have many certificates, the renewals are distributed over
+ # the course of the day to avoid rate limits.
+ numCerts = length (attrNames cfg.certs);
+ _24hSecs = 60 * 60 * 24;
+ AccuracySec = if numCerts == 0 then null else "${toString (builtins.div _24hSecs numCerts)}s";
Makes sense; I was just defensively avoiding the division by zero, though
I suppose with lazy evaluation there's not really much point. I can adjust
this if you'd prefer, though I'm not entirely convinced that that assert
will last forever :)
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#80856?email_source=notifications&email_token=AAEZNI44A5REZ6XKUGTWPKDREPS5FA5CNFSM4KZWVUP2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCWVFI6I#discussion_r383340084>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEZNIYGBM7IL3SCTP6SR3LREPS5FANCNFSM4KZWVUPQ>
.
|
The current weekly setting causes every NixOS server to try to renew its certificate at midnight on the dot on Monday. This contributes to the general problem of periodic load spikes for Let's Encrypt; NixOS is probably not a major contributor to that problem, but we can lead by example by picking good defaults here. The values here were chosen after consulting with @yuriks, an SRE at Let's Encrypt: * Randomize the time certificates are renewed within a 24 hour period. * Check for renewal every 24 hours, to ensure the certificate is always renewed before an expiry notice is sent out. * Increase the AccuracySec (thus lowering the accuracy(!)), so that systemd can coalesce the renewal with other timers being run. (You might be worried that this would defeat the purpose of the time skewing, but systemd is documented as avoiding this by picking a random time.)
@GrahamcOfBorg build nixosTests.acme |
Passes @aanderse Now that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yegortimoshenko yeah, I'm a fan of this change 👍 I also think we should merge the backport, so long as the release managers agree.
EDIT: has someone addressed the test failure? I thought there was a conversation about that...
Couldn't get it to fail locally; https://github.com/NixOS/nixpkgs/pull/80856/checks?check_run_id=476781247 also passes. https://github.com/NixOS/nixpkgs/pull/80856/checks?check_run_id=476784925 (another attempt) does fail. Looks like a race condition? Don't think that it's introduced by this PR. |
This is a very nice change with well substantiated evidence for improvement. Backport is fine with me. |
Motivation for this change
The current weekly setting causes every NixOS server to try to renew its certificate at midnight on the dot on Monday. This contributes to the general problem of periodic load spikes for Let's Encrypt; NixOS is probably not a major contributor to that problem, but we can lead by example by picking good defaults here.
The values here were chosen after consulting with @yuriks, an SRE at Let's Encrypt:
Randomize the time certificates are renewed within a 24 hour period.
Check for renewal every 24 hours, to ensure the certificate is always renewed before an expiry notice is sent out.
Increase the AccuracySec (thus lowering the accuracy(!)), so that systemd can coalesce the renewal with other timers being run.
(You might be worried that this would defeat the purpose of the time skewing, but systemd is documented as avoiding this by picking a random time.)
For more information justifying these changes, see the following documents:
I also added trivial commits to expose the--must-staple
option and allow users to specify additional custom renewal options, as free goodies to encourage merging :pI've tested these changes on my server, and everything seems fine, but I'm just one person; please do look over this carefully!
Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)