nixos: mutually exclusive services; application to acme #102387

Closed
wants to merge 2 commits into from

symphorien
Member

Motivation for this change

In #101445, one of the hypothetical solutions is to ensure certs are renewed one at a time. This is not the first time I have wanted to specify that some oneshot systemd services must not run simultaneously. Notably, I have some periodic I/O-intensive jobs driven by timers. To prevent them from running at the same time, I have to pick a unique hour for each of them myself. Instead, I would like to set them all to "daily" and mark them as mutually exclusive.

So I implemented a generic solution.

This introduces a systemd.mutex option to define sets of mutually exclusive services; it works by adding After= and Before= stanzas. The Nix implementation is not very pretty and is limited to services (does it really make sense on non-services?).
As an illustration, I added a test and applied the option to acme to see if it fixes #101445. Unfortunately, since we cannot seem to reproduce the issue reliably, it's hard to say whether it really fixes it...
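
To make the After=/Before= mechanism concrete, here is a minimal Python sketch of how a set of mutually exclusive units could be turned into ordering dependencies. It is not the Nix code from this PR: the helper names `mutex_ordering` and `render_unit_section`, the unit names in the usage note, and the choice of alphabetical order are all assumptions made for illustration.

```python
from typing import Dict, List


def mutex_ordering(services: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Assign a total order to the units in one mutex set.

    Each unit is ordered After= every unit that sorts before it and
    Before= every unit that sorts after it. Declaring both directions is
    redundant for systemd (After= on one side implies Before= on the
    other) but keeps each unit's section self-describing.
    """
    ordered = sorted(set(services))  # assumption: alphabetical order
    return {
        name: {"After": ordered[:i], "Before": ordered[i + 1:]}
        for i, name in enumerate(ordered)
    }


def render_unit_section(deps: Dict[str, List[str]]) -> str:
    """Render the ordering dependencies as a [Unit] section."""
    lines = ["[Unit]"]
    for key in ("After", "Before"):
        if deps[key]:
            lines.append(f"{key}=" + " ".join(deps[key]))
    return "\n".join(lines)
```

For example, `mutex_ordering(["backup.service", "scrub.service"])` (hypothetical unit names) orders `scrub.service` after `backup.service`. Ordering dependencies only delay a start job until the other unit's start job has finished, so this can only serialise units whose start job lasts as long as the work itself.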

I can split the acme commit from the generic implementation if you want, and if the Nix code is too ugly I can try other approaches, but first I'd like to hear some opinions on whether we want this mechanism at all.

Tested with the systemd.nix and acme.nix tests.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@aanderse
Member

aanderse commented Nov 1, 2020

Doesn't this seem like a shortcoming of upstream systemd if the software can't do this? The NixOS community has had success in the recent past asking upstream for features.

@lheckemann
Member

lheckemann commented Nov 2, 2020

This seems to me to overlap somewhat with the Conflicts= relationship? It certainly feels awkward… :/

I think having the services in question acquire (and wait for) a write lock on a common file would be a more elegant approach.
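
As a rough sketch of the lock-file idea, each service could wrap its real command in something like the following. This assumes an advisory `flock(2)` lock taken through Python's `fcntl` module; the helper names `exclusive_lock` and `run_exclusively` and any lock path passed to them are hypothetical and not part of this PR.

```python
import fcntl
import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Block until an exclusive advisory lock on `path` is acquired."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits while another holder exists
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def run_exclusively(lock_path, argv):
    """Run `argv` while holding the lock and return its exit code."""
    with exclusive_lock(lock_path):
        return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Usage: script.py LOCKFILE COMMAND [ARGS...]
    sys.exit(run_exclusively(sys.argv[1], sys.argv[2:]))
```

Services that share the same lock file then wait for each other regardless of how they were started. The trade-off is that systemd considers a waiting service to be already running, so start timeouts keep counting while it waits.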

@vikanezrimaya
Member

RandomizedDelaySec= in the relevant .timer files set to a big value might also help you.

@symphorien
Member Author

> This seems to me to overlap somewhat with the Conflicts= relationship? It certainly feels awkward… :/

It differs from Conflicts= much as Wants= differs from After=: Conflicts= is a (negative) requirement dependency, whereas the mutex only affects ordering.
The manpage reads:

   Conflicts=
       A space-separated list of unit names. Configures negative requirement dependencies. If a unit has a Conflicts= setting on another unit, starting the former will stop the latter and
       vice versa.
       Note that this setting does not imply an ordering dependency, similarly to the Wants= and Requires= dependencies described above. This means that to ensure that the conflicting unit
       is stopped before the other unit is started, an After= or Before= dependency must be declared. It doesn't matter which of the two ordering dependencies is used, because stop jobs
       are always ordered before start jobs, see the discussion in Before=/After= below.
       If unit A that conflicts with unit B is scheduled to be started at the same time as B, the transaction will either fail (in case both are required parts of the transaction) or be
       modified to be fixed (in case one or both jobs are not a required part of the transaction). In the latter case, the job that is not required will be removed, or in case both are not
       required, the unit that conflicts will be started and the unit that is conflicted is stopped.

You cannot start conflicting services at the same time, whereas you can start services that share a mutex: they are simply queued and started one at a time. This is what the NixOS test checks.


> RandomizedDelaySec= in the relevant .timer files set to a big value might also help you.

It does not help if lego really does corrupt its state when two renewals are started at the same time, especially because nixos-rebuild tends to start them all at once.


> Doesn't this seem like a shortcoming of upstream systemd if the software can't do this? The NixOS community has had success in the recent past asking upstream for features.

This is a valid point. Personally, I'm not interested in doing this work, but if the consensus is that this is the better way to go, I'll close this.

@aanderse
Member

aanderse commented Nov 5, 2020

@symphorien understandable if you're not interested in hacking on low-level systemd C code, but would you be willing to raise the issue upstream? If upstream doesn't have a solid solution, it may prompt them to design one. Showing use cases and working through the problems with them could be very useful in helping upstream create a solution.

@symphorien
Member Author

I opened an issue upstream. Let's see what they think.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/lets-encrypt-on-20-09/9950/3

@stale

stale bot commented Jun 4, 2021

I marked this as stale due to inactivity.

@stale stale bot added the "2.status: stale" label Jun 4, 2021
@lheckemann
Member

Link for completeness: systemd/systemd#17546

@symphorien symphorien added the "2.status: wait-for-upstream" label (waiting for upstream fix or their other action) and removed the "2.status: stale" label Jun 9, 2021
@symphorien
Member Author

This actually only works for Type=oneshot services.

@symphorien symphorien closed this Dec 27, 2021
Successfully merging this pull request may close these issues.

ACME fails with JWS verification error
6 participants