nixos/system: socket unit restart logic #33661

Closed

griff
Contributor

@griff griff commented Jan 9, 2018

Motivation for this change

When only a systemd socket unit is changed, nothing is done to restart
the service. This adds logic to stop the service and dependent sockets
and start the sockets again.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@dtzWill
Member

dtzWill commented Jan 9, 2018

I'm not familiar enough to review this properly, but just a thought: is there a simple way to test/reproduce this behavior? Asking both for myself but also because it'd be nice to have it as a nixos test to avoid regressing. Thanks for tackling this!

@Mic92
Member

Mic92 commented Jan 13, 2018

Testing update transitions is currently not supported by our test framework.

@arianvp
Member

arianvp commented Apr 23, 2019

In theory, I think this is a sensible change. The choice to only stop the service unit also makes sense. I have not looked deeply into the code itself yet because I'm a bit unfamiliar with the activation logic.

@arianvp
Member

arianvp commented Apr 23, 2019

Let's document very carefully how this behaves, though. The current restart logic is very fragile and underspecified in my opinion (see #49528).

@griff
Contributor Author

griff commented Apr 23, 2019

@arianvp if we are changing the activation logic anyway we should also make it testable. I have a pretty simple idea for how this could be done.

We just need to add an option to the activation script that tells it to dry-run and output what it would have done in some data format we can read from Perl, and then expand nixos/tests/switch-test.nix with tests for the logic.

@domenkozar
Member

We just merged #73871, which takes a somewhat simpler approach. It's a smaller step in the same direction.

Feel free to improve upon it :)

@arianvp
Member

arianvp commented Feb 21, 2020

Having a look at this again.

When only a socket systemd unit is changed nothing is done to restart
the service. This adds logic to stop the service and dependent sockets
and start the sockets again.

Why would you want to restart the service if the service itself didn't change, though?

@griff
Contributor Author

griff commented Mar 3, 2020

@arianvp If you change the ports being listened on, or the path of a Unix socket, and you use socket activation, then only the .socket unit changes; for the service to get the new socket, it needs to be restarted.
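
The scenario above can be sketched as a NixOS module. This is only an illustration: the unit name `myapp`, the port, and the binary path are hypothetical placeholders, not taken from the PR. The point is that editing `listenStreams` changes the generated `myapp.socket` while leaving `myapp.service` untouched.

```nix
{ ... }:
{
  # Hypothetical socket-activated service; "myapp" is an illustrative name.
  systemd.sockets.myapp = {
    wantedBy = [ "sockets.target" ];
    # Changing this value (for example from "8080" to "9090") only
    # modifies the generated myapp.socket unit.
    listenStreams = [ "8080" ];
  };

  systemd.services.myapp = {
    # Nothing here mentions the port, so the generated myapp.service
    # stays the same when only the socket definition is edited.
    serviceConfig.ExecStart = "/path/to/myapp"; # placeholder path
  };
}
```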

@flokli
Contributor

flokli commented Apr 9, 2020

@lheckemann how is this related to #73871?

@arianvp
Member

arianvp commented Apr 9, 2020

I disagree that this is the desired behaviour in all cases, and I think the desired effect can already be achieved without modifying our activation logic.

If you want the service to never be started without the socket also starting, you should set Requires=mysocket.socket on your service.

If you want the service to restart every time the socket restarts, you should set PartOf=mysocket.socket on the service.

However, interestingly, this will mess with things because it uses systemd directly to restart the unit, which ignores any stopIfChanged settings we have in our activation logic. Hmpph (we really should get rid of this stopIfChanged stuff, in my opinion)...
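
For reference, the two settings mentioned above could be written in a NixOS module roughly as follows. This is a sketch, assuming units named `myservice.service` and `mysocket.socket` as in the comment; in practice you would pick whichever of the two dependencies matches the behaviour you want.

```nix
{ ... }:
{
  systemd.services.myservice = {
    # Requires=mysocket.socket: starting the service also pulls in the socket.
    requires = [ "mysocket.socket" ];

    # PartOf=mysocket.socket: stopping or restarting the socket is
    # propagated to the service by systemd itself. As noted in the
    # comment above, that propagation bypasses the stopIfChanged
    # handling in the NixOS activation logic.
    partOf = [ "mysocket.socket" ];
  };
}
```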

I'm now not sure what to do here

@lheckemann
Member

@flokli This one is an earlier effort that fpletz and I missed when making that PR (which has now been reverted because it's broken).

@flokli
Contributor

flokli commented Apr 9, 2020

@arianvp if we consider socket-activated services and changing listen addresses, we'd need to restart changed .socket units on activation.

@arianvp
Member

arianvp commented Apr 9, 2020

Oh yeah, I'm misreading the original PR, sorry.

However, I still think it doesn't work in all scenarios. Some services that are socket-activated are themselves still added to multi-user.target. In that case the socket activation logic is only used to enable parallel startup, not on-demand startup. This is a valid use case that some of the sockets systemd itself ships utilize. So we shouldn't always stop the service when the socket is restarted.

This is also explained in this blog: http://0pointer.de/blog/projects/socket-activation.html
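
A minimal sketch of the configuration described above, where a service has a socket unit but is also started at boot. The name `myapp`, the socket path, and the binary path are hypothetical placeholders.

```nix
{ ... }:
{
  systemd.sockets.myapp = {
    wantedBy = [ "sockets.target" ];
    listenStreams = [ "/run/myapp.sock" ]; # illustrative socket path
  };

  systemd.services.myapp = {
    # The service is pulled in at boot regardless of incoming
    # connections, so the socket is used for parallel startup rather
    # than on-demand startup.
    wantedBy = [ "multi-user.target" ];
    serviceConfig.ExecStart = "/path/to/myapp"; # placeholder path
  };
}
```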

@stale

stale bot commented Oct 7, 2020

Hello, I'm a bot and I thank you in the name of the community for your contributions.

Nixpkgs is a busy repository, and unfortunately sometimes PRs get left behind for too long. Nevertheless, we'd like to help committers reach the PRs that are still important. This PR has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

If this is still important to you and you'd like to remove the stale label, we ask that you leave a comment. Your comment can be as simple as "still important to me". But there's a bit more you can do:

If you received an approval by an unprivileged maintainer and you are just waiting for a merge, you can @ mention someone with merge permissions and ask them to help. You might be able to find someone relevant by using Git blame on the relevant files, or via GitHub's web interface. You can see if someone's a member of the nixpkgs-committers team, by hovering with the mouse over their username on the web interface, or by searching them directly on the list.

If your PR wasn't reviewed at all, it might help to find someone who's perhaps a user of the package or module you are changing, or alternatively, ask once more for a review by the maintainer of the package/module this is about. If you don't know any, you can use Git blame on the relevant files, or GitHub's web interface to find someone who touched the relevant files in the past.

If your PR has had reviews and nevertheless got stale, make sure you've responded to all of the reviewer's requests / questions. Usually when PR authors show responsibility and dedication, reviewers (privileged or not) show dedication as well. If you've pushed a change, it's possible the reviewer wasn't notified about your push via email, so you can always officially request them for a review, or just @ mention them and say you've addressed their comments.

Lastly, you can always ask for help at our Discourse Forum, or more specifically, at this thread or at #nixos' IRC channel.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 7, 2020
@lheckemann
Member

Not fixed.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 7, 2020
@flokli
Contributor

flokli commented Oct 11, 2020

I'm not sure I understand the description. But if only the .socket file is changed (for example, due to another ListenStream= there), wouldn't it be sufficient to systemctl restart the .socket unit?

@lheckemann
Member

systemctl restart nix-daemon → journal says

Oct 12 11:26:27 ordnungsamd systemd[1]: Closed Nix Daemon Socket.
Oct 12 11:26:27 ordnungsamd systemd[1]: Stopping Nix Daemon Socket.
Oct 12 11:26:27 ordnungsamd systemd[1]: nix-daemon.socket: Socket service nix-daemon.service already active, refusing.
Oct 12 11:26:27 ordnungsamd systemd[1]: Failed to listen on Nix Daemon Socket.

@flokli
Contributor

flokli commented Oct 12, 2020

Ah, now this makes more sense. Could the comments in this PR be updated to provide some more insights here?

Also, this only seems to be necessary/working for sockets with a falsy Accept= setting (the default). For truthy Accept= .socket units, services are spawned for each connection (so simply restarting the .socket there should be sufficient)
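
To illustrate the Accept= distinction, here is a sketch of a per-connection socket unit in a NixOS module. The unit name `echo`, the port, and the use of `cat` as the handler are illustrative assumptions. With Accept= enabled, systemd spawns one instance of the template unit `echo@.service` per connection, so no long-running service holds the listening socket.

```nix
{ pkgs, ... }:
{
  systemd.sockets.echo = {
    wantedBy = [ "sockets.target" ];
    listenStreams = [ "7777" ]; # illustrative port
    # Accept=true: systemd accepts each connection itself and starts
    # one instance of echo@.service for it.
    socketConfig.Accept = true;
  };

  # Template unit instantiated once per connection.
  systemd.services."echo@" = {
    serviceConfig = {
      ExecStart = "${pkgs.coreutils}/bin/cat";
      # Connect the accepted connection to the process's stdin and stdout.
      StandardInput = "socket";
      StandardOutput = "socket";
    };
  };
}
```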

@arianvp
Member

arianvp commented Oct 12, 2020

This is desired behaviour. If the daemon already has an open socket, systemd will not start the .socket unit, as it's not needed for activation at that point. The daemon is already successfully listening.

If you want nix-daemon.service to restart whenever nix-daemon.socket restarts, so that it gets handed a socket FD with the new settings from the nix-daemon.socket unit, you should add an explicit Requires=nix-daemon.socket to nix-daemon.service. However, because nix-daemon.service works fine without socket activation, you don't need to add this, and the failure of the .socket to restart is acceptable, I think.

This is documented in the man page:

No implicit WantedBy= or RequiredBy= dependency from the socket to the service is added. This means that the service may be started without the socket, in which case it must be able to open sockets by itself. To prevent this, an explicit Requires= dependency may be added.

I thus think failure to restart sockets should be silently ignored by the activation script, as it's by design.

Edit: It sounds like the problem is similar to what is happening here: systemd/systemd#13271 systemd/systemd#8102

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-ready-for-review/3032/431

@stale

stale bot commented Jul 8, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 8, 2021