nixos/haproxy: add reloading support, use upstream service hardening #88434

pstch · 2020-05-20T13:51:01Z

Motivation for this change

The current systemd service for haproxy doesn't support reloading. Reloading is useful for haproxy as it allows changing the configuration or updating the SSL certificates without losing the reverse proxy state (open connections, backend status, etc), and without missing requests during the restart time. Additionally, upstream recommends several configuration changes to the systemd service, mostly related to hardening, that are not used in the current systemd service.

Things done

Possible improvements/changes

Changing Restart back to on-failure, to preserve existing semantics
Not enabling reloadIfChanged by default, to preserve existing semantics
The reload test may currently be a bit fragile; I can try to improve it if needed.

aanderse · 2020-05-22T08:36:42Z

nixos/modules/services/networking/haproxy.nix

@@ -60,15 +60,31 @@ with lib;
      description = "HAProxy";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];
+      reloadIfChanged = true;


I'm not sure you actually want this... I'm under the impression that if a CVE were reported and fixed, and you then updated your OS, the service would continue to run with the existing vulnerable binary. A manual stop and start would be required to get the newly patched binary.

The reload mechanism implemented here actually starts a completely new process and kills the old one. The handover goes through a socket over which the old process passes its listening sockets to the new one. The two processes share no memory; only a few environment variables are passed from the old master process to the new process.

I have manually tested that the old binary is gone after reloading, and I can try to write a NixOS test for that if it is necessary (comparing binaries before/after reloading).

@pstch thanks for mentioning that. I'll have to review some scripts as I was under a different impression.

pstch · 2020-05-22T10:07:14Z

nixos/tests/haproxy.nix

+
+    with subtest("seamless reload"):
+        machine.systemctl("start haproxy-reloader")
+        machine.succeed("echo http://localhost:80/index.txt | http-getter -i1 -n8 -l1")


I am not sure this test is solid enough. Depending on the test machine's performance, the reloading might not complete before http-getter stops sending requests, so it may not even be tested. If necessary I can write a proper multi-threaded test that ensures that we keep sending requests until haproxy has finished reloading, and that HTTP connections opened before the reload are still open afterwards.

https://www.haproxy.com/de/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ mentions a quite sophisticated load generator setup to reproduce these issues, and even then they only show up in a very small percentage of cases.

I doubt this will uncover these issues - maybe just change the test to do a simple reload, and ensure it still replies afterwards.

Right, I will change the test to ensure that it's still replying afterwards.

flokli · 2020-05-23T09:29:35Z

nixos/modules/services/networking/haproxy.nix

+        # support seamless reload
+        ExecReload = [
+          "${pkgs.haproxy}/sbin/haproxy -c -f ${haproxyCfg}"
+          "${pkgs.coreutils}/bin/kill -USR2 $MAINPID"


How does the reload behaviour work?

Does the "${pkgs.haproxy}/sbin/haproxy -c -f ${haproxyCfg}" exit? Or will it become the new "main process"?

What is $MAINPID pointing to? Will it change after a reload?

Yes, "${pkgs.haproxy}/sbin/haproxy -c -f ${haproxyCfg}" exits; it is only used to check the configuration file.

$MAINPID never changes, as the master process re-executes itself when it receives SIGUSR2.

You're right that I missed something in the reload behaviour. The master process uses exec(argv[0]) to re-execute itself, but that won't work as-is because the binary's path will change on updates.

I've updated the PR to make the main process execute from a symlink to the real binary (in /run/haproxy/haproxy).
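
The effect of this indirection can be sketched in Python. This is only an illustration under stated assumptions, not the module's code: the temporary directory and the file names `haproxy-v1`, `haproxy-v2` and `haproxy` are hypothetical stand-ins for two store paths and for the stable /run/haproxy/haproxy link. A process that re-executes the stable path gets whatever the link points to at that moment.

```python
import os
import tempfile


def flip_symlink(link_path: str, new_target: str) -> None:
    """Atomically point link_path at new_target.

    The new link is created under a temporary name and then renamed
    over the old one, so link_path never disappears in between.
    """
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)


def demo_symlink_update() -> list:
    """Resolve the stable path before and after a simulated package update."""
    with tempfile.TemporaryDirectory() as run_dir:
        # Hypothetical stand-ins for the old and new binary.
        old_binary = os.path.join(run_dir, "haproxy-v1")
        new_binary = os.path.join(run_dir, "haproxy-v2")
        for path in (old_binary, new_binary):
            with open(path, "w") as handle:
                handle.write("placeholder\n")

        # The stable path that argv[0] would refer to.
        stable = os.path.join(run_dir, "haproxy")
        flip_symlink(stable, old_binary)
        before = os.path.basename(os.path.realpath(stable))

        # After an update, only the link changes; the stable path stays the same.
        flip_symlink(stable, new_binary)
        after = os.path.basename(os.path.realpath(stable))
        return [before, after]


if __name__ == "__main__":
    print(demo_symlink_update())
```

Because the path in argv[0] stays constant while its target changes, a re-exec of that path would start the updated binary without the service definition having to know the new path.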

Hrm, I'm not sure if this brings us any closer to the goal of only restarting where necessary.

I spent quite some thought earlier on how to handle reloads and restarts, and reloadIfChanged currently doesn't really work the way it should. I just wrote down in more detail an idea on how to fix this properly in #49528 (comment).

After something like this has landed, we could probably just make use of haproxy's "bind to a specific fd" functionality to provide seamless reloads: https://news.ycombinator.com/item?id=8004153

Changes of the underlying binary, change of sandboxing or other systemd options should basically always restart the service, as there's no way to apply these to an already running process.

In haproxy's case, changing the underlying binary should not restart the service, since haproxy can change it itself while preserving the connections using expose-fd which is already in use in this NixOS module. The last change I made (using a symlink in /run/haproxy/haproxy to start haproxy) allows this to work properly.

It's important to avoid haproxy restarts as much as we can, because in production this can cause significant disruption (losing all established connections), so if the changes you are proposing in #49528 are implemented, we need to expose a setting allowing haproxy to be reloaded even if the binary is updated.

> After something like this has landed, we could probably just make use of haproxy's "bind to a specific fd" functionality to provide seamless reloads: https://news.ycombinator.com/item?id=8004153

This is already enabled in this NixOS module (expose-fds is set globally), but it wasn't used before this PR since it only works with a reload: if the old process is killed before the new one is started, the FDs cannot be forwarded.
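
Why the old process has to be alive during the handover can be illustrated with plain file descriptor passing over a Unix socket. The sketch below assumes Python 3.9+ on a Unix system and shows the general technique only; it is not haproxy's implementation, and the function and socket names are hypothetical. Both roles run in one process to keep it short.

```python
import os
import socket
import tempfile


def hand_over_listener() -> bytes:
    """Pass a listening socket from an "old" owner to a "new" one.

    Returns the bytes read from a connection that the new owner accepted
    after the old owner had already closed its copy of the listener.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        address = os.path.join(tmp_dir, "frontend.sock")

        # "Old process": owns the listening socket.
        old_listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        old_listener.bind(address)
        old_listener.listen()

        # Control channel between the old and the new owner.
        old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

        # The old owner sends the descriptor, so it must still be running.
        socket.send_fds(old_end, [b"listener"], [old_listener.fileno()])

        # "New process": receives its own copy of the descriptor.
        _msg, fds, _flags, _addr = socket.recv_fds(new_end, 1024, 1)
        new_listener = socket.socket(fileno=fds[0])

        # The old owner goes away; the listener stays open through the copy.
        old_listener.close()
        old_end.close()

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(address)
        client.sendall(b"ping")
        connection, _peer = new_listener.accept()
        data = connection.recv(4)

        for sock in (client, connection, new_listener, new_end):
            sock.close()
        return data


if __name__ == "__main__":
    print(hand_over_listener())
```

If the old owner had exited before sending the descriptor, there would be nothing left to pass, which matches the point above that a stop followed by a start cannot forward the FDs.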

Indeed, reloadIfChanged comes with the problem that sandboxing options are never updated, and that's a problem that should be fixed.

haproxy goes further than what the linked comment proposes: that comment doesn't yet cover applications that support re-executing themselves with a new binary.

If we add support for something like this, generating the /run/haproxy/haproxy symlink in ExecStartPre simply won't work.
The line in ExecStartPre will change with a new haproxy binary (so the logic would normally decide to restart), and even if we added a special case for that, the symlink won't be updated if you systemctl reload after a new binary has been deployed.

If we make the necessary changes in the module, can we use systemd sockets for seamless restarts of haproxy.service, by making use of haproxy's expose-fds functionality?

> [...] won't be updated if you systemctl reload after a new binary has been deployed.

Ah yes, but that's just because I forgot to recreate the symlink in ExecReload. I corrected this in the last push.

> If we make the necessary changes in the module, can we use systemd sockets for seamless restarts of haproxy.service, by making use of haproxy's expose-fds functionality?

We are already using expose-fds in this PR; it's used to transfer the sockets between the old and new binary. Using systemd sockets should be possible, but I don't think that it is a good solution, as it would force the user to configure the bound addresses outside of haproxy's configuration. It would also prevent the user from binding to additional addresses using haproxy's CLI, something that is very important in HA scenarios.

In my opinion, the current form of this PR is the best way to allow seamless reloads. Of course, if the changes you propose in #49528 are implemented, we would need some way to indicate that the service should not be restarted even if the package is updated.

EDIT: If you prefer, I can drop the reloadIfChanged option from this PR. I think the rest of the work done to allow seamless reloads should still be included so that users can enable it themselves, although I also think that this should be the default behaviour: not being able to apply new systemd sandboxing options is much less of a problem than losing connections when restarting.

Yes, please remove the reloadIfChanged for now, as it's not always working.

Please also add a small comment next to where we create the binary symlink about haproxy using exec(argv[0]) to re-execute itself, so people understand why we create this symlink.

I assume without reloadIfChanged, our activation script will still just systemctl restart haproxy.service, right?

Ok, I will do that, and add a comment for the exec(argv[0]) code.

> I assume without reloadIfChanged, our activation script will still just systemctl restart haproxy.service, right?

Yes.

pstch · 2020-05-23T16:24:53Z

Updated the PR to make the main process execute from a symlink to the binary (in /run/haproxy/haproxy), so that it picks up new versions.

pstch · 2020-05-23T18:10:49Z

Updated the PR to add an indirection for the configuration file, so that it's picked up when the master process reloads.

pstch · 2020-05-25T21:58:51Z

@flokli I force-pushed a commit with the requested changes:

removing reloadIfChanged
adding a comment to justify the symlink
changing the test to only verify that the proxy is still answering after a reload

I had to add a delay in the reload test, to ensure that the test request is handled by the new workers (haproxy took ~300ms to reload on my machine).
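
A fixed delay depends on how fast the test machine is. One possible alternative, sketched here as an assumption rather than as what the test does, is to poll until a condition holds or a deadline passes. `wait_until` and its parameters are hypothetical and not part of the NixOS test driver. The predicate would have to observe something that only the new workers produce (for example a changed response body), since a plain "is it answering" check would also pass against the old workers.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Call predicate repeatedly until it returns True or timeout elapses.

    Returns True as soon as the predicate holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated reload that takes about 300 ms to complete.
    started_at = time.monotonic()
    print(wait_until(lambda: time.monotonic() - started_at >= 0.3))
```

Compared with a fixed sleep, this returns as soon as the condition is met on fast machines and still tolerates slow ones up to the timeout.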

flokli · 2020-05-29T12:34:58Z

@talyz, @peterhoeg, would you mind taking another look at this?

peterhoeg · 2020-05-29T15:13:14Z

Have you tried reload with an invalid config file? Does it leave the "old" instance running?

pstch · 2020-05-29T15:35:59Z

@peterhoeg Yes. I had added a check of the configuration in ExecReload, but it seems I accidentally dropped it. I updated the PR to add it again; now, if the configuration file is invalid, the old instance keeps running, but the reload returns a non-zero exit code.

flokli

I assume we want to do the config test with the (possibly new) haproxy binary, but we only want to flip the symlink if the reload was successful - otherwise, this could mess up manual reloads.
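
The ordering suggested here (validate first, update the link only on success) might look like the following sketch. The function name and its arguments are hypothetical, and the check command is passed in rather than hard-coded; in the module it would correspond to the `haproxy -c -f` invocation run with the possibly new binary. The demo uses the Python interpreter as a stand-in for a failing and a passing check.

```python
import os
import subprocess
import sys
import tempfile


def check_then_flip(check_command: list, link_path: str, new_target: str) -> bool:
    """Run check_command; repoint link_path at new_target only if it exits 0."""
    result = subprocess.run(check_command)
    if result.returncode != 0:
        # Leave the link alone so a later manual reload still uses the old target.
        return False
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)
    return True


def demo_check_then_flip() -> list:
    """Show that a failing check keeps the old target and a passing one flips it."""
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
    with tempfile.TemporaryDirectory() as run_dir:
        link = os.path.join(run_dir, "haproxy")
        # Placeholder target names; the links are never followed in this demo.
        os.symlink("old-binary", link)
        check_then_flip(failing, link, "new-binary")
        after_failed_check = os.readlink(link)
        check_then_flip(passing, link, "new-binary")
        after_passed_check = os.readlink(link)
        return [after_failed_check, after_passed_check]


if __name__ == "__main__":
    print(demo_check_then_flip())
```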

nixos/modules/services/networking/haproxy.nix

Refactor the systemd service definition for the haproxy reverse proxy, using the upstream systemd service definition. This allows the service to be reloaded on changes, preserving existing server state, and adds some hardening options.

flokli · 2020-05-31T21:11:40Z

Thanks!

ofborg bot added 6.topic: nixos 8.has: module (update) 10.rebuild-darwin: 0 10.rebuild-linux: 0 labels May 20, 2020

aanderse reviewed May 22, 2020

pstch commented May 22, 2020

flokli reviewed May 23, 2020

pstch force-pushed the patch-2 branch from 747a5db to fac0130 Compare May 23, 2020 16:21

pstch force-pushed the patch-2 branch from fac0130 to 2bf30d0 Compare May 23, 2020 18:08

pstch force-pushed the patch-2 branch 5 times, most recently from b13aaf8 to 83fb6a4 Compare May 25, 2020 21:54

flokli reviewed May 26, 2020

pstch force-pushed the patch-2 branch 2 times, most recently from 3dcfae8 to e0423ff Compare May 27, 2020 09:54

aanderse mentioned this pull request May 29, 2020: nixos/haproxy: Implement hitless reloads #89012 (closed)

pstch force-pushed the patch-2 branch from e0423ff to 87b4813 Compare May 29, 2020 15:35

flokli requested changes May 31, 2020

pstch force-pushed the patch-2 branch from 87b4813 to c784d3a Compare May 31, 2020 20:35

flokli approved these changes May 31, 2020

flokli merged commit 09a7612 into NixOS:master May 31, 2020
