Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hetzner: Don't use localCommands, set interface device of default gateway #1032

Merged
merged 1 commit into from Dec 13, 2018

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Oct 28, 2018

Should help #833.

Original localCommands patch by aszlig.
Fixups and interface device stuff from me.

Note that this new way of doing it without localCommands is backwards
incompatible with NixOS < 18.03, see:
https://nixos.org/nixos/manual/release-notes.html#sec-release-18.03-notable-changes

As described on

https://wiki.hetzner.de/index.php/Netzkonfiguration_Debian/en#IPv4

Hetzner requires that even traffic to the local subnet goes via the gateway.

NixOps already had code for that (set in localCommands), but it wasn't enough:
The

ip route replace default  via "GATEWAY_IP"  proto static

generated by network-setup.service in NixOS's network-interface-scripted.nix
fails with

RTNETLINK answers: Network is unreachable

because all routes added so far had via "GATEWAY_IP", but
the kernel didn't know how to actually reach GATEWAY_IP because
there was no static route for that IP, see.

https://serverfault.com/questions/581159/unable-to-add-a-static-route-sioaddrt-network-is-unreachable/581162#581162

This commit adds an explicit static route to the default gateway, on the
relevant network device.

This allows us to generate the following instead:

ip route replace GATEWAY_IP dev THE_DEVICE proto static
ip route replace default  via "GATEWAY_IP" dev THE_DEVICE proto static

so that the kernel knows how to reach the gateway.

An example of what routes look like in ip route after deploying and 18.03
machine, where 1.2.3.4 is the server's IP and 1.2.3.1 is the gateway:

default via 1.2.3.1 dev eth0                         src 1.2.3.4 metric 202
1.2.3.0/27          dev eth0 proto kernel scope link src 1.2.3.4 metric 202
1.2.3.1             dev eth0 proto static scope link

For IPv6, the link-local fe80::1 is the gateway, as described on

https://wiki.hetzner.de/index.php/Netzkonfiguration_Debian/en#IPv6

@nh2
Copy link
Contributor Author

nh2 commented Oct 28, 2018

CC @basvandijk @aszlig @fpletz

@basvandijk
Copy link
Member

Good stuff!

@nh2
Copy link
Contributor Author

nh2 commented Oct 28, 2018

I see one last issue in journalctl when the server boots:

Oct 28 20:00:14 builder systemd[1]: Starting Address configuration of eth0...
Oct 28 20:00:14 builder network-addresses-eth0-start[910]: adding address 2a01:123:123:123::/64... done
Oct 28 20:00:14 builder network-addresses-eth0-start[910]: adding route 1.2.3.4/27... failed
Oct 28 20:00:14 builder systemd[1]: network-addresses-eth0.service: Main process exited, code=exited, status=1/FAILURE
Oct 28 20:00:14 builder systemd[1]: network-addresses-eth0.service: Failed with result 'exit-code'.
Oct 28 20:00:14 builder systemd[1]: Failed to start Address configuration of eth0.

@aszlig aszlig self-assigned this Oct 28, 2018
@@ -5,7 +5,6 @@ let

nixpart = python2Packages.nixpart0.override {
useNixUdev = false;
udevSoMajor = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's actually something I wanted to break up into a separate commit, but it was left in that stash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

gw_ip_cmd = default_gw_cmd + " | cut -d' ' -f1"
gw_dev_cmd = default_gw_cmd + " | cut -d' ' -f3"
gw_ip = self.run_command(gw_ip_cmd, capture_stdout=True).strip()
gw_dev = self.run_command(gw_dev_cmd, capture_stdout=True).strip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably would make sense to use only one single command using split instead of the cut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1,7 +1,7 @@
{ system, nixops }:

with import <nixpkgs/nixos/lib/testing.nix> { inherit system; };
with import <nixpkgs/nixos/lib/qemu-flags.nix>;
with import <nixpkgs/nixos/lib/qemu-flags.nix> { inherit pkgs; };
Copy link
Member

@aszlig aszlig Oct 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (different commit)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@aszlig
Copy link
Member

aszlig commented Oct 28, 2018

@nh2: Did you test whether this works on a server with additional subnets?

@nh2
Copy link
Contributor Author

nh2 commented Oct 28, 2018

@aszlig I think the

    "interfaces": {
      "eth0": {
        "ipv4": {
          "routes": [
            {
              "via": "1.2.3.1",
              "prefixLength": 27,
              "address": "1.2.3.0"
            }
          ]
        },

that we're generating now is unnecessary.

It gets placed into network-addresses-eth0 and results in the adding route 1.2.3.4/27... failed from above with RTNETLINK answers: Network is unreachable:

[root@builder:~]# cat $(systemctl cat network-addresses-eth0.service | grep ExecStart | cut -d= -f2)
#! /nix/store/lw7xaqhakk0i1c631m3cvac3x4lc5gr5-bash-4.4-p12/bin/bash -e
state="/run/nixos/network/addresses/eth0"
mkdir -p $(dirname "$state")

echo "1111:1111:1111:1111::/64" >>  $state
echo -n "adding address 1111:1111:1111:1111::/64... "
if out=$(ip addr add "1111:1111:1111:1111::/64" dev "eth0" 2>&1); then
  echo "done"
elif ! echo "$out" | grep "File exists" >/dev/null 2>&1; then
  echo "failed: $out"
  exit 1
fi


state="/run/nixos/network/routes/eth0"
mkdir -p $(dirname "$state")

echo "1.2.3.0/27" >> $state
echo -n "adding route 1.2.3.0/27... "
if out=$(ip route add "1.2.3.0/27"  via "1.2.3.1" dev "eth0" proto static 2>&1); then
  echo "done"
elif ! echo "$out" | grep "File exists" >/dev/null 2>&1; then
  echo "failed: $out"
  exit 1
fi

This is because network-addresses-eth0.service runs before network-setup.service can set up the direct route to the gateway -- it does that already in https://github.com/NixOS/nixpkgs/blob/6f07d27625f28219095c273b97ded3cc416e9d46/nixos/modules/tasks/network-interfaces-scripted.nix#L120.

@aszlig
Copy link
Member

aszlig commented Oct 28, 2018

@rbvermaa, @edolstra: Do we care about supporting NixOS versions older than 18.03? Or what is the general policy on which NixOS versions we support?

@aszlig
Copy link
Member

aszlig commented Oct 28, 2018

@nh2: Also, what's missing here (and what I wanted to implement when testing this) is a migration function that migrates the old net_info into the new one. This of course would need additional testing.

@nh2
Copy link
Contributor Author

nh2 commented Oct 28, 2018

It gets placed into network-addresses-eth0 and results in the adding route 1.2.3.4/27... failed from above with RTNETLINK answers: Network is unreachable:

I've pushed a fix for that:

Only generate the route if it's not for the device that's the default gateway.

…eway. Should help #833.

Original localCommands patch by aszlig.
Fixups and interface device stuff from me.

Note that this new way of doing it without localCommands is backwards
incompatible with NixOS < 18.03, see:
https://nixos.org/nixos/manual/release-notes.html#sec-release-18.03-notable-changes

As described on

    https://wiki.hetzner.de/index.php/Netzkonfiguration_Debian/en#IPv4

Hetzner requires that even traffic to the local subnet goes via the gateway.

NixOps already had code for that (set in `localCommands`), but it wasn't enough:
The

    ip route replace default  via "GATEWAY_IP"  proto static

generated by `network-setup.service` in NixOS's `network-interface-scripted.nix`
fails with

    RTNETLINK answers: Network is unreachable

because all routes added so far had `via "GATEWAY_IP"`, but
the kernel didn't know how to actually reach GATEWAY_IP because
there was no static route for that IP, see.

    https://serverfault.com/questions/581159/unable-to-add-a-static-route-sioaddrt-network-is-unreachable/581162#581162

This commit adds an explicit static route to the default gateway, on the
relevant network device.

This allows us to generate the following instead:

    ip route replace GATEWAY_IP dev THE_DEVICE proto static
    ip route replace default  via "GATEWAY_IP" dev THE_DEVICE proto static

so that the kernel knows how to reach the gateway.

An example of what routes look like in `ip route` after deploying and 18.03
machine, where 1.2.3.4 is the server's IP and 1.2.3.1 is the gateway:

    default via 1.2.3.1 dev eth0                         src 1.2.3.4 metric 202
    1.2.3.0/27          dev eth0 proto kernel scope link src 1.2.3.4 metric 202
    1.2.3.1             dev eth0 proto static scope link

For IPv6, the link-local `fe80::1` is the gateway, as described on

    https://wiki.hetzner.de/index.php/Netzkonfiguration_Debian/en#IPv6

Signed-off-by: Niklas Hambüchen <mail@nh2.me>
@nh2 nh2 force-pushed the hetzner-no-local-commands-18.03 branch from 87f6171 to bb9fe12 Compare October 28, 2018 22:00
@nh2
Copy link
Contributor Author

nh2 commented Oct 29, 2018

Also, what's missing here (and what I wanted to implement when testing this) is a migration function that migrates the old net_info into the new one.

@aszlig I've written an ad-hoc one that goes via sqlite3 and jq that I used to migrate my servers.

Perhaps the logic in there is useful to some extent for making the proper migration script.

https://gist.github.com/nh2/7707e0069df1d4bebc6572c40c85b673

@flokli
Copy link
Contributor

flokli commented Dec 5, 2018

@nh2 about IPv4, can we get NixOps and NixOS networking to configure something like this instead?

ip a add $server_ipv4/32 peer $gateway_ipv4 dev $interface
ip r add default via $gateway_ipv4

We arent supposed to reach anything on the same L2 except the gateway anyhow, this avoids having to configure a separate route for whatever netmask might be set - I saw both /26 and /27 so far…

@nh2
Copy link
Contributor Author

nh2 commented Dec 5, 2018

@flokli In general I agree that there's no point not to set netmask /32 given what Hetzner writes.

I considered doing that, but then opted to first make the minimal change in this PR that fixes the issue.

I would recommend implementing the /32 chance as a separate PR on top of this, unless anybody can come up with whether /32 would make any problems.

@flokli
Copy link
Contributor

flokli commented Dec 5, 2018 via email

@nh2
Copy link
Contributor Author

nh2 commented Dec 5, 2018

@flokli This PR doesn't do migrations, I'd probably add that with a separate PR as well.

I care first about that new Hetzner deployments work, and second that old ones can be migrated.

I am hesitant to auto-migrate nixops Hetzner users from /27 to /32 because who knows what their network config is.

There must be some reason why Hetzner doens't just set it up as /32 -- perhaps it's for the case where you buy multiple IPs for multiple servers, and put a separate switch in between (they have a service for that)?

Again, I'm all for the /32 approach if nobody comes up with a counter case, but it would ideally not hold back this PR just so that we have some solution that works with NixOS out of the box without losing network connectivity now and then, as soon as possible.

@flokli
Copy link
Contributor

flokli commented Dec 6, 2018

I am hesitant to auto-migrate nixops Hetzner users from /27 to /32 because who knows what their network config is.

Same should apply for not auto-migrating existing deployments without asking back ;-)

There must be some reason why Hetzner doens't just set it up as /32 -- perhaps it's for the case where you buy multiple IPs for multiple servers, and put a separate switch in between (they have a service for that)?
I guess it a) grew historically b is easy to setup on some systems, and even when not adding the /26 or /27 route, most of the internet still works 😆

Additional IPv4 adresses are routed via the main IPv4 adress, so you can either add it with netmask /32 on any interface, or route it somewhere else internally (for example, as a peer route via a tunnel link)

Again, I'm all for the /32 approach if nobody comes up with a counter case, but it would ideally not hold back this PR just so that we have some solution that works with NixOS out of the box without losing network connectivity now and then, as soon as possible.

Fine, so let's merge this, and add some migration assistance and the /32 routing as two separate PRs.

@aszlig aszlig merged commit bb9fe12 into NixOS:master Dec 13, 2018
aszlig added a commit that referenced this pull request Dec 13, 2018
This is based on my initial (untested) patch[1] to get rid of the use of
networking.localCommands and instead uses the appropriate attributes in
networking.interfaces, networking.defaultGateway and
networking.defaultGateway6 instead. The reason why I used localCommands
in the first place was that we didn't have the necessary attributes in
NixOS at the time I was writing the implementation.

What is still missing is the migration from the old physical spec to the
new one, but we can still do that later.

The commit I'm merging here is this patch plus fixes and actual testing
by @nh2 (thank you!).

[1]: https://gist.github.com/aszlig/84e9bf4c8c2a06e77e43d64f89a5c2fc
@aszlig
Copy link
Member

aszlig commented Jan 4, 2019

@flokli: I tested this with a new server just now and routing to the same subnet doesn't work as it's missing the gateway. So I guess I'll add the /32 prefix to make sure that the machine always uses the gateway instead of sending directly, that way we shouldn't need the extra route.

@aszlig
Copy link
Member

aszlig commented Jan 4, 2019

@flokli: Actually, never mind about the doesn't work... because I've tested this against another host that was misconfigured.

aszlig added a commit to aszlig/nixops that referenced this pull request Jan 4, 2019
This was already suggested by @flokli in NixOS#1032 and it basically makes
sure that all traffic is sent to the gateway instead of being sent
directly.

The Hetzner Wiki at [1] describes configuration of Debian to include a
"peertopeer" directive in /etc/network/interfaces, which roughly
translates to the command "ip address add ... peer A.B.C.D".

However, after testing with and without the peer keyword of "ip address"
I didn't notice any difference in behaviour in comparison to setting a
plain /32 prefix length as we do now.

Apart from making configuration less complicated, this also gets rid of
a bunch of code we now no longer need, eg. calculating subnet masks or
getting the real prefix length.

Tested on a newly deployed PX61-NVMe.

[1]: https://wiki.hetzner.de/index.php/Netzkonfiguration_Debian/en#Dedicated_Servers

Signed-off-by: aszlig <aszlig@nix.build>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants