Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
nix/ssh-tunnel: Don't set addrgenmode to none
This partially reverts 9b382b1. It turns out that torvalds/linux@cc9da6c itself isn't the real issue, but just caused the real bug to surface. After hours of debugging together with Hannes Frederic Sowa, we finally found what's the culprit here, so the commit I'm partially reverting is merely a workaround which doesn't fix anything but just causes the bug to show up less frequently. As for the issue itself: OpenSSH polls the file descriptors of the tun devices and as soon as poll returns the descriptor as being writeable, the data coming from the SSH stream is written to the tun device. In this particular case the mentioned kernel commit causes router solicitations to be sent to the tun device, which in theory shouldn't be a problem either. However, if one tun device is up earlier than the counterpart, these solicitation packets are sent to the tun device and get forwarded over the SSH connection to the other peer, which then tries to write these packets into the tun device and fails because the interface is down. Let's illustrate this so it becomes clear: _ ____ .-------------. Channel A ___( )( )___ Channel B .--------------. | SSH Client | --------> (__ Internet ___) --------> | SSH Server | | /|\ | (____)(____) | | | | | | | \|/ | | Tun A (UP) | | Tun B (DOWN) | `-------------' `--------------' So we have tunnel device A here which is already up while the tunnel device B isn't yet. Tun A now gets a router solicitation packed which the SSH client sends over to the SSH server over Channel A and B. The SSH server writes the packet into Tun B, but it is down, so the write() fails with errno EIO (Input/output error) and that in turn causes the SSH server to shut down its channel, resulting in follow-up traffic to end up in the middle of nowhere. While trying to write a patch for OpenSSH to either avoid writing to the tun device when it's down or ignore EIO, I figured out that this really is a kernel bug and also a very longlasting one. Since torvalds/linux@1bd4978 the following code is at the start of tun_get_user(), which is called when user space tries to write() to the device: if (!(tun->dev->flags & IFF_UP)) return -EIO; Unfortunately tun_chr_poll() doesn't take this into account and designate the device as being writeable even if it's not UP, so the real fix here should indicate the file descriptor being able to write when it's really up. As an analogy, it's a bit like this: tun_chr_poll(): "Hey, you can write to the FD, go on :-)" openssh: write(...); tun_get_user(): "Haha, I lied, you can't!" In summary: Setting addrgenmode to none isn't addressing this at all and it only worked in my tests because the addrgenmode has been set quickly enough to not cause router solicitations to be sent. It's also a kernel problem, so it's not the resonsibility of NixOps to work around or try to fix it. The reason I'm only doing a partial revert is that using iproute2 instead of nettools makes the Nix expression more readable and along switching to iproute2 I've also chained the commands via && to catch failures instead of ignoring them. I also did more comprehensive tests during the last few days and I could confirm that the workaround didn't prevent the bug from happening at all but only caused it to be less frequent. Here is a VM test I wrote to confirm this: https://gist.github.com/aszlig/5e8c8462d17f787b7f171c34090fd590 So if all goes well, the real fix should land with the upcoming stable kernel updates. Thanks again to Hannes for help on debugging this properly plus getting this fixed soon in the kernel (I did a first patch, but that one wasn't sufficient enough and he's obviously the expert in this matter). Signed-off-by: aszlig <aszlig@redmoonstudios.org>
- Loading branch information
4bbceb2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great commit message 🍻