IPv6 broken again #93
It will keep breaking every time Kubernetes upgrades. Until DigitalOcean supports IPv6 on their LoadBalancer, this is the game we'll be playing :( Sorry about that. Fixed now, should be up and running again!
Is there a manual change needed, or would a simple cron restart help?
As you might imagine, if a cron restart would fix it, it would already be there ;) The DigitalOcean LoadBalancer doesn't support IPv6, so there is a Droplet with an IPv6 address acting as one. It redirects traffic on ports 80/443 to the Kubernetes cluster, just like the DO LB would. But for this it needs to know the internal IPs of all the Kubernetes nodes. Sadly, these change every time you upgrade a node (the old one is killed and a new one is created). You can make Kubernetes call some API whenever a new node appears, which might be a solution, but it requires some nifty scripting on the IPv6 Droplet. So far I haven't found a clean way that doesn't involve me writing a custom API handler :D Updates are rare and far apart, but possibly I could get something in there that informs me whenever it loses connection to a Kubernetes node, and emails me or something. Will look into that.
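The forwarding setup described above (a Droplet proxying 80/443 to the current node IPs) could in principle be kept up to date by regenerating the proxy config from the live node list. A minimal sketch, assuming an HAProxy-style proxy; the function name `render_haproxy_backend` and the IPs are illustrative, and in practice the node IPs would come from something like `kubectl get nodes -o json` rather than being hard-coded:

```python
# Hypothetical sketch: regenerate an HAProxy backend stanza from the
# current Kubernetes node internal IPs, so a proxy Droplet can keep
# forwarding after a node upgrade replaces the IPs.

def render_haproxy_backend(name: str, node_ips: list[str], port: int) -> str:
    """Render one HAProxy backend with a 'server' line per node IP."""
    lines = [f"backend {name}", "    balance roundrobin"]
    for i, ip in enumerate(node_ips):
        lines.append(f"    server node{i} {ip}:{port} check")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_haproxy_backend("k8s_http", ["10.133.0.2", "10.133.0.3"], 80))
```

Running this on a timer (or from a hook that fires when the node set changes) and reloading the proxy would avoid the manual step, at the cost of the "nifty scripting" mentioned above.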
Might as well reuse the last issue. It's gone sad again :(
Though maybe this is a slightly different issue - the website itself is still fine on IPv6, but downloads (via proxy) are broken |
One of the two nodes we run on is down (for some reason). The balancer should switch to the other node, but clearly it is not. Cycling the node now should fix the current issue at least, and I will investigate why it didn't fail over to the other node.
The problem still isn't really resolved. One of the two nodes is failing 50% of the healthchecks, and the logs give no indication why. IPv4 traffic is now routed via the working node; IPv6 is still a bit touch-and-go. Going to disable IPv6 for a bit to gather more information on the issue. Should be back within 10 minutes or so. Edit: it is enabled again, still with degraded performance.
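A "failing 50% of the healthchecks" symptom like the one above can be quantified from the outside with a repeated HTTP probe. This is a generic sketch, not the DigitalOcean balancer's actual check; the URL, attempt count, and timeout are all placeholders:

```python
# Generic sketch of an external availability probe: hit an endpoint
# N times and report the fraction of attempts that failed.
import urllib.request
import urllib.error

def probe(url: str, attempts: int, timeout: float = 2.0) -> float:
    """Return the fraction of probe attempts that failed."""
    failures = 0
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status >= 500:
                    failures += 1
        except (urllib.error.URLError, OSError):
            failures += 1
    return failures / attempts
```

Probing each node's endpoint directly (rather than through the balancer) is what separates "one node is sick" from "the balancer isn't failing over", which is the distinction being chased in this thread.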
Further investigation shows that it is something deep in Kubernetes. We were running on two nodes; now on three. The third node shows exactly the same issue as the second: kube-proxy is dropping connections randomly. The first, however, is working just fine. I tried various things, but nothing seems to change the situation. For now I have degraded both LoadBalancers (IPv4 and IPv6) to only use the first node for kube-proxy. This seems to be working and stable. I reached out to DigitalOcean to see if they know what is going on. If I cannot find a solution, I will upgrade Kubernetes to 1.14 next week, in the hope that fixes it. To be continued!
More updates, as updates are fun: turns out the CNI (flannel) lost its way. One node has a subnet the others don't know about, and the others have one nobody knows about. So any traffic that needs to go to the first node from any other node gets lost. Luckily the first node does have a full view of everything, which is also why traffic is currently arriving where it should. I updated my ticket with DigitalOcean. I am going to give them some time to look into this too, as it is a problem with the managed service they deliver (and not my/our mistake, basically). Hopefully they know how to resolve this cleanly; otherwise I will be rebuilding the cluster (it only takes ~15 minutes, so that is okay). At least I now understand the issue. Just no clue how/what caused it. Hopefully DO can answer that.
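The mismatch described above (nodes routing traffic for a Pod subnet that no node actually advertises) can be spotted by comparing what each node advertises against what its peers have routes for. A hedged sketch with made-up data shapes: `advertised` would come from each node's `spec.podCIDR` (`kubectl get nodes`), `routed` from each node's route table (`ip route`); the function name and subnets are illustrative:

```python
# Hypothetical sketch: flag subnets a node has routes for that no
# node in the cluster actually advertises, which is the flannel
# failure mode described in the thread.

def find_unknown_subnets(advertised: dict[str, str],
                         routed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each node to the subnets it routes that nobody advertises."""
    known = set(advertised.values())
    return {node: subnets - known
            for node, subnets in routed.items()
            if subnets - known}

if __name__ == "__main__":
    adv = {"node1": "10.244.1.0/24", "node2": "10.244.2.0/24"}
    routes = {"node1": {"10.244.2.0/24"},        # consistent
              "node2": {"10.244.9.0/24"}}        # stale, nobody owns this
    print(find_unknown_subnets(adv, routes))
```

An empty result means every route points at a subnet some node owns; any non-empty entry is a node whose flannel state has drifted, matching the "one nobody knows about" situation above.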
IPv6 broke again. Please come back!