nixos/kubernetes: KubeDNS -> CoreDNS #49516
Conversation
@GrahamcOfBorg test kubernetes.rbac.singlenode kubernetes.dns.singlenode kubernetes.rbac.multinode kubernetes.dns.multinode
No attempt on aarch64-linux. The following builds were skipped because they don't evaluate on aarch64-linux: tests.kubernetes.rbac.singlenode, tests.kubernetes.dns.singlenode, tests.kubernetes.rbac.multinode, tests.kubernetes.dns.multinode
Unexpected error: unexpected build failure on x86_64-linux. Attempted: tests.kubernetes.rbac.singlenode, tests.kubernetes.dns.singlenode, tests.kubernetes.rbac.multinode, tests.kubernetes.dns.multinode
@GrahamcOfBorg test kubernetes.dns.singlenode kubernetes.dns.multinode
No attempt on aarch64-linux. The following builds were skipped because they don't evaluate on aarch64-linux: tests.kubernetes.dns.singlenode, tests.kubernetes.dns.multinode
Success on x86_64-linux. Attempted: tests.kubernetes.dns.singlenode, tests.kubernetes.dns.multinode
Hmm. Tests continue to be somewhat unreliable on ofborg, but they run clean on my machine.
Mhm. If they don't run well on ofborg, there is a good chance they also break horribly on Hydra, which is even more contended at times. Any chance to increase some timeouts? This is not meant as a blocker for this pull request, but it would make our CI more reliable in the future.
Regarding the tests: as far as I know, they are stable on Hydra (modulo the VM connect timeouts that we've been experiencing regularly across the board, which are not Kubernetes-related). The ofborg failures are caused by the much lower timeout, I believe.
@johanot If you know of any kubernetes-on-NixOS users, can you ping them for review over the weekend? I'm happy to merge this on Monday, but more feedback is nice.
I would say @offlinehacker or @cstrahan?
I noted a few things related to the replicas setting. Other than that I trust the tests.
  };
  replicas = mkOption {
    description = "Number of DNS pod replicas to deploy in the cluster.";
    default = 2;
What happens on a one-node cluster?
Upstream doesn't set a replica count here: https://github.com/kubernetes/kubernetes/blob/v1.12.2/cluster/addons/dns/coredns/coredns.yaml.sed#L85-L88; instead, it states that scaling is handled at runtime by horizontal auto-scaling.
@steveej A one-node cluster will have 2 replicas deployed (by default), but both on the same node. I understand that scaling can be handled by horizontal auto-scalers, but 1) we don't deploy the horizontal auto-scaler with this module (as of now), and 2) I prefer multiple replicas not for scaling reasons, but for redundancy. In our production cluster, we cannot tolerate DNS service downtime due to the failure of a single container (or node). I can accept setting replicas to 1 by default, but in that case we really should set maxUnavailable: 0 to ensure that at least one DNS pod is always up during a rolling update of the DNS deployment.
But that would still not ensure redundancy in case of node failure. In our production cluster we have 12 nodes now, and we want to be able to guarantee zero downtime of the DNS service, even in the case that multiple nodes fail. We therefore currently set replicas = 4.
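To make the redundancy argument concrete, here is a minimal sketch of the relevant part of a Deployment spec expressed as a Nix attrset (the module renders its Kubernetes manifests from Nix values). The surrounding structure is assumed for illustration and is not taken verbatim from this PR; the field names follow the Kubernetes Deployment API.

# Hypothetical sketch: keep every existing DNS pod serving until its
# replacement is ready during a rolling update.
{
  spec = {
    replicas = 4;              # redundancy across nodes, not horizontal scaling
    strategy = {
      type = "RollingUpdate";
      rollingUpdate = {
        maxUnavailable = 0;    # never take a serving pod down first
        maxSurge = 1;          # roll by starting one new pod at a time
      };
    };
  };
}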
Thanks for explaining. My remarks have been resolved, not to say invalidated ;-)
    cache 30
    loop
    reload
    loadbalance
Maybe make this depend on replicas > 1?
You mean loadbalance? AFAIK that's just round-robin DNS: https://github.com/coredns/coredns/tree/master/plugin/loadbalance
..which is perhaps a bit unnecessary in Kubernetes, since L4 load balancing is implemented through services and L7 can be enabled through ingresses. But I guess it can't hurt to enable DNS "loadbalancing", even though I can't think of which queries would return multiple A records in our cluster. :)
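To make the plugin list concrete, a rough sketch of how the Corefile could be embedded as a string in the Nix module is shown below. The attribute name and the cluster.local domain are assumptions, and the ports follow the ones listed in this PR's description (DNS 10053, health 10054, metrics 10055); this is not a verbatim copy of the module. Note that loadbalance only shuffles the order of A/AAAA records within a single response; it does not balance traffic across the CoreDNS replicas.

{
  # Hypothetical sketch: Corefile contents rendered from Nix into the CoreDNS ConfigMap.
  corefile = ''
    .:10053 {
      errors
      health :10054
      kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
      }
      prometheus :10055
      cache 30
      loop
      reload
      loadbalance
    }
  '';
}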
I didn't know exactly what that option does ;-) After learning what it actually does, I see it has no relation to load balancing across the replicas of the CoreDNS instances.
@GrahamcOfBorg test kubernetes
OK, this was unnecessary. All changes have already been tested.
Success on aarch64-linux. Attempted: tests.kubernetes. No partial log is available.
Success on x86_64-linux. Attempted: tests.kubernetes. No partial log is available.
Motivation for this change
In accordance with the Kubernetes defaults as of v1.11, I've updated the Kubernetes DNS addon to provide CoreDNS instead of KubeDNS.
See: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#sig-cluster-lifecycle
The kubernetes specs for the CoreDNS components are heavily based on this template:
https://github.com/coredns/deployment/blob/master/kubernetes/coredns.yaml.sed
I've made an effort to preserve the container ports currently in use, for backward compatibility:
DNS: 10053
Health: 10054 (not exposed)
Metrics: 10055
This PR also adds a release note section explaining some gotchas and details about the DNS addon migration.
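For anyone who wants to try the addon, a minimal configuration sketch follows. The option paths are my reading of this module (plus the replicas option added in this PR) and should be checked against the merged code; they are assumptions, not documentation.

# Hypothetical sketch; option names assumed from this module.
{
  services.kubernetes = {
    roles = [ "master" "node" ];   # single-machine cluster
    addons.dns = {
      enable = true;
      replicas = 1;                # default is 2; see the review discussion above
    };
  };
}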
Things done
CoreDNS, as presented in this PR, has been deployed to our staging cluster for 3 days now, and so far testing looks good. We've had no issues.
- Tested using sandboxing (option sandbox in nix.conf on non-NixOS)
- Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
- Tested execution of all binary files (usually in ./result/bin/)
- Determined the impact on package closure size (by running nix path-info -S before and after)