
kubernetes: 1.15.4 -> 1.16.3 #70905

Merged
merged 2 commits into NixOS:master from the kubernetes-1.16 branch on Nov 15, 2019

Conversation

@saschagrunert (Member) commented Oct 10, 2019

Motivation for this change

Updating kubernetes to the latest release.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @johanot

@saschagrunert (Member Author)

@GrahamcOfBorg build kubernetes

@saschagrunert (Member Author)

@johanot this is my first Nix(OS) Kubernetes update; do you know if I have to adapt anything else? I would not expect any breaking changes from the current deployment (famous last words).

@srhb (Contributor) commented Oct 10, 2019

@saschagrunert Thank you!

Looks like this breaks something in the CNI components:

machine2# [  406.457500] kubelet[1126]: W1010 11:36:36.841528    1126 cni.go:202] Error validating CNI config &{mynet  false [0xc000fb2740] [123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 34 44 34 110 97 109 101 34 58 34 109 121 110 101 116 34 44 34 112 108 117 103 105 110 115 34 58 91 123 34 100 101 108 101 103 97 116 101 34 58 123 34 98 114 105 100 103 101 34 58 34 100 111 99 107 101 114 48 34 44 34 105 115 68 101 102 97 117 108 116 71 97 116 101 119 97 121 34 58 116 114 117 101 125 44 34 110 97 109 101 34 58 34 109 121 110 101 116 34 44 34 116 121 112 101 34 58 34 102 108 97 110 110 101 108 34 125 93 125]}: [plugin flannel does not support config version ""]

Maybe it's just related to the empty version string, but at least we know that this is not a drop-in PR.

You can run the tests yourself with nix-build nixos/tests/kubernetes -A dns -A rbac in order to debug the problem further. 😄
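For reference, the byte slice in that log line is just the CNI network config rendered as decimal ASCII codes. Decoding it (a quick Python sketch, not part of the PR) confirms the empty cniVersion that flannel rejects:

```python
# Decode the byte slice from the kubelet log; it is the CNI config as
# decimal ASCII codes, and the decoded JSON shows cniVersion is "".
import json

raw = [
    123, 34, 99, 110, 105, 86, 101, 114, 115, 105, 111, 110, 34, 58, 34, 34,
    44, 34, 110, 97, 109, 101, 34, 58, 34, 109, 121, 110, 101, 116, 34, 44,
    34, 112, 108, 117, 103, 105, 110, 115, 34, 58, 91, 123, 34, 100, 101,
    108, 101, 103, 97, 116, 101, 34, 58, 123, 34, 98, 114, 105, 100, 103,
    101, 34, 58, 34, 100, 111, 99, 107, 101, 114, 48, 34, 44, 34, 105, 115,
    68, 101, 102, 97, 117, 108, 116, 71, 97, 116, 101, 119, 97, 121, 34, 58,
    116, 114, 117, 101, 125, 44, 34, 110, 97, 109, 101, 34, 58, 34, 109, 121,
    110, 101, 116, 34, 44, 34, 116, 121, 112, 101, 34, 58, 34, 102, 108, 97,
    110, 110, 101, 108, 34, 125, 93, 125,
]

config = json.loads(bytes(raw).decode())
print(config)  # the "mynet" flannel config, with an empty "cniVersion"
```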

@saschagrunert (Member Author)

> Maybe it's just related to the empty version string, but at least we know that this is not a drop-in PR.

Yes, flannel recently had issues with missing CNI versions in the configuration. I added a default version to the kubelet; let's see if the tests pass now. 🤞
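The idea of the fix can be sketched as follows (a hypothetical Python illustration only; the actual change lives in the NixOS kubelet module, and the concrete default version here is assumed):

```python
import json

# Hypothetical sketch: inject a default cniVersion into a CNI network config
# that lacks one before handing it to the kubelet. The default "0.3.1" is an
# assumption for illustration, not necessarily what the module uses.
def with_default_cni_version(conf_json: str, default: str = "0.3.1") -> str:
    conf = json.loads(conf_json)
    if not conf.get("cniVersion"):
        conf["cniVersion"] = default
    return json.dumps(conf)

fixed = with_default_cni_version('{"name": "mynet", "type": "flannel"}')
print(fixed)  # now carries a non-empty cniVersion, so flannel accepts it
```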

@saschagrunert (Member Author)

@GrahamcOfBorg build kubernetes

@saschagrunert (Member Author)

@GrahamcOfBorg test kubernetes

@johanot (Contributor) commented Oct 10, 2019

@saschagrunert The tests definitely won't run on ofborg, unfortunately. :) They're too memory-hungry.

@saschagrunert (Member Author)

> @saschagrunert The tests definitely won't run on ofborg, unfortunately. :) They're too memory-hungry.

Ah, okay, thanks for the hint :)

@saschagrunert (Member Author) commented Oct 10, 2019

On my machine it looks like the error is gone; can someone please verify (or refute) this? 🙃

@srhb (Contributor) commented Oct 10, 2019

I'll give it a spin tomorrow! :)

@srhb (Contributor) commented Oct 13, 2019

The specific error is gone, but it looks like the tests are still broken. Did you successfully run them, or did you only look into the version string? :)

@srhb (Contributor) commented Oct 13, 2019

Looks like something changed with respect to how the kubelet looks itself up -- it now tries to look up the unqualified hostname, which fails because the node name is the FQDN. Maybe we changed something in the hostname infrastructure in NixOS. Investigating.

@saschagrunert (Member Author)

I only checked whether the error message disappeared, since I was not sure how long the tests would run on my machine... It might be that we have to override the hostnames on the kubelet and the proxy.

@srhb (Contributor) commented Oct 13, 2019 via email

@srhb (Contributor) commented Oct 13, 2019

OK, I think what's going on is this:

https://github.com/kubernetes/kubernetes/pull/77167/files#diff-934760ea1a4e195e49bcd10625cf4e46R146-R148

So, the previous behaviour appears to have been to fall back to listening on 127.0.0.1 when the node cannot be looked up in the API by hostname. Now, the mismatch between the short and FQDN hostnames (on the proxy and apiserver sides respectively) causes the proxy to fail to start. This is a misconfiguration in our test suite -- the proxy should be set up to query for the FQDN of the node instead of the short name, since that's how our kubelets are set up to register.
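The failure mode can be modeled in a few lines (an illustrative sketch only, not real kube-proxy code; the hostnames are made up):

```python
# The kubelet registers the node under its FQDN, but kube-proxy queries the
# API by the short hostname. Before the linked upstream change, a failed
# lookup fell back to listening on 127.0.0.1; now it is fatal.
registered_nodes = {"machine1.example.com": {"ready": True}}  # by the kubelet

def proxy_lookup(hostname: str) -> dict:
    node = registered_nodes.get(hostname)
    if node is None:
        # previously: fall back to 127.0.0.1; now: kube-proxy fails to start
        raise SystemExit(f"node {hostname!r} not found in the API")
    return node

proxy_lookup("machine1.example.com")  # works: query matches the registration
# proxy_lookup("machine1")  # would fail: short name vs. FQDN mismatch
```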

@saschagrunert (Member Author)

> OK, I think what's going on is this:
>
> https://github.com/kubernetes/kubernetes/pull/77167/files#diff-934760ea1a4e195e49bcd10625cf4e46R146-R148
>
> So, the previous behaviour appears to have been to fall back to listening on 127.0.0.1 when the node cannot be looked up in the API by hostname. Now, the mismatch between the short and FQDN hostnames (on the proxy and apiserver sides respectively) causes the proxy to fail to start. This is a misconfiguration in our test suite -- the proxy should be set up to query for the FQDN of the node instead of the short name, since that's how our kubelets are set up to register.

I would generally add this hostname option to kube-proxy, since I think this can also fail in other usage scenarios. WDYT? I added it as a separate commit on top so that I can easily revert it.

@saschagrunert (Member Author)

It might be that this:

services.kubernetes.kubelet.hostname = with config.networking;
  mkDefault (hostName + optionalString (domain != null) ".${domain}");

has to use cfg.clusterDomain instead of just domain, right?

@srhb (Contributor) commented Oct 14, 2019

> [ ... ] has to be cfg.clusterDomain instead of just domain, right?

I think domain is actually correct right now, though either could work. In fact, things would be easier if we weren't using the FQDN at all right now, because of how hostname lookups work in NixOS... That's something to keep in mind for the new module.

> I would generally add this hostname option to kube-proxy, since I think this can also fail in other usage scenarios. WDYT? I added it as a separate commit on top so that I can easily revert it.

I think this may be the best way to go right now, though I'm sad about adding yet another ad-hoc option as needs arise. We really should generalize all those command lines at some point, so that module changes aren't needed for every update. Also something to keep in mind. 😄

@johanot care to weigh in on that addition?

@srhb (Contributor) commented Oct 14, 2019

With these changes and appropriate reconfiguration of the test suite, I think we're almost there. The coredns deployment needs to be updated to take into account the deprecated beta APIs, but it still doesn't come up.

@srhb (Contributor) commented Oct 14, 2019

> [...] but it still doesn't come up.

Disregard that. When I spell "apps/v1" correctly, it does come up. 😊

@saschagrunert changed the title from "kubernetes: 1.15.4 -> 1.16.1" to "kubernetes: 1.15.4 -> 1.16.2" on Oct 16, 2019
@saschagrunert (Member Author) commented Oct 16, 2019

> FYI, 1.16.2 has arrived (https://github.com/kubernetes/kubernetes/releases/tag/v1.16.2). The changelog is not really useful, but afaik the most notable fix is CVE-2019-11253.

Yes, I updated it to 1.16.2 and squashed my commits together.

@saschagrunert (Member Author)

Can someone please assist me and give it another test run? I'm not sure how to run these tests on my AWS Nix machine :-/

@srhb (Contributor) commented Nov 7, 2019 via email

@srhb (Contributor) commented Nov 7, 2019

So the problem with the node name mismatch is still there -- the test case will need something like services.kubernetes.proxy.hostname = machineName + "." + domain; -- plus a release note explaining that people upgrading to 20.03 will need to make sure this change doesn't break their setup.

I think everything else looks okay.

You should be able to run the test commands I gave you on any non-virtualized Linux with Nix where you have access to virtualization. Let me know if you need help with that. For virtualized Linuxen, I suspect you don't have the necessary hardware access.

@saschagrunert (Member Author)

Got the tests to run locally and both (rbac, dns) succeeded. PTAL :)

@saschagrunert changed the title from "kubernetes: 1.15.4 -> 1.16.2" to "kubernetes: 1.15.4 -> 1.16.3" on Nov 14, 2019
@saschagrunert changed the title from "kubernetes: 1.15.4 -> 1.16.3" to "kubernetes: 1.15.4 -> 1.16.4" on Nov 14, 2019
@saschagrunert changed the title from "kubernetes: 1.15.4 -> 1.16.4" to "kubernetes: 1.15.4 -> 1.16.3" on Nov 14, 2019
@johanot mentioned this pull request on Nov 14, 2019
@srhb (Contributor) commented Nov 14, 2019

I'm OK with this now. Since we're just adding an option, I think we don't need any release notes currently, and hopefully we'll have more to report on the kubernetes module come 20.03 anyway. :)

@srhb (Contributor) commented Nov 14, 2019

(Just running the tests a final time, then merging)

@srhb (Contributor) commented Nov 14, 2019

Oops, I just realized the commit message is out of date. Would you mind fixing it? And while you're at it, I'd prefer splitting the module and test changes into a separate commit prefixed "nixos/kubernetes: ..."

Last step, promise! 😁

@saschagrunert (Member Author)

> Oops, I just realized the commit message is out of date. Would you mind fixing it? And while you're at it, I'd prefer splitting the module and test changes into a separate commit prefixed "nixos/kubernetes: ..."
>
> Last step, promise!

Hm, like this? 🙃

@srhb (Contributor) commented Nov 14, 2019

Sorry, no, I meant keep the package stuff as separate as possible from any module/test stuff. :) Like so:

srhb@6c22549
srhb@a8d4321

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
@saschagrunert (Member Author)

Sure, I changed it as suggested. 👍

@srhb (Contributor) commented Nov 15, 2019

Reran the tests for good measure; all looks good. Thank you for sticking with this. :)

@srhb merged commit b3a4f74 into NixOS:master on Nov 15, 2019
@saschagrunert deleted the kubernetes-1.16 branch on November 15, 2019 at 07:24