Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubernetes: 1.14.3 -> 1.15.3 (with revert of module systemd dependencies) #67563

Merged
merged 3 commits into from Sep 5, 2019

Conversation

johanot
Copy link
Contributor

@johanot johanot commented Aug 27, 2019

Motivation for this change
  1. To get kubernetes bumped to v1.15 before 19.09 freeze.
  2. To make the NixOS module work, be compatible with 1.15 and make current rbac and dns tests pass.

Note: This PR reverts changes introduced in #56789.

While I think it is a good idea to use systemd to sync startup of components, it has unfortunately introduced some problems in practice (e.g. #60687, #61135).
I actually have a branch here, on which I tried making the upgrade and tests work with the current module: https://github.com/johanot/nixpkgs/commits/kubernetes-1.15.2 .. Unfortunately, there are still race condition issues causing systemd restart-limits to be breached and Flannel vxlan connections be broken, among other things.

After spending a lot of time thinking about and working on the current implementation, considering the trade-offs, I ultimately think it's better to let eventual consistency rule and let components restart until the cluster is fully healthy. While it slows down startup time and causes more restarts, it reduces code complexity and untangles the systemd dependency web - the latter being invaluable in debugging situations.

@calbrecht I'm sorry I didn't state this argument explicitly before now. Are you ok with a revert at this time?

19.09 vs. 20.03

I know this PR might conflict with #66496. I just want to make sure that we get something working in 19.09. And then, when we hit October or something like that, let's discuss the real future for kubernetes in nixpkgs.

cc @offlinehacker @srhb

Things done

RBAC and DNS tests run clean locally.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@srhb
Copy link
Contributor

srhb commented Aug 27, 2019

I am in favor of this. I was also surprised with the difficulties in upgrading because of the systemd dependencies, even though I used to think they were a good idea. I guess we learned something.

And yes, a radical redesign of the k8s infrastructure before freeze is not really viable, but I agree with an extremely simplified module being the way forward afterwards.

@srhb
Copy link
Contributor

srhb commented Sep 2, 2019

@globin In lieu of @calbrecht's feedback, how do you feel about the module dependency reverts? I tried to catch you on -dev the other day to discuss it, but I have the feeling you're pretty busy 😁

If no one voices any serious concerns about this, I'm inclined to merge.

@srhb srhb force-pushed the kubernetes-1.15-withmodulerevert branch from 50105df to d891283 Compare September 4, 2019 15:53
@srhb
Copy link
Contributor

srhb commented Sep 4, 2019

Rebased to fix the pname conflict.

@arianvp
Copy link
Member

arianvp commented Sep 4, 2019

Yep I'm also in favor of doing this. It will make startups more noisy for sure but at least it will work reliably. We can then slowly contemplate maybe introducing some of these ordering constraints again in some shape or another.

unitConfig.ConditionPathExists = kubeletPaths;
};

systemd.paths.kubelet = {
Copy link
Member

@arianvp arianvp Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this didn't do what we probably wanted in the first place.

This systemctl start kubelet.service any time either PathExists= or PathChanged= is matched. It does not create a unit that is active for the lifetime of the kubeletPaths file. Think of .path unit like .timer units. A .timer unit isn't active for the lifetime for when the Calendar clause is active, it is always active and gets pulled in by timers.target It merely activates another .service unit when the Calendar clause is matched. Similarily, a .path unit is part of paths.target and is always active and then Starts a .service whenever some conditions changes. It's a mistake I have made a lot too when first looking at .path units.

Another example is a .socket unit. a .socket unit is always active (gets pulled in by sockets.target) but starts a .service whenever activity on a socket occurs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't even noticed that. Definitely something to keep in mind. That said, I'm not sure we can come up with an ordering scheme that's reasonably simple and still works in disparate environments, with all the combinations of daemons we can think of. We'll see. :)

@srhb srhb merged commit 11e72e5 into NixOS:master Sep 5, 2019
@johanot johanot deleted the kubernetes-1.15-withmodulerevert branch September 5, 2019 06:16
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/kubernetes-the-nixos-module-of-the-future/3922/1

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/kubernetes-using-multiple-nodes-with-latest-unstable/3936/3

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/kubernetes-network-malfunction-after-upgrading-to-19-09/4620/1

dtzWill pushed a commit to dtzWill/nixpkgs that referenced this pull request Jan 27, 2020
…factor"

This reverts commit 7dc6e77, reversing
changes made to bce47ea.

Motivation for the revert in NixOS#67563

(cherry picked from commit 00975b5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants