New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubernetes: 1.14.3 -> 1.15.3 (with revert of module systemd dependencies) #67563
Conversation
I am in favor of this. I was also surprised with the difficulties in upgrading because of the systemd dependencies, even though I used to think they were a good idea. I guess we learned something. And yes, a radical redesign of the k8s infrastructure before freeze is not really viable, but I agree with an extremely simplified module being the way forward afterwards. |
@globin In lieu of @calbrecht's feedback, how do you feel about the module dependency reverts? I tried to catch you on -dev the other day to discuss it, but I have the feeling you're pretty busy 😁 If no one voices any serious concerns about this, I'm inclined to merge. |
…factor" This reverts commit 7dc6e77, reversing changes made to bce47ea. Motivation for the revert in NixOS#67563
50105df
to
d891283
Compare
Rebased to fix the pname conflict. |
Yep I'm also in favor of doing this. It will make startups more noisy for sure but at least it will work reliably. We can then slowly contemplate maybe introducing some of these ordering constraints again in some shape or another. |
unitConfig.ConditionPathExists = kubeletPaths; | ||
}; | ||
|
||
systemd.paths.kubelet = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this didn't do what we probably wanted in the first place.
This systemctl start kubelet.service
any time either PathExists=
or PathChanged=
is matched. It does not create a unit that is active for the lifetime of the kubeletPaths
file. Think of .path
unit like .timer
units. A .timer
unit isn't active for the lifetime for when the Calendar clause is active, it is always active and gets pulled in by timers.target
It merely activates another .service
unit when the Calendar
clause is matched. Similarily, a .path
unit is part of paths.target
and is always active and then Starts a .service
whenever some conditions changes. It's a mistake I have made a lot too when first looking at .path
units.
Another example is a .socket
unit. a .socket
unit is always active (gets pulled in by sockets.target
) but starts a .service
whenever activity on a socket occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't even noticed that. Definitely something to keep in mind. That said, I'm not sure we can come up with an ordering scheme that's reasonably simple and still works in disparate environments, with all the combinations of daemons we can think of. We'll see. :)
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/kubernetes-the-nixos-module-of-the-future/3922/1 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/kubernetes-using-multiple-nodes-with-latest-unstable/3936/3 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/kubernetes-network-malfunction-after-upgrading-to-19-09/4620/1 |
…factor" This reverts commit 7dc6e77, reversing changes made to bce47ea. Motivation for the revert in NixOS#67563 (cherry picked from commit 00975b5)
Motivation for this change
Note: This PR reverts changes introduced in #56789.
While I think it is a good idea to use systemd to sync startup of components, it has unfortunately introduced some problems in practice (e.g. #60687, #61135).
I actually have a branch here, on which I tried making the upgrade and tests work with the current module: https://github.com/johanot/nixpkgs/commits/kubernetes-1.15.2 .. Unfortunately, there are still race condition issues causing systemd restart-limits to be breached and Flannel vxlan connections be broken, among other things.
After spending a lot of time thinking about and working on the current implementation, considering the trade-offs, I ultimately think it's better to let eventual consistency rule and let components restart until the cluster is fully healthy. While it slows down startup time and causes more restarts, it reduces code complexity and untangles the systemd dependency web - the latter being invaluable in debugging situations.
@calbrecht I'm sorry I didn't state this argument explicitly before now. Are you ok with a revert at this time?
19.09 vs. 20.03
I know this PR might conflict with #66496. I just want to make sure that we get something working in 19.09. And then, when we hit October or something like that, let's discuss the real future for kubernetes in nixpkgs.
cc @offlinehacker @srhb
Things done
RBAC and DNS tests run clean locally.
sandbox
innix.conf
on non-NixOS)nix-shell -p nix-review --run "nix-review wip"
./result/bin/
)nix path-info -S
before and after)