
kubernetes: 1.7.9 -> 1.9.1 #33954

Merged
merged 4 commits on Feb 16, 2018

Conversation

@kuznero (Member) commented Jan 16, 2018

Motivation for this change

Upgrade kubernetes to the latest v1.9.1 (as well as kubecfg to v0.6.0 and kubernetes-dashboard to v1.8.2). Related to #30639.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@srhb (Contributor) commented Jan 16, 2018

Nice!
Sorry I missed you in #nixos!

You can run the tests via:

nix-build nixos/tests/kubernetes -A rbac
nix-build nixos/tests/kubernetes -A dns

@kuznero (Member, Author) commented Jan 16, 2018

@srhb Excellent! Thanks, will try that!

@kuznero (Member, Author) commented Jan 16, 2018

Ran the rbac test; it ran for a long time and ended with the following failure:

machine1# [ 1082.420971] kube-apiserver[1013]: I0116 21:27:26.547088    1013 pathrecorder.go:247] kube-aggregator: "/api/v1/resourcequotas" satisfied by prefix /api/
machine1# [ 1082.422157] kube-apiserver[1013]: I0116 21:27:26.547129    1013 handler.go:150] kube-apiserver: GET "/api/v1/resourcequotas" satisfied by gorestful with webserv1
machine1# [ 1082.423348] kube-apiserver[1013]: I0116 21:27:26.548427    1013 get.go:238] Starting watch for /api/v1/resourcequotas, rv=1 labels= fields= timeout=8m15s
machine1# [ 1082.424599] kube-controller-manager[1014]: I0116 21:27:26.548675    1014 round_trippers.go:436] GET https://api.my.zyx/api/v1/resourcequotas?resourceVersion=1&ts
error: action timed out after -1 seconds at /nix/store/lp80aincldbqcdfj2bxshw4ls314lymm-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
action timed out after -1 seconds at /nix/store/lp80aincldbqcdfj2bxshw4ls314lymm-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
cleaning up
killing machine1 (pid 593)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/tmp/nix-build-vm-test-run-kubernetes-rbac-singlenode.drv-0/vde1.ctl': Directory not empty
builder for ‘/nix/store/brq94jki7y0hnx88yimsyl6ycb4dpv6z-vm-test-run-kubernetes-rbac-singlenode.drv’ failed with exit code 255
error: build of ‘/nix/store/brq94jki7y0hnx88yimsyl6ycb4dpv6z-vm-test-run-kubernetes-rbac-singlenode.drv’ failed

The dns test also took quite some time and ended with the following failure:

machine1# [ 1084.051487] kube-proxy[1048]: I0116 21:49:04.148737    1048 iptables.go:321] running iptables-save [-t filter]
machine1# [ 1084.056529] kube-proxy[1048]: I0116 21:49:04.172086    1048 iptables.go:321] running iptables-save [-t nat]
machine1# [ 1084.069982] kube-proxy[1048]: I0116 21:49:04.185701    1048 proxier.go:1664] Restoring iptables rules: *filter
machine1# [ 1084.071917] kube-proxy[1048]: :KUBE-SERVICES - [0:0]
machine1# [ 1084.073313] kube-proxy[1048]: :KUBE-FORWARD - [0:0]
machine1# [ 1084.074508] kube-proxy[1048]: -A KUBE-SERVICES -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp -p tcp -d 10.0.0.254/32 --dport 53 -jT
machine1# [ 1084.077681] kube-proxy[1048]: -A KUBE-SERVICES -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp -p udp -d 10.0.0.254/32 --dport 53 -j REJT
error: action timed out after -1 seconds at /nix/store/lp80aincldbqcdfj2bxshw4ls314lymm-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
action timed out after -1 seconds at /nix/store/lp80aincldbqcdfj2bxshw4ls314lymm-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
cleaning up
killing machine1 (pid 593)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/tmp/nix-build-vm-test-run-kubernetes-dns-singlenode.drv-0/vde1.ctl': Directory not empty
builder for ‘/nix/store/9sai7v94s418hnljn68a5ll3r4bnnqdf-vm-test-run-kubernetes-dns-singlenode.drv’ failed with exit code 255
error: build of ‘/nix/store/9sai7v94s418hnljn68a5ll3r4bnnqdf-vm-test-run-kubernetes-dns-singlenode.drv’ failed

Will be running the same tests on master now to compare results.

P.S. I assume a test passes when the exit code is 0.

@kuznero (Member, Author) commented Jan 16, 2018

The rbac test on master gives no error; here is the end of stdout:

machine1# [  666.179324] kubelet[1455]: I0116 22:26:31.283127    1455 generic.go:182] GenericPLEG: Relisting
machine1# [  666.184917] dockerd[901]: time="2018-01-16T22:26:31.319811155Z" level=warning msg="unknown container" container=907a9d0b7cc03c69f3840d141b72b2928757edb55b9631a6y
machine1# [  666.191117] kubelet[1455]: I0116 22:26:31.323034    1455 server.go:794] GET /cri/exec/shK4aP2A: (270.364529ms) hijacked [[kubectl/v1.7.9+7f63532e4ff4f (linux/am]
machine1: exit status 1
collecting coverage data
machine1: running command: test -e /sys/kernel/debug/gcov
machine1# [  666.193438] kube-apiserver[996]: E0116 22:26:31.324203     996 proxy.go:199] Error proxying data from client to backend: write tcp 192.168.1.1:56668->192.168.1.e
machine1: exit status 1
syncing
machine1: running command: sync
machine1# [  666.198385] kube-apiserver[996]: I0116 22:26:31.324481     996 wrap.go:42] POST /api/v1/namespaces/default/pods/kubectl/exec?command=kubectl&command=delete&comm]
machine1: exit status 0
test script finished in 667.18s
cleaning up
killing machine1 (pid 593)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/tmp/nix-build-vm-test-run-kubernetes-rbac-singlenode.drv-0/vde1.ctl': Directory not empty
/nix/store/pk67nb0fdqm027nf0dimraisb1vdmixn-vm-test-run-kubernetes-rbac-multinode
/nix/store/khz2sj56jcyq1aj50rwq5lapg02x4g7l-vm-test-run-kubernetes-rbac-singlenode

The dns test on master also gives no error; here is the end of stdout:

machine1# [  109.500882] kubelet[1476]: I0116 22:33:32.636713    1476 server.go:794] GET /cri/exec/Jf46yACz: (1.342959231s) hijacked [[kubectl/v1.7.9+7f63532e4ff4f (linux/am]
machine1# [  109.586333] kube-apiserver[1029]: I0116 22:33:32.722066    1029 wrap.go:42] POST /api/v1/namespaces/default/pods/probe/exec?command=%2Fbin%2Fhost&command=redis.]
machine1# [  109.626825] kube-apiserver[1029]: I0116 22:33:32.762798    1029 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (174.38194ms) ]
machine1: exit status 0
collecting coverage data
machine1: running command: test -e /sys/kernel/debug/gcov
machine1: exit status 1
syncing
machine1: running command: sync
machine1# [  109.668227] kube-controller-manager[1034]: I0116 22:33:32.804238    1034 round_trippers.go:405] GET https://api.my.zyx/api/v1/namespaces/kube-system/endpoints/ks
machine1# [  109.678463] kubelet[1476]: I0116 22:33:32.814565    1476 config.go:101] Looking for [api file], have seen map[file:{} api:{}]
machine1# [  109.680205] kubelet[1476]: I0116 22:33:32.816315    1476 kubelet.go:1959] SyncLoop (housekeeping)
machine1# [  109.717406] dhcpcd[869]: vethb65ee3fc: no IPv6 Routers available
machine1# [  109.743463] kube-apiserver[1029]: I0116 22:33:32.876495    1029 handler.go:160] kube-aggregator: PUT "/api/v1/namespaces/kube-system/endpoints/kube-controller-ml
machine1# [  109.745526] kube-apiserver[1029]: I0116 22:33:32.876524    1029 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/endpoints/kube-controller-/
machine1# [  109.747288] kube-apiserver[1029]: I0116 22:33:32.876542    1029 handler.go:150] kube-apiserver: PUT "/api/v1/namespaces/kube-system/endpoints/kube-controller-ma1
machine1: exit status 0
test script finished in 110.59s
cleaning up
killing machine1 (pid 593)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/tmp/nix-build-vm-test-run-kubernetes-dns-singlenode.drv-0/vde1.ctl': Directory not empty
/nix/store/a6lw5fgdpicmf0nd9ja6m25cazr9h7g4-vm-test-run-kubernetes-dns-multinode
/nix/store/19ws0gsprrjrd3jxrvw1wb86156h0idx-vm-test-run-kubernetes-dns-singlenode

One observation: with kubernetes 1.9.1 the tests run much, much longer. Perhaps some behavior changed in the new kubernetes version? Hopefully nothing is actually wrong, but in that case the tests would have to be changed. I will try to figure out how exactly those tests work.

@kuznero (Member, Author) commented Jan 16, 2018

On the other hand, how do we test that the kubernetes modules work properly after this upgrade?

@kuznero mentioned this pull request Jan 16, 2018
@offlinehacker (Contributor) commented Jan 17, 2018

I can test this next week on one of our kubernetes clusters that runs NixOS and check whether everything works as it should. We want to update all clusters to kubernetes 1.9 anyway, so this is on the agenda.

@kuznero (Member, Author) commented Jan 17, 2018

@offlinehacker thanks

@NeQuissimus (Member) commented:

@GrahamcOfBorg test kubernetes.rbac

@GrahamcOfBorg left a comment:


Failure for system: x86_64-linux

error: while evaluating ‘hydraJob’ at /var/lib/gc-of-borg/.nix-test-rs/repo/38dca4e3aa6bca43ea96d2fcc04e8229/builder/grahamc-zoidberg/lib/customisation.nix:167:14, called from /var/lib/gc-of-borg/.nix-test-rs/repo/38dca4e3aa6bca43ea96d2fcc04e8229/builder/grahamc-zoidberg/nixos/release.nix:286:22:
while evaluating the attribute ‘name’ at /var/lib/gc-of-borg/.nix-test-rs/repo/38dca4e3aa6bca43ea96d2fcc04e8229/builder/grahamc-zoidberg/lib/customisation.nix:172:24:
attribute ‘name’ missing, at /var/lib/gc-of-borg/.nix-test-rs/repo/38dca4e3aa6bca43ea96d2fcc04e8229/builder/grahamc-zoidberg/lib/customisation.nix:172:10

@kuznero (Member, Author) commented Jan 17, 2018

Is there any way to reproduce this on my box?

@NeQuissimus (Member) commented:

@grahamc ? Maybe I kicked it off wrong?

@srhb (Contributor) commented Jan 17, 2018

No, evaluating it via release.nix is also weird here. Building the test directly seems to be the only way, but ofborg doesn't do that, obviously.

@grahamc (Member) commented Jan 17, 2018

Why is it weird to evaluate it through release.nix?

@jirkadanek (Member) commented:

I cherry-picked the two commits from this PR on top of current master (98b35db, Wed Jan 17, eclipse-plugins-ansi-econsole: init at 1.3.5) and ran nix-build nixos/tests/kubernetes -A rbac -I .. Both times I ran this, I got an error:

machine1# [ 1178.071316] kubelet[1742]: I0117 22:56:48.946459    1742 config.go:99] Looking for [api file], have seen map[file:{}]
machine2# [ 1178.304523] kubelet[1114]: I0117 22:56:49.146572    1114 generic.go:183] GenericPLEG: Relisting
machine2# [ 1178.308840] kubelet[1114]: I0117 22:56:49.153175    1114 config.go:99] Looking for [api], have seen map[]
machine1# [ 1178.320576] kubelet[1704]: I0117 22:56:49.087860    1704 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1178.116734] kube-apiserver[1079]: I0117 22:56:48.994321    1079 handler.go:160] kube-aggregator: GET "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by nonGoRestful
machine1# [ 1178.121814] kube-apiserver[1079]: I0117 22:56:48.994365    1079 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by prefix /api/
machine1# [ 1178.126070] kube-apiserver[1079]: I0117 22:56:48.994396    1079 handler.go:150] kube-apiserver: GET "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by gorestful with webservice /api/v1
machine1# [ 1178.128363] kube-apiserver[1079]: I0117 22:56:48.997994    1079 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (3.928991ms) 200 [[kube-controller-manager/v1.9.1 (linux/amd64) kubernetes/3a1c944/leader-election] 192.168.1.1]
machine1# [ 1178.131345] kube-apiserver[1079]: I0117 22:56:48.999263    1079 handler.go:160] kube-aggregator: PUT "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by nonGoRestful
machine1# [ 1178.133724] kube-apiserver[1079]: I0117 22:56:48.999288    1079 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by prefix /api/
machine1# [ 1178.136221] kube-apiserver[1079]: I0117 22:56:48.999317    1079 handler.go:150] kube-apiserver: PUT "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by gorestful with webservice /api/v1
machine2# [ 1178.339793] kube-proxy[528]: I0117 22:56:49.184135     528 config.go:141] Calling handler.OnEndpointsUpdate
machine1# [ 1178.145787] kube-controller-manager[1965]: I0117 22:56:48.998251    1965 round_trippers.go:436] GET https://api.my.zyx/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 4 milliseconds
machine1# [ 1178.148623] kube-controller-manager[1965]: I0117 22:56:49.018441    1965 graph_builder.go:601] GraphBuilder process object: v1/Endpoints, namespace kube-system, name kube-controller-manager, uid 14480d14-fbd7-11e7-b236-525400123456, event type update
machine1# [ 1178.151654] kube-controller-manager[1965]: I0117 22:56:49.021523    1965 round_trippers.go:436] PUT https://api.my.zyx/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 22 milliseconds
machine1# [ 1178.153594] kube-controller-manager[1965]: I0117 22:56:49.021822    1965 leaderelection.go:199] successfully renewed lease kube-system/kube-controller-manager
machine1# [ 1178.155378] kubelet[1742]: I0117 22:56:49.019767    1742 generic.go:183] GenericPLEG: Relisting
machine1# [ 1178.156744] kube-proxy[1086]: I0117 22:56:49.019117    1086 config.go:141] Calling handler.OnEndpointsUpdate
machine1# [ 1178.157994] kube-apiserver[1079]: I0117 22:56:49.021116    1079 wrap.go:42] PUT /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (21.953628ms) 200 [[kube-controller-manager/v1.9.1 (linux/amd64) kubernetes/3a1c944/leader-election] 192.168.1.]
machine1# [ 1178.167633] kubelet[1742]: I0117 22:56:49.046376    1742 config.go:99] Looking for [api file], have seen map[file:{}]
machine2# [ 1178.408940] kubelet[1114]: I0117 22:56:49.253301    1114 config.go:99] Looking for [api], have seen map[]
machine1# [ 1178.420664] kubelet[1704]: I0117 22:56:49.187859    1704 config.go:99] Looking for [api file], have seen map[file:{}]
error: action timed out after -1 seconds at /nix/store/22rg6kabcnqjadb6ajfdxqk8sq415dq3-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
action timed out after -1 seconds at /nix/store/22rg6kabcnqjadb6ajfdxqk8sq415dq3-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
cleaning up
killing machine1 (pid 27482)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/tmp/nix-build-vm-test-run-kubernetes-rbac-singlenode.drv-0/vde1.ctl': Directory not empty
builder for ‘/nix/store/5r32c0qs8v4bqy3hy6ym3n6p1nc6kgzd-vm-test-run-kubernetes-rbac-singlenode.drv’ failed with exit code 255
error: build of ‘/nix/store/5r32c0qs8v4bqy3hy6ym3n6p1nc6kgzd-vm-test-run-kubernetes-rbac-singlenode.drv’ failed
nix-build nixos/tests/kubernetes -A rbac -I .  5.45s user 3.49s system 0% cpu 19:45.08 total

@jirkadanek (Member) commented:

Same error with nix-build nixos/tests/kubernetes -A dns -I .:

machine1# [ 1177.817411] kubelet[1734]: I0118 06:35:14.599854    1734 round_trippers.go:436] GET https://api.my.zyx/api/v1/pods?fieldSelector=spec.nodeName%3Dmachine1.my.zyx&limit=500&resourceVersion=0 403 Forbidden in 5 milliseconds
machine1# [ 1177.820796] kubelet[1734]: E0118 06:35:14.615350    1734 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "apiserver-client-kubelet" cannot list pods at the cluster scope
machine1# [ 1178.098719] kubelet[1698]: I0118 06:35:14.940601    1698 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1177.865278] kubelet[1734]: I0118 06:35:14.659768    1734 config.go:99] Looking for [api file], have seen map[file:{}]
machine2# [ 1178.098806] kubelet[1112]: I0118 06:35:14.960770    1112 config.go:99] Looking for [api], have seen map[]
machine1# [ 1178.198759] kubelet[1698]: I0118 06:35:15.040712    1698 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1177.953699] kube-apiserver[1082]: I0118 06:35:14.748223    1082 handler.go:160] kube-aggregator: GET "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by nonGoRestful
machine1# [ 1177.955672] kube-apiserver[1082]: I0118 06:35:14.750233    1082 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by prefix /api/
machine1# [ 1177.958596] kube-apiserver[1082]: I0118 06:35:14.751720    1082 handler.go:150] kube-apiserver: GET "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by gorestful with webservice /api/v1
machine1# [ 1177.966206] kube-apiserver[1082]: I0118 06:35:14.755374    1082 wrap.go:42] GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (7.446756ms) 200 [[kube-controller-manager/v1.9.1 (linux/amd64) kubernetes/3a1c944/leader-election] 192.168.1.1]
machine2# [ 1178.177076] kube-proxy[531]: I0118 06:35:15.039014     531 config.go:141] Calling handler.OnEndpointsUpdate
machine1# [ 1177.969819] kube-apiserver[1082]: I0118 06:35:14.756447    1082 handler.go:160] kube-aggregator: PUT "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by nonGoRestful
machine1# [ 1178.235028] kubelet[1698]: I0118 06:35:15.076974    1698 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1177.972678] kube-apiserver[1082]: I0118 06:35:14.756477    1082 pathrecorder.go:247] kube-aggregator: "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by prefix /api/
machine1# [ 1178.236714] kubelet[1698]: I0118 06:35:15.078676    1698 kubelet.go:1921] SyncLoop (housekeeping, skipped): sources aren't ready yet.
machine1# [ 1177.975052] kube-apiserver[1082]: I0118 06:35:14.756503    1082 handler.go:150] kube-apiserver: PUT "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager" satisfied by gorestful with webservice /api/v1
machine1# [ 1177.977558] kube-apiserver[1082]: I0118 06:35:14.759349    1082 wrap.go:42] PUT /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (3.069384ms) 200 [[kube-controller-manager/v1.9.1 (linux/amd64) kubernetes/3a1c944/leader-election] 192.168.1.1]
machine1# [ 1177.980150] kube-proxy[1088]: I0118 06:35:14.760057    1088 config.go:141] Calling handler.OnEndpointsUpdate
machine1# [ 1177.981560] kubelet[1734]: I0118 06:35:14.759724    1734 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1177.983307] kube-controller-manager[2064]: I0118 06:35:14.755601    2064 round_trippers.go:436] GET https://api.my.zyx/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 7 milliseconds
machine1# [ 1177.985632] kube-controller-manager[2064]: I0118 06:35:14.758955    2064 graph_builder.go:601] GraphBuilder process object: v1/Endpoints, namespace kube-system, name kube-controller-manager, uid 21f12473-fc17-11e7-a2e9-525400123456, event type update
machine1# [ 1177.988147] kube-controller-manager[2064]: I0118 06:35:14.759510    2064 round_trippers.go:436] PUT https://api.my.zyx/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 3 milliseconds
machine2# [ 1178.198684] kubelet[1112]: I0118 06:35:15.060700    1112 config.go:99] Looking for [api], have seen map[]
machine1# [ 1177.990277] kube-controller-manager[2064]: I0118 06:35:14.763013    2064 leaderelection.go:199] successfully renewed lease kube-system/kube-controller-manager
machine1# [ 1178.298694] kubelet[1698]: I0118 06:35:15.140626    1698 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1178.061839] kubelet[1734]: I0118 06:35:14.856299    1734 config.go:99] Looking for [api file], have seen map[file:{}]
machine1# [ 1178.063498] kubelet[1734]: I0118 06:35:14.858002    1734 kubelet.go:1921] SyncLoop (housekeeping, skipped): sources aren't ready yet.
machine1# [ 1178.065488] kubelet[1734]: I0118 06:35:14.859721    1734 config.go:99] Looking for [api file], have seen map[file:{}]
error: action timed out after -1 seconds at /nix/store/22rg6kabcnqjadb6ajfdxqk8sq415dq3-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
action timed out after -1 seconds at /nix/store/22rg6kabcnqjadb6ajfdxqk8sq415dq3-nixos-test-driver/lib/perl5/site_perl/Machine.pm line 227, <__ANONIO__> line 901.
cleaning up
killing machine1 (pid 20663)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/tmp/nix-build-vm-test-run-kubernetes-dns-singlenode.drv-0/vde1.ctl': Directory not empty
machine2# [ 1178.298946] kubelet[1112]: I0118 06:35:15.160802    1112 config.go:99] Looking for [api], have seen map[]
builder for ‘/nix/store/9d46cpdm16df4rrqwljx6b1xazzghg7j-vm-test-run-kubernetes-dns-singlenode.drv’ failed with exit code 255
error: build of ‘/nix/store/9d46cpdm16df4rrqwljx6b1xazzghg7j-vm-test-run-kubernetes-dns-singlenode.drv’ failed
nix-build nixos/tests/kubernetes -A dns -I .  5.56s user 3.38s system 0% cpu 19:57.85 total

The return code was 100 in both cases.

@jirkadanek (Member) commented:

$ nix-info -m

  • system: "x86_64-linux"
  • host os: Linux 4.9.76, NixOS, 18.03pre125130.3a763b91963 (Impala)
  • multi-user?: yes
  • sandbox: no
  • version: nix-env (Nix) 1.11.16
  • channels(root): "nixos-18.03pre125130.3a763b91963"
  • channels(jdanek): ""
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs

@srhb (Contributor) commented Jan 18, 2018

@grahamc I haven't looked into it, but the tests are sufficiently different from the others in release.nix that the naive approach doesn't work:

❯ nix-build nixos/release.nix -A tests.kubernetes.rbac --arg supportedSystems '[ "x86_64-linux" ]'
error: attribute 'meta' missing, at /home/sarah/src/nixpkgs/lib/customisation.nix:172:10

Hydra knows how to do it, clearly, but I'm not sure whether ofborg does the exact same thing.

@kuznero (Member, Author) commented Jan 22, 2018

@offlinehacker, did you have a chance to see if it works for you in the field?

@jirkadanek (Member) commented:

@kuznero I've been able to set this up as a single-node cluster on my laptop and run it without problems since Monday. Compared to the previous version, I only had to add services.kubernetes.kubelet.extraOpts = "--fail-swap-on=false";, because 1.9 now by default refuses to start on systems with swap enabled.
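
A minimal sketch of that workaround as it would look in a NixOS configuration; the option path services.kubernetes.kubelet.extraOpts and the roles value are assumptions for illustration, not taken from this PR:

  # configuration.nix (sketch; option names assumed, verify against the module)
  { config, pkgs, ... }:
  {
    services.kubernetes = {
      roles = [ "master" "node" ];  # single-node cluster, as in the comment above
      # Kubernetes 1.9 refuses to start the kubelet when swap is enabled,
      # so keep the pre-1.9 behaviour by passing --fail-swap-on=false.
      kubelet.extraOpts = "--fail-swap-on=false";
    };
  }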

@jirkadanek (Member) commented:

A little update: I actually hit an issue with this just now; it matches kubernetes/kubernetes#32796 exactly. Which is weird, as that was supposedly resolved a few releases before 1.9.1...

@kuznero (Member, Author) commented Jan 24, 2018

@jdanekrh thanks for the update.

@srhb (Contributor) commented Feb 2, 2018

@jdanekrh I believe that something has changed in the bootstrapping of the kubelets' authorization.

Adding the following line to the start of every test script works:

      $machine1->waitUntilSucceeds("kubectl create clusterrolebinding kubelet-node-binding --clusterrole=system:node --user=apiserver-client-kubelet");

I'm wondering whether this is just due to the CN we're using, or if we need to do something else to bootstrap the clients.

@srhb (Contributor) commented Feb 4, 2018

How about something like this?

The issue with the current tests is that there is no longer a default ClusterRoleBinding that confers registration access for kubelets with users in the system:nodes group. To alleviate this, I've enabled the Node authorizer, which requires the username (CN) to be system:node:<nodename>. I've also thrown in the NodeRestriction admission controller, which should further limit the kubelets' access drastically.

We could just add them both to the tests, but I think they're sane defaults and match up well with what the k8s community is doing.

nixos/k8s: Enable Node authorizer and NodeRestriction by default
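
For reference, a rough sketch of what enabling those two mechanisms could look like; the option names below (authorizationMode and admissionControl under services.kubernetes.apiserver) are assumptions for illustration and may not match the commit above:

  # Sketch only; option names are assumptions, not taken from the commit.
  {
    services.kubernetes.apiserver = {
      # Node authorizer: kubelets authenticate with CN "system:node:<nodename>"
      # and are only authorized for objects bound to their own node.
      authorizationMode = [ "Node" "RBAC" ];
      # NodeRestriction admission plugin further limits what kubelets may modify;
      # it would be appended to whatever admission controllers are already enabled.
      admissionControl = [ "NodeRestriction" ];
    };
  }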
@voobscout (Contributor) commented:

Will this ever get merged?

@NeQuissimus (Member) commented:

I think this is good to go?! @srhb ? @kuznero ?
The tests pass for me, anyway...

@kuznero (Member, Author) commented Feb 16, 2018

Should be ok

@NeQuissimus merged commit 8755902 into NixOS:master Feb 16, 2018
@Baughn (Contributor) commented Feb 26, 2018

This won't be particularly reliable.

Kubernetes 1.9.x works with Docker 17.03.x. We're currently shipping 17.12.x, and Docker has API changes in minor version bumps, so all sorts of flakiness may ensue. It really should be pinned to Docker 17.03.

(Or better, 1.12. To quote from the documentation: "On each of your machines, install Docker. Version v1.12 is recommended, but v1.11, v1.13 and 17.03 are known to work as well. Versions 17.06+ might work, but have not yet been tested and verified by the Kubernetes node team.")

@srhb (Contributor) commented Feb 27, 2018

@Baughn It appears that snippet is from kubeadm, not from Kubernetes itself. I haven't found anything in the actual Kubernetes docs that is worded as strongly; the release notes simply call those "verified" versions, and 17.12 appears to work just fine. We may still want to do something to signal this to the user, and at least provide (one of) the verified version(s) as an option, but forcing this seems a bit strong.
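
One hedged way to make that choice explicit on a NixOS host would be to pin the Docker package; pkgs.docker_17_03 below is a hypothetical attribute used purely for illustration (nixpkgs may or may not ship such a pinned package):

  # Sketch only; pkgs.docker_17_03 is a hypothetical attribute.
  { pkgs, ... }:
  {
    virtualisation.docker.enable = true;
    # Pin Docker to one of the versions validated for Kubernetes 1.9,
    # instead of following whatever pkgs.docker currently points at.
    virtualisation.docker.package = pkgs.docker_17_03;
  }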

@srhb (Contributor) commented Feb 27, 2018

@Baughn This issue is relevant, too: kubernetes/kubernetes#53221

Until k8s switches to matching some/more Docker API versions, the problem remains that we can either choose an EOL Docker or a K8s-unvalidated version. Fun!

@drdaeman (Contributor) commented Mar 2, 2018

I've tried this today, cherry-picking those commits onto release-17.09. Unfortunately, this has a problem: if the CA/key/cert files are not specified explicitly, they'll be generated under /var/run/kubernetes, so any restart breaks the cluster.

Thanks to @srhb for help on #nixos, I got it solved. I've jotted down a few notes here: https://gist.github.com/drdaeman/fee048df456ced9f604fb554b78f549f (a sample config and a script to generate quick-and-dirty certs that work for a totally insecure local-dev single-node "cluster").

Unfortunately, I'm really brain-dead after the struggle with K8s, so I can't write a proper issue right now. And my weekend is going to be very busy, so I'm not sure I'll have time for this in the next few days. But I thought I'd at least leave this comment here, in case someone else runs into a similar problem.
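
The idea behind the workaround, as a minimal sketch: point the module at persistent certificate files instead of letting them be generated under /var/run/kubernetes on every start. The option names below are assumptions for illustration only; the gist above contains the actual working configuration:

  # Sketch only; option names are illustrative, see the linked gist for a working config.
  {
    services.kubernetes.apiserver = {
      # Keep the CA, server certificate and key somewhere persistent
      # so a restart or reboot does not regenerate them and break the cluster.
      clientCaFile = "/var/lib/kubernetes/ca.pem";
      tlsCertFile  = "/var/lib/kubernetes/apiserver.pem";
      tlsKeyFile   = "/var/lib/kubernetes/apiserver-key.pem";
    };
  }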
