nixos/network-interfaces-scripted: fix a container networking bug #47252
When a bridge interface was reconfigured, running containers using this bridge lost connectivity: restarting network-addresses-brN.service triggered a restart of network-setup.service via a "partOf" relationship introduced in 07e0c0e. This in turn restarted brN-netdev.service. The bridge was thus destroyed and recreated with the same name but a new interface id, causing attached veth interfaces to lose their connection. This change removes the "partOf" relationship between network-setup.service and network-addresses-brN.service for all bridges.
Affects #45960, please backport to 18.09.
I also found that despite our apparent fix, the original problem we intended to fix reappeared: https://github.com/NixOS/nixops/issues/833 We don't know yet why.
If that's the case, maybe we could just revert 07e0c0e?
Well, it did fix some of the occurrences (we verified that), but apparently not all of them.
Perhaps we'll figure it out for good at NixCon? That'd be awesome.
Let's do that 😄
Your analysis sounds and looks very correct, thanks a lot for that @xeji! I will also happily wait for some more feedback from the original authors. :)
Huh, the containers test was failing for one year :) I admit I don't see the full picture of networking dependencies and what should trigger a restart given changed configuration, but exclusion of bridges does sound like a step forward.
I'll merge this now as there were no objections. It's probably not the best possible solution but works for now. We should try to refactor and simplify the networking dependencies eventually. An alternative would be to just let
Backported to 18.09: 589d270
Motivation for this change
Fixes #47210: When a bridge interface was reconfigured, running containers using this bridge lost connectivity: restarting `network-addresses-brN.service` triggered a restart of `network-setup.service` via a `partOf` relationship introduced in 07e0c0e. This in turn restarted `brN-netdev.service`. The bridge was thus destroyed and recreated with the same name but a new interface id, causing attached `veth` interfaces to lose their connection.
This change removes the `partOf` relationship between `network-setup.service` and `network-addresses-brN.service` for all bridges so they can be reconfigured without being destroyed and recreated.
I don't see any negative side effects at the moment, but would kindly ask the authors of 07e0c0e (which introduced the bug while fixing NixOS/nixops#640) to have a look: cc @nh2, @domenkozar, @fpletz, @aszlig, @basvandijk
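For readers who want to see the shape of the exclusion, here is a minimal, self-contained sketch of the idea described above. It is not the actual module code: the `interfaces` and `bridges` values are hypothetical sample data, and the `partOf` attribute here only mimics how the list of units for `network-setup.service` could be computed with bridges filtered out.

```nix
# Sketch only: sample data standing in for the real module configuration.
let
  # Hypothetical interface list; in the real module this would come from the
  # networking configuration.
  interfaces = [ { name = "eth0"; } { name = "br0"; } ];

  # Hypothetical bridge definitions, keyed by bridge name.
  bridges = { br0 = { interfaces = [ ]; }; };

  # An interface counts as a bridge if its name is a key in `bridges`.
  isBridge = i: builtins.hasAttr i.name bridges;
in
{
  # network-setup.service is only "part of" the address units of
  # non-bridge interfaces, so restarting network-addresses-br0.service
  # no longer propagates to network-setup.service.
  partOf = map (i: "network-addresses-${i.name}.service")
    (builtins.filter (i: !(isBridge i)) interfaces);
}
```

Evaluating this expression (for example with `nix-instantiate --eval --strict`) yields a `partOf` list containing only `network-addresses-eth0.service`; the bridge's address unit is left out.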
There's also an (uglier) alternative solution without changing the `partOf` relationships: modify the start/stop scripts to skip deleting and recreating a bridge while there's still a veth interface attached.
Things done
NixOS tests: `containers-restart_networking` now passes (it failed before due to the bug)
cc @srhb