Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/vswitches : Fix vswitch-netdev.service going down just after startup #34586

Closed
wants to merge 1 commit into from

Conversation

netixx
Copy link
Contributor

@netixx netixx commented Feb 4, 2018

The service has a 'BindsTo' Dependency to sys-subsystem-vsiwtch.device
which causes the service to stop immediately after startup because the
add-br commands make the device flap somehow.

This patch moves creation of the device to 'ExecStartPre=' phase of the
service, where 'BindsTo' does not apply (tested, bu could not find docs
about this).

See #34336 for details

Motivation for this change

Configuring networking.vswitches did not work before.
Also, as we remove the device in postStop, it makes sense to create it in preStart.

Things done

Tested in the virtualbox VM (see related issue for logs and details).

  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@netixx
Copy link
Contributor Author

netixx commented Feb 9, 2018

After some more testing, it seems this patch might not be enough to definitely fix the issue.

I have done some more testing by adding a good old sleep 10 command which seems to do the trick more reliably. But I am not a fan of this solution (unless you guys say it's ok...).

@Mic92
Copy link
Member

Mic92 commented Feb 9, 2018

How do other distributions handle this? I am familiar with network configuration, but I have little experience with openvswitch to propose a better solution.

@Mic92
Copy link
Member

Mic92 commented Feb 9, 2018

Maybe we have to switch to Requires= instead of BindsTo? I am not sure 100% sure, what the downside of that would be.

@netixx
Copy link
Contributor Author

netixx commented Feb 10, 2018

My understanding is that the bindsTo = [ "vswitchd.service" (subsystemDevice n) ] ++ deps binds to the current vswitch subsystem device (as well as other, such as interfaces comprises in the vswitch and the vswitchd service) : if any of those go down, then the service will be shutdown. This enables synchronisation between the kernel openvswitch device and the -netdev.service (meaning if you shut down the openvswitch device manually, this will be reflected on the -netdev.service).

After some more testing and reflexion, I have the following remarks on dependencies configured :

  1. the BindsTo=vswitchd.service makes sense : if the service go down, all vswitches should be brought down
  2. the BindsTo=(subsystemDevice n) could make sense : if the vswitch device is shutdown (by external events), the resulting netdev should go down (so that it can be brought back up)
  3. the BindsTo=deps (interfaces member of the vswitch) does not make sense to me: if an interface of the vswitch goes down, the vswitch should be able to continue working with the other interfaces

Regardless of the issue mentioned in this PR, I think we should refactor the following dependencies like so:

bindsTo=["vswitchd.service" (subsystemDevice n)];
wants=deps;

Because:

  • the vswitch should go up and should stay up if one of the interfaces fails
  • the vswitch should shut down if vswitchd daemon shuts down

We should maybe to the same analysis for other services (e.g. bridges).

For the original issue, I have tested adding ovs-vsctl wait-until bridge ${n} at the end of the pre-start script, but to no avail (I think add-br implicitly does that). I have had some success with adding sleep 1 (positive result after multiple reboots).

The problem seems to be random, sometimes occurring when manually restarting the service, more frequently on reboots (but not consistently failing across reboots). This leads me to think that there is some kind of race condition, because the -netdev.service binds to the device it's creating. I don't know if this is a supported scenario by systemd and it is very specific to nixos : I haven't seen other distro is creating vswitches and associating interfaces in a systemd service (mostly because vswitch configuration is persistent in the openvswitch database, the user does it once and it is reactivated on reboots).

Maybe we should ask @tstrobel ("Thomas Strobel ts468@cam.ac.uk") who made the commits to the vswitchd, for more information ?
What do you think about the proposed refactoring ?

@netixx
Copy link
Contributor Author

netixx commented Feb 16, 2018

Regarding this issue, and in respect with issue #34851, I think I might resubmit a full PR with updates and new features, because upgrading the openvswitch package may change implementation details.

…artup

The service has a 'BindsTo' Dependency to sys-subsystem-vsiwtch.device
which causes the service to stop immediately after startup because the
add-br commands make the device flap somehow.

This patch moves creation of the device to 'ExecStartPre=' phase of the
service, where 'BindsTo' does not apply (tested, bu could not find docs
about this).

See NixOS#34336 for details
@netixx
Copy link
Contributor Author

netixx commented Feb 18, 2018

Closed since I opened the more complete #35127

@netixx netixx closed this Feb 18, 2018
netixx added a commit to netixx/nixpkgs that referenced this pull request Oct 8, 2018
Originnaly, the need was to fix NixOS#34336 and add features in NixOS#34851.

With remarks from @volth, openvswitch was upgraded to the latest
stable version (currenty 2.9.2). This remove ovs-monitor-ipsec
commands. LTS version is still available using
`config.virtualisation.vswitch.package = pkgs.openvswitch-lts`

Systemd dependencies for scripted mode
were refactored according to analysis in NixOS#34586.

networking.vswitches can now be used with systemd-networkd,
although they are not supported by the daemon, a nixos receipe
creates the switch and attached required interfaces (just like
the scripted version).

Vlans and internal interfaces are implemented following the
 `networking.interfaces` template format i.e. each interface is
described using an attributeSet (vlan and type at the moment).
If vlan is present, then interface is added to the vswitch with
given tag (access mode). Type internal enabled vswitch to create
interfaces (see openvswitch docs).

Added configuration for configuring supported openFlow version on
the vswitch
Mic92 pushed a commit that referenced this pull request Feb 21, 2020
Systemd dependencies for scripted mode
were refactored according to analysis in #34586.

networking.vswitches can now be used with systemd-networkd,
although they are not supported by the daemon, a nixos receipe
creates the switch and attached required interfaces (just like
the scripted version).

Vlans and internal interfaces are implemented following the
  template format i.e. each interface is
described using an attributeSet (vlan and type at the moment).
If vlan is present, then interface is added to the vswitch with
given tag (access mode). Type internal enabled vswitch to create
interfaces (see openvswitch docs).

Added configuration for configuring supported openFlow version on
the vswitch

This commit is a split from the original PR #35127.
netixx added a commit to netixx/nixpkgs that referenced this pull request May 23, 2020
Systemd dependencies for scripted mode
were refactored according to analysis in NixOS#34586.

networking.vswitches can now be used with systemd-networkd,
although they are not supported by the daemon, a nixos receipe
creates the switch and attached required interfaces (just like
the scripted version).

Vlans and internal interfaces are implemented following the
  template format i.e. each interface is
described using an attributeSet (vlan and type at the moment).
If vlan is present, then interface is added to the vswitch with
given tag (access mode). Type internal enabled vswitch to create
interfaces (see openvswitch docs).

Added configuration for configuring supported openFlow version on
the vswitch

This commit is a split from the original PR NixOS#35127.
rnhmjoj pushed a commit that referenced this pull request Aug 16, 2021
Systemd dependencies for scripted mode
were refactored according to analysis in #34586.

networking.vswitches can now be used with systemd-networkd,
although they are not supported by the daemon, a nixos receipe
creates the switch and attached required interfaces (just like
the scripted version).

Vlans and internal interfaces are implemented following the
  template format i.e. each interface is
described using an attributeSet (vlan and type at the moment).
If vlan is present, then interface is added to the vswitch with
given tag (access mode). Type internal enabled vswitch to create
interfaces (see openvswitch docs).

Added configuration for configuring supported openFlow version on
the vswitch

This commit is a split from the original PR #35127.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants