Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolvconf service: init #62955

Merged
merged 1 commit into from Jul 17, 2019
Merged

resolvconf service: init #62955

merged 1 commit into from Jul 17, 2019

Conversation

abbradar
Copy link
Member

@abbradar abbradar commented Jun 10, 2019

Motivation for this change

This is a refactor of how resolvconf is managed on NixOS. We split it
into a separate service which is enabled internally depending on whether
we want /etc/resolv.conf to be managed by it. Various services now take
advantage of those configuration options, in particular NetworkManager
which fixes #61490.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Tested that #61490 is fixed.

@abbradar
Copy link
Member Author

(Messed up with issue numbers, I meant #61490).

@colonelpanic8
Copy link
Contributor

colonelpanic8 commented Jun 21, 2019

What's the status of this PR?

@abbradar
Copy link
Member Author

abbradar commented Jun 21, 2019 via email

@abbradar abbradar force-pushed the resolvconf branch 3 times, most recently from 9be5ae0 to c1153c8 Compare July 11, 2019 14:13
@abbradar
Copy link
Member Author

I updated this PR taking into account changes from #61949. Given that rc-manager option can and should be determined on NixOS configuration instead I removed it and set it accordingly to whether resolvconf is enabled now. @talyz what do you think?

@jameysharp
Copy link
Contributor

I want to get rid of as many activation scripts as possible and the current resolvconf snippet was making my head hurt, so I went looking and found this PR. I think your approach brings more clarity to these modules. I like it! I do have some questions though.

It doesn't appear to me that removing resolvconf.conf actually disables resolvconf. If anything calls openresolv's resolvconf, in the absence of a config file, it'll just default to running the libc subscriber which will overwrite /etc/resolv.conf. So either we have to be super careful about making sure that nothing else on the system will invoke resolvconf, or maybe we should write out a config file that just has libc = NO in it.

I'm skeptical about the commands in system.activationScripts.resolved, which were in system.activationScripts.resolvconf before your patch. I couldn't figure out if they were useful before, but it seems to me that they definitely serve no purpose after your changes. If resolved is enabled, then it symlinks /etc/resolv.conf, which is supposed to disable resolvconf. Therefore there's no point in messing around with /run/resolvconf in that case, right?

As far as I can tell, system.activationScripts.resolvconf does not depend on anything in /var, so it shouldn't ask to be ordered after the "var" activation snippet. (The snippet before your patch didn't need it either, I think.)

But is there any way we could avoid invoking resolvconf during activation? I think it might work to create a oneshot service with RestartTriggers = [ config.environment.etc."resolvconf.conf".source ] and ExecStart = "${pkgs.openresolv}/bin/resolvconf -u"... or something like that, maybe?

The new assertion in rdnssd.nix has a somewhat misleading message, since after this patch NetworkManager never manages /etc/resolv.conf by itself.

Okay, that's all I have brain for at the moment. I hope this was helpful and I'll try to keep an eye on this PR.

@talyz
Copy link
Contributor

talyz commented Jul 12, 2019

So far, I've only read through the changes - I haven't tried it out yet, but it looks good to me! The only case not covered by this would be if one wants NetworkManager to explicitly control /etc/resolv.conf, but I can't really see a legitimate reason for why one would want that.

I'm seconding @jameysharp's remarks, though: in particular, removing /etc/resolvconf.conf doesn't really disable resolvconf. To get around this we could put either libc=NO or resolvconf=NO into /etc/resolvconf.conf, or we could stop updating /etc/resolv.conf in place by setting the resolv_conf variable to someplace else and symlinking /etc/resolv.conf to it only if resolvconf is enabled.

@abbradar
Copy link
Member Author

@jameysharp @talyz Thank you both for your comments!

I've updated the patch. Instead of somehow forbidding resolvconf usage in /etc/resolvconf.conf or via symlinks I opted for complete removal of openresolv from system if resolvconf is not used. I feel this is the right fix for this problem, if something uses openresolv separately it's a bug.

I also moved resolvconf update to a systemd unit as per @jameysharp suggestion, it seems to work nicely!

A possible improvement would be to add support for pre-activation commands into stage-2 and split handling of host-provided resolv.confs into resolvconf module too, but that's subject for future work.

@jameysharp
Copy link
Contributor

@talyz, I also thought about the loss of the ability to make NetworkManager control resolv.conf and I agree, it doesn't seem important.

@abbradar Ooh, I'm glad the oneshot unit thing worked here. I bet I can apply it other places too...

I'm not entirely comfortable with hoping that nothing will call resolvconf just because we haven't told it to. The consequence if we're wrong is that networking breaks, sometimes, unpredictably, which would lead to bug reports that look similar to #61230.

From a quick git grep -lFw openresolv across the whole repo, it looks like dhcpcd, openvpn, wireguard, strongswan, vpnc, and wicd all might potentially invoke it. (Also network-manager but I'm okay with believing that at least that will only invoke it if we tell it to.) Removing it from the system path won't necessarily stop them. (But it probably shouldn't be on the system path no matter what...)

Instead I'd prefer to ensure that if something does go wrong, it fails in a predictable way and without discarding the configured settings. Since resolvconf.conf is a shell fragment which resolvconf sources, a "belt and suspenders" approach here might be to write something like this into it:

echo "resolvconf is disabled on this system but was used anyway:" >&2
echo "$0 $*" >&2
exit 1

Which I hope would help track down things which we didn't expect would be using it.

A more extreme approach would be to set disallowedRequisites = [ pkgs.openresolv ] in the baseSystem derivation in system/activation/top-level.nix. That would prevent the configuration from building if openresolv got pulled into it in any way. Getting all configurations to work while being that strict might be tricky though. For one thing, it looks like NetworkManager grabs the resolvconf path at build time even if you don't decide to use it at runtime.

I'm not totally convinced that useHostResolvConf should be moved into networking.resolvconf. I think it's reasonable to expect that setting it should prevent any configuration from discarding the host-provided settings. An assertion along the lines of config.networking.useHostResolvConf -> !(config.environment.etc."resolv.conf".enable or false) might be in order?

I'm pretty sure the "import host's resolv.conf into resolvconf" step can be a oneshot service ordered before network-pre.target or something. Nothing should be trying to use the network before systemd reaches the network-online.target anyway. I'm not even sure why it'd be useful to use both the host's resolv.conf and resolvconf together, honestly. But you're right that this is a separate question.

@abbradar
Copy link
Member Author

abbradar commented Jul 13, 2019

@jameysharp The problem of host's resolv.conf is that we need to do this only at the boot time, so systemd service or activation script is problematic. Otherwise we cannot reliably distinguish if current /etc/resolv.conf is host-provided or generated by resolvconf itself. The best solution that I see is either use a file or the host interface in resolvconf itself as flags, but both solutions seem flaky to me - if the flag is cleared this will lead to fun bugs.

After some thought I agree we should protect against random usage of resolvconf. Thank you for a nice suggestion about using resolvconf.conf as a shell script! I updated the PR with that. I don't like disallowedRequisites approach because that doesn't cover nix-env, nix-shell etc.

Moving the useHostResolvConf option to resolvconf shows that it's a feature of our resolvconf module. I'm not sure but I feel systemd-resolvectl implements that in their own way (perhaps by checking for magic strings in resolv.conf and using it) - I didn't immediately find any confirmation of that though.

@jameysharp
Copy link
Contributor

I thought the case of host resolv.conf was funny because usually when I've been trying to replace activation scripts, the problem has been how to get a systemd unit to run not just at boot, but also on switch-to-configuration. So in this case, where we actually want it to only run at boot, I think that's the easy case. 😁 But it isn't important right now.

From reading the manual pages, it looks like systemd-resolved has a number of different modes for how it treats resolv.conf, but I don't think any of them have special treatment for a host-provided copy. Either /etc/resolv.conf is a symlink to one of resolved's three different available versions... or it isn't, and resolved doesn't write to it, assuming that something else is managing it. Which I guess is fine for the host-provided case, if nothing else writes to it either, and everything on the system knows to use resolved's dbus API or something instead of paying attention to the traditional file? This is hard for me to reason about...

Because it is hard for me to reason about, I still lean toward not renaming that option in this patch. But I'm not going to push, especially since I have no say in whether this PR gets merged. 😁

As far as I can tell, nixos/modules/system/boot/resolved.nix no longer needs to have a mkMerge added, right? I think that's an artifact of a previous version of the patch, and removing it will make the diff smaller.

Aside from those two quibbles, I think this is a good patch and I'd be in favor of somebody merging it.

@abbradar
Copy link
Member Author

@jameysharp I'm just wary of using file flags for one-time initialization, no other reason. Maybe you have any ideas about how to ensure that this code runs only once?

I grepped through systemd-resolved and systemd-nspawn code today and I think you are right - nspawn and resolved don't communicate in any way, and it's up to user to specify desired resolv.conf behaviour (copy, bind, don't touch) for nspawn and specify mode of operation for resolved.

Therefore it'll indeed be better for this option to be top-level and various modules to deal with it in their various ways. I'll revert this change, thanks!

This is a refactor of how resolvconf is managed on NixOS. We split it
into a separate service which is enabled internally depending on whether
we want /etc/resolv.conf to be managed by it. Various services now take
advantage of those configuration options.

We also now use systemd instead of activation scripts to update
resolv.conf.

NetworkManager now uses the right option for rc-manager DNS
automatically, so the configuration option shouldn't be exposed.
@jameysharp
Copy link
Contributor

Okay, I endorse commit 01b90dc! I've re-read it carefully after all these changes and I think it should be merged as-is. I can't do that myself though.


I'll outline what I'm thinking should work for making a systemd unit replace the @useHostResolvConf@ block in system/boot/stage-2-init.sh, but just to be clear: I do not think the change I'm about to describe should be in this commit. You have a nice unit of work here as it stands, and if what I'm about to propose turns out to have some subtle failure mode, I'd rather people be able to revert it by itself...

Let's consider the constraints. This statement needs to:

  • Run before any other invocation of resolvconf.
  • Run at most once per boot. When the container is booted a second time, this statement should run again.

It must not run if, for example, the activation script is re-run, even if the relevant configuration changes; and the admin should be protected from accidentally running it.

I think a configuration like this should work:

{
  systemd.services.host-resolvconf = {
    # ensure that no matter what else happens, this happens during initial boot
    wantedBy = [ "sysinit.target" ];

    # ensure network managers don't start until this is done
    wants = [ "network-pre.target" ];
    before = [ "network-pre.target" ];

    unitConfig = {
      # protect the admin or switch-to-configuration from accidentally running this again
      RefuseManualStart = true;

      # make systemd remember that this already ran, so no dependency can pull it in again
      RefuseManualStop = true;
      RemainAfterExit = true;

      # only run if there's something to do
      ConditionFileNotEmpty = "/etc/resolv.conf";
      StandardInput = "file:/etc/resolv.conf";
      ExecStart = "${pkgs.openresolv}/bin/resolvconf -m 1000 -a host";
    };
  };
}

@abbradar
Copy link
Member Author

@jameysharp I should confess, I didn't consider systemd to be a possible way of handling this! Nice idea!

I agree; let's leave that for another time. Given that I tested this on several configurations (servers, clients, containers) and we did quite a several rounds of review (thanks!) I'll merge this in several days unless anyone else wants to take a stab at reviewing and testing.

@jameysharp
Copy link
Contributor

...wait, did you have that "Member" tag all along and I just didn't notice, or did you just recently get commit rights? 😕

Anyway, sounds great 😁

@abbradar
Copy link
Member Author

For quite a long time actually :3

@jameysharp
Copy link
Contributor

Oh... Maybe you could review/merge some of my pending pull requests? 😁

#64387 is a small patch that's necessary to get one of the tests to work again; #64268 has a "LGTM" from the person who's done the most work in the same area recently; and #63541 has several thumbs-ups and an approval review. I'm just not sure how to get them merged.

@abbradar abbradar merged commit 294751a into NixOS:master Jul 17, 2019
@jameysharp
Copy link
Contributor

@abbradar, I hadn't realized until I tried this in conjunction with my no-networking patches that setting networking.resolvconf.enable = false directly fails with this error:

error: The option `networking.resolvconf.enable' has conflicting definitions, in `.../configuration.nix' and `.../nixpkgs/nixos/modules/config/resolvconf.nix'.

Should there be a mkDefault in the definition in resolvconf.nix? I know the option is marked as internal, so is the intent that the only way to disable it is to set environment.etc."resolv.conf".enable = false or similar?

@abbradar
Copy link
Member Author

@jameysharp Yeah, the idea is to abstract away how exactly it's chosen from other programs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Empty networking.nameservers wipes /etc/resolv.conf after a rebuild
4 participants