nixos/networking: Add the FQDN and hostname to /etc/hosts #76542

Merged
merged 3 commits into from May 25, 2020

Conversation

primeos
Member

@primeos primeos commented Dec 26, 2019

This fixes the output of "hostname --fqdn" (previously the domain name
was not appended). Additionally it's now possible to use the FQDN.

This works by unconditionally adding two entries to /etc/hosts:
127.0.0.1 localhost
::1 localhost

These are the first two entries and therefore gethostbyaddr() will
always resolve "127.0.0.1" and "::1" to "localhost".
This works because nscd (or rather the nss-files module) returns the
first matching row from /etc/hosts (and ignores the rest).

The FQDN and hostname entries are appended later to /etc/hosts, e.g.:
127.0.0.2 nixos-unstable.test.tld nixos-unstable
::1 nixos-unstable.test.tld nixos-unstable
Note: We use 127.0.0.2 here to follow nss-myhostname (systemd) as closely
as possible. This has the advantage that 127.0.0.2 can be resolved back
to the FQDN but also the drawback that applications that only listen on
127.0.0.1 (and not additionally on ::1) cannot be reached via the FQDN.
If you would like this to work you can use the following configuration:

networking.hosts."127.0.0.1" = [
  "${config.networking.hostName}.${config.networking.domain}"
  config.networking.hostName
];

Therefore gethostbyname() resolves "nixos-unstable" to the FQDN
(canonical name): "nixos-unstable.test.tld".
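
The lookup behaviour described above can be modelled in a few lines of Python. This is only a simplified sketch of the "first matching row wins" rule that the description attributes to nss-files, not the glibc implementation (which has more cases, for example the `multi` option in /etc/host.conf). The names `parse_hosts`, `lookup_by_addr` and `lookup_by_name` are made up for illustration.

```python
from typing import List, Optional, Tuple

# Example /etc/hosts content taken from the PR description.
HOSTS = """\
127.0.0.1 localhost
::1 localhost
127.0.0.2 nixos-unstable.test.tld nixos-unstable
::1 nixos-unstable.test.tld nixos-unstable
"""

Entry = Tuple[str, str, List[str]]  # (address, canonical name, aliases)


def parse_hosts(text: str) -> List[Entry]:
    """Parse hosts-file text into (address, canonical, aliases) entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            entries.append((fields[0], fields[1], fields[2:]))
    return entries


def lookup_by_addr(entries: List[Entry], addr: str) -> Optional[str]:
    """Reverse lookup: the first row with a matching address wins."""
    for address, canonical, _aliases in entries:
        if address == addr:
            return canonical
    return None


def lookup_by_name(entries: List[Entry], name: str) -> Optional[Entry]:
    """Forward lookup: the first row naming `name` (canonical or alias) wins."""
    for entry in entries:
        _address, canonical, aliases = entry
        if name == canonical or name in aliases:
            return entry
    return None


entries = parse_hosts(HOSTS)
print(lookup_by_addr(entries, "127.0.0.1"))          # localhost
print(lookup_by_addr(entries, "::1"))                # localhost (first ::1 row)
print(lookup_by_addr(entries, "127.0.0.2"))          # nixos-unstable.test.tld
print(lookup_by_name(entries, "nixos-unstable")[1])  # nixos-unstable.test.tld
```

In this model the later "::1" row is never reached by a reverse lookup, which is why "::1" keeps resolving to "localhost" while the short hostname still maps to the FQDN.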

Advantages to the previous behaviour:

  • The FQDN will now also be resolved correctly (the entry was missing).
  • E.g. the command "hostname --fqdn" will now work as expected.

Drawbacks:

  • Overrides entries from the DNS (an issue if e.g. $FQDN should resolve to the public IP address instead of 127.0.0.1)
    • Note: This was already partly an issue as there's an entry for $HOSTNAME (without the domain part) that resolves to 127.0.1.1 (!= 127.0.0.1).
  • Unknown (could potentially cause other unexpected issues, but special
    care was taken).

Optional TODOs:

  • Document this in the changelog (not really optional!)
  • nixos/tests/hostname: Also check that 127.0.0.1 and ::1 still resolve back to localhost as this is apparently required by some applications (see c578924)
  • Print a warning/error if networking.hostName contains a dot: fc7a9b8
  • Investigate edge cases (e.g. FQDN in /etc/hostname)

Future improvements:

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @

@primeos
Member Author

primeos commented Dec 26, 2019

I've tested/verified this with the following (very hacky) C program: hostname-test.c

Result:

[root@nixos-unstable:~]# cat /etc/hosts 
127.0.0.1 localhost
::1 localhost
127.0.0.1 nixos-unstable.test.tld nixos-unstable
::1 nixos-unstable.test.tld nixos-unstable


[root@nixos-unstable:~]# ./hostname-test 
Hostname (IPv4): nixos-unstable
- Canonical name: nixos-unstable.test.tld
- Aliases: nixos-unstable nixos-unstable
Hostname (IPv6): nixos-unstable
- Canonical name: nixos-unstable.test.tld
- Aliases: nixos-unstable
Address: 127.0.0.1
- Canonical name: localhost
- Aliases:
Address: ::1
- Canonical name: localhost
- Aliases:

[root@nixos-unstable:~]# hostname -f
nixos-unstable.test.tld
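
A roughly comparable check can be written with Python's standard `socket` module, whose `gethostbyname_ex()` and `gethostbyaddr()` go through the system resolver. This is a sketch, not a port of hostname-test.c (whose source is not reproduced in this thread); the helper name `describe` is made up, `gethostbyname_ex()` only covers IPv4, and the output depends entirely on the machine it runs on.

```python
import socket


def describe(label: str, result) -> str:
    """Format a (canonical name, aliases, addresses) triple like the output above."""
    canonical, aliases, _addresses = result
    return f"{label}\n- Canonical name: {canonical}\n- Aliases: {' '.join(aliases)}"


def main() -> None:
    hostname = socket.gethostname()
    queries = [
        (f"Hostname (IPv4): {hostname}", socket.gethostbyname_ex, hostname),
        ("Address: 127.0.0.1", socket.gethostbyaddr, "127.0.0.1"),
        ("Address: ::1", socket.gethostbyaddr, "::1"),
    ]
    for label, resolve, argument in queries:
        try:
            print(describe(label, resolve(argument)))
        except OSError as error:  # socket.herror/gaierror are OSError subclasses
            print(f"{label}\n- lookup failed: {error}")
    # Roughly comparable to `hostname -f`; may fall back to the short name.
    print("FQDN:", socket.getfqdn())


if __name__ == "__main__":
    main()
```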

Note: With networking.enableIPv6 = false; this will not print the FQDN for hostname -f:

[root@nixos-unstable:~]# cat /etc/hosts
127.0.0.1 localhost

127.0.0.1 nixos-unstable.test.tld nixos-unstable


[root@nixos-unstable:~]# ./hostname-test 
Hostname (IPv4): nixos-unstable
- Canonical name: nixos-unstable.test.tld
- Aliases: nixos-unstable
Hostname (IPv6): nixos-unstable
- Canonical name: nixos-unstable
- Aliases: localhost
Address: 127.0.0.1
- Canonical name: localhost
- Aliases:
Address: ::1
- Canonical name: localhost
- Aliases: nixos-unstable

[root@nixos-unstable:~]# hostname -f
nixos-unstable

But that could be fixed by adding the entry for ::1 unconditionally.
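
To make the difference between the two files above concrete, here is a small Python sketch that renders both variants. It is a hypothetical model of the generated file, not the NixOS module itself: the function `render_hosts` and its `always_add_ipv6` flag are assumptions introduced only to illustrate the suggested fix, and it assumes that "the entry for ::1" covers both the localhost row and the FQDN row.

```python
from typing import List


def render_hosts(hostname: str, domain: str, enable_ipv6: bool,
                 always_add_ipv6: bool = False) -> List[str]:
    """Return /etc/hosts lines shaped like the two test runs above."""
    names = f"{hostname}.{domain} {hostname}"
    add_ipv6 = enable_ipv6 or always_add_ipv6
    lines = ["127.0.0.1 localhost"]
    if add_ipv6:
        lines.append("::1 localhost")
    lines.append(f"127.0.0.1 {names}")
    if add_ipv6:
        lines.append(f"::1 {names}")
    return lines


# IPv6 disabled: no ::1 rows at all, as in the second run.
print("\n".join(render_hosts("nixos-unstable", "test.tld", enable_ipv6=False)))
# Suggested fix: emit the ::1 rows regardless of enable_ipv6.
print("\n".join(render_hosts("nixos-unstable", "test.tld", enable_ipv6=False,
                             always_add_ipv6=True)))
```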

@flokli
Contributor

flokli commented Jan 1, 2020

cc @zimbatm #72077

@zimbatm
Member

zimbatm commented Jan 3, 2020

We should really look at other distros to see what they are doing. E.g. the /etc/hosts of the Ubuntu 18.04 cloud image:

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

It looks like they don't even map the hostname.

There are so many use-cases that it's hard to make it work for all of them.

  1. Mapping localhost first. I think this is a good thing 👍
  2. FQDN mapping. I use DNS. I think FQDN mapping predates the use of DNS. Even when I use /etc/hosts I would map it to the external address.

/cc @globin and @fpletz who know much more about networking.

@zimbatm
Member

zimbatm commented Jan 3, 2020

/cc @oxij

@markuskowa
Member

This may also solve the problem with kerberos: #10183

@flokli
Contributor

flokli commented Feb 5, 2020

If I get this right, it's just nss being broken, and it'd be preferable to have these records provided by nss-myhostname, as discussed in #10183?

@primeos
Member Author

primeos commented May 22, 2020

I've rebased this PR to resolve the merge conflict.

If I get this right, it's just nss being broken, and it'd be preferable to have these records provided by nss-myhostname, as discussed in #10183?

@flokli could you explain that a bit more? I've only had a quick look at #10183 but that looks to me like the changes to /etc/hosts (as in this PR) are the important part that fixes the issue.

Also: It shouldn't be possible to fix this via myhostname as it is queried after /etc/hosts:

$ cat /etc/nsswitch.conf | grep hosts
hosts:     files mymachines dns myhostname
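
The ordering argument can be checked mechanically. The following Python sketch parses the `hosts:` line shown above and confirms that `files` is consulted before `myhostname`; the function name `hosts_sources` is made up for this example, and only the single line from the grep output is used as input.

```python
import re
from typing import List

# The line shown in the grep output above.
NSSWITCH_CONF = "hosts:     files mymachines dns myhostname\n"


def hosts_sources(conf_text: str) -> List[str]:
    """Return the ordered NSS sources configured for the `hosts` database."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            # Drop bracketed action items such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return rest.split()
    return []


sources = hosts_sources(NSSWITCH_CONF)
print(sources)  # ['files', 'mymachines', 'dns', 'myhostname']
print(sources.index("files") < sources.index("myhostname"))  # True
```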

Personally I think that this approach is the best way to go and it should (almost) always work (of course there are corner cases if users make changes that negate this PR).
Additionally I have yet to notice any drawbacks of this approach apart from the fact that it looks a bit strange to have "duplicate" entries in /etc/hosts (if someone is aware of any drawbacks please let me know!).

I'd really like to finally resolve this long-standing issue. If someone noticed a bug/regression with this PR I'm happy to take a look and try to find a solution. If someone doesn't like this approach then please make your own PR so that we can properly discuss this. Thanks.

Edit:

  • I just realized that fixing this via myhostname should actually be possible (but I currently don't really want to waste multiple hours again by having a closer look at this)
  • "Note: With networking.enableIPv6 = false; this will not print the FQDN for hostname -f" that's something to be aware of (but should be fixable).

@flokli
Contributor

flokli commented May 22, 2020

@primeos Sorry, I mistakenly assumed myhostname would also return the fqdn. Seems it doesn't.
It does support resolving the "localhost" entries, but it's probably fine to keep those in /etc/hosts too.

Can you cherry-pick blitz@efd537c into this PR (and re-enable the fqdn-specific test)?

That way, we could tinker around with the PR a bit and spot breakages more easily.

@primeos
Member Author

primeos commented May 22, 2020

@flokli

Can you cherry-pick blitz/nixpkgs@efd537c into this PR (and re-enable the fqdn-specific test)?

Done

@primeos Sorry, I mistakenly assumed myhostname would also return the fqdn. Seems it doesn't.
It does support resolving the "localhost" entries, but it's probably fine to keep those in /etc/hosts too.

I'm also sorry, I was pretty angry (not at you, but this PR cost me multiple hours to research and test, and this topic is IMO pretty annoying to deal with as every distribution does its own thing and there isn't good documentation for this) and replied a bit too "soon".

I decided to look into nss-myhostname and it does seem like the cleanest solution, but unfortunately I don't think it's really feasible and this approach has better chances of succeeding. I added my findings about nss-myhostname in the PR description ("Future improvements:") as well as one potential drawback of my approach (a conflict with DNS; I'm not sure if / how many users rely on this).

@primeos
Member Author

primeos commented May 22, 2020

@GrahamcOfBorg test hostname


@flokli
Contributor

flokli commented May 22, 2020

currently staging fails to build due to #86954 (comment). I poked in that issue.

Anyways, as this is mostly a module update, and not a world rebuild, it can probably target master just fine.

@primeos
Member Author

primeos commented May 22, 2020

  1. FQDN mapping. I use DNS. I think FQDN mapping predates the use of DNS. Even when I use /etc/hosts I would map it to the external address.

@zimbatm do you rely on the FQDN being mapped to the public IP address or would 127.0.0.1/::1 also be ok? Currently this is probably the main potential issue of this PR as this could cause regressions to a small subset of users (hopefully an extremely small subset since this is an edge case and we already map the hostname). It is possible to override or even disable the FQDN entries, but that would require a manual change.

Anyways, as this is mostly a module update, and not a world rebuild, it can probably target master just fine.

@flokli I also think that master should be fine but I was hoping that staging might provide some extra security against unexpected bugs (hopefully none but one never knows), e.g. if a test fails that isn't channel critical. But I can also target master if you want.

@flokli
Contributor

flokli commented May 23, 2020

We can't know all "public IP addresses" at configuration time anyways, as they can also be provided dynamically via DHCP or router advertisements.

Mapping hostname and fqdn to 127.0.0.1 (anything in 127.0.0.0/8 really) and ::1 should be fine. That's also what nss-myhostname is mostly doing (except they map to 127.0.0.2 and don't handle the fqdn in all cases)
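
The claim that anything in 127.0.0.0/8 is loopback can be verified with Python's standard `ipaddress` module. This is a small illustrative check; the helper name `is_loopback` is introduced here only for the example.

```python
import ipaddress

LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")


def is_loopback(text: str) -> bool:
    """True if the textual address is an IPv4 or IPv6 loopback address."""
    return ipaddress.ip_address(text).is_loopback


for text in ("127.0.0.1", "127.0.0.2", "127.0.1.1", "::1"):
    address = ipaddress.ip_address(text)
    in_v4_block = address.version == 4 and address in LOOPBACK_V4
    print(f"{text}: loopback={is_loopback(text)} in 127.0.0.0/8={in_v4_block}")
```

Being a loopback address does not by itself mean a service is reachable there: a program bound only to 127.0.0.1 does not answer on 127.0.0.2.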

In NixOS, we already do have nss-resolve enabled if services.resolved.enable = true;. In these cases, you'd see hosts: … resolve in /etc/nsswitch.conf.

However, I still think keeping nss-myhostname is useful. There are valid reasons why people don't want to use resolved as their resolver, for example when it comes to routing traffic, including DNS requests, via a VPN. In these cases, people should still be able to look up their local hostname and fqdn without having to bake it into /etc/hosts.

@primeos
Member Author

primeos commented May 23, 2020

We can't know all "public IP addresses" at configuration time anyways, as they can also be provided dynamically via DHCP or router advertisements.

@flokli agreed, this was only about potential regressions, I didn't intend to implement something for this use case.

I've also documented this change in the changelog and squashed the commits. From my point of view this should be ready now.

@primeos primeos force-pushed the etc-hosts-fqdn-fix branch 2 times, most recently from dee70d0 to 7195c84 Compare May 24, 2020 21:33
@delroth
Contributor

delroth commented May 31, 2020

My 2 cents since I'm now rebasing my network on top of this change: it's very common to have to refer to the machine's FQDN in various config locations. Previously I was able to just pull in config.networking.hostName (which was defined to be the FQDN on my machines), now I have to do the concat manually in 10+ locations throughout my config.

The release notes also don't really mention how to migrate configs to comply with the new requirements (e.g. the rlnotes item doesn't even list networking.domain).

@primeos
Member Author

primeos commented May 31, 2020

[...] now I have to do the concat manually in 10+ locations throughout my config.

Good point. Maybe we should add an option like networking.fqdn with readOnly = true; to simplify this?

The release notes also don't really mention how to migrate configs to comply with the new requirements (e.g. the rlnotes item doesn't even list networking.domain).

Right, seems like we forgot this in the second item. I can open a followup PR (probably in a few days, if no one else does this before then).

@flokli
Contributor

flokli commented May 31, 2020

Yeah, networking.fqdn might be an option - but I wonder what should be in there if networking.domain isn't set.

Regarding the release notes, the paragraph before that already explicitly mentions networking.domain and how things are combined, but it probably wouldn't hurt to add a specific action item to the paragraph below.

@primeos
Member Author

primeos commented May 31, 2020

Yeah, networking.fqdn might be an option - but I wonder what should be in there if networking.domain isn't set.

Good question. I initially thought the hostname and optionally (i.e. if set) the domain. Thinking more about it I would set it to null if the domain is null (then the type matches networking.domain and it should be obvious right away if the domain isn't set (the hostname cannot be null)). Does that sound ok? Another alternative might be to append e.g. .localdomain if networking.domain is null (a bit similar to nss-myhostname) but I don't think that's a good idea (there isn't a standard/definition for a default suffix AFAIK and we currently do not add such a fallback FQDN to /etc/hosts).
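
The semantics proposed here can be written down as a tiny function. This Python sketch only models the idea from this comment (null when the domain is null, otherwise hostname and domain joined with a dot); it is not the NixOS option definition, and the function name `fqdn` is hypothetical.

```python
from typing import Optional


def fqdn(host_name: str, domain: Optional[str]) -> Optional[str]:
    """Hypothetical model of the proposed read-only networking.fqdn value."""
    if domain is None:
        return None  # mirrors networking.domain being unset; no fallback suffix
    return f"{host_name}.{domain}"


print(fqdn("nixos-unstable", "test.tld"))  # nixos-unstable.test.tld
print(fqdn("nixos-unstable", None))        # None
```

Returning `None` rather than a made-up suffix makes a missing domain visible wherever the value is consumed.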

@flokli
Contributor

flokli commented May 31, 2020

Yeah, I'd personally prefer null over ".local" / ".localdomain", as it is something the user notices during config evaluation vs. "suddenly a weird fqdn appearing in various places".

Looking further into nixpkgs and how networking.domain is used in some places, we might want to update some of the logic there (see the matomo and dokuwiki modules, for example).

@vcunat
Member

vcunat commented Jun 1, 2020

I'd most likely want to avoid .local (as a default, at least), as that's a top-level domain reserved for the mDNS protocol.

@zhenyavinogradov
Contributor

It looks like the regex no longer allows setting hostName to an empty string, while the description implies that this is still supported.

primeos added a commit to primeos/nixpkgs that referenced this pull request Jun 3, 2020
This fixes a regression from 993baa5 which requires
networking.hostName to be a valid DNS label [0].
Unfortunately we missed the fact that the hostnames may also be empty,
if the user wants to obtain it from a DHCP server. This is even required
by a few modules/images (e.g. Amazon EC2, Azure, and Google Compute).

[0]: NixOS#76542 (comment)
@primeos primeos mentioned this pull request Jun 3, 2020
@primeos
Member Author

primeos commented Jun 3, 2020

@zhenyavinogradov thanks, I missed that case :o I've opened #89407 to fix this.

Note to self: I should also try to add a test case for empty hostnames (I'll do that together with the documentation improvements and read-only fqdn option as this is probably a bit time consuming since we should use a DHCP server as well - help is obviously welcome if someone is interested).
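
The rule discussed here (a single DNS label, or empty so that the name can come from DHCP) can be sketched as follows. The regular expression is an assumption written for this example in the style of an RFC 1123 label; the expression used by the actual NixOS option may differ, and `is_valid_host_name` is a made-up name.

```python
import re

# Assumed pattern: letters, digits and hyphens, at most 63 characters,
# no leading or trailing hyphen, and no dots.
DNS_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_host_name(name: str) -> bool:
    """Accept the empty string (hostname set via DHCP) or a single DNS label."""
    return name == "" or DNS_LABEL.fullmatch(name) is not None


for candidate in ("", "nixos-unstable", "chaos.delroth.net", "-bad", "a" * 64):
    print(repr(candidate), is_valid_host_name(candidate))
```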

@delroth
Contributor

delroth commented Jun 5, 2020

One more issue I noticed: postfix seems to rely on gethostname() returning a fqdn, and now doesn't infer the correct myhostname/mydomain anymore:

$ hostname -f
chaos.delroth.net
$ postconf -d
...
mydomain = localdomain
myhostname = chaos.localdomain
...

This is documented in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=214741 where it seems like at some point Debian was carrying a custom patch. I can't find that patch in their current version though... wonder if it got lost or something.

https://git.launchpad.net/postfix/commit/?id=ebe0c0010f461ed5986a09ea686f191d601a4fd7 seems like the patch they were carrying in the past.
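
The effect seen in the postconf output can be reproduced with a simplified model of how the defaults are derived from gethostname() alone. This Python sketch is based on the output above and on the documented defaults for myhostname and mydomain; it is not Postfix's code, and `postfix_defaults` is a hypothetical name.

```python
from typing import Tuple


def postfix_defaults(kernel_hostname: str) -> Tuple[str, str]:
    """Return (myhostname, mydomain) as derived from gethostname() alone."""
    if "." in kernel_hostname:
        # gethostname() already returned an FQDN: strip the first component.
        return kernel_hostname, kernel_hostname.split(".", 1)[1]
    # Short name only: fall back to "localdomain".
    mydomain = "localdomain"
    return f"{kernel_hostname}.{mydomain}", mydomain


print(postfix_defaults("chaos"))              # ('chaos.localdomain', 'localdomain')
print(postfix_defaults("chaos.delroth.net"))  # ('chaos.delroth.net', 'delroth.net')
```

Setting myhostname explicitly in the Postfix configuration sidesteps this derivation entirely.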

@Izorkin
Contributor

Izorkin commented Jun 5, 2020

@delroth after this update, my mail works fine. I use the parameter myhostname - http://www.postfix.org/postconf.5.html#myhostname

https://git.launchpad.net/postfix/commit/?id=ebe0c0010f461ed5986a09ea686f191d601a4fd7 seems like the patch they were carrying in the past.

I checked with this patch, and postconf -d reports the correct values.
But my hostname parameter is different for mail and system.

postconf -d
mydomain = _my_domain_.pw
myhostname = mail._my_domain_.pw

hostname --fqdn
mail._my_domain_.pw

Is this fix needed?

@flokli
Contributor

flokli commented Jun 5, 2020

If it's a patch debian still applies, we can definitely fetchpatch https://git.launchpad.net/postfix/patch/?id=ebe0c0010f461ed5986a09ea686f191d601a4fd7 - feel free to file a PR doing this.

If debian doesn't apply this anymore, we should research why they don't anymore - maybe it already landed in upstream one way or another, and we only need to bump the package.

@vcunat
Member

vcunat commented Jun 5, 2020

It's been such a long time – I'm puzzled why it hasn't been addressed upstream.

@grahamc
Member

grahamc commented Jun 19, 2020

I wonder if we could first deprecate and then break? 993baa5 is a bit heavy-handed, and might be a major breaking change for an org.

@vcunat
Member

vcunat commented Jun 19, 2020

Deprecate... meaning to throw an eval-time warning that some stuff won't work correctly? (when hostname contains dots)

EDIT: I hope not something crazy like networking.dontValidateHostname = true;

@primeos
Member Author

primeos commented Jun 19, 2020

993baa5 is a bit heavy-handed, and might be a major breaking change for an org.

Not sure if it's really that much of a problem considering our other breaking changes and that it should be pretty easy to find and revert that commit if necessary. But then again I'm also a bit concerned as some applications might still make wrong assumptions on Linux (e.g. Bash, ZSH, and possibly Postfix were mentioned here). And technically there aren't any strict requirements for the hostname apart from the maximum length, which is probably the biggest problem.

If we deem it necessary/useful I wouldn't mind turning the evaluation error into a warning (actually I don't really have a preference here).

@flokli
Contributor

flokli commented Jun 19, 2020

I kinda agree with @primeos here. It's in the release notes, it produces a meaningful error during configuration, and most people set the fqdn in their postfix/apache/… configurations too anyways.

@primeos
Member Author

primeos commented Jul 27, 2020

FYI: The discussion if we should require networking.hostName to be a "valid DNS label" (no dots, etc.) is now being re-evaluated in #94011 (PR: #94022).

primeos added a commit to primeos/nixpkgs that referenced this pull request Oct 10, 2020
Since NixOS#76542 this workaround is required to use a FQDN as hostname. See
NixOS#94011 and NixOS#94022 for the related discussion. Due to some
potential/unresolved issues (legacy software, backward compatibility,
etc.) we're documenting this workaround [0].

[0]: NixOS#94011 (comment)
jonringer pushed a commit that referenced this pull request Oct 10, 2020
Since #76542 this workaround is required to use a FQDN as hostname. See
#94011 and #94022 for the related discussion. Due to some
potential/unresolved issues (legacy software, backward compatibility,
etc.) we're documenting this workaround [0].

[0]: #94011 (comment)
jonringer pushed a commit to jonringer/nixpkgs that referenced this pull request Oct 10, 2020
Since NixOS#76542 this workaround is required to use a FQDN as hostname. See
NixOS#94011 and NixOS#94022 for the related discussion. Due to some
potential/unresolved issues (legacy software, backward compatibility,
etc.) we're documenting this workaround [0].

[0]: NixOS#94011 (comment)

(cherry picked from commit 4a600af)

10 participants