Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nscd: DynamicUser and other cleanups #64268

Merged
merged 7 commits into from Jul 22, 2019

Conversation

jameysharp
Copy link
Contributor

Motivation for this change

I was looking at which users and groups have a null value for uid or gid, and there was only one in a default-configuration boot.isContainer build of NixOS: nscd. As far as I can tell, that service can use systemd's DynamicUser directive instead.

While I was checking that, I found that some very old code in this module doesn't need to exist any more, if I'm not mistaken. So I've included separate commits for those dead-code removals.

Things done

I ran:

  • nix-build release.nix -A tests.login.x86_64-linux (passed)
  • nix-build release.nix -A tests.systemd.x86_64-linux (passed except the "file system with x-initrd.mount is not unmounted" test fails for me, but it does that without my patches too)

I also built the following expression against this branch and booted it with systemd-nspawn:

(import <nixpkgs/nixos> {
  configuration = { config, pkgs, ... }: {
    boot.isContainer = true;
    services.mingetty.autologinUser = "root";
  };
}).config.system.build.toplevel

Within that container, I verified that:

  • getent can talk to nscd,
  • nscd is running under a dynamic UID,
  • /run/nscd and its contents are owned by that UID,
  • /var/db/nscd doesn't exist but everything still works,
  • and the generated users-groups.json has no "uid" or "gid" that is null.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

This postStart step was introduced on 2014-04-24 with the comment that
"Nscd forks into the background before it's ready to accept
connections."

However, that was fixed upstream almost two months earlier, on
2014-03-03, with the comment that "This, along with setting the nscd
service type to forking in its systemd configuration file, allows
systemd to be certain that the nscd service is ready and is accepting
connections."

The fix was released several months later in glibc 2.20, which was
merged in NixOS sometime before 15.09, so it certainly should be safe to
remove this workaround by now.
Previously this module created both /var/db/nscd and /run/nscd using
shell commands in a preStart script. Note that both of these paths are
hard-coded in the nscd source. (Well, the latter is actually
/var/run/nscd but /var/run is a symlink to /run so it works out the
same.)

/var/db/nscd is only used if the nscd.conf "persistent" option is turned
on for one or more databases, which it is not in our default config
file. I'm not even sure persistent mode can work under systemd, since
`nscd --shutdown` is not synchronous so systemd will always
unceremoniously kill nscd without reliably giving it time to mark the
databases as unused. Nonetheless, if someone wants to use that option,
they can ensure the directory exists using systemd.tmpfiles.rules.

systemd can create /run/nscd for us with the RuntimeDirectory directive,
with the added benefit of causing systemd to delete the directory on
service stop or restart. The default value of RuntimeDirectoryMode is
755, the same as the mode which this module was using before.

I don't think the `rm -f /run/nscd/nscd.pid` was necessary after NixOS
switched to systemd and used its PIDFile directive, because systemd
deletes the specified file after the service stops, and because the file
can't persist across reboots since /run is a tmpfs. Even if the file
still exists when nscd starts, it's only a problem if the pid it
contains has been reused by another process, which is unlikely. Anyway,
this change makes that deletion even less necessary, because now systemd
deletes the entire /run/nscd directory when the service stops.
nscd doesn't create any files outside of /run/nscd unless the nscd.conf
"persistent" option is used, which we don't do by default. Therefore it
doesn't matter what UID/GID we run this service as, so long as it isn't
shared with any other running processes.

/run/nscd does need to be owned by the same UID that the service is
running as, but systemd takes care of that for us thanks to the
RuntimeDirectory directive.

If someone wants to turn on the "persistent" option, they need to
manually configure users.users.nscd and systemd.tmpfiles.rules so that
/var/db/nscd is owned by the same user that nscd runs as.

In an all-defaults boot.isContainer configuration of NixOS, this removes
the only user which did not have a pre-assigned UID.
These options were being set to the same value as the defaults that are
hardcoded in nscd. Delete them so it's clear which settings are actually
important for NixOS.

One exception is `threads 1`, which is different from the built-in
default of 4. However, both values are equivalent because nscd forces
the number of threads to be at least as many as the number of kinds of
databases it supports, which is 5.
@arianvp
Copy link
Member

arianvp commented Jul 4, 2019

On first glance, this looks fine to me.

@@ -67,6 +55,9 @@ in
serviceConfig =
{ ExecStart = "@${pkgs.glibc.bin}/sbin/nscd nscd";
Type = "forking";
User = "nscd";
DynamicUser = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that means that systemd will no longer try to use nscd to check the dynamic user? How do they implement that?

Copy link
Member

@arianvp arianvp Jul 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, DynamicUser should only work when nscd is on, as nscd dispatches to the correct nss_systemd.so . So this indeed sounds like something that is not gonna work, given that nscd is not running when nscd is starting up :P

However, note I think that nscd bypasses delegating to nscd when doing any nss related syscalls itself (for good reason; otherwise it would end up in an endless loop), so maybe all the nss related syscalls in the nscd binary hit the same codepath as requests from others programs that go through the nscd socket? Because in that case, this will work.

Non of this is documented so we could find out by reading the C source code... Or just disable DynamicUser for nscd so that we're on the safe side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glibc always has to be able to answer passwd queries and such whether nscd is running or not. How does it know? It tries opening the socket that nscd listens on, and if that fails, it does the lookup itself. (There's also a glibc-internal function which nscd itself calls to prevent even checking for the socket, as you noted.)

So this works fine: systemd does the lookup of a user and group named nscd before it has started the service, the socket doesn't work, so glibc falls back to having libnss_files.so handle the query in-process. If you want to configure a static user for nscd, you can do it in /etc/passwd and systemd will find it.

Also: any code in the systemd package doesn't need nscd to find its own nsswitch modules, because they're already in its library path. So without nscd, systemd would only get the wrong answers for these passwd/group lookups on systems configured with nsswitch modules that are neither the glibc builtins nor systemd's extras.

That means the real constraint here is that you can't override this DynamicUser setting by defining an nscd user in LDAP, Google OS Login, etc. Maybe that's worth documenting, but it's really a nonsense thing to even try doing...

This does make me want to try getting the username nscd on some large LDAP-based network somewhere and see how much I can break though. 😈

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Any service started under DynamicUser must have either libnss_systemd.so in its LD_LIBRARY_PATH or have access to the nscd socket in order for DynamicUser to work.

My worry was because nscd itself doesn't access the socket it wouldn't work. But obviously nscd has libnss_systemd.so in LD_LIBRARY_PATH so it will indeed work.
So indeed, it seems we can leave DynamicUser = true with no issues.

On a related note, I think you can drop the User = nscd option by the way. Because the service name is nscd.service and DynamicUser = true, systemd already automatically sets User = nscd

The user and group name to use may be configured via
User= and Group= (see above). If these options are not used and
dynamic user/group allocation is enabled for a unit, the name of
the dynamic user/group is implicitly derived from the unit name. If
the unit name without the type suffix qualifies as valid user name
it is used directly, otherwise a name incorporating a hash of it is
used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about removing User, thanks! I've retested with that change and indeed the selected username is still "nscd", as expected.

Thanks to @arianvp for pointing out that when DynamicUser is true,
systemd defaults the value of User to be the name of the unit, which in
this case is already "nscd".
@jameysharp
Copy link
Contributor Author

I just tried running nixos/tests/ldap.nix and my patches break that test somehow. ☹️ So let's not merge this yet.

Those tests didn't instantiate at all on master; I've submitted a fix for that in #64387. But after fixing that, those tests all pass on master but two of them fail on my branch. In particular, running this in the test driver:

print($client2->execute('id -u test-ldap-user'))

returns:

client2: running command: id -u test-ldap-user
client2# id: ‘test-ldap-user’: no such user
client2: exit status 1
(0.01 seconds)

So the client configuration where users.ldap.daemon.enable = false doesn't manage to look up a passwd entry that's in LDAP, but the client where the LDAP daemon is enabled succeeds.

The LD_LIBRARY_PATH for client2's nscd looks okay to me:

> print($client2->execute('xargs -n1 -0a /proc/$(pgrep nscd)/environ | grep LD_LIBRARY_PATH'))
...
LD_LIBRARY_PATH=/nix/store/hach8csknszq07jfv8bdgqhgwmj7bcz7-systemd-242/lib:/nix/store/glbcpzjy98dmafzaws3wyhysns85y1x1-nss_ldap-265/lib

So I'm... puzzled.

@jameysharp
Copy link
Contributor Author

Okay, I got something vaguely useful from:

print($client2->execute('/nix/store/.../bin/strace -f -p $(pgrep nscd) & sleep 1; getent passwd test-ldap-user; kill -INT %1'))

Specifically:

client2# [pid   753] openat(AT_FDCWD, "/etc/ldap.conf", O_RDONLY) = -1 EACCES (Permission denied)

So apparently when nscd is run as root and drops privileges due to server-user, its threads can retain access to root-only files like /etc/ldap.conf that it can't read if it's started as the low-privilege user to begin with?

I want to investigate this a bit further, but if nscd absolutely must be started as root, then I think I'll have to drop the DynamicUser patch for that reason.

NixOS usually needs nscd just to have a single place where
LD_LIBRARY_PATH can be set to include all NSS modules, but nscd is also
useful if some of the NSS modules need to read files which are only
accessible by root.

For example, nixos/modules/config/ldap.nix needs this when
  users.ldap.enable = true;
  users.ldap.daemon.enable = false;
and users.ldap.bind.passwordFile exists. In that case, the module
creates an /etc/ldap.conf which is only readable by root, but which the
NSS module needs to read in order to find out what LDAP server to
connect to and with what credentials.

If nscd is started as root and configured with the server-user option in
nscd.conf, then it gives each NSS module the opportunity to initialize
itself before dropping privileges. The initialization happens in the
glibc-internal __nss_disable_nscd function, which pre-loads all the
configured NSS modules for passwd, group, hosts, and services (but not
netgroup for some reason?) and, for each loaded module, calls an init
function if one is defined. After that finishes, nscd's main() calls
nscd_init() which ends by calling finish_drop_privileges().

There are provisions in systemd for using DynamicUser with a service
which needs to drop privileges itself, so this patch does that.
@jameysharp
Copy link
Contributor Author

Okay, I fixed it! Today I learned that systemd has an option for using DynamicUser but still starting the service as root and letting the service drop privileges when it's ready to. Now all the LDAP tests pass, as do the other tests I tried previously.

So I think this PR is ready to merge now.

jameysharp added a commit to jameysharp/nixpkgs that referenced this pull request Jul 10, 2019
When mutableUsers = false and all users and groups have assigned
UID/GIDs, we can precompute exactly what update-users-groups.pl would
output for /etc/passwd and /etc/group. In those cases we don't need to
run it at activation time; we can generate the right files at build time
and just symlink them into place.

Note that most NixOS configs can't currently trigger this optimization,
because normally everyone uses the nscd module, which defines a user
with a null UID. I've separately proposed fixes for that issue in pull
request NixOS#64268.

There's no point generating /etc/shadow this way because the generated
files are already world-readable in the Nix store, but that isn't a big
deal as long as suitably strong passwords, salts, and hashes are used.

That said, for users with 'passwordFile' or 'password' set, we can't
construct the password hash at build time. But we can limit the work at
activation time to building /etc/shadow, and only for users who need it.

Tested by instantiating a minimal configuration.nix with:

    boot.isContainer = true;
    services.nscd.enable = false;
    users.mutableUsers = false;
    users.users.root.initialHashedPassword = "";

and running the resulting container. It builds and boots fine, and I can
log in as root without giving a password, as expected.

I also tested with several permutations of users with 'password',
'passwordFile', etc set.
@arianvp
Copy link
Member

arianvp commented Jul 12, 2019

Okay, I fixed it! Today I learned that systemd has an option for using DynamicUser but still starting the service as root and letting the service drop privileges when it's ready to. Now all the LDAP tests pass, as do the other tests I tried previously.

Great! that looks very handy. Could you perhaps add a comment to that ExecStart line explaining what ! does very briefly? Just for the concerned reader?

Also, LGTM

@jameysharp
Copy link
Contributor Author

Good call @arianvp; I've added a commit with a little detail about why it uses the combination of DynamicUser and "!".

Now I just need somebody to merge it 😅

@jameysharp jameysharp mentioned this pull request Jul 15, 2019
10 tasks
Copy link
Member

@abbradar abbradar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice set of changes. I'll test this on my machine.

@GrahamcOfBorg test login systemd

@abbradar
Copy link
Member

Seems you can't order our trusty bot around from a review.

@GrahamcOfBorg test login systemd

@jameysharp
Copy link
Contributor Author

@abbradar, thanks for reviewing!

FWIW: tests.systemd failed on aarch64-linux for the bot in the same way that it does on my machine for x86_64-linux. For me, it fails with or without these patches. But it doesn't fail for the bot on x86_64-linux. I don't understand why the must fail: dumpe2fs /dev/vdb | grep -q "^Last mount time: *n/a" step instead unexpectedly succeeds sometimes, but I don't think it has anything to do with this PR, so I'm ignoring it.

@abbradar
Copy link
Member

Yeah, I tested it locally and it succeeded for me.

@abbradar
Copy link
Member

abbradar commented Jul 17, 2019

I tested it on several machines and it works nicely; @jameysharp given that you work on our LDAP support I suppose you tried random PAM modules with this too?

@jameysharp
Copy link
Contributor Author

I think you mean NSS, not PAM, right? As far as I know nscd doesn't interact with PAM at all.

I don't want to give any false impressions: I don't personally use LDAP. It just got in my way while trying to eliminate activation scripts, which I'm doing as part of something else entirely... I'm several layers down in yak shaving at this point.

So beyond running the existing NixOS LDAP tests, no, I have not tried other NSS (or PAM) modules with these patches.

That said, I expect that if the LDAP modules can work, any other NSS modules will too, because:

  • NSS modules can't be relying on creating files in /run/nscd because they're supposed to work when you aren't using nscd as well, so anything we do to that directory can't matter to them;
  • and the LDAP modules sometimes rely on reading config files as root, so if they can do that, other modules can access any resources they need at startup too.

PAM configurations can be a lot more complicated so I'd definitely be more concerned if these patches touched that subsystem.

@abbradar
Copy link
Member

abbradar commented Jul 17, 2019 via email

@abbradar abbradar merged commit a0ba42e into NixOS:master Jul 22, 2019
@jameysharp jameysharp deleted the nscd-dynamicuser branch July 28, 2019 20:07
jameysharp added a commit to jameysharp/nixpkgs that referenced this pull request Jul 28, 2019
Per review comments from @arcnmx and @infinisil on NixOS#64594, don't use
auto-detection to determine whether to prebuild /etc/passwd and
/etc/group.

Instead offer a new config option, by analogy with
users.enforceIdUniqueness: users.enforceStaticIds provides an assertion
that all users and groups have an ID statically assigned.

If this option is turned on and the assertions pass, _and_
users.mutableUsers is turned off, then /etc/passwd and /etc/group will
be generated at configuration build time instead of activation time.

This allows admins to declare their intent and get a build failure if
that intent is violated.

Adding a configuration option also provided a natural place to document
this feature (thanks again to @arcnmx for pointing that out) so this
commit adds some documentation accordingly.

I didn't want to rebase this PR but I've tested this commit both as-is
and after a merge from current git master, by instantiating a minimal
configuration.nix with several variations on:

    boot.isContainer = true;
    users.mutableUsers = false;
    users.enforceStaticIds = true;
    users.users.root.initialHashedPassword = "";

On this commit, the new assertions fire unless this option is also set:

    services.nscd.enable = false;

Since NixOS#64268 has been merged to master, after merging master onto this
branch, that option is no longer necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants