nixos/ceph: run unprivileged, use state directories, handle non-initialized clusters without config switch #72603
Conversation
I don't see anything wrong with these changes. I've been meaning to refactor the module along these lines, as the module in its initial shape was "just" a mapping from the official systemd .service files.
This takes the module a step closer to being better integrated into both systemd and NixOS.
++ (map (daemonName: "d /var/lib/ceph/mgr/${cfg.global.clusterName}-${daemonName} - ceph ceph - -") cfg.mgr.daemons)
++ (map (daemonName: "d /var/lib/ceph/osd/${cfg.global.clusterName}-${daemonName} - ceph ceph - -") cfg.osd.daemons);
You're missing a map for the mon daemons, they follow the same pattern as the other daemons.
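A minimal sketch of the suggested addition; the option name `cfg.mon.daemons` is an assumption based on the mgr/osd options visible above:

```nix
# Sketch only: assumes a cfg.mon.daemons option shaped like
# cfg.mgr.daemons / cfg.osd.daemons, appended to the same list.
++ (map (daemonName: "d /var/lib/ceph/mon/${cfg.global.clusterName}-${daemonName} - ceph ceph - -") cfg.mon.daemons)
```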
These are somewhat redundant with the StateDirectory directives in the individual units, which should cause systemd to create the directories when starting the units. However, as some work needs to be done to initialize the cluster before these units can be started, relying on StateDirectory alone isn't enough: the initialization tools would fail otherwise. I didn't want to duplicate everything from StateDirectory - happy for better suggestions w.r.t. cluster initialization.
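For reference, a sketch of what such a StateDirectory directive could look like inside a unit definition. The argument names (`daemonType`, `clusterName`, `daemonId`) are assumptions mirroring the makeService function described later in this PR, not a verbatim excerpt:

```nix
# Sketch: StateDirectory is interpreted relative to /var/lib, so this
# yields /var/lib/ceph/${daemonType}/${clusterName}-${daemonId},
# created and owned correctly by systemd when the unit starts.
serviceConfig = {
  User = "ceph";
  Group = "ceph";
  StateDirectory = "ceph/${daemonType}/${clusterName}-${daemonId}";
};
```

The tmpfiles rules above duplicate this only because the cluster-initialization tooling needs the directories to exist before any unit has ever been started.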
Well, the mon daemons are the most important daemons in the cluster. They are the first that need to be initialized in a cluster, so I think that we sadly need to duplicate everything from StateDirectory in tmpfiles too.
@lejonet you're fine with merging this in?
Does this mean any user-facing changes if, say, the directories are nuked and the daemon restarted? Particularly I'd like to know whether recreation is now tied to activation rather than unit restart.
The initialization process currently is a very manual thing by itself, and apparently only "documented" by the nixos vm tests: you first switch to a generation with available, but disabled ceph services, do the initialization, start the services, then switch to a generation autostarting these services.

We might want to change this in the future, to provide a better out-of-the-box experience. One option would be to have some generators generate individual ceph-osd services that do the initialization and startup, depending on the actual disks being attached.

@srhb I'm fine with also removing the tmpfile rules and adding back the mkdir -p commands in the ceph tests doing the initialization, if you prefer that.
First and foremost, the initialisation process is not documented by us because the tests use the manual deploy from here.

Secondly, it's mainly out of convenience that we disable autostart from the beginning, so that the test log isn't spammed with failed startups of daemons before we've even started the whole initialization process (honestly though, the comment could be better at explaining that, to be completely fair). It's not a requirement to do so; it just makes the log easier to read for the one running the test.

Thirdly, the way you initialize stuff in a ceph cluster highly depends on how the sysadmins want to do it; there are at least 3 different ways to do it (with the manual one being the easiest to put in a test, because it doesn't depend on a specific ceph tool or other automation tool). The ceph module was written to accommodate the running of a cluster, leaving all the initialization up to the sysadmins to do before enabling the daemons.

Sadly, the initialization is quite stateful, so it's most likely not possible for us to automate the initialization process ourselves, especially as cephx (auth) is the default nowadays and it's not recommended to run a cluster without it (which is what would be required for us to be able to automate the initialization, so to speak).
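To illustrate the manual route referred to above, here is a sketch of the first monitor bootstrap steps as they might appear in a VM test script. The machine name, monitor id, fsid and IP address are placeholders, and the command flags follow the Ceph manual deployment guide rather than being taken from this PR:

```nix
# Sketch only - $mon, monitor id "a", fsid and IP are illustrative placeholders.
testScript = ''
  # Create a keyring for the first monitor and a minimal monmap...
  $mon->succeed(
    "ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'",
    "monmaptool --create --add a 192.168.1.1 --fsid 00000000-0000-0000-0000-000000000000 /tmp/monmap",
    # ...then let ceph-mon populate its state directory from them.
    "ceph-mon --mkfs -i a --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring",
  );
'';
```

This is exactly the kind of stateful, order-dependent sequence that makes fully automating initialization from the module impractical.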
We seem to be relying on those being present during runtime anyways.
…pass extraServiceConfig

Don't pass user and group to ceph and rely on it to drop privileges, but let systemd handle running it as the appropriate user. This also inlines the extraServiceConfig into the makeService function, as we have conditionals depending on daemonType there anyway.

Use StateDirectory to create directories in /var/lib/ceph/${daemonType}/${clusterName}-${daemonId}. There previously was a condition on daemonType being one of mds, mon, rgw or mgr. We only instantiate makeServices with these types, and "osd" was special. In the osd case, test examples suggest it'd be in something like /var/lib/ceph/osd/ceph-${cfg.osd0.name} - so it's not special at all, but follows exactly the same pattern as the others.

During initialization, we also need these folders before the unit is started up. Move the mkdir -p commands in the vm tests to the line immediately before they're required.
This prevents services from being started before they're initialized, and renders the `systemd.targets.ceph.wantedBy = lib.mkForce [];` hack in the vm tests obsolete - the config now starts up ceph after a reboot, too. Let's take advantage of that, crash all VMs, and boot them up again.
I agree automating parts of the initialization is tricky - reading up on the manual installation guide, they document that the state directories should be created during initialization, so I'm fine with having them in our initialization as well, instead of in a tmpfiles rule. I tinkered a bit with
It's much cleaner to ensure that a daemon doesn't start when not initialized by actually coupling the unit with its state directory than by simply letting the daemon crash.
Also, as @flokli stated in a discussion on IRC, systemctl status will now explicitly tell the user why it didn't start, instead of hoping that the user can deduce that from the errors.
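One way to express such a coupling is a start condition on a file that only initialization creates; this is a sketch of the general systemd mechanism, not necessarily the exact condition this PR uses, and the keyring path is an assumption for illustration:

```nix
# Sketch: the unit stays inactive (with a clear "condition failed"
# note in systemctl status) until the state directory has been
# populated by the manual initialization. Keyring path is assumed.
systemd.services."ceph-mon-${daemonId}".unitConfig = {
  ConditionPathExists = "/var/lib/ceph/mon/${clusterName}-${daemonId}/keyring";
};
```

Compared with letting the daemon crash-loop, a failed condition shows up as a single explanatory line in `systemctl status` rather than a wall of restart errors in the journal.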
@GrahamcOfBorg test ceph-single-node ceph-multi-node
@GrahamcOfBorg test ceph-single-node ceph-multi-node
…e bootstrapping method
I added some more cleanup to the ceph derivation itself in #73187.
This looks good to me! I am a little concerned about the tmpfiles setup, because I've previously been bitten by the impedance mismatch between initialization in systemd units and initialization during activation. That said, this isn't the first module to start doing it this way, and I'm not sure I actually have a coherent criticism of it yet, so let's just see how it works out. I might be able to find time to deploy this to our test cluster soonish, but probably not in time to hold up the merge.
Motivation for this change

Things done

- [ ] Tested using sandboxing (nix.conf sandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
- [ ] Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
- [ ] Tested execution of all binary files (usually in ./result/bin/)
- [ ] Determined the impact on package closure size (by running nix path-info -S before and after)

Notify maintainers

cc @