nixos/systemd|filesystems: mount and evacuate /sys/fs/pstore using systemd-pstore #85073
For some reason the dmsg's cleaned up by this are not showing up in the journal but only in |
I marked this as stale due to inactivity.
I haven't really figured out what's going on with things not showing up in the journal (and I don't really know enough about the systemd journal to easily do so), but I'm certain enough that it's not an issue caused by this PR that I feel it should be merged. |
Sorry about the review requests, I had accidentally changed the identity of a commit that's part of master when splitting up the PR into commits. EDIT: Did it again. At least now I know why.
Okay, I'm pretty sure this is the one. I will be staying far away from git and systemd for at least the next month -_- |
According to the release notes of systemd 198:
this should already be mounted automatically. I wonder what's doing this on other distros? Or does it only happen with systemd-in-initrd? |
Thanks for the clarification!
nixos/modules/tasks/filesystems.nix
Outdated
in listToAttrs (map formatDevice (filter (fs: fs.autoFormat) fileSystems));
in listToAttrs (map formatDevice (filter (fs: fs.autoFormat) fileSystems)) // {
# Mount /sys/fs/pstore for evacuating panic logs and crashdumps from persistent storage onto the disk using systemd-pstore.
# This cannot be done with the other special filesystems because the pstore module is not loaded at that point.
Can you add another comment line explaining this is usually set up by systemd in initrd, so we know we can remove it?
Frankly I don't understand enough about how and why special filesystems are done the way they are in NixOS so I'm not sure what kind of explanatory comment to add that wouldn't be possibly misleading. I think the current comment indicates well enough that it is like the other special filesystems, and save for the mentioned exception it should be handled the same way.
After bisecting my system configuration it does look like this PR broke something. The error messages I get from switching are:
|
You have a service mounting the persistent store already? Alternatively you can disable whatever is mounting the persistent store (if feasible, curious what that is) and leave it to this service. |
Let me see if I can figure out what is mounting pstore. I wasn't even aware of pstore until this morning! |
Interesting, it doesn't look like anything depends on it, at least from the systemd perspective:
|
The only other thing that could be going wrong is that the |
That really shouldn't be. If nothing depends on it, how did it get started?? |
@hyperfekt Is there a way to use |
Ok, after setting
and deploying, and then setting it back to A couple of things I noticed:
|
I also experience the same bug where mount-pstore fails to start. I have not attempted to fix it yet. |
I encountered the same problem as well on my machines. If I'm understanding correctly, it appears that systemd mounts pstore automatically. |
Seems like systemd added support for |
It affected me on unstable and master. On
Tried workaround: Shouldn't a new issue be opened to track this? This PR is merged and we have an issue. |
In case decision is made to revert this, I've created this #123889 |
I'm currently investigating this. |
As your PR targets master, you should be testing it/your system on master. |
I agree, I probably should have tested the activation itself in a VM instead of relying on the ISO (where no switch occurs) to show any problem. |
I can't reproduce this in the VM (pstore is not present there). And I fixed it just by |
It's been common in the past for some units to fail upon activation after a nixos-rebuild switch, which is why I wasn't concerned about avoiding every instance of it. Of course it shouldn't be happening to every user of NixOS, which it appears it does.
For me, the failure also occurs on boot, so it's not just happening on a switch for "people with existing |
Wouldn't it be better to revert it, so you can calmly investigate?
I don't see any reason to revert it if the issue is solved by the new PR. In general no breakage is involved, just log noise. And if there is a solution necessary as a result of the investigation, it would be additive, not alternative. |
On the workaround that |
If you check #123902 you will see that's not what I did. |
@zhaofengli If you find out what is responsible for having /sys/fs/pstore mounted the first time, I'd be glad. I am still curious about any instance where this is the case and it's from a package instead of an individual modification. |
Finally have some time to poke at this. It appears that The machine I initially encountered the on-boot failure is an aarch64 host running a custom kernel with
On the regular kernel, pstore is built as a module:
On a machine with
Audit configurations
{
  boot.kernelParams = [ "audit=1" "audit_backlog_limit=500" ];
  security.audit = {
    enable = true;
    rules = [ "-a exit,always -S mount" ];
  };
  security.auditd.enable = true;
  # Add rule before systemd starts
  boot.initrd.postDeviceCommands = ''
    ${pkgs.audit}/bin/auditctl -a exit,always -S mount
  '';
}
Looking at the audit logs I couldn't find a call that specifically mounted However, this doesn't explain why the initial activation will cause the failure. Remounting
Motivation for this change
Currently, size-limited pstores eventually fill up, after which logs of any further panics stop being saved, making those panics harder to diagnose. This enables the systemd-pstore service, which frees the pstore automatically and adds its contents to the journal.
Design
We can't mount /sys/fs/pstore with the other special file systems, because at that time the kernel hasn't loaded the pstore module yet.
systemd cannot mount virtual file systems that expose APIs, so we can't use a mount unit.
Instead we just create a very simple systemd service that does the job.
On systems without pstore I expect the modprobe service to fail, which doesn't look nice in the journal, but I'm not aware of a good alternative.
Left for another time is making usage of the EFI variable pstore optional, which it currently is not even without this PR (this one just adds the vacuuming instead of only dumping into it).
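For readers who want a concrete picture of the design, here is a minimal sketch of what such a mount service could look like as a NixOS module. It is not the code from this PR's diff. The unit name `mount-pstore` is taken from the failure reports in the conversation; the mount options, the dependency on `modprobe@pstore.service`, and the `ConditionPathIsMountPoint` guard are assumptions made for illustration.

```nix
{ pkgs, ... }:
{
  # Hypothetical sketch: a oneshot service that mounts /sys/fs/pstore
  # before systemd-pstore runs.
  systemd.services.mount-pstore = {
    description = "Mount /sys/fs/pstore (illustrative sketch)";

    # Assumption: the pstore module is loaded through systemd's
    # modprobe@.service template before the mount is attempted.
    wants = [ "modprobe@pstore.service" ];
    after = [ "modprobe@pstore.service" ];

    # Run before systemd-pstore so it finds the filesystem mounted.
    before = [ "systemd-pstore.service" ];
    wantedBy = [ "systemd-pstore.service" ];

    # Assumption: skip the unit if something else (for example systemd
    # itself) has already mounted pstore, instead of failing.
    unitConfig.ConditionPathIsMountPoint = "!/sys/fs/pstore";

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      # Mount options are an assumption, mirroring typical API filesystems.
      ExecStart = "${pkgs.util-linux}/bin/mount -t pstore -o nosuid,noexec,nodev pstore /sys/fs/pstore";
    };
  };
}
```

The guard on the mount point is one possible way to avoid the "already mounted" failures reported in the conversation; whether it is the right fix depends on what mounts pstore first, which the thread leaves open.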