nixos/rasdaemon: init with 0.6.7 #85039
The format is:
I've never actually bothered decoding these, but a rough example would be:
Those labels are nonsensical, but they do provide output:
Both … I hope this helps a bit.
That's mostly a note to myself: the dependencies need some config too, and I should summarize them for when I get around to writing the NixOS test.
I wasn't sure if the content at … I'll probably link to the Dell file and any user-supplied entries in … @maxeaubrey, thanks for the …
The provided examples are not useful without models (or a mainboard file to overwrite the name, I guess? I haven't actually double-checked the code or verified how that works in practice), which is why I assume they do not get installed; they do not work as-is, in my experience at least.
And here's a somewhat more useful Supermicro example.
I've not seen Dell/HP servers return more useful info than that, though that's with the caveat of only trying a limited set of Dell and HP servers.
I consider the state of …
I marked this as stale due to inactivity.
What's missing for this to get merged?
For just the tests, I'm not sure; I'll see about that.
Changed almost everything a bit.
Switched rasdaemon to v0.6.6-18-gc329012.
The edac and aer tests look like they may work, except the machine features required for them are not present in the nixosTest VM.
The mce test seems to assume …
I'll try to run the tests on bare metal, though that may take me a while to get around to.
Added a release notes entry to the module init commit.
Rebased on master where …
Sandro's reviews are often about style and consistency and are not strictly blocking. Merging a new module usually requires a more functional review from someone familiar with the service or software in question.
FWIW: the software builds and runs. The test has the following output:
Rebased onto master due to a release notes merge conflict.
Fixed a typo in the ras-mc-ctl service (…).
Testing tonight (+8-10h from now) on a couple of Xeon systems and a 5th gen Ryzen. Thanks for putting this together!
# edac_core and amd64_edac should get loaded automatically
# i7core_edac may not be, and may not be required, but should load successfully
"edac_core"
"amd64_edac"
amd64_edac here makes the initrd build fail for me when I set hardware.rasdaemon.testing = true. AFAICT, CONFIG_EDAC_AMD64 isn't enabled on the standard NixOS kernels, so the initrd build fails when it tries to incorporate the module. I tried both a 4.19 and a 5.10 kernel; both failed in the same way.
Doing another build run now with a kernel patch to enable EDAC_AMD64, to see if that fixes it.
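For anyone trying to reproduce this, the setup being described can be sketched as a NixOS configuration fragment. This is a hedged sketch, not code from the PR: hardware.rasdaemon.testing is the option named in the comment above, while hardware.rasdaemon.enable and the pkgs.linuxPackages_5_10 kernel choice are assumptions about option and attribute names at the time.

```nix
# Hypothetical configuration.nix fragment for reproducing the initrd failure.
# Only hardware.rasdaemon.testing is taken from the discussion; the other
# names are assumptions.
{ pkgs, ... }:
{
  # Assumed enable switch for the rasdaemon module added in this PR.
  hardware.rasdaemon.enable = true;

  # Testing mode adds kernel modules such as edac_core and amd64_edac to the
  # initrd; this is the setting reported to break the initrd build.
  hardware.rasdaemon.testing = true;

  # One of the kernel series the failure was reported with (example choice).
  boot.kernelPackages = pkgs.linuxPackages_5_10;
}
```

Running nixos-rebuild build with such a fragment should be enough to hit the initrd step without switching to the new generation.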
Doesn't seem to have fixed it :( AFAICT amd64_edac just doesn't get built for me, and as a result building the initrd fails down the road.
Odd. Besides my generated hardware-configuration.nix having kvm-amd, I don't think there's anything AMD-specific in my config (nixos-unstable on Ryzen).
If Hydra doesn't complain while building, I think it's fine to assume my system is wonky somehow. This is only relevant when building for fault-injection testing, so, meh.
Hmm, at some point I had to change the module name from amd64_edac_mod to amd64_edac. I'm not sure when or where that changed, but maybe it's still the old name for you?
Checked Linux 4.19, 5.10 and 5.14:
In both cases, the errors are benign: rasdaemon is correctly reporting that some kernels are missing some features that it could monitor, if you enabled them. But it keeps logging everything else it can. So, LGTM on my machines. It can see a whole bunch of hardware stuff, maps my DIMM layout correctly, and generally works as well as the running kernel allows it to. I look forward to seeing this merged so I can enable it on my servers :)
and passthru test in the package
needed for mce-test
Resolved a merge conflict in the release notes.
I have been using this module for over half a year and it LGTM.
I think @SuperSandro2000 needs to approve, since they requested changes earlier?
We don't handle it like that. I don't always have the time to double-check whether my suggestions were applied. If someone else thinks the PR is in a good state and can be merged, he/she can go ahead and merge it.
Motivation for this change
closes #42592
based on #73149
Things done
packaged rasdaemon
wrote rasdaemon module
wrote rasdaemon module test
Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS Linux)
Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
Tested execution of all binary files (usually in ./result/bin/)
Determined the impact on package closure size (by running nix path-info -S before and after):
/nix/store/05rr2kjhn6dim1cfb1lbhxyjy5wj46pm-rasdaemon-0.6.6-21-gb4764d4 99608016
/nix/store/kshci3w68i6w8wcs3s0zxm366id7c2i9-rasdaemon-0.6.6-21-gb4764d4-dev 99675864
/nix/store/ni6m29zk553d32icvb090k512dg15yn1-rasdaemon-0.6.6-21-gb4764d4-man 4280
/nix/store/ngya58z9dmkzwba4vw0jxgm039565yl9-rasdaemon-0.6.6-21-gb4764d4-inject 34278856
/nix/store/p9xlqzy0sk1szk3rakxc8vlj8q16inqi-aer-inject-9bd5e2c 32976656
/nix/store/kpkq3h16m9mblcs62ncwyq7ansmrl4xz-mce-inject-4cbe463 32983512
/nix/store/hg0pjs961ha70s2pjhj9ssxxx21r4giz-vm-tools-5.10.30 33091824
Status
help wanted
rasdaemon package and its module work
rasdaemon.labels = " "; rasdaemon.labels.vendor = " "; as well
edac, aer, mce error injection tools packaged
mce-test packaging (translating this to something that works with Nix looks tricky)
vm-tools and mce-inject packaged