Do not kill udev during boot #40230
Conversation
    [ $try -ne 0 ]
fi
}
This file looks like you just moved `waitDevice()` around without making any real change. What's the point?
Yes, I needed it to be declared before `@postDeviceCommands@` so I can use the function in `growPartition`.
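For context, here is a minimal sketch of what a retry-loop helper like `waitDevice` might look like. Only the `[ $try -ne 0 ]` tail shown in the diff fragment above is from the actual code; the loop body, variable names, and retry budget are assumptions, not NixOS's exact stage-1 implementation.

```sh
# Hypothetical sketch of a waitDevice-style helper: poll for the device
# node with a bounded retry budget, then report success or failure.
waitDevice() {
    local device="$1"
    local try=20                        # retry budget (illustrative)
    while [ ! -e "$device" ] && [ $try -ne 0 ]; do
        sleep 1
        try=$((try - 1))
    done
    # Exit status is success only if the device appeared in time.
    [ $try -ne 0 ]
}
```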
Looks good. FWIW, the original problem is probably related to udev having the block device open for probing filesystems on it (as described at https://groups.google.com/forum/#!topic/scylladb-dev/u87yHgo3ylU).

@dezgeg so is not killing udev likely to bite us in other subtle ways later? It definitely seemed to be hurting our NVMe discovery on EC2, but it's unclear to me how to make sure the other issue doesn't pop up again. We already ask it to settle at various points along the way.
Yeah, probably the …
Does this need a backport to 18.03 as well?
Looks good to me. I would hold off on backporting to 18.03 until we feel confident that the race with udev that prompted the addition of the …
I would think the udev blkid race could be avoided by wrapping certain parts in …
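The comment above is truncated, but one plausible reading is wrapping the formatting step in `flock(1)` on the whole-disk device node, the block-device locking scheme that udev itself honors. A hedged sketch under that assumption; the device path and mkfs invocation are illustrative, not from this PR:

```sh
# Illustrative only: hold an exclusive lock on the whole-disk node while
# formatting a partition, so udev's probing (blkid) of the same device
# is serialized against it rather than racing with it.
flock /dev/nvme0n1 mkfs.ext4 /dev/nvme0n1p2
```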
For reference (to confirm that I am not talking crazy stuff :), these are the occasional failures I see in the installer tests which I believe to be related to the udev race: https://nix-cache.s3.amazonaws.com/log/rwfcwv4p18sbbnd7pna2r2m9aqz7s2m5-vm-test-run-installer-luksroot.drv
Fixes #39867
So the main culprit was `udevadm control --exit || true` during the initrd stage. I don't know what kind of fix that was, but it is definitely outdated, and in general killing udev is not a good idea.

Basically, what was happening: when an NVMe device did not appear immediately and udev was killed, no symlinks to the device were created in `/dev/disk/by-*`, which made it impossible to mount the root volume.

NVMe devices may appear with some delay, so I had to add `waitDevice` to the growpartition script to ensure that the root volume is resized even when the device is delayed.

Tested on m5 and m4 instances.