Do not kill udev during boot #40230

Merged: 2 commits into NixOS:master on May 11, 2018

Conversation

@ngortheone (Contributor) commented May 9, 2018

Fixes #39867

The main culprit was udevadm control --exit || true during the initrd stage. I don't know what that was originally meant to fix, but it is definitely outdated, and in general killing udev is not a good idea.
What was happening: when the NVMe device did not appear immediately and udev had already been killed, no symlinks were created under /dev/disk/by-*, which made it impossible to mount the root volume.

NVMe devices may appear with some delay, so I also had to add a waitDevice call to the growPartition script, to ensure that the root volume is resized even when the device shows up late.
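For illustration, here is a minimal sketch of the wait-and-retry idea, assuming a hypothetical waitForDevice helper and an example NVMe device path; it is not the exact code added by this PR:

# Hypothetical sketch, not the real nixpkgs stage-1 helper.
waitForDevice() {
    device="$1"
    tries=20
    while [ ! -e "$device" ] && [ "$tries" -gt 0 ]; do
        echo "waiting for $device to appear..."
        sleep 1
        tries=$((tries - 1))
    done
    [ -e "$device" ]  # non-zero exit status if the device never appeared
}

# Example: only resize the (example) root partition once its device node exists.
waitForDevice /dev/nvme0n1p1 && growpart /dev/nvme0n1 1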

Tested on m5 and m4 instances.

[ $try -ne 0 ]
fi
}

Contributor

This file looks like you just moved waitDevice() around without making any real change. What's the point?

Contributor Author

Yes, I needed it to be declared before @postDeviceCommands@ so I can use the function in growPartition.
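A minimal illustration of that ordering constraint, with placeholder commands rather than the real template contents: the shell only knows about a function once its definition has been read, so the definition has to appear above the point where @postDeviceCommands@ is spliced in.

#!/bin/sh
# Toy example only; not the actual stage-1 template.
waitDevice() {
    while [ ! -e "$1" ]; do sleep 1; done
}

# @postDeviceCommands@ is substituted roughly here, so it may now call waitDevice:
waitDevice /dev/null   # /dev/null always exists, so this returns immediately
echo "device present, continuing boot steps"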

@ngortheone (Contributor Author)

@edolstra @dezgeg

@dezgeg (Contributor) commented May 9, 2018

Looks good.

FWIW, the original problem is probably related to udev holding the block device open while probing filesystems on it (as described at https://groups.google.com/forum/#!topic/scylladb-dev/u87yHgo3ylU).

@copumpkin (Member)

@dezgeg so is not killing udev likely to bite us in other subtle ways later? It definitely seemed to be hurting our NVMe discovery on EC2, but it's unclear to me how to make sure the other issue doesn't pop up again. We already ask it to settle at various points along the way.

@dezgeg (Contributor) commented May 9, 2018

Yeah, the settle calls that are already in place are probably enough now.

@xeji (Contributor) commented May 9, 2018

Does this need a backport to 18.03 as well?

@edolstra (Member)

Looks good to me.

I would hold off on backporting to 18.03 until we feel confident that the race with udev that prompted the addition of the udevadm control --exit is really gone.

@dezgeg (Contributor) commented May 10, 2018

I would think the udev/blkid race could be avoided by wrapping certain parts in udevadm control --stop-exec-queue and udevadm control --start-exec-queue, as was proposed in the scylladb thread (and in fact that would be the only way to fix the similar races that sometimes happen in NixOS installer tests).
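An illustrative sketch of that pattern, reusing the parted command from the failing test quoted below; this is not code from this PR:

# Pause udev rule execution while the partition table is rewritten.
udevadm control --stop-exec-queue
parted --script /dev/vda -- mkpart primary linux-swap 50M 1024M
# Resume processing and wait for the queued events (blkid probing etc.) to finish.
udevadm control --start-exec-queue
udevadm settle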

@dezgeg merged commit 08ebd83 into NixOS:master on May 11, 2018
@dezgeg (Contributor) commented May 12, 2018

For reference (to confirm that I am not talking crazy stuff :), these are the occasional failures I see in the installer tests which I believe to be related to the udev race: https://nix-cache.s3.amazonaws.com/log/rwfcwv4p18sbbnd7pna2r2m9aqz7s2m5-vm-test-run-installer-luksroot.drv

machine: must succeed: parted --script /dev/vda -- mkpart primary linux-swap 50M 1024M
machine# [   35.973121] systemd[1]: Started Networking Setup.
machine# [   35.978131] systemd[1]: Starting Extra networking commands....
machine# [   36.549789]  vda: vda1 vda2
machine# Error: Partition(s) 2 on /dev/vda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.


Successfully merging this pull request may close these issues: AWS KVM instances fail to boot because NVMe device doesn't appear (#39867)