nixos/tests/systemd: Fix x-initrd-mount flakiness #67798

Merged: 1 commit, Aug 31, 2019


aszlig (Member) commented Aug 30, 2019

It turns out that checking for the last mount time of an ext4 file system isn't a very reliable way to check whether the file system was properly unmounted.

When I first created that test (88530e0), I was reluctant to inspect the file system while the VM was down. Instead, I looked for a way to check for a clean unmount *after* the file system had been mounted again, so that we wouldn't need to create a 512 MB raw image on the host.

Fortunately, however, qemu-img actually writes a sparse file when converting from qcow2, so on most host file systems (that is, those supporting sparse files) this shouldn't waste much disk space.
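To illustrate the disk-space point (this is not part of the test itself), the following Python sketch creates a 512 MiB file consisting of a hole rather than written zeroes and compares its apparent size with the space actually allocated, the same distinction `ls -l` and `du` draw. It does not call qemu-img. It assumes a host file system that supports sparse files and Linux's convention that `st_blocks` counts 512-byte units. The function names are hypothetical.

```python
from __future__ import annotations

import os
import tempfile


def disk_usage(path: str) -> tuple[int, int]:
    """Return (apparent size, bytes actually allocated) for a file.

    On Linux, st_blocks is counted in 512-byte units regardless of the
    file system's block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path: str, size: int) -> None:
    # Extending an empty file with truncate() leaves a hole instead of
    # writing zeroes, so no data blocks need to be allocated for it.
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "image.raw")
        make_sparse_file(image, 512 * 1024 * 1024)  # 512 MiB apparent size
        apparent, allocated = disk_usage(image)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a file system without sparse-file support, the allocated size would instead be close to the full 512 MiB.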

While investigating the flakiness, I found that whenever the test failed, /test-x-initrd-mount had been unmounted *before* the final step in which systemd remounts and unmounts all remaining file systems.

I haven't investigated why this happens, but the test is a regression test for #35268, where the file system wasn't unmounted *at all*. So all we really need to check here is *whether* the unmount happened, not *how*.

To make sure that checking the file system state is sufficient for this, I temporarily replaced the `$machine->shutdown` call with `$machine->crash` and verified that the file system state is then "not clean".
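For illustration, here is a minimal Python sketch of what checking the state of a raw image amounts to once the VM is down. It is not the test's actual code; "not clean" is the wording that e2fsprogs tools such as `dumpe2fs` print, and the test presumably relies on such tooling. The sketch reads the primary ext2/3/4 superblock directly, assuming the standard on-disk layout (superblock at byte 1024, `s_state` at offset 0x3A, `s_feature_incompat` at 0x60). Because a journaled file system may keep the "valid" bit set while mounted and instead flag its journal as needing recovery, the sketch also treats that `needs_recovery` flag as not clean. All names here are hypothetical.

```python
import struct

# Offsets follow the ext2/3/4 on-disk superblock layout (little-endian).
SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1 KiB into the image
EXT_MAGIC = 0xEF53
MAGIC_OFFSET = 0x38        # s_magic (le16)
STATE_OFFSET = 0x3A        # s_state (le16)
INCOMPAT_OFFSET = 0x60     # s_feature_incompat (le32)

STATE_VALID_FS = 0x0001    # set when the file system was cleanly unmounted
STATE_ERROR_FS = 0x0002    # errors were detected
INCOMPAT_RECOVER = 0x0004  # journal needs recovery ("needs_recovery")


def fs_state(image_path: str) -> str:
    """Return "clean" or "not clean" (plus " with errors") for a raw image."""
    with open(image_path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(1024)
    if len(sb) < 1024:
        raise ValueError("image too small to hold an ext superblock")
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError(f"not an ext2/3/4 file system (magic {magic:#06x})")
    (state,) = struct.unpack_from("<H", sb, STATE_OFFSET)
    (incompat,) = struct.unpack_from("<I", sb, INCOMPAT_OFFSET)
    # Clean only if the valid bit is set AND the journal needs no replay.
    clean = bool(state & STATE_VALID_FS) and not (incompat & INCOMPAT_RECOVER)
    result = "clean" if clean else "not clean"
    if state & STATE_ERROR_FS:
        result += " with errors"
    return result


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(f"{path}: {fs_state(path)}")
```

Usage, with hypothetical file names: convert the disk after shutdown with `qemu-img convert -O raw disk.qcow2 disk.raw`, then run `python3 fsstate.py disk.raw`. After a crash instead of a clean shutdown, the expected result is "not clean".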

Fixes: #67555

Signed-off-by: aszlig <aszlig@nix.build>
aszlig requested a review from flokli on August 30, 2019 22:29
aszlig (Member, Author) commented Aug 30, 2019

@GrahamcOfBorg test systemd

aszlig (Member, Author) commented Aug 30, 2019

The test failure on aarch64-linux is unrelated to this change; the failing subtest was introduced in 8e923df (#66482).

disassembler merged commit d7c7fc4 into NixOS:master on Aug 31, 2019

disassembler (Member) commented:

As the failure is unrelated and already in master, merging.

dtzWill pushed a commit to dtzWill/nixpkgs that referenced this pull request Sep 11, 2019
(cherry picked from commit d7c7fc4)
Successfully merging this pull request may close these issues: nixos/tests/systemd.nix is broken