Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/lib/make-ext4-fs: always allow execution of resize2fs #86366

Closed
wants to merge 1 commit into from

Conversation

Robertof
Copy link

This PR is the result of an extensive investigation I have done when having an unusual build failure when building an SD card image for AArch64 with QEMU. Specifically, I was getting this:

building '/nix/store/q4kcsy4f1jcxxa2kc6x02rjhg8z1911y-ext4-fs.img.zst.drv'...
[...]
copying store paths to image...
copying files to image...
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
NIXOS_SD: 93738/177408 files (0.1% non-contiguous), 581365/708701 blocks
Resizing to minimum allowed size
resize2fs 1.45.5 (07-Jan-2020)
Please run 'e2fsck -f temp.img' first.
builder for '/nix/store/q4kcsy4f1jcxxa2kc6x02rjhg8z1911y-ext4-fs.img.zst.drv' failed with exit code 1

The reason for this failure is due to the combination of three different causes:

  • fsck.ext4, executed before resize2fs, is executed with the -n flag which opens the image read-only. In turn, this leads the image to have a non-updated "last check time", a timestamp present in EXT filesystems which contains the date of the last successful fsck. By default, resize2fs checks whether last_check >= last_mount_time is true -- if it isn't, it doesn't run.
  • LKL, the Linux kernel fork which hosts cptofs which we use to copy files to the target image, has introduced a regression two years ago which broke the wall clock time, making it always start from Jan 1 1970 (UNIX epoch). Before that, the wall clock time was synchronized with the host system's time.
  • On slower systems, such as an emulated one, cptofs takes a while to run which leads it to set the mount time of the image to Jan 1 1970 00:00:02. This is enough to break resize2fs, since the filesystem is created with a timestamp of Jan 1 1970 00:00:01 and the check mentioned above fails, since the last mount time is greater than the time of last check.

Thus there are two approaches to fix this:

  • Remove -n from e2fsck in favor of -y or something else. This would fix the issue as it advances the last check time, but isn't really necessary as the builder fails if fsck fails.
  • Add -f to resize2fs which forces it to skip the last time check. This is safe, because fsck is invoked immediately before and after and it would fail the builder if something is wrong. Thus, the checks are redundant (and in this case would break the build).

I have chosen to proceed with the second option.

Motivation for this change

Without the -f flag, resize2fs checks that the last time where the
filesystem was checked is greater or equal than the last time is mounted.
This currently works due to a bug in cptofs
(lkl/linux#488) and it breaks on slower systems
when the last mount time is greater than 1970-01-01 00:00:01.

To fix this, either we allow fsck to update the "last checked" timestamp
on the filesystem, which requires to replace the -n option with -y,
or we force resize2fs to skip these checks. Since we're only dealing
with an image file which is checked immediately before resizing, it is
safe to bypass this check as fsck will fail if the filesystem is unclean.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/) tested the resulting image
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Without the `-f` flag, `resize2fs` checks that the last time where the
filesystem was checked is greater or equal than the last time is mounted.
This currently works due to a bug in `cptofs`
(lkl/linux#488) and it breaks on slower systems
when the last mount time is greater than 1970-01-01 00:00:01.

To fix this, either we allow `fsck` to update the "last checked" timestamp
on the filesystem, which requires to replace the `-n` option with `-y`,
or we force `resize2fs` to skip these checks. Since we're only dealing
with an image file which is checked immediately before resizing, it is
safe to bypass this check as `fsck` will fail if the filesystem is unclean.
@misuzu
Copy link
Contributor

misuzu commented Apr 30, 2020

Related: #82718

@Robertof
Copy link
Author

@misuzu thanks! I didn't see that PR, and I think it's definitely the right long-term solution. I had many issues with cptofs so getting rid of it is definitely going to be beneficial (also because it got to 8 GiB of RAM usage in my case, which is absurd).

I think resize2fs will be happy if executed immediately after mkfs.ext4 so this probably won't be required anymore after that is merged.

(if you got two notifications for this, apologies, my fault -- I forgot to switch to the personal account)

@Robertof
Copy link
Author

This isn't needed anymore since #82718 was merged and it solved the issue in another way (by not using cptofs anymore). In the event that the image generation logic is changed again I'll reopen this, but I think it makes sense to not keep this unnecessarily open if not needed!

Thanks again @misuzu for the great work with your original patch 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants