New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stage-1: modprobe ext{2,3,4} before resizing (so resizing takes less than 45 minutes) #102171
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I noticed booting a system with an ext4 root which expanded to 5T took quite a long time (12 minutes in some cases, 43(!) in others.) I changed stage-1 to run `resize2fs -d 62` for extra debug output and timing information. It revealed the adjust_superblock step taking almost all of the time: [Fri Oct 30 11:10:15 UTC 2020] zero_high_bits_in_metadata: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 [Fri Oct 30 11:21:09 UTC 2020] adjust_superblock: Memory used: 396k/4556k (295k/102k), time: 654.21/ 0.59/ 5.13 but when I ran resize2fs on a disk with the identical content growing to the identical target size, it would only take about 30 seconds. I looked at what happened between those two steps in the fast case with strace and found: ``` 235 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=1795}, ru_stime={tv_sec=0, tv_usec=3590}, ...}) = 0 236 write(1, "zero_high_bits_in_metadata: Memo"..., 84zero_high_bits_in_metadata: Memory used: 132k/0k (72k/61k), time: 0.00/ 0.00/ 0.00 237 ) = 84 238 gettimeofday({tv_sec=1604061278, tv_usec=480147}, NULL) = 0 239 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=1802}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 240 gettimeofday({tv_sec=1604061278, tv_usec=480192}, NULL) = 0 241 mmap(NULL, 2564096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c7355000 242 access("/sys/fs/ext4/features/lazy_itable_init", F_OK) = 0 243 brk(0xf85000) = 0xf85000 244 brk(0xfa6000) = 0xfa6000 245 gettimeofday({tv_sec=1604061278, tv_usec=538828}, NULL) = 0 246 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=58720}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 247 write(1, "adjust_superblock: Memory used: "..., 79adjust_superblock: Memory used: 396k/2504k (305k/92k), time: 0.06/ 0.06/ 0.00 248 ) = 79 249 gettimeofday({tv_sec=1604061278, tv_usec=539119}, NULL) = 0 250 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=58812}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 251 gettimeofday({tv_sec=1604061279, tv_usec=939}, NULL) = 0 252 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=520411}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 253 write(1, "fix_uninit_block_bitmaps 2: Memo"..., 88fix_uninit_block_bitmaps 2: Memory used: 396k/2504k (305k/92k), time: 0.46/ 0.46/ 0.00 254 ) = 88 ``` In particular the access to /sys/fs seemed interesting. Looking at the source of resize2fs: ``` [root@ip-172-31-22-182:~/e2fsprogs-1.45.5]# rg -B2 -A1 /sys/fs/ext4/features/lazy_itable_init . ./resize/resize2fs.c 923- if (getenv("RESIZE2FS_FORCE_LAZY_ITABLE_INIT") || 924- (!getenv("RESIZE2FS_FORCE_ITABLE_INIT") && 925: access("/sys/fs/ext4/features/lazy_itable_init", F_OK) == 0)) 926- lazy_itable_init = 1; ``` I confirmed /sys is mounted, and then found a bug suggesting the ext4 module is maybe not loaded: https://bugzilla.redhat.com/show_bug.cgi?id=1071909 My home server doesn't have ext4 loaded and had 3T to play with, so I tried (and succeeded with) replicating the issue locally: ``` [root@kif:/scratch]# lsmod | grep -i ext [root@kif:/scratch]# zfs create -V 3G rpool/scratch/ext4 [root@kif:/scratch]# time mkfs.ext4 /dev/zvol/rpool/scratch/ext4 mke2fs 1.45.5 (07-Jan-2020) Discarding device blocks: done Creating filesystem with 786432 4k blocks and 196608 inodes Filesystem UUID: 560a4a8f-93dc-40cc-97a5-f10049bf801f Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912 Allocating group tables: done Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: done real 0m2.261s user 0m0.000s sys 0m0.025s [root@kif:/scratch]# zfs set volsize=3T rpool/scratch/ext4 [root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4 resize2fs 1.45.5 (07-Jan-2020) fs has 11 inodes, 1 groups required. fs requires 16390 data blocks. With 1 group(s), we have 22234 blocks available. Last group's overhead is 10534 Need 16390 data blocks in last group Final size of last group is 26924 Estimated blocks needed: 26924 Extents safety margin: 49 Resizing the filesystem on /dev/zvol/rpool/scratch/ext4 to 805306368 (4k) blocks. read_bitmaps: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 read_bitmaps: I/O read: 1MB, write: 0MB, rate: 3802.28MB/s fix_uninit_block_bitmaps 1: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 resize_group_descriptors: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 move_bg_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 zero_high_bits_in_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 ``` here it got stuck for quite some time ... straceing this 20 minutes in revealed this in a tight loop: ``` getuid() = 0 geteuid() = 0 getgid() = 0 getegid() = 0 prctl(PR_GET_DUMPABLE) = 1 (SUID_DUMP_USER) fallocate(3, FALLOC_FL_ZERO_RANGE, 2222649901056, 2097152) = 0 fsync(3) = 0 ``` it finally ended 43(!) minutes later: ``` adjust_superblock: Memory used: 264k/3592k (210k/55k), time: 2554.03/ 0.16/15.07 fix_uninit_block_bitmaps 2: Memory used: 264k/3592k (210k/55k), time: 0.16/ 0.16/ 0.00 blocks_to_move: Memory used: 264k/3592k (211k/54k), time: 0.00/ 0.00/ 0.00 Number of free blocks: 755396/780023556, Needed: 0 block_mover: Memory used: 264k/3592k (216k/49k), time: 0.05/ 0.01/ 0.00 block_mover: I/O read: 1MB, write: 0MB, rate: 18.68MB/s inode_scan_and_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 inode_ref_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 move_itables: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 calculate_summary_stats: Memory used: 264k/3592k (216k/49k), time: 16.35/16.35/ 0.00 fix_resize_inode: Memory used: 264k/3592k (222k/43k), time: 0.04/ 0.00/ 0.00 fix_resize_inode: I/O read: 1MB, write: 0MB, rate: 22.80MB/s fix_sb_journal_backup: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00 overall resize2fs: Memory used: 264k/3592k (222k/43k), time: 2570.90/16.68/15.07 overall resize2fs: I/O read: 1MB, write: 1MB, rate: 0.00MB/s The filesystem on /dev/zvol/rpool/scratch/ext4 is now 805306368 (4k) blocks long. real 43m1.943s user 0m16.761s sys 0m15.069s ``` I then cleaned up and recreated the zvol, loaded the ext4 module, created the ext4 fs, resized the volume, and resize2fs'd and it went quite quickly: ``` [root@kif:/scratch]# zfs destroy rpool/scratch/ext4 [root@kif:/scratch]# zfs create -V 3G rpool/scratch/ext4 [root@kif:/scratch]# modprobe ext4 [root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4 [root@kif:/scratch]# time mkfs.ext4 /dev/zvol/rpool/scratch/ext4 mke2fs 1.45.5 (07-Jan-2020) Discarding device blocks: done Creating filesystem with 786432 4k blocks and 196608 inodes Filesystem UUID: 5b415f2f-a8c4-4ba0-ac1d-78860de77610 Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912 Allocating group tables: done Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: done real 0m1.013s user 0m0.001s sys 0m0.023s [root@kif:/scratch]# zfs set volsize=3T rpool/scratch/ext4 [root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4 resize2fs 1.45.5 (07-Jan-2020) fs has 11 inodes, 1 groups required. fs requires 16390 data blocks. With 1 group(s), we have 22234 blocks available. Last group's overhead is 10534 Need 16390 data blocks in last group Final size of last group is 26924 Estimated blocks needed: 26924 Extents safety margin: 49 Resizing the filesystem on /dev/zvol/rpool/scratch/ext4 to 805306368 (4k) blocks. read_bitmaps: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 read_bitmaps: I/O read: 1MB, write: 0MB, rate: 3389.83MB/s fix_uninit_block_bitmaps 1: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 resize_group_descriptors: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 move_bg_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 zero_high_bits_in_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 adjust_superblock: Memory used: 264k/1540k (210k/55k), time: 0.02/ 0.02/ 0.00 fix_uninit_block_bitmaps 2: Memory used: 264k/1540k (210k/55k), time: 0.15/ 0.15/ 0.00 blocks_to_move: Memory used: 264k/1540k (211k/54k), time: 0.00/ 0.00/ 0.00 Number of free blocks: 755396/780023556, Needed: 0 block_mover: Memory used: 264k/3592k (216k/49k), time: 0.01/ 0.01/ 0.00 block_mover: I/O read: 1MB, write: 0MB, rate: 157.11MB/s inode_scan_and_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 inode_ref_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 move_itables: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 calculate_summary_stats: Memory used: 264k/3592k (216k/49k), time: 16.20/16.20/ 0.00 fix_resize_inode: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00 fix_resize_inode: I/O read: 1MB, write: 0MB, rate: 5319.15MB/s fix_sb_journal_backup: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00 overall resize2fs: Memory used: 264k/3592k (222k/43k), time: 16.45/16.38/ 0.00 overall resize2fs: I/O read: 1MB, write: 1MB, rate: 0.06MB/s The filesystem on /dev/zvol/rpool/scratch/ext4 is now 805306368 (4k) blocks long. real 0m17.908s user 0m16.386s sys 0m0.079s ``` Success!
5 tasks
samueldr
approved these changes
Oct 30, 2020
10 tasks
Very nice 👍 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation for this change
I noticed booting a system with an ext4 root which expanded to 5T took
quite a long time (12 minutes in some cases, 43(!) in others.)
I changed stage-1 to run
resize2fs -d 62
for extra debug output andtiming information. It revealed the adjust_superblock step taking
almost all of the time:
but when I ran resize2fs on a disk with the identical content growing
to the identical target size, it would only take about 30 seconds. I
looked at what happened between those two steps in the fast case with
strace and found:
In particular the access to /sys/fs seemed interesting. Looking
at the source of resize2fs:
I confirmed /sys is mounted, and then found a bug suggesting the
ext4 module is maybe not loaded:
https://bugzilla.redhat.com/show_bug.cgi?id=1071909
My home server doesn't have ext4 loaded and had 3T to play with, so
I tried (and succeeded with) replicating the issue locally:
here it got stuck for quite some time ... straceing this 20 minutes in revealed this in a tight loop:
it finally ended 43(!) minutes later:
I then cleaned up and recreated the zvol, loaded the ext4 module, created the ext4 fs,
resized the volume, and resize2fs'd and it went quite quickly:
Success!
Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)