Comparing changes
Open a pull request
base repository: NixOS/nixpkgs
base: 6b2df0109d90
head repository: NixOS/nixpkgs
compare: 56e5f2a3518c
- 15 commits
- 5 files changed
- 1 contributor
Commits on Oct 30, 2020
-
stage-1: modprobe ext{2,3,4} before resizing
I noticed booting a system with an ext4 root which expanded to 5T took quite a long time (12 minutes in some cases, 43(!) in others.) I changed stage-1 to run `resize2fs -d 62` for extra debug output and timing information. It revealed the adjust_superblock step taking almost all of the time: [Fri Oct 30 11:10:15 UTC 2020] zero_high_bits_in_metadata: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 [Fri Oct 30 11:21:09 UTC 2020] adjust_superblock: Memory used: 396k/4556k (295k/102k), time: 654.21/ 0.59/ 5.13 but when I ran resize2fs on a disk with the identical content growing to the identical target size, it would only take about 30 seconds. I looked at what happened between those two steps in the fast case with strace and found: ``` 235 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=1795}, ru_stime={tv_sec=0, tv_usec=3590}, ...}) = 0 236 write(1, "zero_high_bits_in_metadata: Memo"..., 84zero_high_bits_in_metadata: Memory used: 132k/0k (72k/61k), time: 0.00/ 0.00/ 0.00 237 ) = 84 238 gettimeofday({tv_sec=1604061278, tv_usec=480147}, NULL) = 0 239 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=1802}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 240 gettimeofday({tv_sec=1604061278, tv_usec=480192}, NULL) = 0 241 mmap(NULL, 2564096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c7355000 242 access("/sys/fs/ext4/features/lazy_itable_init", F_OK) = 0 243 brk(0xf85000) = 0xf85000 244 brk(0xfa6000) = 0xfa6000 245 gettimeofday({tv_sec=1604061278, tv_usec=538828}, NULL) = 0 246 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=58720}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 247 write(1, "adjust_superblock: Memory used: "..., 79adjust_superblock: Memory used: 396k/2504k (305k/92k), time: 0.06/ 0.06/ 0.00 248 ) = 79 249 gettimeofday({tv_sec=1604061278, tv_usec=539119}, NULL) = 0 250 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=58812}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 251 gettimeofday({tv_sec=1604061279, tv_usec=939}, NULL) = 0 252 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=520411}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0 253 write(1, "fix_uninit_block_bitmaps 2: Memo"..., 88fix_uninit_block_bitmaps 2: Memory used: 396k/2504k (305k/92k), time: 0.46/ 0.46/ 0.00 254 ) = 88 ``` In particular the access to /sys/fs seemed interesting. Looking at the source of resize2fs: ``` [root@ip-172-31-22-182:~/e2fsprogs-1.45.5]# rg -B2 -A1 /sys/fs/ext4/features/lazy_itable_init . ./resize/resize2fs.c 923- if (getenv("RESIZE2FS_FORCE_LAZY_ITABLE_INIT") || 924- (!getenv("RESIZE2FS_FORCE_ITABLE_INIT") && 925: access("/sys/fs/ext4/features/lazy_itable_init", F_OK) == 0)) 926- lazy_itable_init = 1; ``` I confirmed /sys is mounted, and then found a bug suggesting the ext4 module is maybe not loaded: https://bugzilla.redhat.com/show_bug.cgi?id=1071909 My home server doesn't have ext4 loaded and had 3T to play with, so I tried (and succeeded with) replicating the issue locally: ``` [root@kif:/scratch]# lsmod | grep -i ext [root@kif:/scratch]# zfs create -V 3G rpool/scratch/ext4 [root@kif:/scratch]# time mkfs.ext4 /dev/zvol/rpool/scratch/ext4 mke2fs 1.45.5 (07-Jan-2020) Discarding device blocks: done Creating filesystem with 786432 4k blocks and 196608 inodes Filesystem UUID: 560a4a8f-93dc-40cc-97a5-f10049bf801f Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912 Allocating group tables: done Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: done real 0m2.261s user 0m0.000s sys 0m0.025s [root@kif:/scratch]# zfs set volsize=3T rpool/scratch/ext4 [root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4 resize2fs 1.45.5 (07-Jan-2020) fs has 11 inodes, 1 groups required. fs requires 16390 data blocks. With 1 group(s), we have 22234 blocks available. Last group's overhead is 10534 Need 16390 data blocks in last group Final size of last group is 26924 Estimated blocks needed: 26924 Extents safety margin: 49 Resizing the filesystem on /dev/zvol/rpool/scratch/ext4 to 805306368 (4k) blocks. read_bitmaps: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 read_bitmaps: I/O read: 1MB, write: 0MB, rate: 3802.28MB/s fix_uninit_block_bitmaps 1: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 resize_group_descriptors: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 move_bg_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 zero_high_bits_in_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 ``` here it got stuck for quite some time ... straceing this 20 minutes in revealed this in a tight loop: ``` getuid() = 0 geteuid() = 0 getgid() = 0 getegid() = 0 prctl(PR_GET_DUMPABLE) = 1 (SUID_DUMP_USER) fallocate(3, FALLOC_FL_ZERO_RANGE, 2222649901056, 2097152) = 0 fsync(3) = 0 ``` it finally ended 43(!) minutes later: ``` adjust_superblock: Memory used: 264k/3592k (210k/55k), time: 2554.03/ 0.16/15.07 fix_uninit_block_bitmaps 2: Memory used: 264k/3592k (210k/55k), time: 0.16/ 0.16/ 0.00 blocks_to_move: Memory used: 264k/3592k (211k/54k), time: 0.00/ 0.00/ 0.00 Number of free blocks: 755396/780023556, Needed: 0 block_mover: Memory used: 264k/3592k (216k/49k), time: 0.05/ 0.01/ 0.00 block_mover: I/O read: 1MB, write: 0MB, rate: 18.68MB/s inode_scan_and_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 inode_ref_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 move_itables: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 calculate_summary_stats: Memory used: 264k/3592k (216k/49k), time: 16.35/16.35/ 0.00 fix_resize_inode: Memory used: 264k/3592k (222k/43k), time: 0.04/ 0.00/ 0.00 fix_resize_inode: I/O read: 1MB, write: 0MB, rate: 22.80MB/s fix_sb_journal_backup: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00 overall resize2fs: Memory used: 264k/3592k (222k/43k), time: 2570.90/16.68/15.07 overall resize2fs: I/O read: 1MB, write: 1MB, rate: 0.00MB/s The filesystem on /dev/zvol/rpool/scratch/ext4 is now 805306368 (4k) blocks long. real 43m1.943s user 0m16.761s sys 0m15.069s ``` I then cleaned up and recreated the zvol, loaded the ext4 module, created the ext4 fs, resized the volume, and resize2fs'd and it went quite quickly: ``` [root@kif:/scratch]# zfs destroy rpool/scratch/ext4 [root@kif:/scratch]# zfs create -V 3G rpool/scratch/ext4 [root@kif:/scratch]# modprobe ext4 [root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4 [root@kif:/scratch]# time mkfs.ext4 /dev/zvol/rpool/scratch/ext4 mke2fs 1.45.5 (07-Jan-2020) Discarding device blocks: done Creating filesystem with 786432 4k blocks and 196608 inodes Filesystem UUID: 5b415f2f-a8c4-4ba0-ac1d-78860de77610 Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912 Allocating group tables: done Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: done real 0m1.013s user 0m0.001s sys 0m0.023s [root@kif:/scratch]# zfs set volsize=3T rpool/scratch/ext4 [root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4 resize2fs 1.45.5 (07-Jan-2020) fs has 11 inodes, 1 groups required. fs requires 16390 data blocks. With 1 group(s), we have 22234 blocks available. Last group's overhead is 10534 Need 16390 data blocks in last group Final size of last group is 26924 Estimated blocks needed: 26924 Extents safety margin: 49 Resizing the filesystem on /dev/zvol/rpool/scratch/ext4 to 805306368 (4k) blocks. read_bitmaps: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 read_bitmaps: I/O read: 1MB, write: 0MB, rate: 3389.83MB/s fix_uninit_block_bitmaps 1: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00 resize_group_descriptors: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 move_bg_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 zero_high_bits_in_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00 adjust_superblock: Memory used: 264k/1540k (210k/55k), time: 0.02/ 0.02/ 0.00 fix_uninit_block_bitmaps 2: Memory used: 264k/1540k (210k/55k), time: 0.15/ 0.15/ 0.00 blocks_to_move: Memory used: 264k/1540k (211k/54k), time: 0.00/ 0.00/ 0.00 Number of free blocks: 755396/780023556, Needed: 0 block_mover: Memory used: 264k/3592k (216k/49k), time: 0.01/ 0.01/ 0.00 block_mover: I/O read: 1MB, write: 0MB, rate: 157.11MB/s inode_scan_and_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 inode_ref_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 move_itables: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00 calculate_summary_stats: Memory used: 264k/3592k (216k/49k), time: 16.20/16.20/ 0.00 fix_resize_inode: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00 fix_resize_inode: I/O read: 1MB, write: 0MB, rate: 5319.15MB/s fix_sb_journal_backup: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00 overall resize2fs: Memory used: 264k/3592k (222k/43k), time: 16.45/16.38/ 0.00 overall resize2fs: I/O read: 1MB, write: 1MB, rate: 0.06MB/s The filesystem on /dev/zvol/rpool/scratch/ext4 is now 805306368 (4k) blocks long. real 0m17.908s user 0m16.386s sys 0m0.079s ``` Success! (cherry picked from commit ece5c0f)
-
stage-1: add datestamps to logs
When the stage-1 logs get imported in to the journal, they all get loaded with the same timestamp. This makes it difficult to identify what might be taking a long time in early boot. (cherry picked from commit a179781)
-
nixos ec2/create-amis.sh: shellcheck: quote region references
(cherry picked from commit c766921)
-
nixos ec2/create-amis.sh: shellcheck: quote state_dir reference
(cherry picked from commit f5994c2)
-
nixos ec2/create-amis.sh: shellcheck: SC2155: Declare and assign sepa…
…rately to avoid masking return values. (cherry picked from commit baf7ed3)
-
nixos ec2/create-amis.sh: shellcheck: read without -r mangles backsla…
…shes (cherry picked from commit a66a22c)
-
nixos ec2/create-amis.sh: shellcheck: explicitly make the additions t…
…o block_device_mappings single strings (cherry picked from commit 7dac847)
-
nixos ec2/create-amis.sh: shellcheck: $ is not needed in arithmetic
(cherry picked from commit f92a883)
-
create-amis.sh: log the full response if describing the import snapsh…
…ot tasks fails (cherry picked from commit e253de8)
-
create-amis: allow customizing the service role name
The complete setup on the AWS end can be configured with the following Terraform configuration. It generates a ./credentials.sh which I just copy/pasted in to the create-amis.sh script near the top. Note: the entire stack of users and bucket can be destroyed at the end of the import. variable "region" { type = string } variable "availability_zone" { type = string } provider "aws" { region = var.region } resource "aws_s3_bucket" "nixos-amis" { bucket_prefix = "nixos-amis-" lifecycle_rule { enabled = true abort_incomplete_multipart_upload_days = 1 expiration { days = 7 } } } resource "local_file" "credential-file" { file_permission = "0700" filename = "${path.module}/credentials.sh" sensitive_content = <<SCRIPT export service_role_name="${aws_iam_role.vmimport.name}" export bucket="${aws_s3_bucket.nixos-amis.bucket}" export AWS_ACCESS_KEY_ID="${aws_iam_access_key.uploader.id}" export AWS_SECRET_ACCESS_KEY="${aws_iam_access_key.uploader.secret}" SCRIPT } # The following resources are for the *uploader* resource "aws_iam_user" "uploader" { name = "nixos-amis-uploader" } resource "aws_iam_access_key" "uploader" { user = aws_iam_user.uploader.name } resource "aws_iam_user_policy" "upload-to-nixos-amis" { user = aws_iam_user.uploader.name policy = data.aws_iam_policy_document.upload-policy-document.json } data "aws_iam_policy_document" "upload-policy-document" { statement { effect = "Allow" actions = [ "s3:ListBucket", "s3:GetBucketLocation", ] resources = [ aws_s3_bucket.nixos-amis.arn ] } statement { effect = "Allow" actions = [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject", ] resources = [ "${aws_s3_bucket.nixos-amis.arn}/*" ] } statement { effect = "Allow" actions = [ "ec2:ImportSnapshot", "ec2:DescribeImportSnapshotTasks", "ec2:DescribeImportSnapshotTasks", "ec2:RegisterImage", "ec2:DescribeImages" ] resources = [ "*" ] } } # The following resources are for the *vmimport service user* # See: https://docs.aws.amazon.com/vm-import/latest/userguide/vmie_prereqs.html#vmimport-role resource "aws_iam_role" "vmimport" { assume_role_policy = data.aws_iam_policy_document.vmimport-trust.json } resource "aws_iam_role_policy" "vmimport-access" { role = aws_iam_role.vmimport.id policy = data.aws_iam_policy_document.vmimport-access.json } data "aws_iam_policy_document" "vmimport-access" { statement { effect = "Allow" actions = [ "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", ] resources = [ aws_s3_bucket.nixos-amis.arn, "${aws_s3_bucket.nixos-amis.arn}/*" ] } statement { effect = "Allow" actions = [ "ec2:ModifySnapshotAttribute", "ec2:CopySnapshot", "ec2:RegisterImage", "ec2:Describe*" ] resources = [ "*" ] } } data "aws_iam_policy_document" "vmimport-trust" { statement { effect = "Allow" principals { type = "Service" identifiers = [ "vmie.amazonaws.com" ] } actions = [ "sts:AssumeRole" ] condition { test = "StringEquals" variable = "sts:ExternalId" values = [ "vmimport" ] } } } (cherry picked from commit 2bf1fc0)
-
create-amis: improve wording around the service name's IAM role
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com> (cherry picked from commit 74a577b)
-
make-disk-image: support legacy+gpt
(cherry picked from commit d78aa08)
-
nixos.amazonAmi: use legacy+gpt disk images to support partitions >2T
(cherry picked from commit d77ddf2)
-
amazon-image: random.trust_cpu=on to cut 10s from boot
Ubuntu and other distros already have this set via kernel config. (cherry picked from commit c851030)
-
Merge pull request #102182 from grahamc/backport-ami-improvements
[20.09] Backport AMI Boot Time and Disk Improvements
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff 6b2df0109d90...56e5f2a3518c