release.nix: remove aarch64 as blocker, fixes #104550 #104679

Closed

FRidh
Member

@FRidh FRidh commented Nov 23, 2020

The aarch64 build capacity has dropped significantly, making it
impossible to keep up with the x86_64 linux and darwin builders. This
blocks the advancing of nixpkgs-unstable.

At the time of writing, even though Hydra has hardly been occupied on
other branches, the aarch64 part of nixpkgs:trunk is hardly progressing;
only 3300 jobs have finished and still 28000 are queued.

Closes #104550.

https://hydra.nixos.org/job/nixpkgs/trunk/unstable#tabs-constituents

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@FRidh
Member Author

FRidh commented Nov 23, 2020

Note: I think it is possible to raise the scheduling priority of a derivation (meta.schedulingPriority). In that case we could prioritize this specific blocking job, so that whatever capacity there is gets used for it first and only then for everything else.
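
A minimal sketch of what setting that attribute could look like. It assumes Hydra reads `meta.schedulingPriority` from the evaluated job and schedules higher values earlier. The job name `example-blocker`, its single constituent, and the value 200 are placeholders for illustration, not what release.nix actually contains.

```nix
# Sketch only: attach a scheduling priority to one aggregate job.
# `blockingJob` and the value 200 are hypothetical placeholders.
let
  pkgs = import <nixpkgs> { };

  # Stand-in for the channel-blocking aggregate job.
  blockingJob = pkgs.releaseTools.aggregate {
    name = "example-blocker";
    constituents = [ pkgs.hello ];
  };
in
  # addMetaAttrs merges the given attributes into the derivation's meta set.
  # Assumption: a higher schedulingPriority means Hydra schedules the job earlier.
  pkgs.lib.addMetaAttrs { schedulingPriority = 200; } blockingJob
```

Whether the priority also carries over to the job's not-yet-built dependencies depends on Hydra's queue runner and would need checking before relying on it.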

@FRidh FRidh changed the title release.nix: remove aarch64 as blocker release.nix: remove aarch64 as blocker, fixes #104550 Nov 23, 2020
@grahamc
Member

grahamc commented Nov 23, 2020

I think I can commit some time to this today, please consider holding this PR for a few hours.

@vcunat
Member

vcunat commented Nov 25, 2020

pkgs/top-level/release.nix is for the nixpkgs-unstable channel, not nixos-unstable, so the lagging doesn't seem that urgent (to me at least). As for nixos-unstable itself, it doesn't seem to be lagging that much so far, thanks to containing only a few aarch64 jobs (and to having elevated builder shares).

For nixos-unstable I usually observe a lag (between commit date and current date) of about half a week; a full week seems rare.

Personally I think nixpkgs-unstable is a weird mix with limited use – perhaps we should instead have nixpkgs-unstable-darwin, similar to nixpkgs-NN.NN-darwin. (I mean the channel here; having a jobset over all platforms is still quite useful, e.g. for estimating mass-rebuild regressions in branches like staging.)

Given that aarch64 is lagging that much behind, perhaps users would appreciate separate -aarch64 channels, but we don't have any of those yet... and of course, getting more aarch64 builders would be a nicer solution.

@FRidh
Member Author

FRidh commented Nov 25, 2020

> so the lagging doesn't seem that urgent (to me at least)

We offer the channel, so it should be reliable.

> Personally I think nixpkgs-unstable is a weird mix with limited use

We do have users running Nix on non-NixOS systems, aside from Darwin. There are also NixOS users who want to run their system on stable and use "the latest" for other things.

@vcunat
Member

vcunat commented Nov 25, 2020

I wouldn't recommend nixpkgs-unstable to Linux users even if they don't use NixOS, but perhaps it's just personal preference. I feel that the basic NixOS tests also check the packages themselves.

Still, I do agree that we should improve the situation.

@grahamc
Member

grahamc commented Nov 26, 2020

I wonder if we can close this now?

@blitz
Contributor

blitz commented Nov 26, 2020

Is it possible to donate some money to get AArch64 instances on AWS or something?

@vcunat
Member

vcunat commented Nov 27, 2020

So far I'm only aware of donations that don't specify the platform. I don't know how difficult it would be to arrange something like that.

@domenkozar domenkozar closed this Nov 27, 2020
@FRidh FRidh reopened this Dec 2, 2020
@FRidh
Member Author

FRidh commented Dec 2, 2020

Opening again because it's lagging. This time we haven't been able to get the bootstrap tools to build since November 21st. That's over 10 days now.

@andir
Member

andir commented Dec 4, 2020

Are we doing a lot more staging builds these days? If memory serves, I didn't see as many during previous releases/years. Maybe just being a bit more conservative there would help? It does not solve the issue, but our capacity also doesn't just continue to grow with demand.

@grahamc did you and @lheckemann figure out (and fix?) what was wrong with a few builders?

@FRidh
Member Author

FRidh commented Dec 4, 2020

> Are we doing a lot more staging builds these days? If memory serves well I didn't see as many during previous releases/years. Maybe just being a bit more conservative there would help? It does not solve the issue but also our capacity doesn't just continue to grow with demand.

I aim to have a weekly staging iteration. I think the average time of a cycle has dropped by several days. Even so, we have plenty of x86_64-linux building capacity, and the darwin builders also keep up, to a degree, that is. The capacity of aarch64 has simply dropped spectacularly.

Another thing to consider is that in the month leading up to a release, and after a release, the staging branch of the stable branch is typically busier. However, it is also my impression that we now have more people actively backporting mass-rebuilding changes.

@FRidh FRidh removed the 1.severity: channel blocker Blocks a channel label Dec 7, 2020
@lheckemann
Member

Yes, the builders should be working again, running an older kernel. Is the capacity still lower than before?

@grahamc
Member

grahamc commented Dec 14, 2020

Thanks to @lheckemann we have about 20 working builders now (50/50 split big-parallel (22 cores, 2 jobs) and not (22 jobs, 2 cores)) after getting the worker image booting on some slightly weird hardware.

@FRidh
Member Author

FRidh commented Dec 14, 2020

Capacity is all good again, thanks!

@FRidh FRidh closed this Dec 14, 2020
Development

Successfully merging this pull request may close these issues.

aarch64: missing packages binary cache
7 participants