New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for build users when none are available #3564
Conversation
Thank you a thousand times! |
I think this also resolves the fact that we make 32 build users in the installer, this was mostly the reason for it. |
This is my next nixpkgs PR, yes ;) |
This is not a good solution because it puts the entire build loop to sleep (since it's single-threaded). So no other builds/substitutions will make any progress until another process releases a build user. In fact, it seems that this will deadlock if this process itself has acquired all build users (since then it can't make progress to release any build users). For comparison, if we can't acquire a lock on an output path, the
|
@edolstra Thanks for the suggestion, I'll implement the same logic. |
Please, re-evaluate if possible. I have changed the behaviour to match that of output locks. However, now #2069 is even more of an issue. |
My testing methodology, for anyone interested:
If the last two commands succeed and use up all available build users while not crashing and not using users two times over, my patch is correct. I'd be very grateful if someone tested this and reported the results on non-NixOS linux and/or Mac. |
@domenkozar I had a temptation to do so here, but I think it merits a separate PR. I'll create one now. Actually, I think it's slightly more complex than I anticipated. We need to monitor for changes of the reason for waiting. |
I'll come back to #2069 when I have time. |
@domenkozar fixed (architecturally) in #3577, will rebase that if/when this PR is merged. |
@@ -530,6 +532,12 @@ class UserLock | |||
UserLock::UserLock() | |||
{ | |||
assert(settings.buildUsersGroup != ""); | |||
createDirs(settings.nixStateDir + "/userpool"); | |||
/* Mark that user is not enabled by default */ | |||
uid = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0
seems a risky choice as a sentinel value for UIDs given that 0 == root. So if there's ever any confusion, we might end up doing an operation as/to root. Might be better to use -1
or std::optional<uid_t>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, to ensure that UserLock
always represents a valid user, have a builder function that returns std::unique_ptr<UserLock>
so it can represent failure to acquire a user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about adding a function that can represent a failure, and it does appear to be a better solution from some perspectives, but I didn't want to make the diff too big (since this is my first PR). Now that it's inevitably going to be massive, I'll wrap the result in optional and make it represent failure.
src/libstore/build.cc
Outdated
@@ -1391,6 +1394,30 @@ void DerivationGoal::tryToBuild() | |||
{ | |||
trace("trying to build"); | |||
|
|||
/* If `build-users-group' is not empty, then we have to build as | |||
one of the members of that group. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly, it will try to acquire a build user even when we're doing a remote build, which is unnecessary. Maybe it's better to add a new state to the state machine, e.g. startBuild
, with a transition from tryToBuild
to startBuild
if we're doing a local build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, this sounds very much like something I didn't even think about.
I'll try to wrap my head around this and implement it next weekend.
Thank you again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a state and (hopefully) fixed this.
@edolstra I've (hopefully) fixed both of issues you've found, please look at it again when you have time. The CI seems to be slightly broken (it can't find a builder for macOS). I've tested this change on my NixOS desktop as explained above. |
Thanks! |
Sorry for reviving and old thread and n00b question, but, what does it mean if I'm still getting "all build users are currently in use" errors today? Does that mean my nix setup is too old and does not have this patch? Or could it be a configuration issue? |
This patch never became part of the 2.3 (stable) branch, so it's only available on master/2.4pre right now. @edolstra is there any chance to backport this to 2.3? |
Thanks for the quick response. Do we have a public URL for 2.4pre that I can use? I'm using https://github.com/cachix/install-nix-action which allows specifying install URL (default is https://nixos.org/nix/install). |
Please see https://github.com/cachix/install-nix-action#usage-with-flakes (except you don't need all the |
This version should include NixOS/nix#3564 and allow us to increase max-jobs. See #2621 for the previous attempt, reverted in 8365e73. The install URL is obtained from https://github.com/NixOS/nix/runs/2895674848 (NixOS/nix@323e545)
This version should include NixOS/nix#3564 and allow us to increase max-jobs. See #2621 for the previous attempt, reverted in 8365e73. The install URL is obtained from https://github.com/NixOS/nix/runs/2895674848 (NixOS/nix@323e545)
This version should include NixOS/nix#3564 and allow us to increase max-jobs. See #2621 for the previous attempt, reverted in 8365e73. The install URL is obtained from https://github.com/NixOS/nix/runs/2895674848 (NixOS/nix@323e545)
This version should include NixOS/nix#3564 and allow us to increase max-jobs. See #2621 for the previous attempt, reverted in 8365e73. The install URL is obtained from https://github.com/NixOS/nix/runs/2895674848 (NixOS/nix@323e545)
Solves #1216
Descriptions from there:
Additional implementation details: when build users are available, the behavior is exactly the same as before (choose the first available user, take a lock on it, use it). However, when no build users are available at the moment, repeat the search every two seconds. Whenever a build user becomes available, choose it and continue as usual. Previous behavior was to error out.
Tested on:
Note:
I have tried implementing a more "proper" solution with
libfswatch
(to only check the userpool when it changes), but for some reason the callback didn't trigger when lock files were released. If you believe that my work on implementing this withlibfswatch
is of interest, I can publish it.