Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store the realisations of unresolved derivations #4340

Merged
merged 7 commits into from Feb 19, 2021

Conversation

thufschmitt
Copy link
Member

@thufschmitt thufschmitt commented Dec 9, 2020

When building a (non fully-input-addressed) derivation, we first resolve it to a BasicDerivation that depends directly on the output paths of its dependencies (rather than on the derivation outputs) − this is what enables early cutoff.

Currently we register the realisation of the resolved derivation, so that subsequent calls to nix-build can directly reuse this without rebuilding the derivation. However, we don't register the unresolved derivation, meaning that we have to re-do the resolution step each time.
This in turn means that we must either keep all the build inputs or keep the knowledge of their output path (without the content of the path itself).
The former is very costly in disk-space (it amounts to making every build-input a runtime dependency), and the latter comes with a big set of challenges in presence of non-determinism that makes this a too heavy-handed solution for the time being (though we might switch to it eventually if we can find a way to solve its issues as it's a rather tempting design).

To avoid these issues, we choose a third way which is to also register the realisation of the unresolved derivation − so we can completely sidestep the resolving phase, which makes these behave as classical input-addressed derivations.

Depends on #4330

@thufschmitt thufschmitt added the ca-derivations Derivations with content addressed outputs label Dec 9, 2020
@thufschmitt thufschmitt added this to the ca-derivations-mvp milestone Dec 11, 2020
@dpulls
Copy link

dpulls bot commented Dec 14, 2020

🎉 All dependencies have been resolved !

Base automatically changed from ca/properly-store-outputs to master December 14, 2020 14:01
@thufschmitt
Copy link
Member Author

I've updated it on top of latest master. @edolstra @Ericson2314 mind having a look?

src/libstore/derivations.cc Outdated Show resolved Hide resolved
src/libstore/derivations.cc Outdated Show resolved Hide resolved
src/libstore/local-store.cc Outdated Show resolved Hide resolved
Comment on lines 1025 to 1030
// If the derivation was originally a full `Derivation` (and not just
// a `BasicDerivation`, we must retrieve it because the `staticOutputHashes`
// will be wrong otherwise
Derivation fullDrv = *drv;
if (auto upcasted = dynamic_cast<Derivation *>(drv.get()))
fullDrv = *upcasted;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit ugly... It makes it hard to reason about DerivationGoal if it sneakily does something different if the drv (which is a BasicDerivation) is not really a BasicDerivation. E.g. what does this mean for the remote build case (which is the case where we receive a BasicDerivation)?

Maybe it's cleaner to handle this in the DerivationGoal constructors and getDerivation method. I.e. add a field originalHashes to DerivationGoal, which gets initialized in DerivationGoal::DerivationGoal(... BasicDerivation ...) from the BasicDerivation, and in DerivationGoal::getDerivation from the full derivation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right, I didn't think about putting this in the constructor. That'd be much nicer indeed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3c8b629

fullDrv = *upcasted;

auto originalHashes = staticOutputHashes(worker.store, fullDrv);
auto resolvedHashes = staticOutputHashes(worker.store, *resolvedDrv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside I find the "hash" terminology confusing since it's not really clear what it is a hash of... We should find a better name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the terminology isn't really ideal. I couldn't think of anything better though…

Maybe we could just manipulate the DrvOutput (hash, outputName), so that we don't have to name this precise bit which doesn't make much sense by itself

@edolstra
Copy link
Member

edolstra commented Feb 5, 2021

@regnat Looks like the build is currently failing:

src/libstore/derivations.cc:762:21: error: expected ')'
                    store.printStorePath(input.first),

@thufschmitt
Copy link
Member Author

@regnat Looks like the build is currently failing:

src/libstore/derivations.cc:762:21: error: expected ')'
                    store.printStorePath(input.first),

Oh thanks, I ignored the CI warning thinking that it was only the transient OSX failure, but it was more than that indeed

@@ -124,6 +124,17 @@ DerivationGoal::DerivationGoal(const StorePath & drvPath, const BasicDerivation
, buildMode(buildMode)
{
this->drv = std::make_unique<BasicDerivation>(BasicDerivation(drv));

auto outputHashes = staticOutputHashes(worker.store, drv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused, is there any way we can abstract over this stuff in haveDerivation where it usual set? It looks like that is called just after?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regnat and I talked about this a rather meandering conversation (my fault!). @regnat mentioned the awkwardness of downcasting, but we do that a number of times already. checkPathValidity is the other place too.

Also, we should double check this is OK in the case of input-addressed remote buildings. Since the sender will strip away the inputDrvs but keep the output hashes, the remote side will calculate a different drvHash. Is this OK? I'm not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regnat mentioned the awkwardness of downcasting, but we do that a number of times already. checkPathValidity is the other place too.

I honestly don't have much of an opinion on this (except that we should eventually fix the root cause that forces us to downcast… but that's work for another day).

Also, we should double check this is OK in the case of input-addressed remote buildings. Since the sender will strip away the inputDrvs but keep the output hashes, the remote side will calculate a different drvHash. Is this OK? I'm not sure.

I might obviously miss something, but it seems OK for me as this new drvHash is only used for registering the realisation remotely, meaning that

  1. It doesn't change the output path of the derivation
  2. It's never sent back to the sender

(and it's a correct output hash from the builder's pov, so at the very least it doesn't hurt to have it registered, although it's not strictly required for input-addressed drvs)

@thufschmitt
Copy link
Member Author

@Ericson2314 I've removed the now useless resolving cache.

@edolstra any blocker left for this PR?

thufschmitt and others added 7 commits February 16, 2021 08:29
Change the `nix-build` logic for linking/printing the output paths to allow for
some outputs to be missing. This might happen when the toplevel
derivation didn't have to be built, either because all the required
outputs were already there, or because they have all been substituted.
Simple test to ensure that `nix-build && nix-collect-garbage &&
nix-build -j0` works as it should
Once a build is done, get back to the original derivation, and register
all the newly built outputs for this derivation.

This allows Nix to work properly with derivations that don't have all
their build inputs available − thus allowing garbage collection and
(once it's implemented) binary substitution
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
It's not fixed nor useful atm, so better keep it hidden

Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
That way we
1. Don't have to recompute them several times
2. Can compute them in a place where we know the type of the parent
  derivation, meaning that we don't need the casting dance we had before
It isn't needed anymore now that don't need to eagerly resolve
everything like we used to do. So we can safely get rid of it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ca-derivations Derivations with content addressed outputs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants