Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow depending on content-addressed derivations #3776

Closed
wants to merge 19 commits into from

Conversation

thufschmitt
Copy link
Member

Builds on top of #3740, and extends it to allow depending on ca-derivations, with a proper cutoff in case two different derivations in the build chain produce the same output (see the added test)

@Ericson2314
Copy link
Member

Ericson2314 commented Jul 2, 2020

Between the first and #3740, This does seem like a bunch code doubling down on the "real and fake outputs" approach, which I do worry will be a source of bugs.

Now that derivation outputs are fully parsed, I'd like to exploit that, going from

struct DerivationOutput
{
    StorePath path;
    std::optional<FixedOutputHash> hash; /* hash used for expected hash computation */
};

to

struct DerivationOutputLegacy {
    // StorePath path;
    // regnat pointed out that when we depend on a ca-derivation,
    // we cannot calculate downstream paths until the upstream one is built.
    std::optional<StorePath> path;
};
struct DerivationOutputCAFixed {
    /* hash used for expected hash computation */
    FixedOutputHash hash;
    // no need to store output store path, as it can be computed on the fly.
};
struct DerivationOutputCAFloating {
    /* hash used for ca path computation */
    HashType hashType;
};
typedef std::variant<
    DerivationOutputLegacy,
    DerivationOutputCAFixed,
    DerivationOutputCAFloating
> DerivationOutput;

/* accessors to make it easier to handle multiple cases at once. */
std::optional<StorePath> outputPath(DerivationOutput) { ... }
std::optional<HashType> outputHashAlgo(DerivationOutput) { ... }

I really do think we can make that change, fix all the type errors "on autopilot', and be quite close to a working version.


As a first, preparatory step, we can do:

struct DerivationOutputLegacy {
    StorePath path;
};
struct DerivationOutputCAFixed {
    /* hash used for expected hash computation */
    FixedOutputHash hash;
    // no need to store output store path, as it can be computed on the fly.
};
typedef std::variant<
    DerivationOutputLegacy,
    DerivationOutputCAFixed
> DerivationOutput;

/* accessors to make it easier to handle multiple cases at once. */
StorePath outputPath(DerivationOutput) { ... }

I think we'll also need to make derivations store their own name.

which is exactly isomorphic to today, to get the churn out of the way, and then add new new variant in a separate PR.

@Ericson2314
Copy link
Member

https://github.com/NixOS/nix/pull/3754/files should also help. Every time in that PR I do something like

FixedOutputInfo {
    { .method = ..., .hash = ... },
    {} // empty references
}

the {} is practically a Mad Lib beginning to be filed in with richer reference data.

Move the logic from the `nix make-content-addressed` nix command to a
new method in the `Store` class
This is required so that we can call LocalStore::makeContentAddressed on
paths that haven't been registered
Very limited atm:

- Only `nix-build` works properly (`nix-store`, `nix build`, etc.. will
  print and create a symlink to a non-existent output path)
- It isn't possible to depend on a content-addressed derivation
Otherwise `resolveDerivation` changes the name of the derivation (and as
a consequence its hash)
Otherwise substituted or short-circuited paths are not properly
registered
bool isModified = resolveDerivation(worker.store, *(dynamic_cast<Derivation *>(drv.get())));
// Define the actual outputs of the build
for (auto & i: drv->outputs)
actualOutputs.emplace(i.first, i.second.path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing to me that here actualOutputs is initialized to what appears to be the non-actual outputs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is intentional, but this is the sort of foot-gun I want to avoid by making DerivationOutput use variants in my comment above.

// Replace the output path by the new CA one
// FIXME: Why can't this be done in one step?
actualOutputs.erase(i.first);
actualOutputs.emplace(i.first, info.path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced by insert_or_assign().

@@ -4248,6 +4286,7 @@ void DerivationGoal::done(BuildResult::Status status, std::optional<Error> ex)
mcRunningBuilds.reset();

if (result.success()) {
registerOutputMappings();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this done here rather than in registerOutputs()? To me done() is a function that is only intended to interact with Worker to set error states and trigger dependent goals, so it shouldn't have side-effects like registering stuff in the database.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better place for this indeed.
I didn't put it in registerOutputs because we want it to be called even when no build took place (either because we substituted the output or because we've short-cutted the build after resolving the derivation).

@@ -79,8 +79,6 @@ ref<FSAccessor> LocalFSStore::getFSAccessor()

void LocalFSStore::narFromPath(const StorePath & path, Sink & sink)
{
if (!isValidPath(path))
throw Error("path '%s' is not valid", printStorePath(path));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should stay since narFromPath is supposed to apply to valid paths only. For arbitrary paths you should use dumpPath().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the issue is that build.cc is calling Store::makeContentAddressed() on an invalid path, then maybe that function should take a separate argument for the path (so it can use dumpPath()).

@@ -576,6 +576,17 @@ void LocalStore::checkDerivationOutputs(const StorePath & drvPath, const Derivat
}
}

void LocalStore::linkDeriverToPath(const StorePath deriver, const string outputName, const StorePath output)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be const StorePath & deriver etc. Otherwise we're passing by value which is less efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh forgot these, thanks

* (as given by `queryDerivationOutputMap`)
* **if this one differs from the one written in the derivation**
*/
bool resolveDerivation(Store & store, Derivation & drv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the boolean return value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad I forgot to document it.

It serves to indicate whether the resolving actually modified the derivation or not (to know whether to retry the substitution). But I'm not very happy with it, so I'd be happy if you have any suggestion on improving this

auto outputEnvVar = drv.env.find(i.first);
if (outputEnvVar != drv.env.end())
outputEnvVar->second = store.printStorePath(outPath);
debug(format("Rewrote output %1% to %2%")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format is obsolete, please do debug("rewrite output '%s' to '%s'", arg1, arg2).

for (auto & i : drv.outputs) {
// XXX: There's certainly a better and less error-prone way
// of getting the name than to look it up in the drv environment
string name = ParsedDerivation(StorePath::dummy, drv).getStringAttr("name").value_or("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be lifted out of the for loop.

const StringMap & extraRewrites
)
{
bool hasSelfReference = info.references.find(info.path) != info.references.end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=> bool hasSelfReference = info.references.count(info.path).

/* Create a new path similar to the one referred to by `info`, but that's
* content-addressed.
*
* This doesn't alter the original path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should document whether info needs to denote a valid path.

Comment on lines 117 to 118
std::optional<StorePath> deriver;
std::optional<string> outputname;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intent that we can have outputname whenever we have deriver, except for backwards compat with old infos?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW it looks like this field is not actually used anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I've added it for an experiment and forgot to remove it. This field doesn't really make any sense (even the deriver field should probably be deprecated as a store path can now have several derivers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah once we upload the trust mapping, we'll probably want to key it by the derivation not the output path, which also helps different signatures sign for different derivers.

@@ -115,6 +115,7 @@ struct ValidPathInfo
{
StorePath path;
std::optional<StorePath> deriver;
std::optional<string> outputname;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems incorrect to have an outputName field in ValidPathInfo. A CA store path can be produced by differently named output of different derivations (e.g. the "foo" output of derivation A could produce the same contents as the "bar" output of derivation B).

state->stmtAddDerivationOutput.use()
(queryValidPathId(*state, deriver))
(outputName)
(printStorePath(output))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output is an "actual" output here, right? In that case, doesn't this cause confusion wrt the meaning of the DerivationOutputs table?

narFromPath expects a valid path as input (and making it more general
would be complicated as it's a general method of the `Store` class), so
we just inline the bit of `makeContentAddressed` that we need (which is
bigger than what I though. Maybe I'm trying to retro-fit functional
idioms for which cpp isn't designed)
@thufschmitt thufschmitt added the ca-derivations Derivations with content addressed outputs label Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ca-derivations Derivations with content addressed outputs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants