Properly store the outputs of CA derivations − take 2 #4330

thufschmitt · 2020-12-08T15:05:24Z

Reopening of #4315 with a branch from the nixos/nix repo just to allow dpulls to work properly

Ericson2314 · 2020-12-08T15:39:33Z

Well this is a boring thing to bring up, but what do all you say we use JSON for the new binary cache files? It would be nice to avoid these bespoke parsers a bit.

thufschmitt · 2020-12-08T15:43:54Z

Well this is a boring thing to bring up, but what do all you say we use JSON for the new binary cache files? It would be nice to avoid these bespoke parsers a bit.

Good point… I kept the same syntax as the narinfo files for consistency, but I don't care about changing that − might even be nice to have a more structured format to store lists of things or the like.

I'd rather not have it block this PR though ;)

edolstra · 2020-12-08T15:47:11Z

Yes, I'm in favor of using JSON for new types of files (in fact we're using it for NAR listings in the binary cache).

Ericson2314 · 2020-12-08T17:00:59Z

src/libstore/binary-cache-store.cc

+    auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) +
+        "!" + info.id.outputName + ".doi";


what do you all think about nesting further like this?

Suggested change

auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) +

"!" + info.id.outputName + ".doi";

auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) +

"/" + info.id.outputName + ".doi";

rather than using !

I used ! because that's the already standard delimiter for derivation outputs, so it makes sense to use the same on binary caches − I've actually pushed 391f753 to make this use id.to_string() directly which makes this more natural.

That being said, I don't have a very strong opinion, so a / could be good to

We could even use a / for id.to_string(), but I like the fact that ! gives a clear message of this isn't a store path, don't try to use it as such

thufschmitt · 2020-12-08T17:24:27Z

Yes, I'm in favor of using JSON for new types of files (in fact we're using it for NAR listings in the binary cache).

Ok, I've opened #4332 to track this

src/libstore/build/derivation-goal.cc

Ericson2314 · 2020-12-08T20:21:58Z

@regnat does this make any changes w.r.t. storing the unresolved derivations? I think it doesn't but I am not yet sure. Of course, I'll be very happy of it doesn't :).

thufschmitt · 2020-12-09T05:38:14Z

@regnat does this make any changes w.r.t. storing the unresolved derivations? I think it doesn't but I am not yet sure. Of course, I'll be very happy of it doesn't :).

Nah that one doesn't

Ericson2314 · 2020-12-09T16:57:59Z

Nah that one doesn't

Great!

This introduces some redundancy on the remote-store side between
wopQueryDerivationOutputMap and wopQueryDrvOutputInfo.
However we might need to keep both (regardless of backwards compat)
because we sometimes need to get some infos for all the outputs of a
derivation (where wopQueryDerivationOutputMap is handy), but all the
stores can't implement it − because listing all the outputs of a
derivation isn't really possible for binary caches where the server
doesn't allow to list a directory.

What if we stored one map per derivation with all the outputs in the binary caches?

(Also, I think that was renamed wopQueryRealisation?)

thufschmitt · 2020-12-09T17:10:52Z

This introduces some redundancy on the remote-store side between
wopQueryDerivationOutputMap and wopQueryDrvOutputInfo.
However we might need to keep both (regardless of backwards compat)
because we sometimes need to get some infos for all the outputs of a
derivation (where wopQueryDerivationOutputMap is handy), but all the
stores can't implement it − because listing all the outputs of a
derivation isn't really possible for binary caches where the server
doesn't allow to list a directory.

What if we stored one map per derivation with all the outputs in the binary caches?

That was my initial idea, but it's possible to only push a subset of the outputs to the binary cache, meaning that if you add some more afterwards you'd have to update the entry, and that would break the − really nice property imho − that everything in the binary cache is immutable (we could also use a storing scheme like drvHash/outputName like you suggest above and list the directories, but not every cache permits listing directories − and I think not requiring that is also a very cool property that would be sad to loose).

(Also, I think that was renamed wopQueryRealisation?)

Someone should make a variant of sed that also updates the commit messages to make sure that they accurately reflect your changes 😒

Ericson2314 · 2020-12-09T17:16:52Z

@regnat well what if you had to push the mapping for all outputs whenever you pushed any of them? Fundamentally, the mappings all trace back to builds, and it's impossible to just build some of the derivations, so I think this could actually work.

I think unlike the local store, we might very much want to have a relation not function for the binary cache. This would allow pushing conflicting mappings when we have some outputs and other outputs from another, in a particularly honest fashion. I guess we could just do <drv-hash>/<mapping-file-checksum> or something to avoid collisions or the pain of dealing with incremented numbers.

thufschmitt · 2020-12-09T17:54:02Z

@regnat well what if you had to push the mapping for all outputs whenever you pushed any of them? Fundamentally, the mappings all trace back to builds, and it's impossible to just build some of the derivations, so I think this could actually work.

Up to a certain point, yes, except that you can nix-build foo.out && nix-collect-garbage && nix copy --to binary-cache foo.out (or just nix-build foo.out && nix copy --to binary-cache foo.out where foo.out is already in a different binary cache so you don't have to build foo.dev).

I think unlike the local store, we might very much want to have a relation not function for the binary cache. This would allow pushing conflicting mappings when we have some outputs and other outputs from another, in a particularly honest fashion. I guess we could just do <drv-hash>/<mapping-file-checksum> or something to avoid collisions or the pain of dealing with incremented numbers.

I'm not sure I get exactly how that would work… do you mind expanding a bit further how you consider things? (If possible in a new issue as this PR doesn't do much for binary caches except for a basic and probably not definitive implementation of queryRealisation)

Ericson2314 · 2020-12-09T18:18:36Z

@regnat Sure I can expand elsewhere, maybe let's just merge this, as long as we are happy of all manner of breaking changes elsewhere. queryRealisation is guarded by the experimental issue so I think we're good.

thufschmitt · 2020-12-09T18:30:47Z

@regnat Sure I can expand elsewhere, maybe let's just merge this, as long as we are happy of all manner of breaking changes elsewhere. queryRealisation is guarded by the experimental issue so I think we're good.

I'm definitely happy with any breaking change that migh (and will) happen!

I've rebased to fix the mention of wopQueryDrvOutputInfo commit message.

Ericson2314 · 2020-12-10T00:13:49Z

I made the multiple outputs issue: #4344. [Side note, I wonder if I could get enough perms to at label the issue and whatnot?]

edolstra · 2020-12-10T13:38:40Z

src/libstore/ca-specific-schema.sql

+-- Won't be loaded unless the experimental feature `ca-derivations`
+-- is enabled
+
+create table if not exists Realisations (


The word "realisations" seems a bit confusing. What does it realise? In Nix, "realise" is already a vague and overloaded term for "substituting or building a derivation". There is also a terminology mismatch between registerDrvOutput and queryRealisation - shouldn't that be registerRealisation or queryDrvOutput?

How about CADerivers or CADerivationOutputs?

There's indeed a terminology mismatch and we need to settle on one thing.

I really like Realisation tbh because these are precisely witnesses that a derivation has been realised (for the definition you give above). But it's indeed a quite overloaded term so if you don't like it we can find something else.

I tend to use DerivationOutput only for the “symbolic” foo.drv!out because it's also a very overloaded term and it's easy to get lost if we abuse of it.

(also, they aren't really Derivers (though they could be used as such) because the intent is to associate derivation outputs (foo.drv!out) to their output paths and not the other way around)

TL;DR, I'm fine changing the naming, but for my own sanity (and other's too) I think we need to find clear and distinct names for

foo.drv!out

The relation (foo.drv!out, /nix/store/123-foo-out)

edolstra · 2020-12-10T13:45:13Z

src/libstore/ca-specific-schema.sql

+    id integer primary key autoincrement not null,
+    drvPath text not null,
+    outputName text not null, -- symbolic output id, usually "out"


This doesn't guarantee that (drvPath, outputName) is unique. That seems okay (since we could build a derivation multiple times and get a different result) but if we want to ensure uniqueness, we could have (drvPath, outputName) as the primary key and drop id.

Oh thanks I should have caught this. Yes let's start as stringent as possible and only relax later if need be.

Indeed. I'll drop the id for now

edolstra · 2020-12-10T13:46:43Z

src/libstore/ca-specific-schema.sql

+    foreign key (outputPath) references ValidPaths(id) on delete cascade
+);
+
+create index if not exists IndexRealisations on Realisations(outputPath);


Don't we also need an index on (drvPath, outputName)?

Of course. Actually the index on outputPath is quite useless

src/libstore/local-store.cc

edolstra · 2020-12-10T13:51:25Z

src/libstore/local-store.hh

@@ -21,7 +21,7 @@ namespace nix {
   0.7.  Version 2 was Nix 0.8 and 0.9.  Version 3 is Nix 0.10.
   Version 4 is Nix 0.11.  Version 5 is Nix 0.12-0.16.  Version 6 is
   Nix 1.0.  Version 7 is Nix 1.3. Version 10 is 2.0. */
-const int nixSchemaVersion = 10;
+const int nixSchemaVersion = 11;


We don't need to bump the schema version now that we have a sub-version for the CA-related stuff, right?

Nope indeed

edolstra · 2020-12-10T13:59:50Z

src/libstore/binary-cache-store.hh

@@ -99,6 +102,10 @@ public:
    StorePath addTextToStore(const string & name, const string & s,
        const StorePathSet & references, RepairFlag repair) override;

+    void registerDrvOutput(const Realisation & info) override;
+
+    std::optional<const Realisation> queryRealisation(const DrvOutput &) override;


This should be a std::vector<Realisation> since there could be multiple realisations for a particular derivation+output-name. Even if we don't really support multiple realisations at the moment, it's probably good to express the possibility in the interface. When we need a particular realisation for a dependency (when building) we can just pick the first one for now. But in the future we could have an algorithm that tries to pick realisations that minimise the amount of closure rewriting needed to get a consistent input closure.

If you're okay with it, I'll rather just change the daemon protocol to allow passing a list as it's the only place in which we can't break backwards compat for free, and it makes more sense to me to make it clear that we can only return one realisation for the time being as long as there's nothing preventing us from changing that in the future

edolstra · 2020-12-10T14:01:50Z

src/libstore/binary-cache-store.cc

@@ -443,6 +443,23 @@ StorePath BinaryCacheStore::addTextToStore(const string & name, const string & s
    })->path;
 }

+std::optional<const Realisation> BinaryCacheStore::queryRealisation(const DrvOutput & id)
+{
+    auto outputInfoFilePath = realisationsPrefix + "/" + id.to_string() + ".doi";


What does doi mean?

BTW this URL format means that there can be only one mapping per realisation, which is probably fine for a binary cache, but it does mean you can't take the union of two binary caches (e.g. a mirror of multiple upstream binary caches) - or at least not of their /realisations directory.

doi stands for “derivation output info” (it's indeed quite obscure).

Yes, I agree on the uniqueness constraint that might be somehow too strong. But I think we'll have time to rework this schema − it's experimental anyways, so nothing is set in any way, and it's not really clear to me atm how we could have a sensible implementation if we drop it.

thufschmitt

@edolstra Just gave a pass after your comments, I'll let you have a second look

thufschmitt · 2020-12-10T16:49:54Z

src/libstore/binary-cache-store.cc

@@ -443,6 +443,23 @@ StorePath BinaryCacheStore::addTextToStore(const string & name, const string & s
    })->path;
 }

+std::optional<const Realisation> BinaryCacheStore::queryRealisation(const DrvOutput & id)
+{
+    auto outputInfoFilePath = realisationsPrefix + "/" + id.to_string() + ".doi";


doi stands for “derivation output info” (it's indeed quite obscure).

Yes, I agree on the uniqueness constraint that might be somehow too strong. But I think we'll have time to rework this schema − it's experimental anyways, so nothing is set in any way, and it's not really clear to me atm how we could have a sensible implementation if we drop it.

thufschmitt · 2020-12-10T16:58:06Z

src/libstore/binary-cache-store.hh

@@ -99,6 +102,10 @@ public:
    StorePath addTextToStore(const string & name, const string & s,
        const StorePathSet & references, RepairFlag repair) override;

+    void registerDrvOutput(const Realisation & info) override;
+
+    std::optional<const Realisation> queryRealisation(const DrvOutput &) override;


If you're okay with it, I'll rather just change the daemon protocol to allow passing a list as it's the only place in which we can't break backwards compat for free, and it makes more sense to me to make it clear that we can only return one realisation for the time being as long as there's nothing preventing us from changing that in the future

thufschmitt · 2020-12-10T17:10:11Z

src/libstore/local-store.hh

@@ -21,7 +21,7 @@ namespace nix {
   0.7.  Version 2 was Nix 0.8 and 0.9.  Version 3 is Nix 0.10.
   Version 4 is Nix 0.11.  Version 5 is Nix 0.12-0.16.  Version 6 is
   Nix 1.0.  Version 7 is Nix 1.3. Version 10 is 2.0. */
-const int nixSchemaVersion = 10;
+const int nixSchemaVersion = 11;


Nope indeed

Ericson2314 · 2020-12-10T17:51:49Z

as it's [queryRealization in the daemon protocol is] the only place in which we can't break backwards compat for free

How come? Can we change all implementations to assert the experimental feature before doing anything else?

thufschmitt · 2020-12-10T18:54:32Z

as it's [queryRealization in the daemon protocol is] the only place in which we can't break backwards compat for free

How come? Can we change all implementations to assert the experimental feature before doing anything else?

Oh yes, you're right. Well, now it's implemented and the code is actually cleaner that way, so might as well keep it

Ericson2314 · 2020-12-10T20:02:20Z

Yeah I'm not against this new version (though I haven't closely looked), just think if we did #4344 we definitely would change it again.

For each known realisation, store: - its output - its output path This comes with a set of needed changes: - New `realisations` module declaring the types needed for describing these mappings - New `Store::registerDrvOutput` method registering all the needed informations about a derivation output (also replaces `LocalStore::linkDeriverToPath`) - new `Store::queryRealisation` method to retrieve the informations for a derivations This introcudes some redundancy on the remote-store side between `wopQueryDerivationOutputMap` and `wopQueryRealisation`. However we might need to keep both (regardless of backwards compat) because we sometimes need to get some infos for all the outputs of a derivation (where `wopQueryDerivationOutputMap` is handy), but all the stores can't implement it − because listing all the outputs of a derivation isn't really possible for binary caches where the server doesn't allow to list a directory.

Add a new table for tracking the derivation output mappings. We used to hijack the `DerivationOutputs` table for that, but (despite its name), it isn't a really good fit: - Its entries depend on the drv being a valid path, making it play badly with garbage collection and preventing us to copy a drv output without copying the whole drv closure too; - It dosen't guaranty that the output path exists; By using a different table, we can experiment with a different schema better suited for tracking the output mappings of CA derivations. (incidentally, this also fixes #4138)

Fix #4332

thufschmitt requested a review from edolstra December 8, 2020 15:05

thufschmitt mentioned this pull request Dec 8, 2020

Properly store the outputs of CA derivations #4315

Closed

Ericson2314 reviewed Dec 8, 2020

View reviewed changes

thufschmitt mentioned this pull request Dec 8, 2020

Use json for storing the realisations in the binary cache #4332

Closed

Ericson2314 reviewed Dec 8, 2020

View reviewed changes

src/libstore/build/derivation-goal.cc Outdated Show resolved Hide resolved

thufschmitt force-pushed the ca/properly-store-outputs branch from 5ad79e5 to 5bce6ad Compare December 8, 2020 19:11

thufschmitt mentioned this pull request Dec 8, 2020

Store the realisations as JSON in the binary cache #4333

Merged

thufschmitt mentioned this pull request Dec 9, 2020

Store the realisations of unresolved derivations #4340

Merged

thufschmitt self-assigned this Dec 9, 2020

thufschmitt added the ca-derivations Derivations with content addressed outputs label Dec 9, 2020

thufschmitt force-pushed the ca/properly-store-outputs branch from 5bce6ad to 4b4fc7f Compare December 9, 2020 18:28

Ericson2314 mentioned this pull request Dec 10, 2020

Should we allow spitting the derivation output map per output? #4344

Open

edolstra reviewed Dec 10, 2020

View reviewed changes

thufschmitt commented Dec 10, 2020

View reviewed changes

thufschmitt added this to the ca-derivations-mvp milestone Dec 11, 2020

thufschmitt mentioned this pull request Dec 11, 2020

Use the hash modulo in the derivation outputs #4348

Merged

thufschmitt added 2 commits December 11, 2020 20:41

thufschmitt force-pushed the ca/properly-store-outputs branch from 2557e8a to 3ac9d74 Compare December 11, 2020 19:44

Store the realisations as JSON in the binary cache

8914e01

Fix #4332

edolstra merged commit f2f60bf into master Dec 14, 2020

edolstra deleted the ca/properly-store-outputs branch December 14, 2020 14:01

rgrunbla mentioned this pull request May 31, 2021

Broken symlink when trying to create a fixed output derivation using flakes #4859

Open

ldesgoui mentioned this pull request Aug 1, 2021

Write Nix 2.4 release notes #4521

Closed

		auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) +
		"!" + info.id.outputName + ".doi";

Properly store the outputs of CA derivations − take 2 #4330

Properly store the outputs of CA derivations − take 2 #4330

Conversation

thufschmitt commented Dec 8, 2020 • edited

Ericson2314 commented Dec 8, 2020

thufschmitt commented Dec 8, 2020

edolstra commented Dec 8, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thufschmitt commented Dec 8, 2020 • edited

Ericson2314 commented Dec 8, 2020

thufschmitt commented Dec 9, 2020

Ericson2314 commented Dec 9, 2020

thufschmitt commented Dec 9, 2020

Ericson2314 commented Dec 9, 2020

thufschmitt commented Dec 9, 2020

Ericson2314 commented Dec 9, 2020 • edited

thufschmitt commented Dec 9, 2020

Ericson2314 commented Dec 10, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ericson2314 Dec 10, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thufschmitt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ericson2314 commented Dec 10, 2020 • edited

thufschmitt commented Dec 10, 2020

Ericson2314 commented Dec 10, 2020

thufschmitt commented Dec 8, 2020 •

edited

edolstra commented Dec 8, 2020 •

edited

thufschmitt commented Dec 8, 2020 •

edited

Ericson2314 commented Dec 9, 2020 •

edited

Ericson2314 Dec 10, 2020 •

edited

Ericson2314 commented Dec 10, 2020 •

edited