New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly store the outputs of CA derivations − take 2 #4330
Conversation
Well this is a boring thing to bring up, but what do all you say we use JSON for the new binary cache files? It would be nice to avoid these bespoke parsers a bit. |
Good point… I kept the same syntax as the narinfo files for consistency, but I don't care about changing that − might even be nice to have a more structured format to store lists of things or the like. I'd rather not have it block this PR though ;) |
Yes, I'm in favor of using JSON for new types of files (in fact we're using it for NAR listings in the binary cache). |
src/libstore/binary-cache-store.cc
Outdated
auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) + | ||
"!" + info.id.outputName + ".doi"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you all think about nesting further like this?
auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) + | |
"!" + info.id.outputName + ".doi"; | |
auto filePath = "/drvOutputs/" + std::string(info.id.drvPath.hashPart()) + | |
"/" + info.id.outputName + ".doi"; |
rather than using !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used !
because that's the already standard delimiter for derivation outputs, so it makes sense to use the same on binary caches − I've actually pushed 391f753 to make this use id.to_string()
directly which makes this more natural.
That being said, I don't have a very strong opinion, so a /
could be good to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could even use a /
for id.to_string()
, but I like the fact that !
gives a clear message of this isn't a store path, don't try to use it as such
Ok, I've opened #4332 to track this |
5ad79e5
to
5bce6ad
Compare
@regnat does this make any changes w.r.t. storing the unresolved derivations? I think it doesn't but I am not yet sure. Of course, I'll be very happy of it doesn't :). |
Nah that one doesn't |
Great!
What if we stored one map per derivation with all the outputs in the binary caches? (Also, I think that was renamed |
That was my initial idea, but it's possible to only push a subset of the outputs to the binary cache, meaning that if you add some more afterwards you'd have to update the entry, and that would break the − really nice property imho − that everything in the binary cache is immutable (we could also use a storing scheme like
Someone should make a variant of |
@regnat well what if you had to push the mapping for all outputs whenever you pushed any of them? Fundamentally, the mappings all trace back to builds, and it's impossible to just build some of the derivations, so I think this could actually work. I think unlike the local store, we might very much want to have a relation not function for the binary cache. This would allow pushing conflicting mappings when we have some outputs and other outputs from another, in a particularly honest fashion. I guess we could just do |
Up to a certain point, yes, except that you can
I'm not sure I get exactly how that would work… do you mind expanding a bit further how you consider things? (If possible in a new issue as this PR doesn't do much for binary caches except for a basic and probably not definitive implementation of |
@regnat Sure I can expand elsewhere, maybe let's just merge this, as long as we are happy of all manner of breaking changes elsewhere. |
5bce6ad
to
4b4fc7f
Compare
I'm definitely happy with any breaking change that migh (and will) happen! I've rebased to fix the mention of |
I made the multiple outputs issue: #4344. [Side note, I wonder if I could get enough perms to at label the issue and whatnot?] |
-- Won't be loaded unless the experimental feature `ca-derivations` | ||
-- is enabled | ||
|
||
create table if not exists Realisations ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "realisations" seems a bit confusing. What does it realise? In Nix, "realise" is already a vague and overloaded term for "substituting or building a derivation". There is also a terminology mismatch between registerDrvOutput
and queryRealisation
- shouldn't that be registerRealisation
or queryDrvOutput
?
How about CADerivers
or CADerivationOutputs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's indeed a terminology mismatch and we need to settle on one thing.
I really like Realisation
tbh because these are precisely witnesses that a derivation has been realised (for the definition you give above). But it's indeed a quite overloaded term so if you don't like it we can find something else.
I tend to use DerivationOutput
only for the “symbolic” foo.drv!out
because it's also a very overloaded term and it's easy to get lost if we abuse of it.
(also, they aren't really Derivers
(though they could be used as such) because the intent is to associate derivation outputs (foo.drv!out
) to their output paths and not the other way around)
TL;DR, I'm fine changing the naming, but for my own sanity (and other's too) I think we need to find clear and distinct names for
foo.drv!out
- The relation
(foo.drv!out, /nix/store/123-foo-out)
src/libstore/ca-specific-schema.sql
Outdated
id integer primary key autoincrement not null, | ||
drvPath text not null, | ||
outputName text not null, -- symbolic output id, usually "out" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't guarantee that (drvPath, outputName)
is unique. That seems okay (since we could build a derivation multiple times and get a different result) but if we want to ensure uniqueness, we could have (drvPath, outputName)
as the primary key and drop id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks I should have caught this. Yes let's start as stringent as possible and only relax later if need be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I'll drop the id
for now
src/libstore/ca-specific-schema.sql
Outdated
foreign key (outputPath) references ValidPaths(id) on delete cascade | ||
); | ||
|
||
create index if not exists IndexRealisations on Realisations(outputPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we also need an index on (drvPath, outputName)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. Actually the index on outputPath
is quite useless
src/libstore/local-store.hh
Outdated
@@ -21,7 +21,7 @@ namespace nix { | |||
0.7. Version 2 was Nix 0.8 and 0.9. Version 3 is Nix 0.10. | |||
Version 4 is Nix 0.11. Version 5 is Nix 0.12-0.16. Version 6 is | |||
Nix 1.0. Version 7 is Nix 1.3. Version 10 is 2.0. */ | |||
const int nixSchemaVersion = 10; | |||
const int nixSchemaVersion = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to bump the schema version now that we have a sub-version for the CA-related stuff, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope indeed
@@ -99,6 +102,10 @@ public: | |||
StorePath addTextToStore(const string & name, const string & s, | |||
const StorePathSet & references, RepairFlag repair) override; | |||
|
|||
void registerDrvOutput(const Realisation & info) override; | |||
|
|||
std::optional<const Realisation> queryRealisation(const DrvOutput &) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a std::vector<Realisation>
since there could be multiple realisations for a particular derivation+output-name. Even if we don't really support multiple realisations at the moment, it's probably good to express the possibility in the interface. When we need a particular realisation for a dependency (when building) we can just pick the first one for now. But in the future we could have an algorithm that tries to pick realisations that minimise the amount of closure rewriting needed to get a consistent input closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're okay with it, I'll rather just change the daemon protocol to allow passing a list as it's the only place in which we can't break backwards compat for free, and it makes more sense to me to make it clear that we can only return one realisation for the time being as long as there's nothing preventing us from changing that in the future
@@ -443,6 +443,23 @@ StorePath BinaryCacheStore::addTextToStore(const string & name, const string & s | |||
})->path; | |||
} | |||
|
|||
std::optional<const Realisation> BinaryCacheStore::queryRealisation(const DrvOutput & id) | |||
{ | |||
auto outputInfoFilePath = realisationsPrefix + "/" + id.to_string() + ".doi"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does doi
mean?
BTW this URL format means that there can be only one mapping per realisation, which is probably fine for a binary cache, but it does mean you can't take the union of two binary caches (e.g. a mirror of multiple upstream binary caches) - or at least not of their /realisations
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doi
stands for “derivation output info” (it's indeed quite obscure).
Yes, I agree on the uniqueness constraint that might be somehow too strong. But I think we'll have time to rework this schema − it's experimental anyways, so nothing is set in any way, and it's not really clear to me atm how we could have a sensible implementation if we drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edolstra Just gave a pass after your comments, I'll let you have a second look
@@ -443,6 +443,23 @@ StorePath BinaryCacheStore::addTextToStore(const string & name, const string & s | |||
})->path; | |||
} | |||
|
|||
std::optional<const Realisation> BinaryCacheStore::queryRealisation(const DrvOutput & id) | |||
{ | |||
auto outputInfoFilePath = realisationsPrefix + "/" + id.to_string() + ".doi"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doi
stands for “derivation output info” (it's indeed quite obscure).
Yes, I agree on the uniqueness constraint that might be somehow too strong. But I think we'll have time to rework this schema − it's experimental anyways, so nothing is set in any way, and it's not really clear to me atm how we could have a sensible implementation if we drop it.
@@ -99,6 +102,10 @@ public: | |||
StorePath addTextToStore(const string & name, const string & s, | |||
const StorePathSet & references, RepairFlag repair) override; | |||
|
|||
void registerDrvOutput(const Realisation & info) override; | |||
|
|||
std::optional<const Realisation> queryRealisation(const DrvOutput &) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're okay with it, I'll rather just change the daemon protocol to allow passing a list as it's the only place in which we can't break backwards compat for free, and it makes more sense to me to make it clear that we can only return one realisation for the time being as long as there's nothing preventing us from changing that in the future
src/libstore/local-store.hh
Outdated
@@ -21,7 +21,7 @@ namespace nix { | |||
0.7. Version 2 was Nix 0.8 and 0.9. Version 3 is Nix 0.10. | |||
Version 4 is Nix 0.11. Version 5 is Nix 0.12-0.16. Version 6 is | |||
Nix 1.0. Version 7 is Nix 1.3. Version 10 is 2.0. */ | |||
const int nixSchemaVersion = 10; | |||
const int nixSchemaVersion = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope indeed
How come? Can we change all implementations to assert the experimental feature before doing anything else? |
Oh yes, you're right. Well, now it's implemented and the code is actually cleaner that way, so might as well keep it |
Yeah I'm not against this new version (though I haven't closely looked), just think if we did #4344 we definitely would change it again. |
For each known realisation, store: - its output - its output path This comes with a set of needed changes: - New `realisations` module declaring the types needed for describing these mappings - New `Store::registerDrvOutput` method registering all the needed informations about a derivation output (also replaces `LocalStore::linkDeriverToPath`) - new `Store::queryRealisation` method to retrieve the informations for a derivations This introcudes some redundancy on the remote-store side between `wopQueryDerivationOutputMap` and `wopQueryRealisation`. However we might need to keep both (regardless of backwards compat) because we sometimes need to get some infos for all the outputs of a derivation (where `wopQueryDerivationOutputMap` is handy), but all the stores can't implement it − because listing all the outputs of a derivation isn't really possible for binary caches where the server doesn't allow to list a directory.
Add a new table for tracking the derivation output mappings. We used to hijack the `DerivationOutputs` table for that, but (despite its name), it isn't a really good fit: - Its entries depend on the drv being a valid path, making it play badly with garbage collection and preventing us to copy a drv output without copying the whole drv closure too; - It dosen't guaranty that the output path exists; By using a different table, we can experiment with a different schema better suited for tracking the output mappings of CA derivations. (incidentally, this also fixes #4138)
2557e8a
to
3ac9d74
Compare
Reopening of #4315 with a branch from the nixos/nix repo just to allow dpulls to work properly
Fix #4138