
narinfo: Change NAR URLs to be addressed on the NAR hash instead of the compressed hash #4464

Merged
merged 1 commit into NixOS:master on Feb 9, 2021

Conversation

adisbladis
Member

@adisbladis adisbladis commented Jan 18, 2021

This change simplifies Trustix indexing and makes it possible to reconstruct a NAR's URL regardless of the compression used.

In particular this means that https://github.com/tweag/trustix/blob/7c2e9ca597de233846e0b265fb081626ca6c59d8/contrib/nix/nar/nar.go#L61-L71 can be removed, and only the bits that are required to establish trust need to be published in the Trustix build logs.

A notable change is that compressed files no longer come with a file extension indicating their compression. This should be fine, as the associated narinfo file carries the compression metadata.
For applications where consulting the narinfo is not feasible (like Trustix), the compression can be detected from the file's magic number.
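As a sketch of that magic-number check (the signatures are the standard ones for xz, zstd, bzip2, and gzip; the function name is made up for illustration):

```python
# Sketch: infer a NAR blob's compression from its leading magic bytes,
# since under the new scheme the URL carries no file extension.
# Signatures: xz (FD 37 7A 58 5A 00), zstd (28 B5 2F FD),
# bzip2 ("BZh"), gzip (1F 8B). An uncompressed NAR instead begins with
# the length-prefixed string "nix-archive-1".
MAGICS = [
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
]

def sniff_compression(header: bytes) -> str:
    """Return the compression name for a NAR blob, or "none"."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "none"
```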

I have manually tested that my local version of Nix (2.3.10) can still substitute from a binary cache created after this change, and everything still works.

Related issues

Acknowledgements

This work is done within the context of Trustix and has been sponsored by an NLnet Foundation NGI0 PET grant.

cc @Ericson2314 @domenkozar

@zimbatm
Member

zimbatm commented Jan 19, 2021

Nice. Did you check what content-types were being used on upload to S3?

@adisbladis
Member Author

Nice. Did you check what content-types were being used on upload to S3?

The content type is always set to application/x-nix-nar regardless of the compression used.
The compression can either be fetched from the narinfo or inferred by looking at the magic numbers.

@edolstra
Member

The problem with this scheme is that it makes the files in nar/ no longer content-addressed (i.e. they're addressed by the uncompressed contents rather than their actual contents). One result is that if you re-upload a NAR with a different compression algorithm or settings, it can overwrite a previously existing NAR. Since .narinfo files have a FileHash field that specifies the hash of the compressed file, this invalidates the old .narinfo files. So we would have to get rid of the FileHash, FileSize and Compression fields. (It could be that the current implementation no longer checks FileHash, but Compression would still be a problem.)

@adisbladis
Member Author

adisbladis commented Jan 22, 2021

The problem with this scheme is that it makes the files in nar/ no longer content-addressed (i.e. they're addressed by the uncompressed contents rather than their actual contents).

I don't think this is semantically meaningful.

One result is that if you re-upload a NAR with a different compression algorithm or settings, it can overwrite a previously existing NAR.
Since .narinfo files have a FileHash field that specifies the hash of the compressed file, this invalidates the old .narinfo files.

AFAIK Nix checks for an existing narinfo in the remote store before copying, so this should be a no-op for existing narinfos.
Confirmed: Re-copying does not overwrite existing narinfo.

So we would have to get rid of the FileHash, FileSize and Compression fields. (It could be that the current implementation no longer checks FileHash, but Compression would still be a problem.)

I'm not proposing getting rid of these entirely, at least not in this iteration.
What I want to do is to be able to ignore them and uncompress NARs in a binary cache proxy, setting these fields:

FileHash: NarHash
FileSize: NarSize
Compression: none
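As an illustration of that proxy rewrite (all hashes and sizes below are placeholders, not real values):

```
# Upstream narinfo (xz-compressed NAR):
URL: nar/<compressed-file-hash>.nar.xz
Compression: xz
FileHash: sha256:<compressed-file-hash>
FileSize: <compressed-size>
NarHash: sha256:<nar-hash>
NarSize: <nar-size>

# Served by the decompressing proxy:
URL: nar/<nar-hash>.nar
Compression: none
FileHash: sha256:<nar-hash>
FileSize: <nar-size>
NarHash: sha256:<nar-hash>
NarSize: <nar-size>
```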

@grahamc
Member

grahamc commented Jan 22, 2021

I'm not sure about this: the file being named for what it is makes it easy to tell, from an operational point of view, whether the file is valid. Additionally, I'm not sure it is a good idea: the structure presented in Nix is just one possible layout of NARs in the cache. It is very easy to imagine a binary cache that segments its data in a nar/68/sm/67/lc/d4/68sm67lcd4pnmyhijpyh134a7ykgyjhq... structure. The narinfo file gives cache operators this flexibility, and I'm not keen on Trustix setting it in stone.

@edolstra
Member

edolstra commented Feb 9, 2021

I just had a discussion with @adisbladis about this and decided to merge. Since Trustix stores trust mappings from store paths to NAR hashes, there is no way to get the URL of the compressed file in arbitrary binary caches if it's not named after the NAR hash.

The main remaining issue is that we need to start ignoring the Compression field but we can do that in a future PR.

@edolstra edolstra merged commit ee3846b into NixOS:master Feb 9, 2021
@edolstra edolstra deleted the nar-narhash-addressed branch February 9, 2021 13:47
@grahamc
Member

grahamc commented Feb 9, 2021

I feel this is a big loss, but okay. Does this mean binary caches which don't store this way can't participate in Trustix? I guess I wish Trustix didn't ad-hoc change properties of Nix's binary caches without a bit more discussion, and I feel disappointed I couldn't participate in representing my concerns further.

@adisbladis
Member Author

adisbladis commented Feb 9, 2021

Does this mean binary caches which don't store this way can't participate in trustix

You could participate purely to track reproducibility, but actually using it as an upstream would be unwieldy.

Without a canonical address for the NARs, it's not possible for a binary cache proxy to serve NARs from multiple caches: compression schemes may differ, making URLs unpredictable even though the actual unpacked contents are identical.

This isn't just an issue for Trustix, but for anyone who wants to build a highly available binary cache proxy substituting from multiple upstreams.

@grahamc
Member

grahamc commented Feb 9, 2021

My main concerns here are how quickly the Nix codebase adopted this change, and how little discussion there was in how this could impact other use cases. This could easily spread as an assumption in other parts of the codebase, breaking use cases that are important to me. I'm not interested in building on sand, and I feel like my world has become sand.

@domenkozar
Member

domenkozar commented Feb 9, 2021

I want to raise a few problems with this:

  1. backwards compatibility becomes harder, since all code now needs to handle both cases

  2. it's no longer possible to have two compressions of the same NAR inside the binary cache (because the NAR URL is now addressed by the pre-compression hash and contains no extension)

  3. it's now expensive (decompress & hash) to verify the hash of a NAR file

Not saying that those are blockers, but they do affect our future decisions.

@adisbladis
Member Author

adisbladis commented Feb 9, 2021

To elaborate a bit further:

When a request for a NAR comes in and that URL is not strictly reproducible, our cache proxy would have to store a substantial amount of state to map the non-reproducible URL back to the store path prefix, so that we can actively poll known binary caches for this content to get the actual URLs.
I'm not even sure this is practically feasible.

@grahamc
Member

grahamc commented Feb 9, 2021

Additionally:

  • it is very convenient to have builders upload directly to the cache using pre-signed S3 tokens which authorize writes to a single content-addressed URL using s3:x-amz-content-sha256
  • this puts another roadblock in the way of having NARs located on separate domains, which is something I've wanted to have support for
  • it implies cache content can never move, and that narinfos are never updated

I would much prefer binary cache proxies follow what I feel is the spec: query the upstream cache for the narinfo to locate the nar, instead of trying to play games reconstructing URLs.

@grahamc
Member

grahamc commented Feb 9, 2021

substantial amount of state to be able to go from the non-reproducible URL back to the store path prefix so we can actively poll known binary caches

You only need to know the derivation's hash, since that is how you look up the narinfo to find the nar... which is how binary caches work.
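For reference, the lookup described here is a two-step fetch: `<cache>/<store-path-hash>.narinfo`, then the NAR at the `URL` field inside it. A minimal sketch of the parsing step (the field names are the real narinfo keys; the function name and sample values are made up):

```python
# Sketch: extract the NAR URL from a .narinfo document, which is a
# simple "Key: value" line format.
def nar_url(narinfo: str) -> str:
    for line in narinfo.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key == "URL":
            return value
    raise ValueError("narinfo has no URL field")

# Placeholder narinfo; hashes are dummies, not real store paths.
sample = """\
StorePath: /nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-example
URL: nar/bbbbbbbb.nar.xz
Compression: xz
"""
```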

@adisbladis
Member Author

I would much prefer binary cache proxies follow what I feel is the spec: query the upstream cache for the narinfo to locate the nar, instead of trying to play games reconstructing URLs.

That's not practical at all, as I've already explained, because of the massive amount of state required.
Once you've served up the narinfo and the client has cached that NAR URL, you cannot do high availability across caches.

this puts another road block in to having NARs located on separate domains, which is something I've wanted to have support for

You could implement this with redirects; additionally, this doesn't need to invalidate any client caches.

@grahamc
Member

grahamc commented Feb 9, 2021

I don't understand why you need a lot of state to look up the narinfo file on a binary cache.

@adisbladis
Member Author

You only need to know the derivation's hash, since that is how you look up the narinfo to find the nar... which is how binary caches work.

There is an unbounded time period between when a client requests (and caches) a narinfo and when it (possibly) requests a NAR.
Caches may have uploaded the package in question in the meantime, using a variety of compression schemes, resulting in different NAR URLs.

When the request for the NAR finally comes in, we have lost the context of which narinfo this NAR is associated with, so it's not possible to poll other binary caches for the same NAR with a different compression.
This is where we'd be required to store state mapping NAR URLs back to store path prefixes; otherwise we lack the necessary metadata.

@rbvermaa
Member

rbvermaa commented Feb 9, 2021

It looks to me that this change might be a bit more controversial than originally thought, given the post-merge activity on this PR.

Given that @grahamc and @domenkozar have both expressed (some of) their hesitations/problems with the change, would it be perhaps a good idea to revert this change for now and start an RFC for this?

@grahamc
Member

grahamc commented Feb 9, 2021

I would like for that to happen, @rbvermaa -- changing the semantics and expectations of the binary cache like this seems significant enough to warrant an RFC. To that end, I've opened a revert PR: #4535

@rbvermaa
Member

It looks to me that this change might be a bit more controversial than originally thought, given the post-merge activity on this PR.

Given that @grahamc and @domenkozar have both expressed (some of) their hesitations/problems with the change, would it be perhaps a good idea to revert this change for now and start an RFC for this?

@adisbladis I noticed you reacted with a thumbs down, which means you do not agree with what I suggest. Could you please elaborate on why you do not agree? Do you think the change is not controversial, or perhaps you have another reason?

@edolstra Would you mind commenting on your view?

@adisbladis
Member Author

adisbladis commented Feb 10, 2021

@adisbladis I noticed you reacted with a thumbs down, which means you do not agree with what I suggest. Could you please elaborate on why you do not agree? Do you think the change is not controversial, or perhaps you have another reason?

Going straight from "needs more discussion" to "write an RFC" is extreme and an excellent way to quash dissenting viewpoints, as it puts an unreasonable burden on the author to push their work forward.

At this moment (this is just a quick summary; any values should be seen as rough estimates), the average time to get an RFC merged is 113 days, with a median of 48 days.
That time obviously does not include the time needed to actually write an RFC.
That's a significant hurdle only a handful of people are willing to go through: of the 22 currently merged RFCs, there are only 13 distinct authors.

I think reverting with the reason "needs more discussion" and opening an issue is annoying but within the reasonable realm of responses; going straight for an RFC isn't.
If it turns out an RFC is needed after some discussion, then so be it.

@grahamc
Member

grahamc commented Feb 10, 2021

I don't follow the reasoning. RFCs are to explore the design space and come up with a great solution. I don't see it being extreme, but I suppose I also don't see the RFC process as punishment. I think big changes should get proper consideration and time for the large community of users to contribute to it. I would hope that we as a project can follow the process we have agreed to for exploring significant changes.

@rbvermaa
Member

rbvermaa commented Feb 10, 2021

@adisbladis I noticed you reacted with a thumbs down, which means you do not agree with what I suggest. Could you please elaborate on why you do not agree? Do you think the change is not controversial, or perhaps you have another reason?

Going straight from "needs more discussion" to "write an RFC" is extreme and an excellent way to quash dissenting viewpoints, as it puts an unreasonable burden on the author to push their work forward.

At this moment (this is just a quick summary; any values should be seen as rough estimates), the average time to get an RFC merged is 113 days, with a median of 48 days.
That time obviously does not include the time needed to actually write an RFC.
That's a significant hurdle only a handful of people are willing to go through: of the 22 currently merged RFCs, there are only 13 distinct authors.

I think reverting with the reason "needs more discussion" and opening an issue is annoying but within the reasonable realm of responses; going straight for an RFC isn't.
If it turns out an RFC is needed after some discussion, then so be it.

Perhaps I should have suggested two things instead:

  • revert the change so that more discussion can happen, as the change was more controversial than originally anticipated
  • during that discussion, decide whether an RFC is needed for the change

@adisbladis Does that make more sense, and do we agree on that path forward? (edited)

@adisbladis
Member Author

Perhaps I should have suggested two things instead: …

This sounds like a much more reasonable path forwards.

@edolstra
Member

Okay, let's revert until after the 2.4 branch-off for now. We can continue to discuss it here.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-7/11552/1

adisbladis added a commit to nix-community/trustix that referenced this pull request Mar 2, 2021
We can work around the deficiency that I tried to fix in
NixOS/nix#4464 by constructing URLs which
embed all the metadata until such a time that upstream Nix has a truly
content-addressed NAR solution.