buildBazelPackage: poison all the fixed output derivations #80953

andir · 2020-02-24T12:54:45Z

Motivation for this change

This is an attempt at solving a long annoyance with buildBazelPackage.

Currently each fixed output derivation can happily be used throughout many different bazel versions. That might be good as the fetched containt might not have changed betwen (minor) bazel version changes. The risk here is that we update bazel, rebuild everything but are not longer able to actually reproduce the fixed output. This is currently the case with the Tensorflow expression.

We did update TF and/or bazel and right now (on master) it still builds but if you try to reproduce the fixed output content it doesn't work anymore.

With this PR I attempt to mitigate some of the issues with those fetchers. Each of the fixed output derivations will contain the name string of the bazel package that was used to build them. On the consuming end within buildBazelPackage that version string is compared against the current bazel name. If those do not match the build fails. This will avoid silently loosing the ability to reproduce the fixed output paths (in some cases).

I also added the ability to override the Bazel package that should be used for each build. This will be use in some future work.

Currently the TF build does not produce a valid fixed output (anymore?). This will likely require another Bazel version or yet another patch to make it work with the newer Bazel version. I would prefer investing time into having another Bazel version within Nixpkgs.

Opinions?

Previous discussions on this topic were in #76851

Future work

We should verify that all the store paths that are being referenced from within the fixed output derivation are still valid. If they are not valid we should fail the build.
This is feasible and very naïve but would fail basically the day (after) the updated derivation hit Nixpkgs as something along the dependency path has been changed and thus invalidated all the store paths.
Another line of work is to have multiple versions of bazel within nixpkgs. This avoid us having to patch upstream sources to work with our always changing version of Bazel. It shouldn't harm anyone if we support multiple Bazel versions. We are doing this with compilers and various other things already.

Things done

Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
Built on platform(s)
- NixOS
- macOS
Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
Tested execution of all binary files (usually in ./result/bin/)
Determined the impact on package closure size (by running nix path-info -S before and after)
Ensured that relevant documentation is up to date
[] Fits CONTRIBUTING.md.

Profpatsch · 2020-02-24T14:50:55Z

The risk here is that we update bazel, rebuild everything but are not longer able to actually reproduce the fixed output.

Does this mean that if you try to build it instead of fetching it from binary cache, the output hashes don’t match?

* We should verify that all the store paths that are being referenced from within the fixed output derivation are still valid. If they are not valid we should fail the build.

I’m assuming “hashes still valid” means that if you rebuild a fixed-output drv, it produces the specified output hash.

If those do not match the build fails. This will avoid silently loosing the ability to reproduce the fixed output paths (in some cases).

So this is just a faster error path, because it disallows using a fixed-output package that was specified for a different bazel version?

andir · 2020-02-24T15:34:59Z

The risk here is that we update bazel, rebuild everything but are not longer able to actually reproduce the fixed output.

Does this mean that if you try to build it instead of fetching it from binary cache, the output hashes don’t match?

Yeah, that is the exact case we are running into with TF right now. You would probably have to go back to Bazel 0.x (and the correct nixpkgs revision?) to reproduce the output.

* We should verify that all the store paths that are being referenced from within the fixed output derivation are still valid. If they are not valid we should fail the build.
I’m assuming “hashes still valid” means that if you rebuild a fixed-output drv, it produces the specified output hash.

Yes, currently we never check that since we are just happily fetching from the binary cache. There might be references to the store that were valid during "fixed output build time" (e.g. a year ago) but do not match with the currently available paths. This is hard to solve as e.g. bash might be referenced in the fixed output and that regularly changes whenever the the stdenv changes. That is also why I would like to disallow any store path references in the fixed output. I do not know how feasible that is tho…

If those do not match the build fails. This will avoid silently loosing the ability to reproduce the fixed output paths (in some cases).

So this is just a faster error path, because it disallows using a fixed-output package that was specified for a different bazel version?

Yes, exactly. We now get a fast error as soon as (the) one of the obvious issue(s) arises.

uri-canva · 2020-02-24T21:04:02Z

This is hard to solve as e.g. bash might be referenced in the fixed output and that regularly changes whenever the the stdenv changes.

I thought buildBazelPackage already disallowed all references to store paths in the fixed output, since it sets allowedRequisites = []. At least I know it's screamed at me often when I had store paths in it.

andir · 2020-02-24T21:49:34Z

On 13:04 24.02.20, Uri Baghin wrote: > This is hard to solve as e.g. bash might be referenced in the fixed > output and that regularly changes whenever the the stdenv changes. I thought `buildBazelPackage` already disallowed all references to store paths in the fixed output, since it sets `allowedRequisites = []`. At least I know it's screamed at me often when I had store paths in it.

You are probably right. I confused that with some other behaviour that I was looking at. Then the situation isn't as bad as I expected. Less work to be done. :-)

uri-canva · 2020-02-25T00:09:02Z

Isn't this an issue with fixed output derivations in general? Since they check the output hash before checking whether the inputs are the same you can change the inputs and still get the old cached value. This affects more than just bazel version changes, for example other buildInputs changes would also be potentially ignored like that. I think this issue should be addressed in fixed output derivations in general, not worked around with a specific hack like this.

andir · 2020-02-25T10:32:05Z

(This turned out to be a longer response then initially anticipated. Just thought giving more context ot my thoughts is probably not a bad idea.)

On 16:09 24.02.20, Uri Baghin wrote: Isn't this an issue with fixed output derivations in general? Since they check the output hash before checking whether the inputs are the same you can change the inputs and still get the old cached value.

Yes that is true. Do you have an alternative approach to implementing this? Keep in mind that all the usual fetchers used within Nixpkgs are also just fixed output derivations. There we do not care what version of curl was used. We only care about the resulting hash being the same. Fixed output derivations being abused like they are in `buildBazelPackage` and `buildRustPackage` is the issue here. They shouldn't be used to generate code. They should just fetch something and be done.

This affects more than just bazel version changes, for example other `buildInputs` changes would also be potentially ignored like that. I think this issue should be addressed in fixed output derivations in general, not worked around with a specific hack like this.

Yes that is an issue. IMO the entire "fixed output" fetching using bazel (or cargo or whatever) is a hack. They do not operate on a single URL that they fetch but they fetch a complex tree of directories (with generated files?). As I tried to explain in the PR description tracking all the inputs can go both ways. It can either mean that whenever someone changes stdenv you'll have to update all the fixed output hashes. This yields a great amount of useless work nobody wants to do. The other extreme is the current situation (before this PR) where nothing actually invalidates the fixed outputs and thus we continue using stuff that isn't reproducible anymore. In the past we took a similar route (to this PR) within the `buildRustPackage` expression. There `cargo vendor` is used to download all the dependencies before the build is executed. In contrast to what bazel seems to be doing `cargo vendor` is just a fancy way to iterate through URLs and writing those files to a folder. In the rust world we are comparing the lockfile that is part of the fixed output to the lockfile that the sources ship: https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/rust/default.nix#L118 While in the rust ecosystem the way to fetch things is trivial (just invoke `cargo vendor`) and gives you the same outputs every time the same doesn't seem to be true for Bazel. As you at least pass a few arguments to the specific bazel version. I am not trying to solve the entire problem space here. I am trying to make our lifes better by not having silently rotting expressions in nixpkgs. I lost many days of my life by debugging why tensorflow wouldn't build anymore. It doesn't help if the fixed outputs aren't reproducible and you only notice that a year later. The build system should at least yell at you if you generate them from another bazel version. Arguable they should probably also bark at you if the `src` attribute of the package changes. This PR is about finding an intermediate solution. If you have a great idea to fix all of the fixed output pains I am all ears and I am very happy to look into it. For the time being something like this is IMHO eagerly needed.

Profpatsch · 2020-02-25T11:13:13Z

Isn't this an issue with fixed output derivations in general?

Yes, it’s inherent in their design. The person using them has to guarantee referential transparency, which in this case means “running the fixed-output builder is always going to produce the same result, up to some networking issues”. You can kinda assume that for fetchurl (for reasonably stable URLs, but that’s why nixpkgs has a mirror cache for that).

This means the onus here is on the creator of buildBazelPackage, and they have done a bad job by mis-/abusing the power of fixed output derivations, sacrificing correctness for temporary gain (and other people’s lifetime …).

The right solution is having a generator which goes from WORKSPACE files (which contain the relevant dependency’s hashes!) to a nix file with fetchurl calls, like we do for the build of bazel itself.

Profpatsch · 2020-02-25T11:15:00Z

@andir this is still a draft, do you still want to change something?

andir · 2020-02-25T11:21:12Z

On 03:15 25.02.20, Profpatsch wrote: @andir this is still a draft, do you still want to change something?

It would probably make sense to get Tensorflow working again (notice that commit with ^FIXME:). I will try to get an ancient bazel version reincarnated throughout the day.

uri-canva · 2020-02-25T11:40:17Z

I see, that makes sense, thanks for the context. I believe https://docs.bazel.build/versions/2.0.0/command-line-reference.html#flag--repository_cache might be helpful here: the fixed output derivation currently contains all the external repositories, which contain a lot of files that aren't downloaded. That's why there's so much work the derivations that use buildBazelPackage have to do to delete directories from it. The repository cache on the other hand only contains downloaded files, and should be compatible across versions of both bazel and its dependencies. We can then point bazel to the repository cache we create in the fixed output derivation, and it should be able to recreate all of the external repositories without hitting the network (assuming all the repository rules use methods that support the repository cache to download files, for example that won't work for bazel projects that use rules_docker, since that has its own caching mechanism). See https://bazel.build/designs/2016/09/30/repository-cache.html for the design of it.

That should behave much more like we expect fixed output derivations to behave.

To be clear I'm not suggesting this be done instead of this PR, I understand this PR is also helpful in the short term and I don't mean to block it, just suggesting how this could be made better going forward.

andir · 2020-02-25T11:50:56Z

On 03:40 25.02.20, Uri Baghin wrote: I see, that makes sense, thanks for the context. I believe https://docs.bazel.build/versions/2.0.0/command-line-reference.html#flag--repository_cache might be helpful here: the fixed output derivation currently contains all the external repositories, which contain a lot of files that aren't downloaded. That's why there's so much work the derivations that use `buildBazelPackage` have to do to delete directories from it. The repository cache on the other hand only contains downloaded files, and should be compatible across versions of both bazel and its dependencies. We can then point bazel to the repository cache we create in the fixed output derivation, and it should be able to recreate all of the external repositories without hitting the network (assuming all the repository rules use methods that support the repository cache to download files, for example that won't work for bazel projects that use `rules_docker`, since that has its own caching mechanism). See https://bazel.build/designs/2016/09/30/repository-cache.html for the design of it. That should behave much more like we expect fixed output derivations to behave.

Yeah, that really looks like a good alternative approach. Looking at the documentation of older bazel commands makes me feel confident that we could just use that across all versions (for now). Whats the policy for experimental commands in Bazel? Can they go away at any (major) version change? I wouldn't want to put in time for that if that goes away in a version or two without a proper replacement.

To be clear I'm not suggesting this be done instead of this PR, I understand this PR is also helpful in the short term and I don't mean to block it, just suggesting how this could be made better going forward.

Understood. I think the way you pointed out is the right way forward in the long term.

uri-canva · 2020-02-25T11:53:11Z

I assume it hasn't been experimental for a while now, the doc refers to it as experimental, but that was back in 2016.

Profpatsch · 2020-02-25T13:10:37Z

It would probably make sense to get Tensorflow working again (notice that commit with ^FIXME:). I will try to get an ancient bazel version reincarnated throughout the day.

Doesn’t need to be in this PR though, let’s merge it as is if possible

Profpatsch · 2020-02-25T13:14:37Z

We can then point bazel to the repository cache we create in the fixed output derivation

What is the repository cache generated from? Iirc it doesn’t have to be from pinned stuff in the WORKSPACE file. In that case we have the same problem again, so I don’t think that’s a good solution.

We need a lockfile, similar to how all other modern language ecosystems handle it. There has been some work in that direction: https://blog.bazel.build/2018/09/28/first-class-resolved-file.html

andir · 2020-02-25T14:28:25Z

I will try to get an ancient bazel version reincarnated throughout the day.

Just saw the work that @mjlbach did on his fork trying to get bazel_1 to work additionally (https://github.com/mjlbach/nixpkgs/commits/bazel_1). I would recommend we join forces here and land multiple bazel versions before we merge this PR. @mjlbach: Let me know how we can collaborate on this. You changes look good already. What is still missing?

mjlbach · 2020-02-25T15:55:20Z

@andir sounds good :) see pr #81033. One thing to note is I had to add a patch to glibc to get bazel building. I also have tensorflow working with a glibc patch I will submit (https://github.com/mjlbach/nixpkgs/tree/tensorflow_1_compat) see here for the fix (still need to update the sha on non-cuda builds) mjlbach@a5cca6b. But this requires making the version of Bazel for buildBazel configurable, as currently I left bazel_2 as the default bazel package.

mjlbach · 2020-02-25T16:04:22Z

@andir Also see #81035 The one last piece is configuring buildbazel to build with either bazel_1 or bazel (bazel_2) which I have not yet had time to explore. comments on the PRs welcome.

Profpatsch · 2020-02-26T12:10:32Z

One thing to note is I had to add a patch to glibc to get bazel building.

This sounds like it’s waaaay out of scope for this PR. Can we merge what we have?

mjlbach · 2020-02-26T12:11:41Z

Yes, sorry for the cross pollination from #81033

andir · 2020-03-02T16:28:49Z

Ready for final review @Profpatsch @mjlbach

I had to add an older version of tensorflow-estimator as that had been updated to an incompatible version in #81278.

pkgs/build-support/build-bazel-package/default.nix

All bazel fixed output derivations should be specific to the bazel version that was used to generate them. There is not guarantee that the build will still succeed or reproduces (without the cached fixed output) if the fetch phase wasn't rerun with a different bazel version. In the past bazel had been bumped but not all those packages that have fixed outputs from bazel builds. This lead to compiling and somewhat working TF versions that couldn't be reproduced without the cached fixed outputs.

TF 1.15 still needs an older version of the tensorflow-estimator package.

andir mentioned this pull request Feb 24, 2020

bazel: 2.0.0 -> 2.1.0 #80739

Merged

10 tasks

ofborg bot added the 6.topic: python label Feb 24, 2020

ofborg bot requested review from andrew-d, abbradar, timokau, uri-canva and kalbasit and removed request for abbradar February 24, 2020 13:10

ofborg bot added 10.rebuild-darwin: 11-100 10.rebuild-linux: 11-100 labels Feb 24, 2020

Profpatsch mentioned this pull request Feb 28, 2020

Rust Application Migration to fetchCargoTarball Implementation #79975

Closed

andir force-pushed the bazel-poison branch from cdd65dc to b5a921e Compare March 2, 2020 16:26

andir marked this pull request as ready for review March 2, 2020 16:26

andir requested review from FRidh, jonringer and Profpatsch as code owners March 2, 2020 16:26

Profpatsch approved these changes Mar 2, 2020

View reviewed changes

pkgs/build-support/build-bazel-package/default.nix Outdated Show resolved Hide resolved

ofborg bot added the 8.has: package (new) label Mar 2, 2020

andir force-pushed the bazel-poison branch from b5a921e to 6e699c6 Compare March 2, 2020 16:34

ofborg bot requested a review from abbradar March 2, 2020 16:34

uri-canva reviewed Mar 2, 2020

View reviewed changes

pkgs/build-support/build-bazel-package/default.nix Outdated Show resolved Hide resolved

andir added 8 commits March 3, 2020 01:12

gvisor: update fixed output hash

abd312b

bazel-watcher: update fixed output hash

9683826

tensorflow-probabilit: update fixed output hash

fa7b4d5

bazel-remote: update fixed output hash

0ff376c

pythonPackages.dm-sonned: update fixed output hash

84b3678

pythonPackages.tensorflow-estimator_1_15_1: init at 1.15.1

c539f93

TF 1.15 still needs an older version of the tensorflow-estimator package.

tensorflow: update fixed output hashes

642bea6

andir force-pushed the bazel-poison branch from 6e699c6 to 642bea6 Compare March 3, 2020 00:12

ofborg bot requested a review from uri-canva March 3, 2020 00:24

uri-canva approved these changes Mar 3, 2020

View reviewed changes

andir merged commit 846f300 into NixOS:master Mar 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

buildBazelPackage: poison all the fixed output derivations #80953

buildBazelPackage: poison all the fixed output derivations #80953

andir commented Feb 24, 2020

Profpatsch commented Feb 24, 2020

andir commented Feb 24, 2020

uri-canva commented Feb 24, 2020

andir commented Feb 24, 2020 via email

uri-canva commented Feb 25, 2020

andir commented Feb 25, 2020 via email

Profpatsch commented Feb 25, 2020

Profpatsch commented Feb 25, 2020

andir commented Feb 25, 2020 via email

uri-canva commented Feb 25, 2020

andir commented Feb 25, 2020 via email

uri-canva commented Feb 25, 2020

Profpatsch commented Feb 25, 2020 •

edited

Profpatsch commented Feb 25, 2020

andir commented Feb 25, 2020 via email

mjlbach commented Feb 25, 2020 •

edited

mjlbach commented Feb 25, 2020

Profpatsch commented Feb 26, 2020

mjlbach commented Feb 26, 2020

andir commented Mar 2, 2020

buildBazelPackage: poison all the fixed output derivations #80953

buildBazelPackage: poison all the fixed output derivations #80953

Conversation

andir commented Feb 24, 2020

Motivation for this change

Future work

Things done

Profpatsch commented Feb 24, 2020

andir commented Feb 24, 2020

uri-canva commented Feb 24, 2020

andir commented Feb 24, 2020 via email

uri-canva commented Feb 25, 2020

andir commented Feb 25, 2020 via email

Profpatsch commented Feb 25, 2020

Profpatsch commented Feb 25, 2020

andir commented Feb 25, 2020 via email

uri-canva commented Feb 25, 2020

andir commented Feb 25, 2020 via email

uri-canva commented Feb 25, 2020

Profpatsch commented Feb 25, 2020 • edited

Profpatsch commented Feb 25, 2020

andir commented Feb 25, 2020 via email

mjlbach commented Feb 25, 2020 • edited

mjlbach commented Feb 25, 2020

Profpatsch commented Feb 26, 2020

mjlbach commented Feb 26, 2020

andir commented Mar 2, 2020

Profpatsch commented Feb 25, 2020 •

edited

mjlbach commented Feb 25, 2020 •

edited