Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel: 1.2.1 -> 2.0.0 #76851

Merged
merged 2 commits into from Jan 7, 2020
Merged

bazel: 1.2.1 -> 2.0.0 #76851

merged 2 commits into from Jan 7, 2020

Conversation

danjuv
Copy link
Contributor

@danjuv danjuv commented Jan 3, 2020

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @

Copy link
Member

@Br1ght0ne Br1ght0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff LGTM (did not check thoroughly)
bazel --help works
nixpkgs-review doesn't pass

Missing dependencies:

  • python3.7-annoy-1.16.3: h5py
  • python3.7-gym-0.15.4, python3.8-gym-0.15.4: opencv-python

Check failures:

  • python3.7-minio-5.0.5:
    File "/build/minio-5.0.5/minio/helpers.py", line 292, in is_valid_endpoint
      raise InvalidEndpointError('Hostname cannot have a scheme.')
  minio.error.InvalidEndpointError: InvalidEndpointError: message: Hostname cannot have a scheme.

  ----------------------------------------------------------------------
  Ran 133 tests in 0.037s

  FAILED (errors=80)
  Test failed: <unittest.runner.TextTestResult run=133 errors=80 failures=0>
  error: Test failed: <unittest.runner.TextTestResult run=133 errors=80 failures=0>

@Profpatsch
Copy link
Member

@GrahamcOfBorg build bazel.tests

@Profpatsch
Copy link
Member

@GrahamcOfBorg build bazel.tests.protobuf

@Profpatsch
Copy link
Member

Looks like all tests pass. Merging.

@Profpatsch Profpatsch merged commit 0a6551f into NixOS:master Jan 7, 2020
@prusnak
Copy link
Member

prusnak commented Jan 18, 2020

This broke the tensorflowWithCuda build in the master branch. tensorflowWithCuda is not built by default by hydra, because it has a non-free nvidia dependency.

I would also expect to keep both versions of the package. We are talking about the build system here, not some random leaf package where major version update is most probably fine.

@Profpatsch
Copy link
Member

This broke the tensorflowWithCuda build in the master branch. tensorflowWithCuda is not built by default by hydra, because it has a non-free nvidia dependency.

We can’t block on tensorflow when merging bazel updates, because that’s a much too complex package (that is to say from what I hear it’s a shitfest to maintain). We couldn’t add it to downstream tests in the first place anyway, because of the nonfree blobs.

I would also expect to keep both versions of the package. We are talking about the build system here, not some random leaf package where major version update is most probably fine.

Bazel just changed their versioning, they are going to release a new major version every three months for the foreseeable future as far as I can see.

nixpkgs is a snapshot, so if you need a workling tensorflow you can pin to an older version of nixpkgs until the tensorflow maintainers fix it to work with the current bazel. If you really need tensorflow to work on master, you can revert this patchset on top of e.g. a master checkout or a fork.

@timokau
Copy link
Member

timokau commented Jan 26, 2020

We can’t block on tensorflow when merging bazel updates

I think we can, and we should. It is a bit frustrating that tensorflow breaks every couple of months (hopefully now with the new release cadence every couple of months) because yet another bazel update was merged without testing tensorflow.

That said, this is a bit of an outlier since this time its actually tensorflowWithCuda which fails. I don't think we need to test that on every bazel update, since I think in 99% of the cases it should be fine as long as tensorflow works. Since its unfree, I can understand why people wouldn't want to test it.

@Profpatsch
Copy link
Member

I think we can, and we should.

Well, I (personally) certainly cannot. But if you want, I can add you as maintainer to bazel so that you are pinged and can test it before we merge.

@timokau
Copy link
Member

timokau commented Jan 28, 2020

Unfortunately I'm not willing to take on that responsibility alone either. I already probably shouldn't be in tensorflow's maintainer list, as I only maintain it on an as-needed (as in when I personally need it) basis. Still, I think we, as a community, should. Tensorflow is the biggest consumer of bazel in nixpkgs. In fact, its close to the only one. Its also very popular.

Hopefully someone interested in keeping tensorflow running will take you up on the offer and add themselves to the bazel maintainer list. They can then test tensorflow on new bazel releases, but they should get reasonable time to do so before merging a new bazel release.

@Profpatsch
Copy link
Member

Profpatsch commented Jan 28, 2020

I’m afraid I must echo @justinwoo’s sentiment here:

https://twitter.com/jusrin00/status/1221371774497181698

The “community” is just people doing things because they want to, everything else is either paid or abuse.

I’m maintaining bazel in nixpkgs for https://github.com/tweag, I’m sure we can help with tensorflow, too, for reasonable rates.

@deliciouslytyped
Copy link
Contributor

Can't you guys just have a "stable" version of bazel for tensorflow, and the updated bazel?
Nixpkgs has done things like this before.

@andir
Copy link
Member

andir commented Jan 28, 2020

I agree with @deliciouslytyped.

Just because Bazel releases a new version that doesn't mean everyone should stop what they are doing and update to the latest Bazel version.

Is there a reason we can't have multiple bazel versions? I mean we've done that for others as well and it would probably not hurt allowing consumers of nixpkgs use older versions of Bazel. Especially if Bazel (rightfully) breaks things on the next major version it might be wise to just keep the old version around. For how long? I don't know. Does it hurt us carrying multiple versions? Probably not, we just provide the same loose guarantees as we currently do. Stuff might just not be use able (as it is right now).

@Profpatsch
Copy link
Member

If we know exactly which version tensorflow works with, I guess we could keep one around for that. It would have to be maintained by tensorflow maintainers though.

@andir
Copy link
Member

andir commented Jan 30, 2020

Now we just need someone that is willing to do that work.. I am certainly not. Already have my fair share of pain every day. :)

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/january-2020-in-nixos/5771/1

@mjlbach
Copy link
Contributor

mjlbach commented Feb 23, 2020

I'm willing to do the work, but I'm having trouble building bazel when I cherry pick 4fdea73 into master.

[42 / 1,642] 8 actions running
//src/main/java/com/google/devtools/build/lib:bazel/BazelServer; 0s local 
Action .../google/devtools/build/lib/runtime/commands/LICENSE; 0s local 
Compiling external/bazel_tools/.../zlib/zutil.c [for host]; 0s local
Compiling external/bazel_tools/.../ijar/zip_main.cc [for host]; 0s local
@com_google_protobuf//:protoc_lib; 0s local 
Compiling third_party/grpc/src/core/lib/gpr/log_linux.cc; 0s local  
@com_google_protobuf//:protoc_lib; 0s local
@com_google_protobuf//:protoc_lib; 0s local 
ERROR: /build/bazel_src/third_party/grpc/BUILD:388:1: C++ compilation of rule '//third_party/grpc:gpr_base' failed (Exit 1) 
[44 / 1,642] 7 actions running
//src/main/java/com/google/devtools/build/lib:bazel/BazelServer; 0s local 
Action .../google/devtools/build/lib/runtime/commands/LICENSE; 0s local  
Compiling external/bazel_tools/.../ijar/zip_main.cc [for host]; 0s local
@com_google_protobuf//:protoc_lib; 0s local 
@com_google_protobuf//:protoc_lib; 0s local 
@com_google_protobuf//:protoc_lib; 0s local
 JavacBootstrap .../lib/shell/libshell-skylark.jar [for host]; 0s local third_party/grpc/src/core/lib/gpr/log_linux.cc:43:13: error: ambiguating new declaration of 'long int gettid()'    43 | static long gettid(void) { return syscall(__NR_gettid); }       |             ^~~~~~ In file included from /nix/store/6gv97caz9j97bgq4kb7wjxs75znzd6bb-glibc-2.30-dev/include/unistd.h:1170,                  from third_party/grpc/src/core/lib/gpr/log_linux.cc:41: /nix/store/6gv97caz9j97bgq4kb7wjxs75znzd6bb-glibc-2.30-dev/include/bits/unistd_ext.h:34:16: note: old declaration '__pid_t gettid()'    34 | extern __pid_t gettid (void) __THROW;       |                ^~~~~~ third_party/grpc/src/core/lib/gpr/log_linux.cc:43:13: warning: 'long int gettid()' defined but not used [-Wunused-function]    43 | static long gettid(void) { return syscall(__NR_gettid); }       |             ^~~~~~ [44 / 1,642] 7 actions running     //src/main/java/com/google/devtools/build/lib:bazel/BazelServer; 0s local     Action .../google/devtools/build/lib/runtime/commands/LICENSE; 0s local     Compiling external/bazel_tools/.../ijar/zip_main.cc [for host]; 0s local     @com_google_protobuf//:protoc_lib; 0s local     @com_google_protobuf//:protoc_lib; 0s local     @com_google_protobuf//:protoc_lib; 0s local     JavacBootstrap .../lib/shell/libshell-skylark.jar [for host]; 0s local Target //src:bazel_nojdk failed to build 
[51 / 1,642] checking cached actions INFO: Elapsed time: 5.687s, Critical Path: 0.22s
[51 / 1,642] checking cached actions INFO: 1 process: 1 local.
[51 / 1,642] checking cached actions 

FAILED: Build did NOT complete successfully FAILED: Build did NOT complete successfully
ERROR: Could not build Bazel builder for '/nix/store/bippk1116v8gaf7085bb9m9z4v367fj5-bazel-1.2.1.drv' failed with exit code 1 error: build of '/nix/store/bippk1116v8gaf7085bb9m9z4v367fj5-bazel-1.2.1.drv' failed

Here's my very sloppy prototype branch for adding a bazel 1 https://github.com/mjlbach/nixpkgs/tree/bazel_1

@andir
Copy link
Member

andir commented Feb 24, 2020

I briefly looked into this and also ran in the above error. I've also tried downgrading GCC but that also didn't really help :/

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Feb 24, 2020

_TF_MIN_BAZEL_VERSION = '0.24.1'
_TF_MAX_BAZEL_VERSION = '0.26.1'

that's for 1.15.2

@mjlbach
Copy link
Contributor

mjlbach commented Feb 24, 2020

Yes, but it does build correctly on 4fdea73 with 1.2.1

@mjlbach
Copy link
Contributor

mjlbach commented Feb 24, 2020

After bisect e80a85a seems to break bazel 1.2.1, 748c42b is the last commit I can build bazel 1.2.1 on maybe we should open a separate tracking issue?

@Profpatsch
Copy link
Member

maybe we should open a separate tracking issue?

yeah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants