Build Tensorflow from source #30434

Merged
merged 3 commits into from Oct 19, 2017

Member

@abbradar abbradar commented Oct 15, 2017

Motivation for this change

(Do not merge before #30433!)

What it says.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

This is based atop #30433 , so there are extra commits (you are only interested in the last two).

A possible problem with this is the long compile time with CUDA (half an hour or more), where previously we used prebuilt binaries.

Bazel is a wonderful tool for building projects -- it's certainly a genius idea to create and use a 500k LoC Java build manager tool which eats 1.5G of memory for building C++. Unfortunately, Nix is not enterprise enough, so our ways of isolating builds are not supported by Bazel, making it effectively impossible to build a project with it in Nix (which is why TensorFlow used a binary distribution before). Specifically, Bazel tries to download all external project dependencies by itself, which fails because of Nix isolation. In vanilla Bazel there is no way to circumvent this (please correct me if there's a better way!).

My solution is to add a specially patched version of Bazel which skips verifying the checksums of fetched dependencies (these checksums depend, among other things, on build directory paths). Then we can do a two-staged build as described in 9b3559f.

This, along with TensorFlow's own peculiarities (special scripts for everything, awful compile times because all dependent libraries like CURL or ffmpeg are built statically, incomprehensible build system scripts even considering Bazel etc.), made this PR the work of three or four straight days. I'd say this gets a solid second place in my list of most awful packages, after Telegram.

Sorry for the rant ~_~

@abbradar abbradar requested a review from FRidh as a code owner October 15, 2017 13:11
@abbradar
Member Author

cc @jyp -- I won't have access to an NVIDIA GPU for quite some time yet, so it'd be cool if you could test this.

@abbradar
Member Author

@TravisWhitaker , you may also be interested in this.

@copumpkin
Member

Your bazel changes seem helpful more broadly than here too!

@abbradar
Member Author

@copumpkin Yeah, I'd imagine this approach can be used to build any other Bazel project on Nix. If you have anything else in mind we could merge the Bazel patch separately faster.

@TravisWhitaker
Contributor

@abbradar This looks excellent, thanks so much for doing this! I'll give this a go tomorrow for sure.

};

cudatoolkit8 = common {
version = "8.0.61";
- url = https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run;
+ url = "https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run";
Member

Why these quotes?

Member Author

I semi-consciously replace unquoted URLs with quoted ones, because otherwise my terminal treats the ; as part of the URL and doesn't let me open it with a click.

@edolstra
Member

What is the build time for Tensorflow?

@abbradar
Member Author

abbradar commented Oct 15, 2017

@edolstra ~20 minutes for a pure CPU build and ~35 for a CUDA-enabled one on my machine -- this is very subjective though, I didn't take any measurements.

EDIT: this is on Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz (Haswell laptop-grade i7).

@edolstra
Member

Okay, that's not too bad :-)

@jb55
Contributor

jb55 commented Oct 16, 2017

Impressive. I gave up on this one awhile back for sanity reasons. Packages be warned, don't mess with @abbradar .

@jyp
Contributor

jyp commented Oct 16, 2017

@abbradar Bazel fails to build while trying to access /tmp. Here is the tail of the log:

INFO: From Compiling src/main/cpp/blaze.cc:
<command-line>:0:0: warning: "_FORTIFY_SOURCE" redefined
<command-line>:0:0: note: this is the location of the previous definition
INFO: From JavacBootstrap src/java_tools/buildjar/java/com/google/devtools/build/buildjar/libbootstrap_JarOwner.jar [for host]:
warning: Implicitly compiled files were not subject to annotation processing.
  Use -proc:none to disable annotation processing or -implicit to specify a policy for implicit compilation.
1 warning
Target //src:bazel up-to-date:
  bazel-bin/src/bazel
INFO: Elapsed time: 101.997s, Critical Path: 92.72s
WARNING: /tmp/bazel_XdUveRNx/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_XdUveRNx/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions.

Build successful! Binary is here: /tmp/nix-build-bazel-0.4.5.drv-0/output/bazel
Error: mkdir('/tmp/.bazel'):  bV
builder for ‘/nix/store/m0c7wz18nmwi9zh3zy54478l73qddv1y-bazel-0.4.5.drv’ failed with exit code 36
cannot build derivation ‘/nix/store/igx4pla9mmc5p475z65xmwz7x0nzvqy8-python3.6-tensorflow-1.3.1.drv’: 1 dependencies couldn't be built
cannot build derivation ‘/nix/store/73ram5ch0sp485gk6pn28wcvgg51x9zy-python3-3.6.2-env.drv’: 1 dependencies couldn't be built
error: build of ‘/nix/store/73ram5ch0sp485gk6pn28wcvgg51x9zy-python3-3.6.2-env.drv’ failed
/usr/local/bin/nix-shell: failed to build all dependencies

@abbradar
Member Author

@jyp Not sure what happens here. Do you have sandbox enabled?

@jyp
Contributor

jyp commented Oct 16, 2017

@abbradar No, I am not using sandboxing. Should I?

I see now that /tmp/.bazel already exists, owned by nixbld4. I am wondering if the error was due to a race condition because I was using a multiprocess build. I'll try to delete the directory and re-run the build.

@abbradar
Member Author

@jyp Yeah, it's disabled by default for users for performance reasons, but practically required once you start building stuff.

nix.useSandbox = true;

@jyp
Contributor

jyp commented Oct 16, 2017

@abbradar
Using a single process I could build bazel, but tensorflow does not build. There is apparently a problem with libstdc++. Here is the tail of the log.

/nix/store/qxb0x9fcs60zv719yy0zm95gyd8mmgks-glibc-2.25-49-dev/include/features.h:373:4: warning: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Wcpp]
 #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
    ^
In file included from /nix/store/iwkcli8sy3rs7ks6kw7vclhwpgjn5bgb-cudatoolkit-8.0.61-unsplit/bin/..//include/host_config.h:173:0,
                 from /nix/store/iwkcli8sy3rs7ks6kw7vclhwpgjn5bgb-cudatoolkit-8.0.61-unsplit/bin/..//include/cuda_runtime.h:78,
                 from <command-line>:0:
/nix/store/qxb0x9fcs60zv719yy0zm95gyd8mmgks-glibc-2.25-49-dev/include/features.h:373:4: warning: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Wcpp]
 #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
    ^
ERROR: /tmp/nix-build-python3.6-tensorflow-1.3.1.drv-0/tensorflow-v1.3.1-src/output/external/jpeg/BUILD:167:1: Executing genrule @jpeg//:simd_x86_64_assemblage23 failed: bash failed: error executing command /nix/store/h404wfcz8rzzlq8vr4z7plcijwzfci72-bash-4.4-p12/bin/bash -c ... (remaining 1 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 127.
bazel-out/host/bin/external/nasm/nasm: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
____Building complete.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
____Elapsed time: 85.798s, Critical Path: 60.86s
builder for ‘/nix/store/igx4pla9mmc5p475z65xmwz7x0nzvqy8-python3.6-tensorflow-1.3.1.drv’ failed with exit code 1
cannot build derivation ‘/nix/store/73ram5ch0sp485gk6pn28wcvgg51x9zy-python3-3.6.2-env.drv’: 1 dependencies couldn't be built
error: build of ‘/nix/store/73ram5ch0sp485gk6pn28wcvgg51x9zy-python3-3.6.2-env.drv’ failed
/usr/local/bin/nix-shell: failed to build all dependencies

@abbradar
Member Author

abbradar commented Oct 16, 2017

Wha~... Not sure what happens here again, haven't seen that before. What do you mean by "single process"?

EDIT: do you use my branch or cherry-pick my patches atop other tree?

@jyp
Contributor

jyp commented Oct 16, 2017

@abbradar I meant not using "-j n" option.

After activating sandboxing (using /etc/nix/nix.conf, because I am not on NixOS but Fedora), I am getting the same error, but apparently at a different point:

____[1,567 / 4,608] Writing file tensorflow/core/libfunctional_ops_op_lib.pic.lo-2.params
____[1,591 / 4,769] Writing file tensorflow/core/kernels/libsegment_reduction_ops_gpu.pic.lo-2.params
____[1,653 / 4,926] Writing file tensorflow/core/kernels/liblookup_table_op.pic.lo-2.params
____[1,703 / 4,982] Compiling external/boringssl/linux-x86_64/crypto/bn/x86_64-mont5.S
____[1,817 / 4,982] Compiling external/boringssl/src/crypto/bn/add.c
____[1,828 / 4,982] Compiling external/boringssl/src/crypto/refcount_lock.c
____[1,931 / 4,982] Compiling external/boringssl/src/crypto/bn/mul.c
____[1,965 / 4,982] Compiling external/boringssl/src/crypto/hmac/hmac.c
____[2,045 / 4,987] Compiling external/curl/lib/hash.c
____[2,086 / 4,996] Writing file external/highwayhash/libsip_hash.a-2.params [for host]
____[2,098 / 5,010] Compiling external/png_archive/pngmem.c [for host]
____[2,243 / 5,017] Compiling external/grpc/src/core/lib/iomgr/load_file.c
____[2,244 / 5,017] Compiling external/grpc/src/core/lib/iomgr/timer_heap.c
____[2,245 / 5,017] Compiling external/grpc/src/core/lib/iomgr/udp_server.c
____[2,252 / 5,017] Compiling external/grpc/src/core/lib/surface/server.c
____[2,245 / 5,017] Compiling external/grpc/src/core/lib/iomgr/timer.c
ERROR: /tmp/nix-build-python3.6-tensorflow-1.3.1.drv-0/tensorflow-v1.3.1-src/output/external/protobuf/BUILD:269:1: Executing genrule @protobuf//:generate_js_well_known_types_embed failed: bash failed: error executing command /nix/store/h404wfcz8rzzlq8vr4z7plcijwzfci72-bash-4.4-p12/bin/bash -c ... (remaining 1 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 127.
bazel-out/host/bin/external/protobuf/js_embed: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
____Building complete.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
____Elapsed time: 77.122s, Critical Path: 55.70s
builder for ‘/nix/store/igx4pla9mmc5p475z65xmwz7x0nzvqy8-python3.6-tensorflow-1.3.1.drv’ failed with exit code 1
cannot build derivation ‘/nix/store/73ram5ch0sp485gk6pn28wcvgg51x9zy-python3-3.6.2-env.drv’: 1 dependencies couldn't be built
error: build of ‘/nix/store/73ram5ch0sp485gk6pn28wcvgg51x9zy-python3-3.6.2-env.drv’ failed
/usr/local/bin/nix-shell: failed to build all dependencies

@abbradar
Member Author

abbradar commented Oct 16, 2017

About single process: this seems to be a bug in our Bazel package then, I'll look at this later.

About your setup: do you use single-user or multi-user Nix?

Try to build it with --keep-failed and poke around in the build directory. Let's start with ldd path/to/failed/binary...

EDIT: patchelf --print-rpath path/to/failed/binary will also be interesting.

@jyp
Contributor

jyp commented Oct 16, 2017 via email

@abbradar
Member Author

abbradar commented Oct 16, 2017

It seems to impurely use your host OS's libraries for some reason. That shouldn't happen with sandboxing. Do you use nix-daemon?

@jyp
Contributor

jyp commented Oct 16, 2017

@abbradar Yes, I am using nix-daemon. I suppose that I should re-build bazel with sandboxing enabled. However I do not know how to invalidate entries in the nix build cache.

@abbradar
Member Author

nix-store --delete /nix/store/foo

@abbradar
Member Author

I managed to get my laptop's hybrid NVIDIA GPU working with CUDA and verified that TensorFlow works for me.

@jyp
Contributor

jyp commented Oct 16, 2017

That's great :) But unfortunately I am still getting the same error after re-compiling bazel and tensorflow. How can I verify that the sandbox mode is enabled?

@abbradar
Member Author

Try to build this:

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "foo";
  buildCommand = ''
    cp /tmp/test-file $out
  '';
}

with touch /tmp/test-file; nix-build foo.nix. It should fail if sandboxing is enabled, because the host's /tmp/test-file is not visible inside the sandbox.

@jyp
Contributor

jyp commented Oct 16, 2017

Thanks. So for some reason nix-* disregards my configuration

@abbradar
Member Author

Did you restart nix-daemon after your change? You set build-use-sandbox = true, correct?

@TravisWhitaker
Contributor

I'm also hitting Error: mkdir('/tmp/.bazel'). I simply checked out abbradar/tensorflow and ran:

$ nix-build --option build-use-sandbox true -E 'with import ./default.nix {}; pythonPackages.tensorflow.override {cudaSupport = true; cudaCapabilities = ["3.7" "6.1"];}'

I had this problem with my impure Tensorflow derivation too (I need to build with a specific Bazel version), and hacked around it with:

# Workaround: remove any leftover /tmp/.bazel before the build and again after fixup.
bazel.overrideAttrs (old: {
  preBuild = ''
    rm -rf /tmp/.bazel
  '';

  postFixup = ''
    rm -rf /tmp/.bazel
  '';
});

@jyp
Contributor

jyp commented Oct 17, 2017

@abbradar Yes, that is what I did, with apparently no effect.

@abbradar
Member Author

abbradar commented Oct 17, 2017

@jyp Strange! What happens to you with sudo nix-build --option build-use-sandbox true -A tensorflow (sudo is important but don't worry, it won't result in builds from root)?

EDIT: but just to be sure: nix-daemon works for you, i.e. builds are usually done from nixbld* users -- correct?

@jyp
Contributor

jyp commented Oct 17, 2017

Finally I managed to enable the sandbox and I'm getting yet another error:

🍃  Building Bazel with Bazel.
.WARNING: /tmp/bazel_GwJ9FX3o/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_GwJ9FX3o/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions.
INFO: Found 1 target...
ERROR: /tmp/nix-build-bazel-0.4.5.drv-0/third_party/zlib/BUILD:9:1: C++ compilation of rule '//third_party/zlib:zlib' failed: gcc failed: error executing command 
  (cd /tmp/bazel_GwJ9FX3o/out/execroot/nix-build-bazel-0.4.5.drv-0 && \
  exec env - \
    PATH=/nix/store/qh1ikg5lsnhiglsrfjz9k69fhzq17v8h-gcc-wrapper-6.4.0/bin:/nix/store/rww78vdn2rkayrnqsjl8ib5iq2vfm3sw-gcc-6.4.0/bin:/nix/store/9i5bj3c48vrdb4m4khbhaipvhy3m0bh0-binutils-2.28.1/bin:/nix/store/x0l1173jd26pnagv1ydsqhj4alfd2l9z-glibc-2.25-49-bin/bin:/nix/store/8qh2yq93x7ijvkvrf9gi0jhxr8jwh341-coreutils-8.28/bin:/nix/store/ci380q0m2lwv2w98rgqdbqh08zyhbxly-openjdk-8u144b01/bin:/nix/store/mlhv28m1v1glgxf1i07yy2xmw70slcd8-openjdk-8u144b01-jre/bin:/nix/store/r7diwi12ld7n2v3z6ybvdnl74zh7bzfl-zip-3.0/bin:/nix/store/davzb06kryh486ynm7q7kwg387rvlx63-unzip-6.0/bin:/nix/store/pdhd6442zdmc6cwl43mvk5qjjwsvgd3d-which-2.21/bin:/nix/store/84x58f3nz0807v9fby44c8k5sz1hdvss-patchelf-0.9/bin:/nix/store/x3aabh4qy5kfmnkljcm0fmnjvk83mkf8-paxctl-0.9/bin:/nix/store/qh1ikg5lsnhiglsrfjz9k69fhzq17v8h-gcc-wrapper-6.4.0/bin:/nix/store/rww78vdn2rkayrnqsjl8ib5iq2vfm3sw-gcc-6.4.0/bin:/nix/store/9i5bj3c48vrdb4m4khbhaipvhy3m0bh0-binutils-2.28.1/bin:/nix/store/x0l1173jd26pnagv1ydsqhj4alfd2l9z-glibc-2.25-49-bin/bin:/nix/store/8qh2yq93x7ijvkvrf9gi0jhxr8jwh341-coreutils-8.28/bin:/nix/store/h404wfcz8rzzlq8vr4z7plcijwzfci72-bash-4.4-p12/bin:/nix/store/8qh2yq93x7ijvkvrf9gi0jhxr8jwh341-coreutils-8.28/bin:/nix/store/bbjlrsdwxx6syy6qyfhnzykrl2mx2sx2-findutils-4.6.0/bin:/nix/store/27zzah3yb7r2zsx3l1mjmsavfyri55zr-diffutils-3.6/bin:/nix/store/fnd9290qsby769l8zmh982yrcmxc5qj8-gnused-4.4/bin:/nix/store/sdz6f70558njbvv005a5acp8wz1jq7xi-gnugrep-3.1/bin:/nix/store/b41mihdw1a7zzal16kcdpabrbvqhf7xr-gawk-4.1.4/bin:/nix/store/q6nivamar2h6havsq73iyhcdnfd5jw7h-gnutar-1.29/bin:/nix/store/42y51bqv59l4x5dlr0j9c13n9vbsd7cd-gzip-1.8/bin:/nix/store/migmjb53v38qrjxa0bb6v3wmrwpv7d2y-bzip2-1.0.6.0.1-bin/bin:/nix/store/fp9z69i41s78k3m69p11aynm1hfxapna-gnumake-4.2.1/bin:/nix/store/h404wfcz8rzzlq8vr4z7plcijwzfci72-bash-4.4-p12/bin:/nix/store/99v6v4qvk2p4ynpqsy0yqzkrfjdqpj8g-patch-2.7.5/bin:/nix/store/bhw4c05qqvvw78rd89kl6j2hrab7icgl-xz-5.2.3-bin/bin \
    TMPDIR=/tmp \
  /nix/store/qh1ikg5lsnhiglsrfjz9k69fhzq17v8h-gcc-wrapper-6.4.0/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/nix/store/qh1ikg5lsnhiglsrfjz9k69fhzq17v8h-gcc-wrapper-6.4.0/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -g0 -MD -MF bazel-out/host/bin/third_party/zlib/_objs/zlib/third_party/zlib/gzclose.d -iquote . -iquote bazel-out/host/genfiles -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem third_party/zlib -isystem bazel-out/host/genfiles/third_party/zlib -isystem external/bazel_tools/tools/cpp/gcc3 -w -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c third_party/zlib/gzclose.c -o bazel-out/host/bin/third_party/zlib/_objs/zlib/third_party/zlib/gzclose.o): java.io.IOException: Cannot run program "/tmp/bazel_GwJ9FX3o/out/execroot/nix-build-bazel-0.4.5.drv-0/_bin/process-wrapper" (in directory "/tmp/bazel_GwJ9FX3o/out/execroot/nix-build-bazel-0.4.5.drv-0"): error=2, No such file or directory.
Target //src:bazel failed to build
INFO: Elapsed time: 13.925s, Critical Path: 0.46s

ERROR: Could not build Bazel

@abbradar
Member Author

abbradar commented Oct 17, 2017

Hmm, can you allow /bin/sh in your sandbox? I think that's the problem.

build-sandbox-paths = /bin/sh and also add all libraries that it depends on. You can do it by running:

with import <nixpkgs> {};
stdenv.mkDerivation {
  name = "foo";
  buildCommand = ''
    /bin/sh -c 'touch $out'
  '';
}

and fixing your /bin/sh until it works.

@abbradar
Member Author

I've pushed an update which hopefully fixes Bazel's /tmp problems.

@edolstra , I've measured TensorFlow compile time with CUDA enabled on my machine -- 55 minutes. Seems I tend to strongly underestimate time ^_^"

@copumpkin
Member

@abbradar any thoughts on upstreaming your changes to make Bazel make fewer assumptions about its build environment?

@abbradar
Member Author

@copumpkin The problem is that my patch is very much a hack: it breaks Bazel's environment isolation (which is really a bad thing, but we want it since we have our own!) and breaks its checksums to achieve network isolation.

All other changes are our usual Nix package building business like s,/bin/sh,${stdenv.shell}/bin/sh,g :D

@jyp
Contributor

jyp commented Oct 17, 2017

You can do it by running:

with import <nixpkgs> {};
stdenv.mkDerivation {
  name = "foo";
  buildCommand = ''
    /bin/sh -c 'touch $out'
  '';
}

and fixing your /bin/sh until it works.

I don't understand what this means.

@abbradar
Member Author

@jyp This derivation should work for you. If it doesn't, that means your sandbox doesn't include /bin/sh (which it should). To fix that you need to add build-sandbox-paths = /bin/sh .... to your Nix configuration, where ... are libraries which are needed by your sh. You can find them out by running ldd /bin/sh -- all those should be added to the option, separated by spaces.
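The step described above can be sketched as a small shell helper. This is only an illustration, not something from the PR: the function name sandbox_paths_line is made up, and it assumes ldd prints its usual "name => /path (address)" lines. It reads ldd output on stdin and prints a candidate build-sandbox-paths line for the Nix configuration.

```shell
# Hypothetical helper (not part of the PR): turn `ldd` output into a
# build-sandbox-paths line. $1 is the shell binary; ldd output comes on stdin
# so the parsing can be tried without touching the real system.
sandbox_paths_line() {
  awk -v bin="$1" '
    # "libc.so.6 => /lib64/libc.so.6 (0x...)": take the resolved path
    $2 == "=>" && $3 ~ /^\// { paths = paths " " $3; next }
    # "/lib64/ld-linux-x86-64.so.2 (0x...)": the dynamic loader itself
    $1 ~ /^\//               { paths = paths " " $1 }
    END { print "build-sandbox-paths = " bin paths }
  '
}

# Typical use:
#   ldd /bin/sh | sandbox_paths_line /bin/sh
```

Entries such as linux-vdso have no path on disk and are skipped. If the listed libraries are symlinks, their targets may need to be added as well, so treat the output as a starting point and re-run the test derivation until it builds.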

@jyp
Contributor

jyp commented Oct 17, 2017

@abbradar It works :) And thanks a lot for guiding me through all this!

A few remarks:

  1. shouldn't the build also work without sandboxing? I've been using nix for years and it's the first time that I needed that.
  2. I am guessing that the build won't work on darwin. (Does bazel even build on darwin?)
  3. did you manage to fix the issue with /tmp?
  4. I am still getting the messages about not using SSE and whatnot:
 2017-10-17 22:00:43.873246: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-17 22:00:43.873274: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-17 22:00:43.873282: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-17 22:00:43.873288: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-17 22:00:43.873295: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

Could we enable those build flags?

@abbradar
Member Author

  1. We can try, but this requires further crawling in Bazel's build system (eeeew), which is responsible for picking up the host's libraries -- and for little gain, because once this is upstream you'd always get Bazel from Hydra;
  2. I'd like to make this work on Darwin to avoid regression but I don't have a machine available to check. The only thing I can do is merge this and build-fix-build on Hydra. Bazel should work on Darwin I think (after all it's not like they cross-compile TensorFlow from Linux so it means its build system is working);
  3. Not sure but may be fixed with the new patch above;
  4. We can't do this because TensorFlow doesn't support any kind of fallback -- i.e. if you enable AVX and then run it on a machine without AVX it'd just crash.
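
To judge whether a particular machine could run a build with those instruction sets enabled, one can compare what the CPU advertises against what the build would require. A rough sketch, assuming Linux's /proc/cpuinfo format (where the flags are spelled sse4_1, sse4_2, avx, avx2 and fma); the function name missing_cpu_flags is made up for illustration.

```shell
# Hypothetical helper: print every requested CPU flag that is NOT advertised.
# Arguments are the required flags; cpuinfo text is read from stdin.
missing_cpu_flags() {
  awk -v want="$*" '
    # Use the first "flags : ..." line; fields 3..NF are the flag names.
    /^flags[ \t]*:/ && !seen { seen = 1; for (i = 3; i <= NF; i++) have[$i] = 1 }
    END {
      n = split(want, w, " ")
      for (i = 1; i <= n; i++) if (!(w[i] in have)) print w[i]
    }
  '
}

# Typical use on Linux (no output means every flag is present):
#   missing_cpu_flags sse4_1 sse4_2 avx avx2 fma < /proc/cpuinfo
```

This only describes the machine it runs on; a binary built on Hydra still has to work on every user's CPU, which is the concern raised in point 4.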

@jyp
Contributor

jyp commented Oct 18, 2017

It would be nice to be able to make the flags available anyway; I am guessing that they make quite a bit of performance difference. Also, some options are pretty much safe to enable (SSE) since they were introduced on CPUs 10 years ago. (Who is using tensorflow on such old machines?)


disabled = isPyPy || pythonOlder "2.6" || (isPy3k && pythonOlder "3.3");

src = fetchurl {
Member

fetchPypi

# the fix for which hasn't been merged yet.

# keep Nose around since running the tests by hand is possible from Python or bash
buildInputs = [ nose ];
Member

checkInputs

@abbradar
Member Author

@FRidh Fixed in #30433 -- those changes here are only because I base on that branch.

* Skip verifying checksums for already fetched packages.
  Needed for two-staged building in Nix:
    1. Build a fixed derivation with `bazel fetch` (with non-reproducible bits filtered out).
    2. Build an actual derivation which uses fetched dependencies (skipping
       checksums needed here because they depend on the build directory).
* Don't clean environment variables for children processes.
  Needed for Nix compiler wrappers.

Build from source.

It's implemented as a two-staged Bazel build (see also
546b4aec776b3ea676bb4d58d89751919ce4f1ef).

@cstrahan
Contributor

cstrahan commented Nov 5, 2017

  • Don't clean environment variables for children processes.
    Needed for Nix compiler wrappers.

You should be able to use --action_env=VARIABLE to preserve a given environment variable: https://bazel.build/designs/2016/06/21/environment.html

It might be better to enumerate NIX_* variables and add an action_env flag for each, rather than patching bazel to preserve all env vars.
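
A minimal sketch of that enumeration, assuming a Bazel version in which --action_env is actually implemented (passing --action_env=NAME without a value forwards the variable's value from the invoking environment). The function name nix_action_env_flags and the target in the usage comment are made up for illustration.

```shell
# Hypothetical helper: print one --action_env flag per NIX_* variable that is
# present in the current environment, in sorted order.
nix_action_env_flags() {
  # awk's ENVIRON array holds the environment; only the BEGIN block runs,
  # so no input is read.
  awk 'BEGIN {
    for (name in ENVIRON)
      if (name ~ /^NIX_/) print "--action_env=" name
  }' | sort
}

# Typical use inside a build phase (the target name is a placeholder):
#   bazel build $(nix_action_env_flags) //hypothetical:target
```

Only variable names are emitted, so values containing spaces never pass through the shell's word splitting.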

Regarding downloads, generally (that is, assuming the respective build systems are well behaved) you should be able to specify a repository for anything that would need to be downloaded, which you can point at a store path when building as part of a Nix derivation. You can see where I leverage this to point Envoy's build system at third-party dependencies that are built by Nix:

new_local_repository(
    name = "nix_envoy_deps",
    path = "${repoEnv}",
    build_file = "nix_envoy_deps.BUILD"
)

If you can't prevent fetching stuff over the network, that's either a bug in Tensorflow's build system or a bug in Bazel.

I'll try to find time to improve the package and report any issues upstream.

@cstrahan cstrahan self-assigned this Nov 5, 2017
@abbradar
Member Author

abbradar commented Nov 6, 2017

@cstrahan I've seen this article but isn't this a design document? I thought that this functionality is not implemented yet -- if I'm mistaken it's great, let's then just passthru NIX_*.

About local repositories -- I didn't know that! From the build file it seemed that it'd always just download dependencies over the network but I'm very unfamiliar with Bazel. It would be great if you could take a look at it.
