Build Tensorflow and libtensorflow from source #64716

abbradar · 2019-07-13T22:32:15Z

Motivation for this change

Restore the Tensorflow source build while keeping the binary builds. Based on @yorickvP's and @timokau's work, with some parts taken from both.

Things done

cc @uri-canva for changes regarding Bazel.

Tested both CUDA and non-CUDA builds of the Python library, for both source and binary builds. I didn't test libtensorflow; I assume that if the Python library works and the ldd output is okay, libtensorflow is okay too.
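For anyone who wants to script the ldd check mentioned above, here is a minimal sketch. It is not what was run for this PR: the helper names `missing_libs` and `check_library` and the sample output are hypothetical, and the only behavior assumed is that `ldd` prints `=> not found` for libraries it cannot resolve.

```python
import subprocess
from typing import List


def missing_libs(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "<tab>libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_library(path: str) -> List[str]:
    """Run ldd on `path` and return its unresolved dependencies (hypothetical helper)."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return missing_libs(result.stdout)


# Made-up sample output, for illustration only.
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffd1a3f2000)\n"
    "\tlibtensorflow_framework.so.1 => not found\n"
    "\tlibc.so.6 => /nix/store/...-glibc/lib/libc.so.6 (0x00007f1c2a000000)\n"
)
```

An empty list from `check_library` on the built libtensorflow shared object would correspond to the "ldd output is okay" condition, although it says nothing about whether the library behaves correctly at runtime.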

abbradar · 2019-07-13T22:35:29Z

Forgot to mention actual previous PRs: #63208 and #63616.

abbradar · 2019-07-13T22:37:55Z

Darwin still uses binary builds for now. I don't have a Darwin machine so I can't test if source build works and fix it if it doesn't - unless someone wants to take on that role!

abbradar · 2019-07-13T22:51:22Z

(For some reason I cannot request a review from @yorickvP on GitHub)

abbradar · 2019-07-14T08:33:40Z

Updated binary libtensorflow doesn't build for Darwin. Can someone with Darwin knowledge help with it?

danieldk · 2019-07-14T10:35:29Z

> Updated binary libtensorflow doesn't build for Darwin. Can someone with Darwin knowledge help with it?

It seems that Google now builds on newer macOS versions; I had the same problem with Tensorflow 1.13.1 when building on my own MacBook with Mojave. A newer cctools is needed, see #49371. I think this is the relevant PR: #61172

yorickvP · 2019-07-14T11:20:29Z

This is great! I'm hoping to get a fully static build at some point, but having the same protobuf on tensorflow as the rest of the code should fix our linker errors! :)

You can't request a review from me because I don't have commit rights, I suppose.

yorickvP

Whoever built tensorflow upstream must have patched out that saved_model_portable_proto line. It's been in there for several versions now.

pkgs/build-support/build-bazel-package/default.nix

timokau · 2019-07-14T16:41:08Z

As I mentioned in #63616, there are still issues with the python2 build. It seems to be a general problem with parallelism on python2 that only manifests itself here because it's such a big project and basically hopeless to build without a lot of parallelism (I used 12 cores).

I already added one patch to python2 (#64067), but I still saw the issue after that (sometimes; it's transient). I suspect the second part of the fix in the python ticket is also needed. That was never backported by Debian, so I had to do it myself (being not very experienced with C). I've pushed those changes to the branch of the other PR. It's not very polished, which is why I didn't push it before.

I don't have time to work on this or give extensive review right now. But I can answer questions if you have any.

abbradar · 2019-07-14T17:03:42Z

@timokau I saw that one, but IIUC this is a general Python problem, not a Tensorflow-specific one, so I didn't really look into it. It shouldn't happen during the "big" Tensorflow build because I only build the "source" package now and build the wheel separately with our buildPythonProgram. I also didn't observe the problem on an 8-core (16-thread) machine over ~10 builds.

timokau · 2019-07-14T21:06:22Z

Neat, so you only need bazel for the C part and don't need to unnecessarily rebuild everything for libtensorflow, py2 tensorflow and py3 tensorflow?

abbradar · 2019-07-14T21:20:09Z

Sadly no, you still need to rebuild everything because their C libraries link to different Pythons and I presume use different APIs. Still, in the end you get something that resembles a usual source Python package with setup.py etc. I use this as src for buildPythonPackage so any Python bugs should be isolated there.

timokau · 2019-07-14T22:08:42Z

Oh, so you mean you just skip the final bdist_wheel as it was done previously? I actually changed that on purpose, so that we can unify the binary and source builds. That way we wouldn't have all the duplication between default.nix and bin.nix.

I don't think the bdist_wheel step alone was responsible for the build failures, but who knows.

abbradar · 2019-07-14T22:10:10Z

The primary reason I switched back to the previous behavior was to ensure fixupPhase runs on unpacked Tensorflow libraries. This ensures RPATH is set up correctly.
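One way to inspect the result is to look at the RPATH/RUNPATH entries of the installed libraries. The following is a rough sketch only, assuming the usual `readelf -d` output format; the function names and the sample text are hypothetical, and checking that entries live under `/nix/store/` is just one property one might care about, not a description of what fixupPhase itself does.

```python
import re
from typing import List

# Matches dynamic-section lines such as:
#  0x000000000000001d (RUNPATH)  Library runpath: [/a/lib:/b/lib]
_RUNPATH_RE = re.compile(
    r"\((?:RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]"
)


def runpath_entries(readelf_output: str) -> List[str]:
    """Extract RPATH/RUNPATH directories from `readelf -d` output."""
    entries = []
    for line in readelf_output.splitlines():
        match = _RUNPATH_RE.search(line)
        if match:
            entries.extend(p for p in match.group(1).split(":") if p)
    return entries


def outside_store(entries: List[str], store: str = "/nix/store/") -> List[str]:
    """Return entries that do not point into the Nix store.

    Note: relative entries such as $ORIGIN are also reported here.
    """
    return [e for e in entries if not e.startswith(store)]


# Made-up sample output, for illustration only.
SAMPLE = (
    " 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]\n"
    " 0x000000000000001d (RUNPATH)            "
    "Library runpath: [/nix/store/aaaa-glibc/lib:/build/tmp/lib]\n"
)
```

Feeding the output of `readelf -d` for each unpacked `.so` file into `runpath_entries` and then `outside_store` would flag leftover build-directory paths like the `/build/tmp/lib` entry in the sample.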

yorickvP · 2019-07-15T09:22:53Z

Will this work with bazel 0.28.0? #64633

timokau · 2019-07-15T16:56:45Z

My PR had some issues with bazel 0.27, not sure if @abbradar worked around those. But the bazel team has promised no further backwards incompatibilities for the next 3 releases (then they'll break backwards compatibility once more and release 1.0).

abbradar · 2019-07-15T17:05:52Z

> But the bazel team has promised no further backwards incompatibilities for the next 3 releases (then they'll break backwards compatibility once more and release 1.0).

Nice news! I haven't tested 0.28 yet, though; I wanted to do it today, but sadly I won't have time for one more rebuild.

> some issues with bazel 0.27, not sure if @abbradar worked around those

Huh, I didn't notice anything. Maybe you remember something specific? I apply the upstream patch for 0.27 and also add a flag to enable backwards compatibility with some behavior.

abbradar · 2019-07-17T08:36:45Z

Now that other parts are here I'll wait till staging goes into master and merge this (so that we don't break Python 2 + Tensorflow).

timokau · 2019-07-17T11:23:05Z

Why not just retarget this PR to staging and merge now? Then we'll get it automatically on the next staging-next merge.

abbradar · 2019-07-17T11:56:01Z

Well, I usually avoid crowding staging with not-mass-rebuildy PRs to ease bisect in case it's needed (searching for problems among mass rebuild commits can be painful).

timokau · 2019-07-17T14:20:21Z

Fair enough.

timokau · 2019-07-18T14:12:32Z

I tried to build tensorflow with SSE4.1 support, but unfortunately the build failed. Apparently that has to be specified at configure time now. This is my attempt to get it working:
https://github.com/timokau/nixpkgs/commits/3323284
The patch to buildBazelPackages is necessary since tensorflow won't accept the --config=opt flag for fetching.

Feel free to cherry-pick and/or rebase.

This merges work done by yorickvP and timokau in NixOS#63208 and NixOS#63616 respectively. Now the derivation builds both libtensorflow and the Python package and puts them into different outputs. Quite a few improvements were made on top, including:

* Use the official tag revision as source, not a branch;
* Use all system libraries possible (before, only one was actually used);
* Move various environment variables from hooks to the derivation itself;
* Use a source Python build instead of a wheel build to ensure fixup hooks do their important jobs on libraries;
* And more that I forgot!

timokau · 2019-08-24T10:48:30Z

@abbradar in case you haven't noticed yet, the tensorflow build (py2 and py3, but there are different issues for py2 and py3 I think) is currently failing. Unfortunately I don't have access to the necessary compute resources to debug this right now.

abbradar · 2019-08-27T06:43:47Z

Fixed in 7109546. Thanks for the heads up!

abbradar requested review from FRidh and Profpatsch as code owners July 13, 2019 22:32

ofborg bot added the 6.topic: python label Jul 13, 2019

abbradar requested a review from timokau July 13, 2019 22:48

abbradar force-pushed the tensorflow-revive branch from df1b94b to 568648c July 14, 2019 07:25

ofborg bot added 8.has: clean-up 11.by: package-maintainer 10.rebuild-darwin: 1-10 10.rebuild-linux: 11-100 labels Jul 14, 2019

abbradar force-pushed the tensorflow-revive branch from 568648c to 45298f1 July 14, 2019 07:59

yorickvP approved these changes Jul 14, 2019

yorickvP mentioned this pull request Jul 14, 2019

libtensorflow: 1.9 -> 1.14.0 #63208

Closed

10 tasks

abbradar force-pushed the tensorflow-revive branch from 45298f1 to 7b5a242 July 14, 2019 17:28

abbradar closed this Jul 14, 2019

abbradar reopened this Jul 14, 2019

abbradar force-pushed the tensorflow-revive branch 2 times, most recently from 81651a2 to 388a1ed July 25, 2019 13:24

ofborg bot added the 2.status: merge conflict label Jul 25, 2019

abbradar force-pushed the tensorflow-revive branch from 388a1ed to 5f49e3c July 31, 2019 08:34

ofborg bot removed the 2.status: merge conflict label Jul 31, 2019

timokau and others added 9 commits July 31, 2019 13:28

tensorflow: re-enable build from source [WIP]

3df4e2d

libtensorflow: 1.9 -> 1.14.0

2e46ae0

This also changes it to a from-source build.

python.pkgs.tensorflow: cleanup binary build

170dd55

libtensorflow: add binary build and add automatic generation

0a1bf47

buildBazelPackage: add flags for build and fetch

e458a34

They sometimes take separate flags.

python.pkgs.tensorflow: fix optimization flags

d30ec1a

Now need to be passed in the configure phase. abbradar: Don't change CUDA build hash.

python.pkgs.tensorflow: update dependencies hash

19cdfe8

tensorflow: expose binary builds

cd0e461

abbradar force-pushed the tensorflow-revive branch from 5f49e3c to cd0e461 July 31, 2019 10:34

ofborg bot added the 8.has: package (new) label Jul 31, 2019

ofborg bot requested a review from basvandijk July 31, 2019 10:45

abbradar merged commit 6ee9799 into NixOS:master Jul 31, 2019

globin mentioned this pull request Aug 4, 2019

libtensorflow: 1.9.0 -> 1.10.1 #47025

Closed

timokau mentioned this pull request Aug 8, 2019

[WIP] tensorflow: re-enable build from source #63616

Closed

10 tasks

yorickvP mentioned this pull request Aug 31, 2019

bazel build of tensorflow fails #42809

Closed

jyp mentioned this pull request Sep 1, 2019

Tensorflow crashes on GPU #67257

Closed

timokau mentioned this pull request Sep 21, 2019

Enable unbundling dependencies and linking to the system libraries instead. tensorflow/tensorflow#20284

Merged
