New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build Tensorflow and libtensorflow from source #64716
Conversation
Darwin still uses binary builds for now. I don't have a Darwin machine so I can't test if source build works and fix it if it doesn't - unless someone wants to take on that role! |
(For some reason I cannot request a review from @yorickvP on Github) |
df1b94b
to
568648c
Compare
568648c
to
45298f1
Compare
Updated binary |
It seems that Google now build on newer macOS versions, I had the same problem with Tensorflow 1.13.1 when building on my own MacBook with Mojave. A newer cctools is needed, see #49371. I think this is the relevant PR: #61172 |
This is great! I'm hoping to get a fully static build at some point, but having the same protobuf on tensorflow as the rest of the code should fix our linker errors! :) You can't request a review from me because I don't have commit rights, I suppose. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoever built tensorflow upstream must have patched out that saved_model_portable_proto line. it's been in there for several version snow.
As I mentioned in #63616, there are still issues with the python2 build. Seems to be a general problem with parallelism on python2, only manifests itself here because its such a big project and basically hopeless to build without a lot of parallelism (used 12 cores). I already added one patch to python2 (#64067), but I still saw the issue (sometimes, its transient) after that. I suspect the second part of the fix in the python ticket is also needed. That was never backported by debian, so I had to do it myself (being not very experienced with C). I've pushed those changes to the branch of the other PR. Its not very polished, which is why I didn't push it before. I don't have time to work on this or give extensive review right now. But I can answer questions if you have any. |
@timokau I saw that one but IIUC this is a general Python problem, not Tensorflow-specific one, so I didn't really look into it. It shouldn't happen during the "big" Tensorflow build because I only build "source" package now and build the wheel separately with our |
45298f1
to
7b5a242
Compare
Neat, so you only need bazel for the C part and don't need to unnecessarily rebuild everything for libtensorflow, py2 tensorflow and py3 tensorflow? |
Sadly no, you still need to rebuild everything because their C libraries link to different Pythons and I presume use different APIs. Still, in the end you get something that resembles a usual source Python package with |
Oh, so you mean you just skip the final I don't think the |
The primary reason I switched back to the previous behavior was to ensure |
Will this work with bazel 0.28.0? #64633 |
My PR had some issues with bazel 0.27, not sure if @abbradar worked around those. But the bazel team has promised no further backwards incompatibilities for the next 3 releases (then they'll break backwards compatibility once more and release 1.0). |
Nice news! I didn't test 0.28 though yet, wanted to do it today but sadly I won't have time for one more rebuild.
Huh, didn't notice anything. Maybe you remember something specific? I apply patch for 0.27 from upstream and also add a flag to enable backwards compatibility with some behavior. |
Now that other parts are here I'll wait till staging goes into master and merge this (so that we don't break Python 2 + Tensorflow). |
Why not just retarget this PR to staging and merge now? Then we'll get it automatically on the next staging-next merge. |
Well, I usually avoid crowding staging with not-mass-rebuildy PRs to ease bisect in case it's needed (searching for problems among mass rebuild commits can be painful).
…On July 17, 2019 2:23:15 PM GMT+03:00, Timo Kaufmann ***@***.***> wrote:
Why not just retarget this PR to staging and merge now? Then we'll get
it automatically on the next staging-next merge.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#64716 (comment)
--
Nikolay.
|
Fair enough. |
I tried to build tensorflow with SSE4.1 support, but unfortunately the build failed. Apparently that has to be specified at configure time now. This is my attempt to get it working: Feel free to cherry-pick and/or rebase. |
81651a2
to
388a1ed
Compare
388a1ed
to
5f49e3c
Compare
This also changes it to a from-source build.
This merges work done by yorickvP and timokau in NixOS#63208 and NixOS#63616 respectively. Now the derivation builds both libtensorflow and the Python package and puts them into different outputs. Quite a bit of improvements were done on the top, including: * Use official tag revision as source, not a branch; * Use all system libraries possible (before only one was actually used); * Move various environment variables to the derivation itself from hooks; * Use source Python build instead of wheel build to ensure fixup hooks do their important jobs on libraries; * And more that I forgot!
They sometimes take separate flags.
Now need to be passed in the configure phase. abbradar: Don't change CUDA build hash.
5f49e3c
to
cd0e461
Compare
Fixed in 7109546. Thanks for the heads up! |
Motivation for this change
Restore Tensorflow source build while leaving binary builds . Based on @yorickvP and @timokau work.with some parts taken from both.
Things done
sandbox
innix.conf
on non-NixOS)nix-shell -p nix-review --run "nix-review wip"
./result/bin/
)nix path-info -S
before and after)cc @uri-canva for changes regaring Bazel.
Tested both CUDA and non-CUDA builds of the Python library, both source and binary builds. Didn't test libtensorflow, I assume that if Python library works and
ldd
output is okay libtensorflow is okay too.