New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Theano cleanup #26580
Theano cleanup #26580
Conversation
postPatch = | ||
let | ||
# Closure for running a command in a stdenv for building with given buildInputs | ||
wrapped = command: buildInputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use makeWrapper
/wrapProgram
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I ran into is that Theano tries to compile the code it generates with just a standard g++ call, and then expects to be able to dynamically load the results into itself. Generally this goes wrong because
-
the system g++ cannot find the required libraries or, even if the first succeeds,
-
the resulting binary is not compatible with running Theano (i.e., depends on the system versions of common libraries like glibc which, especially if you are not using NixOS, may not match the ones Theano was compiled with).
Ideally I think you would want the runtime builds to execute in the same sort of environment that build-time builds were done in. Hence the above code to capture an appropriate stdenv. I emailed the nix-dev mailing list about this earlier, but haven't received any replies.
It would be possible to further patch Theano or use makeWrapper
/wrapProgram
to add the appropriate compiler flags so it can find its libraries. I think the above may be more elegant and less maintenance though as you automatically get the full build environment (i.e., the closure of the required includes, the setting of the appropriate NIX_* flags to enforce compiler purity, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this sort of function could be added the standard libraries?
I say that as I think there are actually quite a few places it could be used. For example, I recently packaged up COFFEE which has similar issues (another python package that compiles code on the fly and then dynamically loads it into itself).
I also noticed in another package that behind the scenes nvcc was being called and it was using my system gcc. This was failing as nvcc wasn't compatible with the system gcc version. Ideally a Nix provided nvcc should be using a Nix provided gcc determined at the installation time of the nvcc.
''; | ||
|
||
in '' | ||
sed -i -e 's|@g++@|${wrapped "g++" [ libgpuarray ]}|' theano/configdefaults.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nicer to create a new variable, say compiler = ${wrapped "g++" [ libgpuarray ]}
and then modify the patch to use @compiler@
(or @stdenv.cc@
).
# Test whether or not g++ is present: disable C code if it is not. | ||
try: | ||
- rc = call_subprocess_Popen(['g++', '-v']) | ||
+ rc = call_subprocess_Popen(['@g++@', '-v']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly using g++
is indeed not going to work because with Nix we wrap our compilers as you've noticed. So its likely best here to just use stdenv.cc
(though that might not be ideal on Darwin since they use LLVM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a link to the relevant bit in the patched file. As you can see, they treat the Mac case specially by trying to run clang++
.
Perhaps the patch should delete this compiler determination mess and replace it with
param = "@compiler@"
to be substituted at build time by according to the Nix expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, purge that mess :)
19f2309
to
2d6e314
Compare
|
2d6e314
to
158dc28
Compare
nvcc wants the bin directory, had accidentally given it the top level directory. |
Noticed the non-gpu version had been updated to 0.9.0. Added an update of the combined to 0.9.0. Not sure why the travis failed on the prior. Some sort of error about the python dogpile cache module. Suspect it may have have been some (hopefully transient) issue with the travis system/scripts. |
Something is not good yet. When I build |
Thanks for the heads up @FRidh. I'll have a look and fix it up. |
a74f55c
to
ba58738
Compare
@FRidh I believe I have address the issue. It was as you had figured (dependencies added by the patches). I hadn't realized that |
''; | ||
|
||
# Theano spews warnings and disabled flags if the compiler isn't named g++ | ||
cxx_compiler = wrapped "g++" [ libgpuarray cudnn ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a cuda dependency still exists because of this here
cxx_compiler = wrapped "g++" [ libgpuarray cudnn ]; | ||
in '' | ||
substituteInPlace theano/configdefaults.py \ | ||
--subst-var-by cxx_compiler '${cxx_compiler}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
Yes. You are correct. I thought it was good for sure when I checked the derivation file and verified there were no cuda related items in it. Should have checked the closure of the built package. Sorry about all the back and forth and thanks for the prompt reviews. |
No problem. It can be hard, especially in the beginning, to see what's all happening. Just check with |
ba58738
to
1b6b6f7
Compare
Sorry for the delay. Had to make some serious progress on some other projects. I see master has now picked up its own merged CUDA and non-CUDA versions. I gave this one a try and it doesn't work. Here is the test code (lifted from the Theano GPU documentation) cat temp.py from theano import function, config, shared, tensor
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
('Gpu' not in type(x.op).__name__)
for x in f.maker.fgraph.toposort()]):
print('Used the cpu')
else:
print('Used the gpu') commands (note that nix-build -E '(import <nixpkgs> { }).TheanoWithoutCuda'
THEANO_FLAGS='device=cuda' ./result/bin/python temp.py
I've merged it with mine to produce something that works. I also finally fixed the dependencies so the non-CUDA one is no longer needlessly pulling in CUDA dependencies. Thanks! -Tyson PS: At least the following problems exist in the merged Theano in master which are addressed with this pull
|
@FRidh thanks for the tip about |
@FRidh Any thoughts on this one? Not sure what the grahamcofborg-eval thing is. |
Is this related in any way? #33073 |
Any thoughts on this anyone? Thanks! |
Another month so figured I would post again. Any chance this can be merged? As far as I know, it's good to go. Thanks! -Tyson |
Been a couple of months again now, so posting again. Could someone please give me some directions on how to proceed here? Do I have to close this and open a new ticket? Thanks! -Tyson |
It would be nice if the history would be "fixed". e3f2c12 seems to be undone again (partially) somewhere. |
Other than that, the final changes look fine to me. |
Looks like master bumped the Theano version back on December 30th so the patches don't apply cleanly anymore. I'll have a look into it. @FRidh was there anything particular you were wanting fixed up about the history (other than making sure the cuda and non-cuda version are unified in the final version)? |
@twhitehead master should not be merged into your changes. Instead, your changes should be rebased onto it. |
Ok. I'm going to do a merge squash back onto master and then force push that back to the pull request. I believe this should give one simple commit on top of master instead of the current very complex history. |
Original was a mix of config changes and code changes with a search and replace that also changed unintended bits such as messages.
b235f19
to
303a3f9
Compare
@FRidh I believe this should finally be good. I've brought it back up to date so grahamcofborg is now happy, and I've replaced the convoluted history with just three self-contained commits on the top of master. |
Motivation for this change
The first patch merges the cuda and non-cuda derivations. This removes a lot of duplication and was mentioned in the original commit 07dcc4f as something that should be done.
The second patch bakes the build-time compiler into Theano for it to use when it compiles up and loads it EDSL. Without this, it uses whatever compiler is current in the environment. This can cause issues, especially in the non-NixOS case where the system compiler is often not a good choice.
I've built and tested this against the 17.03 release. It wouldn't build against master for me due to issues with the dependencies unreleated to these patches.
Things done
(nix.useSandbox on NixOS,
or option
build-use-sandbox
innix.conf
on non-NixOS)
nix-shell -p nox --run "nox-review wip"
./result/bin/
)