pytorch: Move cudatoolkit to nativeBuildInputs #60002

Merged: 1 commit into NixOS:master on Apr 27, 2019

@andersk andersk commented Apr 22, 2019

Motivation for this change

nvcc must be available in PATH at build time; otherwise CUDA support will be disabled.

This is a more minimal version of #57438.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

nvcc must be available in PATH at build time; otherwise CUDA support
will be disabled.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>

@teh teh left a comment

Thanks!

jyp commented Apr 26, 2019

I tried your patch, but unfortunately the build did not go through. Here is the tail of the build:

[ 79%] Building CXX object caffe2/CMakeFiles/caffe2_gpu.dir/queue/queue_ops_gpu.cc.o
[ 80%] Building CXX object caffe2/CMakeFiles/caffe2_gpu.dir/sgd/iter_op_gpu.cc.o
[ 80%] Building CXX object caffe2/CMakeFiles/caffe2_gpu.dir/sgd/learning_rate_op_gpu.cc.o
[ 80%] Linking CXX shared library ../lib/libcaffe2_gpu.so
impure path `/usr/local/cuda/lib/libcudnn.so.7' used in link
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/build.make:4687: lib/libcaffe2_gpu.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:5422: caffe2/CMakeFiles/caffe2_gpu.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
setup.py::build_deps::run()
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-mkldnn --use-qnnpack caffe2'
builder for '/nix/store/i5n4iqqk9kzkhhn4a8v4bnm3b00f3k64-python3.7-pytorch-1.0.0.drv' failed with exit code 1
cannot build derivation '/nix/store/5mhc9w2bi01f0p1jpb0j8fc9hp76pziy-python3-3.7.3-env.drv': 1 dependencies couldn't be built
error: build of '/nix/store/5mhc9w2bi01f0p1jpb0j8fc9hp76pziy-python3-3.7.3-env.drv' failed

andersk commented Apr 26, 2019

@jyp It sounds like you’ve run into a different problem: the Nix package expects to use the packaged CUDA and cuDNN in /nix/store, but you have a local copy installed in /usr/local/cuda. Can you provide some more context? Are you using NixOS or something else? Why do you have a /usr/local/cuda, and does it work if you remove that? What happens without the patch? What happens with only the first hunk of the patch (i.e. add cudatoolkit_joined to nativeBuildInputs without removing it from buildInputs)?

FRidh commented Apr 27, 2019

Clearly sandboxing was disabled in @jyp's build.

@FRidh FRidh merged commit 27d1362 into NixOS:master Apr 27, 2019

jyp commented Apr 29, 2019

Indeed, it works with sandboxing enabled. (I was confused by #51671.)

mtn commented May 30, 2019

I'm here after reading this and a few related issues (e.g. #51671). Here is my shell.nix. On first dropping into this shell, PyTorch was built from source:

with import <nixpkgs> {};

let
  py = pkgs.python37;
in
stdenv.mkDerivation rec {
  name = "python-environment";

  buildInputs = [
    py
    py.pkgs.matplotlib
    py.pkgs.tkinter
    py.pkgs.numpy
    py.pkgs.pytorchWithCuda
  ];
}

However, during the build, there were several messages like "CUDA not available, skipping tests". Possibly related, there were errors like "Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present". I can provide a full log if needed.

Additional outputs:

$LD_LIBRARY_PATH=/run/opengl-driver/lib
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+

In a python session:

>>> import torch
>>> torch.cuda.is_available()
False
>>>

I'm still pretty new to Nix, so I might be doing something wrong. Should this have worked, and if not, what can I do to fix it?

jyp commented May 30, 2019

@mtn These errors were fixed for me when using a sandboxed build.

mtn commented May 30, 2019

How can I do this, and can I still use nix-shell? I'm finding this issue hard to parse: #903. Or maybe there's an easier way I should be building and using this? I just want the result to be isolated.

Edit: @jyp I'm running NixOS 19.03, so shouldn't sandboxing be happening by default?
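
For reference, the PR checklist above names the relevant setting: nix.useSandbox on NixOS, or the sandbox option in nix.conf elsewhere. A minimal sketch of setting it explicitly on NixOS follows, assuming a standard /etc/nixos/configuration.nix (the file path and the surrounding module skeleton are assumptions, not taken from this thread):

```nix
# /etc/nixos/configuration.nix (fragment; merge into your existing file)
{ config, pkgs, ... }:

{
  # Build derivations inside an isolated sandbox, so that impure paths
  # such as /usr/local/cuda are not visible to the build.
  nix.useSandbox = true;
}
```

After a nixos-rebuild switch, later nix-shell and nix-build invocations should go through the sandboxed daemon. On non-NixOS systems the equivalent is a `sandbox = true` line in nix.conf.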

@haskelious

I have the same problem as @mtn, and I have no idea how to resolve it on 19.03.

OmnipotentEntity commented Jul 6, 2019

@mtn @fkstef I just ran into this problem and spent a long time trying to solve it. The answer is actually quite simple: this fix has not been backported to 19.03. You can either pin nixpkgs to a revision that includes this fix or apply it locally with overrideAttrs. I did the second, but the first might be more useful if you're planning on using this long term.

For instance, here is an example shell.nix:

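let
  pkgs = import <nixpkgs> {};
  # Work around the missing backport: make cudatoolkit a native build input
  # so that nvcc is in PATH at build time and CUDA support is not disabled.
  pytorch-cuda = pkgs.python37Packages.pytorchWithCuda.overrideAttrs (oldAttrs: {
    nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [ pkgs.cudatoolkit ];
  });
  # pytorch-cuda is the let-binding above, not an attribute of ps.
  python3 = pkgs.python3.withPackages (ps: with ps; [ opencv4 numpy pytorch-cuda ]);

in pkgs.stdenv.mkDerivation {
  name = "env";

  buildInputs = [
    python3
  ];

}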
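
The pinning alternative mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: `rev` is a hypothetical argument that must be set to the full hash of a nixpkgs commit containing this fix (the thread only gives an abbreviated hash, so none is filled in here), and the derivation name is arbitrary.

```nix
# shell.nix: pin nixpkgs to a revision that already contains the fix.
# `rev` is a placeholder argument; supply the full commit hash yourself:
#   nix-shell --argstr rev <full-commit-hash>
{ rev }:

let
  # Fetch and import the pinned nixpkgs tree instead of <nixpkgs>.
  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
  }) {};
  # No override is needed here, because the pinned tree has the fix.
  python3 = pkgs.python3.withPackages (ps: with ps; [ numpy pytorchWithCuda ]);

in pkgs.stdenv.mkDerivation {
  name = "env-pinned";

  buildInputs = [
    python3
  ];
}
```

With this variant PyTorch is still built from source unless a binary cache has it, but the environment no longer depends on whichever channel <nixpkgs> points to.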
