
pytorch: fix CUDA support #57438

Closed
wants to merge 1 commit into from

Conversation

@dnaq (Contributor) commented Mar 11, 2019

This commit fixes CUDA support when building with `allowUnfree = true` and `cudaSupport = true`. The previous change to pytorch.nix built, but at runtime CUDA support didn't work.

This is a work in progress: the test suite still doesn't find CUDA, so no CUDA tests are run against the compiled package. The lists of packages in `nativeBuildInputs` and `propagatedBuildInputs` are also probably wrong, due to my limited understanding of the distinction between them.

Motivation for this change

CUDA support didn't work in the previous version.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

  ] ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ]
    ++ lib.optionals stdenv.isLinux [ numactl ];

  propagatedBuildInputs = [
    cffi
    numpy.blas
Member (review comment):

This is unlikely to be correct

Contributor Author (dnaq):

Same as above

@@ -79,20 +79,19 @@ in buildPythonPackage rec {

  nativeBuildInputs = [
    cmake
    numpy.blas
Member (review comment):

this probably needs to be in both `nativeBuildInputs` and `buildInputs`

Contributor Author (dnaq):
To be honest, I don't really know which packages need to be in `buildInputs`, `nativeBuildInputs`, or `propagatedBuildInputs`. This is just something that builds and solves my immediate need. I'd be happy to modify it as needed, but given that each build of pytorch takes a couple of hours, I don't really have time for a lot of trial and error.
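For context, the usual Nixpkgs convention for these three attributes can be sketched as follows. This is an illustrative outline, not the final pytorch.nix expression; the package lists inside each attribute are examples only.

```nix
# Illustrative sketch of the usual Nixpkgs dependency conventions;
# the attribute contents are examples, not the final pytorch.nix.
{
  # Tools executed on the build machine at build time
  # (compilers, cmake, code generators, `which`, ...).
  nativeBuildInputs = [ cmake which ];

  # Libraries the build links against for the target platform.
  buildInputs = [ numpy.blas ];

  # Dependencies that consumers of the package also need at run time;
  # for Python packages, the runtime Python dependencies go here.
  propagatedBuildInputs = [ cffi numpy pyyaml ];
}
```

The practical difference matters most when cross-compiling, but it also determines what ends up on PATH during the build and in the closure of dependent packages.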

@smatting (Contributor) commented Apr 5, 2019

Thanks @dnaq!
I can confirm that the build with @FRidh's proposed changes also succeeds and works.

Here is what I have tested:

  nativeBuildInputs = [
    cmake
    utillinux
    which
    numpy.blas
  ] ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ]
    ++ lib.optionals stdenv.isLinux [ numactl ];

  buildInputs = [
    numpy.blas
  ] ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ]
    ++ lib.optionals stdenv.isLinux [ numactl ];

  propagatedBuildInputs = [
    cffi
    numpy
    pyyaml
    numpy.blas
  ] ++ lib.optional (pythonOlder "3.5") typing
    ++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn ];

Could you please add this as a comment on the top of the package?

# NOTE: To be able to use the CUDA version of this package,
# you need to manually load the CUDA library from your installed nvidia driver.
# On a NixOS machine this can be done by adding
#
# environment.variables = {
#     LD_PRELOAD = "${pkgs.linuxPackages.nvidia_x11}/lib/libcuda.so:${pkgs.linuxPackages.nvidia_x11}/lib/libnvidia-fatbinaryloader.so";
# };
#
# to your configuration.nix

@teh (Contributor) commented Apr 14, 2019

@dnaq @smatting Based on this discussion (#46032) it sounds like this PR may no longer be needed? Would you mind trying CUDA on current master?

@baracoder (Contributor) commented:

I tried a nix-shell on 7d0db6a with python36 and pytorchWithCuda on a project, and I am getting `AssertionError: Torch not compiled with CUDA enabled`.

The build on this PR fails for me at some other dependency.

@andersk (Contributor) commented Apr 22, 2019

I confirmed that moving cudatoolkit_joined from buildInputs to nativeBuildInputs is sufficient to get CUDA working (`torch.cuda.is_available()` returns `True`, `torch.cuda.get_device_name(0)` returns `'GeForce GTX 1080'`). The move is required to make nvcc available in PATH at build time; otherwise CUDA support is disabled, with a warning buried in the build log:

which: no nvcc in (/nix/store/lmwk7mg8y79m3izdxdlckdn385x7jgl7-python3-3.7.3/bin:/nix/store/r3p6lbws0mp0lp8jwvivl68qcbzdvy8k-python3.7-setuptools-40.8.0/bin:/nix/store/vb8h3l9jvprlb34a0fjw4g6r7dv329ka-cmake-3.13.4/bin:/nix/store/f2bc62h4xcnqhbgppz199aqikxy164jj-util-linux-2.33.1-bin/bin:/nix/store/lil4rsy5ng1dq5232r6xhgfrvjrkmkmf-which-2.21/bin:/nix/store/lmwk7mg8y79m3izdxdlckdn385x7jgl7-python3-3.7.3/bin:/nix/store/409rs332a9qqkg5xd648j0rx01v6f7a7-python3.7-coverage-4.5.2/bin:/nix/store/bd2sn66007fvkvvn2sk3ga69dgpxpsqq-patchelf-0.9/bin:/nix/store/y60j0zq2j50iaaqjn39i18hkhp277zfy-gcc-wrapper-7.4.0/bin:/nix/store/pm4rg0bdiaj5b748kncp9vf7n3x446sd-gcc-7.4.0/bin:/nix/store/f5wl80zkrd3fc1jxsljmnpn7y02lz6v1-glibc-2.27-bin/bin:/nix/store/baylddnb83lh45v3fz15ddhbpxbdb7m7-coreutils-8.31/bin:/nix/store/1n593wk7xhygrxi2nwah6f93ksd4if8i-binutils-wrapper-2.31.1/bin:/nix/store/1kl6ms8x56iyhylb2r83lq7j3jbnix7w-binutils-2.31.1/bin:/nix/store/f5wl80zkrd3fc1jxsljmnpn7y02lz6v1-glibc-2.27-bin/bin:/nix/store/baylddnb83lh45v3fz15ddhbpxbdb7m7-coreutils-8.31/bin:/nix/store/baylddnb83lh45v3fz15ddhbpxbdb7m7-coreutils-8.31/bin:/nix/store/r432g6h0qy7wq18kksdbm9f72h0wx7yv-findutils-4.6.0/bin:/nix/store/2hr6x9f9ivljdr2dkh4sz2wyhmpn8xmc-diffutils-3.7/bin:/nix/store/h67k75i4wm7jkyaan97xzw0g38vm3yxa-gnused-4.7/bin:/nix/store/pyfxqzjkffbs8c0cg28bvspmyb8rvdc8-gnugrep-3.3/bin:/nix/store/b9kmciqh6n9z2b1lg4dlfbh1qzq2pq8z-gawk-4.2.1/bin:/nix/store/4c2akixx0smyz2xbwpfa41bk7gf7rq6f-gnutar-1.31/bin:/nix/store/d9cv4lh32as716x3d9p9ikdh7j2kqrdh-gzip-1.10/bin:/nix/store/plcgyqkiqb599q42cczkqhnrii6pav6w-bzip2-1.0.6.0.1-bin/bin:/nix/store/yg76yir7rkxkfz6p77w4vjasi3cgc0q6-gnumake-4.2.1/bin:/nix/store/yjkch3aia9ny4dq42dbcjrdwqb1y8c33-bash-4.4-p23/bin:/nix/store/xkzym3c0r5368lxs2m9h247c93m0hiv2-patch-2.7.6/bin:/nix/store/5zdqndi3fk72n4drd38wzmgbrqhlaciv-xz-5.2.4-bin/bin)

Possibly other changes may be desirable for cross-compilation; all I know is that this one is necessary for the normal case. Submitted as #60002.
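The change described above amounts to roughly the following. This is a sketch under the assumption that only cudatoolkit_joined needs to move; see #60002 for the actual change.

```nix
# Sketch of the fix described above (assumed shape, not the exact
# diff from #60002): cudatoolkit_joined is listed under
# nativeBuildInputs so that nvcc is on PATH during the build,
# rather than under buildInputs where only its libraries are visible.
nativeBuildInputs = [
  cmake
] ++ lib.optionals cudaSupport [ cudatoolkit_joined ];
```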

As for @smatting’s comment:

Could you please add this as a comment on the top of the package?

# NOTE: To be able to use the CUDA version of this package,
# you need to manually load the CUDA library from your installed nvidia driver.
# On a NixOs machine this can be done by adding
#
# environment.variables = {
#     LD_PRELOAD = "${pkgs.linuxPackages.nvidia_x11}/lib/libcuda.so:${pkgs.linuxPackages.nvidia_x11}/lib/libnvidia-fatbinaryloader.so";
# };
#
# to your configuration.nix

I didn’t need any such configuration. I simply configured `services.xserver.videoDrivers = [ "nvidia" ];`, which caused NixOS to add /run/opengl-driver/lib (a symlink to ${nvidia_x11}/lib) to LD_LIBRARY_PATH, which is enough to allow PyTorch to find the needed libraries. Force-loading libraries into every process with LD_PRELOAD may have unintended side effects.

@dnaq (Contributor Author) commented May 11, 2019

Seems like #60002 solves this issue in a more idiomatic way.

@dnaq dnaq closed this May 11, 2019
@dnaq dnaq deleted the pytorch-fix branch May 11, 2019 14:02
8 participants