
pytorch-0.3 with cuda and cudnn #32438

Closed
wants to merge 2 commits into from

Conversation

@akamaus (Contributor) commented Dec 8, 2017

Motivation for this change

Currently the pytorch derivation lacks CUDA support. After the introduction of cudatoolkit9, which supports gcc6, it became easy to add the CUDA-related flags.

Notes

Unfortunately, the CUDA-enabled pytorch tests fail to detect the driver when run by the nix builder. They can still be run manually after package installation: just run unpackPhase and execute the script.

I had to provide a fixed commit hash for the revision; without it fetchFromGitHub doesn't download the required submodules.
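The pinning described above might look like the following sketch, assuming fetchFromGitHub with fetchSubmodules (the rev and sha256 shown are illustrative placeholders, not the PR's actual values):

```nix
src = fetchFromGitHub {
  owner = "pytorch";
  repo  = "pytorch";
  # A fixed commit hash; with only a tag or branch name the
  # submodules were not fetched reliably.
  rev = "0000000000000000000000000000000000000000";  # placeholder
  fetchSubmodules = true;
  sha256 = "0000000000000000000000000000000000000000000000000000";  # placeholder
};
```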

@FRidh (Member) previously requested changes Dec 9, 2017
checkPhase = ''
${stdenv.shell} test/run_test.sh
'';
preConfigure = if cudnn != null then

Use lib.optionalString instead.

utillinux
which
] ++ lib.optionals cudaSupport
(lib.remove null [cudatoolkit cudnn]);

No need to remove null here.

cudnn = pkgs.cudnn;
};

pytorchWithCuda = self.pytorch.override {

Why add extra attributes when it's just a matter of overriding and setting cudaSupport = true;?
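A minimal sketch of the approach the reviewer suggests, assuming the pytorch derivation exposes a cudaSupport argument (attribute names here are illustrative):

```nix
{
  # Default build: no CUDA.
  pytorch = callPackage ../development/python-modules/pytorch {
    cudaSupport = false;
  };

  # The CUDA variant is just an override, not a separate derivation.
  pytorchWithCuda = self.pytorch.override {
    cudaSupport = true;
  };
}
```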

@akamaus (Author) commented Dec 19, 2017

done

'';

doCheck = false; # for some unknown reason doesn't detect cuda if run from builder user

Do the tests have to be disabled for pytorchWithoutCuda too?

@akamaus (Author) replied:

Actually, no.

@tcosmo commented Jan 13, 2018

Hey, sorry, I'm a huge noob with NixOS. How can I use your fork in my nixpkgs so that I can finally run pytorch with CUDA, by calling something like nix-env -iA nixos.python35Packages.pytorchWithCuda?
Thank you so much!

@akamaus (Author) commented Jan 14, 2018

@tcosmo, personally I just cherry-picked the top two commits onto channels/nixos-17.09. Now I can run pytorch like this:

nix-shell -p python3Packages.ipython -p python3Packages.pytorchWithCuda

You may find this link useful: http://anderspapitto.com/posts/2015-11-01-nixos-with-local-nixpkgs-checkout.html

@FRidh (Member) commented Jan 14, 2018

cc maintainer @teh

@tcosmo commented Jan 14, 2018

@akamaus Thank you very much for your answer!!

@teh (Contributor) left a comment
LGTM. Thanks!

'';

doCheck = !cudaSupport; # for some unknown reason doesn't detect cuda if run from builder user

Could the check failure be caused by group permissions? I think that to use CUDA you need to be in the video group. In any case, disabling the tests for the CUDA case with a comment is OK.

@akamaus (Author) replied:

Well, I'm not sure. Users working with CUDA on my machine don't belong to the video group, and the DISPLAY variable doesn't make any difference either.
Permissions for the nvidia device are as follows:

% ls /dev/nvidia0 -l
crw-rw-rw- 1 root root 195, 0 jan 14 18:35 /dev/nvidia0

@andersk (Contributor) commented Apr 3, 2018

This doesn’t build when merged to current master (or channels/nixos-18.03).

In file included from /nix/store/gv7w3c71jg627cpcff04yi6kwzpzjyap-cudatoolkit-9.1.85.1/include/host_config.h:50:0,
                 from /nix/store/gv7w3c71jg627cpcff04yi6kwzpzjyap-cudatoolkit-9.1.85.1/include/cuda_runtime.h:78,
                 from <command-line>:0:
/nix/store/gv7w3c71jg627cpcff04yi6kwzpzjyap-cudatoolkit-9.1.85.1/include/crt/host_config.h:121:2: error: #error -- unsupported GNU version! gcc versions later than 6 are not supported!
 #error -- unsupported GNU version! gcc versions later than 6 are not supported!
  ^~~~~

@teh (Contributor) commented Apr 3, 2018

Another issue: the non-CUDA build seems to require CUDA. I think it's down to this line:

preConfigure = lib.optionalString (cudnn != null) "export CUDNN_INCLUDE_DIR=${cudnn}/include";

@teh (Contributor) commented Apr 3, 2018

Another one: the multiprocessing tests fail with

RuntimeError: refcounted file mapping not supported on your system at /tmp/nix-build-python3.6-pytorch-0.3.0.drv-0/source/torch/lib/TH/THAllocator.c:525

I think the optionalString condition has to be something like this:

preConfigure = lib.optionalString (cudaSupport && cudnn != null) "export CUDNN_INCLUDE_DIR=${cudnn}/include";

@akamaus, are you still interested in updating this PR? It might be worth updating to 0.3.1 as well.

@andersk (Contributor) commented Apr 6, 2018

I opened a new PR #38530 based on this, addressing all the above problems.

@FRidh (Member) commented Apr 7, 2018

Closing in favor of #38530.

@FRidh closed this Apr 7, 2018
7 participants