Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnnl: init at 1.1.2 #74893

Merged
merged 1 commit into from Jan 24, 2020
Merged

dnnl: init at 1.1.2 #74893

merged 1 commit into from Jan 24, 2020

Conversation

bhipple
Copy link
Contributor

@bhipple bhipple commented Dec 3, 2019

This commit continues the work proposed in #68014, and provides an entirely FOSS
variant of the dnnl package. Updates to add an mkl flavor will be added
later, pending discussion about the cleanest way to overlay this consistently.

Fixes #67982, closes #68014

Co-authored-by: Alex Rice alexrice999@hotmail.co.uk

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @alexarice @jonringer @stites @Ericson2314

@bhipple
Copy link
Contributor Author

bhipple commented Dec 3, 2019

The mklSupport attribute was chosen specifically because that's what pytorch happens to use; in #74894 I'm working on generalizing this across the stack so it can be consistently overlaid.

@stites stites mentioned this pull request Dec 9, 2019

self: super: {
mklSupport = true;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested that:

  1. By default, it builds without MKL, and works even if you have disallowed all unfree packages
  2. This overlay enables MKL, which fails up-front if unfree packages are disabled (as expected)
  3. With this overlay and unfree packages enabled, everything works with MKL as expected:
λ brh nixpkgs →  nix-build -A dnnl
...
-- Intel(R) MKL: include /nix/store/km760fljn77c4d8wkdxchf3j21zgdlmw-mkl-2019.5.281/include
-- Intel(R) MKL: lib /nix/store/km760fljn77c4d8wkdxchf3j21zgdlmw-mkl-2019.5.281/lib/libmkl_rt.so
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Could NOT find Git (missing: GIT_EXECUTABLE)
-- Configuring done
...

@jonringer
Copy link
Contributor

@FRidh what about a different package set where mkl is enabled? kinda like pkgsMusl where the overlay just make all the packages which support mkl now build with mkl.

something like:

python3Packages.pkgsMKL

variable in an overlay:

self: super: {
mklSupport = true;
Copy link
Member

@FRidh FRidh Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should encourage setting non-derivation top-level attributes. It is convenient though because callPackage picks it up. Not sure what would be better but would achieve the same.

@FRidh
Copy link
Member

FRidh commented Dec 16, 2019

A top-level pkgsMkl seems fine to me although one needs to ask: where do we draw the line?

@jonringer
Copy link
Contributor

jonringer commented Dec 16, 2019

That's fair, it will be awkward if we had something like:

    python3Packages.pkgsMkl.pkgsCuda

then again, this would be opposed to:

  (python3Packages.override {
    MklSupport = true;
    CudaSupport = true;
 })

@bhipple
Copy link
Contributor Author

bhipple commented Dec 19, 2019

If you could give a hint about how this would be implemented, I'd be happy to give it a shot in #74894, then update this PR accordingly.

I see that the top of top-level/python-packages.nix has an overrides attr. Should we hook in there?

One consideration though is that compute-optimized libraries are wider than just the python package set. This dnnl library, for instance, has no python dependency, so it feels to me like we need the overlay at the top-level, not the python level (as in pkgsMusl or pkgsStatic).

@jonringer
Copy link
Contributor

jonringer commented Dec 19, 2019

the overrides attr is to allow for people to create a new python package set when wrapping an application like with azure-cli

If this did happen, then it would probably involve merging #67422 and using the overrideScope to create new package sets were withMklSupport = true

  # definition in top-level/python-packages.nix
  pkgsMkl = self.overrideScope( self: super: { withMklSupport = true; });
  # usage
  propagatedBuildInputs = with python.pkgs.pkgsMkl [ numpy pandas ];

the overrideScope is also nice, because the overlay construct allows for composition

  # usage
  propagatedBuildInputs = with python.pkgs.pkgsCuda.pkgsMkl [ tensorflow pandas ];

and, we would get a tensoflow-gpu package from it

Edit: cc @infinisil thoughts?

description = "Deep Neural Network Library (DNNL)";
homepage = "https://intel.github.io/mkl-dnn/dev_guide_transition_to_dnnl.html";
license = licenses.asl20;
platforms = platforms.all;
Copy link
Contributor

@jonringer jonringer Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
platforms = platforms.all;
platforms = platforms.x86_64;

adds x86 specific extensions to compiler:

g++: error: unrecognized command line option '-msse4.1'
g++: error: unrecognized command line option '-msse4.1'
g++: error: unrecognized command line option '-msse4.1'

also:

  DNNL supports 64 bit platforms only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly

@jonringer
Copy link
Contributor

@GrahamcOfBorg build pkgsi686Linux.dnnl

@bhipple bhipple changed the title dnnl: init at 1.1.1 dnnl: init at 1.1.2 Jan 9, 2020
@bhipple bhipple requested a review from jonringer January 9, 2020 04:06
@bhipple
Copy link
Contributor Author

bhipple commented Jan 9, 2020

@GrahamcOfBorg build dnnl

@bhipple
Copy link
Contributor Author

bhipple commented Jan 9, 2020

I've removed the optional dependency on the proprietary mkl package pending some further investigation and just left the very simple, straightforward FOSS package build for now. This is the one Hydra will build and most users will want to use.

I've also bumped it to the latest release and restricted it to 64-bit, per above suggestion.

@jonringer
Copy link
Contributor

@GrahamcOfBorg build dnnl

@bhipple
Copy link
Contributor Author

bhipple commented Jan 11, 2020

@GrahamcOfBorg build dnnl

It appears there's a failure in a test suite on Darwin, so I've upgraded to just Linux x86_64-linux for now, as I don't have a Darwin machine to debug.

@jonringer
Copy link
Contributor

@GrahamcOfBorg build dnnl

Copy link
Contributor

@jonringer jonringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

Comment on lines 30 to 33
addAutoPatchelfSearchPath $PWD/src
autoPatchelf tests examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like:

  export LD_LIBRARY_PATH=$PWD/src/build

would be more appropriate and drop the autopatchelf

generally autopatchelf is to help with precompilied binaries, not test binaries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started digging into it, but I wasn't able to get this to work with LD_LIBRARY_PATH. It looks like the test driver re-launches with a subprocess or something that cleans out the var (???) and it doesn't get caught, hence the original comment by @alexarice about the LD_LIBRARY_PATH env var not working.

I did upstream the patch on the shell to Intel, who says they'll release it in 1.2.0, so at least we can clean that part of the packaging up eventually.

@GrahamcOfBorg build dnnl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm a liar: the tests do build, run, and pass just fine without the autopatchelf hook and just

  preCheck = ''
    export LD_LIBRARY_PATH=$PWD/src
  '';

but we then later run into issues at installation and dependency resolution time . . .

checking for references to /build/ in /nix/store/3g8jhc6iwipb22a03grlif5y6bddb09j-dnnl-1.1.2-doc...
automatically fixing dependencies for ELF files
searching for dependencies of /nix/store/yixg9d3dfryiypffdn90zl30z6kk0yz7-dnnl-1.1.2/lib/libdnnl.so.1.1
  libstdc++.so.6 -> not found!
  libgomp.so.1 -> not found!
builder for '/nix/store/11bqvbfw2a3sxjpyfg3fycva2mxm408i-dnnl-1.1.2.drv' failed with exit code 1
error: build of '/nix/store/11bqvbfw2a3sxjpyfg3fycva2mxm408i-dnnl-1.1.2.drv' failed

Since we actually depend on the patchelf hook for mutating the installed binaries themselves, I will update the comment to accurately describe the state of the world!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if that's the cause, then do the autoPatchelfHook, but let it do it's thing during the fixup phase instead of the installCheck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The claim of being a lie was itself a lie, due to an unrelated issue. This takes a while to rebuild and re-run the tests on my laptop, so by the time it finished with a failure I had forgotten what I was attempting :)

The tests actually pass just fine with a simple LD_LIBRARY_PATH env var set, and the installed artifacts look good, so I think what we've got right now in the PR is the optimal fix in terms of both simplicity for test running, clarity of maintenance, and avoiding pollution of the installed artifacts for fixing test issues.

This commit continues the work proposed in NixOS#68014, and provides an entirely FOSS
variant of the `dnnl` package. Updates to add an `mkl` flavor will be added
later, pending discussion about the cleanest way to overlay this consistently.

Fixes NixOS#67982, closes NixOS#68014

Co-authored-by: Alex Rice <alexrice999@hotmail.co.uk>
@bhipple
Copy link
Contributor Author

bhipple commented Jan 12, 2020

A quick scan of the libs reveals no broken RPaths, so we weren't relying on the patchelf helper for anything here:

λ brh nixpkgs →  ldd ./result/lib/*.so
        linux-vdso.so.1 (0x00007ffd981a9000)
        libpthread.so.0 => /nix/store/aag9d1y4wcddzzrpfmfp9lcmc7skd7jk-glibc-2.27/lib/libpthread.so.0 (0x00007f8477bb7000)
        libdl.so.2 => /nix/store/aag9d1y4wcddzzrpfmfp9lcmc7skd7jk-glibc-2.27/lib/libdl.so.2 (0x00007f8477bb2000)
        libstdc++.so.6 => /nix/store/faybw5875yia1sg36sv84gjvfrwn7q8f-gcc-9.2.0-lib/lib/libstdc++.so.6 (0x00007f84779d2000)
        libm.so.6 => /nix/store/aag9d1y4wcddzzrpfmfp9lcmc7skd7jk-glibc-2.27/lib/libm.so.6 (0x00007f847783c000)
        libgomp.so.1 => /nix/store/faybw5875yia1sg36sv84gjvfrwn7q8f-gcc-9.2.0-lib/lib/libgomp.so.1 (0x00007f8477807000)
        libgcc_s.so.1 => /nix/store/aag9d1y4wcddzzrpfmfp9lcmc7skd7jk-glibc-2.27/lib/libgcc_s.so.1 (0x00007f84775ef000)
        libc.so.6 => /nix/store/aag9d1y4wcddzzrpfmfp9lcmc7skd7jk-glibc-2.27/lib/libc.so.6 (0x00007f8477439000)
        /nix/store/wx1vk75bpdr65g6xwxbj4rw0pk04v5j3-glibc-2.27/lib64/ld-linux-x86-64.so.2 (0x00007f8478c9f000)

I believe this should be good to go now . . . LMK @jonringer and thanks for all the help so far!

@bhipple
Copy link
Contributor Author

bhipple commented Jan 24, 2020

@jonringer I'd like to tentatively claim that all code review comments have been addressed. When you get a chance would you mind taking a look?

Copy link
Contributor

@jonringer jonringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff LGTM
commit LGTM
outputs LGTM
has tests 👍

[4 built, 0.0 MiB DL]
https://github.com/NixOS/nixpkgs/pull/74893
1 package built:
dnnl

@jonringer jonringer merged commit d411938 into NixOS:master Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Package request: mkldnn
3 participants