-
-
Notifications
You must be signed in to change notification settings - Fork 15.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nvidia module: set LD_LIBRARY_PATH #67893
Conversation
It's needed for CUDA driver, which applications are supposed to dlopen() and which is a driver version-dependent.
@abbradar See #67791 and #67791 (comment) |
I guess we'll need to roll back on 19.09 after all. However there are at least two different issues:
* older applications not working without LD_LIBRARY_PATH;
* CUDA broken because libcuda cannot be found.
I don't know any better way to solve the second issue than leaving LD_LIBRARY_PATH enabled for Nvidia (on master too).
EDIT: github layout explodes when replying by mail and even with manual editing I cannot fix it!
|
@ivan This is similar to OpenGL drivers before libglvnd became a thing, we used I'm not sure if we want to start using |
This was done in 370d3af for #11434, but I don't fully understand the rationale. "Currently Mesa and NVidia drivers don't set I believe #11434 (comment) references some specific problem that this change solved ("any program that links to one of these libraries will load the one from the AMD drivers. This can result in horrible breakage") but I don't know the details, maybe @ambrop72 can clarify? |
The idea is that now for dispatch libraries we either:
That means that for vast majority of cases Regarding the comment - |
@ambrop72 I see now that you already patched some packages to find CUDA, what do you think of my suggestion instead? My argument is that it's less error-prone that way (especially if application links to |
…es." This reverts commit 28a0918.
21619fb
to
5e004da
Compare
Whenever a driver ships with their own version of some library, e.g. llvm, which uses the same sonames as system libraries, if an application both links to a nixpkgs driver and uses the driver (possibly very indirectly), only one of the libraries will end up used, one thing will inevitably use the wrong version. This cannot be solved due to the design of Linux dynamic linking. However, if those driver-specific libraries end up in I understand the arguments about breaking things, but I think the simple solution of sticking with |
Also, if we keep the |
Sorry I missed this, yes the NVidia drivers seem fine. But I still wouldn't want to undo the work that went into getting rid of LD_LIBRARY_PATH because of one case (CUDA) and which only affects a fraction of people. |
Sorry, my fault - I mentioned you here only after I noticed you already patched some software.
My problem is specifically with CUDA: addOpenGLRunpath is a burden to use for all CUDA applications, and it may be very difficult to do correctly (I especially fear stuff linking against libcuda because it will lead to hard to debug errors when driver versions are different).
Regarding number of CUDA users - I think that amongst people with nvidia GPUs it's actually a pretty frequent usecase, as a lot of random software makes use of it (ml libraries, blender, cryptominers, mpv, games etc.), and we only revert this flag back for Nvidia users.
There are also possible edge cases like people dlopening libcuda from interpreters like Python, but there's not a single case that I know of.
Overall my point is - sadly we don't have a single entry point, so instead of plugging holes let's use the sledgehammer.
…On September 2, 2019 8:27:17 PM GMT+03:00, Ambroz Bizjak ***@***.***> wrote:
Also, if we keep the `LD_LIBRARY_PATH` for 19.09 and backport
`addOpenGLRunpath` fixes to 19.09, then the "older applications not
working without LD_LIBRARY_PATH" problem will not be after the upgrade
to the next release, where `LD_LIBRARY_PATH` finally disappears,
assuming one is not using applications older than 19.09.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#67893 (comment)
--
Nikolay.
|
I think the sledgehammer approach is inconsistent with Nix/NixOS philosophy which is focused on purity. Of course we can't be pure here, so the current solution for such problems is to fix individual packages to look in impure places ( By the way I filed a PR to fix Houdini (#67958). Generally I will be glad to fix other programs that I am told about. |
Please note that for big applications, setting |
I suggest that this is merged only to 19.09 and without reverting the CUDA fixes. This would fix the immediate 19.09 issues and leave the most freedom to decide how to proceed - by still having the fixes in 19.09, if the |
I'm still not sure we should do this by patching packages but let's try and see what problems arise. The package I was trying to fix now was I don't think we should merge this for 19.09, rather either merge to post-19.09 unstable or reject. For 19.09 we'd rather leave LD_LIBRARY_PATH always enabled for the time being, there is #67791 for that. |
Yeah that's better because it prevents breakage of packages from user environments after upgrade to 19.09 in cases of using open-source drivers. |
I fixed this bug for tensorflow at least. Let's close this for now. |
It's needed for CUDA driver, which applications are supposed to dlopen() and
which is a driver version-dependent.
Things done
sandbox
innix.conf
on non-NixOS)nix-shell -p nix-review --run "nix-review wip"
./result/bin/
)nix path-info -S
before and after)Tested on Tensorflow which works now. I'll merge in several days unless there are better ideas.