Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libglvnd, ocl-icd, vulkan-loader: Add /run/opengl-driver(-32) to RUNPATH. #60985

Merged
merged 3 commits into from May 22, 2019

Conversation

ambrop72
Copy link
Contributor

@ambrop72 ambrop72 commented May 5, 2019

Previously we were relying on LD_LIBRARY_PATH to discover vendor driver libraries (libGL, ligGLX, libEGL, OpenCL and Vulkan). This has the problem that setuid programs (in particular VirtualBox) ignore LD_LIBRARY_PATH. Fix it by setting RUNPATH in various dispatch libraries to look in /run/opengl-driver(-32).

Setting this on the VirtualBoxVM binary (which would be simpler) would not solve the problem, the RUNPATH must be on the library which performs the dlopen(). The dlopen man page claims that the RUNPATH from the executable is used but that is incorrect, only RUNPATH from the module performing dlopen() is used.

Note that setting RPATH on the executable would work, but patchelf manipulates RUNPATH and not RPATH (RPATH is a legacy feature).

Fixes #22760
Fixes #18457.

Motivation for this change

3D acceleration in VirtualBox does not work with NVidia drivers when hardening is enabled for VirtualBox (the default) (#22760)

Things done

Tested this on 19.03, with this change there is no error when starting a VM with 3D acceleration enabled and the VM log shows that OpenGL was initialized successfully.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@flokli
Copy link
Contributor

flokli commented May 5, 2019

@ambrop72
Copy link
Contributor Author

ambrop72 commented May 5, 2019

I think so, hope there isn't something else that uses it.

@flokli
Copy link
Contributor

flokli commented May 5, 2019

@abbradar How does that sound to you?

@flokli flokli requested review from abbradar and Mic92 May 5, 2019 14:41
@ambrop72
Copy link
Contributor Author

ambrop72 commented May 6, 2019

Actually I think LD_LIBRARY_PATH may still be needed for libcuda, which does not have a dispatch library (AFAIK). So for now I am hesitant to remove it. For the future we could think up some solution maybe.

@ambrop72
Copy link
Contributor Author

ambrop72 commented May 6, 2019

For CUDA I suppose we could apply the same RUNPATH correction to any code that dlopen's libcuda, but I am not at all familiar with CUDA things. Do apps normally dlopen libcuda directly, or are they linked to it, or is there a standard helper library that takes care of this?

@ambrop72
Copy link
Contributor Author

ambrop72 commented May 6, 2019

Forgot about VDPAU though, where we have libvdpau as the dispatch library. From what I can tell, it does not need LD_LIBRARY_PATH, because we configure it with --with-module-dir=${libGL_driver.driverLink}/lib/vdpau and the code tries to dlopen <module_dir>/lib/vdpau/libvdpau_<vendor>.sh. So it should keep working if we get rid of LD_LIBRARY_PATH.

I will remove that LD_LIBRARY_PATH on my system and see if things break.

@ambrop72
Copy link
Contributor Author

ambrop72 commented May 7, 2019

Added patching of some CUDA things.

With these patches, I have then removed that LD_LIBRARY_PATH settings from opengl.nix and do not see any breakage. KDE Plasma, Chromium, Blender, and Steam+games still work fine on my system with NVIDIA driver.

I suggest we apply the changes in this PR and also remove that LD_LIBRARY_PATH, then we can fix any breakages that people report by patching RUNPATH the same way. I would be really great to get rid of that LD_LIBRARY_PATH (it's a hack and it slows down program startup).

@flokli
Copy link
Contributor

flokli commented May 17, 2019

I gave this a test run on my system (19.03 with these two commits cherry-picked). Also removed adding to LD_LIBRARY_PATH from opengl.nix - no breakages observed - inclined to merge this :-)

@ambrop72, can you add the commit removing from LD_LIBRARY_PATH (plus some changelog entry) to this PR?

It should be safe to backport (only) 73f34cb to 19.03, too. cc @samueldr @lheckemann

@ambrop72
Copy link
Contributor Author

Sorry forgot about this. Well now I know that maintainers can push to the PR branch by default :)
I agree to backport the first commit only, to fix the VirtualBox problem.

Copy link
Member

@abbradar abbradar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome idea! Regarding complete drop of LD_LIBRARY_PATH: I'm not sure now if that would break anything else; I'll try to remember what I encountered when I tried to remove it long time ago but don't count on this much. Bottom line is that there were some problems but maybe they are not relevant anymore.

pkgs/applications/misc/blender/default.nix Show resolved Hide resolved
@abbradar
Copy link
Member

abbradar commented May 20, 2019 via email

@flokli
Copy link
Contributor

flokli commented May 21, 2019

@abbradar I'm fine with dropping the commit removing LD_LIBRARY_PATH for now, to not break too much at once :-)

Do you want to give the RUNPATH hook a try?

for program in $out/bin/blender $out/bin/.blender-wrapped; do
isELF "$program" || continue
origRpath=$(patchelf --print-rpath "$program")
patchelf --set-rpath "$origRpath:${libGL_driver.driverLink}/lib" "$program"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to put driverLink path first instead? This way if a package depends on Mesa or nvidia drivers directly it would still load system-wide drivers, not the ones it was linked with -- that's the behavior we arguably expect. It's especially important with nvidia because its driver libraries require exactly the same kernel module version and provide stable ABI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one libGL that can be linked to, that is the one from libglvnd. Neither the mesa_drivers nor nvidia-x11 provide a libGL that can be linked to, only libGL_vendor that is dlopen-ed from libglvnd. I do not see any case where this RPATH adjustment would need to override some default library. Adding the RPATH at the end should reduce the performance impact to normal library loading.

@@ -17,7 +17,7 @@ stdenv.mkDerivation rec {
sha256 = "0zhrwj1gi90x2w8gaaaw5h4b969a8gfy244kn0drrplhhb1nqz3b";
};

nativeBuildInputs = [ pkgconfig ];
nativeBuildInputs = [ pkgconfig patchelf ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patchelf is included in stdenv for Linux by default so no need to add it here.

@abbradar
Copy link
Member

@flokli I implemented a setup hook and added it everywhere. I only manually tested libglvnd for now.

@abbradar
Copy link
Member

@flokli Regarding LD_LIBRARY_PATH: I'm very much in favor of dropping it but have a gut feeling it would break something ;) I'll run a rebuild with this on my machine later. If it works okay too we could merge and fix breakages as they appear. @vcunat maybe you remember any problems w.r.t. replacing LD_LIBRARY_PATH with RUNPATH? We discussed LD_LIBRARY_PATH and OpenGL before.

@abbradar
Copy link
Member

abbradar commented May 21, 2019 via email

@ambrop72
Copy link
Contributor Author

I didn't mean libGL, rather any drivers themselves. Nvidia has this problem as I mentioned if one has nvidia_x11 in their RUNPATH for some reason, but also mesa_drivers. I think there were several cases where packages were linked to mesa_drivers or nvidia_x11 but I'll recheck this.

Well I didn't see any issue with appending RUNPATH. For libglvnd and other dispatch libraries the only thing that matters is that dlopen will find libraries like libGL_vendor and there is only going to be one package that has those, due to the name.

@abbradar
Copy link
Member

abbradar commented May 21, 2019 via email

@ambrop72
Copy link
Contributor Author

Yeah, I'll build and check a few things as well.

@ambrop72 ambrop72 force-pushed the virtualbox-nvidia-accel-fix branch from 0ef193d to 547c6b6 Compare May 21, 2019 16:17
@ambrop72
Copy link
Contributor Author

Tested that all packages with these R(UN)PATH adjustments build and get the correct R(UN)PATH (using readelf -d): libglvnd, ocl-icd, vulkan-loader, cudatoolkit, cudnn, nccl, blender(cudaSupport=true). Also cudatoolkit correctly gets RPATH instead of RUNPATH.

I will now rebase locally on top of channels/nixos-unstable and rebuild and test with my system (including the LD_LIBRARY_PATH removal).

@ambrop72
Copy link
Contributor Author

While testing this I got hit by SDDM breakage in nixos-unstable (not caused by this, happens also on vanilla nixos-unstable, #61840). So I used LightDM instead, but with LightDM one still gets something in LD_LIBRARY_PATH (see comment here).

I will try removing that xserver LD_LIBRARY_PATH and fixing resulting breakage, but cannot do it for all drivers. But I don't think we should delay merging this (minus the LD_LIBRARY_PATH removal which can be done layer once more testing was done without it).

@flokli
Copy link
Contributor

flokli commented May 22, 2019

@ambrop72 agreed. Could you squash things together a bit and drop 2f056af?

abbradar and others added 3 commits May 22, 2019 18:19
This hook allows to add NixOS driver libraries path to given ELF
objects' RUNPATH. We use it instead of settings RUNPATH manually
everywhere. It must be invoked in postFixup so that RUNPATH stripping
does not remove the path.

It puts the path first instead of last so that system-wide drivers
are always preferred.
Previously we were relying on LD_LIBRARY_PATH to discover driver libraries (libGL, ligGLX, libEGL, OpenCL and Vulkan). This has the problem that setuid programs (in particular VirtualBox) ignore LD_LIBRARY_PATH. Fix it by setting RUNPATH in various dispatch libraries.

This is not needed for libvdpau because it is already configured to look for libraries in the driver paths.

Fixes NixOS#22760.
This is to avoid relying on LD_LIBRARY_PATH for finding the CUDA driver libraries.
@ambrop72 ambrop72 force-pushed the virtualbox-nvidia-accel-fix branch from 547c6b6 to 28a0918 Compare May 22, 2019 16:36
@ambrop72
Copy link
Contributor Author

Rearranged commits, dropped the LD_LIBRARY_PATH removal. I think this is ready.

@flokli
Copy link
Contributor

flokli commented May 22, 2019

Thanks!

@flokli flokli merged commit 2ed6903 into NixOS:master May 22, 2019
# actually sets RUNPATH not RPATH, which applies only to dependencies of the binary
# it set on (including for dlopen), so the RUNPATH must indeed be set on these
# libraries and would not work if set only on executables.
addOpenGLRunpath() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a no-op on macOS. We can link to the OpenGL framework there I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a no-op, or we should link to the OpenGL framework?

@nixos-discourse
Copy link

This pull request has been mentioned on Nix community. There might be relevant details there:

https://discourse.nixos.org/t/software-rendering-using-mesa-not-possible/3023/8

# Set RUNPATH so that driver libraries in /run/opengl-driver(-32)/lib can be found.
# See the explanation in addOpenGLRunpath.
postFixup = ''
addOpenGLRunpath $out/lib/libvulkan.so
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed.

Vulkan Loader doesn't need the openGLRunpath hardcoded because it will look up libraries not in the opengl path but in whatever the ICD json points to. So vulkan will never look in /run/opengl-driver/lib,
it will instead look in /nix/store/y99l7729c0157a52y9qfzi3zg4f94b57-mesa-noglu-18.3.4-drivers/lib/libvulkan_intel.so because the ICD in /run/opengl-driver/share/vulkan/icd.d says so

cat /run/opengl-driver/share/vulkan/icd.d/intel_icd.x86_64.json 
{
    "ICD": {
        "api_version": "1.1.90",
        "library_path": "/nix/store/y99l7729c0157a52y9qfzi3zg4f94b57-mesa-noglu-18.3.4-drivers/lib/libvulkan_intel.so"
    },
    "file_format_version": "1.0.0"
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is currently needed for NVidia because the library_path is only a name:

$ cat /run/opengl-driver-32/share/vulkan/icd.d/nvidia.json 
{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "libGLX_nvidia.so",
        "api_version" : "1.1.99"
    }
}

I think we should patch it and then remove the runpath.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. any other drivers that would need similar fixups? ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #62870 to set absolute paths in NVidia ICD files.
Also related:

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/building-with-cuda-on-nixpkgs/4462/8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants