Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor NVidia drivers #22304

Merged
merged 3 commits into from Feb 10, 2017
Merged

Refactor NVidia drivers #22304

merged 3 commits into from Feb 10, 2017

Conversation

abbradar
Copy link
Member

@abbradar abbradar commented Jan 31, 2017

Motivation for this change

Make several improvements to nvidia drivers on NixOS:

Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

Sadly I can't fully test this myself because at some point my laptop lost ability to switch to dedicated-only card in BIOS so I can't use nvidia without intel/bumblebee now. Bumblebee works fine though, both optirun and primusrun.

@mention-bot
Copy link

@abbradar, thanks for your PR! By analyzing the history of the files in this pull request, we identified @edolstra, @vcunat and @wkennington to be potential reviewers.

@grahamc
Copy link
Member

grahamc commented Jan 31, 2017

What laptop do you have? ps: if you rebase this against master, https://prs.nix.gsc.io/eval/857 will build the nixos/release.nix + tests.

@abbradar
Copy link
Member Author

@grahamc Lenovo T440p. I think at some point option to force NVidia only was there, but it's gone now (I regularly update BIOS so that may be the reasion).

Would Hydra build unfree packages and their reverse dependencies? Also, will use that chance to say thank you very much for your work on Hydra for PRs ^_^.

@grahamc
Copy link
Member

grahamc commented Jan 31, 2017

Hmm... I wish it would. I'm working on having this included in hydra.nixos.org. Unfortunately, I don't want to set a precedent that it'll build unfree :(

@grahamc
Copy link
Member

grahamc commented Jan 31, 2017

Also, you're welcome, I'm glad to make progress on this! :)

@abbradar
Copy link
Member Author

abbradar commented Jan 31, 2017

Unfortunately, I don't want to set a precedent that it'll build unfree :(

That's completely fine, on the contrary -- I'd be surprised if you did enable this :D

inherit version useGLVND;
inherit (stdenv) system;

outputs = [ "out" ] ++ optional (!libsOnly) "bin";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention has been to put the output containing binaries as the first output (it changed at one point).

Also I didn't check too closely, but doesn't splitting to multiple outputs remove the need for all the libsOnly logic?

@abbradar
Copy link
Member Author

abbradar commented Jan 31, 2017 via email

@dezgeg
Copy link
Contributor

dezgeg commented Jan 31, 2017

Hmm, what's the benefit of splitting this to multiple outputs then? In what situations you end up with the built system depending on only one of the outputs?

@abbradar
Copy link
Member Author

abbradar commented Jan 31, 2017

@dezgeg Never ;) The point of splitting is to keep logically separate parts separate. This way we don't have kernel modules and OpenGL libraries in system environment, for example.

BTW is there a way to enforce current convention (binary outputs primary) when the binary output doesn't always exist?

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

I can test on a notebook hard-wired to nvidia.

@abbradar
Copy link
Member Author

@vcunat would be cool!

BTW (offtopic, teaser) I have an in-progress branch that moves us to libglvnd completely (so mesa_noglu is used only for drivers and libOSMesa). However I'll hold it until some other distro makes this move and irons the wrinkles (IIRC Fedora already started) -- Mesa still doesn't support EGL and GLES via libglvnd, and even GLX is still very fresh.

Hopefully when AMDGPU-PRO also moves to libglvnd we can finally get rid of LD_LIBRARY_PATH!

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

@abbradar: my older libglvnd WIP if you find some useful bits... but I expect most work will be in finishing the details, especially if we were to implement putting the C symbols into a separate linker namespace.

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

I originally paused the effort for AMD to catch up; that might be a long wait, so waiting for at least some larger distro might save some work.

@abbradar
Copy link
Member Author

Yep, I think there are also more drivers that won't support libglvnd any time soon (maybe VMware/some other proprietary VMs?) so we can't drop LD_LIBRARY_PATH support altogether, but waiting for Mesa and some distro to make the move first seems a good idea. Then we can at least link everything to libglvnd.

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

I think we can link anyway to glvnd-provided libGL shim during build-time and then just put another libGL on $LD_LIBRARY_PATH instead, if the driver requires it.

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

I'm specifically looking forward to mesa updates not causing mass rebuilds.

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

Driver seems OK, but nvidia-settings can't find any GTK.

@vcunat
Copy link
Member

vcunat commented Jan 31, 2017

The libnvidia-gtk3.so that gets to /run/opengl-driver/lib can't find most of libs. Strangely there's some such library in nvidia-settings-375.26/lib also but it's got good linkage and it's missing version symlinks and executable bit.

@abbradar
Copy link
Member Author

abbradar commented Feb 1, 2017

Ah, during some round of final refactoring I lost rm libnvidia-gtk* (we build it from source now). Try again please!

@abbradar
Copy link
Member Author

abbradar commented Feb 5, 2017

Resolved conflicts.

@vcunat
Copy link
Member

vcunat commented Feb 10, 2017

The settings app does work now for me, and the rest has been working OK the whole time - now even with xorg-server-1.19.1. I don't use any special stuff like vulkan or CUDA.

Anything else to wait for? It doesn't seem likely to get much more testing until it's forced on unstable/master users :-)

@abbradar
Copy link
Member Author

Actually I was just waiting for you to confirm that settings now work -- we got a deadlock :D

@abbradar abbradar merged commit 442b4d6 into NixOS:master Feb 10, 2017
@vcunat vcunat mentioned this pull request Feb 11, 2017
56 tasks
@vcunat
Copy link
Member

vcunat commented Feb 11, 2017

JFTR, some of this bunch of lines from journalctl seem fishy:

kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
systemd-modules-load[509]: Inserted module 'nvidia_uvm'
systemd-udevd[591]: Process '/nix/store/b3hnjqy2py1hbh364i6jj6jmh9ajm79l-bash-4.4-p5/bin/bash -c 'mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | cut -d \  -f 1) 0'' failed with exit code 1.
kernel: nvidia-modeset: Allocated GPU:0 (GPU-4e7b3168-e136-eb9b-a86c-613b8e52234e) @ PCI:0000:01:00.0
nvidia-persistenced[3690]: Failed to create directory /var/run/nvidia-persistenced: Permission denied
nvidia-persistenced[3690]: Unable to access /var/run/nvidia-persistenced: No such file or directory
nvidia-persistenced[3690]: Shutdown (3690)
nvidia-settings[3713]: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files
xsession[1274]: ERROR: nvidia-settings could not find the registry key file. This file
xsession[1274]:        /usr/share/nvidia/nvidia-application-profiles-375.26-key-documentati
xsession[1274]:        /usr/share/nvidia/nvidia-application-profiles-key-documentation. The
nvidia-settings[3713]: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
nvidia-settings[3713]: Allocating size to CtkWindow 0xb462a0 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?
nvidia-settings[4815]: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files
xsession[1274]: ERROR: nvidia-settings could not find the registry key file. This file
xsession[1274]:        /usr/share/nvidia/nvidia-application-profiles-375.26-key-documentati
xsession[1274]:        /usr/share/nvidia/nvidia-application-profiles-key-documentation. The
nvidia-settings[4815]: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
nvidia-settings[4815]: Allocating size to CtkWindow 0x24922a0 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?

@abbradar
Copy link
Member Author

@vcunat Those look interesting:

  1. I'm confused at what tries to run nvidia-persistenced. I have packaged it but without a NixOS module and such -- it is supposed to be a daemon which keeps CUDA state in memory or something like that;
  2. Paths for application profiles need to be fixed, I'll look into that.

@abbradar
Copy link
Member Author

Can you track at which moment nvidia-persistenced is run? I.e. during start of user session, when you start settings, during boot...

@vcunat
Copy link
Member

vcunat commented Feb 11, 2017

I think I invoked it from my user session by a mistake.

@vcunat
Copy link
Member

vcunat commented Feb 11, 2017

Yes, logs confirm it wasn't done during startup.

Now I also noticed:

xsession[1274]: Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GLContext is disabled due to a previous crash.|.... (repeats of all the same)

@abbradar
Copy link
Member Author

abbradar commented Feb 11, 2017 via email

@vcunat
Copy link
Member

vcunat commented Feb 11, 2017

No idea yet. After switching xorg-server to 1.19.1, my firefox shows github's top bar with black background instead of light, though that might be unrelated. I keep watching the system, but so far I've felt no problems.

@abbradar
Copy link
Member Author

abbradar commented Feb 11, 2017 via email

@grahamc
Copy link
Member

grahamc commented Feb 11, 2017

The black bar has been present in GitHub Enterprise for months, apparently they've moved it over to public GitHub. FWIW.

@abbradar
Copy link
Member Author

Oh, that explains it, thanks.

@vcunat
Copy link
Member

vcunat commented Feb 11, 2017

:-D I tried on chromium and didn't have the bar. I assumed logging in won't change it, but I was wrong.

@dezgeg
Copy link
Contributor

dezgeg commented Feb 11, 2017

vcunat added a commit that referenced this pull request May 30, 2017
Fixes #26250.  This is fallout from PR #22304.
It's null for 304 and 173 legacy drivers.

(cherry picked from commit bc7b895)
vcunat added a commit that referenced this pull request May 30, 2017
Fixes #26250.  This is fallout from PR #22304.
It's null for 304 and 173 legacy drivers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants