
nvidia: Preliminary nVidia/AMD PRIME and dynamic power management support #100519

Merged: 2 commits merged into NixOS:master on Jan 29, 2021

Conversation

@Baughn (Contributor) commented Oct 14, 2020

Motivation for this change

This permits nVidia PRIME to be used in AMD/nVidia configurations, such as on the Zephyrus G14. It also adds a flag for dynamic power management, which should (when fully baked) allow the nVidia dGPU to be fully powered off while unused.
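For context, here is a minimal sketch of the intended configuration on such an AMD/nVidia machine; the bus IDs are placeholders, and finegrained is the new experimental flag added by this PR.

{
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia.prime = {
    offload.enable = true;
    amdgpuBusId = "PCI:5:0:0";   # iGPU (placeholder; check lspci)
    nvidiaBusId = "PCI:1:0:0";   # dGPU (placeholder; check lspci)
  };

  # New in this PR; still experimental -- see the caveats below.
  hardware.nvidia.powerManagement.finegrained = true;
}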

Caveats:

  • Most (non-Intel?) machine configurations are not supported by the Dynamic Power Management code yet. This includes the G14. Enabling it in fact increases power draw slightly. (The same configuration is hilariously buggy on Windows. nVidia appears to have made the safe choice in not permitting it on Linux.)

  • Including "nvidia" in xserver.videoDrivers is required, and should probably be done by default. This PR is only concerned with supporting the G14's hardware, but I think that would be a good idea. However:

  • videoDrivers isn't deduplicated, and each entry creates a separate Device section in xorg.conf. This means including "amdgpu" or "modesetting" in videoDrivers breaks PRIME in an unobvious fashion, permitting X11/Wayland to appear to work, but causing any invocation of the dGPU to throw GLX errors.

  • Some people may genuinely have multiple GPUs of the same brand, so unconditionally deduplicating them is no good either.

This is all equally true for Intel-based PRIME, and has no bearing on the PR. Just food for thought.

Tested on a Zephyrus G14.

  • With nVidia disabled: 6.5W power draw.
  • With nVidia enabled (via PRIME offloading): 12W power draw.

This roughly matches Windows. (Insert rant on malfunctioning dynamic power management here. I'm sure it'll all be in order in a couple of months.)

Not tested, because my hardware doesn't support it:

  • Sync mode.
  • Dynamic power management, beyond "Does it obviously break anything". It's marked 'experimental' for a reason...
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@eadwu (Member) left a comment:

Skim

@@ -63,6 +63,15 @@ in
'';
};

hardware.nvidia.powerManagement.finegrained = mkOption {
@eadwu (Member):

RTD3 power management is experimental, so I would probably not include it. Nor is it as much of a pain to set up as editing the Xorg configuration, so it might be better to exclude it.

@Baughn (Contributor, author):

I've seen several people ask about it, it's supposed to work with Intel iGPUs, and it's also supposed to work for APUs "shortly". I thought I'd get a head start on supporting it.

It's still marked as experimental in the documentation, but the code here shouldn't need any change as it stabilizes, except perhaps to remove the udev exclusions.
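For reference, the "udev exclusions" mentioned here follow the NVIDIA dynamic power management documentation: disable the dGPU's secondary PCI functions (audio/USB/UCSI) and enable runtime PM for the VGA controller on driver bind. A rough sketch of how that can be expressed in the module, assuming its internal cfg binding; these are not the literal rules from this PR:

services.udev.extraRules = lib.optionalString cfg.powerManagement.finegrained ''
  # Remove the dGPU's HDMI audio function, which would otherwise keep the card awake
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{remove}="1"
  # Enable runtime power management for the NVIDIA VGA/3D controller once its driver binds
  ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
'';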

display = offloadCfg.enable;
modules = optional (igpuDriver == "amdgpu") [ pkgs.xorg.xf86videoamdgpu ];
@eadwu (Member):

Why is this needed? If you have an AMD GPU this should be a prerequisite. I'm not familiar with this, but isn't there an open-source and a closed-source driver? Is it compatible with both, and/or are conflicts settled if both are included?

@Baughn (Contributor, author) commented Oct 14, 2020:

The proprietary driver is amdgpu-pro. It basically doesn't work at all, for anyone -- perhaps a slight exaggeration, but I'd be astonished to see it in use.

The modesetting driver is bundled with xorg; amdgpu isn't. The usual user interface for fixing that is adding it to videoDrivers, but as I've explained, doing so will break PRIME. It has to be added to modules, or the AMD GPU won't work at all.

Adding driver modules this way does nothing by itself, so a user who explicitly wants to install amdgpu-pro should be able to do so. Though I don't think that would work in a PRIME configuration, and this explicitly selects the amdgpu driver elsewhere. Note that the module doesn't let Intel users choose the intel driver instead of the modesetting one.
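In other words, the module supplies the iGPU's driver entry itself. A simplified sketch of what the quoted hunk amounts to, assuming the module's internal igpuDriver and offloadCfg bindings:

services.xserver.drivers = lib.singleton {
  name = igpuDriver;            # "amdgpu" on this hardware, "modesetting" on Intel systems
  display = offloadCfg.enable;
  # xf86-video-amdgpu is not bundled with the X server, so it has to be passed in
  # as an extra driver module; the modesetting driver needs nothing extra.
  modules = lib.optionals (igpuDriver == "amdgpu") [ pkgs.xorg.xf86videoamdgpu ];
};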

@eadwu (Member):

Oh, alright I guess, but that doesn't seem to stop anyone from adding "amdgpu" to videoDrivers, which, based on what I've understood, would still end up breaking it; and that's probably where most people would start in order to get the display working in the first place.

If I've understood correctly, an assert to make sure amdgpu isn't in videoDrivers should be added if someone uses PRIME.

It's not that Intel users can't choose the intel xf86video driver, but the docs explicitly state that it's for modesetting, so it probably wouldn't work anyway if they chose to use the old driver. Though the Arch wiki states AMD GPUs are supported based on the docs, the link doesn't seem to show any proof of that (unless amdgpu implements modesetting but as its own driver, in which case it would be implied).
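A sketch of the kind of assertion being suggested, assuming the module's internal primeEnabled binding; the wording of the message is illustrative, not part of this PR:

assertions = [
  {
    assertion = primeEnabled -> !(lib.elem "amdgpu" config.services.xserver.videoDrivers);
    message = ''
      Do not add "amdgpu" to services.xserver.videoDrivers when PRIME is enabled;
      the NVIDIA module configures the iGPU itself, and a duplicate Device section
      breaks GLX offloading.
    '';
  }
];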

@Baughn (Contributor, author):

amdgpu implements modesetting, yes.

'' + optionalString offloadCfg.enable ''
Option "AllowNVIDIAGPUScreens"
'';

services.xserver.displayManager.setupCommands = optionalString syncCfg.enable ''
# Added by nvidia configuration module for Optimus/PRIME.
-  ${pkgs.xorg.xrandr}/bin/xrandr --setprovideroutputsource modesetting NVIDIA-0
+  ${pkgs.xorg.xrandr}/bin/xrandr --setprovideroutputsource ${igpuDriver} NVIDIA-0
@eadwu (Member):

The configuration looks to be basically the same for Intel and AMD, so I'm not sure why you're adding logic?

@Baughn (Contributor, author) commented Oct 14, 2020:

The AMD config doesn't use the modesetting driver, and calling it that would be misleading.

Granted, Sync doesn't work on my hardware at all. I have no way of testing this, so if you believe it should unconditionally be "modesetting" I'll remove the change.

@eadwu (Member) commented Oct 15, 2020:

The Arch wiki uses radeon instead of amdgpu in its example, though I'm not sure which it is for AMD. This should just be a mapping between the providers reported by xrandr --listproviders, i.e., Provider 0 -> Provider 1:

Provider 0: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 4 associated providers: 0 name:modesetting
Provider 1: id: 0x258 cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-G0

I'm not sure about the naming; if they both use modesetting internally then I would leave it as "modesetting", otherwise it's fine to leave it as is.

@emmanuelrosa (Contributor):

Firstly, thank you @Baughn for putting this together. I'm using this PR to get NVIDIA PRIME working on my HP Pavilion Gaming 15-ec1047ax.

I have an AMD Renoir-based iGPU and an NVIDIA GeForce GTX 1650 dGPU. The internal display is connected to the iGPU and the HDMI port is connected to the dGPU. Here's my annotated configuration:

{ config, pkgs, lib, ... }:

{
  boot = {
    # Support for the iGPU was added in Linux 5.10
    kernelPackages = pkgs.linuxPackages_latest;
    # The nouveau driver causes xorg to crash.
    blacklistedKernelModules = [ "nouveau" "nvidiafb" ];
  };

  environment.systemPackages = with pkgs; [
    # A script to offload rendering to the dGPU.
    (pkgs.writeShellScriptBin "nvidia-offload" ''
      export __NV_PRIME_RENDER_OFFLOAD=1
      export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
      export __GLX_VENDOR_LIBRARY_NAME=nvidia
      export __VK_LAYER_NV_optimus=NVIDIA_only
      exec -a "$0" "$@"
    '')
  ];

  # Enable sound via ALSA. PulseAudio doesn't work well on this system.
  # Set the default audio device to the integrated interface, rather than NVIDIA's HDMI.
  sound = {
    enable = true;
    extraConfig = ''
      defaults.pcm.card 1
      defaults.ctl.card 1
    '';
  };

  environment.variables = {
    # VAAPI and VDPAU config for accelerated video.
    # Using the iGPU (AMD) because I can't get it to work with the dGPU.
    "VDPAU_DRIVER" = "radeonsi";
    "LIBVA_DRIVER_NAME" = "radeonsi";
  };

  hardware.nvidia.prime = {
    offload.enable = true;
    nvidiaBusId = "PCI:1@0:0:0";
    amdgpuBusId = "PCI:5@0:0:0";
  };

  services.xserver = {
    enable = true;
    # Note: Adding the amdgpu driver here will cause xorg to crash.
    videoDrivers = [ "nvidia" ];
  };
}
  • I've used the nvidia-offload script with Brave and qutebrowser while rendering WebGL content. I confirmed the dGPU was being used with nvidia-smi.
  • I've used the nvidia-offload script with VLC to get hardware-accelerated rendering (via OpenGL). I don't think hardware-accelerated video decoding is working.
  • I've used hardware-accelerated video encoding with openshot, using nvencode.
  • Running xrandr --setprovideroutputsource [NVIDIA dGPU] [AMD iGPU] enables the HDMI port on the dGPU. But I have yet to successfully use an HDMI display.

In summary, I'd like for this to be merged ;)

@veprbl veprbl requested a review from eadwu January 29, 2021 15:10
@eadwu (Member) left a comment:

Looks fine, untested since I don't have AMD.

@veprbl veprbl merged commit c9f8884 into NixOS:master Jan 29, 2021
message = "Sync precludes powering down the NVIDIA GPU.";
}
{
assertion = cfg.powerManagement.enable -> offloadCfg.enable;

Should this be cfg.powerManagement.finegrained instead?
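A sketch of the assertion as this question suggests it should read; illustrative only, not necessarily how it was later fixed:

{
  assertion = cfg.powerManagement.finegrained -> offloadCfg.enable;
  message = "Fine-grained power management requires offload to be enabled.";
}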

@@ -205,6 +245,7 @@ in
''
BusID "${pCfg.nvidiaBusId}"
${optionalString syncCfg.allowExternalGpu "Option \"AllowExternalGpus\""}
${optionalString cfg.powerManagement.finegrained "Option \"NVreg_DynamicPowerManagement=0x02\""}


per the docs here, this is in the wrong spot. this option should be set on the nvidia kernel module during its initialization. it's not an Xorg option for initializing the device:

[this feature] can be enabled or disabled via the NVreg_DynamicPowerManagement nvidia.ko kernel module parameter.

this setting also defaults to on for Ampere and newer cards as of this driver version.
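A sketch of setting it as a kernel module parameter instead, for example via modprobe options; this is illustrative and not necessarily the exact change made in the follow-up PR:

boot.extraModprobeConfig = lib.optionalString cfg.powerManagement.finegrained ''
  options nvidia "NVreg_DynamicPowerManagement=0x02"
'';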


see #174057

@jian-lin (Contributor):

A related PR: #174058

Reviews are welcome.
