
nvidia-docker/podman: refactor nvidia container runtime support #108862

Merged: 22 commits merged into NixOS:master from cpcloud:refactor-nvidia-containers on Jan 15, 2021

@cpcloud (Contributor) commented Jan 9, 2021

Motivation for this change

The motivation for this change was:

  • Share packaging code among the docker, podman, and NVIDIA container runtime packages
  • Allow GPU support to be enabled for more than one container runtime (docker and podman)
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS Linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Additionally, I've tested both docker and podman with the NVIDIA container runtime on a machine with an NVIDIA RTX 2080 Super, in the following ways:

  1. docker: nvidia-smi using the --runtime nvidia flag
    docker run --runtime nvidia --rm nvidia/cuda nvidia-smi --list-gpus
    GPU 0: GeForce RTX 2080 SUPER (UUID: GPU-16efc9cd-6886-5c12-52f8-2bef887747d9)
    
  2. docker: nvidia-smi using the --gpus all flag
    docker run --gpus all --rm nvidia/cuda nvidia-smi --list-gpus
    GPU 0: GeForce RTX 2080 SUPER (UUID: GPU-16efc9cd-6886-5c12-52f8-2bef887747d9)
    
  3. docker: TensorFlow list_physical_devices using the --runtime nvidia flag
    docker run -e TF_CPP_MIN_LOG_LEVEL=1 -e LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu --runtime nvidia --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    
  4. docker: TensorFlow list_physical_devices using the --gpus all flag
    docker run -e TF_CPP_MIN_LOG_LEVEL=1 -e LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu --gpus all --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    
  5. podman: nvidia-smi using the --runtime nvidia flag
    podman run --runtime nvidia --rm nvidia/cuda nvidia-smi --list-gpus
    GPU 0: GeForce RTX 2080 SUPER (UUID: GPU-16efc9cd-6886-5c12-52f8-2bef887747d9)
    
  6. podman: TensorFlow list_physical_devices using the --runtime nvidia flag
    podman run -e TF_CPP_MIN_LOG_LEVEL=1 -e LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu --runtime nvidia --rm tensorflow/tensorflow:latest-gpu python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
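The nvidia-smi checks above all follow one pattern: start a throwaway GPU-enabled container and confirm it lists at least one GPU. A minimal Python sketch of that check, assuming nvidia-smi --list-gpus prints one line per device in the format shown in the logs above. The helper names build_command, parse_gpu_list, smoke_test, and TEST_MATRIX are hypothetical and not part of nvidia-docker, nvidia-podman, or nixpkgs.

```python
import re
import subprocess
from typing import List, Tuple

# Hypothetical pattern: matches lines such as
# "GPU 0: GeForce RTX 2080 SUPER (UUID: GPU-16efc9cd-...)", as printed in the logs above.
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9A-Fa-f-]+)\)$")


def build_command(engine: str, gpu_flags: List[str], image: str, argv: List[str]) -> List[str]:
    """Assemble e.g. `docker run --gpus all --rm nvidia/cuda nvidia-smi --list-gpus`."""
    return [engine, "run", *gpu_flags, "--rm", image, *argv]


def parse_gpu_list(output: str) -> List[Tuple[int, str, str]]:
    """Return (index, name, uuid) for every GPU line in `nvidia-smi --list-gpus` output."""
    gpus = []
    for line in output.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def smoke_test(engine: str, gpu_flags: List[str]) -> List[Tuple[int, str, str]]:
    """Run nvidia-smi in a throwaway container; raises CalledProcessError if the run fails."""
    cmd = build_command(engine, gpu_flags, "nvidia/cuda", ["nvidia-smi", "--list-gpus"])
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_list(result.stdout)


# The engine/flag combinations used for the nvidia-smi tests (1, 2, and 5) above.
TEST_MATRIX = [
    ("docker", ["--runtime", "nvidia"]),
    ("docker", ["--gpus", "all"]),
    ("podman", ["--runtime", "nvidia"]),
]
```

On a GPU host, calling smoke_test(engine, flags) for each entry in TEST_MATRIX and asserting that the result is non-empty would repeat those three checks. The image name is copied verbatim from the commands above; newer registries may require an explicit tag.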
    

@SuperSandro2000 (Member) left a comment

Someone other than me needs to review the module changes.

@SuperSandro2000 (Member) commented:

This is a semi-automatically executed nixpkgs-review, which does not build all packages (e.g. lumo, tensorflow, or pytorch).
If you find bugs or have suggestions for further things to search or run, please reach out to SuperSandro2000 on IRC.

Result of nixpkgs-review pr 108862 run on x86_64-linux

2 packages built:
  • nvidia-docker
  • nvidia-podman

@cpcloud force-pushed the refactor-nvidia-containers branch 2 times, most recently from 46ebdd7 to 37cbd6c, on January 9, 2021 23:44

@cpcloud (Contributor, Author) commented Jan 11, 2021

@Mic92 @zowoq Can one of y'all review the changes here?

@Mic92 merged commit f3042e3 into NixOS:master on Jan 15, 2021

@Mic92 (Member) commented Jan 15, 2021

Thanks!

@cpcloud deleted the refactor-nvidia-containers branch on January 15, 2021 14:30