
nixos/slurm: Improve slurm configuration options and features. #41377

Merged
merged 6 commits into NixOS:master from markuskowa:slurm-ext-pr on Jun 2, 2018

Conversation

markuskowa (Member):

Motivation for this change

A step on the way to using Slurm's full potential.
Some cleanup in the module allows for secondary config files (e.g. cgroup.conf) that need to live in the same directory as slurm.conf. More libraries were added to the package. Slurm's native X11 support now works. The man pages are now available in the system environment when the slurm services are enabled in the module.
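A rough sketch of how these features might be enabled from a NixOS configuration (enableStools and extraPlugstackConfig are the options added by this PR; the cgroup option name and the plugin path are illustrative assumptions):

    services.slurm = {
      client.enable = true;
      # contents of cgroup.conf, generated next to slurm.conf (assumed option name)
      extraCgroupConfig = ''
        CgroupAutomount=yes
        ConstrainCores=yes
      '';
      # contents of plugstack.conf for user-supplied SPANK plugins
      extraPlugstackConfig = ''
        optional /path/to/spank_plugin.so
      '';
    };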

CC @joachifm @lsix @jagajaga

Things done

Summary of changes:

  • add freeipmi to get power meter readings
  • readline support for scontrol
  • libssh2 for native X11 support
  • Add note to enableSrunX11 in module
  • fix hwloc support (it was not detected by configure); see the packaging sketch after this checklist
  • Add man pages to wrappedSlurm
  • Added extraPlugstackConfig option so that user-supplied SPANK plugins can be loaded
  • Added option enableStools (submit jobs from a host that runs no Slurm daemon)
  • Added cgroup.conf to module
  • Update test, add pure submit host to test 'enableStools'
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
    Test runs (nixos/tests/slurm.nix); tested the cgroup feature on a live system (by backporting to 18.03)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.
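
As referenced in the checklist above, a sketch of the packaging side (not the exact nixpkgs expression): the additional libraries become build inputs of the slurm derivation so that its configure script detects them:

    # pkgs/servers/computing/slurm (sketch; pre-existing inputs elided)
    buildInputs = [
      # ...
      readline   # line editing for scontrol
      freeipmi   # power meter readings
      libssh2    # native X11 forwarding
      hwloc      # topology support, passed explicitly so configure picks it up
    ];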

* Updated SrunX11 option
* Added extraPlugstackConfig parameter
* Added option enableStools
* Add cgroup.conf to module
* Fix some typos
The nixos module adds a new derivation to systemPackages to make sure that the binaries get the generated config file. So far this derivation did not contain the man pages. Activating the module now makes the man pages available in the system environment (see the sketch after this commit list).
* add freeipmi to get power meter readings
* readline support for scontrol
* libssh2 for native X11 support
* Add note to enableSrunX11 in module
* fix hwloc support (was not detected by configure)
* Add pure submit host to test 'enableStools'
* Disable client.enable on control machine
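
A minimal sketch of the man page change described in the commit message above, assuming a custom builder script as the module uses and assuming cfg.package ships its man pages under share/man; this is not the actual builder. The wrappedSlurm derivation symlinks the man pages of the underlying package so that the system environment exposes them:

    # inside the wrappedSlurm builder (Nix interpolates ${cfg.package}
    # to the store path of the configured slurm package)
    mkdir -p $out/share
    ln -s ${cfg.package}/share/man $out/share/man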
@@ -142,7 +186,7 @@ in
  wrappedSlurm = pkgs.stdenv.mkDerivation {
    name = "wrappedSlurm";

-   propagatedBuildInputs = [ cfg.package configFile ];
+   propagatedBuildInputs = [ cfg.package etcSlurm ];
joachifm (Contributor), Jun 2, 2018:

What is the purpose of propagated build inputs here? If it is merely to ensure that they end up in the runtime closure, referencing them in the wrapper suffices.

markuskowa (Member, Author):

That is a very good question. The wrappedSlurm derivation only contains the wrappers (and now the man pages). Is the propagatedBuildInputs = [ cfg.package ] even necessary here?

joachifm (Contributor):

It depends on what buildEnv does with propagated build inputs, maybe it does work as intended, though I'd expect propagatedUserEnvPkgs to be more appropriate.

markuskowa (Member, Author):

propagatedBuildInputs does not propagate the paths into the new package. That's the reason why the man pages were missing. I am not sure how propagatedUserEnvPkgs would work here?

joachifm (Contributor):

The buildEnv builder adds packages declared in propagatedUserEnvPkgs to the resulting env, so it should work as if you had added the underlying slurm package to system packages with lower priority (than the wrapper).
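
For context, a minimal sketch of that mechanism with illustrative names (not the slurm module's code): when the standard fixup phase runs, propagatedUserEnvPkgs is recorded in $out/nix-support/propagated-user-env-packages, which buildEnv (the builder behind environment.systemPackages) reads to add the listed packages to the profile:

    myWrapper = pkgs.stdenv.mkDerivation {
      name = "my-wrapper";
      dontUnpack = true;                       # no sources needed
      propagatedUserEnvPkgs = [ pkgs.hello ];  # recorded by the fixup phase
      installPhase = ''
        mkdir -p $out/bin
        ln -s ${pkgs.hello}/bin/hello $out/bin/hello-wrapped
      '';
    };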

markuskowa (Member, Author):

But this is a mkDerivation and the builder is replaced by the custom builder script. Would propagatedUserEnvPkgs still work in that case?

joachifm (Contributor):

Well, no ... I hadn't noticed that it uses a custom builder ... I also see that it explicitly links in the manpage so propagation would be pointless anyway :)

joachifm (Contributor):

In other cases propagatedUserEnvPkgs is the correct choice, but of course it requires that the builder at least runs the standard fixup phase or something like it.

markuskowa (Member, Author), Jun 2, 2018:

Then the conclusion is that we can remove the line propagatedBuildInputs = [ cfg.package configFile ]; entirely? I just ran the test without it. It still works.

joachifm (Contributor):

Absolutely, given that it doesn't run the fixup phase, it made no difference to begin with

description = ''
Whether to provide a slurm.conf file.
Enable this option if you do not run a slurm daemon on this host
(i.e. <literal>server.enable</literal> and <literal>client.enable</literal> are <literal>false</literal>).
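
For example, a submit-only host (like the submit node added to the test in this PR) would be configured roughly as follows; controlMachine is an existing module option used here for illustration:

    services.slurm = {
      enableStools = true;        # provide slurm.conf and the client tools, but run no daemons
      controlMachine = "control"; # host running slurmctld
    };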
joachifm (Contributor):

Can we set the default for this option based on whether those other options are set?

markuskowa (Member, Author):

You mean if server || client enabled then set enableStools default to false? That can be done, but it would have no effect.

joachifm (Contributor):

The comment says to set it to true if the others are false, seems like that could be done in the default directly.

markuskowa (Member, Author):

The slurm module does not have a global services.slurm.enable option. Setting enableStools to default true would enable slurm for every nixos user out there.

joachifm (Contributor):

Ah, I see what you mean now.

joachifm (Contributor) commented on Jun 2, 2018:

@GrahamcOfBorg test slurm

GrahamcOfBorg:

Success on x86_64-linux (full log)

Attempted: tests.slurm

Partial log:

test script finished in 74.89s
cleaning up
killing node3 (pid 593)
killing submit (pid 605)
killing control (pid 617)
killing node2 (pid 629)
killing node1 (pid 641)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/c9krbj32j6n2p6b6jwvknjfl52xmzjyb-vm-test-run-slurm

GrahamcOfBorg:

Success on aarch64-linux (full log)

Attempted: tests.slurm

Partial log:

test script finished in 27.55s
cleaning up
killing node3 (pid 627)
killing node1 (pid 640)
killing submit (pid 653)
killing node2 (pid 666)
killing control (pid 679)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/cxdj3740zzjxw8bq2ks427swihykg7ki-vm-test-run-slurm

Follow-up commit: propagatedBuildInputs = [ cfg.package etcSlurm ]; had no effect
@joachifm joachifm merged commit c30bd1c into NixOS:master Jun 2, 2018
joachifm (Contributor) commented on Jun 2, 2018:

Thank you

@markuskowa markuskowa deleted the slurm-ext-pr branch June 2, 2018 12:41