openmpi: 1.10.7->3.0.0, add markuskowa as maintainer #34065

Merged
FRidh merged 4 commits into NixOS:master from the openmpi3-pr branch on Feb 11, 2018

markuskowa
Member

@markuskowa markuskowa commented Jan 20, 2018

Motivation for this change

Update to the latest stable version. Added myself as maintainer.

Other changes:

  • update meta data
  • refactored some `with` expressions
  • added libnl and zlib support
  • enabled test suite (doCheck=true)

Things done

  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
    Some tests fail due to broken tensorflow derivations (see "Tensorflow build fails on master", #31492)
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.
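
For readers unfamiliar with how the "Other changes" above map onto a derivation, here is a minimal sketch. It is not the diff from this PR: it is a hypothetical overlay that layers the described changes (zlib and libnl as build inputs, test suite enabled) onto an existing `openmpi` package. Restricting libnl to Linux is an assumption made for illustration.

```nix
# Illustrative overlay only; the actual PR edits the openmpi expression directly.
self: super: {
  openmpi = super.openmpi.overrideAttrs (old: {
    # Add zlib everywhere and libnl on Linux only (assumption: libnl is a
    # Linux netlink library and is not needed on other platforms).
    buildInputs = (old.buildInputs or [ ])
      ++ [ super.zlib ]
      ++ super.lib.optional super.stdenv.isLinux super.libnl;

    # Enable the check phase so the package's own test suite runs at build time.
    doCheck = true;
  });
}
```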

@grahamc
Member

grahamc commented Jan 20, 2018

@GrahamcOfBorg build

@GrahamcOfBorg GrahamcOfBorg left a comment

Failure for system: aarch64-linux

error: Please be informed that this pseudo-package is not the only part of
Nixpkgs that fails to evaluate. You should not evaluate entire Nixpkgs
without some special measures to handle failing packages, like those taken
by Hydra.

@GrahamcOfBorg GrahamcOfBorg left a comment

Failure for system: x86_64-darwin

error: Please be informed that this pseudo-package is not the only part of
Nixpkgs that fails to evaluate. You should not evaluate entire Nixpkgs
without some special measures to handle failing packages, like those taken
by Hydra.

@GrahamcOfBorg GrahamcOfBorg left a comment

Failure for system: x86_64-linux

error: Please be informed that this pseudo-package is not the only part of
Nixpkgs that fails to evaluate. You should not evaluate entire Nixpkgs
without some special measures to handle failing packages, like those taken
by Hydra.

@markuskowa
Member Author

markuskowa commented Jan 20, 2018

@grahamc How can I figure out what causes the build problem? The derivation builds on my local NixOS.

  meta = {
    homepage = http://www.open-mpi.org/;
    description = "Open source MPI-2 implementation";
    longDescription = "The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.";
-   maintainers = [ ];
+   maintainers = with maintainers; [ markuskowa ];
Member

It should be `with stdenv.lib.maintainers`, I think.

This would be fine if line 48 read `meta = with stdenv.lib; {` (which is often done, since `stdenv.lib` is used to access licenses and maintainers). Since no license is given here (maybe there should be one), you can just fix the `with` in line 52.
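
The two options described in this review comment could look roughly like the sketch below. Only the relevant `meta` attributes are shown, and they are wrapped in a function returning two named variants so the snippet evaluates on its own; the wrapper and the names `metaWithLib` and `metaQualified` are illustrative, not part of the PR. It assumes a nixpkgs revision where `stdenv.lib` is still available, as it was at the time of this PR.

```nix
# Sketch only: two equivalent ways to make `maintainers` resolve.
{ stdenv }:

{
  # Option 1: bring stdenv.lib into scope for the whole meta set, so that
  # `maintainers` (and `licenses`, if one is added) can be used inside it.
  metaWithLib = with stdenv.lib; {
    homepage = "http://www.open-mpi.org/";
    description = "Open source MPI-2 implementation";
    maintainers = with maintainers; [ markuskowa ];
  };

  # Option 2: keep meta a plain set and fully qualify the inner `with`.
  metaQualified = {
    homepage = "http://www.open-mpi.org/";
    description = "Open source MPI-2 implementation";
    maintainers = with stdenv.lib.maintainers; [ markuskowa ];
  };
}
```

Without either form, `with maintainers;` refers to a variable `maintainers` that is not in scope, which is consistent with the evaluation failures reported by the build bot above.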

@lsix
Member

lsix commented Jan 25, 2018

Hi, thanks for the PR.

I think you could squash some commits (at least c884297 and 7764e2d).

e76684e should probably be renamed to "openmpi: add markuskowa as maintainer".

And finally, 8d6b322 should be renamed to "openmpi: add zlib support".

@markuskowa markuskowa changed the title from "openmpi: 1.10.7->3.0.0" to "openmpi: 1.10.7->3.0.0, add markuskowa as maintainer" Jan 25, 2018
@markuskowa
Member Author

Thanks for the feedback! I did some cleanup of the commits and the PR message (and title).

@Mic92
Member

Mic92 commented Feb 10, 2018

@GrahamcOfBorg build openmpi

@GrahamcOfBorg

Success on x86_64-linux (full log)

Partial log:

shrinking /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/bin/orte-info
shrinking /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/bin/orte-server
shrinking /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/bin/orte-dvm
shrinking /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/bin/ompi_info
shrinking /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/bin/oshmem_info
gzipping man pages under /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/share/man/
strip is /nix/store/5qj61lcvzlap87rf6blvf8p577d482bv-binutils-2.28.1/bin/strip
stripping (with command strip and flags -S) in /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/lib  /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0/bin
patching script interpreter paths in /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0
checking for references to /build in /nix/store/vdx65k8cr0gbw33rzj5rdk9wbssag91l-openmpi-3.0.0...

@FRidh
Member

FRidh commented Feb 10, 2018

@GrahamcOfBorg build python3.pkgs.h5py-mpi

@GrahamcOfBorg

Success on aarch64-linux (full log)

Partial log:

shrinking /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/lib/openmpi/mca_allocator_basic.so
shrinking /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/lib/libmca_common_verbs.so.40.0.0
shrinking /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/lib/libmca_common_sm.so.40.0.0
shrinking /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/lib/libopen-pal.so.40.0.0
gzipping man pages under /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/share/man/
strip is /nix/store/xmpjypwjmp2qi1chs5kr0hacnh161ls4-binutils-2.28.1/bin/strip
stripping (with command strip and flags -S) in /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/lib  /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0/bin
patching script interpreter paths in /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0
checking for references to /build in /nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0...
/nix/store/jxzvi5vawybgik3w6man2c4829503h2d-openmpi-3.0.0

@GrahamcOfBorg

Success on x86_64-darwin (full log)

Partial log:

make[3]: Leaving directory '/private/tmp/nix-build-openmpi-3.0.0.drv-0/openmpi-3.0.0'
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/private/tmp/nix-build-openmpi-3.0.0.drv-0/openmpi-3.0.0'
make[1]: Leaving directory '/private/tmp/nix-build-openmpi-3.0.0.drv-0/openmpi-3.0.0'
post-installation fixup
gzipping man pages under /nix/store/7zmll2i1sy12b3jrf1x587ca51yqbsz6-openmpi-3.0.0/share/man/
strip is /nix/store/5a88zk3jgimdmzg8rfhvm93kxib3njf9-cctools-binutils-darwin/bin/strip
stripping (with command strip and flags -S) in /nix/store/7zmll2i1sy12b3jrf1x587ca51yqbsz6-openmpi-3.0.0/lib  /nix/store/7zmll2i1sy12b3jrf1x587ca51yqbsz6-openmpi-3.0.0/bin
patching script interpreter paths in /nix/store/7zmll2i1sy12b3jrf1x587ca51yqbsz6-openmpi-3.0.0
/nix/store/7zmll2i1sy12b3jrf1x587ca51yqbsz6-openmpi-3.0.0

@GrahamcOfBorg

Success on x86_64-darwin (full log)

Partial log:


********************************************************************************
Executing cythonize()
/private/tmp/nix-build-python3.6-h5py-2.7.1.drv-0/h5py-2.7.1/h5py/h5p.pyx: cannot find cimported module 'mpi4py.mpi_c'
.....................................................x.........................s.........................................x....................................s................................................................................................................ssssss.................................................................x....x........................x.....x.................................................ssss..............
----------------------------------------------------------------------
Ran 446 tests in 0.634s

OK (skipped=12, expected failures=6)
/nix/store/p8k48wwlrkqa96wspwi66sfs3yddw6g8-python3.6-h5py-2.7.1

@GrahamcOfBorg

Failure on aarch64-linux (full log)

Partial log:

patching script interpreter paths in /nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1
/nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1/share/hdf5_examples/run-all-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/7fxbh1yhagvwbdrmdyyy5ghcjhwjndhs-bash-4.4-p12/bin/sh"
/nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1/share/hdf5_examples/hl/run-hl-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/7fxbh1yhagvwbdrmdyyy5ghcjhwjndhs-bash-4.4-p12/bin/sh"
/nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1/share/hdf5_examples/hl/c/run-hlc-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/7fxbh1yhagvwbdrmdyyy5ghcjhwjndhs-bash-4.4-p12/bin/sh"
/nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1/share/hdf5_examples/c/run-c-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/7fxbh1yhagvwbdrmdyyy5ghcjhwjndhs-bash-4.4-p12/bin/sh"
/nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1/bin/h5redeploy: interpreter directive changed from " /bin/sh" to "/nix/store/7fxbh1yhagvwbdrmdyyy5ghcjhwjndhs-bash-4.4-p12/bin/sh"
/nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1/bin/h5pcc: interpreter directive changed from " /bin/sh" to "/nix/store/7fxbh1yhagvwbdrmdyyy5ghcjhwjndhs-bash-4.4-p12/bin/sh"
checking for references to /build in /nix/store/3bqpl3rd0z8i8gy4m7ga2jjcskgnilrx-hdf5-1.10.1...
cannot build derivation '/nix/store/jjab2xdv5wnch7ji38cxkkhjcbiby2rm-python3.6-h5py-2.7.1.drv': 1 dependencies couldn't be built
error: build of '/nix/store/jjab2xdv5wnch7ji38cxkkhjcbiby2rm-python3.6-h5py-2.7.1.drv' failed

@GrahamcOfBorg

Failure on x86_64-linux (full log)

Partial log:

/nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1/bin/h5pcc: interpreter directive changed from " /bin/sh" to "/nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh"
/nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1/bin/h5redeploy: interpreter directive changed from " /bin/sh" to "/nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh"
/nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1/share/hdf5_examples/c/run-c-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh"
/nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1/share/hdf5_examples/run-all-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh"
/nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1/share/hdf5_examples/hl/c/run-hlc-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh"
/nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1/share/hdf5_examples/hl/run-hl-ex.sh: interpreter directive changed from " /bin/sh" to "/nix/store/yq03c2ny43mc24j7dq5riznzb09ddhpq-bash-4.4-p12/bin/sh"
checking for references to /tmp/nix-build-hdf5-1.10.1.drv-0 in /nix/store/fhxz0jp7l0gcz9sz81pd9x2h95byah2l-hdf5-1.10.1...
building of ‘/nix/store/5hj3cr14s8cg0xy54azaaa79sbssdpdb-python3.6-mpi4py-3.0.0.drv’ timed out after 3600 seconds
cannot build derivation ‘/nix/store/lgmg9xkd4majq0kv52gh89jj52wj0qv0-python3.6-h5py-2.7.1.drv’: 1 dependencies couldn't be built
error: build of ‘/nix/store/lgmg9xkd4majq0kv52gh89jj52wj0qv0-python3.6-h5py-2.7.1.drv’ failed

@FRidh FRidh self-assigned this Feb 10, 2018
@markuskowa
Member Author

building of ‘/nix/store/5hj3cr14s8cg0xy54azaaa79sbssdpdb-python3.6-mpi4py-3.0.0.drv’ timed out after 3600 seconds
cannot build derivation ‘/nix/store/lgmg9xkd4majq0kv52gh89jj52wj0qv0-python3.6-h5py-2.7.1.drv’: 1 dependencies couldn't be built
error: build of ‘/nix/store/lgmg9xkd4majq0kv52gh89jj52wj0qv0-python3.6-h5py-2.7.1.drv’ failed

I do not really understand why the build fails. Is this a Hydra-related problem? I can build everything locally.

@grahamc
Member

grahamc commented Feb 10, 2018 via email

@vcunat
Member

vcunat commented Feb 10, 2018

Sandboxed build seems to hang for me (no CPU is consumed):

building python3.6-mpi4py-3.0.0 (installCheckPhase): [localhost:01386] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
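
Reproducing this hang locally requires sandboxed builds. On NixOS, the option named in the PR checklist can be set as in the fragment below; this is a minimal sketch of a `configuration.nix` module, and on non-NixOS systems the checklist points to `build-use-sandbox` in nix.conf instead.

```nix
# Fragment of a NixOS configuration; enables sandboxed builds so that
# network- and process-isolation issues like this one show up locally.
{ ... }:

{
  nix.useSandbox = true;
}
```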

test_spawn.py fails when built with openmpi-3.0.0 in a sandboxed
environment.
@markuskowa
Member Author

markuskowa commented Feb 11, 2018

Thanks for the input. I could identify the problem: one particular test in mpi4py (test_spawn.py) fails when built in a sandboxed environment. The failure is caused by openmpi not being able to communicate with its processes, which results in a hanging build process (this is what caused the Hydra eval to time out). This test is only executed for openmpi version 3.0.0 or greater, even though the spawn feature is already available in all previous stable openmpi versions. I added a patch that turns off this specific test. This might not be best practice, but it solves the problem for now. Since all unit tests still run with the sandbox disabled, the package could still be considered functional (?).
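
One way to switch off a single test file is sketched below. This is a hypothetical overlay, not the patch added in this PR: it assumes the test lives at `test/test_spawn.py` in the mpi4py source tree and that the test runner discovers tests by file name, so removing the file skips it.

```nix
# Hypothetical sketch; the PR itself uses a patch file whose contents are
# not shown in this conversation.
self: super: {
  python3 = super.python3.override {
    packageOverrides = pySelf: pySuper: {
      mpi4py = pySuper.mpi4py.overridePythonAttrs (old: {
        # Drop the spawn test, which hangs inside the build sandbox
        # (path assumed; adjust to the actual source layout).
        postPatch = (old.postPatch or "") + ''
          rm test/test_spawn.py
        '';
      });
    };
  };
}
```

Removing the file keeps the rest of the test suite running in the sandbox, at the cost of silently losing coverage for the spawn feature.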

@FRidh
Member

FRidh commented Feb 11, 2018

@markuskowa can you open an issue upstream?

@FRidh FRidh merged commit 35ab636 into NixOS:master Feb 11, 2018
@markuskowa
Member Author

I can give it a try, although I am not sure how to reproduce the problem in a non-NixOS environment. The problem seems to be the sandboxed build environment, which openmpi collides with. I'm not sure that openmpi was ever designed to run in such a restricted environment.

@FRidh
Member

FRidh commented Feb 12, 2018

I would expect openmpi to time out if it can't connect to whatever it is connecting to.

@markuskowa
Member Author

Ah ok, yes, the missing timeout is a bug.

@markuskowa markuskowa deleted the openmpi3-pr branch May 11, 2018 13:11

7 participants