haskell: re-enable aarch64, but disable parallel builds on that arch. #47901

Merged
merged 1 commit into from Oct 7, 2018

Contributor

@dhess dhess commented Oct 5, 2018

This is a workaround for unreliable parallel Haskell builds on aarch64. See https://ghc.haskell.org/trac/ghc/ticket/15449

Motivation for this change

ghc843 bootstraps fine on aarch64 if you disable parallel builds and are willing to wait a while. On the aarch64 NixOS community builder, it takes about 9 hours to build (using only 1 core, obviously).

Many important haskellPackages also build successfully with this work-around.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

This is a workaround for unreliable parallel Haskell builds on
aarch64. See https://ghc.haskell.org/trac/ghc/ticket/15449
@@ -48,7 +48,9 @@ in
# We cannot enable -j<n> parallelism for libraries because GHC is far more
# likely to generate a non-determistic library ID in that case. Further
# details are at <https://github.com/peti/ghc-library-id-bug>.
, enableParallelBuilding ? (stdenv.lib.versionOlder "7.8" ghc.version && !isLibrary) || stdenv.lib.versionOlder "8.0.1" ghc.version
#
Contributor

Are you using this to force a section break or should this line be removed?

Contributor Author

It's just a leftover from a rebase. If the owners agree with the substantive changes, I'll fix it.

@samueldr
Member

samueldr commented Oct 7, 2018

If the change is deemed good, I would gladly backport for ZHF #45962.

I don't know the Haskell infra at all, but the change seems isolated enough not to cause issues; it only tweaks the conditionals for parallel building.

@samueldr samueldr merged commit d8d8584 into NixOS:master Oct 7, 2018
@dhess
Contributor Author

dhess commented Oct 7, 2018

Sweet, thank you! It'll be so nice to have aarch64 Haskell packages in Hydra!

@dhess dhess deleted the ghc-aarch64 branch October 7, 2018 19:26
@franckrasolo

Hi @dhess,

The last comments regarding the non-deterministic failure on aarch64 seem to suggest that enableParallelBuilding could remain true for Cortex-A53 cores.

What ARM core(s) do you use for your aarch64 GHC builds?

Based on the above and your (much appreciated) earlier gists, I'm currently building ghc802 on a hexa-core (A53) QEMU/Ubuntu 16.04 guest with 16GB of RAM.

Given how long it takes to build both compiler and the full package set for any given version of GHC, and until Hydra has all aarch64 Haskell packages, I'm planning on setting up a binary cache on Cachix next. I might name it aarch64-ghc.cachix.org or more generally aarch64.cachix.org.

Would you be interested in pushing your GHC builds there too?

@dhess
Contributor Author

dhess commented Oct 8, 2018

Hi @franckrasolo,

I also inferred from that comment on the GHC Trac that this is probably a memory ordering bug that only manifests on out-of-order cores. I'm currently using the NixOS community aarch64 builder, which is, according to @grahamc's comment here, a Hi1616. According to WikiChip, that CPU is based on Cortex-A72 IP and would probably be vulnerable to the bug.

Re: Cachix, I may be interested in that, but I'd prefer to get the official NixOS Hydra building haskellPackages for aarch64 now that this fix is in master. In fact, as of trunk evaluation 1482713, you can see that the official Hydra is now trying to build Haskell stuff on aarch64:

https://hydra.nixos.org/eval/1482713#tabs-new

Unfortunately, as you can see from here, GHC 8.4.3 didn't finish, because the build ran out of time in stage1. It looks like the Hydra gives up on a job after 10 hours, but maybe we can convince the admins to bump that for aarch64 builds in the interest of generating a working GHC.

Anyway, let's circle back on this in a week or so after we've given the Hydra admins time to consider the impact of this commit on the aarch64 builder workload.

@franckrasolo

> Re: Cachix, I may be interested in that, but I'd prefer to get the official NixOS Hydra building haskellPackages for aarch64 now that this fix is in master.

I'd prefer that too, ideally.

> In fact, as of trunk evaluation 1482713, you can see that the official Hydra is now trying to build Haskell stuff on aarch64:
>
> https://hydra.nixos.org/eval/1482713#tabs-new
>
> Unfortunately, as you can see from here, GHC 8.4.3 didn't finish, because the build ran out of time in stage1. It looks like the Hydra gives up on a job after 10 hours, but maybe we can convince the admins to bump that for aarch64 builds in the interest of generating a working GHC.

Well, it's somewhat encouraging that aarch64 builds of nixpkgs/trunk are happening again.

Is the error below, taken from the log of one of the ghc822 builds, the cause or the result of the timeout?

ghc: failed to create OS thread: Resource temporarily unavailable
make[1]: *** [libraries/ghc-boot/ghc.mk:3: libraries/ghc-boot/dist-boot/build/GHC/PackageDb.o] Error 1
make: *** [Makefile:125: all] Error 2
builder for '/nix/store/8rbmj6xlfndcrma77b6swmyy9hdwiz7p-ghc-8.2.2.drv' failed with exit code 2

If the latter, a 10-hour timeout at perhaps, at best, halfway through stage1 would suggest that it might take a day or more for any given version of GHC to build successfully from source.

Could the OfBorg admins exceptionally allow just one specific build (say ghc822) or at least one evaluation of a jobset without timeout? Knowing how long non-binary GHC builds take on the shiny new aarch64 builder would obviously help tweak the timeout value. Interestingly, build durations for other CPU architectures vary a fair bit, taking anything from 50+ minutes up to a little over 5 hours.

> Anyway, let's circle back on this in a week or so after we've given the Hydra admins time to consider the impact of this commit on the aarch64 builder workload.

Agreed.

@ElvishJerricco
Contributor

As I mentioned in the other thread, I'm not sure that GHC itself needs to be built with enableParallelBuilding = false;, as the GHC bug only seems to pertain to ghc -j, which is not used when building GHC. GHC just uses make. Enabling parallel building for GHC itself could help with the timeout issues the job has had, though I have no idea what's causing the make[1]: fork: Resource temporarily unavailable error.

@dhess
Contributor Author

dhess commented Oct 14, 2018

@ElvishJerricco Before this fix, I wasn't able to get a working GHC built for aarch64, but perhaps I was conflating the -j issue in the GHC Trac with something else; I was not aware that the GHC build system doesn't use -j.

I've created a new fork with enableParallelBuilding = true for GHC only, and I'll test this on the community aarch64 builder to see how it goes.

@dhess
Contributor Author

dhess commented Oct 15, 2018

I set enableParallelBuilding = true for GHC only:

master...dhess:ghc-aarch64-parallel

Then I started a build and ran the following script on the aarch64 community builder:

while true ; do ps auxwwww | grep ghc | grep "\-j" ; sleep 5 ; done

That was a bit naive, but it did the trick. During the ghc843 build, this caught the following steps:

nixbld1  45741 1357  0.1 282352576 252088 ?    Sl   01:58   1:07 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs
nixbld10 48004  441  0.1 279624492 158692 ?    Sl   01:58   0:13 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
nixbld1+ 48406  829  0.1 279550760 161092 ?    Sl   01:58   0:16 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
...
nixbld1  54146  536  0.1 279624492 163052 ?    Sl   02:00   0:16 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
nixbld10 55709  997  0.1 279624492 163528 ?    Sl   02:00   0:19 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
nixbld1+ 57453  0.0  0.1 279108368 153604 ?    Sl   02:01   0:13 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs
nixbld1  54146  202  0.1 279624492 168108 ?    Sl   02:00   0:16 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
...
nixbld1+ 17593  543  0.1 279477028 159904 ?    Sl   02:12   0:27 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs
nixbld1+ 17593  273  0.1 279477028 165072 ?    Sl   02:12   0:27 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs
nixbld1+ 25206  528  0.1 279624492 159060 ?    Sl   02:12   0:26 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
nixbld1+ 26419  607  0.1 279624492 161376 ?    Sl   02:12   0:24 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.hs
nixbld1+ 28814 1402  0.1 279477028 159552 ?    Sl   02:12   0:28 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs
nixbld1+ 30142 2239  0.1 279477028 156584 ?    Sl   02:12   0:22 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs

... plus a bunch more like this towards the end of the ghc843 build.
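
A slightly less naive variant of that polling filter might look like the sketch below. It only reports command lines that run a ghc binary with a -j or -j<N> argument, so the filter's own grep process and unrelated tools such as make -j8 should not show up. The function name ghc_parallel_lines is made up for illustration and is not part of this PR; it reads ps-style lines on stdin.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of the PR):
# keep only ps-style lines that run a ghc binary with a -j or -j<N> flag.
ghc_parallel_lines() {
  # "/bin/ghc" followed by a space anchors on the compiler binary itself
  # (not ghc-pkg); " -j[0-9]*" then requires a stand-alone jobs flag.
  grep -E '/bin/ghc( .*)? -j[0-9]*( |$)'
}

# Usage on a live builder (assumes a procps-style ps):
#   while true; do ps auxww | ghc_parallel_lines; sleep 5; done
```

The bare ps | grep ghc | grep "\-j" loop works too; the stricter pattern just cuts down on false positives when other build tools on the machine are also running with -j.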

The good news is that ghc843 built successfully with enableParallelBuilding, but I've only made this one attempt so far, so it could be luck.

However, what's odd is that now that my build has moved on to building the haskellPackages it needs, and even though I didn't disable parallel builds for haskellPackages, I'm still seeing processes like this:

nixbld1+ 60981  397  0.1 279477032 161736 ?    Sl   02:15   0:19 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs
nixbld1+ 61217  397  0.1 279477028 160860 ?    Sl   02:15   0:15 /nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3/bin/ghc -B/nix/store/bz080s1vgziz7l2nd7a4smlikm6b46ki-ghc-8.4.3/lib/ghc-8.4.3 -package-db=/build/setup-package.conf.d -j64 -threaded --make -o Setup -odir /build -hidir /build Setup.lhs

So it looks like -jNCPUs is leaking through to some phases of haskellPackages builds; is this during -doc builds, maybe?
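
One way to narrow down which phase is responsible is to tally the captured lines by -j flag and by the source file being compiled. The sketch below does that with awk; the function name tally_ghc_jobs is hypothetical, and it assumes the source file is the last word of each captured command line, as it is in the output above. Every line captured so far compiles Setup.hs or Setup.lhs, which hints that the flag may come from the step that compiles each package's Setup program rather than from the package's own modules, though that is only an inference from this sample.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of the PR):
# count captured ghc command lines per "<-j flag> <last argument>" pair.
tally_ghc_jobs() {
  awk '
    /\/bin\/ghc / {
      flag = ""
      # Look for a stand-alone -j or -j<N> argument on this command line.
      for (i = 1; i <= NF; i++) if ($i ~ /^-j[0-9]*$/) flag = $i
      # Assumption: the file being compiled is the final argument.
      if (flag != "") count[flag " " $NF]++
    }
    END { for (k in count) print count[k], k }
  ' | sort
}

# Usage: save the polling output to a file, then
#   tally_ghc_jobs < captured-ps-lines.txt
```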

Anyway, I'm going to let it keep running to see if it can successfully build a couple of my packages.

@dhess
Contributor Author

dhess commented Oct 15, 2018

Update: my test build built ghc843 in a few hours, and made it through large parts of haskellPackages. tls dies on some tests, but many major packages are fine, e.g., lens.

There is definitely something odd going on with -j leaking through during the configure build phase or something, but these issues can be addressed once Hydra gets through a ghc843 build. I'll open a new PR to enable parallel builds on GHC to make that happen.

Thanks to @ElvishJerricco for pointing out that GHC's build system doesn't use -j when it's bootstrapping itself.

@ElvishJerricco
Contributor

Yeah, I forgot that it does a ghc --make while setting up one of its internal build tools (ghc-cabal, I think). I don't know for sure that it does -jN in that process, but it sounds like it does. I'm guessing the -j's near the end are probably the test suite.

@dhess
Contributor Author

dhess commented Oct 15, 2018

I've opened a new PR (#48446) to re-enable parallel builds of GHC on aarch64. Hopefully this will prove stable and we can start fixing aarch64-specific haskellPackages issues once the Hydra starts churning builds out.
