Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

haskell: re-enable aarch64, disable parallel builds on that arch. #44482

Closed
wants to merge 1 commit into from
Closed

haskell: re-enable aarch64, disable parallel builds on that arch. #44482

wants to merge 1 commit into from

Conversation

dhess
Copy link
Contributor

@dhess dhess commented Aug 5, 2018

Attempt to work around unreliable Haskell builds on aarch64. See
https://ghc.haskell.org/trac/ghc/ticket/15449

Motivation for this change

ghc843 builds fine on aarch64-linux if you disable parallel builds, and the resulting binary is also capable of building many Haskell packages without issues (again, if parallel builds are disabled).

The only drawback to this fix is that ghc itself takes a very long time to build (~24h on my Jetson TX1).

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

@vcunat
Copy link
Member

vcunat commented Aug 5, 2018

The drawback may be fatal on Hydra, as the machine there is weak in single-core performance (but it has 96 cores). I suppose we can just try that and increase the timeout if needed, like we did for chromium #44225. I just now started the final ghc build on aarch64.nixos.community (it has the same HW as the machine on Hydra).

@@ -87,7 +87,8 @@ stdenv.mkDerivation rec {
sha256 = "1z05vkpaj54xdypmaml50hgsdpw29dhbs2r7magx0cm199iw73mv";
};

enableParallelBuilding = true;
# https://ghc.haskell.org/trac/ghc/ticket/15449
enableParallelBuilding = !buildPlatform.isAarch64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be targetPlatform if it is happening during codegen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing. Can we get a confirmation on this from @Ericson2314?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildPlatform sounds correct since according the ticket the problem happens when running multithreaded Haskell code on aarch64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok- I read the codegen part and thought it was talking about the target.

enableParallelBuilding = (versionOlder "7.8" ghc.version && !hasActiveLibrary) || versionOlder "8.0.1" ghc.version;
#
# Currently disabled for aarch64. See https://ghc.haskell.org/trac/ghc/ticket/15449.
enableParallelBuilding = ((versionOlder "7.8" ghc.version && !hasActiveLibrary) || versionOlder "8.0.1" ghc.version) && !(buildPlatform.isAarch64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be hostPlatform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, please review @Ericson2314

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unnecessary parenthesis around buildPlatform.isAarch64.

@Ericson2314
Copy link
Member

Those both are correct. The issue would probably both to running the stage 1 compiler building stage 2, and running the the stage 0 bootstrap compiler, so it doesn't matter what the host platform is.

@dhess
Copy link
Contributor Author

dhess commented Aug 9, 2018

Hi @Ericson2314, thank you for reviewing this!

To clarify, do you mean that the original patch is correct, or that both of @matthewbauer's corrections are correct?

@Ericson2314
Copy link
Member

buildPlatform for both is correct.

@dhess
Copy link
Contributor Author

dhess commented Aug 9, 2018

@Ericson2314 Thanks!

So assuming I make @dezgeg's small correction to the original patch, is there interest in merging this, despite the potential Hydra impact?

@matthewbauer
Copy link
Member

matthewbauer commented Aug 9, 2018

We might want to set hydraPlatforms so that it doesn't build everything on aarch64. The aarch64 tend to be the slowest of the 3 systems which holds up the queue.

@dhess
Copy link
Contributor Author

dhess commented Aug 9, 2018

I agree that it shouldn't be in the critical path for the other jobs, but wouldn't it be worth setting up a new Hydra job for haskellPackages on aarch64? Given the really slow GHC build time, it should really be cached. Also, one advantage of the fix is that it will only consume 1 build core, so it shouldn't too adversely impact other aarch64 jobs.

@matthewbauer
Copy link
Member

Also, one advantage of the fix is that it will only consume 1 build core, so it shouldn't too adversely impact other aarch64 jobs.

I mean that will make the problem worse. We build everything so we have to wait for those 1 core jobs to finish for a release. But it may be worth doing anyway just because it is such a pain to build GHC.

@dhess
Copy link
Contributor Author

dhess commented Aug 9, 2018

@matthewbauer Maybe I didn't explain myself clearly. What I am proposing is to create a new jobset, one that no other jobsets are dependent on, that builds haskellPackages on aarch64. If no other jobs are dependent on it, and it only consumes a couple of cores (because of the limited parallelism of aarch64 Haskell jobs), then it should not adversely affect other jobsets, no?

edit Additionally, this new jobset could be made lower priority, run less often, etc.

@matthewbauer
Copy link
Member

Ok sorry i misunderstood

@ElvishJerricco
Copy link
Contributor

Can we make a separate issue for whether Hydra builds Haskell stuff for aarch64? Making it uncached but at least buildable sounds like a step in the right direction.

@ElvishJerricco
Copy link
Contributor

Actually, wait, the trac issue claims it's an issue with GHC's -j. I thought -j wasn't used by GHC's build system since they just use make and ghc -M. Does that bug actually affect building GHC? Maybe we can build GHC in parallel (and thus cache it) but disable parallel building of Haskell packages.

@vcunat
Copy link
Member

vcunat commented Oct 14, 2018

@ElvishJerricco
Copy link
Contributor

@vcunat Huh. That build failure seems unrelated to the -j bug, no?

make[1]: fork: Resource temporarily unavailable

@dhess
Copy link
Contributor Author

dhess commented Oct 14, 2018

Hi all,

I'm sorry for the confusion. I first made this PR in early August. It languished and by the time I revisited the issue in my own fork, my changes had to be re-done because of changes in upstream. In the process, I forgot that I'd made this PR and made a new one here:

#47901

That was merged 10 days ago and that is why Hydra is now trying to build GHC on aarch64.

I'm going to close this PR now as it's no longer relevant. Probably the discussion about what to do with Hydra and GHC on aarch64 should be opened as a new issue. For one thing, at least some of the builds are taking over 10 hours and are getting killed by the Hydra for taking too long. I'm not sure what the fork: Resource temporarily unavailable issue is or whether it's related to Hydra killing the builds, but it has nothing to do with the -j bug.

@dhess dhess closed this Oct 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants