treewide: Introduce stdenv.cc.bintools #30549

Merged: 4 commits merged into NixOS:master on Nov 8, 2017


@Ericson2314 (Member) commented Oct 18, 2017

Motivation for this change

Rather than referring to gcc or clang directly, we refer to stdenv.cc. But instead of doing the same thing for binutils, we fake binutils on Darwin and expose binutils-raw. This PR introduces stdenv.cc.bintools for consistency.

Also, to be correct for cross compilation, tools like compilers that use binutils/cctools at run time should depend on __targetPackages.stdenv.cc.bintools. This is sort of weird, as it effectively refers to the same stage as the "next stage's previous stage", but I can't think of a better way.

This was split out of #30484, which now depends on this. It just adds stdenv.cc.bintools for now. Then, down the road (along with other changes), binutils-raw can become binutils without changing anything.

Things done

Almost no hashes are changed by this PR.

  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s) by CI
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

CC @bgamari


@Ericson2314 (Member, Author) commented Oct 18, 2017

[@peti, thanks for super-quickly reviewing the other PR; all the GHC changes are now in this one, exactly the same as before.]

@edolstra (Member)

What is __targetPackages and why does it start with two underscores?

@Ericson2314 (Member, Author)

Answered in person, but I'll copy it here for anyone else who is curious. targetPackages refers to the package set used for "emit time" dependencies, i.e. packages used by compilers not for themselves, but for their standard libraries.

targetPackages formerly had the __ prefix because it's a somewhat unfortunate concept (compilers really ought not to build their own standard library). I removed the prefix, though, as I do intend this to be an official interface, albeit one to avoid if possible.

@Ericson2314 (Member, Author) commented Nov 5, 2017

Poll: targetPackages.stdenv.bintools vs targetPlatform-determined binutils

OK, there are two ways to get things done, and I need everyone's help to decide which is best. I'm hoping to merge by the end of Wednesday, November 7, so please weigh in quickly!

For context, buildPackages is the package set for resolving build-time dependencies, and targetPackages is the package set for resolving "emit-time" dependencies. [For those who didn't see my informal talk, "emit time" is a somewhat made-up term for the run time of packages produced by the current package/stage (assume we are talking about compilers). It is the dual of build time.] When not doing cross builds, rest assured these are all the same, so please just assume cross compilation (i.e., that they are not the same) when evaluating the options.

The 3 package sets mean that if you need a build tool, you use buildPackages.foo; if you need the same thing at run time, you use plain foo or pkgs.foo; and if you need it at emit time, you use targetPackages.foo. Alternatively, thanks to my crazy splicing stuff, if you put plain foo in nativeBuildInputs, it will automatically resolve to the buildPackages one.
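A toy Python model of those lookup rules follows. It is not Nixpkgs code: `Spliced`, `on_build`, `on_host`, and `mk_derivation` are hypothetical names. The model shows only the effect splicing is meant to have. The same `foo` placed in `nativeBuildInputs` resolves to the buildPackages variant, and in `buildInputs` it resolves to the pkgs variant. It does not model how Nixpkgs actually implements splicing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Spliced:
    """Hypothetical spliced package: one name, one variant per package set."""
    on_build: str  # the variant taken from buildPackages
    on_host: str   # the variant taken from pkgs


def mk_derivation(native_build_inputs: List[Spliced],
                  build_inputs: List[Spliced]) -> Dict[str, List[str]]:
    # Tools that run during the build need the buildPackages variant;
    # run-time dependencies need the pkgs variant.
    return {
        "nativeBuildInputs": [p.on_build for p in native_build_inputs],
        "buildInputs": [p.on_host for p in build_inputs],
    }


foo = Spliced(on_build="buildPackages.foo", on_host="pkgs.foo")
drv = mk_derivation(native_build_inputs=[foo], build_inputs=[foo])
print(drv)
# {'nativeBuildInputs': ['buildPackages.foo'], 'buildInputs': ['pkgs.foo']}
```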

Stdenv is handled weirdly here, not like other packages. (Indeed, I wish it weren't a derivation, so as not to raise false expectations.) stdenv is a build-time dep of the packages in the same stage as it, not a run-time one like the other packages. That means that pkgs.foo is built with pkgs.stdenv, buildPackages.foo is built with buildPackages.stdenv, and targetPackages.foo is built with targetPackages.stdenv. That means stdenv.cc is likely to be buildPackages.gcc or buildPackages.clang, not pkgs.gcc or pkgs.clang; any derivation …
Because of this weirdness, no such splicing happens for stdenv: buildPackages.stdenv or targetPackages.stdenv must be used explicitly.
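Here is a minimal sketch of the stdenv rule. It assumes each stage is a hypothetical `Stage` object whose `prev` field plays the role of buildPackages. The compiler that a stage's stdenv wraps is the previous stage's compiler, which is why `pkgs.stdenv.cc` comes out as buildPackages' gcc. The `bootstrap.cc` placeholder is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """Hypothetical stand-in for one stage's package set."""
    name: str
    prev: Optional["Stage"] = None  # this stage's buildPackages

    @property
    def gcc(self) -> str:
        # The compiler package defined in this stage.
        return f"{self.name}.gcc"

    @property
    def stdenv_cc(self) -> str:
        # stdenv is a build-time dependency of this stage's packages, so the
        # compiler it wraps must run on the build platform: it is the previous
        # stage's gcc, not this stage's.
        return self.prev.gcc if self.prev is not None else "bootstrap.cc"


def make_chain(names: List[str]) -> List[Stage]:
    stages: List[Stage] = []
    for name in names:
        stages.append(Stage(name, prev=stages[-1] if stages else None))
    return stages


build_packages, pkgs, target_packages = make_chain(
    ["buildPackages", "pkgs", "targetPackages"]
)
print(pkgs.stdenv_cc)             # buildPackages.gcc
print(target_packages.stdenv_cc)  # pkgs.gcc
```

Because nothing is spliced, reaching another stage's stdenv means naming it explicitly, e.g. `pkgs.prev.stdenv_cc` in this model.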

Option 1: targetPackages.stdenv.bintools

This is what this PR currently does for run-time dependencies. Since the next stage draws its stdenv derivations from its previous stage, this value is probably equal to pkgs.clang or pkgs.gcc. However, if the next stage is customized to use some non-standard tools, this will change the run-time deps of this stage's packages to respect that.

Pros

  • Correctness: arguably GHC should be baked to use ICC or whatever the next stage chooses

Cons

  • Cache misses: maybe those run-time deps don't matter, in which case it would be nicer to just use the cached build with the default

  • Potential for cycles / infinite recursion: It's much nicer for stages to only depend on themselves or the previous stage, which rules out inter-stage infinite recursion by construction
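Here is a rough sketch of Option 1 in the same toy style. The `bintools_override` knob, the `custom-bintools` value, and `option1_runtime_bintools` are hypothetical. A compiler in `pkgs` reads `stdenv.cc.bintools` from the next stage. By default that is `pkgs`' own binutils, because the next stage's stdenv draws from its previous stage, which is `pkgs`. If the next stage is customized, the answer follows the customization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    name: str
    prev: Optional["Stage"] = None  # this stage's buildPackages
    # Hypothetical knob: a stage customized to use non-default tools.
    bintools_override: Optional[str] = None

    @property
    def binutils(self) -> str:
        # The binutils package defined in this stage.
        return f"{self.name}.binutils"

    @property
    def stdenv_cc_bintools(self) -> str:
        # By default a stage's stdenv draws its tools from the previous stage.
        if self.bintools_override is not None:
            return self.bintools_override
        return self.prev.binutils if self.prev is not None else "bootstrap.binutils"


def option1_runtime_bintools(stage: Stage, target_stage: Stage) -> str:
    """Option 1: a compiler in `stage` uses targetPackages.stdenv.cc.bintools
    at run time, read off the next stage."""
    if target_stage.prev is not stage:
        raise ValueError("target_stage must be the stage right after `stage`")
    return target_stage.stdenv_cc_bintools


pkgs = Stage("pkgs")
default_target = Stage("targetPackages", prev=pkgs)
custom_target = Stage("targetPackages", prev=pkgs, bintools_override="custom-bintools")

print(option1_runtime_bintools(pkgs, default_target))  # pkgs.binutils
print(option1_runtime_bintools(pkgs, custom_target))   # custom-bintools
```

Both cons are visible here. The customized next stage changes `pkgs`' run-time deps, which causes the cache miss. `pkgs` must also look forward to the next stage, which is the source of the recursion risk.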

Option 2: targetPlatform-determined binutils

Make cc and bintools attributes of the stage, using the stage's targetPlatform to pick sane default tools. Splicing would work for these attributes, but buildPackages.cc need not match stdenv.cc (likewise for buildPackages.bintools and stdenv.cc.bintools), and cc also need not match targetPackages.stdenv.cc (likewise for bintools and targetPackages.stdenv.cc.bintools).

Pros

  • Cache hits: since deps aren't affected by the next stage's stdenv, they will always hit the cache

  • No extra infinite recursion footgun: targetPlatform is a nice datum, finite, and fully serializable as JSON.

Cons

  • Correctness: the reverse of the argument above

  • buildPackages.cc or buildPackages.binutils should never be used: otherwise there is no point to the stdenv! But there's no good way to enforce this without doing splicing for native builds, which is prohibitively costly. [We skip splicing today since pkgs == buildPackages == targetPackages in the native case; adding warnings, adding errors, or removing attributes would break those equalities.]
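For comparison, here is a sketch of Option 2. The `Platform` fields and `default_tools` are assumptions, and the Darwin/other split is only an example of a "sane default". The tools depend on nothing but the stage's targetPlatform. That is a plain value that serializes to JSON, so a customized next stage cannot change the result.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class Platform:
    """Hypothetical platform description: plain, finite data."""
    config: str
    is_darwin: bool


def default_tools(target_platform: Platform) -> Dict[str, str]:
    # Option 2: defaults depend only on targetPlatform, never on the next
    # stage's stdenv. The Darwin/non-Darwin split is an illustrative choice.
    if target_platform.is_darwin:
        return {"cc": "clang", "bintools": "cctools"}
    return {"cc": "gcc", "bintools": "binutils"}


linux = Platform(config="aarch64-unknown-linux-gnu", is_darwin=False)
darwin = Platform(config="x86_64-apple-darwin", is_darwin=True)

print(default_tools(linux))   # {'cc': 'gcc', 'bintools': 'binutils'}
print(default_tools(darwin))  # {'cc': 'clang', 'bintools': 'cctools'}
# The whole input is serializable, unlike a package set:
print(json.dumps(asdict(darwin)))  # {"config": "x86_64-apple-darwin", "is_darwin": true}
```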

@Ericson2314 added and removed the 10.rebuild-darwin-stdenv and 10.rebuild-linux-stdenv labels (This PR causes stdenv to rebuild) Nov 5, 2017

fpletz Franz Pletz
…2314-cross-base

fpletz Franz Pletz
One should do this when executables are needed at build time. It is more
honest and cross-friendly than referring to binutils directly.
…tils directly

One should do this when executables are needed at run time. It is more
honest and cross-friendly than referring to binutils directly, if one
needs the default binary tools for the target platform, rather than
binutils in particular.
@orivej (Contributor) commented Nov 6, 2017

First, I'd like to recap the structure of cross compilation, more as I imagine it than as it really is. (Please correct me if I'm wrong.) Then I'll comment on your proposal.

To support binutils' and gcc's notions of build, host, and target platforms, the function that returns all packages is parametrized by these three variables. We mean exactly this when we say that each derivation is configured with the platform on which it should be built (PB), the platform on which it can run (PH), and the platform for which it should generate code when it runs (TP). (Of course, a derivation may in fact be independent of one of these variables.)

Let A and B be the names of platforms, such as "x86_64-linux" and "aarch64-linux". Then the simplest bootstrap sequence for building a package with gcc in the configuration A B B (i.e. PB=A PH=B TP=B), so that we could build it on an A host, copy it to a B host, and run it there, is:

| Stage packages | PB | PH | TP |
|---|---|---|---|
| bootstrap compiler and tools | N/A | A | A |
| cross compiler, build tools | A | A | B |
| target program, runtime dependencies | A | B | B |

The bootstrap is downloaded and unpacked using Nix primitives, so its builder does not run any programs (at least from the perspective of Nixpkgs), which makes it independent of the build platform.

However, we don't trust the bootstrap compiler to build programs during native compilation, and we won't trust it to build the build tools for cross compilation either. So we introduce another stage and build the build tools with the native compiler:

| Stage packages | PB | PH | TP |
|---|---|---|---|
| (0) bootstrap compiler and tools | N/A | A | A |
| (1) native compiler, tools for tools for target program | A | A | A |
| (2) cross compiler, tools for target program | A | A | B |
| (3) target program, runtime dependencies | A | B | B |

(Note that the stage 1 and stage 2 tools that don't depend on TP, which is almost everything besides gcc and binutils, evaluate to the same derivations. However, to confirm this, Nix has to evaluate each definition twice: once for A A A and once for A A B.)
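The two tables follow a shifting pattern. Each stage's PB is the previous stage's PH, its PH is the previous stage's TP, and only its TP is new. The small Python sketch below regenerates the four-stage table from that rule; the tuple encoding and the `next_stage`/`chain` names are illustrative. Omitting the native stage 1 gives the simpler three-stage table.

```python
from typing import List, Optional, Tuple

# (PB, PH, TP); PB is None for the bootstrap stage, which is not "built".
Config = Tuple[Optional[str], str, str]


def next_stage(prev: Config, target: str) -> Config:
    """A stage is built where the previous stage runs (its PH) and runs where
    the previous stage emits code (its TP); only its own TP is chosen fresh."""
    _, prev_host, prev_target = prev
    return (prev_host, prev_target, target)


def chain(bootstrap_platform: str, targets: List[str]) -> List[Config]:
    stages: List[Config] = [(None, bootstrap_platform, bootstrap_platform)]
    for t in targets:
        stages.append(next_stage(stages[-1], t))
    return stages


for i, (pb, ph, tp) in enumerate(chain("A", ["A", "B", "B"])):
    print(i, pb or "N/A", ph, tp)
# 0 N/A A A
# 1 A A A
# 2 A A B
# 3 A B B
```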

To be safe, the builder of a derivation in each stage should run programs only from the previous stage and link to libraries (or save references to programs for use at run time) only from the current stage, with a few exceptions:

  • bootstrap programs may run other bootstrap programs
  • the effective previous stage of stage 1 is a composition of bootstrap tools with stage 1 (not just bare bootstrap tools)
  • a compiler configured for A A B also generates libraries such as libc and libgcc, effectively configured for A B, so it's fine that the programs in the next stage, configured for A B B, link with them
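The basic rule, ignoring the listed exceptions, is mechanical enough to check. In this hypothetical sketch (`Dep` and `violations` are invented names), every dependency records the stage it comes from and its kind. The kind says whether the builder runs it or the output links against or references it. The checker flags anything that breaks the rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dep:
    name: str
    stage: int
    kind: str  # "run": executed by the builder; "link": linked or referenced at run time


def violations(stage: int, deps: List[Dep]) -> List[str]:
    """Builders run programs only from the previous stage and link against
    (or reference) only the current stage. Exceptions are not modeled."""
    bad = []
    for d in deps:
        expected = stage - 1 if d.kind == "run" else stage
        if d.stage != expected:
            bad.append(f"{d.name}: {d.kind} dep from stage {d.stage}, expected {expected}")
    return bad


deps = [Dep("gcc", 2, "run"), Dep("zlib", 3, "link"), Dep("make", 3, "run")]
print(violations(3, deps))
# ['make: run dep from stage 3, expected 2']
```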

We can model this pipeline in Nix with nested sets:

  • bootstrap.pkgs is a set of pkgs in stage 0
  • bootstrap.stage.A.pkgs is a set of pkgs in stage 1 (configured as A A A)
  • bootstrap.stage.A.stage.B.pkgs is a set of pkgs in stage 2 (configured as A A B)
  • bootstrap.stage.A.stage.B.stage.B.pkgs is a set of pkgs in stage 3 (configured as A B B)

To make a derivation for a certain stage, we call its function with pkgs referring to the packages in the current stage, and buildPackages referring to the packages in the previous stage.

stdenv is a wrapper for some build tools, so to build a derivation for a certain stage, we should use stdenv from the previous stage.
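The nested-set model, together with the proposal to take stdenv from the previous stage, fits in a few lines of Python. `StageSet`, `add_stage`, and `stdenv_source` are hypothetical names. Each object stands for the `.pkgs` set at that path.

```python
from typing import Dict, Optional


class StageSet:
    """Hypothetical stand-in for the package set at one path in the model."""

    def __init__(self, label: str, parent: Optional["StageSet"] = None):
        self.label = label
        self.parent = parent  # the previous stage, i.e. buildPackages
        self.stage: Dict[str, "StageSet"] = {}

    def add_stage(self, platform: str) -> "StageSet":
        # Mirrors the `.stage.<platform>` nesting used above.
        child = StageSet(f"{self.label}.stage.{platform}", parent=self)
        self.stage[platform] = child
        return child

    @property
    def stdenv_source(self) -> str:
        # Per the proposal: build tools (stdenv) come from the previous stage.
        return self.parent.label if self.parent else "(bootstrap primitives)"


bootstrap = StageSet("bootstrap")
s3 = bootstrap.add_stage("A").add_stage("B").add_stage("B")
print(s3.label)          # bootstrap.stage.A.stage.B.stage.B
print(s3.stdenv_source)  # bootstrap.stage.A.stage.B
```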


@Ericson2314 Overall I'm in favor of option 1. If I understand the situation correctly, the overhead from using the target stdenv (the cache miss con) is minimal because almost everything depends on stdenv. However, I do not see why stdenv should violate the separation of stages and why callPackage of the current stage can't just use stdenv from the previous stage.

@Ericson2314 (Member, Author)

@orivej Fantastic recap. That's almost exactly correct [1], and in as much detail as I've seen anyone else write.

> Overall I'm in favor of option 1.

I think I am too; it's better to get those run-time deps correct while we're not sure whether it matters.

> If I understand the situation correctly, the overhead from using the target stdenv (the cache miss con) is minimal because almost everything depends on stdenv.

It's true that if we're building stage 4 with a non-standard stdenv, we'll be rebuilding the world anyway; good point. I guess the case I was thinking of is stage 3 caching: e.g., if stage 4 has a Linux host platform but uses clang, should we rebuild stage 3 compilers that depend on CC so that they use clang instead of gcc at run time, or can we rely on wrapper scripts etc. to force them to use stdenv.cc at run time? But you are totally right that in general far more stage 4 packages than stage 3 packages will be built (I don't think very many things are used as build-time deps in practice), so stage 3 caching may well not be worth thinking about.

> However, I do not see why stdenv should violate the separation of stages and why callPackage of the current stage can't just use stdenv from the previous stage.

Hehehe. So first of all, I don't really like our notion of stdenv; I'd rather just have mkDerivation, and that alone might clear up some things :). But let me give stdenv its due. stdenv makes most sense not as a package, and not as a value in any package set, but just as a parameter to a stage. It's a way of asking "what tools should I use for this stage?": a limited solution to the general problem of there being multiple implementation packages for a given interface.

It doesn't belong in the previous stage because, for the same reason we don't like earlier stages depending on later stages, we don't like earlier stages determining later stages. Think of a GC'd heap with immutable linked lists: many linked lists could share the same tail, and the tail has no idea it's any linked list's tail, as it's a full-fledged linked list in its own right. Likewise, with a bootstrapping sequence, the goal is that any subsequence starting from stage 0 should be just as valid: stages 0-1 really are the native chain, in addition to being part of the 0-4 cross chain. Now yes, since we are effectively building a doubly-linked chain with dfold in pkgs/stdenv/booter.nix, there is less modularity / induction in reality, but that's still the aspiration.
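The linked-list analogy can be sketched in Python, assuming a chain is an immutable cons list. `Chain` is a hypothetical name, and the stage labels follow orivej's table. The native chain is built once and then extended by the cross chain without being copied or modified. The native chain also holds no pointer to anything built on top of it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chain:
    """Immutable bootstrapping chain: the newest stage plus the chain it
    extends. Links only point backwards, never forwards."""
    stage: str
    previous: Optional["Chain"] = None

    def stages(self) -> List[str]:
        node, out = self, []
        while node is not None:
            out.append(node.stage)
            node = node.previous
        return list(reversed(out))


native = Chain("stage 1", Chain("stage 0"))          # the native chain
cross = Chain("stage 3", Chain("stage 2", native))   # extends the same object

print(native.stages())  # ['stage 0', 'stage 1']
print(cross.stages())   # ['stage 0', 'stage 1', 'stage 2', 'stage 3']
print(cross.previous.previous is native)  # True: shared, not copied
```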

So since, as you point out, it doesn't belong in the current stage, and, as I point out, it doesn't belong in the previous stage, my parameter-only idea just concludes that it doesn't belong in any stage! It would just exist as an "ephemeral" parameter in order to select the right derivations without having to override anything (as overriding is anti-modular / bad for sharing and thus expensive, depending on how one looks at it). Now again, that ignores the reality of these backwards dependencies and what-not, but that's the aspiration.

tl;dr is yes we can be better, but that's out of scope of this PR :).


[1]: Well, except for the tiny quibble that it's libgcc, not libc, that's built with the compiler. libc does need to be built with the final compiler, but that means a backlink from stage 3 to 2, whereas libgcc, like you said, creates an invisible run-time dep from 2 to 3 (invisible because the derivation is gcc's).

@periklis (Contributor) commented Nov 7, 2017

@orivej Thanks from me too for this cross-compilation recap. The text is a good candidate for an intro/motivation section on cross compilation in the docs.

@edolstra (Member) commented Nov 8, 2017

Looks good to me.

@copumpkin (Member) left a comment

LGTM, thanks!

@Ericson2314 (Member, Author)

The Travis failure is real, but it is due to some other PR or a Darwin version impurity. gtk-gnutella built fine on this machine on this branch without the merge.

Thanks for the final reviews!

@Ericson2314 merged commit 0101856 into NixOS:master Nov 8, 2017
@Ericson2314 deleted the bintools branch November 8, 2017 19:20
@periklis (Contributor) commented Nov 9, 2017

👏 🎉

Labels: 6.topic: cross-compilation (Building packages on a different platform than they will be used on); 10.rebuild-darwin: 1-10; 10.rebuild-linux: 1-10

6 participants