treewide: Introduce stdenv.cc.bintools #30549

Ericson2314 · 2017-10-18T18:29:33Z

Motivation for this change

Rather than referring to gcc or clang directly, we refer to stdenv.cc. For binutils, however, instead of doing the same thing, we fake binutils on Darwin and expose binutils-raw. This PR introduces stdenv.cc.bintools for consistency.

Also, to be correct for cross compilation, tools like compilers that use binutils/cctools at run time should depend on __targetPackages.stdenv.cc.bintools. This is sort of weird, since it effectively refers to the same stage as the "next stage's previous stage", but I can't think of a better way.

This was split out of #30484, which now depends on this. It just adds stdenv.cc.bintools for now. Then down the road (along with other changes), binutils-raw can become binutils without changing anything.

Things done

Almost no hashes are changed by this PR.

Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
Built on platform(s) by CI
- NixOS
- macOS
- other Linux distributions
Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
Tested execution of all binary files (usually in ./result/bin/)
Fits CONTRIBUTING.md.

CC @bgamari

Ericson2314 · 2017-10-18T18:33:59Z

[@peti thanks for super-quickly reviewing the other PR; all the GHC changes are now in this one, but exactly the same.]

edolstra · 2017-10-31T09:00:43Z

What is __targetPackages and why does it start with two underscores?

Ericson2314 · 2017-10-31T15:32:39Z

Answered in person, but I'll copy here for anyone else that's curious. targetPackages refers to the package set used for "emit time" dependencies, i.e. packages used by compilers not for themselves, but for their standard libraries.

targetPackages formerly had the __ prefix because it's a somewhat unfortunate concept (compilers really ought not to build their own standard library), but I removed the prefix, as I do intend this to be an official interface, albeit one to avoid if possible.

Ericson2314 · 2017-11-05T21:31:40Z

Poll: `targetPackages.stdenv.bintools` vs `targetPlatform`-determined `binutils`

OK, there are two ways to get things done, and I need everyone's help to decide which is best. I'm hoping to merge by the end of Wednesday, November 7, so please weigh in quickly!

For context, buildPackages is the package set for resolving build-time dependencies, and targetPackages is the package set for resolving "emit-time" dependencies. [For those who didn't see my informal talk, "emit time" is a somewhat made-up concept for the run time of packages produced by the current package/stage (assume we are talking about compilers). It is the dual of build time.] When not doing cross builds, rest assured these are all the same, so please assume cross compilation (i.e. that the sets differ) when evaluating the options.

The 3 package sets mean that if you need a build tool, you use buildPackages.foo; if you need the same thing at run time, you use plain foo or pkgs.foo; and if you need it at emit time, you use targetPackages.foo. Alternatively, thanks to my crazy splicing stuff, if you put plain foo in nativeBuildInputs it will automatically resolve to the buildPackages one.
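A toy Python model of the three package sets, as seen from one stage, may help when weighing the options. It is a sketch, not Nixpkgs code: the platform labels `A` and `B`, the helpers `make_set`, `pick` and `spliced`, and the assumption that the next stage is configured as `B B B` are all hypothetical.

```python
from typing import Dict

PackageSet = Dict[str, object]


def make_set(build: str, host: str, target: str) -> PackageSet:
    """Toy package set; `foo` records the (build, host, target) it is configured for."""
    return {"platforms": (build, host, target), "foo": f"foo{(build, host, target)}"}


def pick(sets: Dict[str, PackageSet], needed_at: str) -> PackageSet:
    """Which set a dependency comes from, by when it is needed."""
    return {
        "build": sets["buildPackages"],  # build tool: buildPackages.foo
        "run": sets["pkgs"],             # run time: foo / pkgs.foo
        "emit": sets["targetPackages"],  # emit time: targetPackages.foo
    }[needed_at]


def spliced(sets: Dict[str, PackageSet], input_list: str, name: str) -> object:
    """Toy splicing: a plain `name` in nativeBuildInputs resolves to the
    buildPackages variant; anywhere else it stays the pkgs variant."""
    source = sets["buildPackages"] if input_list == "nativeBuildInputs" else sets["pkgs"]
    return source[name]


# Cross: build on A, run on B (the next stage is assumed to be B B B).
cross = {
    "buildPackages": make_set("A", "A", "B"),
    "pkgs": make_set("A", "B", "B"),
    "targetPackages": make_set("B", "B", "B"),
}

# Native: all three sets coincide, so the distinction is invisible.
native_set = make_set("A", "A", "A")
native = {"buildPackages": native_set, "pkgs": native_set, "targetPackages": native_set}
```

In the native case `pick(native, "build") is pick(native, "emit")`, which is why the options only differ under cross compilation.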

stdenv is handled differently from other packages here. (Indeed, I wish it weren't a derivation, so as not to raise false expectations.) stdenv is a build-time dep of the packages in the same stage as it, not a run-time one like the other packages. That means that pkgs.foo is built with pkgs.stdenv, buildPackages.foo is built with buildPackages.stdenv, and targetPackages.foo is built with targetPackages.stdenv. It also means stdenv.cc is likely to be buildPackages.gcc or buildPackages.clang, not pkgs.gcc or pkgs.clang; any derivation
Because of this weirdness, no such splicing happens for stdenv: buildPackages.stdenv or targetPackages.stdenv must be used explicitly.
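The stdenv rule can be modelled the same way. In this hypothetical sketch (the `Stage` class, its `stdenv_cc` and `build` members, and the `"bootstrap-cc"` placeholder are illustrative names only), each stage builds its packages with its own stdenv, and that stdenv's compiler comes from the previous stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    name: str
    prev: Optional["Stage"] = None  # this stage's buildPackages

    @property
    def gcc(self) -> str:
        return f"gcc@{self.name}"

    @property
    def stdenv_cc(self) -> str:
        # A stage's stdenv wraps a compiler built by the previous stage:
        # stdenv.cc is buildPackages.gcc, not pkgs.gcc.
        return self.prev.gcc if self.prev is not None else "bootstrap-cc"

    def build(self, pkg: str) -> str:
        # Every package of a stage is built with that stage's own stdenv.
        return f"{pkg}@{self.name} built with {self.stdenv_cc}"


stage0 = Stage("0")
stage1 = Stage("1", prev=stage0)  # plays the role of buildPackages
stage2 = Stage("2", prev=stage1)  # plays the role of pkgs
```

Here `stage2.build("foo")` gives `"foo@2 built with gcc@1"`: pkgs.foo is built with pkgs.stdenv, whose compiler is buildPackages.gcc.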

Option 1: `targetPackages.stdenv.bintools`

This is what this PR does currently for run-time dependencies. Since the next stage draws the stdenv derivations from its previous stages, this means that this value is probably equal to pkgs.clang or pkgs.gcc. However, if the next stage is customized to use some non-standard tools, this will change the run-time deps of this stage's packages to respect that.
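A minimal sketch of Option 1, assuming the next stage's stdenv is assembled from tools in the current stage. The helpers `make_next_stdenv` and `ghc_runtime_bintools` and the package names are hypothetical; the point is only that the run-time dependency is looked up through the next stage's stdenv.

```python
from typing import Dict


def make_next_stdenv(this_stage: Dict[str, str], bintools_attr: str = "binutils") -> Dict:
    """Hypothetical: the next stage builds its stdenv from tools in this stage.
    A non-default `bintools_attr` stands in for a customized next stage."""
    return {"cc": {"bintools": this_stage[bintools_attr]}}


def ghc_runtime_bintools(target_packages_stdenv: Dict) -> str:
    # Option 1: the run-time dependency is targetPackages.stdenv.cc.bintools.
    return target_packages_stdenv["cc"]["bintools"]


this_stage = {"binutils": "binutils-A-to-B", "custom": "custom-linker-A-to-B"}

default_dep = ghc_runtime_bintools(make_next_stdenv(this_stage))
custom_dep = ghc_runtime_bintools(make_next_stdenv(this_stage, "custom"))
```

With the default next stage, the dependency is just this stage's own binutils; customizing the next stage changes it, which is both the correctness argument and the source of cache misses.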

Pros

Correctness: arguably GHC should be baked to use ICC or whatever the next stage chooses

Cons

Cache misses: maybe those run-time deps don't matter, in which case it would be nicer to just use the cached build with the default
Potential for cycles / infinite recursion: It's much nicer for stages to only depend on themselves or the previous stage, which rules out inter-stage infinite recursion by construction

Option 2: `targetPlatform`-determined `binutils`

Make cc and bintools attributes in the stage, using the stage's targetPlatform to pick sane default tools. Splicing would work for these attributes, but buildPackages.cc need not match stdenv.cc (likewise for buildPackages.bintools for stdenv.cc.bintools), and cc also need not match targetPackages.stdenv.cc (likewise for bintools and targetPackages.stdenv.cc.bintools).
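For contrast, a minimal sketch of Option 2. The Darwin-to-cctools mapping and the helpers `default_bintools`, `make_stage` and `ghc_runtime_bintools` are assumptions for illustration, not the actual Nixpkgs defaults.

```python
from typing import Dict


def default_bintools(target_platform: str) -> str:
    """Hypothetical default: cctools for Darwin targets, GNU binutils otherwise."""
    return "cctools" if "darwin" in target_platform else "binutils"


def make_stage(target_platform: str) -> Dict[str, str]:
    # Option 2: `bintools` is an ordinary stage attribute chosen only from the
    # stage's own targetPlatform.
    return {"targetPlatform": target_platform, "bintools": default_bintools(target_platform)}


def ghc_runtime_bintools(stage: Dict[str, str], next_stdenv_bintools: str) -> str:
    # The next stage's stdenv choice is deliberately ignored, so customizing it
    # cannot change this dependency.
    del next_stdenv_bintools
    return stage["bintools"]
```

Because the result depends only on `targetPlatform`, the dependency stays cache-friendly and cannot recurse into the next stage, at the cost of ignoring a customized next-stage stdenv.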

Pros

Cache hits: Since deps aren't affected by the next stage's stdenv, will always hit cache
No extra infinite recursion footgun: targetPlatform is a nice datum, finite, and fully serializable as JSON.

Cons

Correctness: the reverse of the argument from before
buildPackages.cc or buildPackages.binutils should never be used: otherwise there is no point to the stdenv! But there's no good way to enforce this without doing splicing for native builds, which is prohibitively costly. [We skip splicing today since pkgs == buildPackages == targetPackages in the native case; adding warnings, adding errors, or removing attributes would break those equalities.]

…2314-cross-base

One should do this when executables are needed at build time. It is more honest and cross-friendly than referring to binutils directly.

…tils directly

One should do this when executables are needed at run time. It is more honest and cross-friendly than referring to binutils directly, if one needs the default binary tools for the target platform rather than binutils in particular.

orivej · 2017-11-06T05:31:13Z

First, I'd like to recap the structure of cross compilation, more as I imagine it than as it really is. (Please correct me if I'm wrong.) Then I'll comment on your proposal.

To support usage of binutils and gcc with their notions of build, host and target platforms, the function that returns all packages is parametrized with these three variables. Let's use exactly this meaning when we say that each derivation is configured with the platform on which it should be built (PB), the platform on which it can run (PH), and the platform for which it should generate code when it runs (TP). (Certainly a derivation may in fact be independent of the value of one of these variables.)

Let A and B be the names of platforms, such as "x86_64-linux" and "aarch64-linux". Then the simplest bootstrap sequence to build a package using gcc with the configuration A B B (i.e. PB=A PH=B TP=B), so that we could build it on an A host, copy it to a B host and run it there, is:

| Stage packages | PB | PH | TP |
|---|---|---|---|
| bootstrap compiler and tools | N/A | A | A |
| cross compiler, build tools | A | A | B |
| target program, runtime dependencies | A | B | B |

The bootstrap is downloaded and unpacked using Nix primitives, so its builder does not run any programs (at least from the perspective of Nixpkgs), which makes it independent of the build platform.

However, we don't trust the bootstrap compiler to build programs during the native compilation, and we won't trust it to build the build tools for the cross compilation. We will introduce another stage, and build the build tools with the native compiler:

| Stage packages | PB | PH | TP |
|---|---|---|---|
| (0) bootstrap compiler and tools | N/A | A | A |
| (1) native compiler, tools for tools for target program | A | A | A |
| (2) cross compiler, tools for target program | A | A | B |
| (3) target program, runtime dependencies | A | B | B |
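Read row by row, this table is consistent with a simple rule: each stage's PB and PH are the previous stage's PH and TP, and only TP is a fresh choice. That rule is an observation about the table rather than something stated here; the following sketch (hypothetical helpers `next_stage` and `chain`) regenerates the four rows from it.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (PB, PH, TP)


def next_stage(prev: Triple, target: str) -> Triple:
    # The next stage is built where the previous stage's tools run (its PH)
    # and runs where they emit code (its TP); only TP is a new choice.
    _, prev_host, prev_target = prev
    return (prev_host, prev_target, target)


def chain(bootstrap_platform: str, targets: List[str]) -> List[Triple]:
    # Stage 0 is unpacked rather than built, so it has no build platform.
    stages: List[Triple] = [("N/A", bootstrap_platform, bootstrap_platform)]
    for target in targets:
        stages.append(next_stage(stages[-1], target))
    return stages
```

For example, `chain("A", ["A", "B", "B"])` yields the four rows above, from `("N/A", "A", "A")` to `("A", "B", "B")`.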

(Note that stage 1 tools and stage 2 tools that don't depend on TP, which is almost everything besides gcc and binutils, evaluate to the same derivations. However, to confirm this, Nix has to evaluate each definition twice: once for A A A and once for A A B.)
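A toy illustration of why TP-independent packages coincide, assuming a derivation's identity is a hash over only the parameters it actually uses. Real Nix hashes the full derivation; `drv_id` and its `uses_target` flag are hypothetical.

```python
import hashlib


def drv_id(name: str, build: str, host: str, target: str, uses_target: bool) -> str:
    """Toy derivation identity: hash only the parameters the package depends on."""
    key = (name, build, host, target if uses_target else None)
    return hashlib.sha256(repr(key).encode()).hexdigest()[:12]


# Each definition is evaluated once per configuration (A A A and A A B) ...
zlib_aaa = drv_id("zlib", "A", "A", "A", uses_target=False)
zlib_aab = drv_id("zlib", "A", "A", "B", uses_target=False)
gcc_aaa = drv_id("gcc", "A", "A", "A", uses_target=True)
gcc_aab = drv_id("gcc", "A", "A", "B", uses_target=True)
# ... but only the TP-dependent packages end up as distinct derivations.
```

Both evaluations still happen; they simply produce the same identity for packages like zlib.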

To be safe, the builder of a derivation of each stage should run programs only from the previous stage and link to libraries (or save references to programs for use at the run time) only from the current stage, with a few exceptions:

bootstrap programs may run other bootstrap programs
the effective previous stage of stage 1 is a composition of bootstrap tools with stage 1 (not just bare bootstrap tools)
a compiler configured for A A B also generates libraries such as ~~libc~~ libgcc effectively configured for A B, so it's fine that the programs in the next stage configured for A B B link with them
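The rule and its three exceptions can be written as a small predicate. This is a hypothetical checker (`allowed` and its `use` values are illustrative names), with stages numbered as in the table above.

```python
def allowed(consumer: int, provider: int, use: str, provider_pkg: str = "") -> bool:
    """Toy check of the staging rule for a builder in stage `consumer`.

    use == "run":  executing a program from stage `provider` at build time.
    use == "link": linking to (or keeping a reference to) something in `provider`.
    """
    if use == "run":
        if consumer == 0:
            # Exception: bootstrap programs may run other bootstrap programs.
            return provider == 0
        if consumer == 1:
            # Exception: stage 1's effective previous stage is bootstrap + stage 1.
            return provider in (0, 1)
        return provider == consumer - 1
    if use == "link":
        if provider == consumer:
            return True
        # Exception: libgcc produced by the previous stage's cross compiler.
        return provider_pkg == "libgcc" and provider == consumer - 1
    raise ValueError(f"unknown use: {use!r}")
```

For instance, a stage 3 builder may run stage 2 tools and link stage 2's libgcc, but may not link any other stage 2 library.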

We can model this pipeline in Nix with nested sets:

bootstrap.pkgs is a set of pkgs in stage 0
bootstrap.stage.A.pkgs is a set of pkgs in stage 1 (configured as A A A)
bootstrap.stage.A.stage.B.pkgs is a set of pkgs in stage 2 (configured as A A B)
bootstrap.stage.A.stage.B.stage.B.pkgs is a set of pkgs in stage 3 (configured as A B B)

To make a derivation for a certain stage, we call its function with pkgs that refer to the packages in the current stage, and buildPackages that refer to pkgs in the previous stage.
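A hypothetical sketch of the nested-set model: each node holds its configuration and its child stages, and a stage's buildPackages is simply the pkgs of the node enclosing it. `StageNode`, `add` and the string stand-ins for package sets are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StageNode:
    """One stage: its configuration (PB PH TP) plus child stages keyed by platform."""
    config: str
    parent: Optional["StageNode"] = field(default=None, repr=False, compare=False)
    stage: Dict[str, "StageNode"] = field(default_factory=dict)

    @property
    def pkgs(self) -> str:
        return f"pkgs[{self.config}]"

    @property
    def build_packages(self) -> Optional[str]:
        # buildPackages is the pkgs of the enclosing (previous) stage.
        return self.parent.pkgs if self.parent is not None else None

    def add(self, platform: str, config: str) -> "StageNode":
        child = StageNode(config, parent=self)
        self.stage[platform] = child
        return child


bootstrap = StageNode("N/A A A")
stage3 = bootstrap.add("A", "A A A").add("B", "A A B").add("B", "A B B")
```

Then `bootstrap.stage["A"].stage["B"].stage["B"]` is stage 3, and its `build_packages` is the A A B package set.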

stdenv is a wrapper for some build tools, so to build a derivation for a certain stage, we should use stdenv from the previous stage.

@Ericson2314 Overall I'm in favor of option 1. If I understand the situation correctly, the overhead from using the target stdenv (the cache miss contra) is minimal because about everything depends on stdenv. However, I do not see why stdenv should violate the separation of stages and why callPackage of the current stage can't just use stdenv from the previous stage.

Ericson2314 · 2017-11-06T18:51:27Z

@orivej Fantastic recap. That's almost exactly correct [1], and in as much detail as I've seen anyone write.

> Overall I'm in favor of option 1.

I think so too; it's better to get those run-time deps correct while we're not sure whether it matters.

> If I understand the situation correctly, the overhead from using the target stdenv (the cache miss contra) is minimal because about everything depends on stdenv.

It's true that if we're building stage 4 with a non-standard stdenv, we'll be rebuilding the world anyway; good point. I guess the case I was thinking of is stage 3 caching: e.g. if stage 4 has a Linux host platform but uses clang, should we rebuild stage 3 compilers that depend on CC so that they use clang instead of gcc at run time, or can we rely on wrapper scripts etc. to force them to use stdenv.cc at run time? But you are totally right that in general far more stage 4 packages than stage 3 packages will be built (I don't think very many things are used as build-time deps in practice), so stage 3 caching may well not be worth thinking about.

> However, I do not see why stdenv should violate the separation of stages and why callPackage of the current stage can't just use stdenv from the previous stage.

Hehehe. So first of all, I don't really like our notion of stdenv; I'd rather just have mkDerivation, and that alone might clear up some things :). But let me give stdenv its due. stdenv makes most sense not as a package, and not as a value in any package set, but just as a parameter to a stage. It's a way of asking "what tools should I use for this stage"; a limited solution to the general problem of there being multiple implementation packages for a given interface.

It doesn't belong in the previous stage, for the same reason we don't like earlier stages depending on later stages: we don't like earlier stages determining later stages. Think of a GC'd heap and immutable linked lists: many linked lists could share the same tail, and the tail has no idea it's any linked list's tail, as it's a full-fledged linked list in its own right. Likewise, with a bootstrapping sequence, the goal is that any subsequence starting from 0 should be just as valid: stages 0-1 really are the native chain, in addition to being part of the 0-4 cross chain. Now yes, since we are effectively doing a doubly-linked chain with dfold in pkgs/stdenv/booter.nix, there is less modularity / induction in reality, but that's still the aspiration.
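The linked-list analogy can be made concrete with a toy immutable cons list. This illustrates only the aspiration described here (the real pkgs/stdenv/booter.nix builds a doubly-linked chain with dfold), and the helpers `extend` and `stages` are hypothetical.

```python
from typing import List, Optional, Tuple

# An immutable cons list: (newest_stage, rest_of_chain); None is the empty chain.
Chain = Optional[Tuple[str, "Chain"]]


def extend(chain: Chain, stage: str) -> Chain:
    # Adding a stage never modifies the existing chain; it only points at it.
    return (stage, chain)


def stages(chain: Chain) -> List[str]:
    names: List[str] = []
    while chain is not None:
        name, chain = chain
        names.append(name)
    return names[::-1]  # oldest stage first


native = extend(extend(None, "0"), "1")  # stages 0-1: the native chain
cross = native
for s in ["2", "3", "4"]:
    cross = extend(cross, s)             # stages 0-4: a cross chain
```

The cross chain literally contains the native chain object as its tail, and the native chain neither knows nor cares that it is shared.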

So since, as you point out, it doesn't belong in the current stage, and, as I point out, it doesn't belong in the previous stage, my parameter-only idea simply concludes that it doesn't belong in any stage! It would just exist as an "ephemeral" parameter in order to select the right derivations without having to override anything (as overriding is anti-modular / bad for sharing and thus expensive, depending on how one looks at it). Now again, that ignores the reality of these backwards dependencies and whatnot, but that's the aspiration.

tl;dr is yes we can be better, but that's out of scope of this PR :).

[1]: Well, except for the tiny quibble that it's libgcc, not libc, that's built with the compiler. libc does need to be built with the final compiler, but that means a backlink from stage 3 to 2, whereas libgcc, like you said, creates an invisible run-time dep from 2 to 3 (invisible because the derivation is gcc's).

periklis · 2017-11-07T08:10:14Z

@orivej Thanks from me too for this cross-compilation recap. The text is a good candidate for an intro/motivation section on cross compilation in the docs.

edolstra · 2017-11-08T16:10:30Z

Looks good to me.

copumpkin

LGTM, thanks!

Ericson2314 · 2017-11-08T19:20:24Z

The Travis failure is real, but due to some other PR or a Darwin version impurity. gtk-gnutella built fine on this machine on this branch without the merge.

Thanks for the final reviews!

periklis · 2017-11-09T07:31:50Z

👏 🎉

Ericson2314 requested review from copumpkin, LnL7, edolstra, abbradar and vcunat October 18, 2017 18:29

Ericson2314 requested a review from peti as a code owner October 18, 2017 18:29

Ericson2314 mentioned this pull request Oct 18, 2017

bfd, opcodes: Init separate derivations for binutils libraries #30484

Merged

Ericson2314 force-pushed the bintools branch 3 times, most recently from 562e1dd to 1751bff October 18, 2017 20:15

Ericson2314 added the 6.topic: cross-compilation label Oct 18, 2017

Ericson2314 force-pushed the bintools branch from 1751bff to de705bf October 19, 2017 20:18

Ericson2314 force-pushed the bintools branch from de705bf to 56642f4 October 30, 2017 18:42

GrahamcOfBorg added 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 labels Oct 30, 2017

Ericson2314 force-pushed the bintools branch from 56642f4 to 5f1b67c October 31, 2017 15:18

Ericson2314 force-pushed the bintools branch from 5f1b67c to d689a98 November 5, 2017 20:05

Ericson2314 requested a review from nbp as a code owner November 5, 2017 20:05

Ericson2314 added 10.rebuild-darwin-stdenv 10.rebuild-linux-stdenv and removed 10.rebuild-darwin-stdenv 10.rebuild-linux-stdenv labels Nov 5, 2017

Ericson2314 added 4 commits November 5, 2017 17:09

Merge remote-tracking branch 'channels/nixpkgs-unstable' into ericson…

a8f3d72

…2314-cross-base

Rename __targetPackages to targetPackages

5ae8f18

treewide: Depend on stdenv.cc.bintools instead of binutils directly

70d91ba

Ericson2314 force-pushed the bintools branch from d689a98 to 4d4f94c November 5, 2017 22:14

copumpkin approved these changes Nov 8, 2017

Ericson2314 merged commit 0101856 into NixOS:master Nov 8, 2017

Ericson2314 deleted the bintools branch November 8, 2017 19:20
