
Preview: deterministic build #2281

Closed


alexanderkjeldaas
Contributor

This is a set of changes that makes the system_tarball_pc derivation deterministic.

Stdenv bootstrap:

  • The stdenvs created during bootstrap are numbered to make it easier to debug/understand
  • BUG: for the bootstrap stdenvs, binutils was after the bootstrap binaries in path.
  • Deterministic archives are enabled early during bootstrapping of stdenvs.

Stdenv builder:

  • The generic builder depends on libfaketime. libfaketime needs a home. libfaketime is currently distributed through GitHub, but that does not work during bootstrap when SSL is not available. It should be moved to tarballs.nixos.org.
  • Feature: Setting useFakeTime fixes the time during builds. This can break builds, but is easy to manage compared to patching. Additional environment variables are available for configuring libfaketime.
  • Feature: A fake "date" utility is prepended to the path
  • Check: The build directory (/tmp/nix-build-foo-x.y.z) is not allowed to appear in artifacts ($out) to avoid non-deterministic output.
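The build-directory check in the last bullet can be sketched as a scan of the output tree for the build path. This is only an illustrative sketch, not the code from this PR: the function name `find_build_dir_leaks` and its arguments are hypothetical, and a real check would run inside the stdenv builder rather than as a Python script.

```python
from __future__ import annotations

import os


def find_build_dir_leaks(out_dir: str, build_dir: str) -> list[str]:
    """Return the regular files under out_dir whose bytes contain build_dir."""
    needle = os.fsencode(build_dir)
    leaks = []
    for dirpath, _dirnames, filenames in os.walk(out_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a symlink target is not file content; skipped in this sketch
            with open(path, "rb") as handle:
                if needle in handle.read():
                    leaks.append(path)
    return sorted(leaks)
```

A builder could fail the build whenever the returned list is non-empty, because a path such as /tmp/nix-build-foo-x.y.z embedded in an artifact can differ from one build to the next.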

Gcc:

  • Gcc by default now defines __DATE__ and __TIME__ to be (time_t)0.
  • TODO: -frandom-seed is not set. This is relevant for c++ code.
  • PGO is turned off for gcc. Note: there is no consensus on doing this as it reduces compilation performance.

Various changes:

  • The xsltproc utility creates random identifiers. A post-processing stage using perl was added to the nixos manual. This could be generalized.
  • All uses of 'gzip -9' have been replaced with 'gzip -9n'.
  • Added options to cpio for default uid/gid, and default mtime for tar.
  • File lists are sorted before being added to cpio/tar.
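The archive-related points above (sorted file lists, fixed uid/gid, fixed mtime, no timestamp in the gzip header) can be illustrated with Python's standard `tarfile` and `gzip` modules. This is a minimal sketch under those assumptions and not the cpio/tar invocation used in the PR; the function name `deterministic_targz` is made up for the example.

```python
import gzip
import io
import os
import tarfile


def deterministic_targz(root: str, mtime: int = 0) -> bytes:
    """Pack root into a .tar.gz that does not depend on creation order, owner or mtime."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    paths.sort()  # fixed member order, independent of the filesystem

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Fixed owner and timestamp for every member.
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        info.mtime = mtime
        return info

    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in paths:
            tar.add(path, arcname=os.path.relpath(path, root),
                    recursive=False, filter=normalize)

    out = io.BytesIO()
    # An empty filename and mtime=0 leave name and timestamp out of the gzip
    # header, which is roughly what `gzip -n` does.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb",
                       compresslevel=9, mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

Two trees with the same names, modes and contents should then produce byte-identical archives, regardless of when or in which order the files were written.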

@alexanderkjeldaas
Contributor Author

Known issues:

  1. The inode changes done in perl seem to be buggy (boot problems), but they might not be needed.
  2. Some patches are not required because of the gcc-wrapper and libfaketime features.
  3. I'm not sure the overridden 'date' binary works correctly.

@shlevy
Member

shlevy commented Apr 16, 2014

I think nix uses the equivalent of (time_t) 1 for its file a/m/ctime changes

@domenkozar
Member

Wonderful. @alexanderkjeldaas that means most of other PRs can be closed?

@thoughtpolice
Member

@vcunat @alexanderkjeldaas Can we get some of these things merged on the pending stdenv branch? I know that we haven't solved the GCC question re: PGO, so we'll have to leave that commit out, but a lot of these changes are not very intrusive, and merging them would reduce burdens later and get us much closer to a deterministic build.

@vcunat
Member

vcunat commented Jun 9, 2014

Ah, I completely forgot this series of work. The currently staged stdenv has been quite tested by me, so I would merge it about the current state (after Hydra verifies gcc on Darwin). I'll look at this afterwards, as I wanted to do another iteration of stdenv stuff (there were some others I missed this time).

@alexanderkjeldaas
Contributor Author

@vcunat I'll just leave it as-is then I guess.

On Mon, Jun 9, 2014 at 7:40 PM, Vladimír Čunát notifications@github.com
wrote:

Ah, I completely forgot this series of work. The currently staged stdenv
has been quite tested by me, so I would merge it about the current state
(after Hydra verifies gcc on Darwin). I'll look at this afterwards, as I
wanted to do another iteration of stdenv stuff (there were some others I
missed this time).



@vcunat vcunat added the stdenv label Jun 9, 2014
@thoughtpolice
Member

@alexanderkjeldaas I'm going to begin merging some of this work into HEAD soon. I'm probably not going to merge everything in one go, so feel free to rebase this when you get a chance. I'll update with what I've pushed upstream.

@alexanderkjeldaas
Contributor Author

I've rebased

On Sun, Jun 29, 2014 at 3:47 AM, Austin Seipp notifications@github.com
wrote:

@alexanderkjeldaas I'm going to
begin merging some of this work into HEAD soon. I'm probably not going to
merge everything in one go, so feel free to rebase this when you get a
chance. I'll update with what I've pushed upstream.



@alexanderkjeldaas
Contributor Author

I've added a minor fix for python 2.7.7 that I forgot to cherry-pick from my internal branch.

@7c6f434c
Member

So, what is the status of cherry-picking? Github doesn't easily show this, unfortunately…

Obviously, this will never get directly into master (only into staging), and it gets cherry-picked in small pieces.

I actually support reproducibility, although some people seem to like PGO too much…

@vcunat
Member

vcunat commented Aug 30, 2014

PGO is probably the only questionable thing here, IIRC. I'm planning to really review and test this within the next 10 days.

Remove datetime from nscd.

This includes two changes:
1) Fix a bug where the bootstrap-tools is always used instead of binutils
2) Enable strip --enable-deterministic-archives as soon as a new binutils
   is available.

1) Make the core python libraries deterministic.
2) Make the python libraries created by glib deterministic.

@joachifm
Contributor

Note that 3157dbe is probably no longer required due to tytso/e2fsprogs@a2143b5

@joachifm
Contributor

I've experimented with patching gzip -9 -> gzip -9n, but it doesn't seem to matter (to nix-build --check, anyway). Could this be because we now fix timestamps after unpacking sources? Or are there still reasons for doing it, even with SOURCE_DATE_EPOCH?

@joachifm
Contributor

Hm, I guess gzip -n still makes sense if the thing being compressed was created as part of the build.
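To make the point about gzip timestamps concrete: the gzip header carries a 4-byte MTIME field at offset 4, so compressing the same bytes at two different times gives different output unless that field is fixed. The sketch below uses Python's `gzip.compress` (its `mtime` parameter needs Python 3.8 or later) rather than the gzip CLI; the two example timestamps are arbitrary.

```python
import gzip
import struct


def gzip_header_mtime(blob: bytes) -> int:
    """Read the little-endian MTIME field stored at bytes 4..7 of a gzip header."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return struct.unpack("<I", blob[4:8])[0]


data = b"file generated during the build\n"

# Same payload, stamped with two different build times: the outputs differ.
stamped_a = gzip.compress(data, mtime=1397606400)
stamped_b = gzip.compress(data, mtime=1397692800)

# Same payload with the timestamp zeroed: the outputs are identical.
fixed_a = gzip.compress(data, mtime=0)
fixed_b = gzip.compress(data, mtime=0)
```

Fixing timestamps on unpacked sources does not cover this case, because a file created during the build has no normalized timestamp to inherit.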

@joachifm
Contributor

I've been going through most of these. A brief summary so far:

I think the following are obsolete

  • smartmontools passes nix-build --check & the string that is patched out
    in fd1101a
    no longer occurs in the source.
  • glibc currently contains a patch that removes datetime from nscd
  • the openldap patch is no longer relevant, I think, the build no longer skips
    elf patching & the build passes --check
  • groff and opensp pass --check on my end
  • the improvements to the linux stdenv are at least partially covered (strip is called with --deterministic-archives)

The following need more work

  • syslinux fails --check, but the patch in this PR is insufficient
  • as reported previously, the manual is still non-deterministic

I've not looked at libgpg-error, busybox, python, perl, or the gcc stuff, nor the general libfaketime support.

@joachifm
Contributor

Regarding the manual, https://wiki.debian.org/ReproducibleBuilds/ExperimentalToolchain#libxslt indicates that the issue of random ids is fixed/worked on upstream, so perhaps we want to just skip that for now.

@cleverca22
Contributor

cleverca22 commented Dec 20, 2016

something that can help to stress-test determinism, http://manpages.ubuntu.com/manpages/xenial/man1/disorderfs.1.html

this is a fuse filesystem that randomizes the order of files in a directory, so you can't accidentally rely on the fs returning them in a stable order most of the time
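Why directory order matters can be shown without a FUSE mount. In the sketch below, a build step hashes a file manifest; passing the same names in a different order stands in for what disorderfs would do to a directory listing. The helper `manifest_digest` is hypothetical and only illustrates the failure mode.

```python
import hashlib


def manifest_digest(entries, *, sort: bool) -> str:
    """Hash a list of file names, optionally sorting them first."""
    ordered = sorted(entries) if sort else list(entries)
    digest = hashlib.sha256()
    for entry in ordered:
        digest.update(entry.encode("utf-8") + b"\0")
    return digest.hexdigest()


listing = ["Makefile", "main.c", "util.c"]
shuffled = ["util.c", "Makefile", "main.c"]  # same directory, different readdir order

unsorted_differs = manifest_digest(listing, sort=False) != manifest_digest(shuffled, sort=False)
sorted_matches = manifest_digest(listing, sort=True) == manifest_digest(shuffled, sort=True)
```

Any step that consumes a raw directory listing (archiving, linking, generating an index) has the same dependency, which is why sorting file lists before archiving helps.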

@joachifm
Contributor

Some of the perl stuff was lost when perl16 was removed, but otherwise it seems like @vcunat (or whoever) picked up most of the specific package fixes. I think we're left with gcc, faketime in stdenv, stdenv numbering, and the fake date command thing, all of which seem like they could be profitably dealt with on their own.

@vcunat
Member

vcunat commented Dec 21, 2016

IIRC I had tried hard to pick whatever I could verify/make clearly advantageous. For some issues (like PGO) I didn't succeed in a reasonable amount of time, so I left those behind.

@cbarrett

FYI this PR is being linked to from https://reproducible-builds.org/who/. I volunteer to ping whoever's necessary to get an update posted (don't have the knowledge for more, unfortunately).

@Ekleog
Member

Ekleog commented Sep 26, 2018

(triage) My reading of the comments in this thread is that most of the changes either have been merged or are no longer needed. The remaining changes that would require being split out to separate PRs would be:

Does that sound correct to those actually involved?

@rht
Member

rht commented Dec 4, 2018

Note: the GCC stuff (force-setting __DATE__ and __TIME__) in #2281 (comment) has been superseded by SOURCE_DATE_EPOCH, which is already set in stdenv. The check for SOURCE_DATE_EPOCH has also been patched into the earlier gcc versions (4.9.x, 5.x, 6.x).
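As a rough illustration of the SOURCE_DATE_EPOCH convention: a tool reads the variable if it is set, falls back to the current time otherwise, and derives every embedded date from that one value. The sketch below also formats the value in the "Mmm dd yyyy" / "hh:mm:ss" layout used by __DATE__ and __TIME__. The function names are hypothetical, and the use of UTC is an assumption of this sketch.

```python
import os
import time

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_timestamp(environ=None) -> int:
    """Return SOURCE_DATE_EPOCH if set, else the current time in seconds."""
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def date_time_macros(epoch: int):
    """Format epoch (interpreted as UTC) like the __DATE__ and __TIME__ strings."""
    t = time.gmtime(epoch)
    date = f"{MONTHS[t.tm_mon - 1]} {t.tm_mday:2d} {t.tm_year}"  # day is space-padded
    clock = f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"
    return date, clock
```

With the variable fixed, every tool that honours it embeds the same date, so no per-package patching of __DATE__ and __TIME__ is needed.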

@rht
Member

rht commented Dec 4, 2018

The fake date command is no longer necessary, since faketime by itself is sufficient, so I suppose the remaining points are 1. numbering the names of stdenv, and 2. using faketime in stdenv.
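For the remaining "faketime in stdenv" point, one possible shape is to run the build command with libfaketime preloaded and a fixed FAKETIME value. The sketch below only constructs the environment. `LD_PRELOAD` and `FAKETIME` are the variables libfaketime reads, but the library path, the helper name `faketime_env`, and forcing `TZ=UTC` are assumptions of this example, not what the PR does.

```python
import os
import time


def faketime_env(lib_path: str, epoch: int = 0, base_env=None) -> dict:
    """Return a copy of base_env set up to preload libfaketime at a fixed time."""
    env = dict(os.environ if base_env is None else base_env)
    # Absolute FAKETIME value in the "YYYY-MM-DD hh:mm:ss" form.
    env["FAKETIME"] = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch))
    # Assumption: pin the timezone so the absolute date is read as UTC.
    env["TZ"] = "UTC"
    previous = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = lib_path if not previous else f"{lib_path}:{previous}"
    return env
```

The result could be passed as `env=` to `subprocess.run` for a build step. The library path is a placeholder, and a build that depends on time actually advancing may break under a fixed clock, as the PR description warns.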

@mmahut
Member

mmahut commented Aug 12, 2019

Are there any updates on this pull request, please?

@worldofpeace
Contributor

This is not actionable in its current state. Please open issues with references to the changes within this PR if needed.

@davidak
Member

davidak commented Oct 14, 2019

There is a GitHub project for this topic: https://github.com/NixOS/nixpkgs/projects/10

@worldofpeace can you add this and every related issue (even closed) to it?

Then https://reproducible-builds.org/who/ can link to that.

The current status is visible on https://r13y.com/ (at least for nixos-unstable's iso_minimal).

@vcunat vcunat added this to others in Deterministic builds Oct 14, 2019
@vcunat
Member

vcunat commented Oct 14, 2019

Perhaps it might make more sense to have one ticket for each (important) irreproducible package? (or "unreproducible"? whatever) At least for those that someone started to investigate and didn't immediately manage to fix it completely.

@worldofpeace
Contributor

Sure, it makes sense to have issues in the tracker for this purpose. Though I'm not sure if I'll have the time to distill them from this thread. @vcunat that is an important point for those who did take time to investigate that
