
Staging next #90058

Merged: 193 commits into master, Jun 15, 2020

Conversation

@vcunat (Member) commented Jun 10, 2020

No description provided.

gnprice and others added 30 commits March 30, 2020 22:56
The ./configure script prints a warning when passed this flag,
starting with 3.7:

  configure: WARNING: unrecognized options: --with-threads

The reason is that there's no longer such a thing as a build
without threads.

Eliminate the warning by passing the flag only on the older releases
that accept it.

Upstream change and discussion:
  python/cpython@a6a4dc816
  https://bugs.python.org/issue31370
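
As a rough sketch in Nix (the exact nixpkgs expression differs; the
version check and attribute names here are illustrative):

  # Pass --with-threads only to the releases that still accept it.
  configureFlags = lib.optionals (lib.versionOlder version "3.7") [
    "--with-threads"
  ];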
When running `cargo test --release`, the artifacts from `buildPhase`
are reused in `checkPhase`. Previously, most of the code had to be
recompiled without optimizations.
This is done by gathering all binaries to install before running the
`checkPhase`.
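
A hypothetical sketch of the resulting phase (the real builder differs
in its details):

  # Test against the release profile, so that buildPhase artifacts are
  # reused instead of being recompiled without optimizations.
  checkPhase = ''
    runHook preCheck
    cargo test --release
    runHook postCheck
  '';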
Hydra nixpkgs: ?compare=1586582
Without this flag, the configure script prints a warning at the end,
like this (reformatted):

  If you want a release build with all stable optimizations active
  (PGO, etc), please run ./configure --enable-optimizations

We're doing a build to distribute to people for day-to-day use,
for doing things other than developing the Python interpreter itself.
So that's certainly a release build -- we're the target audience for
this recommendation.

---

And, trying it out, upstream isn't kidding!  I ran the standard
benchmark suite that the CPython developers use for performance
work, "pyperformance".  Following its usage instructions:
  https://pyperformance.readthedocs.io/usage.html
I ran the whole suite, like so:

  $ nix-shell -p ./result."$variant" --run '
      cd $(mktemp -d); python -m venv venv; . venv/bin/activate
      pip install pyperformance
      pyperformance run -o ~/tmp/result.'"$variant"'.json
    '

and then examined the results with commands like:

  $ python -m pyperf compare_to --table -G \
      ~/tmp/result.{$before,$after}.json

Across all the benchmarks in the suite, the median speedup was 16%.
(Meaning 1.16x faster; 14% less time).

The middle half of them ranged from a 13% to a 22% speedup.

Each of the 60 benchmarks in the suite got faster, by speedups
ranging from 3% to 53%.

---

One reason this isn't just the default to begin with is that, until
recently, it made the build a lot slower.  What it does is turn on
profile-guided optimization, which means first build for profiling,
then run some task to get a profile, then build again using the
profile.  And, short of further customization, the task it would use
would be nearly the full test suite, which includes a lot of
expensive and slow tests, and can easily take half an hour to run.

Happily, in 2019 an upstream developer did the work to carefully
select a more appropriate set of tests to use for the profile:
  python/cpython@4e16a4a31
  https://bugs.python.org/issue36044
This suite takes just 2 minutes to run.  And the resulting final
build is actually slightly faster than with the much longer suite,
at least as measured by those standard "pyperformance" benchmarks.
That work went into the 3.8 release, but the same list works great
if used on older releases too.

So, start passing that --enable-optimizations flag; and backport
that good-for-PGO set of tests, so that we use it on all releases.
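
In the Nix expression this boils down to roughly the following sketch
(the actual change also carries the backported test list as a patch for
the older releases):

  # Turn on PGO and the other stable optimizations; see the upstream
  # discussion linked above.
  configureFlags = [ "--enable-optimizations" ];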
Stripping reduces closure sizes.
There are several tarballs (such as the `rust-lang/rust` source) with a
`Cargo.toml` at the root and several sub-packages (each with their own
`Cargo.toml`) that don't use workspaces[1].

In such a case it's necessary to move into a subdirectory to build only
the specified sub-package (e.g. `rustfmt` or `rls`), however the
artifacts end up in `/target` in the root directory of the build
environment. This breaks the build, since `buildRustPackage` searches
for executables in `target` (which is at the build environment's root)
at the end of the `buildPhase`.

With the optional `buildAndTestSubdir` argument, the builder moves into
the specified subdirectory using `pushd`/`popd` during `buildPhase` and
`checkPhase` (see the sketch after this message).

Also, the logic to find executables and libraries was moved from a
custom `postBuild` hook to the end of the `buildPhase`, to fix packages
with custom `build`/`install` procedures such as `uutils-coreutils`.

[1] https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html
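
A hypothetical usage sketch (the package name and path are illustrative,
not taken from nixpkgs):

  rustPlatform.buildRustPackage {
    pname = "rustfmt";  # illustrative package
    # version, src, cargoSha256 etc. elided
    # Build and test only this sub-package; the builder pushd's into
    # the directory during buildPhase and checkPhase.
    buildAndTestSubdir = "src/tools/rustfmt";
  }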
A lot of tests use `debug_assert!`, which is compiled to a no-op in
release mode.
When running the default builder for Rust, the artifacts are stored in
`target/<arch>/<profile>`, but the `install` target expects the default
structure of `cargo` builds (`target/<profile>`).

When the Makefile is used for building as well, the expected structure
is created.
Needed since we store artifacts in `target/<arch>/<profile>`.
In particular this will let us use patches that apply to configure.ac.
I took a close look at how Debian builds the Python interpreter,
because I noticed it ran substantially faster than the one in nixpkgs
and I was curious why.

One thing that I found made a material difference in performance was
this pair of linker flags (passed to the compiler):

    -Wl,-O1 -Wl,-Bsymbolic-functions

In other words, effectively the linker gets passed the flags:

    -O1 -Bsymbolic-functions

Doing the same thing in nixpkgs turns out to make the interpreter
run about 6% faster, which is quite a big win for such an easy
change.  So, let's apply it.
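
In nixpkgs terms this is roughly the following (a sketch; `NIX_LDFLAGS`
is handed to the linker directly, so the `-Wl,` wrapper is dropped):

  # Equivalent of passing -Wl,-O1 -Wl,-Bsymbolic-functions to the
  # compiler driver.
  NIX_LDFLAGS = "-O1 -Bsymbolic-functions";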

---

I had not known there was a `-O1` flag for the *linker*!
But indeed there is.

These flags are unrelated to "link-time optimization" (LTO), despite
the latter's name.  LTO means doing classic compiler optimizations
on the actual code, at the linking step when it becomes possible to
do them with cross-object-file information.  These two flags, by
contrast, cause the linker to make certain optimizations within the
scope of its job as the linker.

Documentation is here, though sparse:
  https://sourceware.org/binutils/docs-2.31/ld/Options.html

The meaning of -O1 was explained in more detail in this LWN article:
  https://lwn.net/Articles/192624/
Apparently it makes the resulting symbol table use a bigger hash
table, so the load factor is smaller and lookups are faster.  Cool.

As for -Bsymbolic-functions, the documentation indicates that it's a
way of saving lookups through the symbol table entirely.  There can
apparently be situations where it changes the behavior of a program,
specifically if the program relies on linker tricks to provide
customization features:
  https://bugs.launchpad.net/ubuntu/+source/xfe/+bug/644645
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637184#35
But I'm pretty sure CPython doesn't permit that kind of trick: you
don't load a shared object that tries to redefine some symbol found
in the interpreter core.

The stronger reason I'm confident using -Bsymbolic-functions is
safe, though, is empirical.  Both Debian and Ubuntu have been
shipping a Python built this way since forever -- it was introduced
for Python 2.4 and 2.5 in Ubuntu "hardy" and Debian "lenny",
released in 2008 and 2009.  In those 12 years they haven't seen a
need to drop this flag; and I've been unable to locate any reports
of trouble related to it, either on the Web in general or on the
Debian bug tracker.  (There are reports of a handful of other
programs breaking with it, but not Python/CPython.)  So that seems
like about as thorough testing as one could hope for.

---

As for the performance impact: I ran CPython upstream's preferred
benchmark suite, "pyperformance", in the same way as described in
the previous commit.  On top of that commit's change, the results
across the 60 benchmarks in the suite are:

The median is 6% faster.

The middle half (aka interquartile range) is from 4% to 8% faster.

Out of 60 benchmarks, 3 come out slower, by 1-4%.  At the other end,
5 are at least 10% faster, and one is 17% faster.

So, that's quite a material speedup!  I don't know how big the
effect of these flags is for other software; but certainly CPython
tends to do plenty of dynamic linking, as that's how it loads
extension modules, which are ubiquitous in the stdlib as well as
popular third-party libraries.  So perhaps that helps explain why
optimizing the dynamic linker has such an impact.
osmesa & gallium-nine are not needed on all systems, so this adds
flags to disable them if you don't want them.
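
A hypothetical way to opt out via an override (the argument names here
are illustrative):

  # Disable the optional components on systems that don't need them.
  mesa.override {
    enableOSMesa = false;
    enableGalliumNine = false;
  }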
/cc roundup #88306; the issue seems quite serious to me.

I also made two other patches non-conditional, as we rebuild
all platforms anyway.
It needs to be synced with sqlite itself.
@FRidh (Member) commented Jun 11, 2020

Let's see how far we get with b8f7c6e. According to upstream, Python 3.8 should work; it's just those regressions since 3.8.1.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/upcoming-python-breakage/7554/11

@domenkozar (Member):

How come patchelf bump is not here as it was in staging a few days ago?

@vcunat (Member, Author) commented Jun 11, 2020

That's because I started pre-building staging slightly before patchelf was pushed, so I used that commit. Patchelf was merged yesterday (CommitDate: Wed Jun 10 10:13:09 2020 +0200). EDIT: another reason was to avoid yet another slightly risky change in a batch containing a very important gnutls security fix (and also e.g. nss/firefox).

@jonringer (Contributor):

> Yes ok I will look at it tomorrow morning. I am thinking of actually reverting the change to 3.8. There are so many applications also depending on aiohttp that it would mean many people will end up with both 3.7 and 3.8 on their system. cc @jonringer

We could revert the 3.8 bump in staging and push that change to the python-unstable branch if it makes stabilizing staging-next easier.

I can go through and pin all the applications that use aiohttp to python37. But in general I don't think we should put off the python38 bump much longer, it's been available for many months now.
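
Such a pin could look roughly like this in `all-packages.nix` (a
hypothetical sketch; `somePackage` stands in for any aiohttp-dependent
application):

    # Keep one application on Python 3.7 while the default moves to 3.8.
    somePackage = callPackage ../applications/misc/some-package {
      python3Packages = python37Packages;
    };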

@FRidh (Member) commented Jun 12, 2020

Potential Python 3 fix for darwin: #90174.

Let's wait and see with 3.8. I am actually quite confident that we won't get many issues due to aiohttp despite the failing tests.

Also, don't use autoreconfHook on Darwin with Python 3.
Darwin builds are still impure and fail with:

    ld: warning: directory not found for option '-L/nix/store/6yhj9djska835wb6ylg46d2yw9dl0sjb-configd-osx-10.8.5/lib'
    ld: warning: directory not found for option '-L/nix/store/6yhj9djska835wb6ylg46d2yw9dl0sjb-configd-osx-10.8.5/lib'
    ld: warning: object file (/nix/store/0lsij4jl35bnhqhdzla8md6xiswgig5q-Libsystem-osx-10.12.6/lib/crt1.10.6.o) was built for newer OSX version (10.12) than being linked (10.6)
    DYLD_LIBRARY_PATH=/private/tmp/nix-build-python3-3.8.3.drv-0/Python-3.8.3 ./python.exe -E -S -m sysconfig --generate-posix-vars ;\
    if test $? -ne 0 ; then \
            echo "generate-posix-vars failed" ; \
            rm -f ./pybuilddir.txt ; \
            exit 1 ; \
    fi
    /nix/store/dsb7d4dwxk6bzlm845z2zx6wp9a8bqc1-bash-4.4-p23/bin/bash: line 5: 72015 Killed: 9               DYLD_LIBRARY_PATH=/private/tmp/nix-build-python3-3.8.3.drv-0/Python-3.8.3 ./python.exe -E -S -m sysconfig --generate-posix-vars
    generate-posix-vars failed
    make: *** [Makefile:592: pybuilddir.txt] Error 1
@vcunat (Member, Author) commented Jun 12, 2020

An interesting error in blender:

CMake Error at CMakeLists.txt:1517 (message):
  Missing:
  "/nix/store/n0b076p351ma864q38if4yglsg99hw2s-python3-3.8.3/include/python3.8m/Python.h",


  Set the cache entry 'PYTHON_INCLUDE_DIR' to point to a valid python include
  path.  Containing Python.h for python version "3.8"

Since Python 3.8, the `m` ABI suffix is no longer added when pymalloc
is used.
https://bugs.python.org/issue36707
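
The practical upshot: since 3.8 the header directory is
`include/python3.8` rather than `include/python3.8m`, so builds that
hard-code the old path break. A hypothetical workaround for such a
package:

    # Point CMake at the suffix-less include directory;
    # python3.pythonVersion evaluates to e.g. "3.8".
    cmakeFlags = [
      "-DPYTHON_INCLUDE_DIR=${python3}/include/python${python3.pythonVersion}"
    ];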
@vcunat (Member, Author) commented Jun 13, 2020

I see no real blocker so far in this iteration, but there are no Darwin builds yet.

@FRidh (Member) commented Jun 13, 2020

That's good.

Note: for the new iteration I would like to have #89723 in. Only leaf packages are left, and those can be fixed on staging-next and master.

FRidh and others added 2 commits June 13, 2020 11:02
    ld: library not found for -lintl
    clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
    make[3]: *** [Makefile:544: libxsltmod.la] Error 1
and cause no rebuild elsewhere.
In commit 2988780 I forgot that patchelf makes no sense there.
Most of the changes are in already, but we referenced these commit
hashes from 20.03, so let's include them in the history.
@bhipple (Contributor) commented Jun 14, 2020

Found a simple fix while continuing to troubleshoot towards #89746: #90357

@FRidh merged commit ff5f776 into master on Jun 15, 2020
@nh2 (Contributor) commented Jun 15, 2020

With this merged, kombu (and thus e.g. cura) will fail with a version dependency error on amqp: https://hydra.nixos.org/build/122252840#tabs-details

(I noticed it because I accidentally tried to build master.)

@FRidh (Member) commented Jun 15, 2020

Yes, as announced on Discourse, we switched to Python 3.8 as the default, so there will be quite some regressions.

@ksevelyar:

> Yes, as announced on Discourse, we switched to Python 3.8 as the default, so there will be quite some regressions.

Is it possible to override python to 3.7 globally? Both Cura and OctoPrint are broken now.

@jonringer (Contributor):

I'll look into it when I wake up.

But yes, it's possible.
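
A global override can be sketched as an overlay (assuming everything
should move back to 3.7; note that overriding `python3` rebuilds a large
part of the package set):

    # Hypothetical overlay pinning the default python3 back to 3.7.
    self: super: {
      python3 = super.python37;
      python3Packages = super.python37Packages;
    }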
