Comparing changes

base repository: NixOS/nixpkgs
base: d2d070015b6d
...
head repository: NixOS/nixpkgs
compare: a2be64bf13bf
  • 5 commits
  • 5 files changed
  • 2 contributors

Commits on Mar 31, 2020

  1. cpython: Drop unrecognized --with-threads configure flag.

    The ./configure script prints a warning when passed this flag,
    starting with 3.7:
    
      configure: WARNING: unrecognized options: --with-threads
    
    The reason is that there's no longer such a thing as a build
    without threads.
    
    Eliminate the warning by passing the flag only on the older
    releases that accept it.
    
    Upstream change and discussion:
      python/cpython@a6a4dc816
      https://bugs.python.org/issue31370
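    
    A minimal sketch of the version gate described above, written in
    Python purely for illustration (the actual change is made in the Nix
    expression for CPython).  The helper name `configure_flags` and the
    tuple-based version are hypothetical; only the 3.7 cutoff is taken
    from the commit message.

```python
def configure_flags(version):
    """Return extra ./configure flags for a CPython version tuple, e.g. (3, 6).

    Hypothetical helper: only the 3.7 cutoff comes from the commit message.
    """
    flags = []
    # Starting with 3.7 the flag is unrecognized and triggers a warning,
    # so pass it only to the older releases that still accept it.
    if version < (3, 7):
        flags.append("--with-threads")
    return flags


if __name__ == "__main__":
    for v in [(2, 7), (3, 6), (3, 7), (3, 8)]:
        print(v, configure_flags(v))
```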
    gnprice committed Mar 31, 2020
    9d8831c

Commits on May 12, 2020

  1. cpython: Use --enable-optimizations, for a 16% speedup.

    Without this flag, the configure script prints a warning at the end,
    like this (reformatted):
    
      If you want a release build with all stable optimizations active
      (PGO, etc), please run ./configure --enable-optimizations
    
    We're doing a build to distribute to people for day-to-day use,
    doing things other than developing the Python interpreter.  So
    that's certainly a release build -- we're the target audience for
    this recommendation.
    
    ---
    
    And, trying it out, upstream isn't kidding!  I ran the standard
    benchmark suite that the CPython developers use for performance
    work, "pyperformance".  Following its usage instructions:
      https://pyperformance.readthedocs.io/usage.html
    I ran the whole suite, like so:
    
      $ nix-shell -p ./result."$variant" --run '
          cd $(mktemp -d); python -m venv venv; . venv/bin/activate
          pip install pyperformance
          pyperformance run -o ~/tmp/result.'"$variant"'.json
        '
    
    and then examined the results with commands like:
    
      $ python -m pyperf compare_to --table -G \
          ~/tmp/result.{$before,$after}.json
    
    Across all the benchmarks in the suite, the median speedup was 16%.
    (That is, 1.16x as fast, or 14% less time.)
    
    The middle half of them ranged from a 13% to a 22% speedup.
    
    Each of the 60 benchmarks in the suite got faster, by speedups
    ranging from 3% to 53%.
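    
    The relation between "N% speedup" and "less time" used above can be
    checked with a few lines of Python.  This is only a sketch of the
    arithmetic; the function name `time_saved` is made up for this note.

```python
def time_saved(speedup):
    """Fraction of run time saved when code runs `speedup` times as fast."""
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Speedup percentages quoted in the commit message.
    for pct in (3, 13, 16, 22, 53):
        ratio = 1 + pct / 100
        print(f"{pct}% speedup -> {time_saved(ratio) * 100:.0f}% less time")
```

    For the median, a 1.16x speedup works out to roughly 14% less time,
    matching the figure quoted above.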
    
    ---
    
    One reason this isn't just the default to begin with is that, until
    recently, it made the build a lot slower.  What it does is turn on
    profile-guided optimization, which means first build for profiling,
    then run some task to get a profile, then build again using the
    profile.  And, short of further customization, the task it would use
    would be nearly the full test suite, which includes a lot of
    expensive and slow tests, and can easily take half an hour to run.
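    
    To make the three stages concrete, here is a rough sketch for a
    hypothetical single-file C program, assuming GCC's
    -fprofile-generate and -fprofile-use flags.  The helper name
    `pgo_commands` and the file names are invented for illustration;
    this is not how the CPython build itself is invoked.

```python
def pgo_commands(source, binary, training_args):
    """Return the three stages of a PGO build as argv lists.

    Hypothetical helper for a single-file C program built with GCC.
    """
    return [
        # 1. Build an instrumented binary that records an execution profile.
        ["gcc", "-O2", "-fprofile-generate", source, "-o", binary],
        # 2. Run a representative task to collect the profile data.
        ["./" + binary, *training_args],
        # 3. Build again, letting the compiler use the recorded profile.
        ["gcc", "-O2", "-fprofile-use", source, "-o", binary],
    ]


if __name__ == "__main__":
    for argv in pgo_commands("app.c", "app", ["--bench"]):
        print(" ".join(argv))
```

    The cost of stage 2 is what the choice of profiling task controls:
    the longer the task, the slower the whole build.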
    
    Happily, in 2019 an upstream developer did the work to carefully
    select a more appropriate set of tests to use for the profile:
      python/cpython@4e16a4a31
      https://bugs.python.org/issue36044
    This suite takes just 2 minutes to run.  And the resulting final
    build is actually slightly faster than with the much longer suite,
    at least as measured by those standard "pyperformance" benchmarks.
    That work went into the 3.8 release, but the same list works great
    if used on older releases too.
    
    So, start passing that --enable-optimizations flag; and backport
    that good-for-PGO set of tests, so that we use it on all releases.
    gnprice committed May 12, 2020
    f8a8243

Commits on May 14, 2020

  1. cpython: Use autoreconfHook to rebuild configure script.

    In particular this will let us use patches that apply to configure.ac.
    gnprice committed May 14, 2020
    52c04b0
  2. cpython: Optimize dynamic symbol tables, for a 6% speedup.

    I took a close look at how Debian builds the Python interpreter,
    because I noticed it ran substantially faster than the one in nixpkgs
    and I was curious why.
    
    One thing that I found made a material difference in performance was
    this pair of linker flags (passed to the compiler):
    
        -Wl,-O1 -Wl,-Bsymbolic-functions
    
    In other words, effectively the linker gets passed the flags:
    
        -O1 -Bsymbolic-functions
    
    Doing the same thing in nixpkgs turns out to make the interpreter
    run about 6% faster, which is quite a big win for such an easy
    change.  So, let's apply it.
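    
    The `-Wl,` forwarding mentioned above can be sketched in Python:
    GCC-style compiler drivers strip the `-Wl,` prefix and split the rest
    at commas before handing it to the linker.  The function name
    `linker_args` is hypothetical and the sketch ignores every other way
    options can reach the linker.

```python
def linker_args(compiler_args):
    """Extract the options a GCC-style driver forwards to the linker.

    Each `-Wl,a,b` argument is split at the commas and passed on as `a b`.
    """
    forwarded = []
    for arg in compiler_args:
        if arg.startswith("-Wl,"):
            forwarded.extend(arg[len("-Wl,"):].split(","))
    return forwarded


if __name__ == "__main__":
    print(linker_args(["-O2", "-Wl,-O1", "-Wl,-Bsymbolic-functions"]))
```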
    
    ---
    
    I had not known there was a `-O1` flag for the *linker*!
    But indeed there is.
    
    These flags are unrelated to "link-time optimization" (LTO), despite
    the latter's name.  LTO means doing classic compiler optimizations
    on the actual code, at the linking step when it becomes possible to
    do them with cross-object-file information.  These two flags, by
    contrast, cause the linker to make certain optimizations within the
    scope of its job as the linker.
    
    Documentation is here, though sparse:
      https://sourceware.org/binutils/docs-2.31/ld/Options.html
    
    The meaning of -O1 was explained in more detail in this LWN article:
      https://lwn.net/Articles/192624/
    Apparently it makes the resulting symbol table use a bigger hash
    table, so the load factor is smaller and lookups are faster.  Cool.
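    
    The load-factor effect can be illustrated with a toy chained hash
    table in Python.  This is a simplified model, not the ELF format:
    `zlib.crc32` stands in for the real symbol hash function, and the
    symbol names and bucket counts are made up.

```python
import zlib


def average_probes(symbols, n_buckets):
    """Average entries examined to find each symbol in a chained hash table."""
    buckets = [[] for _ in range(n_buckets)]
    for name in symbols:
        # crc32 is a stand-in for the real ELF hash function (an assumption).
        buckets[zlib.crc32(name.encode()) % n_buckets].append(name)
    total = 0
    for chain in buckets:
        # Finding the k-th entry of a chain costs k comparisons.
        total += sum(range(1, len(chain) + 1))
    return total / len(symbols)


if __name__ == "__main__":
    names = [f"PyExample_func{i}" for i in range(2000)]  # hypothetical symbols
    for n in (256, 2048, 16384):
        print(n, "buckets:", round(average_probes(names, n), 2), "probes")
```

    With the same set of symbols, more buckets means shorter chains and
    fewer comparisons per lookup.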
    
    As for -Bsymbolic-functions, the documentation indicates that it's a
    way of saving lookups through the symbol table entirely.  There can
    apparently be situations where it changes the behavior of a program,
    specifically if the program relies on linker tricks to provide
    customization features:
      https://bugs.launchpad.net/ubuntu/+source/xfe/+bug/644645
      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637184#35
    But I'm pretty sure CPython doesn't permit that kind of trick: you
    don't load a shared object that tries to redefine some symbol found
    in the interpreter core.
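    
    The kind of behavior change at issue can be shown with a toy model
    of symbol resolution in Python.  It is a deliberate simplification
    of what the dynamic linker does; the object names, the symbol names,
    and the `resolve` helper are all hypothetical.

```python
def resolve(symbol, caller, search_order, symbolic_functions=False):
    """Return the name of the object whose definition of `symbol` is used.

    `search_order` is a list of (object_name, defined_symbols) pairs in the
    order the dynamic linker would search them; `caller` is the object
    making the call.
    """
    definitions = dict(search_order)
    # With -Bsymbolic-functions, a library's calls to its own functions are
    # bound at link time and never consult the global search order.
    if symbolic_functions and symbol in definitions[caller]:
        return caller
    # Otherwise the first definition in search order wins (interposition).
    for name, symbols in search_order:
        if symbol in symbols:
            return name
    raise LookupError(symbol)


if __name__ == "__main__":
    order = [("override.so", {"helper"}), ("libapp.so", {"helper", "run"})]
    print(resolve("helper", "libapp.so", order))
    print(resolve("helper", "libapp.so", order, symbolic_functions=True))
```

    In this model a program that counts on override.so replacing
    libapp.so's own `helper` stops working once libapp.so is linked
    with the flag.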
    
    The stronger reason I'm confident using -Bsymbolic-functions is
    safe, though, is empirical.  Both Debian and Ubuntu have been
    shipping a Python built this way since forever -- it was introduced
    for Python 2.4 and 2.5 in Ubuntu "hardy" and Debian "lenny", released
    in 2008 and 2009 respectively.  In those 12 years they haven't seen
    a need to drop this flag; and I've been unable to locate any reports
    of trouble related to it, either on the Web in general or on the
    Debian bug tracker.  (There are reports of a handful of other
    programs breaking with it, but not Python/CPython.)  So that seems
    like about as thorough testing as one could hope for.
    
    ---
    
    As for the performance impact: I ran CPython upstream's preferred
    benchmark suite, "pyperformance", in the same way as described in
    the previous commit.  On top of that commit's change, the results
    across the 60 benchmarks in the suite are:
    
    The median is 6% faster.
    
    The middle half (aka interquartile range) is from 4% to 8% faster.
    
    Out of 60 benchmarks, 3 come out slower, by 1-4%.  At the other end,
    5 are at least 10% faster, and one is 17% faster.
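    
    Summary figures of this kind (median, interquartile range, counts of
    slower and much-faster benchmarks) can be computed from per-benchmark
    speedup ratios along these lines.  The `summarize` helper and the
    sample ratios are hypothetical, not the measured results; the sketch
    assumes Python 3.8+ for `statistics.quantiles`.

```python
import statistics


def summarize(speedups):
    """Summarize per-benchmark speedup ratios (values above 1.0 are faster)."""
    # quantiles(n=4) returns the three quartile cut points (Python 3.8+).
    q1, _, q3 = statistics.quantiles(speedups, n=4)
    return {
        "median": statistics.median(speedups),
        "interquartile_range": (q1, q3),
        "slower": sum(1 for s in speedups if s < 1.0),
        "at_least_10pct_faster": sum(1 for s in speedups if s >= 1.10),
    }


if __name__ == "__main__":
    sample = [0.97, 1.04, 1.06, 1.08, 1.17]  # made-up ratios, not real data
    print(summarize(sample))
```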
    
    So, that's quite a material speedup!  I don't know how big the
    effect of these flags is for other software; but certainly CPython
    tends to do plenty of dynamic linking, as that's how it loads
    extension modules, which are ubiquitous in the stdlib as well as
    popular third-party libraries.  So perhaps that helps explain why
    optimizing dynamic symbol lookup has such an impact.
    gnprice committed May 14, 2020
    480c8d1

Commits on Jun 4, 2020

  1. Merge pull request #84072 from gnprice/python-build

    cpython: Use optimizations, for a 25% speedup.
    FRidh committed Jun 4, 2020
    a2be64b