Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make python interpreters manylinux aware #75757

Closed
wants to merge 2 commits into from

Conversation

gilligan
Copy link
Contributor

@gilligan gilligan commented Dec 16, 2019

Motivation for this change

Following up on #73866 this PR makes python interpreters announce manylinux compatibility when actually necessary (ie when a manylinux package is a build input). The way this works is:

  • manylinux packages set NIX_PYTHON_MANYLINUX in a setupHook
  • Python interepreters have a _manylinux.py file that checks the presence of the environment variable and its value and set the support level accordingly
Things done

Abandoned in favor of #75763

@FRidh @andir and me had a short chat on irc and agreed that the easiest way forward is to drop the _manylinux.py files completely. With the file gone pip will instead assume compatibility by default. It is now up to the user to select an appropriate manylinux package and add it to the respective derivation to ensure that all libraries required by the standard are available.

@gilligan gilligan requested a review from FRidh as a code owner December 16, 2019 12:31
@adisbladis adisbladis self-requested a review December 16, 2019 12:32
@gilligan gilligan changed the base branch from master to staging December 16, 2019 12:39
This adds a _manylinux.py file to the python interpreters which doesn't
simply enable or disable the support for manylinux but instead checks
the presence and value of the environment variable NIX_PYTHON_MANYLINUX.
Depending on the value (1/2010/2014) it sets manylinux1_compatible,
manylinux2010_compatible or manylinux2010_compatible.
This adds a setupHook to all manylinux packages which sets an
environment variable indicating the respectively supported manylinux
level. This environment variable can be used by interpreters to
identify that manylinux dependencies are actually available.
Copy link
Member

@zimbatm zimbatm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand how this works so I can only provide with a superficial review.

I noticed the same block of code is copied 3 times. Are there any other places in nixpkgs where it might be used again?

@@ -234,8 +242,7 @@ in with passthru; stdenv.mkDerivation ({
ln -s $out/lib/${libPrefix}/pdb.py $out/bin/pdb${sourceVersion.major}.${sourceVersion.minor}
ln -s $out/share/man/man1/{python2.7.1.gz,python.1.gz}

# Python on Nix is not manylinux1 compatible. https://github.com/NixOS/nixpkgs/issues/18484
echo "manylinux1_compatible=False" >> $out/lib/${libPrefix}/_manylinux.py
ln -s ${manyLinux} $out/lib/${libPrefix}/_manylinux.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ln -s ${manyLinux} $out/lib/${libPrefix}/_manylinux.py
ln -s ${./_manylinux.py} $out/lib/${libPrefix}/_manylinux.py

Since the file has no dynamic components, it could be copied into nixpkgs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that duplication is an unfortunate result of the duplicated python build expressions. I have the feeling that this duplication is still okay. If there would be yet another we should think of a solution. Also newer manylinux specifications might not be valid for python2.

Regardless of that I think we should use cp instead of ln since otherwise we create a depenency on a single file output for everything that depends on python while it might as well be copied in there.

@FRidh
Copy link
Member

FRidh commented Dec 16, 2019

@zimbatm there are some parts in the various interpreters that can be factored out. These parts have grown over time so now it makes sense. That's out of scope of this PR though.

I think it is hard to decide on whether we should protect our users (by writing a _manylinux.py with False) or not. Environment variables give freedom, but they always leak. If we claim compatibility, all that happens is that when users use a manylinux wheel, it will fail during runtime. When building wheels there is no issue actually; in order to have them manylinux compatible you use the auditwheel utility which will notice it is not compatible and fail as it should.

All in all, I am leaning now more to removing _manylinux.py again. It makes sense when only dealing with pure builds (but we don't really need the info then either), but with impure it's harder, and at that point users are on their own anyway, so let's not hinder them.

@andir
Copy link
Member

andir commented Dec 16, 2019

@zimbatm there are some parts in the various interpreters that can be factored out. These parts have grown over time so now it makes sense. That's out of scope of this PR though.

I think it is hard to decide on whether we should protect our users (by writing a _manylinux.py with False) or not. Environment variables give freedom, but they always leak.
If we claim compatibility, all that happens is that when users use a manylinux wheel, it will fail during runtime. When building wheels there is no issue actually; in order to have them manylinux compatible you use the auditwheel utility which will notice it is not compatible and fail as it should.

What am I missing here?

I've seen a few build-time errors (due to missing many linux deps) while we were working on this. Since you have to patch the binaries (for example with the autoPatchElfHook) it will be pretty obvious if there are some library dependencies that were not provided. And of course you only will know if that works if you try to run the application. That is no different from current python packaging. You always have to execute the code and/or run tests to see if it works or just builds. (As with all/most other programming languages…)

An alternative approach would probably to provide a tailor made _manylinux.py with the manyLinuxPackages that pip then picks up during installation to figure out if manylinux is supported. I am not sure how much effort that would require and if it could work without patching pip (expressions). Idea?

@adisbladis
Copy link
Member

adisbladis commented Dec 16, 2019

All in all, I am leaning now more to removing _manylinux.py again. It makes sense when only dealing with pure builds, but with impure it's harder, and at that point users are on their own anyway, so let's not hinder them.

👍 This approach will simplify things so much.

@FRidh
Copy link
Member

FRidh commented Dec 16, 2019

it will be pretty obvious if there are some library dependencies that were not provided.

When you don't claim manylinux compatibility, it won't even try to use those wheels and build from source instead. Thus the failure happens during build-time instead of run-time.

An alternative approach would probably to provide a tailor made _manylinux.py with the manyLinuxPackages that pip then picks up during installation to figure out if manylinux is supported. I am not sure how much effort that would require and if it could work without patching pip (expressions). Idea?

The essential part to achieve that is in this PR.

@gilligan
Copy link
Contributor Author

If we claim compatibility, all that happens is that when users use a manylinux wheel, it will fail during runtime

In our experience that isn't actually true. What will happen is that autoPatchElfHook is going to fail and your entire build will fail.

All in all, I am leaning now more to removing _manylinux.py again. It makes sense when only dealing with pure builds (but we don't really need the info then either), but with impure it's harder, and at that point users are on their own anyway, so let's not hinder them.

I want to provide users the ability to make use of wheels. There are packages that only provide wheel files and no source at all. Is using source packages the preferred option from our point of view? Sure, but that doesn't mean we cannot or should not provide users with the ability to make use of the manylinux feature. If a wheel is built correctly according to the spec defined in the PEP it will work.

This change doesn't make python use wheel files in some sneaky fashion. The use of environment variables sure isn't air-tight but it doesn't seem likely to me that this would lead to random accidental misbehavior and runtime errors unless users explicitly set out to use some manylinux stuff that ends up failing.

@FRidh
Copy link
Member

FRidh commented Dec 16, 2019

In our experience that isn't actually true. What will happen is that autoPatchElfHook is going to fail and your entire build will fail.

If you use it in a nix-build, yes. I was talking about impurely using wheels.

So, I propose we:

  • remove _manylinux.py
  • have separate derivations for each spec that propagate the libs and autoPatchelfHook. These can be manylinux1, manylinux2010 and manylinux2014. The lists could then be put in a let...in.

Is there anything that is not considered?

@gilligan
Copy link
Contributor Author

@FRidh I'm being told via side-channels that i am missing your point about what you were suggesting, but your last comment (which i saw just now) makes it a bit clearer.

@andir andir closed this Dec 16, 2019
@andir andir deleted the manylinux-interp branch December 16, 2019 15:27
@andir
Copy link
Member

andir commented Dec 16, 2019

Lets continue this in #75763

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants