Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: add sitecustomize.py, listen to NIX_PYTHONPATH and NIX_PYTHON_SCRIPT_NAME #25985

Closed
wants to merge 2 commits into from

Conversation

FRidh
Copy link
Member

@FRidh FRidh commented May 22, 2017

This commit adds a Nix-specific module that recursively adds paths that
are on NIX_PYTHONPATH to sys.path. In order to process possible
.pth files site.addsitedir is used.

The paths listed in PYTHONPATH are added to sys.path afterwards, but
they will be added before the entries we add here and thus take
precedence.

The reason for adding support for this environment variable is that we
can set it in a wrapper without breaking support for PYTHONPATH.

Additionally, this commit introduces NIX_PYTHON_SCRIPT_NAME which
can be used for setting sys.argv[0].

  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@mention-bot
Copy link

@FRidh, thanks for your PR! By analyzing the history of the files in this pull request, we identified @domenkozar, @vcunat, @edolstra and @FRidh to be potential reviewers.

@FRidh FRidh requested review from Mic92, lsix and abbradar May 22, 2017 09:08
@abbradar
Copy link
Member

What's the usecase? I'm afraid if this would lead to #17749.

@bjornfor
Copy link
Contributor

How about extending it to NIX_PYTHON${MAJOR}${MINOR}PATH?

I'm looking for something to improve this config snippet:

  environment.profileRelativeEnvVars.PYTHONPATH = [ "/lib/python2.7/site-packages" ];

@FRidh
Copy link
Member Author

FRidh commented May 22, 2017

What's the usecase? I'm afraid if this would lead to #17749.

In the PR you referred to you initially proposed using set PYTHONPATH. I was against that since that would remove the possibility to use PYTHONPATH as a user, it would have broken importing modules with ipython and nix-shell, and it would have broken the checkPhase. If we, instead, use NIX_PYTHONPATH in Nixpkgs, then users can still use PYTHONPATH.

How about extending it to NIX_PYTHON${MAJOR}${MINOR}PATH?

I've thought of that before, but I don't think it is much of an improvement. It works, as long as we assume that all site-packages of a certain Python interpreter version should also be used together. This may not be the case. Consider e.g. if we were still using --prefix PYTHONPATH. In this case that could be NIX_PYTHONPATH35PATH. Now, we have a Python 3.5 application calling another Python 3.5 application, but both use different versions of the same library. Having versioned PYTHONPATH wouldn't solve this issue.

Instead, I would like to add providesPythonModules = pythonDerivation; Nix attribute to derivations that provide Python modules for that specific Python derivation. We could then use Nix to create a PYTHONPATH instead of our current setupHook (although we likely would still have to add $out ourselves.

@abbradar
Copy link
Member

@FRidh Indeed! An idea: we could also move our argv[0] handling to this module.

I'll review this properly a bit later today.

@FRidh
Copy link
Member Author

FRidh commented May 22, 2017

An idea: we could also move our argv[0] handling to this module.

That was also my idea! 😄 Unfortunately it is not possible though, because at that time sys.argv isn't available/setup yet.

@FRidh
Copy link
Member Author

FRidh commented May 22, 2017

ohh, but it is possible to use sys.path_hooks with sitecustomize.py...
https://mail.python.org/pipermail/python-list/2016-June/710183.html

Copy link
Member

@abbradar abbradar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite intrigued by this -- can you try using sys.path_hooks? If done correctly we can finally completely avoid patching Python programs. Imagine -- no more magicalSedExpression! :D

@FRidh
Copy link
Member Author

FRidh commented May 23, 2017

@abbradar I checked out sys.path_hook. Unfortunately that's not going to work either.

See the following code

import sys

def argv_setter_hook(path):
    import imp
    sys = imp.reload(sys)
    sys.argv[0] = os.environ.get('NIX_PROGRAM_NAME', sys.argv[0])
    raise ImportError # let the real import machinery do its work

sys.path_hooks[:0] = [argv_setter_hook]

Because sys doesn't yet have sys.argv we need to reload it in the hook. However, when we try to reload it we get an infinite recursion, I think because the hook is called whenever a module is imported.

Moving import imp and import sys entirely out of the hook results in a UnboundLocalError: local variable 'sys' referenced before assignment, which is quite funny because considering scope, sys would normally have to be available inside the hook. I guess Python still acts a bit different at this point 😄

Therefore, if imports outside the hook are not considered, then I might as well just import sys there again. Unfortunately, I still got the same version of sys which doesn't yet have argv. So, I tried again using imp.reload

def argv_setter_hook(path):
    import sys
    import imp
    sys = imp.reload(sys)
    sys.argv[0] = os.environ.get('NIX_PROGRAM_NAME', sys.argv[0])
    raise ImportError # let the real import machinery do its work

sys.path_hooks[:0] = [argv_setter_hook]

but that resulted in a stack overflow.

It seems we'll be stuck with our magicalSedExpression for a bit longer.

@abbradar
Copy link
Member

@FRidh I succeeded doing this with the code provided in the mailing list;

import sys

edit_done = False
def argv_setter_hook(path):
    global edit_done
    if edit_done:
        return
    try:
        argv = sys.argv
    except AttributeError:
        pass
    else:
        print("Success!")
        argv.append("/foo/bar")
        edit_done = True
    raise ImportError # let the real import machinery do its work

sys.path_hooks[:0] = [argv_setter_hook]

I placed this into ~/.local/lib/python3.5/site-packages and voila:

 ~  python3 
Python 3.5.3 (default, Jan 17 2017, 07:57:56) 
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Success!
>>> import sys
>>> sys.argv
['', '/foo/bar']
>>> 

Maybe when it's called from home directory sys.argv is already set up somehow?...

@FRidh
Copy link
Member Author

FRidh commented May 23, 2017

@abbradar interesting. I didn't try the user site-packages, but instead modified the global sitecustomize.py.

So having the following in the user or system sitecustomize.py works:

import sys
import os

def argv_setter_hook(path):
    try:
        argv = sys.argv
    except AttributeError:
        pass
    else:
        argv[0] = os.environ.get('NIX_PROGRAM_NAME', argv[0])
    raise ImportError # let the real import machinery do its work

sys.path_hooks[:0] = [argv_setter_hook]

Putting this together with the code from 4ee6ee37600c61df53aa7c51458ebdd13d40f3c7 doesn't work. However, if I remove 4ee6ee37600c61df53aa7c51458ebdd13d40f3c7 then this does work.

@abbradar
Copy link
Member

Yay, success!

import sys

def argv_setter_hook(path):
    if hasattr(sys, 'argv'):
        # Remove this hook
        sys.path_hooks.remove(argv_setter_hook) 

        import os
        import functools
        import site

        sys.argv[0] = os.environ.get('NIX_PROGRAM_NAME', sys.argv[0])
        paths = os.environ.get('NIX_PYTHONPATH', '').split(':')
        functools.reduce(lambda k, p: site.addsitedir(p, k), paths, site._init_pathinfo())

    raise ImportError # Let the real import machinery do its work

sys.path_hooks[:0] = [argv_setter_hook]

Notice how infinite recursion is avoided by removing the hook after argv is found.

@FRidh
Copy link
Member Author

FRidh commented May 23, 2017

Yay indeed! This is neat.

The question is now how we look after these two env variables regarding leaking.

  • if the script calls itself, then it will now refer to the correct script and would therefore no longer need these env vars. We could therefore clean them.
  • if the script calls another script that is in libexec, then it would likely want NIX_PYTHONPATH but not NIX_PROGRAM_NAME because its value is incorrect. An example is hplip. However, if they call themselves, they would have to be wrapped as well. This remains tricky when the code is both a script and an importable module.
  • if the script calls another script from another derivation, then we shouldn't leak any of the env vars. An example is a Python 3 tool like Nuitka calling the Python 2-only SCons.

@abbradar
Copy link
Member

@FRidh Hmmm, indeed. I don't see any better way than patching Python modules in the case scripts reimport themselves (like before). We definitely don't want to leak both of those variables to child processes.

@abbradar
Copy link
Member

abbradar commented May 23, 2017

@FRidh I feel stupid: actually if Python script reimports itself then all paths and fixed sys.argv[0] are already in place since it's the same Python process! And if it restarts itself as a process hopefully it uses its sys.argv[0] for that. So we have all cases covered apart from rare ones when a script would try to rerun itself using __file__ or something.

@FRidh
Copy link
Member Author

FRidh commented May 24, 2017

I've added two more commits

  1. one where wrapPythonPrograms only wraps items in $out/bin. Sometimes we have scripts elsewhere, which might break due to this change. Then again, sometimes we have scripts that also work as modules. I'm not entirely sure about this change.
  2. use the new NIX_PYTHONPATH in our builder

@abbradar when you say covered, do you mean our current implementation, or what we've been discussing here?

@abbradar
Copy link
Member

@FRidh I mean what we have discussed. Let's again consider different cases when we use NIX_PROGRAM_NAME and unset both it and NIX_PYTHONPATH after reading to avoid env polution:

  1. scipt calls itself via sys.argv[0] -- covered as environment variables would be set again by wrappers;
  2. script reimports itself in Python (but not from /bin) -- covered as sys.argv[0] and sys.path will already be modified;
  3. script runs another one in /libexec -- we need to wrap script in libexec but then it should be okay;
  4. script imports another one in /libexec -- same as (2);
  5. script sometimes runs something from the same path and sometimes imports it -- ...why would one do that? Either way we'll need patching then.

So it seems that we can move to NIX_PROGRAM_NAME and finally drop sed-patching (it may help with (5) so we could leave the magicalSedExpression for those cases if there are any). I could certainly miss some case though, the world of Python is a world of wonders :D

@FRidh
Copy link
Member Author

FRidh commented May 24, 2017

  1. script sometimes runs something from the same path and sometimes imports it -- ...why would one do that? Either way we'll need patching then.

Yea, that's the really messy case. Luckily you won't find these in /bin though.

But now the case where I think this might break:

  1. a wrapped script calls another, unwrapped, script. In this case the unwrapped script will update its sys.argv[0] to the NIX_PROGRAM_NAME that was included for its parent process. How do we prevent that from happening? The parent therefore has to unset NIX_PROGRAM_NAME.

Unsetting is supported, we just need to test whether it works for us.

If the platform supports the unsetenv() function, you can delete items in this mapping to unset environment variables. unsetenv() will be called automatically when an item is deleted from os.environ, and when one of the pop() or clear() methods is called.

https://docs.python.org/2.7/library/os.html#os.environ
https://docs.python.org/3.5/library/os.html#os.environ

@abbradar
Copy link
Member

abbradar commented May 25, 2017

@FRidh I think I have possibly found an actual case for (5): meson. See 0ee8160. sigh

(That's not strictly (5), it's just that sometimes meson is run by user and sometimes as python ${meson}/bin/meson IIUC)

@abbradar
Copy link
Member

I have a new idea on how can we make this work: what if we used custom Python wrappers? Like this:

  1. For a given derivation, we compute full NIX_PYTHONPATH like in your latest patchset;
  2. We then do makeWrapper ${python}/bin/python $out/python-wrapper --set NIX_PYTHONPATH $NIX_PYTHONPATH;
  3. We patch all Python scripts' shebangs to point to this interpreter.

Advantages:

  • Covers cases when you run scripts directly and when you import them;
  • No patching;

Disadvantages:

  • Doesn't cover a new case that I mentioned before -- when someone runs scripts by passing them to Python, like in Meson case.

Not sure how this could be fixed so it seems we are at least partially stuck with sed patching...Any idea on how to improve upon this?

@FRidh
Copy link
Member Author

FRidh commented May 28, 2017

@abbradar nice idea, but I think you're missing a couple of cases here.

  • sometimes we have to add a wrapper to extend e.g. PATH, thereby changing the sys.argv[0] of the script. This needs to be done for scripts in both applications and in libraries, and will thereby require either patching with magicalSedExpression or NIX_PROGRAM_NAME. Then again, when is it we really need this. Instead of extending PATH we could require patching the code to point to a program.
  • how will this work with a python.buildEnv? We set PYTHONHOME but again, sys.argv[0] is wrong.

A nice test for the latter is ipython.

$ nix-shell -p 'python3.withPackages(ps: with ps; [ipython])' --pure --run "ipython -c 'import sys; print(sys.argv[0])'"
/nix/store/h1cs1hp2amswdvidl99n7vhmb3lv1v0h-python3.5-ipython-6.0.0/bin/ipython

shows that sys.argv[0] points to the ipython derivation instead of the env.

In any case, I think the Python wrapper you suggest here we could have also as a "lightweight" alternative to the current python.buildEnv.

About NIX_PROGRAM_NAME. Maybe we should call it NIX_PYTHON_SCRIPT_NAME for now. To somewhat prevent leaking, we could match the current sys.argv[0] to NIX_PYTHON_SCRIPT_NAME, requiring that at least a part of the name matches before actually setting sys.argv[0].

@FRidh FRidh requested a review from ryantm as a code owner December 30, 2018 16:17
@FRidh FRidh changed the title Python: add sitecustomize.py, listen to NIX_PYTHONPATH Python: add sitecustomize.py, listen to NIX_PYTHONPATH and NIX_PYTHON_SCRIPT_NAME Dec 30, 2018
@FRidh FRidh requested a review from thoughtpolice as a code owner July 9, 2019 13:45
FRidh added 2 commits July 9, 2019 15:47
…_SCRIPT_NAME

This commit adds a Nix-specific module that recursively adds paths that
are on `NIX_PYTHONPATH` to `sys.path`. In order to process possible
`.pth` files `site.addsitedir` is used.

The paths listed in `PYTHONPATH` are added to `sys.path` afterwards, but
they will be added before the entries we add here and thus take
precedence.

The reason for adding support for this environment variable is that we
can set it in a wrapper without breaking support for `PYTHONPATH`.

Additionally, this commit introduces `NIX_PYTHON_SCRIPT_NAME` which
can be used for setting `sys.argv[0]`.
@FRidh
Copy link
Member Author

FRidh commented Jul 11, 2019

The NIX_PYTHONPATH part is now in #64634.

@stale
Copy link

stale bot commented Jun 1, 2020

Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the
    related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse. 3. Ask on the #nixos channel on
    irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 1, 2020
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 3, 2020
@dotlambda dotlambda mentioned this pull request Mar 20, 2021
10 tasks
@stale
Copy link

stale bot commented Jun 9, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 9, 2021
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label May 4, 2022
@lucasew
Copy link
Contributor

lucasew commented Aug 24, 2022

Closing as it is basically abandoned, or solved by another PR.

@lucasew lucasew closed this Aug 24, 2022
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/terminal-emulator-leaks-environment-variables-to-shell/33673/11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet