Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Python library wrappers #53816

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Conversation

timokau
Copy link
Member

@timokau timokau commented Jan 11, 2019

Motivation for this change

Some python libraries require specific environment variables to be set. One example are those that use gtk libraries. On most other distros, those environment variables would either be set in the global environment or the dependencies would be found in standard paths. Since that is not the case for nixos, we need to hardcode those environment variables into the libraries.

We have makeWrapper for binaries. Here I try (purely as a proof of concept, kept as simple as possible) to add a makePythonWrapper function that does the same for python libraries. In particular, this would make the gtk3 matplotlib backend usable without using nix-shell and without manually setting environment variables.

python3 -c 'import matplotlib.pyplot as p; p.switch_backend("gtk3agg")'

Should now succeed. This works because python always executes all parent modules before importing a module. So by adding some code to the top of the topmost module (taking care to leave module docstrings in tact), we can set environment variables as soon as a module is imported.

Some next steps/improvements would be:

  • factor out makePythonWrapper
  • provide the full functionality makeWrapper provides
  • automatically determine the right __init__.py to modify
  • provide a GTK setup hook or maybe adapt the existing one

What do you think?

CC @jtojnar @FRidh

(This is not about enabling the gtk backend in matplotlib by default, that is just for easier testing)

@FRidh
Copy link
Member

FRidh commented Jan 12, 2019

Note it's not a wrapper, but a function that injects code.

wrapPythonPrograms both wraps and injects code in executables. That functionality should be separated. Should we, then, wrap, or inject code? Injecting code is not easy to do correctly (see e.g. the magicalSedExpression, we need to consider shebangs and __future__ imports that occur before the docstring) but is in-line with non-Python code.

I think the direction we should go is that packages declare their requirements on the environment and what they offer. E.g., in this case, matplotlib could declare it requires GI_TYPELIB_PATH, and gtk3 declares it provides GI types. We then have e.g. a buildEnv "on steroids" that knows how to handle this. The buildEnv would go through a list of functions, a function per requirement, thereby building up the wrappers or code that needs to be injected. These functions could also be used directly when constructing a derivation.

@timokau
Copy link
Member Author

timokau commented Jan 12, 2019

Note it's not a wrapper, but a function that injects code.

Yeah I'm playing a bit loose with the terminology here. Although it may not be accurate to the letter, I think it communicates the intention and is in line with the naming of similar functions in nixpkgs.

wrapPythonPrograms both wraps and injects code in executables. That functionality should be separated. Should we, then, wrap, or inject code? Injecting code is not easy to do correctly (see e.g. the magicalSedExpression, we need to consider shebangs and __future__ imports that occur before the docstring) but is in-line with non-Python code.

Is wrapping any easier? How does one wrap a library module without changing the way it is imported and without messing with automated tooling detecting docs etc.?

I forgot about shebangs. How do those interact with module docstrings? The python docs state that they have to occur first in the file, are shebangs exempt from this?

Why do __future__ imports have to be handled differently?

I think the direction we should go is that packages declare their requirements on the environment and what they offer. E.g., in this case, matplotlib could declare it requires GI_TYPELIB_PATH, and gtk3 declares it provides GI types. We then have e.g. a buildEnv "on steroids" that knows how to handle this. The buildEnv would go through a list of functions, a function per requirement, thereby building up the wrappers or code that needs to be injected. These functions could also be used directly when constructing a derivation.

I agree, I was thinking in a similar direction. This would play nice with your "declarative wrapper" idea.

@FRidh
Copy link
Member

FRidh commented Jan 13, 2019

Is wrapping any easier?

Yes and no. In principle the utility we have for it makes it easy, however, before sys.argv[0] may change we can cause trouble (wrapper solution for this #25985).

How does one wrap a library module without changing the way it is imported and without messing with automated tooling detecting docs etc.?

A library can't be wrapped, we can only inject code in it. That's why we patch references to executables to make them absolute paths.

Why do __future__ imports have to be handled differently?

From the Python docs:

A future statement must appear near the top of the module. The only lines that can appear before a future statement are:

the module docstring (if any),
comments,
blank lines, and
other future statements.

The items are thus in the following order:

  1. shebang
  2. __future__ imports or docstrings
  3. other imports

When injecting code, it needs to be done after a __future__ import, but before other imports, and ignoring docstrings. In your code you use a certain docstring delimiter, but there are multiple ones. And we have to consider they can be nested as well.

@timokau
Copy link
Member Author

timokau commented Jan 13, 2019

How does one wrap a library module without changing the way it is imported and without messing with automated tooling detecting docs etc.?

A library can't be wrapped, we can only inject code in it. That's why we patch references to executables to make them absolute paths.

For the purposes of this PR only working for executables is kind of a non-starter.

Why do __future__ imports have to be handled differently?

From the Python docs:

A future statement must appear near the top of the module. The only lines that can appear before a future statement are:
the module docstring (if any),
comments,
blank lines, and
other future statements.

The items are thus in the following order:

1. shebang

2. `__future__` imports or docstrings

3. other imports

Oh I didn't know that. Though those were just normal imports.

When injecting code, it needs to be done after a __future__ import, but before other imports, and ignoring docstrings. In your code you use a certain docstring delimiter, but there are multiple ones. And we have to consider they can be nested as well.

I think I managed to put together a good solution using pythons ast module. That way we can run the injection with the proper python and always use the right ast parser. In contrast to regex / string compare based solution it should also be robust enough to be used basically everywhere (which we'd need when we want to automatically propagate env vars).

I've run out of time for now, but I'll polish and test that when I have more time.

@timokau
Copy link
Member Author

timokau commented Jan 14, 2019

@FRidh what do you think? https://github.com/timokau/python-inject

It injects a call to exec(), which makes it possible to insert arbitrary python code anywhere and execute it without polluting the environment.

@FRidh
Copy link
Member

FRidh commented Jan 15, 2019

@timokau that tool looks quite nice. I suppose we could put the code to inject in $out/nix-support, however, consider this can also be used for executables, I think it would be better to inject the actual code instead of a path, because there one might want to inject different things.

@timokau
Copy link
Member Author

timokau commented Jan 15, 2019

I don't understand the point about executables. Why wouldn't exec work there? Seems much more flexible to me in either case.

We could just use writeTextFile to store the file to exec in the nix store.

@FRidh
Copy link
Member

FRidh commented Jan 15, 2019

Then you're going to need two components, a Nix function that creates the file, and a bash function that injects the code.

@timokau
Copy link
Member Author

timokau commented Jan 15, 2019

Right. The point is that it is agnostic to where the file is stored. exec has two big advantages

  • completely separates the environment, which means we can use variables and do imports without potentially messing with the execution of the model
  • is a simple statement, which means it can be injected anywhere (from __future__ import print_function; print(42)) without worrying about semicolons vs. newlines, indentation etc.

Why would that be problematic in the binary case? It could still be put in nix-support or similar (I don't know if nix-support has special propagation semantics).

@timokau
Copy link
Member Author

timokau commented Jan 15, 2019

In which of the nix phases are the .pyc files generated?

@timokau
Copy link
Member Author

timokau commented Jun 22, 2019

This library might be useful: https://github.com/google/pasta

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/declarative-wrappers-another-idea/6661/1

@stale
Copy link

stale bot commented Jun 6, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 6, 2021
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/low-effort-and-high-impact-tasks/21095/41

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/terminal-emulator-leaks-environment-variables-to-shell/33673/11

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 1, 2023
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024
@wegank wegank marked this pull request as draft March 20, 2024 15:01
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants