Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makeBinaryWrapper: create binary wrappers #95569

Closed
wants to merge 2 commits into from

Conversation

FRidh
Copy link
Member

@FRidh FRidh commented Aug 16, 2020

When packaging programs we often need to change the environment the program is run in.
Typically this is done using wrappers. Thus far, the wrappers produced were typically
shell scripts, thus interpreted, and with a shebang. This is an issue on e.g. Darwin,
where shebangs cannot point at interpreted scripts.

This commit introduces a new program for creating binary wrappers. The program, written
currently in Python, produces an instruction in JSON which is loaded into a Nim program
and then compiled into a binary.

With interpreted wrappers one can easily check the code to see what it is doing, but with
a binary that is more difficult. To that end, an environment variable NIX_DEBUG_WRAPPER
is introduced which, when set, causes an executed wrapper to print the embedded JSON
instead of exec'ing into the wrapped executable.

The Python program aims to be compatible with the existing makeWrapper but is lacking
features, mostly because of difficulties with implementing them using argparse.

Relevant issues

To do

  • documentation
  • tests
  • check cross-compilation
  • fix prefix/suffix
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@FRidh
Copy link
Member Author

FRidh commented Aug 16, 2020

I would have preferred writing the Python part in Nim as well, but did not because I dislike using optparse-like parsing. However, it seems if we want to achieve full compatibility with makeWrapper there is really no choice.

Note the wrappers are around 83 kB. It takes about 1.6 seconds wall clock time to produce a wrapper.

let wrapper = to(wrapperDescription, Wrapper)
processEnvironment(wrapper.environment)
let argv = wrapper.original # convert target to cstring
let argc = allocCStringArray(wrapper.flags)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags don't work yet


# Wrapper type as used by the wrapper-generation code as well in the actual wrapper.

type
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON created for a wrapper needs to follow this exact structure.

@FRidh FRidh mentioned this pull request Aug 16, 2020
10 tasks
Copy link
Contributor

@doronbehar doronbehar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good. But I'm not sure I fully figured out what is the final result - is it a python depending executable?

type
SetVar* = object
variable*: string
value*: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like it to be possible to SetVar multiple values from an array in wrapper.json.

Copy link
Member Author

@FRidh FRidh Aug 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it is now you can have

    "environment": {
        "set": [
            {
                "variable": "HELLO",
                "value": "foo"
            },
            {
                "variable": "HELLO",
                "value": "bar"
            }
        ]
    }

Thus, order will matter.

Am I correct you are suggesting you want the following?

    "environment": {
        "set": [
            {
                "variable": "HELLO",
                "values": [ 
                    "foo",
                    "bar"
                ]
            }
        ]
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I want it to be an array, and also there should be a separator key as well, because some env vars use e.g ; and not :. : should be the default though.

Am I correct to understand there's both SetVar and environment.set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

environment.set is a sequence of SetVar objects where SetVar has the fields variable and value.

I see what you're after. What I did here is basically take the makeWrapper arguments as is. If you want export HELLO="foo:bar" then in case of makeWrapper write --set HELLO "foo:bar", and so you have variable HELLO and value foo:bar. In that case one does not need to know what separator is used.

@doronbehar
Copy link
Contributor

doronbehar commented Aug 16, 2020

So I understand that makeBinaryWrapper is a shell command that should be accessible just like makeWrapper and wrapProgram. If so, the whole design looks too complicated to me, at least for the purpose of declarative wrappers' current POC design (#85103).

There, I calculated the arguments to makeWrapper, but a significant part of the whole idea is to create a separate derivation of a wrapper that will call the unwrapped executable from it's original drv. What if we'd implement / integrate this binary wrapper creator as a Nix builder, and not as a shell command available during a build? It should make your life easier since you could read and write the JSON wrap data via Nix and not Python (assuming I figured out your design). Plus, it should enable you to construct the Nim code or whatever you'd choose to be compiled as the wrapper, via Nix replacement during build.

Please note that another significant advantage of using a separate derivation (examples include #88620 & #88110) for a wrapper is the reduce of rebuilds due to a request to change the environment.

I also think we should be able to get along with C as the language with which we'll compile the wrapper - that's more minimal but considering most of us have stdenv in our closure and we don't have Nim, it may come about quicker to edit wrapper derivations if building them only requires stdenv.

What I imagine as a binary wrapper creator, is a Nix builder that writes during build a super dumb C program with env vars given by a Nix. Super roughly:

let
  # overly simplified
  env.NIX_PYTHONPATH = "$out/${python.sitePackages}";
in
  buildCommand = ''
    echo 'setenv("NIX_PYTHONPATH", ${env.NIX_PYTHONPATH})' > wrapper.c
    echo 'exec("${placeholder "out"}/bin/.unwrapped")' > wrapper.c
    gcc wrapper.c -o $out/bin/wrapper
  '';

EDIT1: Edited nix expression to be a bit more real.
EDIT2: Naturally, for more complex env settings such as what --prefix implements, a bit more complex getenv and then setenv should be used I guess...

@FRidh
Copy link
Member Author

FRidh commented Aug 16, 2020

The Nim part is the actual wrapper code. It could have been C, but I find Nim much easier to work with. It's also lightweight and fast so no issue regarding bootstrapping. Closure size is slightly larger than just the C compiler, but the difference is small (36 MB).

The Python part invokes the compilation of the binary wrapper. Some code needs to invoke the compiler, and that's done there now. It also checks the instruction.

The Python part also offers the makeWrapper program which aims to be compatible with the non-binary wrapper creation shell function makeWrapper. Next step is to introduce another executable that takes a JSON instruction for a set of wrappers to be created, the idea here being that hooks have created that JSON instruction.

So I understand that makeBinaryWrapper is a shell command that should be accessible just like makeWrapper and wrapProgram. If so, the whole design looks too complicated to me, at least for the purpose of declarative wrappers' current POC design (#85103).

This is about creating the actual wrappers, not on how we from a derivation, instruct how they should be created. This is solely to get a binary wrapper built. Declarative wrappers is a separate part that is further on top, which eventually leads to the JSON instruction describing all wrappers to be build.

What I imagine as a binary wrapper creator, is a Nix builder that writes during build a super dumb C program with env vars given by a Nix. Super roughly:

In any derivation you will have typically more than one wrapper that needs to be created. Each of these wrappers may need a different instruction.

@doronbehar
Copy link
Contributor

In any derivation you will have typically more than one wrapper that needs to be created. Each of these wrappers may need a different instruction.

I understand. But still, even with different environment and args to different executables, it's all about constructing the right buildCommand - You can instruct what wrapping do be done either via Nix, before buildCommand is evaluated, or you can do that with makeBinWrapper at the buildCommand itself.

Declarative wrappers is a separate part that is further on top, which eventually leads to the JSON instruction describing all wrappers to be build.

Indeed, in it's current design, this PR shouldn't necessarily relate or depend upon declarative wrappers and this fact might help it reach reception easier, especially due to having the same interface as makeWrapper which is already well known. But, I still think a Nix builder constructing a proper buildCommand is better, although such builder's usage might intimidate casual contributors, due to the lack of Nix programming experience in the community.

When packaging programs we often need to change the environment the program is run in.
Typically this is done using wrappers. Thus far, the wrappers produced were typically
shell scripts, thus interpreted, and with a shebang. This is an issue on e.g. Darwin,
where shebangs cannot point at interpreted scripts.

This commit introduces a new program for creating binary wrappers. The program, written
currently in Python, produces an instruction in JSON which is loaded into a Nim program
and then compiled into a binary.

With interpreted wrappers one can easily check the code to see what it is doing, but with
a binary that is more difficult. To that end, an environment variable `NIX_DEBUG_WRAPPER`
is introduced which, when set, causes an executed wrapper to print the embedded JSON
instead of exec'ing into the wrapped executable.

The Python program aims to be compatible with the existing `makeWrapper` but is lacking
features, mostly because of difficulties with implementing them using `argparse`.
For `makeBinaryWrapper` we preferably keep the closure small so it can
be used early on during bootstrapping.
boehmgc, sfml, sqlite }:
{ stdenv, lib, fetchurl, makeWrapper, openssl, pcre, readline
, boehmgc, sfml, sqlite
, minimal ? false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-compiling the minimal nim is currently broken. I think I went to far with dropping dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the Nim compiler for Nixpkgs cross-compilation is broken generally. We need to use a wrapper with compiler, but at least we can reuse a single buildPackages compiler binary for different host platforms. I will make a PR.

@andrew-d
Copy link
Contributor

FWIW, I'm pretty 👎 on introducing another language into a core part of Nixpkgs like this, especially one that is fairly niche like Nim - presumably this means that it would need to be supported/ported/etc. on new platforms, in addition to what we already support.

If Nim is indeed that much easier to use (haven't tried it personally!), what do you think about instead checking in the generated .c files? That way you can continue to use Nim (or whatever) to generate them, but we won't need to depend on it at build time.

"-lpcre"
"-lreadline"
"-lsqlite3"
]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right to me, if these libraries aren't linked into the compiler explicitly the compiler will still potentially load the libraries at runtime with dlopen. Is it possible to leave them in or does that create a dependency loop? Ref #95692

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as we would use it to replace makeWrapper it would create a dependency loop. Now, we deal with those already during bootstrapping, but if we can avoid it that would be good.

I prefer to get the build-time closure down in size because that's more convenient during bootstrapping, however, all of these packages are built very early on already anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW sfml is not needed, I think thats an artifact from when we ran the compiler and stdlib tests.

echo(jsonString)
else:
# Unfortunately parsing JSON during compile-time is not supported.
let wrapperDescription = parseJson(jsonString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Npeg library could parse the environment at compile time - https://github.com/zevv/npeg

Copy link
Contributor

@ehmry ehmry Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… or just generate Nim code directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate what you mean with generating Nim code here directly? It's important that a generated wrapper does what it should do, but that we can also still check how it was configured.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean generating a const Nim tuple of arrays rather than JSON. I was just thinking that the wrapper shouldn't neeed to do much heap allocation.

@Profpatsch
Copy link
Member

Can you give some insight in why you chose nim over, like C/C++/rust/go for this rather base feature?
As far as I know we don’t have any nim code in any stdenv-related building blocks so far, so this adds another bootstrapping requirement to nixpkgs.

@FRidh
Copy link
Member Author

FRidh commented Sep 9, 2020

Can you give some insight in why you chose nim over, like C/C++/rust/go for this rather base feature?
As far as I know we don’t have any nim code in any stdenv-related building blocks so far, so this adds another bootstrapping requirement to nixpkgs.

I want to solve an issue that is recurring using the tools and knowledge I have or can use. For my other projects I favor Nim over the others, and I also think in Nixpkgs it would be very valuable (mostly because of NimScript). For in Nixpkgs I don't care what language this functionality is implemented in, although it would for sure be beneficial if more are familiar with it. I won't work with the other languages myself though because I don't have the time for that.

Regarding makeBinaryWrapper, I am actually thinking of just having it in a separate repo and fetching it like we do any other tool. There is no reason for this to be Nixpkgs-specific. See also patchelf.

@ehmry
Copy link
Contributor

ehmry commented Sep 9, 2020

Nim may also be more portable (or cheaper to maintain portability) than Rust or Go, because the runtime is simple and stable.

@Profpatsch
Copy link
Member

Profpatsch commented Sep 9, 2020

With interpreted wrappers one can easily check the code to see what it is doing, but with
a binary that is more difficult. To that end, an environment variable NIX_DEBUG_WRAPPER
is introduced which, when set, causes an executed wrapper to print the embedded JSON
instead of exec'ing into the wrapped executable.

I like this idea, why not exec nonetheless? Sounds like it would be hard to get the trace of exactly the tool you need to debug if there is a chain of wrapped executables and it stops at the first.


A general point: from what I gathered in the past, the nested shebang problem only appears on Darwin (and some BSDs, but we don’t support them officially), not on Linux. Given there is a substantial decrease in debuggability with compiled wrappers, it would make sense to me to only use compiled wrappers when they are strictly necessary.

@FRidh
Copy link
Member Author

FRidh commented Sep 9, 2020

I like this idea, why not exec nonetheless? Sounds like it would be hard to get the trace of exactly the tool you need to debug if there is a chain of wrapped executables and it stops at the first.

That's a good point.

A general point: from what I gathered in the past, the nested shebang problem only appears on Darwin (and some BSDs, but we don’t support them officially), not on Linux. Given there is a substantial decrease in debuggability with compiled wrappers, it would make sense to me to only use compiled wrappers when they are strictly necessary.

There are two issues:

  1. darwin cannot execute nested shebangs
  2. process name typically cannot be changed with interpreted programs. E.g., for Python we use a custom environment variable.

Now that I am writing this, I don't think it would actually solve 2.

@Profpatsch
Copy link
Member

2\. process name typically cannot be changed with interpreted programs. E.g., for Python we use a custom environment variable.

Now that I am writing this, I don't think it would actually solve 2.

the process name is set via argv[0], right? And that is set in the execve(2) call coming from the caller.

2\. E.g., for Python we use a custom environment variable.

Do you have a reference to where you set that (and why)?

@Profpatsch
Copy link
Member

Can you give some insight in why you chose nim over, like C/C++/rust/go for this rather base feature?

For the record, I think because of the nice split we can go forward with nim right now and rewrite it to a plain C program later if necessary, provided we keep the scope small.

@Profpatsch
Copy link
Member

Do we have any explicit blockers on this PR? I’d love to see this merged.

@FRidh
Copy link
Member Author

FRidh commented Sep 23, 2020

I've moved the code into a flake at https://github.com/FRidh/make-binary-wrapper. We can of course still package it here as well. Will need to update this PR then.

Do you have a reference to where you set that (and why)?

In a derivation using buildPythonPackage we set sys.argv[0] to the absolute path of the executable. We do this by injecting a piece of code. This is part of wrapPython.

In a python.buildEnv (and python.withPackages) derivation we need to wrap the existing executables. I thought we set sys.argv[0] in sitecustomize.py (pkgs/development/interpreters/python/sitecustomize.py) through an environment variable, but turns out I am wrong. We set only sys.executable and sys.prefix. This part really needs to be revised but that's another topic.

@Profpatsch
Copy link
Member

We can of course still package it here as well. Will need to update this PR then.

I would love to merge this.

@stale
Copy link

stale bot commented Apr 7, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Apr 7, 2021
@ehmry ehmry added the 6.topic: nim Nim programing language label Sep 3, 2021
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 3, 2021
@roberth
Copy link
Member

roberth commented Sep 30, 2021

#124556 seems to be a simpler solution with fewer dependencies, so it can be used more widely in Nixpkgs; including places where a minimal number of (bootstrapping) dependencies is desirable. For instance, it could feasibly be added to stdenv, whereas this could not, because it has Nim in the derivation closure.
Considering that this PR has been stale, wip, with open todos, and we have a more viable successor, I'll close it for now.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/low-effort-and-high-impact-tasks/21095/41

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants