Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sentencepiece: split into multiple outputs, optional gperftools #81029

Merged
merged 2 commits into from Mar 14, 2020

Conversation

danieldk
Copy link
Contributor

@danieldk danieldk commented Feb 25, 2020

Motivation for this change

I am using sentencepiece in a downstream application where I want to
minimize the resulting closures. This commit makes changes to make
sentencepiece a leaner dependency:

  • Split the outputs, so that the binaries/headers do not end up in the
    transitive closure in a library dependency.

  • Add the withGPerfTools option, which is enabled by default, to
    make it possible to disable the gperftools dependency. According to
    the sentencepiece README, this dependency gives a 10-40% performance
    improvement. But in many cases this is overshadowed by the neural
    networks that use piece identifiers as input anyway.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

danieldk added a commit to stickeritis/nix-packages that referenced this pull request Feb 26, 2020
Vendor the nixpkgs sentencepiece derivation with the modifications
from:

NixOS/nixpkgs#81029

This splits the sentenencepiece in several outputs, allowing sticker2
to depend on just the dynamic libraries. Moreover, we can disable
support for gperftools, which reduces the closure size.
@danieldk danieldk requested a review from FRidh as a code owner February 26, 2020 10:33
@FRidh
Copy link
Member

FRidh commented Feb 26, 2020

Why not explicitly select the output you want in buildInputs?

@danieldk
Copy link
Contributor Author

Why not explicitly select the output you want in buildInputs?

Do you mean changing

buildInputs = [ sentencepiece ];

to

buildInputs = [ sentencepiece.dev ];

in the Python derivation? Don't they result in the same derivation?

@jonringer
Copy link
Contributor

in the Python derivation? Don't they result in the same derivation?

IIRC, no. different outputs will be exposed at different stages, so headers will be available during build, but not runtime.

@danieldk
Copy link
Contributor Author

danieldk commented Feb 28, 2020

in the Python derivation? Don't they result in the same derivation?

IIRC, no. different outputs will be exposed at different stages, so headers will be available during build, but not runtime.

Thanks, I learned something new today 👍. I was under the wrong impression due to nix repl:

nix-repl> sentencepiece
«derivation /nix/store/6ljlm4vp6mjr6p78w4bnia9k53kdjgrf-sentencepiece-0.1.85.drv»
nix-repl> sentencepiece.dev
«derivation /nix/store/6ljlm4vp6mjr6p78w4bnia9k53kdjgrf-sentencepiece-0.1.85.drv»

Which I guess is true, because the outputs come from the same derivation. But for some reason this pretty printing led me to believe that they only evaluate to different things in certain contexts (e.g. in string interpolation). But of course, they are different:

nix-repl> foo = sentencepiece.dev
nix-repl> foo
«derivation /nix/store/6ljlm4vp6mjr6p78w4bnia9k53kdjgrf-sentencepiece-0.1.85.drv»
nix-repl> foo.outPath
"/nix/store/xmza2ajh2gj0hdrq8j5lbqwvfbx5n7bi-sentencepiece-0.1.85-dev"

I'll update the PR.

I am using sentencepiece in a downstream application where I want to
minimize the resulting closures. This commit makes changes to make
sentencepiece a leaner dependency:

- Split the outputs, so that the binaries/headers do not end up in the
  transitive closure in a library dependency.

- Add the `withGPerfTools` option, which is enabled by default, to
  make it possible to disable the gperftools dependency. According to
  the sentencepiece README, this dependency gives a 10-40% performance
  improvement. But in many cases this is overshadowed by the neural
  networks that use piece identifiers as input anyway.
danieldk added a commit to stickeritis/nix-packages that referenced this pull request Mar 4, 2020
Vendor the nixpkgs sentencepiece derivation with the modifications
from:

NixOS/nixpkgs#81029

This splits the sentenencepiece in several outputs, allowing sticker2
to depend on just the dynamic libraries. Moreover, we can disable
support for gperftools, which reduces the closure size.
@danieldk
Copy link
Contributor Author

danieldk commented Mar 9, 2020

I forgot to @ you as a maintainer @pashashocky, sorry for that!

@pashashocky
Copy link
Contributor

I forgot to @ you as a maintainer @pashashocky, sorry for that!

Works for me if it works for Jon! Feel free to add yourself to the maintainers list too if you want.

@danieldk
Copy link
Contributor Author

danieldk commented Mar 9, 2020

I forgot to @ you as a maintainer @pashashocky, sorry for that!

Works for me if it works for Jon!

Thanks!

Feel free to add yourself to the maintainers list too if you want.

Ack. Since I now depend on this derivation for work stuff, I'll do that!

Copy link
Contributor

@jonringer jonringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

[8 built, 38 copied (148.5 MiB), 16.2 MiB DL]
https://github.com/NixOS/nixpkgs/pull/81029
7 package built:
python27Packages.sentencepiece python27Packages.transformers python37Packages.sentencepiece python37Packages.transformers python38Packages.sentencepiece python38Packages.transformers sentencepiece

@jonringer jonringer merged commit 7b3802d into NixOS:master Mar 14, 2020
@danieldk danieldk deleted the leaner-sentencepiece branch July 6, 2020 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants