sentencepiece: split into multiple outputs, optional gperftools #81029
Vendor the nixpkgs sentencepiece derivation with the modifications from NixOS/nixpkgs#81029. This splits sentencepiece into several outputs, allowing sticker2 to depend on just the dynamic libraries. Moreover, we can disable support for gperftools, which reduces the closure size.
Why not explicitly select the output you want in
Do you mean changing `buildInputs = [ sentencepiece ];` to `buildInputs = [ sentencepiece.dev ];` in the Python derivation? Don't they result in the same derivation?
IIRC, no. Different outputs are exposed at different stages, so headers will be available at build time, but not at runtime.
Thanks, I learned something new today 👍. I was under the wrong impression due to
Which I guess is true, because the outputs come from the same derivation. But for some reason this pretty printing led me to believe that they only evaluate to different things in certain contexts (e.g. in string interpolation). But of course, they are different:
I'll update the PR.
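One way to check the distinction discussed above is to compare the attributes of two outputs directly. This is a minimal sketch, assuming that `<nixpkgs>` resolves to a checkout in which `sentencepiece` has a `dev` output, as this PR proposes. The file name and the attribute names of the result set are made up for illustration.

```nix
# Evaluate with: nix-instantiate --eval --strict outputs.nix
# (outputs.nix is a hypothetical file name)
let
  pkgs = import <nixpkgs> { };
  sp = pkgs.sentencepiece;
in
{
  # All outputs are produced by one derivation, so they share a .drv path.
  sameDerivation = sp.drvPath == sp.dev.drvPath;

  # Each output has its own store path, so the default output and
  # the dev output should not compare equal here.
  sameStorePath = sp.outPath == sp.dev.outPath;

  # The list of output names declared by the derivation.
  outputs = sp.outputs;
}
```

Because the outputs come from the same derivation, `sameDerivation` should hold, while the store paths differ per output.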
I am using sentencepiece in a downstream application where I want to minimize the resulting closures. This commit makes changes to make sentencepiece a leaner dependency:
- Split the outputs, so that the binaries/headers do not end up in the transitive closure of a library dependency.
- Add the `withGPerfTools` option, which is enabled by default, to make it possible to disable the gperftools dependency. According to the sentencepiece README, this dependency gives a 10-40% performance improvement. But in many cases this is overshadowed by the neural networks that use piece identifiers as input anyway.
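The two changes in this commit message could be expressed roughly as follows. This is a sketch only, not the actual expression from the PR: the output names, the `src` argument, and the placeholder `name` are assumptions, and the real derivation carries more attributes (source fetching, metadata).

```nix
# Sketch only: not the nixpkgs expression from this PR.
{ lib, stdenv, cmake, gperftools
, src                    # hypothetical: sentencepiece source tree supplied by the caller
, withGPerfTools ? true  # option named in the commit message, enabled by default
}:

stdenv.mkDerivation {
  name = "sentencepiece-sketch"; # placeholder name

  inherit src;

  nativeBuildInputs = [ cmake ];

  # Pull in gperftools only when the option is enabled.
  buildInputs = lib.optional withGPerfTools gperftools;

  # Separate store paths for executables, headers, and libraries.
  # The exact output names are an assumption.
  outputs = [ "bin" "dev" "out" ];
}
```

With split outputs, a dependent package that only links against the library references the library output alone, so the executables and headers stay out of its runtime closure.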
I forgot to @ you as a maintainer @pashashocky, sorry for that!
Works for me if it works for Jon! Feel free to add yourself to the maintainers list too if you want.
Thanks!
Ack. Since I now depend on this derivation for work stuff, I'll do that!
LGTM
[8 built, 38 copied (148.5 MiB), 16.2 MiB DL]
https://github.com/NixOS/nixpkgs/pull/81029
7 packages built:
python27Packages.sentencepiece python27Packages.transformers python37Packages.sentencepiece python37Packages.transformers python38Packages.sentencepiece python38Packages.transformers sentencepiece
Motivation for this change
I am using sentencepiece in a downstream application where I want to minimize the resulting closures. This commit makes changes to make sentencepiece a leaner dependency:
- Split the outputs, so that the binaries/headers do not end up in the transitive closure of a library dependency.
- Add the `withGPerfTools` option, which is enabled by default, to make it possible to disable the gperftools dependency. According to the sentencepiece README, this dependency gives a 10-40% performance improvement. But in many cases this is overshadowed by the neural networks that use piece identifiers as input anyway.
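A downstream user who wants the smaller closure could turn the option off with the usual `override` mechanism. This is a minimal sketch of an overlay, assuming the `withGPerfTools` argument described above. The attribute name `sentencepieceLean` is hypothetical.

```nix
# Hypothetical overlay: the attribute name sentencepieceLean is illustrative.
self: super: {
  # Rebuild sentencepiece without the gperftools dependency.
  sentencepieceLean = super.sentencepiece.override {
    withGPerfTools = false;
  };
}
```

Whether the trade-off is worthwhile depends on the workload. The README's 10-40% figure applies to sentencepiece itself, which may be a small part of the total runtime when a neural network consumes the piece identifiers.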
Things done
- option `sandbox` in `nix.conf` (on non-NixOS linux)
- `nix-shell -p nixpkgs-review --run "nixpkgs-review wip"`
- `./result/bin/`
- `nix path-info -S` (before and after)