-
-
Notifications
You must be signed in to change notification settings - Fork 15.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python3Packages.tokenizers: init at 0.8.0 #91342
Conversation
@GrahamcOfBorg build python3Packages.tokenizers |
Also cc @pashashocky , since they are the maintainer of the tranformers derivation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to improve upon expression, so LGTM
able to be imported fine from python
https://github.com/NixOS/nixpkgs/pull/91342
2 packages built:
python37Packages.tokenizers python38Packages.tokenizers
still would like to see @FRidh 's thoughts on this
Added one more |
@FRidh does this look good to you? Would be nice to be able to update the transformers package, since it's trailing behind quite a bit. |
Huggingface transformers 3.0.0 is out, together with tokenizers 0.8.0. So, I have updated this PR to 0.8.0. |
I had a similar issue when packaging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest moving the package disable to the tokenizers file, in keeping with the style of the python-packages.nix file.
I find it highly preferable to build from source if possible. Also, the derivation can be simplified once tokenizers switches to pyo3 0.11.0, since it does not require Rust nightly. |
Agree about build from source. I didn't realize that the path of building via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Diff LGTM (except minor comments below)
- Commits LGTM
- Builds clean
- Tests clean with ~
python; import tokenizers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. You may want to use buildRustPackage
to build the wheel, and then pass that wheel to buildPythonPackage
, but there is no need for that.
02eafdd
to
e4d16bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(repeat of above checklist)
- Diff LGTM
- Commits LGTM
- Builds clean
- Tests clean with ~
python; import tokenizers
Thank you for doing these explicit approvals, it makes it so much easier to filter on what may be merged and what not. |
Motivation for this change
I wanted to update the
transformers
Python package. However, newer versions require thetokenizers
Python package. This package is quite interesting by itself:pyo3
crate for binding Python.pyo3
needs Rust nightly.RUSTC_BOOTSTRAP
, sincepyo3
s build script checks the actual version.So, to actually make this work I chose to:
buildRustPackage
as the main driver, so that Rust crates are properly vendored.maturin
to build a Python wheel from the combined Rust + Python sources.pip
to install the wheel in the right location.I had to patch the build script of the vendored
pyo3
to remove the explicit Rust version check or it is not happy with theRUSTC_BOOTSTRAP
cheat code.As icing on the 🍰, the tests attempt to download some vocabularies. So, we have to put them in place ourselves for purity.
I haven't been able to test this on macOS yet, it will probably needBuilds without issues on my MacBook.Security
.Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)