python3Packages.tokenizers: 0.8.1 -> 0.9.2 #100700

Closed
wants to merge 2 commits into from


danieldk
Contributor

Motivation for this change

Changelog:

https://github.com/huggingface/tokenizers/releases/tag/python-v0.9.0
https://github.com/huggingface/tokenizers/releases/tag/python-v0.9.1
https://github.com/huggingface/tokenizers/releases/tag/python-v0.9.2

Changes in the derivation:

  • Tokenizers uses a newer version of pyo3 that no longer requires Rust
    nightly, so the derivation no longer needs any cheat codes
    (workarounds for nightly-only features) to build.
  • Tokenizers is now a mixed Rust/Python project. The way it is set
    up does not work with Maturin, so switch to setuptools-rust instead.
  • Add additional data files needed for tests.
  • Use pytestCheckHook.
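
The derivation changes listed above could look roughly like the following fragment. This is a minimal sketch, not the derivation from this PR: the `sourceRoot` path is an assumption about the repository layout, the hash is a placeholder, and the Rust toolchain and vendored crate setup is left out entirely.

```nix
# Hypothetical fragment illustrating the bullet points above.
# It will not build as-is: the hash is a placeholder and the
# cargo/rustc inputs and crate vendoring are omitted.
{ lib
, buildPythonPackage
, fetchFromGitHub
, setuptools-rust
, pytestCheckHook
}:

buildPythonPackage rec {
  pname = "tokenizers";
  version = "0.9.2";

  src = fetchFromGitHub {
    owner = "huggingface";
    repo = pname;
    # Tag name taken from the release URLs in the changelog.
    rev = "python-v${version}";
    sha256 = lib.fakeSha256; # placeholder, replace with the real hash
  };

  # Assumption: the Python bindings live in a subdirectory of the repository.
  sourceRoot = "source/bindings/python";

  nativeBuildInputs = [
    # Builds the Rust extension through setup.py instead of Maturin.
    setuptools-rust
  ];

  # Runs the test suite with pytest during the check phase.
  # Newer nixpkgs revisions use nativeCheckInputs for this hook.
  checkInputs = [ pytestCheckHook ];

  # Any extra data files the tests need would be fetched separately
  # and copied into place in preCheck (not shown).
}
```

Keeping the build on `setuptools-rust` follows upstream's own `setup.py`, which is why the mixed Rust/Python layout does not need special handling in this sketch.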

This is a draft PR because Hugging Face transformers is not yet compatible with this version of tokenizers.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS Linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@danieldk
Contributor Author

@ofborg build python38Packages.tokenizers python37Packages.tokenizers

@danieldk
Contributor Author

Superseded by #101443.

@danieldk danieldk closed this Oct 23, 2020
@danieldk danieldk deleted the tokenizers-0.9.2 branch October 23, 2020 06:37