pythonPackages.textacy: init at 0.4.1 #29334

merged 8 commits into from Sep 16, 2017

@rvl rvl commented Sep 13, 2017

Motivation for this change

Natural language processing with Python under Nix.

  • This change adds the textacy python package and its dependencies.
  • Versions have been tweaked on some dependencies where necessary.
  • Finally, this change also adds python package derivations for the language model datasets required for many NLP functions.
Datasets

The language data models can't be downloaded the way the spaCy docs describe, because spaCy's download script is a wrapper around pip install. The data files total a couple of GB and need enough /tmp space to assemble. Because the derivations live under an attrset, I'm assuming Hydra won't build them automatically (good).
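
As an illustration of how the packaged models might be consumed instead of running spaCy's download script, here is a minimal development-shell sketch. It assumes that the models are exposed as attributes of `spacy_models` named after each model (for example `en_core_web_md`) and that the packages build for `python3`; neither detail is confirmed in this thread.

```nix
# shell.nix (sketch). The attribute names under spacy_models are an assumption.
with import <nixpkgs> { };

(python3.withPackages (ps: [
  ps.textacy
  ps.spacy
  # One language model, pulled in as an ordinary Python package.
  ps.spacy_models.en_core_web_md
])).env
```

If those assumptions hold, entering the shell with `nix-shell` should make the model available to spaCy without any pip-based download step.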

Things done

  • Tested using sandboxing (nix.useSandbox on NixOS, or option build-use-sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip": didn't use nox, but tested the build of mopidy-gmusic. The "oslo" packages were not tested because they are already marked broken.
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

/cc @sdll

@rvl rvl requested a review from FRidh as a code owner September 13, 2017 22:44
@mention-bot

@rvl, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jgillich, @carlsverre and @FRidh to be potential reviewers.

@@ -0,0 +1,29 @@
{ stdenv
, pkgs
Member

no pkgs
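
The `no pkgs` remark asks for each dependency to be named in the function header rather than reached through a catch-all `pkgs` argument, so that `callPackage` can fill them in individually. A sketch of that shape for a hypothetical module follows; `example-module`, its version and the hash are placeholders, not taken from this PR.

```nix
# Hypothetical default.nix for a Python module (sketch).
{ buildPythonPackage
, fetchPypi
, nose   # requested by name instead of via pkgs.*
}:

buildPythonPackage rec {
  pname = "example-module";   # placeholder name
  version = "0.0.0";          # placeholder version

  src = fetchPypi {
    inherit pname version;
    # Placeholder hash: replace with the real sha256 of the source archive.
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };

  # Test-time dependency. The exact attribute to use (buildInputs,
  # checkInputs, ...) depends on the nixpkgs version.
  buildInputs = [ nose ];
}
```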

@@ -0,0 +1,31 @@
{ stdenv
, pkgs
Member

no pkgs

@@ -0,0 +1,128 @@
{ stdenv
, pkgs
Member

no pkgs

, nose
}:
let
ijson = buildPythonPackage rec {
Member

why are these packages not called from python-packages.nix?

@@ -14313,6 +14313,20 @@ in {
};
};

cachetools = buildPythonPackage rec {
Member

separate file in python-modules/
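
A sketch of what this could look like on the `python-packages.nix` side, following the same `callPackage` pattern as the `spacy` entry in this diff. The `cachetools` directory path is an assumption.

```nix
# pkgs/top-level/python-packages.nix (excerpt, sketch)
# The inline buildPythonPackage expression is replaced by a one-line reference.
cachetools = callPackage ../development/python-modules/cachetools { };
```

The package expression itself would then live in a `default.nix` inside that directory.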

@@ -24150,11 +24164,11 @@ EOF


unidecode = buildPythonPackage rec {
-name = "Unidecode-0.04.18";
+name = "Unidecode-0.04.21";

src = pkgs.fetchurl {
Member

fetchPypi, and preferably a separate file
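
For reference, a sketch of how the `src` attribute might read with `fetchPypi` in place of `pkgs.fetchurl`. The hash is a placeholder, and `fetchPypi` would also need to be added to the expression's function arguments.

```nix
# Fragment of the Unidecode expression (sketch, not the merged code).
src = fetchPypi {
  pname = "Unidecode";
  version = "0.04.21";
  # Placeholder hash: replace with the real sha256 of the source archive.
  sha256 = "0000000000000000000000000000000000000000000000000000";
};
```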

@@ -27308,6 +27322,12 @@ EOF

spacy = callPackage ../development/python-modules/spacy { };

spacy_models = callPackage ../development/python-modules/spacy/models.nix { };
Member

Might be nicer to add this to passthru.models of spacy. No need though.

Contributor Author

I think I will pass on this because the data model packages have dependencies on spacy.

Contributor Author

rvl commented Sep 15, 2017

Thanks for the review @FRidh. I have addressed your comments in new commits which can be autosquashed.

Member

FRidh commented Sep 15, 2017

@rvl please do the squashing yourself; commit per package.

Higher version is a requirement of textacy.
Textacy requires a later version.

v2.0.0 introduced breaking changes, so keep two versions for other
packages.
This module is a requirement of textacy.
Model                   Version
es_core_web_md          1.0.0
fr_depvec_web_lg        1.0.0
en_core_web_md          1.2.1
en_depent_web_md        1.2.1
en_core_web_sm          1.2.0
de_core_news_md         1.0.0
en_vectors_glove_md     1.0.0
@rvl
Contributor Author

rvl commented Sep 16, 2017

@FRidh OK, have pushed the squashed commits.

@FRidh FRidh merged commit e8f8736 into NixOS:master Sep 16, 2017
Contributor Author

rvl commented Sep 16, 2017

Thanks @FRidh :-)

@rvl rvl deleted the python-textacy branch September 16, 2017 10:35
@disassembler disassembler mentioned this pull request Sep 16, 2017
rvl added a commit to rvl/nixpkgs that referenced this pull request Sep 18, 2017
Contributor Author

rvl commented Sep 18, 2017

Very sorry @FRidh, it appears I forgot to retest the spacy_models packages. Would you be able to apply this patch? rvl@b7e602a
