Some Python libraries, mostly scraping-related #60775

Closed
wants to merge 14 commits into from

Conversation

mredaelli
Contributor

Motivation for this change

Some libraries I needed that were missing.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@mredaelli mredaelli requested a review from FRidh as a code owner May 2, 2019 16:23
buildPythonPackage {
  pname = "langdetect";
  version = "1.0.7";
  src = fetchzip {
Member

Use fetchPypi here.
Maybe you need to pass in a different pname here?

Contributor Author

Maybe it's because there's only a zip?

Member

extension = "zip";

Contributor Author

worked
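
The suggestion in this thread could look roughly like the sketch below. It is only an illustration, assuming (as the thread implies) that the langdetect 1.0.7 source on PyPI is published as a zip archive. The hash is a placeholder via `lib.fakeSha256`, not the real one, and dependencies and `meta` are left out.

```nix
{ lib, buildPythonPackage, fetchPypi }:

buildPythonPackage rec {
  pname = "langdetect";
  version = "1.0.7";

  src = fetchPypi {
    inherit pname version;
    # fetchPypi looks for a .tar.gz by default; the thread suggests
    # only a .zip is available, so the extension is overridden.
    extension = "zip";
    # Placeholder (assumption): replace with the hash Nix reports
    # on the first build attempt.
    sha256 = lib.fakeSha256;
  };
}
```

Compared with `fetchzip`, this keeps the source tied to the package name and version, so a version bump only needs a new hash.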


propagatedBuildInputs = [ requests retrying six ];

# Prevent unicode decoding error in setup.py
Member

Should not be needed anymore... What's the exact issue?

Contributor Author

To be honest, I don't see it anymore either :/

Member

You probably wrote these expressions while using an older Nixpkgs (18.09).

Contributor Author

I think I did

# Tests are network-based
doCheck = false;

# Prevent unicode decoding error in setup.py
Member

same

Contributor Author

To be honest, I don't see it anymore either :/

LC_ALL = "en_US.UTF-8";

# Tests are network-based
doCheck = false;
Member

They don't have tests scraping from localhost?

Contributor Author

Not sure, but this is not the scraping library. Rather, the library that manages the crawlers running in their cloud environment, like aws-cli, I'd say.
Anyway, I see lots of messages like this in the tests:

pool_timeout = None, release_conn = False, chunked = False, body_pos = None
response_kw = {'decode_content': False, 'preload_content': False}, conn = None
release_this_conn = True, err = None, clean_exit = False
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7ffff537ab00>
is_new_proxy_conn = False

Member

Don't know what that is, probably just test-related.

Contributor Author

Well, I disabled the tests that were causing problems. At least some are passing :)
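
One way to disable only the problematic tests, instead of setting `doCheck = false` for the whole package, is to deselect them by name in `checkPhase`. The sketch below is an assumption about how that could be done, not what the PR actually contains: the package name, version, and test names are hypothetical, the hash is a placeholder, and the exact attribute names (`checkInputs` here) depend on the Nixpkgs version in use.

```nix
{ lib, buildPythonPackage, fetchPypi, pytest }:

buildPythonPackage rec {
  pname = "example-package"; # hypothetical name
  version = "0.0.0";         # hypothetical version

  src = fetchPypi {
    inherit pname version;
    sha256 = lib.fakeSha256; # placeholder hash (assumption)
  };

  checkInputs = [ pytest ];

  # Run the test suite, but skip tests that need network access,
  # which is unavailable inside the Nix build sandbox.
  # The test names below are made up for illustration.
  checkPhase = ''
    pytest -k "not test_needs_network and not test_remote_api"
  '';
}
```

This keeps the offline tests running, so regressions in the rest of the library are still caught at build time.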

}:

buildPythonPackage rec {
pname = "shub";
Member

indentation

Contributor Author

Done

@mredaelli
Contributor Author

mredaelli commented May 3, 2019

I think I covered the changes
And added a few more :)

@mredaelli
Contributor Author

@FRidh do I need to make any more changes?

@c0bw3b
Contributor

c0bw3b commented May 9, 2019

langdetect and pdftotext were also added in another PR (#44086) so you can drop those 2 commits to resolve the current conflicts.

@mredaelli
Contributor Author

@c0bw3b done :)

@c0bw3b
Contributor

c0bw3b commented May 9, 2019

Testing on NixOS with sandboxing:

Result of nix-review pr 60775

3 package failed to build:
  • python27Packages.backports-datetime-fromisoformat
  • python27Packages.betterreads
  • python27Packages.enolib
22 package were build:
  • python27Packages.cfscrape
  • python27Packages.docxtotxt
  • python27Packages.python-datemath
  • python27Packages.rauth
  • python27Packages.rules
  • python27Packages.scrapinghub
  • python27Packages.scrapy-crawlera
  • python27Packages.sqlitedict
  • python37Packages.backports-datetime-fromisoformat
  • python37Packages.betterreads
  • python37Packages.cfscrape
  • python37Packages.docxtotxt
  • python37Packages.enolib
  • python37Packages.monkeytype
  • python37Packages.python-datemath
  • python37Packages.rauth
  • python37Packages.retype
  • python37Packages.rules
  • python37Packages.scrapinghub
  • python37Packages.scrapy-crawlera
  • python37Packages.shub
  • python37Packages.sqlitedict

Log excerpt for python27Packages.backports-datetime-fromisoformat (Py3 only?)

creating build/lib.linux-x86_64-2.7/backports/datetime_fromisoformat
copying backports/datetime_fromisoformat/__init__.py -> build/lib.linux-x86_64-2.7/backports/datetime_fromisoformat
running build_ext
building 'backports._datetime_fromisoformat' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/backports
creating build/temp.linux-x86_64-2.7/backports/datetime_fromisoformat
gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/nix/store/29xfvq39h5z7af90id0zaa4a09vgpsm7-python-2.7.16/include/python2.7 -c backports/datetime_fromisoformat/module.c -o build/temp.linux-x86_64-2.7/backports/datetime_fromisoformat/module.o
backports/datetime_fromisoformat/module.c: In function ‘init_datetime_fromisoformat’:
backports/datetime_fromisoformat/module.c:75:30: error: ‘module’ undeclared (first use in this function); did you mean ‘modfl’?
     initialize_timezone_code(module);
                              ^~~~~~
                              modfl
backports/datetime_fromisoformat/module.c:75:30: note: each undeclared identifier is reported only once for each function it appears in
error: command 'gcc' failed with exit status 1

Log excerpt for python27Packages.enolib (Py3 only?)

============================= test session starts ==============================
platform linux2 -- Python 2.7.16, pytest-4.2.1, py-1.7.0, pluggy-0.8.1
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/build/source/.hypothesis/examples')
rootdir: /build/source, inifile: setup.cfg
plugins: cov-2.6.1, hypothesis-4.7.3
collected 0 items / 117 errors

@mredaelli
Contributor Author

Sorry, was away.
Yes, they are py3 only. I forget, how do I handle that?

@c0bw3b
Contributor

c0bw3b commented May 22, 2019

No problem.
You can set disabled = !isPy3k; like the alot package does.
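
As a rough sketch of that suggestion, assuming `isPy3k` is taken as a function argument from the Python package set: the version and hash below are placeholders, not the values from this PR.

```nix
{ lib, buildPythonPackage, fetchPypi, isPy3k }:

buildPythonPackage rec {
  pname = "enolib";
  version = "0.0.0"; # placeholder version (assumption)

  # Mark the package as unavailable on Python 2 interpreters, so
  # python27Packages.enolib is never built.
  disabled = !isPy3k;

  src = fetchPypi {
    inherit pname version;
    sha256 = lib.fakeSha256; # placeholder hash (assumption)
  };
}
```

With this in place, a review run should no longer try to build the Python 2.7 variant and report it as a failure.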

@stale

stale bot commented Jun 2, 2020

Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the
    related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 2, 2020
@onny
Contributor

onny commented Sep 4, 2022

Needs a rebase if still relevant

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 4, 2022
@mredaelli
Contributor Author

Needs a rebase if still relevant

I don't really use these anymore, so I'll close the PR

@mredaelli mredaelli closed this Oct 21, 2022

5 participants