Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc: init at 0.24.3 #54530

Merged
merged 1 commit into from Oct 29, 2019
Merged

dvc: init at 0.24.3 #54530

merged 1 commit into from Oct 29, 2019

Conversation

CMCDragonkai
Copy link
Member

@CMCDragonkai CMCDragonkai commented Jan 24, 2019

Motivation for this change

Adds dvc. Fixes #49438

DVC is a version control system for data science that integrates with Git. But doesn't really actually need git.

Since it's a application and not a library, I thought to just put it in pkgs/applications/version-management/dvc/default.nix.

Futhermore, I've decided to use python3Packages set instead of just pythonPackages as we should be using Python 3 for any applications now.

They only have a wheel on the on Pypi, and I've asked them to push up their source distribution to the Pypi as well iterative/dvc#1509.

In the case that is done, this can be changed to use the source distribution and hopefully run some tests as well.

One thing is that after I built this, and tried to run dvc, I get this at the end:

/nix/store/vs6d2fjkl4kb3jb7rwibsd76k9v2n4xy-bash-4.4-p23/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/home/cmcdragonkai/.includes_sh/shell_environment_common.conf: line 31: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
Error: the following arguments are required: COMMAND

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
usage: dvc [-h] [-q | -v] [-V] COMMAND ...

Data Version Control

optional arguments:
  -h, --help     show this help message and exit
  -q, --quiet    Be quiet.
  -v, --verbose  Be verbose.
  -V, --version  Show program's version.

Available Commands:
  COMMAND        Use dvc COMMAND --help for command-specific help.
    init         Initialize dvc over a directory (should already be a git dir).
    destroy      Destroy dvc. Will remove all project's information, data files and cache.
    add          Add files/directories to dvc.
    remove       Remove outputs of DVC file.
    move         Move output of DVC file.
    unprotect    Unprotect data file/directory.
    run          Generate a stage file from a given command and execute the command.
    repro        Reproduce DVC file. Default file name - 'Dvcfile'.
    pull         Pull data files from the cloud.
    push         Push data files to the cloud.
    fetch        Fetch data files from the cloud.
    status       Show the project status.
    gc           Collect garbage.
    add          Add files/directories to dvc.
    import       Import files from URL.
    config       Get or set config options.
    checkout     Checkout data files from cache.
    remote       Manage set of tracked repositories.
    cache        Manage cache settings.
    metrics      Get metrics from all branches.
    install      Install dvc hooks into the repository.
    root         Relative path to project's directory.
    lock         Lock DVC file.
    unlock       Unlock DVC file.
    pipeline     Manage pipeline.
    daemon       Service daemon.
/nix/store/ydk0mfpvn9smcmn72wc9i20slv1d2b79-python3-3.7.2/bin/python3.7: No module named dvc
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@CMCDragonkai
Copy link
Member Author

What would be the solution to: /nix/store/ydk0mfpvn9smcmn72wc9i20slv1d2b79-python3-3.7.2/bin/python3.7: No module named dvc

@CMCDragonkai
Copy link
Member Author

Also I still need to figure out how to setup the remotes: https://github.com/iterative/dvc/blob/master/setup.py#L29-L43

I imagine that we can do these as options on the default.nix but suggestions would be good.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jan 24, 2019

Should it be like:

  propagatedBuildInputs = [
    ply
    configparser
    zc_lockfile
    future
    colorama
    configobj
    networkx
    pyyaml
    GitPython
    setuptools
    nanotime
    pyasn1
    schema
    jsonpath_rw
    requests
    grandalf
    asciimatics
    distro
    appdirs
  ]
  ++ lib.optional enableGoogle google_cloud_storage
  ++ lib.optional enableAWS boto3
  ++ lib.optional enableAzure azure-storage-blob
  ++ lib.optional enableSSH paramiko;

?


I have done this with option flags.

@CMCDragonkai
Copy link
Member Author

Given that the cloud remotes are in the extra_requires of setup.py. Is there anything special required in the buildPythonApplication to make sure the extra_requires dependency is in fact installed?

@CMCDragonkai
Copy link
Member Author

Theres's a problem here: iterative/dvc#1509 (comment)

@dotlambda
Copy link
Member

Theres's a problem here: iterative/dvc#1509 (comment)

I don't have any problem executing dvc.

@CMCDragonkai
Copy link
Member Author

@dotlambda You don't see this problem:

/nix/store/ydk0mfpvn9smcmn72wc9i20slv1d2b79-python3-3.7.2/bin/python3.7: No module named dvc

At the end of the dvc command?

@dotlambda
Copy link
Member

Nope, never saw it while playing around a little with dvc using nix-review pr 54530.

@worldofpeace
Copy link
Contributor

Nope, never saw it while playing around a little with dvc using nix-review pr 54530.

I can reproduce the message @CMCDragonkai is seeing when executing dvc like result/bin/dvc but not from within a nix-shell, which is probably why you didn't see it with nix-review.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jan 25, 2019

This might be the actual problem: iterative/dvc#1509 (comment)

Basically here https://github.com/iterative/dvc/blob/master/dvc/daemon.py#L69-L72

It tries to launch dvc daemon as a subcommand through the Python executable. Like "sys.executable" -m dvc. Because it's not critical to the process, things still work, but we end up reporting that error.

Other than the change they are going to make to the source code, what would normally be the solution here on Nixpkgs? Should the package somehow set the PYTHONHOME as well so that -m X works as a subprocess fork+exec style? See the _spawn_posix in the same file.

@CMCDragonkai
Copy link
Member Author

We have this problem: iterative/dvc#1509 (comment)

The issue is that dvc is Python application that uses the entry_points configuration to convert the libray module dvc into a callable script. This is usually fine as Nix figures it out and creates result/bin/dvc and result/bin/.dvc-wrapped.

However the problem is that within dvc they try to do a fork+exec on the dvc program. They do this here: https://github.com/iterative/dvc/blob/master/dvc/daemon.py#L66-L90

Basically try try to get the Python executable with with sys.executable and perform fork and exec of python -m dvc daemon -q.

Initially this did not work because within the Nix built package there was no way to find the dvc library module. Now that they have done iterative/dvc#1531 they can find it, but the executed subprocess cannot load any of the Python dependencies like colorama.

I'm not sure what's the best way to solve this problem.

@FRidh are you able to help here?

@dotlambda
Copy link
Member

You can simply patch the source to call $out/bin/dvc daemon.

@CMCDragonkai CMCDragonkai changed the title dvc: init at 0.24.0 dvc: init at 0.24.3 Jan 31, 2019
@CMCDragonkai
Copy link
Member Author

I think I fixed it. Basically patched it out so that it just calls dvc daemon -q. Interesting even though I didn't add its own store bin path to the PATH, it still works. Not sure if that's a fluke or designed. However this is annoying as subsequent commits will require fixes of that patch each time. It seems the Git patch is more stringent on its requirements, wouldn't a unified patch be more flexible? I don't know how to use it though.

@jeromebaum jeromebaum mentioned this pull request Jun 2, 2019
10 tasks
@CMCDragonkai
Copy link
Member Author

Feel free to take over @jeromebaum.

@schlichtanders
Copy link

looking forward to have dvc in nixos

what is the status here?

@CMCDragonkai CMCDragonkai mentioned this pull request Oct 2, 2019
10 tasks
@schlichtanders
Copy link

@jeromebaum @CMCDragonkai any update?

@jeromebaum
Copy link

jeromebaum commented Oct 3, 2019 via email

@jeromebaum
Copy link

Oh, sorry I just noticed that I was replying on a different PR than I thought. Please ignore my comment on it being unmergeable. That said, I haven't done anything further on this PR either and I'd assume that @CMCDragonkai is not opposed to you taking it over to bring it to completion.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 14, 2019

Just one thing left to do and this can be merged. But somebody else will need to update the derivation expression to the latest version. Don't have the bandwidth to do this.

@Rakesh4G
Copy link
Contributor

Hi @CMCDragonkai , I am attaching updated default.nix and patch file for you reference here.

dvc_nix.txt
dvc_patch.txt

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 28, 2019

This now works. I've tested a nix-build for both targets. Executed the bin commands, no more errors. Ready to be merged.

@worldofpeace @dotlambda

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 28, 2019 via email

Copy link
Contributor

@worldofpeace worldofpeace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think the remotes variant is reasonable.

@worldofpeace worldofpeace merged commit 9121c91 into NixOS:master Oct 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request to package dvc
8 participants