Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r-modules: add GitHub only r packages #93883

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cfhammill
Copy link
Contributor

@cfhammill cfhammill commented Jul 26, 2020

Motivation for this change

Frequently for an analysis a package is required that isn't released on CRAN or bioconductor, but has a usable version on github. This PR adds the ability to integrate these packages with the rPackages set and adds two example packages https://github.com/jtimonen/lgpr and https://github.com/stan-dev/posterior.

Things done

In order to do this I:

  • Updated cran-packages.nix to ensure up to date dependencies
  • Added a github-packages.nix file
  • Modified generate-r-packages.R to accept the argument github
  • When github is passed, a fixed list of github packages are traversed, using nix-prefetch-git to get the commit hash and sha256 for the current commit at HEAD. (This list is currently kept as an R vector in generate-R-packages.R, but could easily be made external as a text file).
  • Modifies r-modules/default.nix to have a deriveGithub that constructs the homepage as the repo url and uses fetchgit to download the sources.
  • Adds the github packages first to r-modules/default.nix so that a stable release on either bioconductor or CRAN would supersede the github version.
  • Tested with nix-shell -p rPackages.posterior works, nix-shell -p rPackages.lgpr fails to build due to failure to build rPackages.StanHeaders

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@bcdarwin
Copy link
Member

Having lpgr, INLA, and ITKR available would certainly be useful.

@bcdarwin
Copy link
Member

The CRAN bump should probably be done separately.

@bcdarwin
Copy link
Member

Since Github is not even minimally reviewed unlike CRAN/bioc, the script to generate the package set should probably also take commits and hashes as inputs as well as owner and repo for at least security reasons.

@bcdarwin
Copy link
Member

bcdarwin commented Jul 27, 2020

Is there anything special about GitHub here or should this be generalized to arbitrary source repos?

@cfhammill
Copy link
Contributor Author

cfhammill commented Jul 27, 2020

Happy to split that out, it may resolve the current breakage with lgpr (although it would then break building posterior). Fixing the StanHeaders build failure is important because many packages depend on it.

I think I agree regarding freezing commit and sha256 for the github packages (instead of refreshing each time DESCRIPTION has a version bump). This logic could be separated to another script. Alternatively, since generate-r-packages.R doesn't delete old entries in e.g. github-packages.nix, we could make it so the list of packages to add or version bump needs to be passed explicitly on the command line (I think I like this idea best).

As for generalizing to other sources of git packages, some of the logic is github specific. For example downloading the raw DESCRIPTION file without cloning the repo is likely github (possibly gitlab) exclusive.

Addendum: This doesn't stop us from generalizing, since we're running nix-prefetch-git we have the DESCRIPTION file locally, so we could presumably use that copy, but we'd have to parse the store path out of the nix-prefetch-git verbose output. But this may be better saved for the next iteration.

@cfhammill
Copy link
Contributor Author

I've added commandline handling, fixed the code so old packages are maintained if not passed on the commandline, modified the code to use the parallel map for the git fetches and parsing. And squashed the history.

@peti what are your thoughts on adding this feature?

@cfhammill cfhammill force-pushed the cfh/add-github-only-r-packages branch 3 times, most recently from 32c515d to 072af71 Compare August 7, 2020 13:24
- adds a (currently two package github-packages.nix)
- add github case to generate-R-packages.R
- accept repo specs of the form user/repo@commit on the commandline
- preserves old versions unchanged if not passed on the commandline
- use the multicore map to handle the repo fetching/parsing
- use nix-prefetch-git to fetch sha256 from repo
- reads DESCRIPTION directly from github (TODO use prefetched repo)
@cfhammill cfhammill force-pushed the cfh/add-github-only-r-packages branch from 072af71 to 9406644 Compare August 7, 2020 14:12
@alexvorobiev
Copy link
Contributor

Thanks for this! We also have a number of proprietary R packages in the internal git repository. Can the PR be extended to support that?

@bcdarwin
Copy link
Member

bcdarwin commented Oct 7, 2020

@alexvorobiev It could, but I think we're basically waiting for some signal from @peti that this will be merged at some point or what sort of modifications to make. I think we need to split the CRAN package regeneration into a separate PR at least but other than that this should be ready to go.

@alexvorobiev
Copy link
Contributor

@bcdarwin Since the R packages in nixpkgs are updated so infrequently I am working on an overlay that uses the copies of the files from r-modules. I am going to apply the changes from this PR to my overlay because I have an immediate use case for them. I will also try to add our internal CRAN-like repository (we use Artifactory) and some way to install packages from multiple snapshots of MRAN (the new data.table doesn't build for some reason). Thanks!

Copy link
Contributor

@mnacamura mnacamura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I think it is fine except that some documentation needs to be updated.

Comment on lines 41 to 43
# Templates for generating Bioconductor and CRAN packages
# from the name, version, sha256, and optional per-package arguments above
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that github packages are supported, imho this comment and /doc/languages-frameworks/r.section.md should be updated appropriately.

Comment on lines +2 to +4
# Execute the following command to update the file.
#
# Rscript generate-r-packages.R github >new && mv new github-packages.nix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This instruction is incorrect for github so misleading for users.

@alexvorobiev
Copy link
Contributor

alexvorobiev commented Jan 15, 2021

Are there any updates to this PR or obstacles? In general I think it would be very useful to have mechanisms to

  1. provide custom derive* functions (overriding the current ones if needed)
  2. additional package derivation lists (my-own-packages.nix)
  3. adjust the lists like packagesWithNativeBuildInput

without modifying https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/r-modules/default.nix or https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/r-modules/generate-r-packages.R. I have an internal mirror of CRAN and a number of proprietary packages, so every time there are updates to that file (I copied it to an overlay) I need to merge my changes to the new version.

Copy link
Member

@peti peti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have discussed my thoughts about this issue in great detail at https://www.twitch.tv/videos/878608118?t=1h5m36s.

The gist of if it is that IMHO the complexity of generate-r-packages.R is getting out of hand and that I feel less and less able to maintain that code. I am a moderately competent user of R, but I am by no means a developer. I have never written an R package myself. I came up with some simple and straightforward solution to generate our package set a few years ago that worked, but my impression is that this code should really be taken over by someone more knowledgeable than myself.

So, I am perfectly happy to see these changes merged, but I would prefer someone else to take over the responsibility of maintaining of the R package set in that case because I don't want to find myself in a situation where that generate-r-packages.R script no longer works and I have to feel like I'm responsible for fixing it.

If anyone is up to the task, please let me know and I'll make the necessary changes in .github/CODEOWNERS to make sure that all pull requests go through you. Anyone?

@bcdarwin
Copy link
Member

bcdarwin commented Feb 5, 2021

Hi @peti,

I have discussed this with my colleague @cfhammill and we're willing to become co-maintainers. Realistically we can't promise to refactor this code in the near term but we'll certainly try to stop it from growing further out of control.

(Of course, this PR shouldn't be merged yet due to unresolved issues.)

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/automated-package-updates/12908/9

@jbedo
Copy link
Contributor

jbedo commented May 7, 2021

@peti @bcdarwin @cfhammill I'd also be willing to be a co-maintainer. I've reasonably frequently updated this package set and am a frequent user of R and it's packages in Nix.

As for this pull request, I also feel it's a bit too github specific and that it should be generalised, especially since the repository is cloned anyway to compute a hash so we may as well read the DESCRIPTION file from there.

@bcdarwin
Copy link
Member

I have discussed my thoughts about this issue in great detail at https://www.twitch.tv/videos/878608118?t=1h5m36s.

The gist of if it is that IMHO the complexity of generate-r-packages.R is getting out of hand and that I feel less and less able to maintain that code. I am a moderately competent user of R, but I am by no means a developer. I have never written an R package myself. I came up with some simple and straightforward solution to generate our package set a few years ago that worked, but my impression is that this code should really be taken over by someone more knowledgeable than myself.

So, I am perfectly happy to see these changes merged, but I would prefer someone else to take over the responsibility of maintaining of the R package set in that case because I don't want to find myself in a situation where that generate-r-packages.R script no longer works and I have to feel like I'm responsible for fixing it.

If anyone is up to the task, please let me know and I'll make the necessary changes in .github/CODEOWNERS to make sure that all pull requests go through you. Anyone?

Hi @peti -- any word on this? Three people in this thread have responded to your call to volunteer to co-maintain the R packages set. Admittedly no-one has generalized this PR yet but maybe out of a sense that merging of changes to the R infrastructure has somewhat stalled, r-packages is not being regularly evaluated on Hydra, etc. Thanks!!

cc @cfhammill @jbedo

@PhDyellow PhDyellow mentioned this pull request Aug 20, 2021
11 tasks
@@ -36,11 +36,31 @@ let
meta.platforms = R.meta.platforms;
meta.hydraPlatforms = hydraPlatforms;
meta.broken = broken;
});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
});
});

@@ -3,7 +3,7 @@
{ R, pkgs, overrides }:

let
inherit (pkgs) cacert fetchurl stdenv lib;
inherit (pkgs) cacert fetchurl fetchgit stdenv lib;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use fetchFromGitHub?

Comment on lines +9 to +10
lgpr = derive2 { name="lgpr"; version="0.33.2"; sha256="14i1vf9119cypjnacy43flfzx3lsjhax5zg33h4kjsyjbfl68ngi"; depends=[MASS Rcpp bayesplot ggplot2 ggpubr rstan rstantools]; url = "https://github.com/jtimonen/lgpr"; rev = "5a25fa8a1b21a34f915461802e21f592184d6e67"; };
posterior = derive2 { name="posterior"; version="0.1.0"; sha256="08psb9cknkva4awbrz38pdr7ssic0za31hji4i90vjb0vdnjn4k0"; depends=[abind checkmate rlang tibble]; url = "https://github.com/stan-dev/posterior"; rev = "4794130b548511c9861fc0085ba5a677afeba6fe"; };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lgpr = derive2 { name="lgpr"; version="0.33.2"; sha256="14i1vf9119cypjnacy43flfzx3lsjhax5zg33h4kjsyjbfl68ngi"; depends=[MASS Rcpp bayesplot ggplot2 ggpubr rstan rstantools]; url = "https://github.com/jtimonen/lgpr"; rev = "5a25fa8a1b21a34f915461802e21f592184d6e67"; };
posterior = derive2 { name="posterior"; version="0.1.0"; sha256="08psb9cknkva4awbrz38pdr7ssic0za31hji4i90vjb0vdnjn4k0"; depends=[abind checkmate rlang tibble]; url = "https://github.com/stan-dev/posterior"; rev = "4794130b548511c9861fc0085ba5a677afeba6fe"; };
lgpr = derive2 { name = "lgpr"; version = "0.33.2"; sha256 = "14i1vf9119cypjnacy43flfzx3lsjhax5zg33h4kjsyjbfl68ngi"; depends = [ MASS Rcpp bayesplot ggplot2 ggpubr rstan rstantools ]; url = "https://github.com/jtimonen/lgpr"; rev = "5a25fa8a1b21a34f915461802e21f592184d6e67"; };
posterior = derive2 { name = "posterior"; version = "0.1.0"; sha256 = "08psb9cknkva4awbrz38pdr7ssic0za31hji4i90vjb0vdnjn4k0"; depends = [ abind checkmate rlang tibble ]; url = "https://github.com/stan-dev/posterior"; rev = "4794130b548511c9861fc0085ba5a677afeba6fe"; };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PhDyellow PhDyellow mentioned this pull request Aug 23, 2021
11 tasks
@PhDyellow
Copy link

I want to know if I understand this PR. Does the workflow for github packages look like this:

  1. generate_r_packages.R github user/repo@commit user2/repo2@commit2 ....
  2. Move the output to overwrite github-packages.nix
  3. Push the changes to NixOS/nixpkgs

So users wanting to update the github package set do so on a per package basis, with no bulk update of github packages for security reasons?

I frequently use overlays to bring in a package from Github, sometimes specifically the development version of a CRAN package, which needs to override the CRAN version. How would I do that?

Similar to @alexvorobiev, I would like to see functions like buildRpackage exposed, so I can write things like

environment.systemPackages = [
pkgs.rstudioWrapper.override {
  packages = [
    (pkgs.buildRpackage { name="lgpr"; version="0.33.2"; sha256="14i1vf9119cypjnacy43flfzx3lsjhax5zg33h4kjsyjbfl68ngi"; depends=[MASS Rcpp bayesplot ggplot2 ggpubr rstan rstantools]; url = "https://github.com/jtimonen/lgpr"; rev = "5a25fa8a1b21a34f915461802e21f592184d6e67"; }; };)
  ]; 
];

@peti
Copy link
Member

peti commented Aug 23, 2021

Hi @peti -- any word on this? Three people in this thread have responded to your call to volunteer to co-maintain the R packages set.

Yeah, this is great! Is there anything you guys need from me to get started maintaining the package set? What is blocking you?

r-packages is not being regularly evaluated on Hydra, etc.

Hydra frequently evaluates https://hydra.nixos.org/jobset/nixpkgs/r-updates. One just needs to push the to-be-evaluated code to the r-updates branch, e.g. via a PR.

@PhDyellow
Copy link

Yeah, this is great! Is there anything you guys need from me to get started maintaining the package set? What is blocking you?

I think you just need to add some of them to CODEOWNERS for R. It looks like you could add all of @cfhammill @jbedo @bcdarwin. Take yourself off if you want.

/pkgs/applications/science/math/R @peti

Hydra frequently evaluates https://hydra.nixos.org/jobset/nixpkgs/r-updates. One just needs to push the to-be-evaluated code to the r-updates branch, e.g. via a PR.

Thanks, I didn't know that it needed to go to a specific branch.

peti added a commit that referenced this pull request Aug 23, 2021
@peti
Copy link
Member

peti commented Aug 23, 2021

I added @jbedo and @bcdarwin in 430a043. I didn't add @cfhammill because that user name is not listed in maintainers/maintainer-list.nix and I wasn't sure whether that would be a problem or not.

@zowoq
Copy link
Contributor

zowoq commented Aug 23, 2021

https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-code-owners
The people you choose as code owners must have write permissions for the repository. When the code owner is a team, that team must have write permissions, even if all the individual members of the team already have write permissions directly, through organization membership, or through another team membership.

You need write access to the repo to get notifications via CODEOWNERS.

@bcdarwin
Copy link
Member

bcdarwin commented Aug 23, 2021

I am starting a hydra job for r-updates but it seems that Hydra's config causes it to check out @peti's r-updates branch which is somewhat stale, and I don't have permission to edit the config. Should this be changed to point to my own r-updates (or nixpkgs's [a bit less plausibly])?

@jbedo
Copy link
Contributor

jbedo commented Aug 23, 2021

I am starting a hydra job for r-updates but it seems that Hydra's config causes it to check out @peti's r-updates branch which is somewhat stale, and I don't have permission to edit the config. Should this be changed to point to my own r-updates (or nixpkgs's [a bit less plausibly])?

Haskell's update branch is off NixOS/nixpkgs, so maybe we should be the same?

@TTTPOB
Copy link
Contributor

TTTPOB commented Oct 28, 2021

how is this pr going on? I've used code from this pr and added a file like this, and source it to get github packages in generate-r-packages.R

image

I think file like this should be more convenient that call generate-r-packages.R with all github packages

@jbedo
Copy link
Contributor

jbedo commented Oct 28, 2021

I think it'd be great to have some support for github based r packages, but I'm now not sure they should be part of rPackages. Is there any reason to have them part of the rPackages tree? I'm concerned about conflicts and how to resolve them. Specifically, it's not unusual to have a github hosted R package get listed in CRAN/BIOC, at which point there is a conflict to resolve as it will show up as part of the CRAN/BIOC package set automatic update.

Another option to consider is to just manually manage these github packages outside of the automatically generated rPackages tree, like any other package in nixpkgs. This would allow us to leverage tools like nix-update instead to keep these github packages up to date, rather than reimplementing the same functionality in generate-r-packages. It also would allow an opportunity to properly fill out the metadata (including importantly maintainers), something that is currently missing from the generated rPackages and that I'm keen on fixing.

One final though is that many use cases might be covered by an r2nix tool, or with a recursive nix expression that parses DESCRIPTION. We can't consider recursive nix as part of nixpkgs but it would allow the easy addition of arbitrary packages as part of an R environment.

@cfhammill
Copy link
Contributor Author

@jbedo I see where you're coming from. It was so long ago now, but I think the order preference was CRAN, BioC, then git for the case of cross-listed packages. This issue would already be occurring with many packages appearing both on CRAN and in BioC.

I think it is probably preferable to append the git packages to the rPackages tree, perhaps with a e.g. .gitPackages subtree. Nice to keep everything in one place in my opinion. Its tough to know at this moment the scale of non CRAN non BioC packages that people would want though, if it's 5 then maybe a nix file per package is fine, but if there are hundreds maybe not. What do you think?

@jbedo
Copy link
Contributor

jbedo commented Dec 23, 2021

This issue would already be occurring with many packages appearing both on CRAN and in BioC.

It's part of BioC naming policy to avoid naming collisions with CRAN so this should be very infrequent (from https://www.bioconductor.org/developers/package-submission/):

Packages should be named in a way that does not conflict (irrespective of case) with any current or past BIOCONDUCTOR package, nor any current CRAN package.

Currently there is one package (interacCircos) that is both in CRAN and BioC, and it probably should be reported upstream to resolve the issue.

I think it is probably preferable to append the git packages to the rPackages tree, perhaps with a e.g. .gitPackages subtree. Nice to keep everything in one place in my opinion. Its tough to know at this moment the scale of non CRAN non BioC packages that people would want though, if it's 5 then maybe a nix file per package is fine, but if there are hundreds maybe not. What do you think?

Even if it's hundreds I don't think it will be an issue, after all most of nixpkgs is handled this way. I still feel like generate-r-packages is the wrong machinery for updating github hosted packages and that other existing tooling (nix-update) is more suitable. This also already handles non-github hosted repositories (at least gitlab).

I also agree with you and think it's useful to have them in grouped in a subtree not top-level, maybe in another top-level tree called rPackagesExtra so there's a clear distinction between the CRAN/BioC repositories and packages hosted elsewhere. I'm not keen on a subtree of rPackages as we may end up with a conflict if a CRAN/BioC package with the same name appears.

@cfhammill
Copy link
Contributor Author

Ok all sounds good to me. I will take a stab at redoing this PR, adding a top level rPackagesExtra tree with up to date versions of the packages above.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-there-a-way-to-add-r-packages-straight-from-github/17871/2

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 21, 2022
@nviets
Copy link
Contributor

nviets commented Feb 1, 2023

@cfhammill - do you have a branch started for adding a top-level rPackagesExtra? I'm interested in picking this up again.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Feb 1, 2023
@cfhammill
Copy link
Contributor Author

@nviets I haven't been working on it, if you wanted to start a new PR with the feature I'd be happy to review

@nviets
Copy link
Contributor

nviets commented Feb 1, 2023

Thanks for confirming. I'll try to take a look soon.

@nviets nviets mentioned this pull request Mar 23, 2023
4 tasks
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024
@wegank wegank marked this pull request as draft March 20, 2024 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 10.rebuild-darwin: 0 10.rebuild-linux: 0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet