Experimenting with submodules in nixpkgs #37527

catern · 2018-03-21T04:06:35Z

This is an exploratory pull request to examine one possible future for a project like nixpkgs.

These changes add support to nixpkgs for using git submodules in the nixpkgs git repository to find sources for derivations. This uses builtins.fetchGit to remove the need to specify a sha256, while still maintaining complete reproducibility.
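Why a rev can stand in for a sha256: git object IDs are themselves content hashes. A blob's ID is the SHA-1 of a short header plus the file contents, a tree's ID covers the IDs of its entries, and a commit's ID covers its tree, so pinning a commit pins every byte beneath it. The Python sketch below is not part of this PR; it recomputes blob IDs and the ID of a flat tree of regular files, and deliberately ignores subdirectories, symlinks and executable modes.

```python
import hashlib


def git_object_id(kind, payload):
    """SHA-1 object ID git assigns to an object of the given kind."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content):
    """Object ID of a file's contents (what git hash-object prints)."""
    return git_object_id("blob", content)


def flat_tree_id(files):
    """Object ID of a directory containing only regular, non-executable files.

    files maps file name -> bytes. Each tree entry is "100644 <name>", a NUL
    byte, then the entry's 20-byte binary object ID; entries are sorted by
    name. Subdirectories (which git sorts as if named "<name>/") are not
    handled in this sketch.
    """
    entries = b""
    for name in sorted(files):
        entries += b"100644 " + name.encode() + b"\0"
        entries += bytes.fromhex(blob_id(files[name]))
    return git_object_id("tree", entries)
```

For instance, blob_id(b"test content\n") returns d670460b4b4aece5915caf5c68d12f560a9fe3e4, the same ID git hash-object reports for that content, and flat_tree_id({}) returns git's well-known empty-tree ID. Changing any file changes its blob ID, hence the tree ID, hence the commit ID.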

To demonstrate these changes, two submodules are added, and the corresponding two packages are updated to use those two submodules for their sources.

To keep things simple, most of the git interaction is currently implemented in a simple Python script rather than in Nix expressions; thus you must run ./prepare_sources.py > sources.json in the root of the Nixpkgs repository whenever you init or deinit a submodule. This limitation can be removed with additional work by porting that Python code to Nix expressions.
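The PR's actual prepare_sources.py is not reproduced here, so the following is only a guess at the kind of work such a script does: read the submodule state from git and emit one JSON record per submodule. The JSON layout (url, rev, initialized) and all helper names are hypothetical. It relies on two real commands, git submodule status and git config --file .gitmodules --get-regexp, and assumes submodule names and paths contain no whitespace.

```python
import json
import subprocess


def parse_submodule_status(text):
    """Parse `git submodule status` output into {path: {"rev", "initialized"}}.

    Each line starts with a one-character flag ("-" = not initialized,
    "+" = checkout differs from the recorded commit, "U" = merge conflicts,
    " " = in sync), then the SHA-1, the path and, for initialized
    submodules, a parenthesized `git describe` string.
    """
    status = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        rev, path = rest.split()[:2]
        status[path] = {"rev": rev, "initialized": flag != "-"}
    return status


def parse_gitmodules(text):
    """Turn `git config --file .gitmodules --get-regexp '^submodule\\.'`
    output ("submodule.<name>.<key> <value>" per line) into {path: url}."""
    by_name = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(" ", 1)
        name, field = key[len("submodule."):].rsplit(".", 1)
        by_name.setdefault(name, {})[field] = value
    return {m["path"]: m["url"] for m in by_name.values() if "path" in m and "url" in m}


def build_sources(status, urls):
    """Merge both views into the (hypothetical) sources.json layout."""
    return {path: {"url": urls.get(path), **info} for path, info in sorted(status.items())}


def main():
    """Print sources.json for the git repository in the current directory."""
    status = subprocess.run(
        ["git", "submodule", "status"],
        capture_output=True, text=True, check=True,
    ).stdout
    # No check=True here: exit status 1 only means "no matching keys".
    config = subprocess.run(
        ["git", "config", "--file", ".gitmodules", "--get-regexp", r"^submodule\."],
        capture_output=True, text=True,
    ).stdout
    sources = build_sources(parse_submodule_status(status), parse_gitmodules(config))
    print(json.dumps(sources, indent=2))
```

Saved as prepare_sources.py with a shebang line and a call to main() at the bottom, a script like this could be run as described above, ./prepare_sources.py > sources.json.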

First off, the most pedestrian benefit. Representing the source version as a submodule makes it easier to update the version of sources used in Nixpkgs. Just git pull in the submodule, git add in Nixpkgs, and commit, and you're done. It's equally easy to update to a new tag; just git fetch && git checkout instead of git pull. In either case, there's no need to edit any Nix expression.
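No Nix expression needs editing because the superproject records each submodule as a gitlink: a tree entry with mode 160000 whose object is the pinned commit, so git add in Nixpkgs just stages the submodule's new HEAD as that entry. The small helper below is hypothetical; it reads those pins out of git ls-tree -r HEAD, assuming the default (non -z) output format and paths that git does not need to quote.

```python
GITLINK_MODE = "160000"  # tree-entry mode git uses for submodule commits


def parse_gitlinks(ls_tree_output):
    """Return {submodule path: pinned commit} from `git ls-tree -r HEAD` output.

    Each line has the form "<mode> <type> <object>\\t<path>"; ordinary files
    show up as blobs, submodules as "160000 commit <sha>".
    """
    pins = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode, obj_type, obj = meta.split()
        if mode == GITLINK_MODE and obj_type == "commit":
            pins[path] = obj
    return pins
```

Run before and after a git pull plus git add in one submodule, this would show exactly one entry changing, which is all the version bump amounts to from Nixpkgs' point of view.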

Beyond that, this simple change enables an extremely powerful new workflow for open source software developers.

If I wish to make a change to some software, I just initialize the corresponding submodule in nixpkgs, and start hacking:

git submodule update --init sources/tools/system/supervise # init alone does not clone; update checks out the pinned commit
cd sources/tools/system/supervise
vi foo.c # hack hack hack

Any changes in the now-initialized submodule will be automatically picked up when I next build:

nix-build ~/my-nixpkgs -A supervise
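One way to get this behavior, sketched in Python rather than in the Nix code of the PR, is to branch on whether the submodule is initialized: an initialized submodule is fetched from its local working copy with no rev, so local edits are visible, while an uninitialized one is fetched from its upstream URL at the pinned rev. The FetchSpec type, the entry layout and resolve_source are hypothetical; only the url and rev names mirror real attributes accepted by builtins.fetchGit.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FetchSpec:
    """Roughly the arguments one would hand to builtins.fetchGit."""
    url: str            # local checkout path or upstream git URL
    rev: Optional[str]  # None: use whatever is in the local checkout


def resolve_source(repo_root, path, entry):
    """Pick the source for one submodule from its (hypothetical) sources.json entry."""
    if entry["initialized"]:
        # Hacking on it locally: build from the working copy, unpinned.
        return FetchSpec(url=posixpath.join(repo_root, path), rev=None)
    # Not checked out: fetch exactly the commit recorded in the superproject.
    return FetchSpec(url=entry["url"], rev=entry["rev"])
```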

If I have committer privileges for the project, pushing my changes is then as easy as git push in the submodule. If I don't, I can just go through the project's normal contribution workflow; with GitHub, that would be as easy as hub fork && hub pull-request.

In either case, to add my change to nixpkgs, all I have to do is git add the now-changed submodule, commit, and push:

git add sources/tools/system/supervise
git commit -m "supervise: update from blah to blah"
git push

This is already a huge win on its own. But it's what happens when I'm working on multiple pieces of software at once that really makes this transformative.

I can check out any number of different projects with arbitrary dependency relationships between them, and rebuilding the tree with all my changes is as easy as nix-build -A somepkg.

If, for example, I'm working on a project written in Python packaged in nixpkgs, and I see an issue in CPython that I should fix, I just open up the CPython submodule, fix the issue, and my project immediately benefits. If I didn't quite fix the problem, I can easily keep iterating, tweaking my own project and CPython simultaneously.
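Nix works out what to rebuild on its own: changing a source changes the hash of its derivation and of every derivation that depends on it. The sketch below only illustrates which packages end up rebuilt, as a reverse-dependency closure computed with a breadth-first search; the function and the package names in the example are hypothetical.

```python
from collections import deque


def affected_packages(deps, changed):
    """Return every package that is, or transitively depends on, a changed one.

    deps maps each package to the packages it depends on directly.
    """
    # Invert the edges: for each package, record who depends on it.
    dependents = {}
    for pkg, direct in deps.items():
        for dep in direct:
            dependents.setdefault(dep, set()).add(pkg)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        pkg = queue.popleft()
        for parent in dependents.get(pkg, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

With deps = {"my-project": ["cpython"], "cpython": ["openssl"], "openssl": []}, editing the CPython submodule gives {"cpython", "my-project"}: my own project is rebuilt against the patched interpreter, and openssl is left alone.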

It's likewise just as easy to distribute in-progress changes to multiple projects. I commit my in-progress changes and push them to my nixpkgs fork. Then anyone who clones my nixpkgs fork can immediately start working on the same codebases with the same changes. (Tooling could make this incredibly simple, using techniques such as on-demand creation of an "omega repo"; see https://github.com/twosigma/git-meta/wiki/The-Omega-Repo.)
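Because everything a fork changes is captured in its submodule pins, the set of projects it touches can be listed by comparing two {path: rev} maps, for example ones read from git ls-tree -r on upstream and on the fork. The helper below is a hypothetical sketch of that comparison.

```python
def changed_pins(upstream, fork):
    """Return {path: (upstream_rev, fork_rev)} for submodules whose pin differs.

    A rev of None means the submodule does not exist on that side.
    """
    paths = upstream.keys() | fork.keys()
    return {
        p: (upstream.get(p), fork.get(p))
        for p in sorted(paths)
        if upstream.get(p) != fork.get(p)
    }
```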

I suspect there are numerous other advantages as well which are not yet obvious, and that making common operations so much cheaper may unlock a radically different way of working on open source.

I think organizing projects this way could truly transform software development, and it would be a major incentive to use Nix as the backbone of a software project. In a certain way, this would position Nixpkgs as an "open source monorepo", a place where cross-project integration work could be done with ease, without the scaling issues of traditional monorepos.

Of course, tying Nixpkgs so deeply to Git may be undesirable, though it wouldn't prevent us from doing anything we currently do. There are some practical downsides as well; git submodules can be tricky to work with, though there are projects such as git-meta (https://github.com/twosigma/git-meta) attempting to make them easier to use. builtins.fetchGit currently has some scaling problems with its git cache, which may make it difficult to do something like this before those issues are fixed.

Nevertheless, for projects other than Nixpkgs, such as separate Nixpkgs overlays for a few related packages, I think this kind of organization makes a lot of sense. It would be nice to see some projects openly experiment with a submodule-based Nixpkgs overlay.

What do you think?

FRidh · 2018-03-21T06:31:09Z

Thank you for looking into this. There are pros and cons to using submodules.

> These changes add support to nixpkgs for using git submodules in the nixpkgs git repository to find sources for derivations. This uses builtins.fetchGit to remove the need to specify a sha256, while still maintaining complete reproducibility.

With builtins.fetchGit you do not need a hash for a local checkout of the submodule (e.g. when editing). But once you have committed and pushed a certain version, the submodule path will contain a revision.

The advantages you've listed are nice ones. If we were to move not just the source but also an expression into a submodule, we would get the added benefit of being able to set up different permissions, although it also has a clear disadvantage: not being able to see all expressions directly. Simply initializing all submodules is not an option, as it takes too long.

catern · 2018-04-04T19:01:01Z

> If we would move not just the source but an expression in a submodule, we get the added benefit of being able to set up different permissions, although it also has a clear disadvantage: not being able to see all expressions directly.

Yes, I think that with regard to putting package expressions in the "same place" as package sources, submodules-in-nixpkgs has all the same issues as tarballs. Getting the package expression out of the submodule will have the same advantages and the same disadvantages as getting a package expression out of a tarball.

I think submodules might reduce the need to move package expressions to the "same place" as package sources, though. If the package source is a submodule of the nixpkgs repo, the package expressions and package source are already conceptually a lot "closer together", and maybe that obsoletes some of the reasons for putting expressions next to sources. (Sorry, super vague I know :))

On the other hand, Nix expressions in submodules could be really interesting for another use case: An overlay repo (or even an individual project) could put Nixpkgs in a submodule, rather than doing pinning by other means. Or to reverse it: maybe you could modularize Nixpkgs into multiple overlays, pulled in as submodules, which themselves have submodules pointing to the source code for their packages. Of course, then you'd have nested submodules, which sounds like a nightmare, but maybe with sufficient tooling could actually be very cool.
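The nested case is where tooling would matter most. As a hypothetical starting point, nested pins can be flattened into full paths, much as git submodule status --recursive reports nested submodules. The dict layout (a rev plus an optional submodules map) and the function name are assumptions made for illustration.

```python
def flatten_pins(pins, prefix=""):
    """Flatten nested submodule pins into {full/path: rev}.

    pins maps a submodule path to {"rev": <sha>, "submodules": {...}},
    where "submodules" (optional) has the same shape, one level down.
    """
    flat = {}
    for path, node in pins.items():
        full = prefix + path
        flat[full] = node["rev"]
        # Recurse into the submodule's own submodules, if it has any.
        flat.update(flatten_pins(node.get("submodules", {}), full + "/"))
    return flat
```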

FRidh · 2019-01-06T12:36:02Z

While it was an interesting experiment, it won't go in so I am closing this.

catern added 2 commits March 21, 2018 03:45

add submodules and update packages to use them

5122dc4

See previous commit message for details.

catern requested a review from FRidh as a code owner March 21, 2018 04:06

GrahamcOfBorg added 6.topic: fetch 6.topic: python labels Mar 21, 2018

FRidh closed this Jan 6, 2019