Experimenting with submodules in nixpkgs #37527

Closed
catern wants to merge 2 commits

catern (Contributor) commented Mar 21, 2018

This is an exploratory pull request to examine one possible future for a project like nixpkgs.

These changes add support to nixpkgs for using git submodules in the nixpkgs git repository to find sources for derivations. This uses builtins.fetchGit to remove the need to specify a sha256 while still maintaining complete reproducibility, since each submodule is pinned to an exact commit.
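
One way to see why a pinned commit can stand in for a sha256: git object IDs are themselves content hashes, so a commit ID fixes the exact bytes of every file it reaches. The sketch below only illustrates that property for a single file; it is not how builtins.fetchGit is implemented. It assumes a repository using git's default SHA-1 object format, and `git_blob_id` is a name made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a file (blob) with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match `echo 'hello world' | git hash-object --stdin`.
    print(git_blob_id(b"hello world\n"))
```

Changing a single byte of the file changes its blob ID, which changes the tree ID, which changes the commit ID recorded for the submodule.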

To demonstrate these changes, two submodules are added, and the corresponding two packages are updated to use those two submodules for their sources.

To keep things simple, most of the git interaction is currently implemented in a simple Python script rather than in Nix expressions. As a result, you must run ./prepare_sources.py > sources.json in the root of the Nixpkgs repository whenever you init or deinit a submodule. Porting that Python code to Nix expressions would remove this limitation.
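
As a rough idea of what such a script might do (the real prepare_sources.py is not reproduced here, so the function names and the JSON shape below are assumptions), it could parse the output of `git submodule status` and record, for each submodule path, the pinned commit and whether the submodule is initialized:

```python
import json
import subprocess


def parse_submodule_status(output: str) -> dict:
    """Turn `git submodule status` text into {path: {"rev", "initialized"}}."""
    sources = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # The first column is a state flag: "-" uninitialized, " " in sync,
        # "+" checked-out commit differs from the recorded one, "U" conflicts.
        flag, rest = line[0], line[1:]
        # Assumes submodule paths contain no whitespace.
        rev, path = rest.split()[:2]
        sources[path] = {"rev": rev, "initialized": flag != "-"}
    return sources


def read_submodule_status(repo_root: str = ".") -> str:
    """Run git in repo_root and return the raw status text."""
    return subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_root, check=True, capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    # Sample input with placeholder revisions; the second path is hypothetical.
    sample = (
        "-1111111111111111111111111111111111111111 sources/tools/system/supervise\n"
        " 2222222222222222222222222222222222222222 sources/example/other (v1.0)\n"
    )
    print(json.dumps(parse_submodule_status(sample), indent=2, sort_keys=True))
```

Inside a real checkout, `parse_submodule_status(read_submodule_status())` would produce the same structure from the live repository.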

First off, the most pedestrian benefit. Representing the source version as a submodule makes it easier to update the version of sources used in Nixpkgs. Just git pull in the submodule, git add in Nixpkgs, and commit, and you're done. It's equally easy to update to a new tag; just git fetch && git checkout instead of git pull. In either case, there's no need to edit any Nix expression.

Beyond that, this simple change enables an extremely powerful new workflow for open source software developers.

If I wish to make a change to some software, I just initialize the corresponding submodule in nixpkgs, and start hacking:

git submodule update --init sources/tools/system/supervise
cd sources/tools/system/supervise
vi foo.c # hack hack hack

Any changes in the now-initialized submodule will be automatically picked up when I next build:

nix-build ~/my-nixpkgs -A supervise
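
The choice being made at build time could be sketched roughly as follows. This is an assumption about the logic, not the code in this pull request: the entry shape (`rev`, `url`, `initialized`), the function name `choose_source`, and the example URL in the usage note are all hypothetical.

```python
from pathlib import Path


def choose_source(nixpkgs_root: str, path: str, entry: dict) -> dict:
    """Decide where a package's source comes from.

    `entry` uses a hypothetical shape:
    {"rev": str, "url": str, "initialized": bool}.
    """
    if entry["initialized"]:
        # Initialized submodule: build from the working tree, so local edits
        # are picked up without committing or updating any hash.
        return {"kind": "local", "path": str(Path(nixpkgs_root) / path)}
    # Uninitialized submodule: fetch exactly the commit recorded in nixpkgs.
    return {"kind": "pinned", "url": entry["url"], "rev": entry["rev"]}
```

For example, `choose_source("/home/me/my-nixpkgs", "sources/tools/system/supervise", {"rev": "...", "url": "https://example.org/supervise.git", "initialized": True})` would select the local working tree, while the same entry with `"initialized": False` would select the pinned commit.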

If I have commit privileges to the project, pushing my changes is then as easy as git push in the submodule. If I don't have commit privileges, I can just go through the project's normal contribution workflow; with GitHub, that would be as easy as hub fork && hub pull-request.

In either case, to add my change to nixpkgs, all I have to do is git add the now-changed submodule, commit, and push:

git add sources/tools/system/supervise
git commit -m "supervise: update from blah to blah"
git push

This is already a huge win on its own. But it's what happens when I'm working on multiple pieces of software at once that really makes this transformative.

I can check out any number of different projects with arbitrary dependency relationships, and it's as easy as nix-build -A somepkg to automatically rebuild the tree with all my changes.

If, for example, I'm working on a project written in Python packaged in nixpkgs, and I see an issue in CPython that I should fix, I just open up the CPython submodule, fix the issue, and my project immediately benefits. If I didn't quite fix the problem, I can easily keep iterating, tweaking my own project and CPython simultaneously.
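
To make the "rebuild the tree" idea concrete, here is a small illustration of which packages are affected when a source changes. Nix itself does not walk a graph like this; a changed input changes the derivation of everything that depends on it, which has the same effect. The package names and the function `packages_to_rebuild` are made up for the example.

```python
from collections import deque


def packages_to_rebuild(deps: dict, changed: set) -> set:
    """Return the changed packages plus everything that depends on them.

    `deps` maps each package to the packages it depends on.
    """
    # Invert the graph: for each package, record who depends on it.
    dependents = {}
    for pkg, requirements in deps.items():
        for req in requirements:
            dependents.setdefault(req, set()).add(pkg)

    # Breadth-first walk from the changed packages up through dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    deps = {
        "myproject": ["cpython"],
        "cpython": ["openssl"],
        "unrelated": ["openssl"],
    }
    print(sorted(packages_to_rebuild(deps, {"cpython"})))
```

With this graph, editing the CPython source affects `cpython` and `myproject` but leaves `unrelated` and `openssl` alone.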

It's just as easy to distribute in-progress changes to multiple projects. I commit my in-progress changes and push them to my nixpkgs fork. Then if someone clones my nixpkgs fork, they can immediately start working on the same codebases with the same changes. (This can be made incredibly simple with some tooling that uses techniques such as on-demand creation of an "omega repo": https://github.com/twosigma/git-meta/wiki/The-Omega-Repo)

I suspect there are numerous other advantages as well which are not yet obvious, and that making common operations so much cheaper may unlock a radically different way of working on open source.

I think this would be a truly transformative way to organize software development, and it would be a major incentive for using Nix as the backbone of a software project. In a certain way, this would position Nixpkgs as an "open source monorepo", a place where cross-project integration work could be done with ease, without any of the scaling issues of traditional monorepos.

Of course, tying Nixpkgs so deeply to Git may be undesirable, though it wouldn't prevent us from doing anything we currently do. There are some practical downsides as well. Git submodules can be tricky to work with, though there are projects attempting to make them easier to use (https://github.com/twosigma/git-meta). builtins.fetchGit currently has some scaling problems with its git cache, which may make it difficult to do something like this before the issues are fixed.

Nevertheless, for projects other than Nixpkgs, such as separate Nixpkgs overlays for a few related packages, I think this kind of organization makes a lot of sense. It would be nice to see some projects openly experiment with a submodule-based Nixpkgs overlay.

What do you think?

See previous commit message for details.
FRidh (Member) commented Mar 21, 2018

Thank you for looking into this. There are pros and cons to using submodules.

> These changes add support to nixpkgs for using git submodules in the nixpkgs git repository to find sources for derivations. This uses builtins.fetchGit to remove the need to specify a sha256, while still maintaining complete reproducibility.

With builtins.fetchGit you do not need a hash with a local checkout of the submodule (e.g. when editing). But once you have committed and pushed a certain version, the submodule entry (the path listed in .gitmodules) will contain a revision.

The advantages you've listed are nice ones. If we moved not just the source but also an expression into a submodule, we would get the added benefit of being able to set up different permissions, although it also has a clear disadvantage: not being able to see all expressions directly. Simply initializing all submodules is not an option, as it takes too long.

catern (Contributor, Author) commented Apr 4, 2018

> If we would move not just the source but an expression in a submodule, we get the added benefit of being able to set up different permissions, although it also has a clear disadvantage: not being able to see all expressions directly.

Yes, I think that with regard to putting package expressions in the "same place" as package sources, submodules-in-nixpkgs has all the same issues as tarballs. Getting the package expression out of the submodule will have the same advantages and the same disadvantages as getting a package expression out of a tarball.

I think submodules might reduce the need to move package expressions to the "same place" as package sources, though. If the package source is a submodule of the nixpkgs repo, the package expressions and package source are already conceptually a lot "closer together", and maybe that obsoletes some of the reasons for putting expressions next to sources. (Sorry, super vague I know :))

On the other hand, Nix expressions in submodules could be really interesting for another use case: An overlay repo (or even an individual project) could put Nixpkgs in a submodule, rather than doing pinning by other means. Or to reverse it: maybe you could modularize Nixpkgs into multiple overlays, pulled in as submodules, which themselves have submodules pointing to the source code for their packages. Of course, then you'd have nested submodules, which sounds like a nightmare, but maybe with sufficient tooling could actually be very cool.

FRidh (Member) commented Jan 6, 2019

While it was an interesting experiment, it won't go in so I am closing this.

FRidh closed this Jan 6, 2019