Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetchGit: use a better caching scheme #2358

Merged
merged 1 commit into from Nov 20, 2018

Conversation

graham-at-target
Copy link
Contributor

The current usage technically works by putting multiple different
repos in to the same git directory. However, it is very slow as
Git tries very hard to find common commits between the two
repositories. If the two repositories are large (like Nixpkgs and
another long-running project,) it is maddeningly slow.

This change busts the cache for existing deployments, but users
will be promptly repaid in per-repository performance.


I ran the tests and the fetchGit test passed. There is an unrelated failure in brotli.sh, which is already failing on Darwin: https://hydra.nixos.org/build/79518003/nixlog/1 which shouldn't block this merging.

The current usage technically works by putting multiple different
repos in to the same git directory. However, it is very slow as
Git tries very hard to find common commits between the two
repositories. If the two repositories are large (like Nixpkgs and
another long-running project,) it is maddeningly slow.

This change busts the cache for existing deployments, but users
will be promptly repaid in per-repository performance.
@graham-at-target
Copy link
Contributor Author

(doing some real-life testing now, outside of the automated tests.)

@graham-at-target
Copy link
Contributor Author

🥗 seems to work okay from here on Darwin!

Copy link
Member

@shlevy shlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@edolstra
Copy link
Member

I'm not really in favor of this because it multiplies the storage required for Nixpkgs (e.g. if you have a Nixpkgs clone from several repositories, like NixOS/nixpkgs and NixOS/nixos-channels).

@LnL7
Copy link
Member

LnL7 commented Aug 19, 2018

The problem is that this becomes unusable when used in combination with other projects. I've seen a warning like WARNING: no common commits when using a new repository for the first time, maybe that could be used somehow?

Alternatively using eg. commit 0 as the identity instead of the url might be an option.

@grahamc
Copy link
Member

grahamc commented Aug 19, 2018 via email

@LnL7
Copy link
Member

LnL7 commented Aug 19, 2018

True, but there's no point in sharing history with orphan branches since they won't have anything in common. 😄

@grahamc
Copy link
Member

grahamc commented Aug 19, 2018

I did a fetch of the Linux kernel in to a repo already containing nixpkgs, and the fetch took an extra 100s.

w.r.t. multiple parented repos: they can have something in common though :$ For an example, here is the beautiful future I want, Nixpkgs merged with Linux... Linix:

$ git clone git@github.com:nixos/nixpkgs.git
$ cd nixpkgs
$ git remote add linux git@github.com:torvalds/linux.git
$ git fetch linux
$ git merge --allow-unrelated-histories linux/master

@AmineChikhaoui
Copy link
Member

@edolstra Note that this is useful as we want to switch the hydra mercurial/git inputs plugins to the builtins.fetch*.

@Mic92
Copy link
Member

Mic92 commented Aug 28, 2018

If storage requirements for fetching from NixOS/nixos-channels and NixOS/nixpkgs are really the only blocker for this pull request I would rather also mirror the branches from nixos-channels to nixpkgs.

@lheckemann
Copy link
Member

How about using the name in the attrset passed to fetchGit when available? This would not only allow sharing between nixpkgs and nixpkgs-channels (by always setting the name to nixpkgs even when fetching from channels), but also:

  • Allow sharing between other repos which may be fetched via different URLs (e.g. a local clone in addition to an HTTP one, to save downloading the whole repo)
  • encourage people to set names on their fetchGit calls, which has the side benefit of providing a more useful store path than {hash}-source.

@edolstra edolstra merged commit 02098d2 into NixOS:master Nov 20, 2018
@edolstra
Copy link
Member

I've implemented an alternative to fetchGit that (at least for GitHub repos) eliminates the disk space issue, at the cost of not returning a revCount attribute: edolstra@7d33eb8. Since this could be used for fetching Nixpkgs, it makes the disk space usage of fetchGit less of an issue.

@Mic92
Copy link
Member

Mic92 commented Nov 20, 2018

mhm. I am a bit worried about running into API limits. When people have a few repositories 500 requests per IP can be reached easily.

@Mic92
Copy link
Member

Mic92 commented Nov 20, 2018

It is even worse 60 requests per hour: https://developer.github.com/v3/#rate-limiting

@Mic92
Copy link
Member

Mic92 commented Nov 20, 2018

Also downloading a tarball does not really require the API,
since fetchurl can be used directly to fetch a certain tag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants