fetchGit: use a better caching scheme #2358
Conversation
The current usage technically works by putting multiple different repos into the same git directory. However, it is very slow, as Git tries very hard to find common commits between the two repositories. If the two repositories are large (like Nixpkgs and another long-running project), it is maddeningly slow. This change busts the cache for existing deployments, but users will be promptly repaid in per-repository performance.
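For illustration, a per-repository scheme could key the cache directory on a hash of the remote URL, so unrelated repositories never share a git directory. This is only a sketch; the variable names and cache path below are hypothetical, not the actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch: derive one cache directory per remote URL by
# hashing the URL, instead of putting every repo in one shared git dir.
url="https://github.com/NixOS/nixpkgs.git"
key=$(printf '%s' "$url" | sha256sum | cut -d' ' -f1)   # stable per-URL key
cacheDir="${XDG_CACHE_HOME:-$HOME/.cache}/nix/gitv2/$key"
echo "$cacheDir"
```

Because the key is a pure function of the URL, two evaluations fetching the same remote still share a cache, while different remotes get disjoint directories.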
(Doing some real-life testing now, outside of the automated tests.)
🥗 seems to work okay from here on Darwin!
💯
I'm not really in favor of this because it multiplies the storage required for Nixpkgs (e.g. if you have a Nixpkgs clone from several repositories, like |
The problem is that this becomes unusable when used in combination with other projects. I've seen a warning like WARNING: no common commits when using a new repository for the first time, maybe that could be used somehow? Alternatively, using e.g. commit 0 as the identity instead of the URL might be an option.
Right. fetchGit can take multiple minutes per repository, just scanning for mutual histories. This makes --pure quite painful.
Unfortunately there is no true commit 0, because a repository is a full DAG and can have multiple root commits.
An ugly alternative idea might be to allow fetchGit to accept an optional cache key parameter to override the default.
As it is, though, fetchGit is only really usable to fetch a single repo.
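The multiple-roots point is easy to demonstrate with a throwaway repository: two orphan branches give two parentless commits, so there is no single "commit 0" to use as an identity.

```shell
#!/bin/sh
# Demonstration: a git history is a DAG and can have several root
# commits, so "the" initial commit is not well defined.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m root1
git checkout -q --orphan other          # start an unrelated history
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m root2
git rev-list --max-parents=0 --all | wc -l   # counts root commits: 2
```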
… On Aug 19, 2018, at 6:43 AM, Daiderd Jordan ***@***.***> wrote:
The problem is that this becomes unusable when used in combination with other projects. I've seen a warning like WARNING: no common commits when using a new repository for the first time, maybe that could be used somehow?
Alternatively using eg. commit 0 as the identity instead of the url might be an option.
True, but there's no point in sharing history with orphan branches since they won't have anything in common. 😄
I did a fetch of the Linux kernel into a repo already containing Nixpkgs, and the fetch took an extra 100s. W.r.t. multiple-parented repos: they can have something in common though :$ For an example, here is the beautiful future I want, Nixpkgs merged with Linux... Linix:
@edolstra Note that this is useful as we want to switch the Hydra mercurial/git input plugins to the builtins.fetch* functions.
If storage requirements for fetching from |
How about using the
|
I've implemented an alternative to |
Mhm. I am a bit worried about running into API limits. When people have a few repositories, 500 requests per IP can be reached easily.
It is even worse: 60 requests per hour: https://developer.github.com/v3/#rate-limiting
Also, downloading a tarball does not really require the API.
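For instance, GitHub serves repository archives from plain URLs that are not part of the REST API, so fetching them is not subject to the 60-requests-per-hour unauthenticated limit. The owner/repo/rev values below are just placeholders:

```shell
#!/bin/sh
# Building a GitHub archive URL; downloading it needs no API token
# and does not count against the v3 API rate limit.
owner=NixOS repo=nixpkgs rev=18.03
echo "https://github.com/$owner/$repo/archive/$rev.tar.gz"
```

The resulting URL can be handed to fetchTarball or curl directly.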
I ran the tests and the fetchGit test passed. There is an unrelated failure in brotli.sh, which is already failing on Darwin (https://hydra.nixos.org/build/79518003/nixlog/1), so it shouldn't block merging this.