fetchGit: use a better caching scheme #2358
Conversation
The current usage technically works by putting multiple different repos into the same git directory. However, it is very slow, as Git tries very hard to find common commits between the two repositories. If the two repositories are large (like Nixpkgs and another long-running project), it is maddeningly slow. This change busts the cache for existing deployments, but users will be promptly repaid in per-repository performance.
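For illustration, a per-repository scheme could key the cache directory on a hash of the remote URL, so unrelated repositories never share a git directory. This is only a sketch; the variable names and cache path below are hypothetical, not the actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch: derive one cache directory per remote URL by
# hashing the URL, instead of putting every repo in one shared git dir.
url="https://github.com/NixOS/nixpkgs.git"
key=$(printf '%s' "$url" | sha256sum | cut -d' ' -f1)   # stable per-URL key
cacheDir="${XDG_CACHE_HOME:-$HOME/.cache}/nix/gitv2/$key"
echo "$cacheDir"
```

Because the key is a pure function of the URL, two evaluations fetching the same remote still share a cache, while different remotes get disjoint directories.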
(Doing some real-life testing now, outside of the automated tests.)
🥗 seems to work okay from here on Darwin!
💯
I'm not really in favor of this because it multiplies the storage required for Nixpkgs (e.g. if you have a Nixpkgs clone from several repositories, like |
The problem is that this becomes unusable when used in combination with other projects. I've seen a warning like WARNING: no common commits when using a new repository for the first time, maybe that could be used somehow? Alternatively, using e.g. commit 0 as the identity instead of the URL might be an option.
Right. fetchGit can take multiple minutes per repository, just scanning for mutual histories. This makes --pure quite painful.
Unfortunately there is no true commit 0, because a repository is a full DAG and can have multiple root commits.
An ugly alternative idea might be to allow fetchGit to accept an optional cache key parameter to override the default.
As it is, though, fetchGit is only really usable to fetch a single repo.
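The multiple-roots point is easy to demonstrate with a throwaway repository: two orphan branches give two parentless commits, so there is no single "commit 0" to use as an identity.

```shell
#!/bin/sh
# Demonstration: a git history is a DAG and can have several root
# commits, so "the" initial commit is not well defined.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m root1
git checkout -q --orphan other          # start an unrelated history
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m root2
git rev-list --max-parents=0 --all | wc -l   # counts root commits: 2
```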
… On Aug 19, 2018, at 6:43 AM, Daiderd Jordan ***@***.***> wrote:
The problem is that this becomes unusable when used in combination with other projects. I've seen a warning like WARNING: no common commits when using a new repository for the first time, maybe that could be used somehow?
Alternatively using eg. commit 0 as the identity instead of the url might be an option.
True, but there's no point in sharing history with orphan branches since they won't have anything in common. 😄
I did a fetch of the Linux kernel into a repo already containing Nixpkgs, and the fetch took an extra 100s. W.r.t. multiple-parented repos: they can have something in common though :$ For an example, here is the beautiful future I want, Nixpkgs merged with Linux... Linix:
@edolstra Note that this is useful as we want to switch the Hydra mercurial/git input plugins to the builtins.fetch* functions.
If storage requirements for fetching from |
How about using the
|
I've implemented an alternative to |
Mhm. I am a bit worried about running into API limits. When people have a few repositories, 500 requests per IP can be reached easily.
It is even worse: 60 requests per hour: https://developer.github.com/v3/#rate-limiting
Also, downloading a tarball does not really require the API.
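For instance, GitHub serves repository archives from plain URLs that are not part of the REST API, so fetching them is not subject to the 60-requests-per-hour unauthenticated limit. The owner/repo/rev values below are just placeholders:

```shell
#!/bin/sh
# Building a GitHub archive URL; downloading it needs no API token
# and does not count against the v3 API rate limit.
owner=NixOS repo=nixpkgs rev=18.03
echo "https://github.com/$owner/$repo/archive/$rev.tar.gz"
```

The resulting URL can be handed to fetchTarball or curl directly.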
I ran the tests and the fetchGit test passed. There is an unrelated failure in brotli.sh, which is already failing on Darwin (https://hydra.nixos.org/build/79518003/nixlog/1), so it shouldn't block merging this.