Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0021] Pull Request Testing on Hydra #21

Closed
wants to merge 3 commits into from

Conversation

globin
Copy link
Member

@globin globin commented Nov 6, 2017

@globin globin changed the title [RFC 00XX] Pull Request Testing on Hydra [RFC 0021] Pull Request Testing on Hydra Nov 6, 2017
@globin globin force-pushed the nixborg branch 2 times, most recently from 5c17235 to 73f3711 Compare November 6, 2017 12:38
@zimbatm
Copy link
Member

zimbatm commented Nov 6, 2017

Sorry for bringing this up again, can you expand on why we can't re-use bors? It would be nice to re-use existing software if we can. Basically can Hydra:

  • build a set of fixed branches every time some code is pushed to them?
  • report the build status as a github build status on the commit that has been built?

@globin
Copy link
Member Author

globin commented Nov 7, 2017

Currently it can do neither, I also have too many difficulties writing perl to be able to implement that properly..

Also we definitely don't want the bot merging PRs right now, this might be an option in the future but currently it should only give us an opportunity to test PRs more easily.

@Profpatsch
Copy link
Member

Sorry for bringing this up again, can you expand on why we can't re-use bors? It would be nice to re-use existing software if we can.

Would be awesome to reuse the work of the rust team. Especially considering not very many people know (or want to write) perl these days, but lots of people want to learn Rust anyway.

@moretea
Copy link

moretea commented Nov 7, 2017

@zimbatm @Profpatsch AFAIK, the bors currently used by Cargo (package manager for Rust) is bors-ng which is implemented in Elixir.

It appears that they don't use it yet for Rust itself (see bottom comments in this thread)

@globin
Copy link
Member Author

globin commented Nov 7, 2017

Can we please keep discussing PR testing on hydra for now which is a small piece of work compared to changing our whole CI system.

@Ericson2314
Copy link
Member

Ericson2314 commented Nov 7, 2017

I think we all agree that eventually, only CI should be changing master (i.e. the ref itself). Is it a good idea to tackle this in this RFC, or best to leave it for later? Relatedly, I'm a bit unsure on rebasing or merging PRs.

@7c6f434c
Copy link
Member

7c6f434c commented Nov 8, 2017

@Ericson2314

This «eventually» requires more Hydra capacity than there seems to be right now. And more trust in the handling of transient failures than we currently have (and a solution probably requires that the transient failures, and not the capacity, are the bottleneck for large rebuilds). Also a clear idea of what breakage is acceptable at what points (this obviously depends on the previous points, because we need confidence that we find out about failures, and find out quickly). This RFC is a good step to try now (and see what needs fixing), and only-CI-moves-master (and even only-CI-merges-PRs) seems significantly out of scope.

@globin
Copy link
Member Author

globin commented Nov 8, 2017

I generally agree with most of the points mentioned but I wanted to keep this RFC rather small in scope to not reach a deadlock in discussion and planning while not moving forward :)

Copy link
Member

@fpletz fpletz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real question here is if we want PR testing on Hydra as a service for the community or not.

I'm not sure if we have enough resources but this should be implemented. Maybe even on a second Hydra or with another bot. The specifics aren't important.

@Ericson2314
Copy link
Member

Ericson2314 commented Nov 9, 2017

@7c6f434c the incremental step would be having CI able to push to master (if it's a fast-forward to the commit it just built successfully), without requiring CI to push to master. The extra step of going back to merge a PR after CI succeed is generally a waste of human time, too.

That should ideally be just an additional @-mention command + a conditional git push HEAD:master || true + credentials for bot, but yes I don't want to derail this on bonus features if its more work than that.

[design]: #detailed-design

The general idea is to have a service which receives webhooks from github,
then requires some acknowledgement of a maintainer to build/test a PR and
Copy link
Member

@7c6f434c 7c6f434c Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: «a maintainer» — maybe add something about supporting a flexible definition of a maintainer (any NixOS project member/Nixpkgs committer or the package maintainer who doesn't yet have access) to «Future work»? Not in the main body, because this discussion should not delay checking if Hydra can build all PRs. Maybe also people who have multiple accepted commits to the package.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19 also.

AFAIK the current implementation of that bot just has a hardcoded list of people who should be allowed access.

I think that people from the maintainers file + anyone who currently has commit access should be allowed to instruct the bot via @-mentions.

Copy link
Member Author

@globin globin Nov 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently "a maintainer" refers to a contributor with commit access, "member" in GitHub terminology, I'll clarify this. In the current test instance this doesn't work as @nixborg is not member of the NixOS GitHub org and therefore cannot read the membership status. That's why it currently has an additional hard-coded whitelist, which should be removed in the future..

- Which "release.nix" should be used to test:
- We probably want to test aarch64 and x86_64-darwin, not only x86_64-linux, as only a little number of maintainers/pull requesters have access to these architectures.
- We probably want to also test NixOS module tests, especially for PRs touching these
- Can Hydra handle the additional workload by this, we probably have to start with a small number of PRs being tested to evaluate this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That, and maybe perform initial tests only x86_64-linux? How the Hydra capacity (or maybe scaling costs, if cloud builds are used regularly — I do not know) compare across targets?


Some small remarks on the architecture:
- the rebasing is done in order to:
- make sure Hydra builds a commit that has been reviewed by a maintainer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should note that the bot compares the time the nixborg comment was made to when the last push was, to ensure there isn't a race condition between the comment and a malicious push.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@globin
Copy link
Member Author

globin commented Nov 12, 2017

Added some minor clarifications, after the comments above and am happy that @grahamc is co-authoring this.

@edolstra do you have some comments on this?

@Ericson2314
Copy link
Member

Ericson2314 commented Nov 13, 2017

It just occurred to me that in the failing case I wish the original branch was being built, so I'd have some cached binaries to help debug. But I suppose the PR could be updated to the nixborg's rebase itself.

Also, does it rebase on master or non-master when the target branch is non-master (e.g. staging)? The answer, arrived at empirically, is the actual target branch.

@globin
Copy link
Member Author

globin commented Nov 16, 2017

Says so in the RFC, too:

  1. a nixborg-worker rebases the PR on the base branch of the PR (mainly master, staging or release-XX.XX)

@zimbatm
Copy link
Member

zimbatm commented Nov 19, 2017

How much work left is required to make this work?

The sooner we have any sort of automation the better, and it's hard to predict the problems that might come up without deploying and running the code. So instead of mulling over the RFC too much, I'm in favor of getting something running and have a trial period, let's say 3 months. After that period we will review the state of things and decide if we want to keep it or not.

During that period, the implementer need to have access to the machines where the code is deployed so that they can fix and iterate.

The only big concern would be to overload Hydra. If that happens we either need a separate Hydra deployment or a way to pull the plug until a solution can be found.

@dtzWill
Copy link
Member

dtzWill commented Feb 20, 2018

This may be incorrect or premature, but... hasn't ofborg filled this gap more or less?

@7c6f434c
Copy link
Member

7c6f434c commented Feb 20, 2018 via email

@zimbatm
Copy link
Member

zimbatm commented Feb 20, 2018

It's coming. The next step is going to be able to share the binary cache between ofborg and hydra so that hydra can basically become a noop.

@edolstra
Copy link
Member

In that case, we should just get rid of Hydra?

@7c6f434c
Copy link
Member

@zimbatm not yet close in power to what is needed

@edolstra it's like a blockchain, if in the future Borg can perform a 51% buildrate attack against Hydra…

@edolstra
Copy link
Member

Closing after discussion in the RFC SC meeting: this is obsoleted by ofborg.

@edolstra edolstra closed this Jan 10, 2019
infinisil referenced this pull request in nixpkgs-architecture/rfc-140 Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants