Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redo the notify events between eval and queue-runner #645

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

disassembler
Copy link
Member

(cherry picked from commit 73ca325)

We did this at IOHK to improve some performance problems with notifications. It's still not perfect, and there may be a good reason to not do it this way, but creating a PR for the community to discuss whether this is worthwhile, and if so if it needs improvements to get in.


pqxx::result res = txn.parameterized
("select id, project, jobset, job, drvPath, maxsilent, timeout, timestamp, globalPriority, priority from Builds "
"where id IN (select build from jobsetevalmembers where eval = $1) order by globalPriority desc, id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could be a fairly simple join, and avoid a temporary table.

@grahamc
Copy link
Member

grahamc commented Mar 21, 2019

from clever on IRC:

when hydra does a new eval, the old version reports the lowest build# it just added in the eval. so, if a given eval creates builds 6, 7, and 8, it will report 6 over the psql event channel the queue-runner will then query psql, for every build >=6, that is unfinished, and begin to process jobs. the problem, is that build 3, was cached from a previous eval, and wont be processed. and so, the github status plugin, never gets told about build3 finishing on the new eval and github is never told that the job is still passing on the new git rev. my change, makes it report the eval# to queue-runner, which then refreshes every build in the eval

@@ -74,6 +75,53 @@ struct PreviousFailure : public std::exception {
};


bool State::getQueuedBuildsInEval(Connection & conn,
ref<Store> destStore, unsigned int evalId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of code duplication between this and getQueuedBuilds, maybe that can be avoided?

@edolstra
Copy link
Member

If I understand correctly what this does, it could cause a lot of extra load on hydra.nixos.org, since for every Nixpkgs/NixOS evaluation, it will have to process every finished build again and run plugins. That's tens of thousands of builds per evaluation...

@cleverca22
Copy link
Contributor

yeah, it may be best to not upstream this until the hydra-notify perf issues are fixed, possibly as a result of redoing the whole notify system

@dhess
Copy link
Contributor

dhess commented Aug 23, 2021

FYI, I've rewritten this to work with recent Hydra versions here:

https://github.com/hackworthltd/hydra/commit/f9bf7f6f4982ef35fd64d0c6f88a96a182de5dff

We've been using it in production for awhile and it seems to work reasonably well, with one caveat that I'll describe below.

I dunno why IOHK originally wrote this patch (something to do with performance?), but the reason we use it is as follows:

  • Say you have a Hydra jobset with 20 jobs. Your Hydra is set up to notify GitHub of the status of this jobset.
  • You open a PR, which kicks off a Hydra build. Say all 20 jobs complete, and GitHub shows the status as "20 checks passed."
  • Now you make a small change to the PR and push. Say this change only causes 5 of the jobs to re-run, and the rest are cached.
  • GitHub will now only show "5 checks passed," and that's all it ever will report.

As the user, there is no way to know whether the remaining 15 jobs are still running, the reporting failed, or what, unless you go check Hydra. This is not very nice UX, as normally you only need to check Hydra when a job fails and you want to see the logs.

So, this patch fixes the problem by ensuring that all 20 job statuses are reported in this scenario, and that's why we use it.

There is one problem, however. This patch (or at least my rewritten version — I'm not sure whether this issue exists in @cleverca22's original patch) will report all cached statuses for the given job to GitHub. So, for example, if a particular job has been cached for the last 4 commits to main, and another build is triggered where that job is once again cached, Hydra will report 5 status updates to GitHub for the one job name (with different job IDs). This is not a problem for the GitHub status integration, as GitHub will happily ignore the redundant status updates.

However, if you want to do something like kick off a GitHub Action workflow whenever a job named foo completes successfully (e.g., with a Hydra GitHub Action plugin), then if you're also running this patch, and foo has been cached the past 5 builds, then Hydra will trigger the workflow 5 times! :)

So while I'm not requesting that this particular patch should be merged, it would be nice if one could configure a Hydra to report cached status updates to GitHub when a jobset is built (and just once, rather than n times when the job has been cached n times).

(There is currently no GitHub Action plugin for Hydra, but we have a very hacky one here, which is how I discovered this wrinkle in the eval patch: https://github.com/hackworthltd/hydra/commit/a5ce9dce8d40ded1b535e31f3db418e7795aa226)

@dhess
Copy link
Contributor

dhess commented Aug 23, 2021

Two more things:

  1. If there are performance concerns for the NixOS Hydra, this could be made an optional feature.
  2. I'm happy to open a new issue for the cached status feature, if anyone thinks it should be in its own issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants