redo the notify events between eval and queue-runner #645

disassembler · 2019-03-20T19:33:39Z

(cherry picked from commit 73ca325)

We did this at IOHK to improve some performance problems with notifications. It's still not perfect, and there may be a good reason to not do it this way, but creating a PR for the community to discuss whether this is worthwhile, and if so if it needs improvements to get in.

(cherry picked from commit 73ca325)

grahamc · 2019-03-20T19:42:05Z

src/hydra-queue-runner/queue-monitor.cc

+
+        pqxx::result res = txn.parameterized
+          ("select id, project, jobset, job, drvPath, maxsilent, timeout, timestamp, globalPriority, priority from Builds "
+          "where id IN (select build from jobsetevalmembers where eval = $1) order by globalPriority desc, id")


This looks like it could be a fairly simple join, and avoid a temporary table.

grahamc · 2019-03-21T18:36:56Z

from clever on IRC:

when hydra does a new eval, the old version reports the lowest build# it just added in the eval. so, if a given eval creates builds 6, 7, and 8, it will report 6 over the psql event channel the queue-runner will then query psql, for every build >=6, that is unfinished, and begin to process jobs. the problem, is that build 3, was cached from a previous eval, and wont be processed. and so, the github status plugin, never gets told about build3 finishing on the new eval and github is never told that the job is still passing on the new git rev. my change, makes it report the eval# to queue-runner, which then refreshes every build in the eval

edolstra · 2019-03-21T21:43:11Z

src/hydra-queue-runner/queue-monitor.cc

@@ -74,6 +75,53 @@ struct PreviousFailure : public std::exception {
 };


+bool State::getQueuedBuildsInEval(Connection & conn,
+    ref<Store> destStore, unsigned int evalId)


There is a lot of code duplication between this and getQueuedBuilds, maybe that can be avoided?

edolstra · 2019-03-21T21:47:41Z

If I understand correctly what this does, it could cause a lot of extra load on hydra.nixos.org, since for every Nixpkgs/NixOS evaluation, it will have to process every finished build again and run plugins. That's tens of thousands of builds per evaluation...

cleverca22 · 2019-03-25T17:08:23Z

yeah, it may be best to not upstream this until the hydra-notify perf issues are fixed, possibly as a result of redoing the whole notify system

dhess · 2021-08-23T17:37:28Z

FYI, I've rewritten this to work with recent Hydra versions here:

https://github.com/hackworthltd/hydra/commit/f9bf7f6f4982ef35fd64d0c6f88a96a182de5dff

We've been using it in production for awhile and it seems to work reasonably well, with one caveat that I'll describe below.

I dunno why IOHK originally wrote this patch (something to do with performance?), but the reason we use it is as follows:

Say you have a Hydra jobset with 20 jobs. Your Hydra is set up to notify GitHub of the status of this jobset.
You open a PR, which kicks off a Hydra build. Say all 20 jobs complete, and GitHub shows the status as "20 checks passed."
Now you make a small change to the PR and push. Say this change only causes 5 of the jobs to re-run, and the rest are cached.
GitHub will now only show "5 checks passed," and that's all it ever will report.

As the user, there is no way to know whether the remaining 15 jobs are still running, the reporting failed, or what, unless you go check Hydra. This is not very nice UX, as normally you only need to check Hydra when a job fails and you want to see the logs.

So, this patch fixes the problem by ensuring that all 20 job statuses are reported in this scenario, and that's why we use it.

There is one problem, however. This patch (or at least my rewritten version — I'm not sure whether this issue exists in @cleverca22's original patch) will report all cached statuses for the given job to GitHub. So, for example, if a particular job has been cached for the last 4 commits to main, and another build is triggered where that job is once again cached, Hydra will report 5 status updates to GitHub for the one job name (with different job IDs). This is not a problem for the GitHub status integration, as GitHub will happily ignore the redundant status updates.

However, if you want to do something like kick off a GitHub Action workflow whenever a job named foo completes successfully (e.g., with a Hydra GitHub Action plugin), then if you're also running this patch, and foo has been cached the past 5 builds, then Hydra will trigger the workflow 5 times! :)

So while I'm not requesting that this particular patch should be merged, it would be nice if one could configure a Hydra to report cached status updates to GitHub when a jobset is built (and just once, rather than n times when the job has been cached n times).

(There is currently no GitHub Action plugin for Hydra, but we have a very hacky one here, which is how I discovered this wrinkle in the eval patch: https://github.com/hackworthltd/hydra/commit/a5ce9dce8d40ded1b535e31f3db418e7795aa226)

dhess · 2021-08-23T17:39:18Z

Two more things:

If there are performance concerns for the NixOS Hydra, this could be made an optional feature.
I'm happy to open a new issue for the cached status feature, if anyone thinks it should be in its own issue.

redo the notify events between eval and queue-runner

4ea39d7

(cherry picked from commit 73ca325)

grahamc reviewed Mar 20, 2019

View reviewed changes

edolstra reviewed Mar 21, 2019

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

redo the notify events between eval and queue-runner #645

redo the notify events between eval and queue-runner #645

disassembler commented Mar 20, 2019

grahamc Mar 20, 2019

grahamc commented Mar 21, 2019

edolstra Mar 21, 2019

edolstra commented Mar 21, 2019

cleverca22 commented Mar 25, 2019

dhess commented Aug 23, 2021

dhess commented Aug 23, 2021

redo the notify events between eval and queue-runner #645

Are you sure you want to change the base?

redo the notify events between eval and queue-runner #645

Conversation

disassembler commented Mar 20, 2019

grahamc Mar 20, 2019

Choose a reason for hiding this comment

grahamc commented Mar 21, 2019

edolstra Mar 21, 2019

Choose a reason for hiding this comment

edolstra commented Mar 21, 2019

cleverca22 commented Mar 25, 2019

dhess commented Aug 23, 2021

dhess commented Aug 23, 2021