Add output from hydra-queue-runner for unrunnable steps (no machine). #651

kquick · 2019-04-23T19:18:26Z

This helps generate log information (from the hydra-queue-runner service) about steps for which there is no available machine. This is currently a silent failure.

This patch works by checking whether any of the currently busy machines could build the remaining steps. If a step cannot be built by any busy machine (and it wasn't already assigned to a non-busy machine), it is reported as a step with no machine available to build it.

This might over-report in the case where build machines were down, but I think that is still a valid condition for reporting.
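The check described above can be sketched roughly as follows. This is a simplified Python illustration, not the C++ code in `dispatcher.cc`: the `Machine` and `Step` classes, the `supports` helper, and the assumed `<system>:<features>` encoding of the system type are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical stand-in for a build machine known to the queue runner."""
    name: str
    systems: set
    features: set = field(default_factory=set)

    def supports(self, system_type: str) -> bool:
        # Assumed encoding: "<system>:<feature>,<feature>,..." (feature part may be empty).
        system, _, wanted = system_type.partition(":")
        required = {f for f in wanted.split(",") if f}
        return system in self.systems and required <= self.features


@dataclass
class Step:
    """Hypothetical stand-in for a build step that was not dispatched."""
    drv_path: str
    system_type: str


def unrunnable_messages(remaining_steps, busy_machines):
    """Return one log line per remaining step that no busy machine could build."""
    messages = []
    for step in remaining_steps:
        if not any(m.supports(step.system_type) for m in busy_machines):
            messages.append(
                f"NO MACHINE AVAILABLE to run step '{step.drv_path}' "
                f"(needs system type '{step.system_type}')"
            )
    return messages


busy = [Machine("builder1", {"x86_64-linux"}, {"kvm"})]
remaining = [
    Step("/nix/store/example-a.drv", "x86_64-linux:big-parallel"),
    Step("/nix/store/example-b.drv", "x86_64-linux:"),
]
for line in unrunnable_messages(remaining, busy):
    print(line)
```

With an empty `busy_machines` list, every remaining step is reported, which is one way the over-reporting mentioned above can arise in this sketch.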

There is no Web-UI display of this condition; it is found by examining the log output from the hydra-queue-runner service. Web-UI display would need to have a web-page replication of this logic, or else an addition to the database to store this information, both of which are more ambitious undertakings left to (possibly) future pull request efforts.


kquick · 2019-04-23T19:20:40Z

Sample output:

Apr 23 11:58:34 hydra hydra-queue-runner[28483]: NO MACHINE AVAILABLE to run step '/nix/store/b41lzs0xshmkq6bban48h8k4y207abxz-llvm-5.0.2.drv' (needs system type 'x86_64-linux:big-parallel')

src/hydra-queue-runner/dispatcher.cc

kquick · 2019-04-23T22:25:24Z

Please hold off a merge... this is overreporting and needs an update (which I'm working on).

hydra-server: add limit parameter on search path


The test was failing with: machine# Can't locate LWP/UserAgent.pm in @INC (you may need to install the LWP::UserAgent module)...

test.api: use Hydra perl dependencies to run `api-test.pl`

Creating the `pg_trgm` extension requires superuser privileges, so this patch creates the extension in the Hydra NixOS module when a local database is used. If the extension cannot be created (for instance, on a remote database with nosuperuser), creation of the `pg_trgm` index is skipped (this index speeds up queries on builds.drvpath) and warnings are emitted:

    initialising the Hydra database schema...
    WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
    WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
    WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)

This keeps migrations smooth: the migration process doesn't require a manual step (although this manual step is recommended on big remote databases).

Create `pg_trgm` extension in the NixOS module

This adds an `InfluxDBNotification` plugin which is configured as:

```
<influxdb>
  url = http://127.0.0.1:8086
  db = hydra
</influxdb>
```

which will write a notification for every finished job to the configured database in InfluxDB looking like:

```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```

Added the InfluxDBNotification plugin including a NixOS test

Replaced you with look

Corrected grammar typo

hydra-notify now receives notifications about started/finished builds/steps via PostgreSQL. This gets rid of the (substantial) overhead of starting hydra-notify for every event. It also allows other programs (even on other machines) to listen to Hydra notifications.

* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
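As an illustration of how a listener might decode these three payload formats, here is a small Python sketch. `parse_eval_notification` and the dictionary keys it returns are hypothetical; only the channel names and separators come from the list above.

```python
def parse_eval_notification(channel: str, payload: str) -> dict:
    """Split an evaluation notification payload according to its channel."""
    if channel == "eval_started":
        # '<tmpId>\t<project>\t<jobset>'
        tmp_id, project, jobset = payload.split("\t")
        return {"tmp_id": tmp_id, "project": project, "jobset": jobset}
    if channel == "eval_failed":
        # '<tmpId>' only; the error itself has to be looked up in the database.
        return {"tmp_id": payload}
    if channel == "eval_added":
        # '<tmpId>:<evalId>'
        tmp_id, eval_id = payload.split(":")
        return {"tmp_id": tmp_id, "eval_id": eval_id}
    raise ValueError(f"unknown channel: {channel}")


print(parse_eval_notification("eval_started", "7\tsample\tdefault"))
print(parse_eval_notification("eval_added", "7:42"))
```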

Plugins are now disabled at startup time unless there is some relevant configuration in hydra.conf. This avoids hydra-notify having to do a lot of redundant work (a lot of plugins did a lot of database queries *before* deciding they were disabled). Note: BitBucketStatus users will need to add 'enable_bitbucket_status = 1' to hydra.conf.

In particular, doing a 'select * from Jobsets where ...' must be avoided, because the 'errormsg' column can be very big.

Turn hydra-notify into a daemon

May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57. (cherry picked from commit 438ddf5)

This adds the following (pre-existing) attributes to the jobset response:

- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime

The previous code converted option values to ints when the value contained a digit somewhere. This is too eager since it also converts strings like `release-0.2` to an int which should not happen. We now only convert to int when the value is an integer.
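The difference between the two checks can be sketched in Python. This is not the Perl code in GitInput; `contains_digit` and `convert_option` are hypothetical helpers, and the sketch assumes that "an integer" means a string made up solely of decimal digits.

```python
import re


def contains_digit(value: str) -> bool:
    """The overly eager test: true whenever a digit appears anywhere in the value."""
    return re.search(r"[0-9]", value) is not None


def convert_option(value: str):
    """Convert to int only when the whole value is an integer; otherwise keep the string."""
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    return value


for raw in ("42", "release-0.2", "master"):
    print(raw, contains_digit(raw), repr(convert_option(raw)))
```

A value such as `release-0.2` passes the eager test but is left as a string by the stricter one.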

GitInput: only convert integer option values to int

This was clearly an error in the original part-2 of the diff, and specifically breaks when two projects have a jobset of the same name.

schema/Builds: use jobset_id instead of jobset name matches

The correct filename is `hydra-api.yml` not `hydra.yml`.

Correct the link to hydra-api.yml file

runHydra automatically starts hydra and postgres:

```
$ nix-shell -A runHydra
```

The shell receives hydra from the working copy as buildInput. Running hydra, queue-runner, evaluator and postgres is managed by foreman (https://github.com/ddollar/foreman) and configured in `Procfile`.

This adds a `devShell` which, unlike `runHydra`, doesn't start hydra automatically and doesn't receive hydra as build input. It is better suited for interactive development cycles:

```
$ nix-shell -A devShell
$ ./bootstrap
$ configurePhase
$ make
$ # hack hack hack
$ foreman start
# test test test
<C-c>
$ # hack hack hack
```

Use custom ports so hydra and postgres can run in environments where the default ports are in use already.

Add sections about using `runHydra` and `foreman`

Using `pg_ctl status` is more reliable than checking for an open port via netcat.

Co-authored-by: Graham Christensen <graham@grahamc.com>

Execute hydra-dev-server instead of hydra-server

Co-authored-by: Graham Christensen <graham@grahamc.com>

- scripts -> foreman
- drop runHydra
- drop devShell
- move postgresql to buildInputs

- drop any mention of runHydra
- link foreman and mention Procfile

Devshell

The current implementation passes all values to the `create_or_update` method. Missing values end up as `undef` (or `NULL`) when assigned to `%update`. Thus, for columns that are NOT NULL, an update made when, for example, flakes are not used results in a horrible:

    DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: null value in column "type" violates not-null constraint
    DETAIL: Failing row contains (.jobsets, 118, hydra, hydra jobsets, src, hydra/jobsets.nix, null, null, null, 1589536378, 1, 0, 0, , 3, 30, 100, null, null, 1589536379, null, null).
    [for Statement "UPDATE jobsets SET checkinterval = ?, description = ?, enableemail = ?, nixexprinput = ?, nixexprpath = ?, type = ? WHERE ( ( name = ? AND project = ? ) )" with ParamValues: 1='30', 2='hydra jobsets', 3='0', 4='src', 5='hydra/jobsets.nix', 6=undef, 7='.jobsets', 8='hydra'] at /nix/store/lsf81ip9ybxihk5praf2n0nh14a6i9j0-hydra-0.1.19700101.DIRTY/libexec/hydra/lib/Hydra/Helper/AddBuilds.pm line 50

This change just omits adding such values to `%update`, which results in PostgreSQL assigning the default values.
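The idea of leaving unset values out of the update can be sketched in Python, with `None` standing in for Perl's `undef`. `build_update` and the sample dictionary are hypothetical and only mirror the column names visible in the error message above.

```python
def build_update(values: dict) -> dict:
    """Keep only the columns that actually have a value, so unset ones are never written."""
    return {column: value for column, value in values.items() if value is not None}


declared = {
    "checkinterval": 30,
    "description": "hydra jobsets",
    "enableemail": 0,
    "nixexprinput": "src",
    "nixexprpath": "hydra/jobsets.nix",
    "type": None,  # not provided, e.g. when flakes are not used
}
print(build_update(declared))
```

Because `type` is dropped from the mapping rather than sent as NULL, the statement no longer tries to write NULL into a NOT NULL column.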

The `type` column in `Jobsets` is defined as NOT NULL. However, the original upgrade script adding this column omitted the constraint.

Fix declarative jobset missing type value

The original code would return the standard "Please come back later" page when there are only fetch errors on a newly set up declarative project. The problem is that there are two types of errors: standard errors and fetch errors. Each is accompanied by a corresponding field for the time of occurrence. Standard errors use 'errortime', while fetch errors have 'lastchecktime' set to the time of the error. Unfortunately, the jobset.tt file was only using 'errortime' for displaying the time. This would result in the following error in the logs:

    Couldn't render template "date error - bad time/date string: expects 'h:m:s d:m:y' got: ''

This change also uses 'lastchecktime' when rendering the error times.
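One way to picture the fallback is the Python sketch below. It is only an assumption about the shape of the logic, not the actual template code in jobset.tt; `displayed_error_time` is a hypothetical helper.

```python
def displayed_error_time(errortime, lastchecktime):
    """Prefer 'errortime' (standard errors); fall back to 'lastchecktime' (fetch errors)."""
    return errortime if errortime is not None else lastchecktime


# A jobset with only a fetch error has no 'errortime' set.
print(displayed_error_time(None, 1589536379))
# A jobset with a standard error keeps using 'errortime'.
print(displayed_error_time(1589536378, 1589536379))
```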

Render the jobset page correctly when there are fetch errors

knl · 2020-06-01T14:12:39Z

> Please hold off a merge... this is overreporting and needs an update (which I'm working on).

Hi @kquick do you have an update on this one? We found it might be useful on our deployment, as we stumbled upon the same problem several times.


kquick · 2020-06-03T17:35:04Z

These changes got out of date and a rebase got messy, so there's a newer version at #774. That one resolved the over-reporting issue and is ready for merging consideration.

Closing this PR as replaced by #774.

grahamc reviewed Apr 23, 2019


src/hydra-queue-runner/dispatcher.cc (outdated)

edolstra and others added 27 commits June 19, 2019 16:45

Merge pull request NixOS#656 from nlewo/search-limit-param

c85bf66

hydra-server: add limit parameter on search path

test.api: use Hydra perl dependencies to run api-test.pl

c620bc2

The test was failing with: machine# Can't locate LWP/UserAgent.pm in @INC (you may need to install the LWP::UserAgent module)...

sql: refactor some sql statements to lowercase

42784a9

Merge pull request NixOS#657 from nlewo/create-pg-trgm-in-module

5e9439e

Create `pg_trgm` extension in the NixOS module

Merge pull request NixOS#665 from basvandijk/influxdb

9ec258d

Added the InfluxDBNotification plugin including a NixOS test

Corrected grammer typo

bb4f349

Replaced you with look

Merge pull request NixOS#666 from craigem/you_look

f13a2cb

Corrected grammar typo

Merge remote-tracking branch 'origin/master' into flake

a74dec6

Use nixFlakes in devShell

1c9e22e

Add hydra-notify service

c7861b8

Separate payload elements using \t

7114d2a

Send notifications when evaluations start/finish/fail

976d88d

* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.

Fix duplicate step_finished notification

7c7cc8c

hydra-notify: Fix processing notifications

72c3637

Add a 'step_started' notification

d08cfa4

Restart hydra-notify

e2537f7

hydra-notify: Don't do an unnecessary fetch of Jobsets

f49a089

Avoid fetching Projects/Jobsets just to get the name column

92d8d6b

In particular, doing a 'select * from Jobsets where ...' must be avoided, because the 'errormsg' column can be very big.

Merge pull request NixOS#668 from NixOS/notifications

f17cd94

Turn hydra-notify into a daemon

Merge remote-tracking branch 'origin/master' into flake

2de52d8

Extend the jobset API response

919195b

This adds the following (pre-existing) attributes to the jobset response:

- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime

edolstra and others added 23 commits May 12, 2020 16:14

Fix build

15a45f1

Merge pull request NixOS#756 from basvandijk/only-convert-integer-option-values-to-ints

301e4e0

GitInput: only convert integer option values to int

schema/Builds: use jobset_id instead of jobset name matches

548fd8e

This was clearly an error in the original part-2 of the diff, and specifically breaks when two projects have a jobset of the same name.

Merge pull request NixOS#757 from grahamc/fixup-migration-p2

16ecd4f

schema/Builds: use jobset_id instead of jobset name matches

Correct the link to hydra-api.yml file

090c05b

The correct filename is `hydra-api.yml` not `hydra.yml`.

Merge pull request NixOS#760 from knl/patch-1

a614199

Correct the link to hydra-api.yml file

Use custom ports for postgres/hydra

07a4465

Use custom ports so hydra and postgres can run in environments where the default ports are in use already.

Default to devShell in shell.nix

0add1cc

README.md: info about runHydra/foreman

45ac8e0

Add sections about using `runHydra` and `foreman`

Use pg_ctl for readiness check

d9d58b7

Using `pg_ctl status` is more reliable than checking for an open port via netcat.

Use pg_isready for readiness check

1bcb8d0

Co-authored-by: Graham Christensen <graham@grahamc.com>

Run hydra-dev-server

fc0eb02

Execute hydra-dev-server instead of hydra-server

Co-authored-by: Graham Christensen <graham@grahamc.com>

Address PR comments:

31262f1

- scripts -> foreman
- drop runHydra
- drop devShell
- move postgresql to buildInputs

README.md update

006233d

- drop any mention of runHydra
- link foreman and mention Procfile

Merge pull request NixOS#759 from gilligan/devshell

c4104fe

Devshell

Add missing SQL upgrade script for NOT NULL on type

e9922c4

The `type` column in `Jobsets` is defined as NOT NULL. However, the original upgrade script adding this column omitted the constraint.

Merge pull request NixOS#762 from knl/fix-declarative-jobset-missing-type-value

03f14f4

Fix declarative jobset missing type value

Merge pull request NixOS#764 from knl/render-jobset.tt-correctly

3c6b724

Render the jobset page correctly when there are fetch errors

kquick and others added 3 commits June 3, 2020 07:43

Add output from hydra-queue-runner for unrunnable steps (no machine).

2442f1b

Update src/hydra-queue-runner/dispatcher.cc

7b7c621

Co-Authored-By: kquick <quick@sparq.org>

Merge branch 'nomachinesteps' of github.com:kquick/hydra into nomachinesteps

e0ed745

kquick closed this Jun 3, 2020

kquick deleted the nomachinesteps branch June 3, 2020 17:35
