Add output from hydra-queue-runner for unrunnable steps (no machine). #651

Closed
wants to merge 289 commits into from

Conversation

kquick
Contributor

@kquick kquick commented Apr 23, 2019

This helps generate log information (from the hydra-queue-runner service) about steps for which there is no available machine. This is currently a silent failure.

This patch works by checking whether any of the currently busy machines could build each remaining step. If a step cannot be built by any busy machine (and it wasn't already assigned to a non-busy machine), it is reported as a step with no machine available to build it.
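
The check described above can be sketched roughly as follows. This is only an illustration in Python, not the queue runner's actual code: the `Machine` and `Step` types, their fields, and the `can_build` and `unrunnable_steps` helpers are all hypothetical names chosen for this example, and the matching rule (system type plus required features) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    # Hypothetical model of a build machine (not Hydra's real data structure).
    name: str
    system_types: set
    features: set = field(default_factory=set)
    busy: bool = False


@dataclass
class Step:
    # Hypothetical model of a build step waiting for a machine.
    drv_path: str
    system_type: str
    required_features: set = field(default_factory=set)


def can_build(machine, step):
    """Assumed rule: the system type must match and all features must be offered."""
    return (step.system_type in machine.system_types
            and step.required_features <= machine.features)


def unrunnable_steps(remaining_steps, machines):
    """Return the steps that no currently busy machine could build.

    `remaining_steps` is assumed to hold only the steps that were not
    already assigned to a non-busy machine.
    """
    busy = [m for m in machines if m.busy]
    return [s for s in remaining_steps
            if not any(can_build(m, s) for m in busy)]
```

A step returned by `unrunnable_steps` would be the one to log as having no machine available.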

This might over-report in the case where build machines were down, but I think that is still a valid condition for reporting.

There is no Web-UI display of this condition; it is found by examining the log output from the hydra-queue-runner service. Web-UI display would need to have a web-page replication of this logic, or else an addition to the database to store this information, both of which are more ambitious undertakings left to (possibly) future pull request efforts.

Issues:

@kquick
Contributor Author

kquick commented Apr 23, 2019

Sample output:

Apr 23 11:58:34 hydra hydra-queue-runner[28483]: NO MACHINE AVAILABLE to run step '/nix/store/b41lzs0xshmkq6bban48h8k4y207abxz-llvm-5.0.2.drv' (needs system type 'x86_64-linux:big-parallel')

@kquick
Contributor Author

kquick commented Apr 23, 2019

Please hold off a merge... this is overreporting and needs an update (which I'm working on).

edolstra and others added 27 commits June 19, 2019 16:45
hydra-server: add limit parameter on search path
The test was failing with:

    machine# Can't locate LWP/UserAgent.pm in @inc (you may need to install the LWP::UserAgent module)...
test.api: use Hydra perl dependencies to run `api-test.pl`
Creating the `pg_trgm` extension requires superuser privileges, so
this patch creates the extension in the Hydra NixOS module when
a local database is used.

If the extension cannot be created (for instance, a remote database
with a nosuperuser role), the creation of the `pg_trgm` index is
skipped (this index speeds up queries on builds.drvpath) and warnings
are emitted:

    initialising the Hydra database schema...
    WARNING:  Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
    WARNING:  HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
    WARNING:  The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)

This keeps migrations smooth: the migration process doesn't
require a manual step (although this manual step is recommended on big
remote databases).
Create `pg_trgm` extension in the NixOS module
This adds a `InfluxDBNotification` plugin which is configured as:

```
<influxdb>
  url = http://127.0.0.1:8086
  db = hydra
</influxdb>
```

which will write a notification for every finished job to the
configured database in InfluxDB looking like:

```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
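
For readers unfamiliar with the format, the sample line above follows InfluxDB's line protocol: a measurement name, comma-separated tags, a space, comma-separated fields, a space, and a timestamp. The sketch below rebuilds that sample line in Python. It is not the plugin's code (the plugin is written in Perl); `to_line_protocol` is a hypothetical helper, and it assumes keys are emitted in sorted order and that no value needs escaping.

```python
def to_line_protocol(measurement, tags, fields, timestamp):
    """Format one point in InfluxDB line protocol (no escaping; sketch only)."""

    def fmt_field(value):
        # Booleans must be checked before ints, since bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"       # integer fields carry an 'i' suffix
        return f'"{value}"'          # string fields are double-quoted

    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={fmt_field(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp}"


# Values taken from the sample notification above.
sample = to_line_protocol(
    "hydra_build_status",
    {"cached": "false", "job": "job", "jobset": "default", "project": "sample",
     "repo": "default", "result": "success", "status": "success",
     "system": "x86_64-linux"},
    {"build_id": "1", "build_status": 0, "closure_size": 584, "duration": 0,
     "main_build_id": "1", "queued": 0, "size": 168},
    1564156212,
)
```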
Added the InfluxDBNotification plugin including a NixOS test
Replaced you with look
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.

* 'eval_failed' has the format '<tmpId>'. (The cause of the error can
  be found in the database.)

* 'eval_added' has the format '<tmpId>:<evalId>'.
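
A listener could split these payloads along the lines of the sketch below. It only covers the three formats listed above; `parse_eval_notification` and the dictionary keys are hypothetical names, and the IDs are kept as strings because the commit message does not state their type.

```python
def parse_eval_notification(channel, payload):
    """Split an evaluation notification payload into named parts (sketch)."""
    if channel == "eval_started":
        # '<tmpId>\t<project>\t<jobset>'
        tmp_id, project, jobset = payload.split("\t")
        return {"tmp_id": tmp_id, "project": project, "jobset": jobset}
    if channel == "eval_failed":
        # '<tmpId>'
        return {"tmp_id": payload}
    if channel == "eval_added":
        # '<tmpId>:<evalId>'
        tmp_id, eval_id = payload.split(":")
        return {"tmp_id": tmp_id, "eval_id": eval_id}
    raise ValueError(f"unknown channel: {channel}")
```

For example, `parse_eval_notification("eval_added", "7:1234")` yields a dictionary with `tmp_id` set to `"7"` and `eval_id` set to `"1234"`.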
Plugins are now disabled at startup time unless there is some relevant
configuration in hydra.conf. This avoids hydra-notify having to do a
lot of redundant work (a lot of plugins did a lot of database queries
*before* deciding they were disabled).

Note: BitBucketStatus users will need to add 'enable_bitbucket_status
= 1' to hydra.conf.
In particular, doing a 'select * from Jobsets where ...' must be
avoided, because the 'errormsg' column can be very big.
Turn hydra-notify into a daemon
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.

(cherry picked from commit 438ddf5)
This adds the following (pre-existing) attributes to the jobset response:

- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
edolstra and others added 23 commits May 12, 2020 16:14
The previous code converted option values to ints when the value
contained a digit somewhere. This is too eager since it also converts
strings like `release-0.2` to an int which should not happen.

We now only convert to int when the value is an integer.
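
The new rule can be illustrated with a small Python sketch (the actual change is in Perl). `convert_option_value` is a hypothetical name, and treating only unsigned runs of ASCII digits as integers is an assumption made for this example.

```python
import re


def convert_option_value(value):
    """Convert to int only when the whole value is an integer (sketch)."""
    # fullmatch requires the entire string to be digits, so a value that
    # merely contains a digit, such as 'release-0.2', stays a string.
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    return value
```

With this rule, `convert_option_value("42")` gives the integer `42`, while `convert_option_value("release-0.2")` returns the string unchanged.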
…ion-values-to-ints

GitInput: only convert integer option values to int
This was clearly an error in the original part-2 of the diff, and
specifically breaks when two projects have a jobset of the same name.
schema/Builds: use jobset_id instead of jobset name matches
The correct filename is `hydra-api.yml` not `hydra.yml`.
Correct the link to hydra-api.yml file
runHydra automatically starts hydra and postgres:

```
$ nix-shell -A runHydra
```

The shell receives hydra from the working copy as buildInput.
Running hydra, queue-runner, evaluator and postgres is managed
by foreman (https://github.com/ddollar/foreman) and configured
in `Procfile`.
This adds a `devShell` which unlike `runHydra` doesn't start hydra
automatically and doesn't receive hydra as build input. It is better
suited for interactive development cycles:

```
$ nix-shell -A devShell
$ ./bootstrap
$ configurePhase
$ make
$ # hack hack hack
$ foreman start
  # test test test
  <C-c>
$ # hack hack hack
```
Use custom ports so hydra and postgres can run in environments where
the default ports are in use already.
Add sections about using `runHydra` and `foreman`
Using `pg_ctl status` is more reliable than checking for an
open port via netcat.
Co-authored-by: Graham Christensen <graham@grahamc.com>
Execute hydra-dev-server instead of hydra-server

Co-authored-by: Graham Christensen <graham@grahamc.com>
- scripts -> foreman
- drop runHydra
- drop devShell
- move postgresql to buildInputs
- drop any mention of runHydra
- link foreman and mention Procfile
The current implementation passes all values to the `create_or_update` method.
Missing values end up as `undef` (or `NULL`) when assigned to `%update`.
Thus, for columns that are NOT NULL, a missing value (for example, when flakes
are not used) results in a horrible:

    DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed:
    ERROR:  null value in column "type" violates not-null constraint

    DETAIL:  Failing row contains (.jobsets, 118, hydra, hydra jobsets, src, hydra/jobsets.nix, null,
    null, null, 1589536378, 1, 0, 0, , 3, 30, 100, null, null, 1589536379, null, null). [for Statement
    "UPDATE jobsets SET checkinterval = ?, description = ?, enableemail = ?, nixexprinput = ?,
    nixexprpath = ?, type = ? WHERE ( ( name = ? AND project = ? ) )" with ParamValues: 1='30',
    2='hydra jobsets', 3='0', 4='src', 5='hydra/jobsets.nix', 6=undef, 7='.jobsets', 8='hydra'] at
    /nix/store/lsf81ip9ybxihk5praf2n0nh14a6i9j0-hydra-0.1.19700101.DIRTY/libexec/hydra/lib/Hydra/Helper/AddBuilds.pm line 50

This change just omits adding such values to `%update`, which results in
PostgreSQL assigning the default values.
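
The idea of leaving out undefined values can be shown with a short Python analogue (the real code is Perl and builds the `%update` hash). `build_update` and its arguments are hypothetical; `None` stands in for Perl's `undef`.

```python
def build_update(values, columns):
    """Keep only the columns that have a defined value (sketch).

    Columns that are missing or None are left out entirely, so an
    update built from the result never sets them to NULL explicitly.
    """
    return {col: values[col] for col in columns if values.get(col) is not None}
```

Note that values such as `0` or the empty string are still kept; only missing or `None` entries are dropped.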
The `type` column in `Jobsets` is defined as NOT NULL. However, the original upgrade
script adding this column omitted the constraint.
…type-value

Fix declarative jobset missing type value
The original code would return the standard "Please come back later" page when there
are only fetch errors on a newly set up declarative project. The problem is that
there are two types of errors: standard errors and fetch errors. Each is
accompanied by a corresponding field for the time of occurrence. Standard errors use
'errortime', while fetch errors have 'lastchecktime' set to the time of the
error. Unfortunately, the jobset.tt file was only using 'errortime' for displaying
the time. This would result in the following errors in logs:

    Couldn't render template "date error - bad time/date string:  expects 'h:m:s d:m:y'  got: ''

This change includes using 'lastchecktime' when rendering the error times.
Render the jobset page correctly when there are fetch errors
@knl
Contributor

knl commented Jun 1, 2020

Please hold off a merge... this is overreporting and needs an update (which I'm working on).

Hi @kquick do you have an update on this one? We found it might be useful on our deployment, as we stumbled upon the same problem several times.

@kquick
Contributor Author

kquick commented Jun 3, 2020

These changes got out of date and a rebase got messy, so there's a newer version at #774. That one resolved the over-reporting issue and is ready for merging consideration.

Closing this PR as replaced by #774.

@kquick kquick closed this Jun 3, 2020
@kquick kquick deleted the nomachinesteps branch June 3, 2020 17:35