Add output from hydra-queue-runner for unrunnable steps (no machine). #651
Sample output:
Please hold off on the merge... this is over-reporting and needs an update (which I'm working on).
hydra-server: add limit parameter on search path
The test was failing with:

    machine# Can't locate LWP/UserAgent.pm in @INC (you may need to install the LWP::UserAgent module)...
test.api: use Hydra perl dependencies to run `api-test.pl`
The creation of the `pg_trgm` extension needs superuser power. So, this patch creates the extension in the Hydra NixOS module when a local database is used.

If it is not possible to create this extension (for instance, a remote database with a nosuperuser role), the creation of the `pg_trgm` index is skipped (this index speeds up queries on builds.drvpath) and warnings are emitted:

    initialising the Hydra database schema...
    WARNING: Can not create extension pg_trgm: permission denied to create extension "pg_trgm"
    WARNING: HINT: Temporary provide superuser role to your Hydra Postgresql user and run the script src/sql/upgrade-57.sql
    WARNING: The pg_trgm index on builds.drvpath has been skipped (slower complex queries on builds.drvpath)

This keeps migrations smooth: the migration process doesn't require a manual step (but this manual step is recommended on big remote databases).
Create `pg_trgm` extension in the NixOS module
This adds an `InfluxDBNotification` plugin which is configured as:

```
<influxdb>
  url = http://127.0.0.1:8086
  db = hydra
</influxdb>
```

which will write a notification for every finished job to the configured database in InfluxDB looking like:

```
hydra_build_status,cached=false,job=job,jobset=default,project=sample,repo=default,result=success,status=success,system=x86_64-linux build_id="1",build_status=0i,closure_size=584i,duration=0i,main_build_id="1",queued=0i,size=168i 1564156212
```
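The sample point above follows InfluxDB's line protocol (measurement and tags, then fields, then a timestamp). As a rough illustration only, the following Python sketch shows how such a line could be assembled. The function name `to_line_protocol` is hypothetical and is not part of the plugin, and the sketch assumes tag keys and values need no escaping.

```python
def to_line_protocol(measurement, tags, fields, timestamp):
    """Build one InfluxDB line-protocol point (hypothetical helper, not Hydra code)."""

    def format_field(value):
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integer fields carry an 'i' suffix
        # String fields are double-quoted; backslashes and quotes are escaped.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{text}"'

    # Assumption: tag keys/values contain no commas, spaces or equals signs.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp}"


line = to_line_protocol(
    "hydra_build_status",
    {"cached": "false", "job": "job", "jobset": "default", "project": "sample",
     "repo": "default", "result": "success", "status": "success",
     "system": "x86_64-linux"},
    {"build_id": "1", "build_status": 0, "closure_size": 584, "duration": 0,
     "main_build_id": "1", "queued": 0, "size": 168},
    1564156212,
)
print(line)
```

Sorting the tags and fields by key reproduces the ordering seen in the sample line; whether the plugin itself sorts them is not stated in the commit message.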
Added the InfluxDBNotification plugin including a NixOS test
Replaced you with look
Corrected grammar typo
hydra-notify now receives notifications about started/finished builds/steps via PostgreSQL. This gets rid of the (substantial) overhead of starting hydra-notify for every event. It also allows other programs (even on other machines) to listen to Hydra notifications.
* 'eval_started' has the format '<tmpId>\t<project>\t<jobset>'.
* 'eval_failed' has the format '<tmpId>'. (The cause of the error can be found in the database.)
* 'eval_added' has the format '<tmpId>:<evalId>'.
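A listener consuming these notifications has to split each payload according to its channel. The following is a minimal Python sketch of that parsing, based only on the three formats listed above. The function name `parse_eval_notification` and the returned dictionary keys are illustrative assumptions, and identifiers are kept as strings because the commit message does not state their type.

```python
def parse_eval_notification(channel, payload):
    """Split an evaluation notification payload by channel (illustrative sketch)."""
    if channel == "eval_started":
        # Format: <tmpId>\t<project>\t<jobset>
        tmp_id, project, jobset = payload.split("\t")
        return {"tmp_id": tmp_id, "project": project, "jobset": jobset}
    if channel == "eval_failed":
        # Format: <tmpId>; the error cause has to be looked up in the database.
        return {"tmp_id": payload}
    if channel == "eval_added":
        # Format: <tmpId>:<evalId>
        tmp_id, eval_id = payload.split(":")
        return {"tmp_id": tmp_id, "eval_id": eval_id}
    raise ValueError(f"unknown channel: {channel}")


print(parse_eval_notification("eval_started", "7\tsample\tdefault"))
print(parse_eval_notification("eval_added", "7:1234"))
```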
Plugins are now disabled at startup time unless there is some relevant configuration in hydra.conf. This avoids hydra-notify having to do a lot of redundant work (a lot of plugins did a lot of database queries *before* deciding they were disabled). Note: BitBucketStatus users will need to add 'enable_bitbucket_status = 1' to hydra.conf.
In particular, doing a 'select * from Jobsets where ...' must be avoided, because the 'errormsg' column can be very big.
Turn hydra-notify into a daemon
May 15 09:20:10 chef hydra-queue-runner[27523]: Hydra::Plugin::GitlabStatus=HASH(0x519a7b8)->buildFinished: Can't call method "value" on an undefined value at /nix/store/858hinflxcl2jd12wv1r3a8j11ybsf6w-hydra-0.1.2629.89fa829/libexec/hydra/lib/Hydra/Plugin/GitlabStatus.pm line 57.

(cherry picked from commit 438ddf5)
This adds the following (pre-existing) attributes to the jobset response:

- nrtotal
- lastcheckedtime
- starttime
- checkinterval
- triggertime
- fetcherrormsg
- errortime
The previous code converted option values to ints when the value contained a digit somewhere. This is too eager, since it also converts strings like `release-0.2` to an int, which should not happen. We now only convert to int when the value is an integer.
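The difference between the two checks can be sketched in Python. This is only an illustration of the idea, not the Perl code in GitInput; the function names are hypothetical, and treating an optional leading minus sign followed by ASCII digits as "an integer" is an assumption.

```python
import re

# Assumption: "an integer" means an optional minus sign followed by ASCII digits only.
INTEGER_RE = re.compile(r"-?[0-9]+")


def contains_digit(value):
    """The old, too-eager test: true whenever any digit appears in the value."""
    return re.search(r"[0-9]", value) is not None


def convert_option(value):
    """Convert to int only when the whole value is an integer; otherwise keep the string."""
    if INTEGER_RE.fullmatch(value):
        return int(value)
    return value


for raw in ["42", "release-0.2", "-1", "v2"]:
    print(repr(raw), contains_digit(raw), repr(convert_option(raw)))
```

With the stricter test, a branch name such as `release-0.2` is left untouched while a plain number such as `42` is still converted.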
…ion-values-to-ints GitInput: only convert integer option values to int
This was clearly an error in the original part-2 of the diff, and specifically breaks when two projects have a jobset of the same name.
schema/Builds: use jobset_id instead of jobset name matches
The correct filename is `hydra-api.yml` not `hydra.yml`.
Correct the link to hydra-api.yml file
runHydra automatically starts hydra and postgres:

```
$ nix-shell -A runHydra
```

The shell receives hydra from the working copy as buildInput. Running hydra, queue-runner, evaluator and postgres is managed by foreman (https://github.com/ddollar/foreman) and configured in `Procfile`.
This adds a `devShell` which, unlike `runHydra`, doesn't start hydra automatically and doesn't receive hydra as build input. It is better suited for interactive development cycles:

```
$ nix-shell -A devShell
$ ./bootstrap
$ configurePhase
$ make
$ # hack hack hack
$ foreman start # test test test <C-c>
$ # hack hack hack
```
Use custom ports so hydra and postgres can run in environments where the default ports are in use already.
Add sections about using `runHydra` and `foreman`
Using `pg_ctl status` is more reliable than checking for an open port via netcat.
Co-authored-by: Graham Christensen <graham@grahamc.com>
Execute hydra-dev-server instead of hydra-server

Co-authored-by: Graham Christensen <graham@grahamc.com>

- scripts -> foreman
- drop runHydra
- drop devShell
- move postgresql to buildInputs

- drop any mention of runHydra
- link foreman and mention Procfile
The current implementation passes all values to the `create_or_update` method. Missing values end up as `undef` (or `NULL`) when assigned to `%update`. Thus, for columns that are NOT NULL, an update made when, for example, flakes are not used results in a horrible error:

    DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: null value in column "type" violates not-null constraint DETAIL: Failing row contains (.jobsets, 118, hydra, hydra jobsets, src, hydra/jobsets.nix, null, null, null, 1589536378, 1, 0, 0, , 3, 30, 100, null, null, 1589536379, null, null). [for Statement "UPDATE jobsets SET checkinterval = ?, description = ?, enableemail = ?, nixexprinput = ?, nixexprpath = ?, type = ? WHERE ( ( name = ? AND project = ? ) )" with ParamValues: 1='30', 2='hydra jobsets', 3='0', 4='src', 5='hydra/jobsets.nix', 6=undef, 7='.jobsets', 8='hydra'] at /nix/store/lsf81ip9ybxihk5praf2n0nh14a6i9j0-hydra-0.1.19700101.DIRTY/libexec/hydra/lib/Hydra/Helper/AddBuilds.pm line 50

This change just omits such values from `%update`, which results in PostgreSQL assigning the default values.
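The fix described above amounts to filtering undefined values out of the update before it reaches the database. A minimal Python analogue follows; the real code is Perl operating on `%update`, and the function name `build_update` is hypothetical.

```python
def build_update(values):
    """Keep only columns that actually have a value (Python analogue of the Perl fix).

    Columns that are left out are not written explicitly, so the database
    can apply their defaults instead of receiving NULL.
    """
    return {column: value for column, value in values.items() if value is not None}


# A declarative jobset spec that does not use flakes leaves 'type' undefined.
spec = {
    "checkinterval": 30,
    "description": "hydra jobsets",
    "enableemail": 0,
    "nixexprinput": "src",
    "nixexprpath": "hydra/jobsets.nix",
    "type": None,
}
update = build_update(spec)
print(sorted(update))
```

Note that values such as `0` or an empty string are kept; only truly undefined values are dropped.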
The `type` column in `Jobsets` is defined as NOT NULL. However, the original upgrade script adding this column omitted the constraint.
…type-value Fix declarative jobset missing type value
The original code would return the standard "Please come back later" page when there are only fetch errors on a newly set up declarative project.

The problem is that there are two types of errors: standard errors and fetch errors. Each is accompanied by a corresponding field for the time of occurrence. Standard errors use 'errortime', while fetch errors have 'lastchecktime' set to the time of the error. Unfortunately, the jobset.tt file was only using 'errortime' for displaying the time. This would result in the following errors in logs:

    Couldn't render template "date error - bad time/date string: expects 'h:m:s d:m:y' got: ''

This change includes using 'lastchecktime' when rendering the error times.
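The selection of the timestamp per error type can be sketched as follows. This is a Python illustration of the rule described above, not the Template Toolkit code in jobset.tt; the function name `error_times` is hypothetical, and the field names `errormsg`, `fetcherrormsg`, `errortime` and `lastchecktime` are taken from the commit messages in this thread.

```python
def error_times(jobset):
    """Pick the timestamp to display for each kind of error (illustrative sketch).

    Standard errors are timed by 'errortime'; fetch errors are timed by
    'lastchecktime'. Kinds of error that did not occur are left out, so an
    empty timestamp is never handed to the date formatter.
    """
    times = {}
    if jobset.get("errormsg"):
        times["error"] = jobset.get("errortime")
    if jobset.get("fetcherrormsg"):
        times["fetch error"] = jobset.get("lastchecktime")
    return times


# A newly set up declarative project with only a fetch error.
fresh = {"errormsg": "", "errortime": None,
         "fetcherrormsg": "fetch failed", "lastchecktime": 1589536379}
print(error_times(fresh))
```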
Render the jobset page correctly when there are fetch errors
Hi @kquick, do you have an update on this one? We found it might be useful on our deployment, as we stumbled upon the same problem several times.
Co-Authored-By: kquick <quick@sparq.org>
This helps generate log information (from the hydra-queue-runner service) about steps for which there is no available machine. This is currently a silent failure.
This patch works by checking whether any of the currently busy machines could build the remaining steps. If a step cannot be built by any busy machine (and it wasn't already assigned to a non-busy machine), it is reported as a step with no machine available to build it.
This might over-report in the case where build machines were down, but I think that is still a valid condition for reporting.
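The check described above can be sketched in Python. This is a simplified model and not the C++ code in hydra-queue-runner: the `Machine` and `Step` types, the `supports` test (matching system type plus required features) and the function name `unrunnable_steps` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    systems: frozenset                 # system types this machine can build for
    features: frozenset = frozenset()  # features it offers (hypothetical model)


@dataclass(frozen=True)
class Step:
    drv: str
    system: str
    required_features: frozenset = frozenset()


def supports(machine, step):
    """Assumed capability test: system type matches and all required features are offered."""
    return step.system in machine.systems and step.required_features <= machine.features


def unrunnable_steps(remaining, busy_machines):
    """Steps left over after dispatch that no busy machine could ever pick up.

    'remaining' holds the steps that were not assigned to a non-busy machine.
    """
    return [s for s in remaining if not any(supports(m, s) for m in busy_machines)]


busy = [Machine("builder1", frozenset({"x86_64-linux"}))]
remaining = [
    Step("a.drv", "x86_64-linux"),    # can wait for builder1 to become free
    Step("b.drv", "aarch64-linux"),   # no machine supports this system
    Step("c.drv", "x86_64-linux", frozenset({"kvm"})),  # missing feature
]
for step in unrunnable_steps(remaining, busy):
    print(f"no machine available for step {step.drv}")
```

In this model a step that merely has to wait for a busy machine is not reported; only steps that no known machine could build are logged.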
There is no Web-UI display of this condition; it is found by examining the log output from the hydra-queue-runner service. Web-UI display would need to have a web-page replication of this logic, or else an addition to the database to store this information, both of which are more ambitious undertakings left to (possibly) future pull request efforts.
Issues: