Fix unhelpful error messages in aggregate jobs. #825

samueldr · 2020-10-25T23:34:28Z

From the commit messages:

hydra-eval-jobs: Identify unexpected errors in handling aggregate jobs

The vague "[json.exception.type_error.302] type must be string, but is null" is absolutely unhelpful in the way Hydra currently handles it on evaluation.

This is handling unexpected errors only; the following commit will handle the specific instance of the previously mentioned error.

hydra-eval-jobs: Transmit original Nix error when handling aggregate jobs

It might happen that a job from the aggregate returned an error!

This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no drvPath to stringify!

So we now actively watch for errors and copy them to the aggregate job.

Note: This wasn't used inside Hydra. The evaluator was only run directly against the repo.

env -i HYDRA_CONFIG=/Users/samuel/tmp/nixpkgs/hydra.conf \
    /Users/samuel/Projects/nixos/hydra/./src/hydra-eval-jobs/hydra-eval-jobs \
    -I /nix/store \
    -I ./hydra-fail/ \
    ./hydra-fail/nixos/release-small.nix

This was verified on top of NixOS/nixpkgs@f262382

So I do not know if adding an error attribute on the aggregate job will work as expected.

I'm also a bit exhausted from figuring out the error. I'll let someone who knows more about Hydra guide or fix the PR so the aggregate job properly reports that there was an error. ~~Ideally~~ The aggregate job HAS to be failed, otherwise we would have a missing job!

With this change, the job in the JSON file gets the original errors, if present.

Additionally, new unexpected errors would show up like this:

Unexpected error in hydra-eval-jobs when handling job 'nixos.tests.boot.biosCdrom.x86_64-linux', when producing aggregate job 'tested':
error: [...]
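
For readers who want to see the shape of this wrapping, here is a minimal sketch, not the code from the PR. It assumes a hypothetical helper `handleConstituents` and a caller-supplied `handle` callback; only the message text and the `lastTriedJobName` idea come from this thread.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: run `handle` for every constituent of an aggregate job.
// If anything unexpected is thrown, rethrow with the job names prepended so the
// failure can be traced back to the constituent being processed.
void handleConstituents(const std::string & aggregateName,
                        const std::vector<std::string> & constituents,
                        const std::function<void(const std::string &)> & handle) {
    // Remember which job was being processed, for the error message.
    std::string lastTriedJobName = aggregateName;
    try {
        for (const auto & name : constituents) {
            lastTriedJobName = name;
            handle(name);
        }
    } catch (const std::exception & e) {
        throw std::runtime_error(
            "Unexpected error in hydra-eval-jobs when handling job '" + lastTriedJobName +
            "', when producing aggregate job '" + aggregateName + "':\n" + e.what());
    }
}
```

The original exception text is kept verbatim after the context line, so a vague message such as the JSON type error at least names the job that triggered it.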

cc @andir

samueldr · 2020-10-25T23:56:25Z

src/hydra-eval-jobs/hydra-eval-jobs.cc

@@ -442,45 +442,66 @@ int main(int argc, char * * argv)
        for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) {
            auto jobName = i.key();
            auto & job = i.value();
+            // For the error message
+            std::string lastTriedJobName = i.key();


I guess the initialization here is not great...

I think when I wrote that up originally I was thinking about using auto to intrinsically get the type. In that case auto lastTriedJobName = jobName is probably better.

Otherwise, initializing with "".

What would be the better option?

IIRC the nlohmann key() function returns a reference. auto should then deduce the type T&, which is probably not what we want here. The assignments on lines 454 and 466 would then override the value in the JSON document. That being said, my type deduction knowledge might be a bit rusty... I would stick with the string there.
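
On the type-deduction question: in standard C++, a plain `auto` never deduces a reference type. It drops the reference and any top-level `const`, so `auto x = f()` makes a copy even when `f` returns `const std::string &`. A reference is only deduced with `auto &` (or `decltype(auto)`). The sketch below checks this with a hypothetical `Entry` type standing in for whatever `key()` returns; it does not use the nlohmann API.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an object whose accessor returns a reference
// to a stored key.
struct Entry {
    std::string name;
    const std::string & key() const { return name; }
};

// Returns true when assigning to a plain `auto` copy leaves the source untouched.
bool copyIsIndependent() {
    Entry e{"tested"};
    auto byValue = e.key();        // deduced as std::string: an independent copy
    const auto & byRef = e.key();  // deduced as const std::string &: an alias

    static_assert(std::is_same<decltype(byValue), std::string>::value,
                  "plain auto drops the reference");
    static_assert(std::is_same<decltype(byRef), const std::string &>::value,
                  "auto & keeps the reference");

    byValue = "nixos.tests.boot";  // modifies only the copy
    assert(&byRef == &e.name);     // the alias still refers to the stored key
    return e.name == "tested";
}
```

Under that rule, both `auto lastTriedJobName = jobName` and the explicit `std::string` declaration give an independent string, so later assignments would not write back into the document.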

samueldr · 2020-10-25T23:58:15Z

src/hydra-eval-jobs/hydra-eval-jobs.cc

+                        if (job2 == state->jobs.end())
+                            throw Error("aggregate job '%s' references non-existent job '%s'", jobName, jobName2);
+
+                        if ((*job2).find("error") != (*job2).end()) {


Needing to dereference here tripped me up. I get that this comes from auto job2 being the result of the search. Is there a pattern we could be using so that we don't have to dereference all the time?

auto& something = *job2 should do the trick here and then using something instead of *job2.

Honestly, this is probably fine? I'd probably replace the (*job2). with just job2-> but other than that this seems... fine.
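
The three spellings discussed here can be compared side by side. This is a sketch under assumptions: `Job` is a hypothetical `std::map` stand-in for the JSON object, and the iterator comes from a `std::vector` rather than from the JSON document.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a JSON job object: a flat string-to-string map.
using Job = std::map<std::string, std::string>;
using JobIt = std::vector<Job>::const_iterator;

// Explicit dereference, as written in the diff.
bool hasErrorExplicit(JobIt it) {
    return (*it).find("error") != (*it).end();
}

// Arrow operator: same meaning, less punctuation.
bool hasErrorArrow(JobIt it) {
    return it->find("error") != it->end();
}

// Bind a reference once, then use it like a plain object.
bool hasErrorAlias(JobIt it) {
    const auto & job = *it;
    return job.find("error") != job.end();
}
```

All three compile to the same lookup; the alias form tends to pay off when the dereferenced value is used more than a couple of times.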

samueldr · 2020-10-26T00:00:12Z

src/hydra-eval-jobs/hydra-eval-jobs.cc

+                            if (job.find("error") == job.end()) {
+                                job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName);
+                            }


Is there a better pattern to initialize the error message on the JSON object?

andir

Just did a quick pass.

andir · 2020-10-26T00:04:44Z

src/hydra-eval-jobs/hydra-eval-jobs.cc

@@ -442,45 +442,66 @@ int main(int argc, char * * argv)
        for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) {
            auto jobName = i.key();
            auto & job = i.value();
+            // For the error message
+            std::string lastTriedJobName = i.key();


IIRC the nlohmann key() function returns a reference. auto should then deduce the type T&, which is probably not what we want here. The assignments on lines 454 and 466 would then override the value in the JSON document. That being said, my type deduction knowledge might be a bit rusty... I would stick with the string there.

ju1m · 2020-12-01T16:44:15Z

I don't know if this PR is ready, but just to let you know that merging it would unblock NixOS/nixpkgs#101071

samueldr · 2020-12-03T23:17:07Z

I've stated in the past on the IRC channels (a couple of times) that it needs to be reviewed by "code owners" of that section for the open questions.

Other than that, in my opinion it is ready: if not the exact code as shown, then the same code with the corrections and nits fixed.

I know that, during my testing, it was working as expected.

First because IFD (import-from-derivation) is not allowed on hydra.nixos.org, and second because without NixOS/hydra#825 hydra-eval-jobs crashes instead of skipping aggregated jobs which fail (here because they required an IFD).

lukegb · 2021-08-01T21:42:54Z

This would have massively helped debugging NixOS/nixpkgs#132328.

ncfavier · 2021-08-01T22:14:42Z

src/hydra-eval-jobs/hydra-eval-jobs.cc

+                            if (job.find("error") == job.end()) {
+                                job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName);
+                            }
+                            job["error"] = fmt("While handling '%1%': %2%\n", jobName2, (std::string) (*job2)["error"]);


Doesn't this always overwrite the assignment on line 473?
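
To illustrate the concern: a second `=` assignment replaces the header written just above it, whereas `+=` would accumulate one line per failing constituent. The following is a minimal sketch of the accumulating variant, not the code that was merged. `Job` is a hypothetical `std::map` stand-in for the JSON object and `appendConstituentError` is an illustrative name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a JSON job object: a flat string-to-string map.
using Job = std::map<std::string, std::string>;

// Append one constituent's error to the aggregate job, writing the header only once.
void appendConstituentError(Job & aggregate, const std::string & aggregateName,
                            const std::string & constituentName,
                            const std::string & message) {
    auto & error = aggregate["error"];  // operator[] creates an empty string on first use
    if (error.empty())
        error = "Errors aggregating aggregate job '" + aggregateName + "'.\n";
    // Append rather than assign, so earlier errors and the header are kept.
    error += "While handling '" + constituentName + "': " + message + "\n";
}
```

Binding the `"error"` entry to a reference once also sidesteps the separate `find`/`end` check used to decide whether the header still needs to be written.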

samueldr · 2021-10-24T00:13:53Z

Just saying, today and possibly since two days ago nixos:trunk-combined is having a bad time again:

error: [json.exception.type_error.302] type must be string, but is null

And we can't know why.

grahamc · 2021-10-24T00:34:45Z

I'm going to merge this under the idea that it is strictly better than what's there.

samueldr · 2021-10-24T00:37:21Z

Better in behaviour at least.

grahamc · 2021-10-24T02:23:25Z

This doesn't seem to do what it should.

samueldr force-pushed the fix/unhelpful-errors-in-aggregates branch from 0ae2efc to b5140c1 Compare October 26, 2020 00:13

andir mentioned this pull request Oct 26, 2020

Evaluation error is blocking jobsets NixOS/nixpkgs#83647

Closed

ju1m mentioned this pull request Nov 29, 2020

apparmor: try again to fix and improve NixOS/nixpkgs#101071

Merged

18 tasks

grahamc merged commit 3516950 into NixOS:master Oct 24, 2021

samueldr deleted the fix/unhelpful-errors-in-aggregates branch October 24, 2021 00:37

grahamc mentioned this pull request Oct 24, 2021

Revert "Fix unhelpful error messages in aggregate jobs." #1047

Merged

andersk mentioned this pull request Oct 26, 2021

nixos-unstable evaluation broken on Hydra (error: [json.exception.type_error.302] type must be string, but is null) NixOS/nixpkgs#142918

Closed
