Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unhelpful error messages in aggregate jobs. #825

Merged
merged 2 commits into from Oct 24, 2021

Conversation

samueldr
Copy link
Member

@samueldr samueldr commented Oct 25, 2020

From the commit messages:


hydra-eval-jobs: Identify unexpected errors in handling aggregate jobs

The vague "[json.exception.type_error.302] type must be string, but is null" is absolutely unhelpful in the way Hydra currently handles it on evaluation.

This is handling unexpected errors only; the following commit will handle the specific instance of the previously mentioned error.


hydra-eval-jobs: Transmit original Nix error when handling aggregate jobs

It might happen that a job from the aggregate returned an error!

This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no drvPath to stringify!

So we now actively watch for errors and copy them to the aggregate job.


Note: This wasn't used inside Hydra. The evaluator was only ran directly against the repo.

env -i HYDRA_CONFIG=/Users/samuel/tmp/nixpkgs/hydra.conf \
    /Users/samuel/Projects/nixos/hydra/./src/hydra-eval-jobs/hydra-eval-jobs \
    -I /nix/store \
    -I ./hydra-fail/ \
    ./hydra-fail/nixos/release-small.nix

This was verified on top of NixOS/nixpkgs@f262382

So I do not know if adding an error attribute on the aggregate job will work as expected.

I'm also a bit exhausted from figuring out the error. I'll let someone that knows more about Hydra guide or fix the PR so the aggregate job properly reports that there was an error. Ideally The aggregate job HAS to be failed, otherwise we would have a missing job!

With this change, the job in the JSON file gets the original errors, if present.

Additionally, new unexpected errors would show up like this:

Unexpected error in hydra-eval-jobs when handling job 'nixos.tests.boot.biosCdrom.x86_64-linux', when producing aggregate job 'tested':
error: [...]

cc @andir

The vague "[json.exception.type_error.302] type must be string, but is null"
is **absolutely** unhelpful in the way Hydra currently handles it on
evaluation.

This is handling *unexpected* errors only; the following commit will
handle the specific instance of the previously mentioned error.
@@ -442,45 +442,66 @@ int main(int argc, char * * argv)
for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) {
auto jobName = i.key();
auto & job = i.value();
// For the error message
std::string lastTriedJobName = i.key();
Copy link
Member Author

@samueldr samueldr Oct 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the initialization here is not great...

I think when I wrote that up originally I was thinking about using auto to intrinsically get the type. In that case auto lastTriedJobName = jobName is probably better.

Otherwise, initializing with "".

What would be the better option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the nlohman value key() function returns reference. auto should then deduct the type T& which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.

if (job2 == state->jobs.end())
throw Error("aggregate job '%s' references non-existent job '%s'", jobName, jobName2);

if ((*job2).find("error") != (*job2).end()) {
Copy link
Member Author

@samueldr samueldr Oct 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to dereference here tripped me up. I get that this comes from auto job2 being the result of the search. Is there a pattern we could be using so that we don't have to dereference all the time?

Copy link
Member

@andir andir Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto& something = *job2 should do the trick here and then using something instead of *job2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, this is probably fine? I'd probably replace the (*job2). with just job2-> but other than that this seems... fine.

Comment on lines +472 to +474
if (job.find("error") == job.end()) {
job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better pattern to initialize the error message on the JSON object?

Copy link
Member

@andir andir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a quick pass.

src/hydra-eval-jobs/hydra-eval-jobs.cc Outdated Show resolved Hide resolved
@@ -442,45 +442,66 @@ int main(int argc, char * * argv)
for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) {
auto jobName = i.key();
auto & job = i.value();
// For the error message
std::string lastTriedJobName = i.key();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the nlohman value key() function returns reference. auto should then deduct the type T& which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.

…jobs

It might happen that a job from the aggregate returned an error!

This is what the vague "[json.exception.type_error.302] type must be string, but is null"
was all about in this instance; there was no `drvPath` to stringify!

So we now actively watch for errors and copy them to the aggregate job.
@ju1m
Copy link

ju1m commented Dec 1, 2020

I don't know if this PR is ready, but just to let you know that merging it would unblock NixOS/nixpkgs#101071

@samueldr
Copy link
Member Author

samueldr commented Dec 3, 2020

I've stated in the past on the IRC channels (a couple of times) that it needs to be reviewed by "code owners" of that section for the open questions.

Other than that, in my opinion it is ready, if not the exact code as shown, the same but with the corrections and nits fixed.

I know that, during my testing, it was working as expected.

7c6f434c pushed a commit to NixOS/nixpkgs that referenced this pull request Apr 24, 2021
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org,
and second because without NixOS/hydra#825
hydra-eval-jobs crashes instead of skipping aggregated jobs which fail
(here because they required an IFD).
@lukegb
Copy link
Contributor

lukegb commented Aug 1, 2021

This would have massively helped debugging NixOS/nixpkgs#132328.

if (job.find("error") == job.end()) {
job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName);
}
job["error"] = fmt("While handling '%1%': %2%\n", jobName2, (std::string) (*job2)["error"]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this always overwrite the assignment on line 473?

@samueldr
Copy link
Member Author

Just saying, today and possibly since two days ago nixos:trunk-combined is having a bad time again:

error: [json.exception.type_error.302] type must be string, but is null

And we can't know why.

@grahamc
Copy link
Member

grahamc commented Oct 24, 2021

I'm going to merge this under the idea that it is strictly better than whats there.

@grahamc grahamc merged commit 3516950 into NixOS:master Oct 24, 2021
@samueldr
Copy link
Member Author

Better in behaviour at least.

@samueldr samueldr deleted the fix/unhelpful-errors-in-aggregates branch October 24, 2021 00:37
@grahamc
Copy link
Member

grahamc commented Oct 24, 2021

This doesn't seem to do what it should.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants