New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unhelpful error messages in aggregate jobs. #825
Fix unhelpful error messages in aggregate jobs. #825
Conversation
The vague "[json.exception.type_error.302] type must be string, but is null" is **absolutely** unhelpful in the way Hydra currently handles it on evaluation. This is handling *unexpected* errors only; the following commit will handle the specific instance of the previously mentioned error.
@@ -442,45 +442,66 @@ int main(int argc, char * * argv) | |||
for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) { | |||
auto jobName = i.key(); | |||
auto & job = i.value(); | |||
// For the error message | |||
std::string lastTriedJobName = i.key(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the initialization here is not great...
I think when I wrote that up originally I was thinking about using auto
to intrinsically get the type. In that case auto lastTriedJobName = jobName
is probably better.
Otherwise, initializing with ""
.
What would be the better option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the nlohman value key()
function returns reference. auto
should then deduct the type T&
which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.
if (job2 == state->jobs.end()) | ||
throw Error("aggregate job '%s' references non-existent job '%s'", jobName, jobName2); | ||
|
||
if ((*job2).find("error") != (*job2).end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needing to dereference here tripped me up. I get that this comes from auto job2
being the result of the search. Is there a pattern we could be using so that we don't have to dereference all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto& something = *job2
should do the trick here and then using something
instead of *job2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, this is probably fine? I'd probably replace the (*job2).
with just job2->
but other than that this seems... fine.
if (job.find("error") == job.end()) { | ||
job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better pattern to initialize the error message on the JSON object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a quick pass.
@@ -442,45 +442,66 @@ int main(int argc, char * * argv) | |||
for (auto i = state->jobs.begin(); i != state->jobs.end(); ++i) { | |||
auto jobName = i.key(); | |||
auto & job = i.value(); | |||
// For the error message | |||
std::string lastTriedJobName = i.key(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the nlohman value key()
function returns reference. auto
should then deduct the type T&
which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.
…jobs It might happen that a job from the aggregate returned an error! This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no `drvPath` to stringify! So we now actively watch for errors and copy them to the aggregate job.
0ae2efc
to
b5140c1
Compare
I don't know if this PR is ready, but just to let you know that merging it would unblock NixOS/nixpkgs#101071 |
I've stated in the past on the IRC channels (a couple of times) that it needs to be reviewed by "code owners" of that section for the open questions. Other than that, in my opinion it is ready, if not the exact code as shown, the same but with the corrections and nits fixed. I know that, during my testing, it was working as expected. |
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org, and second because without NixOS/hydra#825 hydra-eval-jobs crashes instead of skipping aggregated jobs which fail (here because they required an IFD).
This would have massively helped debugging NixOS/nixpkgs#132328. |
if (job.find("error") == job.end()) { | ||
job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName); | ||
} | ||
job["error"] = fmt("While handling '%1%': %2%\n", jobName2, (std::string) (*job2)["error"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this always overwrite the assignment on line 473?
Just saying, today and possibly since two days ago
And we can't know why. |
I'm going to merge this under the idea that it is strictly better than whats there. |
Better in behaviour at least. |
This doesn't seem to do what it should. |
From the commit messages:
hydra-eval-jobs: Identify unexpected errors in handling aggregate jobs
The vague "[json.exception.type_error.302] type must be string, but is null" is absolutely unhelpful in the way Hydra currently handles it on evaluation.
This is handling unexpected errors only; the following commit will handle the specific instance of the previously mentioned error.
hydra-eval-jobs: Transmit original Nix error when handling aggregate jobs
It might happen that a job from the aggregate returned an error!
This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no
drvPath
to stringify!So we now actively watch for errors and copy them to the aggregate job.
Note: This wasn't used inside Hydra. The evaluator was only ran directly against the repo.
This was verified on top of NixOS/nixpkgs@f262382
So I do not know if adding an
error
attribute on the aggregate job will work as expected.I'm also a bit exhausted from figuring out the error. I'll let someone that knows more about Hydra guide or fix the PR so the aggregate job properly reports that there was an error.
IdeallyThe aggregate job HAS to be failed, otherwise we would have a missing job!With this change, the job in the JSON file gets the original errors, if present.
Additionally, new unexpected errors would show up like this:
cc @andir