New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cmake flag -DHPX_WITH_FAULT_TOLERANCE=ON (OFF by default) #3170
Conversation
…re connection reset signals of connections, so HPX does not terminate if a node crashes.
hpx/runtime.hpp
Outdated
/// component type. This a purely internal function allowing to work | ||
/// around certain library specific problems related to dynamic | ||
/// loading of external libraries. | ||
HPX_EXPORT bool tolerate_node_faults(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment does not match the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, copy/paste :-( I'll add a proper comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KADichev ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated the branch. Not sure if anything else is required of me here. Never did PR before.
What is the rationale of making this functionality optional? Wouldn't it be better to be always able to handle this kind of exception? |
The functionality is optional, because I'm not sure if ignoring a "connection_reset" signal due to suddenly failing remote endpoints is the right way of handling this -- it is a workaround to avoid HPX raising an exception, and terminating, and allowing to explore recovery (which we are working on). |
In the end this should be the default, it shouldn't even be possible to disable this. I would have no objections to go ahead with merging this PR if that's the eventual goal. |
The goal should be to have a fault tolerance HPX in the end. I am all for having it optional, since there's always a trade off between fault tolerance and performance. This is also the first step in that direction so it makes sense to have it defaulted to being turned off for now. |
Ok. Fair enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
I fixed the comment and pushed the changes to the branch Let me know if you need anything else. |
to ignore connection reset signals of connections, so HPX does not terminate if a node crashes.