
Attempting to avoid data races in async_traversal while evaluating dataflow() #3001

Merged
merged 6 commits into master from fixing_dataflow on Nov 23, 2017

Conversation

hkaiser
Member

@hkaiser hkaiser commented Nov 11, 2017

@Naios please verify that this does not break anything

@Naios
Contributor

Naios commented Nov 12, 2017

Ouch, these typos... thanks for correcting them. @hkaiser I think your change doesn't make any difference to the current revision: detach() just sets the bool detached_ member inside the async_traversal_point class rather than altering the control flow directly, and thus the order of the two calls doesn't matter.

As I recall, there was a check that was supposed to prevent the finalizer of dataflow from being called more than once: c852439#diff-040de9e61eef02088de58b48a421a1e5L514 .
During a Skype conversation we went through this section and concluded that it could be dropped, because we thought there was no possibility of the continuation of the future being executed more than once. Maybe this has something to do with our issue here, but I'm not sure.

@hkaiser
Member Author

hkaiser commented Nov 12, 2017

@Naios unfortunately this patch does not fix the problem (it makes it appear less often, though, but that could be coincidental). My current theory is that the container which the iterators in https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/detail/pack_traversal_async_impl.hpp#L232 refer to goes out of scope too early. It might be noteworthy that I move the container into dataflow(), which might be a use case we have not tested thoroughly enough. I will investigate further.

@hkaiser
Member Author

hkaiser commented Nov 12, 2017

Also, I think setting the flag before actually attaching a continuation to the current future changes the behavior: under certain circumstances the continuation might be executed during async_continue, in which case the next async traversal is started even before the detached flag can be reset.
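To illustrate the ordering hazard (a minimal, self-contained sketch with hypothetical names, not the actual HPX code): if attaching a continuation can run it inline because the future is already ready, the flag has to be written before the attach call, or the inline execution observes a stale value.

    #include <functional>
    #include <iostream>
    #include <utility>

    // Simulates a future whose continuation may run inline if the
    // future is already ready at attach time.
    struct fake_future
    {
        bool ready = true;
        std::function<void()> continuation;

        void attach_continuation(std::function<void()> f)
        {
            if (ready)
                f();    // executed before attach_continuation returns!
            else
                continuation = std::move(f);
        }
    };

    int main()
    {
        bool detached = false;
        fake_future fut;

        detached = true;    // must happen first: the inline execution
                            // below already observes the final value
        fut.attach_continuation([&] {
            std::cout << "resumed, detached=" << detached << "\n";
        });
        return 0;
    }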

@sithhell
Member

I think that, in principle, setting the detached flag before executing async_continue is correct. A race might still occur, though, since detached_ is not synchronized in any way. It should either be protected by a lock or changed to a std::atomic_flag (or similar).
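A minimal sketch of that suggestion (hypothetical names, not the actual HPX code), using std::atomic_flag so that exactly one caller observes the transition:

    #include <atomic>

    struct async_traversal_point
    {
        std::atomic_flag detached_ = ATOMIC_FLAG_INIT;

        // test_and_set returns the previous value, so exactly one
        // caller (the first) gets 'true' back from detach(), even
        // under concurrent invocation.
        bool detach()
        {
            return !detached_.test_and_set(std::memory_order_acq_rel);
        }
    };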

@Naios
Contributor

Naios commented Nov 15, 2017

The detached variable isn't shared across threads, because we create a new variable for every resumed traversal.

@sithhell
Member

Don't the completion handlers concurrently access the variable?

@Naios
Contributor

Naios commented Nov 15, 2017

No, the only state shared across the suspensions is the async_traversal_frame, which is stored on the heap, and the iterators of the current traversal hierarchy.
See the arguments of frame_->async_continue(*current, std::move(state)); and its implementation:

            template <typename T, typename Hierarchy>
            void async_continue(T&& value, Hierarchy&& hierarchy)
            {
                // Create a self reference
                boost::intrusive_ptr<async_traversal_frame> self(this);

                // Create a callable object which resumes the current
                // traversal when it's called.
                auto resumable = make_resume_traversal_callable(
                    std::move(self), std::forward<Hierarchy>(hierarchy));

                // Invoke the visitor with the current value and the
                // callable object to resume the control flow.
                util::invoke(visitor(), async_traverse_detach_tag{},
                    std::forward<T>(value), std::move(resumable));
            }

where the make_resume_traversal_callable transfers the current state to the next resumption.

In addition, since we resolve the futures one after another, I think we shouldn't encounter any threading issues at all.

@hkaiser
Member Author

hkaiser commented Nov 15, 2017

Whatever causes the problem, the current code clearly exposes data races in some way. For me, the iterators into the shared data start to point to nowhere if dataflow is used while running on more than one thread.

@sithhell
Member

sithhell commented Nov 15, 2017 via email

@hkaiser
Member Author

hkaiser commented Nov 15, 2017

I don't have a minimal use-case at the moment. Also, I have not tried using a std::atomic_flag.

@hkaiser
Member Author

hkaiser commented Nov 18, 2017

@Naios I think I understand what is going on now. However, I'm having a hard time changing the code to fix it; maybe you can help.

The issue is that the current code creates a detached flag for each iteration level, attempting to propagate the status of the child back up on exit. This causes subtle data races when futures become ready out of order. For this reason, the old dataflow code had exactly one flag to track the 'ready' state. I believe the problems should go away if the new code is changed to use just one Boolean to track readiness (instead of one per iteration level).
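A sketch of that idea (hypothetical, heavily simplified from the actual code): every nested traversal level references one flag owned by the resumption itself, instead of owning its own copy.

    // One flag per resumption, shared by reference across all levels.
    struct async_traversal_point
    {
        bool& detached_;

        explicit async_traversal_point(bool& detached)
          : detached_(detached)
        {
        }

        // Marking one level as detached is immediately visible to all
        // enclosing levels of the same resumption.
        void detach()
        {
            detached_ = true;
        }
    };

    void resume_traversal()
    {
        bool detached = false;                 // the single flag
        async_traversal_point point(detached);
        // ... traverse; nested points are constructed with the same
        // reference, so no per-level copies can get out of sync ...
    }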

@Naios
Contributor

Naios commented Nov 18, 2017

@hkaiser I'm not sure what is going on there; are you open to a call tomorrow?
I'm trying to get my phylanx installation ready in the meantime.

Theoretically, every future should be traversed only once, so the traversal is strictly ordered.
The detached bool variable is only used to mark the current execution context as abandoned.
When the last future is traversed, the final handler is called; thus I think a global detached variable would be a workaround for an issue we currently don't understand.

@hkaiser
Member Author

hkaiser commented Nov 18, 2017

@Naios the latest commit fixes the issue I was seeing. Please verify that I have not broken anything.

- fixed another potential use after move problem
- fixed memory leak (dataflow was leaking its shared state)
- simplified function operator implementation for resume_state_callable
    -    return typename types::frame_pointer_type(ptr);
    +    // Create an intrusive_ptr from the heap object, don't increase
    +    // reference count (it's already 'one').
    +    return typename types::frame_pointer_type(ptr, false);
Contributor

Good catch here 👍
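For context (a self-contained sketch, not the actual HPX frame type): boost::intrusive_ptr's constructor takes a second add_ref parameter that defaults to true. Adopting an object that already starts with a reference count of one must pass false, otherwise the extra count is never released and the object leaks.

    #include <boost/intrusive_ptr.hpp>

    struct frame
    {
        int count = 1;    // created with one reference already held
    };

    inline void intrusive_ptr_add_ref(frame* p) { ++p->count; }
    inline void intrusive_ptr_release(frame* p)
    {
        if (--p->count == 0)
            delete p;
    }

    int main()
    {
        frame* raw = new frame;                       // count == 1
        boost::intrusive_ptr<frame> ptr(raw, false);  // adopt: count stays 1
        return 0;  // ptr's destructor drops the count to 0 and deletes
    }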

    @@ -514,8 +553,7 @@ namespace util {
         template <typename Frame, typename State>
         void resume_traversal_callable<Frame, State>::operator()()
         {
    -        auto hierarchy =
    -            util::tuple_cat(util::make_tuple(frame_), std::move(state_));
    +        auto hierarchy = util::tuple_cat(util::make_tuple(frame_), state_);
Contributor

Every resume_traversal_callable instance should only be called once, so the move should be valid here.
Maybe we could guard this with an assertion.

Which issue occurred with this move?

Member Author

Frankly, I was not sure whether this might have caused my issues. I will revert this change and add an assertion to make sure it doesn't cause any problems.

Member Author

    for (/**/; !range.is_finished(); ++range)
    {
        async_traverse_one(range);
        if (is_detached()) // test before increment
Contributor

What I'm wondering about here is that the iterator isn't modified inside async_traverse_one.
When !range.is_finished(), it should be perfectly valid to increment the iterator.
In the worst case we finish the range, but that should never yield an iterator dereferencing error.

Member Author

Right, the iterators are not changed inside async_traverse_one. However, this function might move away the vector those iterators refer to (see the std::move(args_) here:

    visitor(), async_traverse_complete_tag{}, std::move(args_));

). This was actually the main problem I stumbled over.
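A minimal, self-contained demonstration of that hazard (not the HPX code): iterators obtained from a vector must not be used with it once its contents have been moved away; checked-iterator builds (e.g. MSVC's debug iterators) flag exactly this.

    #include <utility>
    #include <vector>

    int main()
    {
        std::vector<int> args{1, 2, 3};
        auto it = args.begin();        // iterator into args

        std::vector<int> sink = std::move(args);  // storage now in sink

        // Continuing to use 'it' as an iterator of 'args' is invalid;
        // checked-iterator builds report this as an error.
        // ++it;  // would be flagged if executed against 'args'
        (void) sink;
        (void) it;
        return 0;
    }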

    @@ -275,14 +300,15 @@ namespace util {
         {
             Frame frame_;
             tuple<Hierarchy...> hierarchy_;
    -        bool detached_;
    +        std::atomic<bool>& detached_;
Contributor

  • The usage of the reference is a great simplification 👍
  • From my understanding, making the detached_ variable atomic doesn't make any difference here since, as described earlier, the variable isn't shared across threads: we create a local variable for every resumption. So the atomic can safely be elided. My assumption is that I forgot to propagate the value of the variable back somewhere. The atomic can probably be converted back to a normal bool.

Member Author

Ok, I will verify whether converting to a plain bool is sufficient.

    explicit static_async_range(Target* target) : target_(target) {}

    static_async_range(static_async_range const& rhs) = default;
    static_async_range(static_async_range && rhs)
Contributor

The custom move constructors and operators probably aren't needed (they are probably left over from debugging).

Member Author

Those are not really needed, indeed. OTOH, explicitly setting the moved-from object to be invalid might be a good thing in general.
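A sketch of that pattern (hypothetical, reduced member set): a hand-written move constructor that leaves the moved-from range visibly invalid instead of relying on the defaulted member-wise move.

    template <typename Target>
    struct static_async_range
    {
        Target* target_;

        explicit static_async_range(Target* target) : target_(target) {}

        static_async_range(static_async_range const& rhs) = default;

        static_async_range(static_async_range&& rhs) noexcept
          : target_(rhs.target_)
        {
            rhs.target_ = nullptr;  // moved-from object is clearly invalid
        }
    };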

@Naios
Contributor

Naios commented Nov 19, 2017

@hkaiser I was able to reproduce the issue, see my comments above.
I would like to put a bit more time into investigating this.

Especially see:

What I'm wondering about here is that the iterator isn't modified inside async_traverse_one.
When !range.is_finished(), it should be perfectly valid to increment the iterator.
In the worst case we finish the range, but that should never yield an iterator dereferencing error.

@Naios
Contributor

Naios commented Nov 19, 2017

@hkaiser I added assertions to ensure that the traversal is finished only once, and it seems that the final handler is called at least twice, which is probably the main cause of the iterator dereferencing error:

[screenshot: debugger session showing the triggered assertion]

@hkaiser
Member Author

hkaiser commented Nov 19, 2017

Especially see:

What I'm wondering about here is that the iterator isn't modified inside async_traverse_one. When !range.is_finished(), it should be perfectly valid to increment the iterator. In the worst case we finish the range, but that should never yield an iterator dereferencing error.

I think I addressed this concern above.

@hkaiser I added assertions to ensure that the traversal is finished only once, and it seems that the final handler is called at least twice, which is probably the main cause of the iterator dereferencing error

This might explain things. Will you investigate this?

@hkaiser
Member Author

hkaiser commented Nov 19, 2017

@Naios I think the finished_ flag is the one that needs to be atomic. We've had similar issues with dataflow, where we had to protect finalize from being called more than once using such a flag.
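A minimal sketch of such a guard (hypothetical names, not the actual HPX code): std::atomic<bool>::exchange returns the previous value, so only the caller that flips the flag from false to true runs the finalizer.

    #include <atomic>
    #include <utility>

    struct async_traversal_frame
    {
        std::atomic<bool> finished_{false};

        template <typename F>
        void finalize_once(F&& finalize)
        {
            // Exactly one caller observes 'false' here, even if the
            // finish step is reached concurrently from several
            // continuations; all others skip the finalizer.
            if (!finished_.exchange(true, std::memory_order_acq_rel))
            {
                std::forward<F>(finalize)();
            }
        }
    };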

- demoting detached flag to plain boolean.
@hkaiser
Member Author

hkaiser commented Nov 19, 2017

@Naios please see my latest patch adding an atomic finished flag and demoting the detached flag to a plain Boolean.

@Naios
Contributor

Naios commented Nov 19, 2017

We could wrap the finished_ flag in #ifndef NDEBUG, if it is only used by an assertion.

@hkaiser
Member Author

hkaiser commented Nov 19, 2017

We could wrap the finished_ flag in #ifndef NDEBUG, if it is only used by an assertion.

It mainly prevents the finalize code from being called more than once.

@hkaiser
Member Author

hkaiser commented Nov 21, 2017

@Naios are you ok with this change now?

@Naios
Contributor

Naios commented Nov 21, 2017

@hkaiser I'm not sure whether this is the correct solution. Yesterday I spent more time on this, and at least for me it looks like the memory of the traversal frame gets corrupted, or the pointer is freed too early.
I added a breakpoint to the finish method; the first time the method is called, the debugger shows strange values and the assertion is triggered as well.
Maybe it would be helpful to inspect this with ASan and UBSan.

@hkaiser
Member Author

hkaiser commented Nov 21, 2017

@Naios Here is what I know. The main problem is that under certain circumstances, finalize might be called more than once. I have not been able to think of a scenario in which this could happen, but apparently - in concurrent scenarios only - there is a chance of it occurring. Now, the first invocation of finalize moves the vector into the function invoked by dataflow. This invalidates the iterators into said vector, which is flagged as an error by the iterator-checking facilities.

This PR proposes to fix this by changing three things:

  • Introduce an atomic finalized flag that prevents the finalize functionality from being executed more than once. From inspecting the code, I'm convinced that it is safe to let all subsequent invocations of finalize simply skip calling into the dataflow code. Those invocations can only come from the 'tail end' of an asynchronous recursive execution of the iteration code.
  • Slightly change the dynamic range iteration such that the iterators themselves are no longer touched once a particular asynchronous recursive execution of the iteration code has been detached. Those iterators could have become invalid because of finalize being called on a separate thread (see point 1).
  • Make the detached flag 'global' to a particular asynchronous recursive execution of the iteration code. That should guarantee that any execution is reliably detached once one of the futures was not ready to begin with. This flag does not need to be atomic, as it sits on the stack of the execution and can't be accessed by more than one thread concurrently. Also, this flag is now set to true 'before' a continuation is attached to the current future, which prevents races caused by the continuation being executed out of order.

@Naios
Contributor

Naios commented Nov 21, 2017

I have not been able to think of a scenario in which this could happen

The issue is somewhere else; maybe we could just merge this workaround, since it was present in the original code too.

Did you test whether the async_complete handler is executed more than once by setting breakpoints?
Because in my scenario the frame was corrupted from the first invocation.

@hkaiser
Member Author

hkaiser commented Nov 21, 2017

The issue is somewhere else; maybe we could just merge this workaround, since it was present in the original code too.

Ok, I take this as you accepting this PR ;-)

Did you test whether the async_complete handler is executed more than once by setting breakpoints? Because in my scenario the frame was corrupted from the first invocation.

This is an assumption on my end as I have no other explanation for the things going on.

@hkaiser hkaiser merged commit bad0a9d into master Nov 23, 2017
@hkaiser hkaiser deleted the fixing_dataflow branch November 23, 2017 15:11