Implement parallel is_heap and is_heap_until. #2654
Conversation
Fix a compile error about shadowing a template parameter. Fix a compile error about a dependent type name.
How about you add it here: https://github.com/STEllAR-GROUP/hpx/tree/master/tests/performance/local? |
@hkaiser I'll add many performance tests this summer. I think it's better to create a new folder and put them in it. Which path do you think is good?
sequential(ExPolicy && policy, RandIter first, RandIter last,
    Comp && comp)
{
    return std::is_heap_until(first, last, std::forward<Comp>(comp));
I don't know whether it's really important here, but first and last are not perfectly forwarded to is_heap_until.
Yah, we do it inconsistently throughout the algorithm implementations. Most of the time iterators are passed by value, sometimes not. The Standard prescribes for the algorithms to take iterators by value (on the outermost layer), everything else is up to the implementation I guess.
Iterators are usually cheap to copy. So, pass-by-value is faster than pass-by-reference.
And when using pass-by-reference, there is a dangerous case like the one below:

iterator func(iterator& first, iterator& last, int val) {
    while (first != last) {
        if (*first == val) return first;
        ++first;
    }
    return last;
}

func(first, first, val);  // aliasing: 'first' and 'last' refer to the same object

So, I think pass-by-value is better than perfect forwarding.
Iterators are usually cheap to copy. So, pass-by-value is faster than pass-by-reference.
While I agree with the first statement (even if the keyword here is usually), I don't see how pass-by-value can be faster than pass-by-reference. Could you elaborate, please?
@hkaiser The compiler implements pass-by-reference using a pointer in the generated assembly. So, when accessing the element through the reference, there is a dereferencing cost. That's why pass-by-value can be faster than pass-by-reference.
Can zip_iterator<iterator1, iterator2, ...> be used with parallel algorithms?
As I said above, the Standard mandates for the algorithms to take the iterators by value (at the outermost level). So that's what we should do. Otherwise we have not applied a uniform policy. I think almost everywhere we pass iterators by value, but I'm not sure. In the end we shouldn't be able to measure any difference, so I think you can leave things the way you have it right now.
Once you have resolved the merge conflict with master we can go ahead and merge this PR.
To be honest, my personal opinion is that one should avoid additional copies in API-internal code, since you don't know how large the object the user passes will be. Also, @hkaiser has shown that compound iterators can have a significantly larger size (ranges). Hence, call-by-value can have a performance drawback when the copy operation can't be optimized out (type-erased objects).
Can zip_iterator<iterator1, iterator2, ...> be used with parallel algorithms?
Yes, and we do so in the implementation of some of the algorithms.
Formally, zip_iterator can be only an input iterator at best, as the type returned by dereferencing it is not a true reference. However, we applied a somewhat illegal trick allowing us to pretend that zip_iterator is more than an input iterator. We know that dereferencing our zip_iterator returns a tuple of stable references, which allows it to be used in most contexts normally requiring better iterators. The only exception to this is sort_by_key, which relies on swapping elements dereferenced by the underlying iterator. For this reason we have added this overload for swapping tuples of references: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/tuple.hpp#L985-L1001, which allows us to implement sort_by_key on top of sort using a zip_iterator.
if (count <= 1)
    return result::get(std::move(last));

RandIter second = first + 1;
Just a minor improvement: maybe you could use std::next over + 1?
is_heap requires random access iterators, so this should be fine.
hpx/parallel/algorithms/is_heap.hpp
    difference_type;

return is_heap_until_helper()(
    std::forward<ExPolicy>(policy), first, last,
Same for first and second as above.
hpx/parallel/algorithms/is_heap.hpp
/// That is, the last iterator it for which range [first, it) is a max heap.
///
template <typename ExPolicy, typename RandIter, typename Comp = detail::less,
    HPX_CONCEPT_REQUIRES_(
👍
    "HPX main exited with non-zero status");

return hpx::util::report_errors();
}
Missing newline at EOF.
returned_from_algorithm = true;
f.get();

HPX_TEST(false);
It's far out of scope for this PR, but it would be nice if there were something like a HPX_TEST_FAIL(); macro, like GTest provides.
Just checked, and something like this doesn't seem to exist in the current codebase.
@Naios feel free to add it, we'd be happy to accept such a PR. Thanks!
Looks mostly very good, thanks!
docs/manual/parallel_algorithms.qbk
[table Heap operations (In Header: <hpx/include/parallel_algorithm.hpp>)
    [[Name] [Description] [In Header]]
    [[ [algoref is_heap] ]
        [Returns `true` if the range is max heap.]
What is a 'max heap'?
@hkaiser In the Standard, is_heap checks whether the given range is a max heap.
http://en.cppreference.com/w/cpp/algorithm/is_heap
You might want to add a link to cppreference.com as this has an explanation for a max heap (see also: 47e122e). BTW, the actual Standard does not talk about max heaps, see: http://eel.is/c++draft/alg.heap.operations.
Hmm.. But I think using the term "max heap" is right. With std::less as the predicate, the biggest element is the root of the heap.
Sure, just please add a link to cppreference.com like I did for the other algorithms (47e122e).
typedef execution::is_sequential_execution_policy<ExPolicy> is_seq;

return detail::is_heap<RandIter>().call(
Is there a benefit to explicitly implementing is_heap instead of simply relying on is_heap_until:
bool is_heap(first, last) { return is_heap_until(first, last) == last; }
?
There is an overhead in that approach.
is_heap: https://github.com/taeguk/hpx/blob/176f86e11388b38dcc6afe9516b099246a7b6505/hpx/parallel/algorithms/is_heap.hpp#L77
is_heap_until: https://github.com/taeguk/hpx/blob/176f86e11388b38dcc6afe9516b099246a7b6505/hpx/parallel/algorithms/is_heap.hpp#L227
In the case of is_heap, it can be canceled earlier than is_heap_until.
Surely, there is a lot of code duplication. I tried to have one helper with no overhead instead of two helpers (is_heap_helper and is_heap_until_helper). But I failed at that: because util::partitioner returns algorithm_result<ExPolicy, R>, I can't determine whether to use just result or result.get().
https://github.com/taeguk/hpx/blob/176f86e11388b38dcc6afe9516b099246a7b6505/hpx/parallel/algorithms/is_heap.hpp#L216
Surely, I could do that using template metaprogramming and enable_if. But that is complex, and I don't think it's a good idea.
In the case of is_heap, it can be canceled earlier than is_heap_until.
You should be able to break early for is_heap_until as well.
In the case of is_heap_until, I can't break out of a partition that is checking lower indices than the index of the element that breaks the heap; I can only break the partitions checking higher indices than it.
But in the case of is_heap, when any partition finds an element which breaks the heap, I can break all partitions.
Right, makes sense.
{
    if (this->name < t.name)
        return true;
    else if (this->name < t.name)
This does not make any sense.
I'll fix it.
auto result = hpx::parallel::is_heap(policy,
    iterator(boost::begin(c)), iterator(boost::end(c)), pred);
auto solution = std::is_heap(std::begin(c), std::end(c), pred);
I'd suggest to use bool here instead of auto. Same amount of typing, better readability...
I'll fix it.
    iterator(boost::begin(c)), iterator(boost::end(c)));
auto result = f.get();
auto solution = std::is_heap(std::begin(c), std::end(c));
Same here. Let's make those return types explicit.
I'll fix it.
2fd74f1 to 30fa0d5 (Compare)
… mistakes in them. By accepting reviews from @hkaiser, refactor and fix code related to parallel is_heap and is_heap_until.
base_idx, it, part_size, tok,
[&tok, first, &comp](type const& v, std::size_t i)
{
    if (comp(*(first + i / 2), v))
@hkaiser @Naios What do you think about the overhead of dereferencing comp?
In line 73, comp is passed by reference. This dereferencing is executed many times, so I think this is an issue to think about.
I think passing comp by value is better. What do you think about it?
This issue is related to find_if, too. (https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/algorithms/find.hpp#L208)
Since the comparison function is a function object, passing by reference here is fine. Passing by value requires a copy operation on the FO which will be more costly - in cases like this, where a lambda captures the FO (by reference) the compiler only has to call the function at the address supplied - it does not have to 'dereference' the pointer every time, so this is a zero cost reference capture in effect.
Am I obsessing over something too small, just one memory access per loop iteration?
Possibly. Ultimately, you should time the algorithm, use a profiler if possible, then make changes and do it again. If the dereferencing is a problem, you'll see a difference; if not, don't worry about it and move on to bigger things. Usually in cases like this, the algorithm itself can be O(N), O(N^2), O(log N), etc., and tweaking the way the algorithm works will make a much bigger impact than the micro-optimizations.
There will be cases where passing by reference/copy makes a difference, and it's good to keep an eye open for them, so don't stop looking (but time things if you can, to confirm your hypotheses).
it does not have to 'dereference' the pointer every time, so this is a zero cost reference capture in effect.
@biddisco Did you mean that the compiler optimizes such dereferences away?
If the function object has no 'state', the only thing that matters is the call operator, which is just a function pointer, an address. So when you pass the function object by reference, all the compiler has to do is call the function at that address. So yes, I suspect that when you have a function object with no member data, the compiler really ought to be smart enough to know that there is no need to pass a reference to the object itself, but rather just the function. However, as I write this down, I realize that it probably isn't true and there might well be a dereference going on ...
I think I need to try a small example and look at the generated assembly to be sure ....
But passing a function object by value won't be any better, because you'll have to create a copy on the stack and then call the function call operator, which will be the same pointer (after all, the call operator will be a static function stored somewhere).
Maybe the call operator will be inlined anyway for a simple lambda ..... I need to research what's going on before I write more!
Hold on. The call operator is just a function address, and when it is called, it requires the args and a 'this' pointer (to the lambda in this case). So passing by reference just requires the this pointer (the reference) and the function call. Passing by value requires a copy and then a call. So reference wins (even if the copy is made only once at the start of a loop, for example), because both require a function call and a this pointer, and if the object has no state, the this pointer isn't even needed, so the compiler will probably drop it.
@biddisco I've got it! My earlier reasoning was wrong.
The dereferencing will not occur even if the FO is passed by reference, because a function call needs a pointer anyway.
Obviously, pass-by-reference is better than pass-by-value in this case.
@hkaiser Finally, I'm ready to be merged!!!
Just a note) This implementation's performance seems to be limited by memory bandwidth.
Is this caused by the algorithm itself or is it because of the implementation?
@hkaiser hpx/hpx/parallel/algorithms/is_heap.hpp Line 75 in 67870fc
In the above code, accessing the parent node is bad for the cache. But I have no solution to that problem while keeping the current algorithm for is_heap. If anyone wants to get better performance out of is_heap, maybe they have to find other algorithms. This is a note for someone in the future.
@taeguk, I think we can leave the code as it is for now, as it is functionally correct and reasonably fast.
I wrote benchmark code, and found that my implementation's performance is better than std::* and scales as the number of cores increases.
Benchmark for is_heap_until (fix the number of cores and adjust N)
Benchmark for is_heap_until (fix N and adjust M, the number of cores used)