
Implement parallel is_heap and is_heap_until. #2654

Merged — 10 commits, May 31, 2017

Conversation

taeguk (Member) commented May 28, 2017

Checklist

  • implementation of is_heap.
  • unit tests for is_heap.
  • implementation of is_heap_until.
  • unit tests for is_heap_until.
  • benchmark codes for is_heap.
  • benchmark codes for is_heap_until.

I wrote benchmark code, and I found that my implementation performs better than the std::* version and scales as the number of cores increases.

Benchmark for is_heap_until (fixed number of cores, varying N)

  • The range [0, N) is a heap.
  • Environment: Rostam (8 physical cores / 16 logical cores)
  • Number of cores used: 16
  • Time unit: seconds

| kind \ N  | 100,000 | 1,000,000 | 10,000,000 | 100,000,000 | 1,000,000,000 |
|-----------|---------|-----------|------------|-------------|---------------|
| std       | 0.00019 | 0.00149   | 0.01172    | 0.11785     | 1.02402       |
| seq       | 0.00018 | 0.00140   | 0.01171    | 0.11217     | 1.01263       |
| par       | 0.00053 | 0.00033   | 0.00125    | 0.01657     | 0.16828       |
| par_unseq | 0.00014 | 0.00018   | 0.00118    | 0.01629     | 0.16750       |

Benchmark for is_heap_until (fixed N, varying M = number of cores used)

  • The range [0, N) is a heap.
  • N = 100,000,000
  • Environment: Rostam (8 physical cores / 16 logical cores)
  • std::* version: 0.10221
  • Time unit: seconds

| kind \ M | 1       | 2       | 4       | 8       | 12      | 16      | 32      |
|----------|---------|---------|---------|---------|---------|---------|---------|
| par      | 0.10097 | 0.05089 | 0.02685 | 0.01554 | 0.01595 | 0.01636 | 0.04736 |

Fix a compile error about shadowing template parameter.
Fix a compile error about dependent type name.
hkaiser (Member) commented May 28, 2017

> I'm writing an application for measuring and analyzing the performance of is_heap_until. I want to include it in the HPX code base. Where should I add it?

How about you add it here: https://github.com/STEllAR-GROUP/hpx/tree/master/tests/performance/local?

taeguk (Member Author) commented May 29, 2017

@hkaiser I'll add many performance tests this summer. I think it's better to create a new folder and put them there. Which path do you think is good?

  • /tests/performance/local/parallel_algorithms/
  • /tests/performance/parallel_algorithms/local/ (I think this one is better.)
  • etc...

sequential(ExPolicy && policy, RandIter first, RandIter last,
Comp && comp)
{
return std::is_heap_until(first, last, std::forward<Comp>(comp));
Contributor:

I don't know whether it's really important here, but first and last are not perfectly forwarded to is_heap_until.

Member:

Yah, we do it inconsistently throughout the algorithm implementations. Most of the time iterators are passed by value, sometimes not. The Standard prescribes for the algorithms to take iterators by value (on the outermost layer), everything else is up to the implementation I guess.

Member Author:

Iterators are usually cheap to copy, so pass-by-value is faster than pass-by-reference.
And pass-by-reference is dangerous, as in the example below, where the caller's iterators are silently modified:

iterator func(iterator& first, iterator& last, int val) {
    while (first != last) {
        if (*first == val) return first;
        ++first;
    }
    return last;
}
func(first, last, val);  // after this call, the caller's 'first' has moved

So I think pass-by-value is better than perfect forwarding.

hkaiser (Member), May 29, 2017:

> Iterators are usually cheap to copy. So, pass-by-value is faster than pass-by-reference.

While I agree with the first statement (even if the keyword here is usually), I don't see how pass-by-value can be faster than pass-by-reference. Could you elaborate, please?

taeguk (Member Author), May 30, 2017:

@hkaiser The compiler implements pass-by-reference with a pointer in the generated assembly, so accessing the element through the reference incurs a dereference. That's why I think pass-by-value is faster than pass-by-reference.

Member Author:

Can zip_iterator<iterator1, iterator2, ...> be used with parallel algorithms?

Member Author:

@Naios @hkaiser I agree that the size of an iterator can be larger than a pointer. I'd like to hear your thoughts on what we should use for the parallel algorithm function interfaces. Which choice do you think is better?

Member:

As I said above, the Standard mandates for the algorithms to take the iterators by value (at the outermost level). So that's what we should do. Otherwise we have not applied a uniform policy. I think almost everywhere we pass iterators by value, but I'm not sure. In the end we shouldn't be able to measure any difference, so I think you can leave things the way you have it right now.

Once you have resolved the merge conflict with master we can go ahead and merge this PR.

Contributor:

To be honest, my personal opinion is that one should avoid additional copies in API-internal code, since you don't know how large the object the user passes will be. Also, @hkaiser has shown that compound iterators can have a significantly larger size (:eyes: ranges). Hence, call-by-value can have a performance drawback when the copy operation can't be optimized out (type-erased objects).

Member:

> Can zip_iterator<iterator1, iterator2, ...> be used with parallel algorithms?

Yes, and we do so in the implementation of some of the algorithms.

Formally, zip_iterator can be at best an input iterator, as the type returned by dereferencing it is not a true reference. However, we applied a somewhat illegal trick that lets us pretend zip_iterator is more than an input iterator. We know that dereferencing our zip_iterator returns a tuple of stable references, which allows it to be used in most contexts normally requiring better iterators. The only exception to this is sort_by_key, which relies on swapping elements dereferenced by the underlying iterator. For this reason we added this overload for swapping tuples of references: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/tuple.hpp#L985-L1001, which allows us to implement sort_by_key on top of sort using a zip_iterator.

if (count <= 1)
return result::get(std::move(last));

RandIter second = first + 1;
Naios (Contributor), May 29, 2017:

Just a minor improvement: maybe you could use std::next over + 1?

Member:

is_heap requires random access iterators, so this should be fine.

difference_type;

return is_heap_until_helper()(
std::forward<ExPolicy>(policy), first, last,
Contributor:

Same for first and second as above

/// That is, the last iterator it for which range [first, it) is a max heap.
///
template <typename ExPolicy, typename RandIter, typename Comp = detail::less,
HPX_CONCEPT_REQUIRES_(
Contributor:

👍

"HPX main exited with non-zero status");

return hpx::util::report_errors();
}
Naios (Contributor), May 29, 2017:

missing newline at eof

returned_from_algorithm = true;
f.get();

HPX_TEST(false);
Naios (Contributor), May 29, 2017:

It's far out of scope for this PR, but it would be nice if there were something like an HPX_TEST_FAIL(); macro, like the one GTest provides.
I just checked, and something like this doesn't seem to exist in the current codebase.

Member:

@Naios feel free to add it, we'd be happy to accept such a PR. Thanks!

@taeguk changed the title from "Implement parallel is_heap_until." to "Implement parallel is_heap and is_heap_until." on May 29, 2017
hkaiser (Member) commented May 29, 2017

> I'll add many performance tests in summer. I think creating new folder and put them into it is better. So which path is good do you think?

/tests/performance/parallel_algorithms/local/ looks good to me.

hkaiser (Member) left a review comment:

Looks mostly very good, thanks!

[table Heap operations (In Header: <hpx/include/parallel_algorithm.hpp>)
[[Name] [Description] [In Header]]
[[ [algoref is_heap] ]
[Returns `true` if the range is max heap.]
Member:

What is a 'max heap' ?

Member Author:

@hkaiser In the standard library, is_heap checks whether the given range is a max heap.
http://en.cppreference.com/w/cpp/algorithm/is_heap

Member:

You might want to add a link to cppreference.com as this has an explanation for a max heap (see also: 47e122e). BTW, the actual Standard does not talk about max heaps, see: http://eel.is/c++draft/alg.heap.operations.

Member Author:

Hmm... But I think using the word "max heap" is right. With std::less as the predicate, the biggest element is the root of the heap.

Member:

Sure, just please add a link to cppreference.com like I did for the other algorithms (47e122e).


typedef execution::is_sequential_execution_policy<ExPolicy> is_seq;

return detail::is_heap<RandIter>().call(
Member:

Is there a benefit of explicitly implementing is_heap instead of simply relying on is_heap_until:

 bool is_heap(first, last) { return is_heap_until(first, last) == last; }

?

Member Author:

There is overhead in doing it that way.
is_heap: https://github.com/taeguk/hpx/blob/176f86e11388b38dcc6afe9516b099246a7b6505/hpx/parallel/algorithms/is_heap.hpp#L77
is_heap_until: https://github.com/taeguk/hpx/blob/176f86e11388b38dcc6afe9516b099246a7b6505/hpx/parallel/algorithms/is_heap.hpp#L227
In the case of is_heap, the computation can be cancelled earlier than for is_heap_until.

Of course, this duplicates a lot of code.
I did try to use a single helper with no overhead instead of two (is_heap_helper and is_heap_until_helper), but I failed: because util::partitioner returns algorithm_result<ExPolicy, R>, I can't determine whether to use the result directly or to call result.get().
https://github.com/taeguk/hpx/blob/176f86e11388b38dcc6afe9516b099246a7b6505/hpx/parallel/algorithms/is_heap.hpp#L216
I could do it with template metaprogramming and enable_if, but that's complex, and I don't think it would be good.

Member:

> In the case of is_heap, the computation can be cancelled earlier than for is_heap_until.

You should be able to break early for is_heap_until as well.

Member Author:

In the case of is_heap_until, I can't cancel a partition that checks indices lower than the index of the element that breaks the heap; I can only cancel the partitions checking higher indices.
But in the case of is_heap, when any partition finds an element that breaks the heap, I can cancel all partitions.

Member:

Right, makes sense.

{
if (this->name < t.name)
return true;
else if (this->name < t.name)
Member:

This does not make any sense.

Member Author:

I'll fix it.

auto result = hpx::parallel::is_heap(policy,
iterator(boost::begin(c)), iterator(boost::end(c)), pred);
auto solution = std::is_heap(std::begin(c), std::end(c), pred);

Member:

I'd suggest to use bool here instead of auto. Same amount of typing, better readability...

Member Author:

I'll fix it.

iterator(boost::begin(c)), iterator(boost::end(c)));
auto result = f.get();
auto solution = std::is_heap(std::begin(c), std::end(c));

Member:

Same here. Let's make those return types explicit.

Member Author:

I'll fix it.

… mistakes in them.

Following reviews from @hkaiser, refactor and fix code related to parallel is_heap and is_heap_until.
base_idx, it, part_size, tok,
[&tok, first, &comp](type const& v, std::size_t i)
{
if (comp(*(first + i / 2), v))
Member Author:

@hkaiser @Naios What do you think about the overhead of dereferencing comp?
In line 73, comp is passed by reference, and the dereference is executed many times, so I think this is worth considering.
I think passing comp by value would be better. What do you think?
This issue also applies to find_if (https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/algorithms/find.hpp#L208).

biddisco (Contributor):

Since the comparison function is a function object, passing by reference here is fine. Passing by value requires a copy operation on the FO which will be more costly - in cases like this, where a lambda captures the FO (by reference) the compiler only has to call the function at the address supplied - it does not have to 'dereference' the pointer every time, so this is a zero cost reference capture in effect.

Member Author:

Am I obsessing over something too small, just one memory access per loop iteration?

biddisco (Contributor):

Possibly. Ultimately, you should time the algorithm, use a profiler if possible, then make changes and measure again. If the dereferencing is a problem, you'll see a difference; if not, don't worry about it and move on to bigger things. Usually in cases like this the algorithm itself can be O(N), O(N^2), O(log N), etc., and tweaking the way the algorithm works will make a much bigger impact than the micro-optimizations.

There will be cases where passing by reference/copy makes a difference, and it's good to keep an eye open for them, so don't stop looking (but time things if you can to confirm your hypotheses).

taeguk (Member Author), May 31, 2017:

> it does not have to 'dereference' the pointer every time, so this is a zero cost reference capture in effect.

@biddisco Did you mean that the compiler optimizes such dereferences away?

biddisco (Contributor):

If the function object has no 'state', the only thing that matters is the call operator, which is just a function pointer: an address. So when you pass the function object by reference, all the compiler has to do is call the function at that address. So yes, I suspect that when you have a function object with no member data, the compiler really ought to be smart enough to know that there is no need to pass a reference to the object itself, but rather just the function. However, as I write this down, I realize that it probably isn't true and there might well be a dereference going on...
I think I need to try a small example with an assembler to be sure...

But passing a function object by value won't be any better, because you'll have to create a copy on the stack and then call the function call operator, which will be the same pointer (after all, the call operator will be a static function stored somewhere).

Maybe the call operator will be inlined anyway for a simple lambda... I need to research what's going on before I write more!

biddisco (Contributor):

Hold on. The call operator is just a function address, and when it is called, it requires the args and a 'this' pointer (to the lambda in this case). So passing by reference just requires the this pointer (the reference) and the function call. Passing by value requires a copy and then a call. So reference wins (even if the copy is made only once at the start of a loop, for example), because both require a function call and a this pointer; and if the object has no state, the this pointer isn't even needed, so the compiler will probably drop it.

taeguk (Member Author), May 31, 2017:

@biddisco I've got it! My earlier thinking was wrong.
The dereference will not occur even if the FO is passed by reference, because a function call needs a pointer anyway.
Obviously, pass-by-reference is better than pass-by-value in this case.

taeguk (Member Author) commented May 31, 2017

@hkaiser Finally, this is ready to be merged!!!

hkaiser merged commit ac340b2 into STEllAR-GROUP:master on May 31, 2017
hkaiser mentioned this pull request on May 31, 2017 (47 tasks)
taeguk (Member Author) commented Jun 5, 2017

Just a note: this implementation's performance seems to be limited by memory bandwidth.

hkaiser (Member) commented Jun 5, 2017

> Just a note: this implementation's performance seems to be limited by memory bandwidth.

Is this caused by the algorithm itself or by the implementation?

taeguk (Member Author) commented Jun 6, 2017

@hkaiser
As an algorithm, is_heap just iterates a loop and performs comparisons. There is not much computation, but there are a lot of memory accesses, so performance may be limited by memory bandwidth.
I think utilizing the cache better is one way to get better performance.

if (comp(*(first + i / 2), v))

In the code above, accessing the parent node is bad for the cache, but I have no solution to that problem while keeping the current algorithm for is_heap.
If anyone wants better performance from is_heap, they may have to find a different algorithm.
This is a note for someone in the future.

hkaiser (Member) commented Jun 6, 2017

@taeguk, I think we can leave the code as it is for now as it is functionally correct and reasonably fast.
