Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First steps towards implementing execution par_unseq #3063

Closed
wants to merge 1 commit into from

Conversation

hkaiser
Copy link
Member

@hkaiser hkaiser commented Dec 8, 2017

This is related to #2271

#elif defined(HPX_INTEL_VERSION) || defined(HPX_CLANG_VERSION)
#pragma ivdep
#pragma omp simd
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave omp out of the picture here since we'd require users to compile with -fopenmp then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other compilers, should we issue a warning?

@sithhell
Copy link
Member

sithhell commented Dec 9, 2017

Would be nice to have some confidence if it really has the expected effects (better code gen) before pushing it further.

@hkaiser
Copy link
Member Author

hkaiser commented Dec 9, 2017

Would be nice to have some confidence if it really has the expected effects (better code gen) before pushing it further.

I agree, but how can we get confidence without actually trying out things?

@sithhell
Copy link
Member

sithhell commented Dec 10, 2017 via email

@hkaiser hkaiser force-pushed the par_unseq branch 2 times, most recently from 85ae248 to 5409978 Compare December 15, 2017 15:29
@jeffhammond
Copy link

Summary

You can't use OpenMP simd (or for) with a != comparison for loop termination, i.e. the following is invalid (see below), at least as of OpenMP 4.5.

+#if defined(HPX_HAVE_OPENMP_SIMD)
 +#pragma omp simd
 +#endif
 +                for (/**/; it != end; ++it)
 +                {
 +                    f(it);
 +                }

OpenMP 5 is expected to fix this. I don't know the precise status of that proposal but can look if you want.

I recall that Intel 18 compilers do not object to != in an OpenMP for loop (haven't checked SIMD explicitly) but GCC 7 and Clang 5 do.

I would not expect a compiler to attempt to vectorize container accesses if a random access iterator is not supported...

Details

The relevant portions of the OpenMP 4.5 specification are:

  • Section 2.8.1

The simd directive places restrictions on the structure of the associated for-loops. Specifically, all associated for-loops must have canonical loop form (Section 2.6 on page 53).

  • Section 2.6

A loop has canonical loop form if it conforms to the following:
for (init-expr; test-expr; incr-expr) structured-block
...
test-expr One of the following:
var relational-op b
b relational-op var
...
relational-op One of the following:
<
<=
>
>=

@hkaiser
Copy link
Member Author

hkaiser commented Jan 21, 2018

@jeffhammond thanks for this information. This branch is really meant to be used for experimentation towards implementing par_unseq, and I admit that I don't have too much experience with this.

find_package(OpenMP QUIET)
if(OPENMP_FOUND)
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel")
hpx_add_compile_flag_if_available(-openmp-simd)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-openmp-simd is deprecated (https://software.intel.com/en-us/node/693432). It was replaced by -qopenmp-simd in version 16 or 17.

I recommend that CMake test for -fopenmp-simd, then -qopenmp-simd then -openmp-simd. ICC supports many of the GCC flags for compatibility...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks.

@@ -253,9 +253,9 @@ namespace hpx { namespace parallel { namespace util
HPX_FORCEINLINE void prefetch_addresses(T const& ... ts)
{
int const sequencer[] = {
(_mm_prefetch(
0, (_mm_prefetch(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the non-x86 code path, you may want to use __builtin_prefetch, which both GCC and Clang support.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that, MSVC however supports _mm_prefetch only.

@jeffhammond
Copy link

@hkaiser These are important experiments, in part because the OpenMP standard working group recognizes that support for C++ in OpenMP is lacking. While 5.0 is pretty close to finished, your experience in implementing PSTL will be useful in identifying gaps that should be addressed in the next iteration.

I don't know what code you allow yourself to look at (due to licensing), but both RAJA and Intel PSTL may be useful. In particular, RAJA supports a bunch of different pragmas for persuading compilers to vectorize inner loops.

@msimberg msimberg removed this from the 1.1.0 milestone Mar 22, 2018
@msimberg
Copy link
Contributor

Closing this since it's not actively being worked on, but if someone feels like picking this up again, feel free to do so!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants