-
Notifications
You must be signed in to change notification settings - Fork 70
Implement some operations of R3Element using intrinsics #1722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Clang on macOS can't find |
We should probably just use |
There is no |
geometry/r3_element_body.hpp
Outdated
@@ -33,6 +39,10 @@ R3Element<Scalar>::R3Element(Scalar const& x, | |||
Scalar const& y, | |||
Scalar const& z) : x(x), y(y), z(z) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the body of the constructor:
static_assert(std::is_standard_layout<R3Element>::value, "blah");
This should ensure that the union is safe, because of the rules pertaining to standard-layout unions (or at least safer; __m128d
is heavily implementation-defined, so it could in theory fail to be layout-compatible with doubles).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
geometry/r3_element_body.hpp
Outdated
#if PRINCIPIA_USE_SSE2_INTRINSICS | ||
__m128d const left_128d = ToM128D(left); | ||
return R3Element<Scalar>(_mm_mul_pd(left_128d, right.xy), | ||
_mm_mul_sd(left_128d, right.zt)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems nicer to do
_mm_mul_sd(right.zt, left_128d)
for the second one, so that the result has right.t
; in particular, if we wanted to ensure that t=0
this would preserve the invariant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make sense to use _mm_set_sd(left)
(and a suitable wrapper in the Quantity case) as the operand for the scalar multiplication, so that we are not waiting for the unpack generated from _mm_set1_pd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the constant argument to the right, although it doesn't seem to have much effect. Not changing to _mm_set_sd
as it burns an extra register.
geometry/r3_element_body.hpp
Outdated
__m128d const left_128d = ToM128D(left); | ||
return R3Element<Product<Quantity<LDimension>, RScalar>>( | ||
_mm_mul_pd(left_128d, right.xy), | ||
_mm_mul_sd(left_128d, right.zt)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
According to IACA the throughput of these operations goes from 3.00 to 2.00 cycles on Haswell, and this is reflected by a gain of up to 40% on the benchmarks of the polynomials. As expected, the effect is more pronounced for Estrin evaluation.
Before:
After: