
Bernstein polynomials with no branching. #591

Merged
merged 1 commit into solvespace:master on May 2, 2020

Conversation

phkahler
Member

@phkahler phkahler commented May 1, 2020

This is cleaner and smaller, but I was not able to measure a performance impact.

@@ -13,84 +13,31 @@
// and convergence should be fast by now.
#define RATPOLY_EPS (LENGTH_EPS/(1e2))

double SolveSpace::Bernstein(int k, int deg, double t)
// indexed by [degree][k][exponent]
const double bernstein_coeff[4][4][4] = {
Contributor

@whitequark whitequark May 1, 2020


These should probably be static too, for the same reason the functions are. In this case probably even static constexpr.

Member Author


Updated.

@whitequark
Contributor

(Thanks for keeping the git history clean!)

I'm not sure why the Windows build breaks. Does our MSVC version lack constexpr? I thought it should have C++11...

@phkahler
Member Author

phkahler commented May 2, 2020

Looks like MS didn't add constexpr until VS2015. See the bottom of the page:
https://docs.microsoft.com/en-us/cpp/cpp/constexpr-cpp?view=vs-2019
I'll make them static const.
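
The portability point above can be sketched in code. This is an illustrative fragment (the names are made up, not from the PR): `_MSC_VER` 1900 corresponds to VS2015, the first MSVC release that accepts C++11 `constexpr`, so `static const` is the spelling that builds everywhere.

```cpp
#include <cassert>

// Illustrative only: pick the strongest qualifier the compiler supports.
// Pre-VS2015 MSVC (_MSC_VER < 1900) rejects C++11 'constexpr', so fall
// back to 'static const'; the table ends up in read-only storage either way.
#if defined(_MSC_VER) && _MSC_VER < 1900
static const double kTable[2] = { 1.0, 2.0 };        // old MSVC: const only
#else
static constexpr double kTable[2] = { 1.0, 2.0 };    // C++11 and later
#endif
```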

}
break;
}
ssassert(false, "Unexpected degree of spline");
Member

@ruevs ruevs May 2, 2020


Hm, I was going to complain about the lack of range checking, but it seems that when the calls are evaluated at compile time nothing "bad" happens:
https://godbolt.org/z/oLA-gN
Maybe leave an assert for debug mode (-O0)?

Member Author


@ruevs I was a little concerned about the range checking too, but these functions are only called by other functions in ratpoly.cpp, and those lack range checking as well.

{ { -2.0,2.0,0.0 }, { 2.0,-4.0,0.0 },{ 0.0,2.0,0.0 }, { 0.0,0.0,0.0 } },
{ { -3.0,6.0,-3.0 },{ 3.0,-12.0,9.0 },{ 0.0,6.0,-9.0}, { 0.0,0.0,3.0 } } };

static double Bernstein(int k, int deg, double t)
Member


Maybe
static double Bernstein(const int k, const int deg, const double t)
But the optimizer figures it out anyway...

}

double SolveSpace::BernsteinDerivative(int k, int deg, double t)
static double BernsteinDerivative(int k, int deg, double t)
Member


Maybe
static double BernsteinDerivative(const int k, const int deg, const double t)
But the optimizer figures it out anyway...
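
For context, a companion sketch of the derivative side of the scheme, reconstructed from the coefficient rows quoted earlier in this review (the degree-2 and degree-3 rows match those lines; the rest and the function body are my reconstruction, not the exact merged code). Each row holds the power-basis coefficients of d/dt B(k, deg), indexed [degree][k][exponent]; differentiating drops the degree by one, so three coefficients suffice and evaluation is a short branch-free Horner pass.

```cpp
#include <cmath>

// Power-basis coefficients of the Bernstein basis derivatives,
// indexed [degree][k][exponent]. Rows for degrees 2 and 3 are the
// ones quoted in the diff; degrees 0 and 1 are filled in to match.
static const double bernstein_deriv_coeff[4][4][3] = {
    // degree 0: d/dt 1 = 0
    { {  0.0,  0.0,  0.0 }, {  0.0,   0.0, 0.0 }, { 0.0, 0.0,  0.0 }, { 0.0, 0.0, 0.0 } },
    // degree 1: d/dt (1-t) = -1, d/dt t = 1
    { { -1.0,  0.0,  0.0 }, {  1.0,   0.0, 0.0 }, { 0.0, 0.0,  0.0 }, { 0.0, 0.0, 0.0 } },
    // degree 2
    { { -2.0,  2.0,  0.0 }, {  2.0,  -4.0, 0.0 }, { 0.0, 2.0,  0.0 }, { 0.0, 0.0, 0.0 } },
    // degree 3
    { { -3.0,  6.0, -3.0 }, {  3.0, -12.0, 9.0 }, { 0.0, 6.0, -9.0 }, { 0.0, 0.0, 3.0 } },
};

// Branch-free evaluation: table lookup plus one Horner step.
static double BernsteinDerivative(int k, int deg, double t) {
    const double *c = bernstein_deriv_coeff[deg][k];
    return (c[2] * t + c[1]) * t + c[0];
}
```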

// indexed by [degree][k][exponent]
static const double bernstein_coeff[4][4][4] = {
Member


I would even move these inside the respective functions. (as static const of course).

Member Author


I moved them inside, thinking it would look worse, but now I like it.
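
For readers following along, a standalone sketch of the merged approach, written from the discussion above (an approximation of the final code, not an exact copy): each Bernstein basis polynomial up to degree 3 is expanded in powers of t, the coefficients live in a table indexed [degree][k][exponent], and evaluation is one branch-free Horner pass.

```cpp
#include <cmath>

// Power-basis expansions of the Bernstein basis B(k, deg),
// indexed [degree][k][exponent] as in the PR.
static const double bernstein_coeff[4][4][4] = {
    // degree 0: B(0,0) = 1
    { { 1,  0, 0,  0 }, { 0, 0,  0, 0 }, { 0, 0, 0,  0 }, { 0, 0, 0, 0 } },
    // degree 1: 1-t, t
    { { 1, -1, 0,  0 }, { 0, 1,  0, 0 }, { 0, 0, 0,  0 }, { 0, 0, 0, 0 } },
    // degree 2: (1-t)^2, 2t(1-t), t^2
    { { 1, -2, 1,  0 }, { 0, 2, -2, 0 }, { 0, 0, 1,  0 }, { 0, 0, 0, 0 } },
    // degree 3: (1-t)^3, 3t(1-t)^2, 3t^2(1-t), t^3
    { { 1, -3, 3, -1 }, { 0, 3, -6, 3 }, { 0, 0, 3, -3 }, { 0, 0, 0, 1 } },
};

// No branching on deg or k: just a table lookup and a cubic Horner step.
static double Bernstein(int k, int deg, double t) {
    const double *c = bernstein_coeff[deg][k];
    return ((c[3] * t + c[2]) * t + c[1]) * t + c[0];
}
```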

@ruevs
Member

ruevs commented May 2, 2020

By the way, you were probably not able to measure a performance difference because modern compilers are pretty good. Take a look at this:
https://godbolt.org/z/XxefSH
But I still like your new implementation :-)

@phkahler
Member Author

phkahler commented May 2, 2020

@ruevs with the new structure we could do even better. These two functions are always called from within a loop over the k-th polynomial. We could transpose the inner 2 dimensions of the array so we'd have vectors of coefficients for each power. Then write fixed-size loops 0-3 to do each term of the polynomials. SSE or AVX optimizations would compute all 4 polynomials in about the time this does one. It could return (or fill in) an array of 4 results and take the call out of the loops in higher level functions (which might then be more ready for the compiler to optimize).

But that all seems like premature optimisation. I think this PR reduces complexity ;-)
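
The transposed layout described above can be sketched as follows. This is a hypothetical illustration written from the comment, not the code on the author's branch: the inner two dimensions are swapped to [degree][exponent][k], so a fixed-size Horner loop produces all four basis values at once, and the inner per-k loop is a natural candidate for auto-vectorization.

```cpp
#include <cmath>

// Transposed coefficient table: [degree][exponent][k].
// Each inner row holds, for one power of t, the coefficients of all
// four basis polynomials of that degree (zeros pad unused entries).
static const double bernstein_coeff_t[4][4][4] = {
    // degree 0
    { { 1, 0, 0, 0 }, {  0, 0,  0, 0 }, { 0,  0,  0, 0 }, {  0, 0,  0, 0 } },
    // degree 1: B0 = 1 - t, B1 = t
    { { 1, 0, 0, 0 }, { -1, 1,  0, 0 }, { 0,  0,  0, 0 }, {  0, 0,  0, 0 } },
    // degree 2
    { { 1, 0, 0, 0 }, { -2, 2,  0, 0 }, { 1, -2,  1, 0 }, {  0, 0,  0, 0 } },
    // degree 3
    { { 1, 0, 0, 0 }, { -3, 3,  0, 0 }, { 3, -6,  3, 0 }, { -1, 3, -3, 1 } },
};

// Evaluate all four Bernstein basis polynomials of 'deg' at 't' in one pass.
// The fixed-size k loops are the vectorization-friendly part.
static void BernsteinAll(int deg, double t, double out[4]) {
    for (int k = 0; k < 4; k++)
        out[k] = bernstein_coeff_t[deg][3][k];
    for (int e = 2; e >= 0; e--)
        for (int k = 0; k < 4; k++)
            out[k] = out[k] * t + bernstein_coeff_t[deg][e][k];
}
```

Callers such as a hypothetical PointAt() loop would then make one call per parameter value instead of one per basis function.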

@phkahler phkahler merged commit 7366a6c into solvespace:master May 2, 2020
@phkahler
Member Author

phkahler commented May 3, 2020

@ruevs I did the parallel implementation on my bernstein branch if you want to see. I used plain for-loops to go over all 4 polynomials one term at a time. From what I've read, that's what modern vectorizing compilers like. It slowed things down dramatically (test went from 127 ms per frame to 137 ms) when I first changed it but only returned 1 of the 4 values. I didn't expect that, thinking AVX could do all 4 in the time it would normally do only 1. Then I realized I'm not sure how to confirm my compiler settings when building SolveSpace - it's probably using Fedora defaults? IDK. Then I modified it to the latest form and changed PointAt() to make only 1 call, and that got the performance back to where it was. I then modified the other 3 functions to use the vector form of the functions and ended up back at 127 ms, but it's not doing 4x the work any more. I still think it could run much faster and probably isn't being properly optimized (I'm on a Zen+ core).

Anyway, I have a new version that theoretically could be faster, but I'm considering it shelved until I figure out how to do better measurements and get a handle on what GCC is doing or not.

Lesson: SolveSpace does make a LOT of calls to PointAt() and will suffer measurably if that function is implemented poorly.

@whitequark
Contributor

> Then I realized I'm not sure how to confirm my compiler setting when building SolveSpace - it's probably using Fedora defaults? IDK.

I suspect you'll need to pass -march=native explicitly to see any use of AVX. Also, AVX isn't necessarily going to result in an overall throughput increase due to AVX offset reducing turbo frequency on Intel CPUs.

fbradasc added a commit to fbradasc/solvespace that referenced this pull request May 4, 2020
Bernstein polynomials with no branching. (solvespace#591)
@phkahler
Member Author

phkahler commented May 4, 2020

@whitequark where do I put that? This build system is new to me and I can't find where the compiler flags get set.

@rpavlik
Contributor

rpavlik commented May 4, 2020

CMAKE_CXX_FLAGS in the CMake GUI or on the command line, e.g.:

mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS=-march=native

)

The lowest-hanging fruit performance-wise that I noticed some (semi-long) time ago, at least when testing on Windows, was the stuff I poked at in #432 (recently updated). I thought I had handled PointAt, but I was confusing it with DistanceToLine, another hotspot I saw when working with models that caused super-slow recompute/export.
