-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bernstein polynomials with no branching. #591
Conversation
src/srf/ratpoly.cpp
Outdated
@@ -13,84 +13,31 @@ | |||
// and convergence should be fast by now. | |||
#define RATPOLY_EPS (LENGTH_EPS/(1e2)) | |||
|
|||
double SolveSpace::Bernstein(int k, int deg, double t) | |||
// indexed by [degree][k][exponent] | |||
const double bernstein_coeff[4][4][4] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably be static
too, for the same reason the functions are. In this case probably even static constexpr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
(Thanks for keeping the git history clean!) I'm not sure why the Windows build breaks--does our MSVC version miss |
Looks like MS didn't add constexpr until VS2015. See the bottom of page: |
} | ||
break; | ||
} | ||
ssassert(false, "Unexpected degree of spline"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I was going to complain about the lack of range checking but it seems that when the calls are evaluated at compile time nothing "bad" happens:
https://godbolt.org/z/oLA-gN
Maybe leave an assert for debug mode (-O0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruevs I was a little concerned about the range checking too, but these are only called by other functions in ratpoly.cpp and those are lacking range checking.
src/srf/ratpoly.cpp
Outdated
{ { -2.0,2.0,0.0 }, { 2.0,-4.0,0.0 },{ 0.0,2.0,0.0 }, { 0.0,0.0,0.0 } }, | ||
{ { -3.0,6.0,-3.0 },{ 3.0,-12.0,9.0 },{ 0.0,6.0,-9.0}, { 0.0,0.0,3.0 } } }; | ||
|
||
static double Bernstein(int k, int deg, double t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
static double Bernstein(cons int k, const int deg, const double t)
But the optimizer figures it out anyway...
} | ||
|
||
double SolveSpace::BernsteinDerivative(int k, int deg, double t) | ||
static double BernsteinDerivative(int k, int deg, double t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
static double BernsteinDerivative(cons int k, const int deg, const double t)
But the optimizer figures it out anyway...
src/srf/ratpoly.cpp
Outdated
@@ -13,84 +13,31 @@ | |||
// and convergence should be fast by now. | |||
#define RATPOLY_EPS (LENGTH_EPS/(1e2)) | |||
|
|||
double SolveSpace::Bernstein(int k, int deg, double t) | |||
// indexed by [degree][k][exponent] | |||
static const double bernstein_coeff[4][4][4] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would even move these inside the respective functions. (as static const
of course).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved them inside thinking it would look worse, but now I like it.
By the way, you were probably not able to measure a performance difference, because modern compilers are pretty good. Take a look at this: |
@ruevs with the new structure we could do even better. These two functions are always called from within a loop over the k-th polynomial. We could transpose the inner 2 dimensions of the array so we'd have vectors of coefficients for each power. Then write fixed-size loops 0-3 to do each term of the polynomials. SSE or AVX optimizations would compute all 4 polynomials in about the time this does one. It could return (or fill in) an array of 4 results and take the call out of the loops in higher level functions (which might then be more ready for the compiler to optimize). But that all seems like premature optimisation. I think this PR reduces complexity ;-) |
@ruevs I did the parallel implementation on my bernstein branch if you want to see. I used plain for-loops to go over all 4 polynomials one term at a time. From what I've read that's what modern vectorizing compilers like. It slowed things down dramatically (test went from 127ms per frame to 137) when I first changed it but only returned 1 of the 4 values. I didn't expect that, thinking AVX could do all 4 in the time it would normally do only 1. Then I realized I'm not sure how to confirm my compiler setting when building SolveSpace - it's probably using Fedora defaults? IDK. Then I modified it to the latest form and changed PointAt() to make only 1 call and that got the performance back to where it was. I then modified the other 3 functions to use the vector form of the functions and ended up back at 127ms, but it's not doing 4x the work any more. I still think it could run much faster and probably isn't being properly optimized (I'm on a Zen+ core). Anyway, I have a new version that theoretically could be faster, but I'm considering it shelved until I figure out how to do better measurements and get a handle on what GCC is doing or not. Lesson: SolveSpace does make a LOT of calls to PointAt() and will suffer measurably if that function is implemented poorly. |
I suspect you'll need to pass |
Bernstein polynomials with no branching. (solvespace#591)
@whitequark where do I put that? This build system is new to me and I can't find where the compiler flags get set. |
CMAKE_CXX_FLAGS in the cmake gui or command line (eg: mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS=-march=native ) The lowest-hanging fruit performance-wise I noticed some (semi-long) time ago, at least when testing on Windows, was the stuff I poked at in #432 (recently updated) I thought I handled PointAt, but I was confusing it with DistanceToLine (another hotspot I saw when working with models that caused super slow recompute/export) |
This is cleaner and smaller, but I was not able to measure a performance impact.