Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small optimization to Vector::Equals to improve performance. #308

Merged
merged 1 commit into from May 13, 2019

Conversation

rpavlik
Copy link
Contributor

@rpavlik rpavlik commented Jan 2, 2018

This (as well as some other unexpected locations, like

if(MinAltitude() < LENGTH_EPS) {
and
double Vector::DistanceToLine(Vector p0, Vector dp) const {
) showed up as a substantial hot-spot when I ran the VS CPU profiler in VS2017 on an x64 build of the latest master, looking at time periods when the GUI was non-responsive and SolveSpace was consuming a full core of CPU "doing something".

This is a fairly small change, but it seemed to improve the profiling results here. I imagine there are further ways to do this, or better ways (do we have profiling data that the early-outs actually improve performance? since this is "inner-loop" code essentially - they might just be extra pain for the branch predictor...), so please do perform your own verification of whether this improves perf for you before merging.

This function showed up surprisingly high on a CPU time profile
when the GUI was unresponsive "doing things". Removed a duplicated
difference in the not-equal case, and switched to abs and a single compare
instead of two compares with a negation. It seems to have moved the
function further down in the profile.
double dx = v.x - x; if(dx < -tol || dx > tol) return false;
double dy = v.y - y; if(dy < -tol || dy > tol) return false;
double dz = v.z - z; if(dz < -tol || dz > tol) return false;
const Vector dv = this->Minus(v);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is faster than it was?
I think Minus() call can break performance, because it run for every call. It's faster to run bbox test without computing full difference firstly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minus can be vectorized, or sometimes (seen in practice) auto-vectorized.


return (this->Minus(v)).MagSquared() < tol*tol;
return dv.MagSquared() < tol*tol;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is super rare case when we are in bbox, but not inside sphere, so we even can remove this and just return true. But I don't beleive this really helps. Look at SnapToMesh function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you run profiling as release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RelWith Deb Info, yes.

I don't remember the exact context of this little change, but I do remember that my profiling tests (which included load and export to stl, as well as manipulation of a model that had gotten too complex to edit interactively) showed these really simple things (in some cases sanity checks, or even ostensibly "optimization") as major hotspots, and after making these changes, which mostly reduce duplicated work and avoid scalar manipulation in favor of vector (so auto vectorization can take place, or eventually replacement of some internals with a manually vectorized library like Eigen), these functions disappeared from the hot spots list (leaving primarily list and associated string operations/copies). It was also noticably faster and more usable in the aforementioned pathological model.

It was also

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what problem did you solve? Which case of performance drop? Can you describe your test case in details? I will try to test it.

double dy = v.y - y; if(dy < -tol || dy > tol) return false;
double dz = v.z - z; if(dz < -tol || dz > tol) return false;
const Vector dv = this->Minus(v);
if (std::abs(dv.x) > tol) return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is abs really faster? Is this because it compile as intrinsic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if I tested this single change independently. However, you're on the right path with intrinsic: my hunch is it gives better data to the compiler and thus can be optimized better. (It also reduces operations: one comparison, and the abs which probably is a bit manipulation, vs two comparisons, but that's down to the level where I'm not sure if intuition applies when compiling with a sophisticated modern compiler)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the opinon about this (ofc this should be tested): for your case dv calculated always (for all the three dimensions), which is probably resultedd in a function call (in case where inline is not performed). In case of previous implementation, the function have ability to terminate early without computation of whole vector difference.

@whitequark whitequark force-pushed the master branch 3 times, most recently from 371f33b to fb1065d Compare July 12, 2018 20:24
@whitequark
Copy link
Contributor

With gcc 8.3 in RelWithDebInfo mode, before:
Screenshot_20190513_145641
after:
Screenshot_20190513_145646

To me this looks like a clear improvement. Most people should use Release artifacts so that's what we care about.

@whitequark whitequark merged commit a9ec644 into solvespace:master May 13, 2019
@whitequark
Copy link
Contributor

Thank you @rpavlik, and I apologize for taking so long to review.

@rpavlik
Copy link
Contributor Author

rpavlik commented May 13, 2019

np, glad you liked it enough to merge it! I think I recently just pushed some wip branches from my old machine that I had alluded to before, so if you're interested in some cleanup, you're welcome to look at them (or ignore them, your choice)

I also recently, accidentally, made a bit of a promo/intro for solvespace in showing how to customize a little 3d printable gizmo: https://www.thingiverse.com/thing:3625652

@whitequark
Copy link
Contributor

I think I recently just pushed some wip branches from my old machine that I had alluded to before, so if you're interested in some cleanup, you're welcome to look at them (or ignore them, your choice)

I have a lot of workload so I don't think I have nearly enough time to merge someone else's cleanups without their assistance, explanations of rationale, etc.

@rpavlik
Copy link
Contributor Author

rpavlik commented May 15, 2019

certainly, no trouble at all. I've still got it on my backlog to review those and push the results as proper PRs.

I should load up one of the torture files I was working on when I found the need to work on those patches, see if the problems have gone away already.

@rpavlik rpavlik deleted the vector-equals-perf branch May 15, 2019 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants