Small optimization to Vector::Equals to improve performance. #308
Conversation
This function showed up surprisingly high on a CPU time profile when the GUI was unresponsive "doing things". Removed a duplicated difference computation in the not-equal case, and switched to abs and a single compare instead of two compares with a negation. It seems to have moved the function further down in the profile.
- double dx = v.x - x; if(dx < -tol || dx > tol) return false;
- double dy = v.y - y; if(dy < -tol || dy > tol) return false;
- double dz = v.z - z; if(dz < -tol || dz > tol) return false;
+ const Vector dv = this->Minus(v);
Are you sure this is faster than it was?
I think the Minus() call can hurt performance, because it runs on every call. It's faster to run the bbox test first, without computing the full difference.
The minus can be vectorized, or sometimes (seen in practice) auto-vectorized.
- return (this->Minus(v)).MagSquared() < tol*tol;
+ return dv.MagSquared() < tol*tol;
This is a super rare case where we are inside the bbox but not inside the sphere, so we could even remove this and just return true. But I don't believe this really helps. Look at the SnapToMesh function.
Did you run the profiling on a release build?
RelWithDebInfo, yes.
I don't remember the exact context of this little change, but I do remember that my profiling tests (which included loading and exporting to STL, as well as manipulating a model that had become too complex to edit interactively) showed these really simple things, in some cases sanity checks or even ostensible "optimizations", as major hotspots. After making these changes, which mostly reduce duplicated work and favor vector operations over scalar ones (so auto-vectorization can take place, or eventually some internals can be replaced with a manually vectorized library like Eigen), these functions disappeared from the hotspot list, leaving primarily list and associated string operations/copies. It was also noticeably faster and more usable in the aforementioned pathological model.
So, what problem did you solve? Which case of performance drop? Can you describe your test case in detail? I will try to test it.
- double dy = v.y - y; if(dy < -tol || dy > tol) return false;
- double dz = v.z - z; if(dz < -tol || dz > tol) return false;
+ const Vector dv = this->Minus(v);
+ if (std::abs(dv.x) > tol) return false;
Is abs really faster? Is this because it compiles as an intrinsic?
I don't remember if I tested this single change independently. However, you're on the right track with the intrinsic: my hunch is that it gives better information to the compiler and thus can be optimized better. (It also reduces operations: one comparison plus the abs, which is probably just a bit manipulation, versus two comparisons; but that's down at the level where I'm not sure intuition applies when compiling with a sophisticated modern compiler.)
I have an opinion about this (of course it should be tested): in your version, dv is always calculated for all three dimensions, which probably results in a function call (if inlining is not performed). With the previous implementation, the function had the ability to terminate early without computing the whole vector difference.
(Force-pushed: compare 371f33b to fb1065d)
Thank you @rpavlik, and I apologize for taking so long to review.
np, glad you liked it enough to merge it! I think I recently pushed some WIP branches from my old machine that I had alluded to before, so if you're interested in some cleanup, you're welcome to look at them (or ignore them, your choice). I also recently, accidentally, made a bit of a promo/intro for SolveSpace by showing how to customize a little 3D-printable gizmo: https://www.thingiverse.com/thing:3625652
I have a lot of workload, so I don't think I have nearly enough time to merge someone else's cleanups without their assistance, explanations of rationale, etc.
Certainly, no trouble at all. I've still got it on my backlog to review those and push the results as proper PRs. I should load up one of the torture files I was working on when I found the need for these patches, and see if the problems have already gone away.
This, as well as some other unexpected locations (like solvespace/src/polygon.cpp line 23 and solvespace/src/util.cpp line 587, both at c6fc012), showed up high on the CPU time profile.
This is a fairly small change, but it seemed to improve the profiling results here. I imagine there are further or better ways to do this (do we have profiling data showing that the early-outs actually improve performance? Since this is essentially "inner-loop" code, they might just be extra pain for the branch predictor), so please do perform your own verification of whether this improves perf for you before merging.