-
Notifications
You must be signed in to change notification settings - Fork 511
System::CalculateRank() is slow #386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
|
@bmorel Can you attach the file? |
I have few days to work on that anew, I'll update my repo and redo the tests, maybe spend some time trying to automate them. |
Yes, the file on which you see speedups. |
Found the file I used for the report... thought I had it already linked it and deleted it from PC, but neither of those were true. I built a .zip archive containing a readme, that source file and the results I have now. |
The major reason is that Compiler can sometimes optimize I did test However, this only happens with Examples: https://godbolt.org/z/875WfD (no optimization flags or https://godbolt.org/z/3ZXaM3 ( https://godbolt.org/z/ZAA-4P (
Anyway, just my two cents. |
As a person that has quite a bit of experience with numerical algorithms, I am never a fan of seeing something like this
It is okish if the number of iterations is small (<10), and data are not pathological (mix of very small and very big values). In all other cases (including 100 or more of seemingly normal values), it is better to use something like Kahan's summation algorithm. It can be easily abstracted in C++ too, and is rather fast.
It does work correctly in gcc and clang even with |
My best question for inteviewing developers: How to summ an array of floats?
|
Oh, Kahan is like delta sigma modulation but for software. Very nice. |
My best reply: check that this is where the performance problem lies, otherwise it's easier to have easy to read code. Now, if there are lines that should be changed, considered the knowledge you guys have to improve more, I think you should just apply those changes, maybe without applying the patches I've sent. Otherwise, that issue should be closed. |
@bmorel the problem with perfromance is the algorithm complexity o(n^3) or o(n^4) - don't actually remember. Any optimization of inner loops can be pointless with such complexity. We don't want to say your code is bad, etc. But just nor mine nor your code solves this problem. The real solution is use Eigen library with sparse-matix algorithms with better alogrithm complexity. |
@bmorel Indeed keyb_case_MAX_UNKNOWNS_2048.slvs takes 25 seconds to load and your optimizations do help. At least in my case (VS 2019 release with debug info) the fastest is your
|
Function Name | Total CPU [unit, %] | Self CPU [unit, %] | Module |
---|---|---|---|
solvespace.exe (PID: 25076) | 203448 (100.00%) | 0 (0.00%) | solvespace.exe |
SolveSpace::System::CalculateRank | 112811 (55.45%) | 112615 (55.35%) | solvespace.exe |
SolveSpace::System::SolveLeastSquares | 84205 (41.39%) | 71577 (35.18%) | solvespace.exe |
SolveSpace::System::SolveLinearSystem | 12526 (6.16%) | 12508 (6.15%) | solvespace.exe |
[External Code] | 203447 (100.00%) | 4390 (2.16%) | Multiple modules |
[System Call] ntoskrnl.exe | 356 (0.17%) | 356 (0.17%) | ntoskrnl.exe |
SolveSpace::Expr::Substitute | 483 (0.24%) | 343 (0.17%) | solvespace.exe |
SolveSpace::Expr::Children | 259 (0.13%) | 259 (0.13%) | solvespace.exe |
SolveSpace::Expr::Eval | 197 (0.10%) | 197 (0.10%) | solvespace.exe |
SolveSpace::System::SolveBySubstitution | 716 (0.35%) | 140 (0.07%) | solvespace.exe |
full-1
21 seconds
Function Name | Total CPU [unit, %] | Self CPU [unit, %] | Module |
---|---|---|---|
solvespace.exe (PID: 14024) | 174927 (100.00%) | 0 (0.00%) | solvespace.exe |
SolveSpace::System::CalculateRank | 84259 (48.17%) | 84102 (48.08%) | solvespace.exe |
SolveSpace::System::SolveLeastSquares | 84299 (48.19%) | 71546 (40.90%) | solvespace.exe |
SolveSpace::System::SolveLinearSystem | 12639 (7.23%) | 12623 (7.22%) | solvespace.exe |
[External Code] | 174926 (100.00%) | 4546 (2.60%) | Multiple modules |
SolveSpace::Expr::Substitute | 483 (0.28%) | 344 (0.20%) | solvespace.exe |
SolveSpace::Expr::Children | 281 (0.16%) | 280 (0.16%) | solvespace.exe |
[System Call] ntoskrnl.exe | 166 (0.09%) | 166 (0.09%) | ntoskrnl.exe |
SolveSpace::Expr::Eval | 162 (0.09%) | 161 (0.09%) | solvespace.exe |
This is a decent gain and the code is simple - just taking the "row" indexing out of the inner loops. I'm inclined to make a pull request for this. @phkahler @rpavlik what do you think?
With the above in mind I agree with @Evil-Spirit - this is just an optimization of an O(n^3) algorithm. "Doomed" by definition - for example in our case it would not "allow" us to increase the unknowns to 2048.
Well, the point was not to have a perfect solution, I perfectly knew (and still know) that I lack knowledge (and time) for bigger, more efficient changes. The point was to improve situation. |
The speed of the system is "irrelevant" - this algorithm will be "slow" on any system - by it's nature (by slow I do not mean "the human will be annoyed" - even though that will happen with enough objects) .
I understand. And as I said "I'm inclined to make a pull request for this." - 16% less time is 16% less time :-) But let's see what @phkahler and @rpavlik will say. If it will make you feel better Eigen (the library in question) has been "waiting" since April 2017 ;-)
|
Yep. That's what I had in mind :)
It does not. I think this is sad to freeze improvements because of the big nice silver bullet is «coming soon(tm)», and I like solvespace :) |
@ruevs I'm inclined not to make changes like this in the solver because it will make the @Evil-Spirit solver patches less likely to apply cleanly. And those changes will definitely be a better performance increase, but last I tried there were some problems. I'd also like his patch series split into two - the solver changes and then the Eigen integration. IIRC somewhere in all that a memory leak was introduced that was never tracked down. My last attempt at doing part one: BTW I think solver performance is becoming a bigger issue than it used to be since so many other parts of the code are faster now. PRs that work toward better algorithmic complexity and use of Eigen sparse matrix code are very much welcome. Just be aware of what has been done before but never merged. |
so... the problem is that it might be harder to rebase? If that's the only problem to accept the kind of patches I proposed years ago, then I'm ok to do it. I even have feature requests in mind, to help my own usage, that I might find time to implement myself. An easy task would certainly help to (de)motivate (depending on how things goes on) me to find willpower to implement or even just post issues on them. |
@bmorel - Your help will be gladly accepted. I wasn't a contributor back when this issue was opened, nor when the other solver work was done (but not merged). I consider all the open issues on github to be either nice ideas when someone has time, or backlog of things that really should be done. As with most things I find the challenge is balancing short and long term goals. Leaning toward long term in this case can mean very long term unfortunately. Any help in pulling things together sooner is good. Even the resurfacing of this issue is making me see the importance of solver improvements.
Please open those feature requests. If they're reasonable and not redundant they will be kept open even if nobody gets to them in the near term. Sometimes a feature request leads to a different solution to the same problem. Pick an area that interests you and jump in - just not the solver right now unless you want to take on the big changes ;-) |
I think I had a crack at integrating the two eigen branches. Mine was a direct translation, while the other was a greater improvement along with the use of Eigen. (we have an exploded view setting now? Fancy!) I've pushed it here, though my most recent commit messages on the branch make me nervous that it's possibly nonfunctional: https://github.com/rpavlik/solvespace/tree/eigen_over_master |
OK, and I split stuff up a bit: #1156 is a pretty minimal introduction of Eigen, from long ago (my previous job email!) that noticeably improved performance at that time. Tests pass. Doesn't touch anything dramatic so shouldn't introduce any leaks. I've updated my eigen_over_master branch now to build on it: if it's interesting to review the big refactor from @Evil-Spirit separately, it doesn't actually depend on Eigen before the "Convert main system Jacobian code to using Eigen" commit so it can be rebased away and evaluated separately. Unfortunately the weird List and IdList structures are and will continue to be leak-prone as long as they silently share memory, but my previous attempts (to replace their backing stores with |
Eigen 3.4.0 wants C++14 by default - have to add #define EIGEN_MAX_CPP_VER 14 there to be able to compile. Playing with keyb_case.slvs got strange thing while first load of this file - Generate::ALL (for bounding box) took 12285 ms on master while on eigen_over_master Generate::ALL (for bounding box) took 14128 ms. |
#432 was my previous PR, so I went back to using that one |
Sorry for my previous comment - all timings for keyb_case.slvs sample were obtained for debug build. For release build have got 10x speedup: 3727 ms on master vs. 369 ms on eigen_over_master branch. |
And for my keyb_case_MAX_UNKNOWNS_2048.zip the https://github.com/rpavlik/solvespace/tree/eigen_over_master takes about 2.7 seconds (down from 25 in trunk):
However if I do 4096 keyb_case_MAX_UNKNOWNS_4096.zip it is still rather slow - 18 seconds. So @rpavlik we should not remove the "too many unknowns" restriction rpavlik@2847f2c and partly in rpavlik@349de5f and partly rpavlik@349de5f, but increase it to about 2048. Anyway it is a very bad habit doing "copy-paste like crazy" in sketches. |
The removal of too-many-unknowns is associated with the replacement of a fixed-size array with a std::vector and other things of variable size. I suppose we could still add back an artificial limit of unknowns. |
Yes I think an artificial limit is in order. Essentially put back this contition: |
I'm looking at the @Evil-Spirit optimizations other than Eigen #1158 now. They do not help with the test case in this issue, but there are some interesting things there. I'll need some time to "parse" the math and make sure it makes sense for these two 8a7996b 5b014aa. And if nothing else we should consider whether we really want a "suppress DOF calculation" flag what do you guys think? @phkahler @ppd @vespakoen But overall I think that with your two fresh pull requests #1158 #1159 we have a perfect place to discuss test, review, clean up and finally merge this stuff. |
cool, thanks for looking into this. I'd love to get any portion of this merged if it makes it faster (or more maintainable with minimal speed penalty). I have noticed that many of the times that we call into WriteJacobian, we actually end up with at least one dimension 0. It would be great to figure out how to short-circuit as much of the code as possible in those cases, because I'm sure there are allocations we don't need then. We may also benefit from a "smallvec" type - basically, like std::vector<> but defaults to no dynamic allocation, instead using an internal fixed-capacity buffer if possible. (There's a lot of cases where our lists are less than 10 elements, for instance.) If anybody knows of a good one, properly licensed, and header only, I'd love to hear about it. |
@baryluk Since you've confessed to having some experience with numeric algorithms, we have a few pull requests pending that integrate Eigen into the solver. If you've got some time it might be worth having a look and offering any suggestions you might have. In particular #1158, #1159, and #1160. If not, then carry on ;-) |
Since #1158 and #1159 are now merged and the performance is much better I will close this. Just for reference the 18s loading time with the 4096 test file:
Overall the code is now spending 93% of the time in Eigen doing multiplications (dot products) which sounds correct to me :-) |
In addition the new check box in the sketch group properties |
System information
SolveSpace version: master, 9d1601e
Operating system: Debian stable
In a file I'm working on (I'll attach it tomorrow when I'll have more proofs), adding a 'same length' constraint between the top of the 1st square of 1st line and the top of the 1st square of 2nd line is extremely slow (on my low end computer).
Using LD_PRELOAD, I've nailed the problem to the method in the title, and implemented some optimisations.
I don't have enough tests to have reliable numbers, but the number of samples gperftools took moved from 47.3% of 3009 to 36.8% of 827 (those numbers make it obvious to me that I need more numbers).
I am opening this ticket in advance to ask for hints about how to automate the runs, which would provide me more reliable numbers, and to provide a 1st patch of the optimized code I have so far (too lazy to do a usual fork/clone/create branch/commit/merge/push/pull request, for just 10 lines of code):
If there are other things to improve, feel free to say so.
The text was updated successfully, but these errors were encountered: