-
-
Notifications
You must be signed in to change notification settings - Fork 968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve rendering of large viewport areas #7962
Conversation
908e72d
to
7d67d62
Compare
7d67d62
to
7d7c0ce
Compare
Apparently iterating 10 times more elements with foward_list is still faster than bothering maps. So it's about 20x faster than original now. |
7d7c0ce
to
406958d
Compare
I tried this with one case that broke in the previous attempt at improving the sprite sorter, and did not get any flickering. So far it seems good. |
The ordering value seems to be inconsistently typed between int32 and uint32. |
@ldpl You mentioned on IRC that you'd found some other optimisations as well. Have you made any further progress with that, or would you consider it a separate change? |
@LordAro well, I have a faster sorter, I just don't like it xD So I'm still looking for better solutions. |
Have you done any measurements within OpenTTD itself, or just within the test harness in the zip? I did some simple measurements by means of adding a line to the FPS window for the sprite sorter and got worse performance with this PR than the original algorithm (3x worse in the most demanding case). |
If this does perform better on very large sprite counts, but worse on small (normal) counts, would it be worth to have both the original and this implementation, and select which to use based on number of sprites to sort? |
While we indeed can use some other sprite sorter for smaller cases question is whether it's really an issue. As sprite sorter doesn't always need to be fast, even if it sorts sprites in 3 microseconds instead of 1 it doesn't matter much as long it's fast enough to compose the frame. In fact it's usually called with so few sprites to sort that it raises a question of measuring that performance as, for example, PerformanceTimer for such small times may as well measure random noise (especially on linux). And, btw, that test case I'm using is dumped from some average reddit game. But ofc, it's for extreme case of map unzoom on 4k where you don't need any measurements to spot a difference between 1s and 100ms lag xD |
The number of sprites which will need to be sorted at a time is effectively bounded by the redraw region chunking in ViewportDrawChk. Not having to sort too many sprites at a time is one of the main reasons why the chunking is done that way. |
@JGRennison there is very much need, see #6566, it takes several seconds just to sort sprites after zooming out on 4k. |
406958d
to
71c3da2
Compare
Even though #7968 significantly reduced the need for a faster sprite sorter if we have one we can get rid of viewport subdivision completely and further improve fullscreen rendering time on 4k by about 50%. And while that subdivision was initially added for a different reason it's no longer relevant:
And for anyone who wants to mess with sprite sorter in future, I leave this as a source of inspiration, assistance, and a warning xD
|
71c3da2
to
4249529
Compare
I like patches like this, but they are always a bit tricky. Like you say it is bug-compatible with the current, but knowing that for sure when looking at the code ... will take a bit of time to figure out :D Having them both side-by-side with an Additionally, you say 20 times faster, which means absolutely 0 to me. A function that only took 1ms in a 1000ms run, making 20 times faster is not impressive at all ;) So I miss a broader context here. How fast (or slow, pick your side) is the current sorter (in ms), and how much % of total calculation is this? And that for different games, screensizes, systems, OSes, etc etc :) What I am aiming at .. rewriting code that has been stable for years now, always have to be approached with caution. And if your code is only a 0.1% FPS increase in 99% of the cases, it might not be worth it. But I know you a bit longer, so I suspect the gain from this PR is a lot bigger .. but I fail to read that here :D This lack of facts has been hinted in earlier comments, but I do not see a follow-up :) (it might have been on IRC, but ... it has been a year since :P Sorry if it has!) Would you be up to provide some more numbers on this? I know that is not an easy task, but it would help out if it is worth spending the time reviewing this PR at this moment in time :) (As you know, we are a bit strained in review capacity atm) Again, to be clear, I love patches like this :) But I have to be a bit cautious here :D Tnx! |
@TrueBrain well, tbh I'm kind of done with this thing and don't want spend any more time on it. Also I probably wouldn't have done it at all had I known about #7968 beforehand. But still, there are some numbers and testing code one post above yours ;) And it does check that sprite order is the same as in the original sorter (you can even uncomment random_shuffle there, just ignore run times in that case as these sorters rely on initial array being somewhat sorted for performance). Admittedly it only uses one test case but it's some work to get more as they need to be saved from the real game and I don't think they'll make any difference anyway. P.S. old vs new:
|
Does this mean we can better close this PR? Or am I misreading it? (again, I don't mind these PRs, just trying to get up-to-date on current state of some PRs, like this one).
There are numbers, that is for sure. But that was not what I was asking for ;) I am missing context of those numbers. They don't even tell me what they mean :P Jiffies? Iterations? But mostly, I miss how this fits in OpenTTD as a whole. Is this only 0.00001% of the time spent on drawing? Or is this 90%? As that matters .. are we optimizing something very small, or something very prominent. That is what I am after :)
Sweet! And sorry, I did not check out the zips, as before I delve into such PRs, I like to know if it is worth considering to start with :D Matter of prioritization ;)
Still means little to me :) 1046 oranges? :D Guessing by the numbers it is either iterations or jiffies or something in between, and clearly it is faster. But making 0.1% of your application 100x faster still is only a ~0.1% increase in speed :D PS: I know, this is a 10+ month old PR, and I know you did your work here (I know your work long enough to know this :D), so I am not debating any of that :) But I am trying to judge if this is worth our time here and now :) Tnx for the the fast reply, much appreciated :) |
Well, I understand your concerns but I guess you'll have to either close it or find someone else to do the testing as I'm sick tired of these sorters so I'm ok fixing small stuff if needed but not doing anything substatial in any foreseeable future. Also I did enough testing to convince myself to include it into cmclient so whether it makes into vanilla or not doesn't matter much to me as I'm not even gonna use it anyway.
Some kind of processor cycles as I understand them, you can check cycle.h for more details. I just googled some code that seemed the most appropriate for precise performance measurement. But I only used it fore relative comparison, in absolute values without #7968 it was already pretty clear with frames taking seconds to render solely because of the sprite sorter. |
This was not easy to balance (cost vs reward), for various of reasons. Read-back can be found here: https://webster.openttdcoop.org/index.php?channel=openttd&date=1608249600#1608318064 @ldpl : can you rebase this please? We will merge after :) Cheers! |
4249529
to
9657eee
Compare
9657eee
to
b95e985
Compare
b95e985
to
cbcbac7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed a few spelling typos and a missing space; nothing code related :)
My take on sprite sorter optimization, I estimate it to be about 10x faster than the current one. And while it's still not ideal it's decently fast and I'm tired of this shit anyway :P
It's made to perfectly replicate the current sorting algorithm (including weird reversing of moved sprites) so there shouldn't be any (new) glitches.
Also, there is definitely some room for further optimizations. At the very least map of sets can probably be replaced with some more specialized structure. For example, kd tree should be faster but its current implementation of node removal is very inefficient and needs to be optimized. It's also possible that deviating from the original algorithm can be beneficial to performance (std::sort does it instantly but I couldn't find a good ordering relation).
If someone is interested here is my testing code and over a dozen sorter implementations:
sprite-sorter.zip