New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485
Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485
Conversation
…ng sprites less often
just throwing this corner case in here which we discussed in the JGR thread: |
This is resolved for almost all visible situations by the introduction of I'm not sure if there's an alternate but still-performant way to solve this - the ideal would be work out all the situations in which the x/y origin of the sprite can change and only do |
ok, different question: how is this handling vehicles that are displayed in a GUI? (train list, details window, etc.) |
I'd be interested in seeing some numbers for this |
A good question. As far as I understand the code, vehicles within a GUI call |
Running a debug build on an i7-4700HQ, with a test game having 299 trains. Before: ~230ms spent in train ticks. This is debug mode so may not be entirely representative, let me know if you'd like me to get some numbers from a release build and across multiple games. The improvement also reduces as you zoom out and more vehicles are in the viewport, e.g. zoomed out as far as possible at 1920x1080 with the view centred on one of the busier network segments track ticks increased to 60ms. |
Indeed, using debug build is not representative :) Would you mind benchmarking against a release builds? I did test out this PR on games I do have on file. None of these games hit very high train tick values, but improvements are still noticeable. A random game with a lot of NewGRF trains went from ~7ms to ~5ms. Which is still a very impressive speed-up for such a small (as in, line of code) change :) |
Updating vehicles which are in the same hash buckets but not within the redrawing bounds (by means of is_viewport_candidate) may have a higher false positive rate than would be expected. |
This is my test game - on an i5-6600 in vanilla 1.10.3 it is just beginning to slow down with train ticks taking 20ms on average. (I don't have statistics for a release build of my fix on this computer yet, but this is the same game which resulted in 230ms in debug mode on the older i7) ETA: I should add it contains GRF files and game scripts, all are using their standard versions available on BaNaNaS. |
Is not a file that appears to be on BaNaNaS |
Oops.. my fault, my 1.10.3 directory is polluted by test versions of my GRF work. I'll fix and upload a more friendly version. |
Here's the test game without references to unpublished test versions of Timberwolf's Trains. |
…cle to a mutable type in render methods
I still want to look at JGR's suggestion to use something more specific than "the vehicle is in the hash" for viewport recalculation, but this is a good point to check in some performance stats for release build so I know if that makes a further improvement. On my desktop (an i5-6600), this is the situation with regular 1.10.3 both zoomed in to the area around Parkhouse Limestone Mines and fully zoomed out: The CPU is running turbo boost to a consistently 3.75GHz at this point. Here are the same situations from a Release build of my changes: The CPU boosts slightly lower in this situation, but not enough to be significant; it oscillates between 3.68GHz and 3.74GHz. Note that unlike standard 1.10.3, there is a notable difference to the performance zoomed in and zoomed out, where more vehicles are considered for sprite updates. Even the bad case is significantly faster though, at ~4ms for train ticks rather than ~20ms. |
JGR's suggestion is correct, there are a lot of false positives generated by checking only the viewport hash. Statistics from the latest update which also checks the vehicle location: Zoomed in: Zoomed out: The most noticeable improvement is in the zoomed-out case, with train ticks now down to ~2ms (not much worse than the ~1.9ms of the new update at regular zoom) |
5239c7f
to
773b63c
Compare
773b63c
to
d373d26
Compare
I tested an old openttdcoop save (1327 trains) with master It's an improvement for me. (AMD FX-6100 3.30GHz) |
Attached is an example of an extreme case which does not require downloading as many files to reproduce the situation. In 1.10.3 the train ticks are ~32ms, and worsening as more of the long steel trains exit the depot. With this change, the ticks remain around 1-2ms, even when viewing the busy stations. |
…andidate as using only the hash generates false positives
d373d26
to
39a98ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with it :)
Quick thought about the whole CETS-style graphics chain corner case: |
I did wonder about this, but decided I preferred the approach of checking when sprites became "viewport candidates". The main reason is that curvature may not be the only reason for this. Currently, the known affected sets are CETS, Timberwolf's and a set Michael Blunck is working on, all of which change sprite bounds in response to curvature information. However it is possible that a future set might use a different variable - perhaps an aircraft set might introduce a "sonic boom" based on the current speed, or a vehicle may change its size based on the motion counter. There may be further possible optimisation where we simply use the geometry of the largest known sprite(s) in a set, and only call |
This is a proof of concept for a performance improvement. I'd like to get some feedback on whether I'm on the right track, if there are fundamental flaws making this not a good approach, and if it is viable some ideas to improve the code quality.
Motivation / Problem
I recently played a network game using Timberwolf's Trains, a vehicle set with an unusually complex set of switch chains used to resolve vehicle graphics. Once the three most active players had assembled networks of ~80 trains apiece, the game began to slow down to the point players with slower computers were unable to continue the game.
I noticed this same problem didn't occur in JGRPP, which has some local enhancements to reduce the number of times the graphics chains are called. Further investigation revealed:
UpdateViewport()
is called many times per tick for a vehicle, as successive vehicle tick actions cause events which may change the X/Y location or bounding box of the vehicle.v->GetImage()
becomes slow - this does not seem to be the fault of any one variable or switch that I could identify, merely the sheer number of checks required by the vehicle set.Initially I intended to port JGR's existing fix, but this contains a lot of elements for features unique to JGR, so ended up taking inspiration from it but taking a slightly different approach.
Description
We can assume for the default vehicles and for most vehicle sets that the bounding box of a sprite will only change by a meaningful amount if the direction of the vehicle itself changes. Therefore, if the direction of the vehicle is the same as the last time we called
UpdateViewport()
, it is sufficient to reuse the existing sprite sequence and only update the sprite once it is known to appear on a viewport. (To take care of animations, etc.)While a valid assumption for many vehicle sets, some do alter the bounding box. Timberwolf's Trains is an example - each "vehicle" is a set of three articulated vehicles. Those longer than 8 length units swap between a 1x1 "invisible" sprite and a more normally dimensioned "section" sprite depending on the curvature and z-offset state. If we consider only the bounding box of the sprite when it first entered the direction it is currently travelling, we will always use the 1x1 bounding box and vehicles will appear incorrectly when entering a viewport.
The current approach is to track when a vehicle was contained within a hash for a viewport that was recently shown, and if so revert to recalculating the sprite every frame. In testing this gave acceptable results, but there may still be corner cases where a sprite could be incorrectly not displayed for a single frame. I'm not sure if there's a better way to handle this - whether there's a way to track changes which would result in a sprite change, without a lot of code complexity?
Limitations
is_viewport_candidate
before it needs to be drawn.