Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485

Merged

Conversation

mattkimber
Copy link
Contributor

This is a proof of concept for a performance improvement. I'd like to get some feedback on whether I'm on the right track, if there are fundamental flaws making this not a good approach, and if it is viable some ideas to improve the code quality.

Motivation / Problem

I recently played a network game using Timberwolf's Trains, a vehicle set with an unusually complex set of switch chains used to resolve vehicle graphics. Once the three most active players had assembled networks of ~80 trains apiece, the game began to slow down to the point players with slower computers were unable to continue the game.

I noticed this same problem didn't occur in JGRPP, which has some local enhancements to reduce the number of times the graphics chains are called. Further investigation revealed:

  • UpdateViewport() is called many times per tick for a vehicle, as successive vehicle tick actions cause events which may change the X/Y location or bounding box of the vehicle.
  • With a complex sprite resolution chain, v->GetImage() becomes slow - this does not seem to be the fault of any one variable or switch that I could identify, merely the sheer number of checks required by the vehicle set.
  • Most vehicle sets will maintain a consistent x/y location and bounding box for a vehicle facing a given direction.

Initially I intended to port JGR's existing fix, but this contains a lot of elements for features unique to JGR, so ended up taking inspiration from it but taking a slightly different approach.

Description

We can assume for the default vehicles and for most vehicle sets that the bounding box of a sprite will only change by a meaningful amount if the direction of the vehicle itself changes. Therefore, if the direction of the vehicle is the same as the last time we called UpdateViewport(), it is sufficient to reuse the existing sprite sequence and only update the sprite once it is known to appear on a viewport. (To take care of animations, etc.)

While a valid assumption for many vehicle sets, some do alter the bounding box. Timberwolf's Trains is an example - each "vehicle" is a set of three articulated vehicles. Those longer than 8 length units swap between a 1x1 "invisible" sprite and a more normally dimensioned "section" sprite depending on the curvature and z-offset state. If we consider only the bounding box of the sprite when it first entered the direction it is currently travelling, we will always use the 1x1 bounding box and vehicles will appear incorrectly when entering a viewport.

The current approach is to track when a vehicle was contained within a hash for a viewport that was recently shown, and if so revert to recalculating the sprite every frame. In testing this gave acceptable results, but there may still be corner cases where a sprite could be incorrectly not displayed for a single frame. I'm not sure if there's a better way to handle this - whether there's a way to track changes which would result in a sprite change, without a lot of code complexity?

Limitations

  • I think the approach of casting to a mutable type is messy. I'm very much a novice when it comes to modern C++, any pointers on how to improve this and make it clean would be welcome.
  • There are possibilities for sprites to be incorrectly not shown (or attempting to be drawn) if a newgrf set changes bounding boxes and/or offsets by a large enough amount, and the viewport is close enough to a hash boundary that the renderer won't set is_viewport_candidate before it needs to be drawn.
  • Things are a bit "proof of concept", I wanted to get this to PR early to see if this approach has merit rather than spend a lot of time on an idea that's unworkable.

src/vehicle.cpp Outdated Show resolved Hide resolved
@Eddi-z
Copy link
Contributor

Eddi-z commented Jan 2, 2021

just throwing this corner case in here which we discussed in the JGR thread:

https://www.tt-forums.net/viewtopic.php?p=1239940#p1239940

@mattkimber
Copy link
Contributor Author

just throwing this corner case in here which we discussed in the JGR thread:

This is resolved for almost all visible situations by the introduction of is_viewport_candidate, but not to the point I can't guarantee that a fast enough vehicle, a set with a large enough difference in bounding boxes or a player who scrolls viewports either at speed or to a place which is right on the boundary of a viewport hash won't notice problems. (The latter is the most likely to be visible as it would last for longer than a single frame if there was a large enough difference between bounding boxes - I haven't got to the point of understanding the viewport hash code well enough to know exactly when this would occur)

I'm not sure if there's an alternate but still-performant way to solve this - the ideal would be work out all the situations in which the x/y origin of the sprite can change and only do GetImage() in those cases but I'm not sure how to approach that.

@Eddi-z
Copy link
Contributor

Eddi-z commented Jan 2, 2021

ok, different question: how is this handling vehicles that are displayed in a GUI? (train list, details window, etc.)

@LordAro
Copy link
Member

LordAro commented Jan 2, 2021

I'd be interested in seeing some numbers for this

@mattkimber
Copy link
Contributor Author

ok, different question: how is this handling vehicles that are displayed in a GUI? (train list, details window, etc.)

A good question. As far as I understand the code, vehicles within a GUI call GetImage and related functions separately with the relevant EngineImageType parameter, so my changes only affect those vehicles drawn as part of a map viewport. Since it's rare for players to have a screen size large enough that sprite chains for ~300 vehicles need to be evaluated in each tick (and I don't think the GUI screens have the same need to call update functions multiple times) that's less in need of speeding up.

@mattkimber
Copy link
Contributor Author

mattkimber commented Jan 3, 2021

I'd be interested in seeing some numbers for this

Running a debug build on an i7-4700HQ, with a test game having 299 trains.

Before: ~230ms spent in train ticks.
After: 14ms spent in train ticks.

This is debug mode so may not be entirely representative, let me know if you'd like me to get some numbers from a release build and across multiple games.

The improvement also reduces as you zoom out and more vehicles are in the viewport, e.g. zoomed out as far as possible at 1920x1080 with the view centred on one of the busier network segments track ticks increased to 60ms.

@TrueBrain
Copy link
Member

Indeed, using debug build is not representative :) Would you mind benchmarking against a release builds?
From my testing, a release build already halves (if not more) the time it spends on the train tick :)
Additionally, would you mind sharing a savegame that shows off this PR in an extreme case? That always helps for review, as now we have to find a game our self ;) Especially as I myself don't have a 100+ms train tick game on file :)

I did test out this PR on games I do have on file. None of these games hit very high train tick values, but improvements are still noticeable. A random game with a lot of NewGRF trains went from ~7ms to ~5ms. Which is still a very impressive speed-up for such a small (as in, line of code) change :)

@JGRennison
Copy link
Contributor

Updating vehicles which are in the same hash buckets but not within the redrawing bounds (by means of is_viewport_candidate) may have a higher false positive rate than would be expected.
The viewport hash is a fixed size hash table, multiple separate/distant locations on the viewport are aliased to the same hash bucket. Updating vehicles which are nowhere near the viewport redrawing area but happen to be in the same bucket introduces some inefficiency. It may well be that this is too small to matter though.

@mattkimber
Copy link
Contributor Author

mattkimber commented Jan 3, 2021

This is my test game - on an i5-6600 in vanilla 1.10.3 it is just beginning to slow down with train ticks taking 20ms on average. (I don't have statistics for a release build of my fix on this computer yet, but this is the same game which resulted in 230ms in debug mode on the older i7)

such_trains.zip

ETA: I should add it contains GRF files and game scripts, all are using their standard versions available on BaNaNaS.

@LordAro
Copy link
Member

LordAro commented Jan 3, 2021

dbg: [grf] NewGRF 54574607 (timberwolfs_trains\timberwolfs_trains.grf) not found; checksum 839ED7E5F973C3FB219C263063017530

Is not a file that appears to be on BaNaNaS

@mattkimber
Copy link
Contributor Author

dbg: [grf] NewGRF 54574607 (timberwolfs_trains\timberwolfs_trains.grf) not found; checksum 839ED7E5F973C3FB219C263063017530

Is not a file that appears to be on BaNaNaS

Oops.. my fault, my 1.10.3 directory is polluted by test versions of my GRF work. I'll fix and upload a more friendly version.

@mattkimber
Copy link
Contributor Author

Here's the test game without references to unpublished test versions of Timberwolf's Trains.

such_trains_fixed.zip

@mattkimber
Copy link
Contributor Author

I still want to look at JGR's suggestion to use something more specific than "the vehicle is in the hash" for viewport recalculation, but this is a good point to check in some performance stats for release build so I know if that makes a further improvement.

On my desktop (an i5-6600), this is the situation with regular 1.10.3 both zoomed in to the area around Parkhouse Limestone Mines and fully zoomed out:

parkhouse_mines_1_10_3

parkhouse_fullzoom_1_10_3

The CPU is running turbo boost to a consistently 3.75GHz at this point.

Here are the same situations from a Release build of my changes:

parkhouse_mines_spritecache

parkhouse_fullzoom_spritecache

The CPU boosts slightly lower in this situation, but not enough to be significant; it oscillates between 3.68GHz and 3.74GHz.

Note that unlike standard 1.10.3, there is a notable difference to the performance zoomed in and zoomed out, where more vehicles are considered for sprite updates. Even the bad case is significantly faster though, at ~4ms for train ticks rather than ~20ms.

@mattkimber
Copy link
Contributor Author

JGR's suggestion is correct, there are a lot of false positives generated by checking only the viewport hash. Statistics from the latest update which also checks the vehicle location:

Zoomed in:

parkhouse_mines_bounds_update

Zoomed out:

parkhouse_fullzoom_bounds_update

The most noticeable improvement is in the zoomed-out case, with train ticks now down to ~2ms (not much worse than the ~1.9ms of the new update at regular zoom)

@mattkimber mattkimber force-pushed the better-viewport-recalc-performance branch 2 times, most recently from 5239c7f to 773b63c Compare January 3, 2021 14:37
src/vehicle_base.h Outdated Show resolved Hide resolved
@mattkimber mattkimber force-pushed the better-viewport-recalc-performance branch from 773b63c to d373d26 Compare January 3, 2021 14:54
@glx22
Copy link
Contributor

glx22 commented Jan 3, 2021

I tested an old openttdcoop save (1327 trains) with master
Bloggs   Co , 10th May 2273

And with this PR
Bloggs   Co , 10th May 2273#1

It's an improvement for me. (AMD FX-6100 3.30GHz)

@mattkimber
Copy link
Contributor Author

mattkimber commented Jan 3, 2021

Attached is an example of an extreme case which does not require downloading as many files to reproduce the situation. In 1.10.3 the train ticks are ~32ms, and worsening as more of the long steel trains exit the depot. With this change, the ticks remain around 1-2ms, even when viewing the busy stations.

extreme trains.zip

…andidate as using only the hash generates false positives
@mattkimber mattkimber force-pushed the better-viewport-recalc-performance branch from d373d26 to 39a98ea Compare January 3, 2021 19:21
Copy link
Member

@LordAro LordAro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with it :)

@Eddi-z
Copy link
Contributor

Eddi-z commented Jan 4, 2021

Quick thought about the whole CETS-style graphics chain corner case:
can we just update the whole train when one vehicle in the chain needs to be updated?

@mattkimber
Copy link
Contributor Author

Quick thought about the whole CETS-style graphics chain corner case:
can we just update the whole train when one vehicle in the chain needs to be updated?

I did wonder about this, but decided I preferred the approach of checking when sprites became "viewport candidates". The main reason is that curvature may not be the only reason for this. Currently, the known affected sets are CETS, Timberwolf's and a set Michael Blunck is working on, all of which change sprite bounds in response to curvature information. However it is possible that a future set might use a different variable - perhaps an aircraft set might introduce a "sonic boom" based on the current speed, or a vehicle may change its size based on the motion counter.

There may be further possible optimisation where we simply use the geometry of the largest known sprite(s) in a set, and only call GetImage() in the cases it is potentially possible that a vehicle might be shown in a viewport. This would be more of a disruptive change, though - whereas the current approach still offers identical behaviour in the case of sets which do not change their sprite bounds through means other than a change in direction.

@TrueBrain TrueBrain merged commit 5728f9c into OpenTTD:master Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants