Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485

mattkimber · 2021-01-02T23:03:28Z

This is a proof of concept for a performance improvement. I'd like to get some feedback on whether I'm on the right track, if there are fundamental flaws making this not a good approach, and if it is viable some ideas to improve the code quality.

Motivation / Problem

I recently played a network game using Timberwolf's Trains, a vehicle set with an unusually complex set of switch chains used to resolve vehicle graphics. Once the three most active players had assembled networks of ~80 trains apiece, the game began to slow down to the point players with slower computers were unable to continue the game.

I noticed this same problem didn't occur in JGRPP, which has some local enhancements to reduce the number of times the graphics chains are called. Further investigation revealed:

UpdateViewport() is called many times per tick for a vehicle, as successive vehicle tick actions cause events which may change the X/Y location or bounding box of the vehicle.
With a complex sprite resolution chain, v->GetImage() becomes slow - this does not seem to be the fault of any one variable or switch that I could identify, merely the sheer number of checks required by the vehicle set.
Most vehicle sets will maintain a consistent x/y location and bounding box for a vehicle facing a given direction.

Initially I intended to port JGR's existing fix, but this contains a lot of elements for features unique to JGR, so ended up taking inspiration from it but taking a slightly different approach.

Description

We can assume for the default vehicles and for most vehicle sets that the bounding box of a sprite will only change by a meaningful amount if the direction of the vehicle itself changes. Therefore, if the direction of the vehicle is the same as the last time we called UpdateViewport(), it is sufficient to reuse the existing sprite sequence and only update the sprite once it is known to appear on a viewport. (To take care of animations, etc.)

While a valid assumption for many vehicle sets, some do alter the bounding box. Timberwolf's Trains is an example - each "vehicle" is a set of three articulated vehicles. Those longer than 8 length units swap between a 1x1 "invisible" sprite and a more normally dimensioned "section" sprite depending on the curvature and z-offset state. If we consider only the bounding box of the sprite when it first entered the direction it is currently travelling, we will always use the 1x1 bounding box and vehicles will appear incorrectly when entering a viewport.

The current approach is to track when a vehicle was contained within a hash for a viewport that was recently shown, and if so revert to recalculating the sprite every frame. In testing this gave acceptable results, but there may still be corner cases where a sprite could be incorrectly not displayed for a single frame. I'm not sure if there's a better way to handle this - whether there's a way to track changes which would result in a sprite change, without a lot of code complexity?

Limitations

I think the approach of casting to a mutable type is messy. I'm very much a novice when it comes to modern C++, any pointers on how to improve this and make it clean would be welcome.
There are possibilities for sprites to be incorrectly not shown (or attempting to be drawn) if a newgrf set changes bounding boxes and/or offsets by a large enough amount, and the viewport is close enough to a hash boundary that the renderer won't set is_viewport_candidate before it needs to be drawn.
Things are a bit "proof of concept", I wanted to get this to PR early to see if this approach has merit rather than spend a lot of time on an idea that's unworkable.

…ng sprites less often

src/vehicle.cpp

Eddi-z · 2021-01-02T23:33:20Z

just throwing this corner case in here which we discussed in the JGR thread:

https://www.tt-forums.net/viewtopic.php?p=1239940#p1239940

mattkimber · 2021-01-02T23:47:09Z

just throwing this corner case in here which we discussed in the JGR thread:

This is resolved for almost all visible situations by the introduction of is_viewport_candidate, but not to the point I can't guarantee that a fast enough vehicle, a set with a large enough difference in bounding boxes or a player who scrolls viewports either at speed or to a place which is right on the boundary of a viewport hash won't notice problems. (The latter is the most likely to be visible as it would last for longer than a single frame if there was a large enough difference between bounding boxes - I haven't got to the point of understanding the viewport hash code well enough to know exactly when this would occur)

I'm not sure if there's an alternate but still-performant way to solve this - the ideal would be work out all the situations in which the x/y origin of the sprite can change and only do GetImage() in those cases but I'm not sure how to approach that.

Eddi-z · 2021-01-02T23:49:32Z

ok, different question: how is this handling vehicles that are displayed in a GUI? (train list, details window, etc.)

LordAro · 2021-01-02T23:53:36Z

I'd be interested in seeing some numbers for this

mattkimber · 2021-01-02T23:56:38Z

ok, different question: how is this handling vehicles that are displayed in a GUI? (train list, details window, etc.)

A good question. As far as I understand the code, vehicles within a GUI call GetImage and related functions separately with the relevant EngineImageType parameter, so my changes only affect those vehicles drawn as part of a map viewport. Since it's rare for players to have a screen size large enough that sprite chains for ~300 vehicles need to be evaluated in each tick (and I don't think the GUI screens have the same need to call update functions multiple times) that's less in need of speeding up.

mattkimber · 2021-01-03T00:00:38Z

I'd be interested in seeing some numbers for this

Running a debug build on an i7-4700HQ, with a test game having 299 trains.

Before: ~230ms spent in train ticks.
After: 14ms spent in train ticks.

This is debug mode so may not be entirely representative, let me know if you'd like me to get some numbers from a release build and across multiple games.

The improvement also reduces as you zoom out and more vehicles are in the viewport, e.g. zoomed out as far as possible at 1920x1080 with the view centred on one of the busier network segments track ticks increased to 60ms.

TrueBrain · 2021-01-03T10:06:59Z

Indeed, using debug build is not representative :) Would you mind benchmarking against a release builds?
From my testing, a release build already halves (if not more) the time it spends on the train tick :)
Additionally, would you mind sharing a savegame that shows off this PR in an extreme case? That always helps for review, as now we have to find a game our self ;) Especially as I myself don't have a 100+ms train tick game on file :)

I did test out this PR on games I do have on file. None of these games hit very high train tick values, but improvements are still noticeable. A random game with a lot of NewGRF trains went from ~7ms to ~5ms. Which is still a very impressive speed-up for such a small (as in, line of code) change :)

JGRennison · 2021-01-03T11:51:12Z

Updating vehicles which are in the same hash buckets but not within the redrawing bounds (by means of is_viewport_candidate) may have a higher false positive rate than would be expected.
The viewport hash is a fixed size hash table, multiple separate/distant locations on the viewport are aliased to the same hash bucket. Updating vehicles which are nowhere near the viewport redrawing area but happen to be in the same bucket introduces some inefficiency. It may well be that this is too small to matter though.

mattkimber · 2021-01-03T13:07:16Z

This is my test game - on an i5-6600 in vanilla 1.10.3 it is just beginning to slow down with train ticks taking 20ms on average. (I don't have statistics for a release build of my fix on this computer yet, but this is the same game which resulted in 230ms in debug mode on the older i7)

such_trains.zip

ETA: I should add it contains GRF files and game scripts, all are using their standard versions available on BaNaNaS.

LordAro · 2021-01-03T13:10:51Z

dbg: [grf] NewGRF 54574607 (timberwolfs_trains\timberwolfs_trains.grf) not found; checksum 839ED7E5F973C3FB219C263063017530

Is not a file that appears to be on BaNaNaS

mattkimber · 2021-01-03T13:26:05Z

dbg: [grf] NewGRF 54574607 (timberwolfs_trains\timberwolfs_trains.grf) not found; checksum 839ED7E5F973C3FB219C263063017530

Is not a file that appears to be on BaNaNaS

Oops.. my fault, my 1.10.3 directory is polluted by test versions of my GRF work. I'll fix and upload a more friendly version.

mattkimber · 2021-01-03T13:30:24Z

Here's the test game without references to unpublished test versions of Timberwolf's Trains.

such_trains_fixed.zip

…cle to a mutable type in render methods

mattkimber · 2021-01-03T13:48:47Z

I still want to look at JGR's suggestion to use something more specific than "the vehicle is in the hash" for viewport recalculation, but this is a good point to check in some performance stats for release build so I know if that makes a further improvement.

On my desktop (an i5-6600), this is the situation with regular 1.10.3 both zoomed in to the area around Parkhouse Limestone Mines and fully zoomed out:

The CPU is running turbo boost to a consistently 3.75GHz at this point.

Here are the same situations from a Release build of my changes:

The CPU boosts slightly lower in this situation, but not enough to be significant; it oscillates between 3.68GHz and 3.74GHz.

Note that unlike standard 1.10.3, there is a notable difference to the performance zoomed in and zoomed out, where more vehicles are considered for sprite updates. Even the bad case is significantly faster though, at ~4ms for train ticks rather than ~20ms.

mattkimber · 2021-01-03T14:11:51Z

JGR's suggestion is correct, there are a lot of false positives generated by checking only the viewport hash. Statistics from the latest update which also checks the vehicle location:

Zoomed in:

Zoomed out:

The most noticeable improvement is in the zoomed-out case, with train ticks now down to ~2ms (not much worse than the ~1.9ms of the new update at regular zoom)

src/vehicle_base.h

glx22 · 2021-01-03T15:43:58Z

I tested an old openttdcoop save (1327 trains) with master

And with this PR

It's an improvement for me. (AMD FX-6100 3.30GHz)

mattkimber · 2021-01-03T16:55:48Z

Attached is an example of an extreme case which does not require downloading as many files to reproduce the situation. In 1.10.3 the train ticks are ~32ms, and worsening as more of the long steel trains exit the depot. With this change, the ticks remain around 1-2ms, even when viewing the busy stations.

extreme trains.zip

…andidate as using only the hash generates false positives

LordAro

I'm happy with it :)

Eddi-z · 2021-01-04T13:12:02Z

Quick thought about the whole CETS-style graphics chain corner case:
can we just update the whole train when one vehicle in the chain needs to be updated?

mattkimber · 2021-01-04T20:57:18Z

Quick thought about the whole CETS-style graphics chain corner case:
can we just update the whole train when one vehicle in the chain needs to be updated?

I did wonder about this, but decided I preferred the approach of checking when sprites became "viewport candidates". The main reason is that curvature may not be the only reason for this. Currently, the known affected sets are CETS, Timberwolf's and a set Michael Blunck is working on, all of which change sprite bounds in response to curvature information. However it is possible that a future set might use a different variable - perhaps an aircraft set might introduce a "sonic boom" based on the current speed, or a vehicle may change its size based on the motion counter.

There may be further possible optimisation where we simply use the geometry of the largest known sprite(s) in a set, and only call GetImage() in the cases it is potentially possible that a vehicle might be shown in a viewport. This would be more of a disruptive change, though - whereas the current approach still offers identical behaviour in the case of sets which do not change their sprite bounds through means other than a change in direction.

Codechange: improve performance for complex vehicle chains by resolvi…

819ef85

…ng sprites less often

michicc reviewed Jan 2, 2021

View reviewed changes

src/vehicle.cpp Outdated Show resolved Hide resolved

Codechange: create MutableSpriteCache to remove the need to cast Vehi…

b4c6b88

…cle to a mutable type in render methods

mattkimber force-pushed the better-viewport-recalc-performance branch 2 times, most recently from 5239c7f to 773b63c Compare January 3, 2021 14:37

michicc reviewed Jan 3, 2021

View reviewed changes

src/vehicle_base.h Outdated Show resolved Hide resolved

mattkimber force-pushed the better-viewport-recalc-performance branch from 773b63c to d373d26 Compare January 3, 2021 14:54

Codechange: consider vehicle co-ordinates when identifying viewport c…

39a98ea

…andidate as using only the hash generates false positives

mattkimber force-pushed the better-viewport-recalc-performance branch from d373d26 to 39a98ea Compare January 3, 2021 19:21

LordAro approved these changes Jan 3, 2021

View reviewed changes

TrueBrain merged commit 5728f9c into OpenTTD:master Jan 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485

Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485

mattkimber commented Jan 2, 2021

Eddi-z commented Jan 2, 2021

mattkimber commented Jan 2, 2021

Eddi-z commented Jan 2, 2021

LordAro commented Jan 2, 2021

mattkimber commented Jan 2, 2021

mattkimber commented Jan 3, 2021 •

edited

TrueBrain commented Jan 3, 2021

JGRennison commented Jan 3, 2021

mattkimber commented Jan 3, 2021 •

edited

LordAro commented Jan 3, 2021 •

edited

mattkimber commented Jan 3, 2021

mattkimber commented Jan 3, 2021

mattkimber commented Jan 3, 2021

mattkimber commented Jan 3, 2021

glx22 commented Jan 3, 2021 •

edited

mattkimber commented Jan 3, 2021 •

edited

LordAro left a comment

Eddi-z commented Jan 4, 2021

mattkimber commented Jan 4, 2021

Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485

Codechange: improve performance for complex vehicle chains by resolving sprites less often #8485

Conversation

mattkimber commented Jan 2, 2021

Motivation / Problem

Description

Limitations

Eddi-z commented Jan 2, 2021

mattkimber commented Jan 2, 2021

Eddi-z commented Jan 2, 2021

LordAro commented Jan 2, 2021

mattkimber commented Jan 2, 2021

mattkimber commented Jan 3, 2021 • edited

TrueBrain commented Jan 3, 2021

JGRennison commented Jan 3, 2021

mattkimber commented Jan 3, 2021 • edited

LordAro commented Jan 3, 2021 • edited

mattkimber commented Jan 3, 2021

mattkimber commented Jan 3, 2021

mattkimber commented Jan 3, 2021

mattkimber commented Jan 3, 2021

glx22 commented Jan 3, 2021 • edited

mattkimber commented Jan 3, 2021 • edited

LordAro left a comment

Choose a reason for hiding this comment

Eddi-z commented Jan 4, 2021

mattkimber commented Jan 4, 2021

mattkimber commented Jan 3, 2021 •

edited

mattkimber commented Jan 3, 2021 •

edited

LordAro commented Jan 3, 2021 •

edited

glx22 commented Jan 3, 2021 •

edited

mattkimber commented Jan 3, 2021 •

edited