Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce allocations in string handling #9405

Closed
wants to merge 1 commit into from

Conversation

fsimonis
Copy link
Contributor

@fsimonis fsimonis commented Jun 29, 2021

Motivation / Problem

I often use OpenTTD to get accustomed with new tools. This time I used heaptrack to analyse and locate allocations in the code.

There are some places in string handling that reallocate temporary char buffers per call. As OpenTTD has to format quite a few strings per frame, this accumulates quickly.

Description

edit: skip this part

Proxy functions often follow this pattern:

  1. Allocate char buffer
  2. Populate char buffer with a string
  3. Process the string
    I marked these buffers as static to prevent unnecessary allocations. The content will simply be overwritten by the population function.

Next is the FormatString function. Currently, this function uses a stack to keep track of the remaining string.
This is not necessary as this stack can never reach 2 values. (Correct me if I am wrong.) Hence using a pointer suffices and prevents allocations altogether.

I used the first 30s of this savegame to analyse the differences. I am using the sdl(_opengl) drivers.
The overall temporary allocations are down by ~90% and FormatString now doesn't allocate at all.
ViewportDrawStrings + Callers is now responsible for 26% of total temporary allocation (48k), which was 67% (413k).

Temp allocation hotspots after this PR are:

  • Layouter::GetCachedParagraphLayout ~ 25%
  • ViewportSortParentSpritesSSE41~ 34%
    Main allocation hotspot is ViewportSortParentSpritesSSE41 with ~85% of all allocations.

I am not sure how much this translates to FPS improvement. The 512 samples are not capable of smoothening out the frame time jitter on my dev system. Maybe a reviewer can give this a try.

Limitations

none

Checklist for review

Some things are not automated, and forgotten often. This list is a reminder for the reviewers.

  • The bug fix is important enough to be backported? (label: 'backport requested')
  • This PR affects the save game format? (label 'savegame upgrade')
  • This PR affects the GS/AI API? (label 'needs review: Script API')
    • ai_changelog.hpp, gs_changelog.hpp need updating.
    • The compatibility wrappers (compat_*.nut) need updating.
  • This PR affects the NewGRF API? (label 'needs review: NewGRF')

@LordAro
Copy link
Member

LordAro commented Jun 29, 2021

Intriguing! My immediate thought - will making those string buffers static break thread safety?

@TrueBrain
Copy link
Member

Hihi, using OpenTTD as a benchmark application .. things I never expected people to do :D Nice!

Question for you: I do not fully understand why that tool considers a stack variable "an allocation"; as that shouldn't be an allocation at all. At least, from what I see in the assembly code, these buffers don't end up on the heap, but stay on the stack. So the stack is just increased with, in this case, 2048, and the buffer sits there on the stack. That is not an "allocation" in the classical sense, as it doesn't happen on the heap.

So I wonder if that tool sees something I am not seeing (very much possible :D), or if some compilers with some optimization settings move the buffer to the heap or something? Do you have more detail about what is going on exactly here? I am really curious if it is really on the heap what is going on, as that is not something I would expect to happen here :D

Any insights would be appreciated :)

PS: btw, not saying it is such a great idea to have such large variables on the stack, but that is not the subject of this PR :D

@fsimonis
Copy link
Contributor Author

fsimonis commented Jun 29, 2021

Intriguing! My immediate thought - will making those string buffers static break thread safety?

Are there multiple drawing threads? If that is the case, then yes.

Question for you: I do not fully understand why that tool considers a stack variable "an allocation"; as that shouldn't be an allocation at all. At least, from what I see in the assembly code, these buffers don't end up on the heap, but stay on the stack. So the stack is just increased with, in this case, 2048, and the buffer sits there on the stack. That is not an "allocation" in the classical sense, as it doesn't happen on the heap.

This made me scratch my head at first, but could remember some similar situation I once ran into.
edit: This is in fact another issue.

These buffers are in fact allocated on the heap because of this:

static const int DRAW_STRING_BUFFER = 2048;

DRAW_STRING_BUFFER is in fact not a compile time constant. There is a gcc extension that allows to use this syntax also with runtime values which allocates the buffer on the heap instead.
So it would actually be very interresting to turn the constant into a compile-time constant and see how this perfoms:

-static const int DRAW_STRING_BUFFER = 2048;
+constexpr int DRAW_STRING_BUFFER = 2048;

One can disallow compiler extensions in CMake using CXX_EXTENSIONS.

@vituscze
Copy link
Contributor

vituscze commented Jun 30, 2021

DRAW_STRING_BUFFER is a constant-expression as far as C++ is concerned. In fact, you can even do:

constexpr int x = DRAW_STRING_BUFFER;

Using static const int as array size produces the usual stack "allocation" in both gcc and clang: subtracting and then adding a fixed number to the stack pointer.

@fsimonis
Copy link
Contributor Author

@vituscze Hi, I did some more digging regarding this and in fact you are right. Thanks for pointing that out!

I also checked the output from heaptrack again and the buffers are in fact not allocated on the heap. I simply misread the information presented by it. Part of getting used to a new tool, I guess.

@LordAro I removed the commit making the buffers static, so no worries about thread safety there.

The allocation reduction stays identical though, so the rest of the claims remain valid.

@ldpl
Copy link
Contributor

ldpl commented Jun 30, 2021

Next is the FormatString function. Currently, this function uses a stack to keep track of the remaining string.
This is not necessary as this stack can never reach 2 values. (Correct me if I am wrong.)

Doesn't look that way to me. Have you tested with something that actually uses SCC_NEWGRF_STRINL or SCC_NEWGRF_PRINT_WORD_STRING_ID? No idea what uses it though but to me it looks like it should break with this patch as it will throw away any leftover symbols after encountering those string codes.

What can probably be done though is postponing stack/vector allocation until it's actually required like how I did it in #7922

@fsimonis
Copy link
Contributor Author

fsimonis commented Jul 1, 2021

@ldpl Hmm, right.

One could also get rid of the stack by rewriting the function so that it operates on a single formatting string and recursively calls itself for these formatting codes. This makes the code cleaner, reduces the allocations and remains backwards compatible.
I'll look into that.

@fsimonis
Copy link
Contributor Author

fsimonis commented Jul 5, 2021

Will close this PR. The required refactoring won't have a lot to do with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants