Codechange: validate what strings are defined but not used #9518

TrueBrain · 2021-08-28T15:25:23Z

Motivation / Problem

I assumed we had strings that were no longer used in our language files, as it is pretty easy to miss this. We do not validate if strings are used.

After trying to write some scripting to figure out if this is the case, I ended up figuring out how poorly we name our strings, and the weird hacks we have to reference a string.

For example, the setting-flag "zero-is-special" uses the next string after the "value" string. In other words, this special string is not referenced directly, but you have to calculate that based on another.

And it is not always in the forward direction. You can say you want to draw a "tiny" variant of a string, which is "- 1" from the original string.

After fiddling a while with a script I wrote, I finally managed to get through all the strings, and detect which are unused.

Description

I had to add markers in english.txt for the script to understand how long lists were going to be. I did not validate if the code expects the same list length; I just added them for what english.txt needed to pass validation.

With these ##length markers, I could also remove a lot of DO NOT CHANGE THIS ORDER stuff, as it is much more obvious now you shouldn't. As the strings make a list.

One thing I noticed early on, that it was easy to miss that the list was one bigger than you expected. To validate lists, I check what their common prefix is (the prefix all entries from a list have), and see if the next string doesn't have this. As that is an indication the entry should be part of the list. Additionally, if the prefix collapse to STR_, it means the list was too big. Just a nice little safe-guard.

But because of the way we name strings, this doesn't always work. For example, we do a lot of STR_NNN_TRAIN_XXX, STR_NNN_ROAD_XXX, and not STR_NNN_XXX_TRAIN. This means the common prefix can be too narrow. To indicate you are sure the list terminated at the right place, I added the marker ##next-name-looks-similar. In other instances it was easier to move strings around to avoid this detection triggering. For example, moving STR_COLOUR_DEFAULT on top of the list instead of the bottom. Also makes it easier for human to understand it is not part of the list.

This leaves us with three exceptions:

We have "external" strings, strings that are not used by the source by various of other bits. ##external <count> tells the script about that.
Settings have this lovely: use this value string except if you are zero, flag. This is done with magic. To indicate the script that the string is part of that magic, ##setting-zero-is-special exists.
Similar but different from the previous, you can render strings as "tiny" in the viewport. This does "string - 1" internally. To make it a bit easier on us, if any STR_NNN_TINY is followed by STR_NNN, the script assumes this is the tiny version, and that it is used. Owh, and we have a variant on this, where the postfix is _SMALL .. don't ask. Just don't.

I hooked this into the CI, to tell us when someone is being lazy and forgetting to remove a string from the language file.

Limitations

Checklist for review

Some things are not automated, and forgotten often. This list is a reminder for the reviewers.

The bug fix is important enough to be backported? (label: 'backport requested')
This PR touches english.txt or translations? Check the guidelines
This PR affects the save game format? (label 'savegame upgrade')
This PR affects the GS/AI API? (label 'needs review: Script API')
- ai_changelog.hpp, gs_changelog.hpp need updating.
- The compatibility wrappers (compat_*.nut) need updating.
This PR affects the NewGRF API? (label 'needs review: NewGRF')
- newgrf_debug_data.h may need updating.
- PR must be added to API tracker

.github/unused-strings.py

TrueBrain force-pushed the string_check branch 3 times, most recently from ae2d42d to 2f4811f Compare August 28, 2021 19:50

frosch123 reviewed Aug 31, 2021

View reviewed changes

.github/unused-strings.py Outdated Show resolved Hide resolved

.github/unused-strings.py Show resolved Hide resolved

Codechange: validate what strings are defined but not used

1e8ed7c

TrueBrain force-pushed the string_check branch from 2f4811f to 1e8ed7c Compare September 2, 2021 19:31

frosch123 approved these changes Sep 2, 2021

View reviewed changes

TrueBrain merged commit 9c74dc2 into OpenTTD:master Sep 2, 2021

TrueBrain deleted the string_check branch September 2, 2021 20:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codechange: validate what strings are defined but not used #9518

Codechange: validate what strings are defined but not used #9518

TrueBrain commented Aug 28, 2021 •

edited

Codechange: validate what strings are defined but not used #9518

Codechange: validate what strings are defined but not used #9518

Conversation

TrueBrain commented Aug 28, 2021 • edited

Motivation / Problem

Description

Limitations

Checklist for review

TrueBrain commented Aug 28, 2021 •

edited