Codechange: validate what strings are defined but not used #9518
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation / Problem
I assumed we had strings that were no longer used in our language files, as it is pretty easy to miss this. We do not validate if strings are used.
After trying to write some scripting to figure out if this is the case, I ended up figuring out how poorly we name our strings, and the weird hacks we have to reference a string.
For example, the setting-flag "zero-is-special" uses the next string after the "value" string. In other words, this special string is not referenced directly, but you have to calculate that based on another.
And it is not always in the forward direction. You can say you want to draw a "tiny" variant of a string, which is "- 1" from the original string.
After fiddling a while with a script I wrote, I finally managed to get through all the strings, and detect which are unused.
Description
I had to add markers in
english.txt
for the script to understand how long lists were going to be. I did not validate if the code expects the same list length; I just added them for whatenglish.txt
needed to pass validation.With these
##length
markers, I could also remove a lot of DO NOT CHANGE THIS ORDER stuff, as it is much more obvious now you shouldn't. As the strings make a list.One thing I noticed early on, that it was easy to miss that the list was one bigger than you expected. To validate lists, I check what their common prefix is (the prefix all entries from a list have), and see if the next string doesn't have this. As that is an indication the entry should be part of the list. Additionally, if the prefix collapse to
STR_
, it means the list was too big. Just a nice little safe-guard.But because of the way we name strings, this doesn't always work. For example, we do a lot of
STR_NNN_TRAIN_XXX
,STR_NNN_ROAD_XXX
, and notSTR_NNN_XXX_TRAIN
. This means the common prefix can be too narrow. To indicate you are sure the list terminated at the right place, I added the marker##next-name-looks-similar
. In other instances it was easier to move strings around to avoid this detection triggering. For example, movingSTR_COLOUR_DEFAULT
on top of the list instead of the bottom. Also makes it easier for human to understand it is not part of the list.This leaves us with three exceptions:
##external <count>
tells the script about that.##setting-zero-is-special
exists.STR_NNN_TINY
is followed bySTR_NNN
, the script assumes this is the tiny version, and that it is used. Owh, and we have a variant on this, where the postfix is_SMALL
.. don't ask. Just don't.I hooked this into the CI, to tell us when someone is being lazy and forgetting to remove a string from the language file.
Limitations
Checklist for review
Some things are not automated, and forgotten often. This list is a reminder for the reviewers.