Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: zstandard compression for network games #8845

Closed
wants to merge 3 commits into from

Conversation

TrueBrain
Copy link
Member

@TrueBrain TrueBrain commented Mar 11, 2021

Closes #8805.
Closes #8773.

Motivation / Problem

#8773 introduces zstandard, with good argumentation why it should be default for network games. It does, however, only introduce zstandard. It does not enable it by default, nor for network games.
#8805 introduces a handshake between server and client to find the common savegame formats. Alternatively, JGRPP introduces similar functionality in a different way: JGRennison/OpenTTD-patches@c2ae758

Both have their merits, but both also have their issues. #8805 is on one side of the spectrum, making it fully configurable and tunable. JGRPP on the other hand, minimizes it and makes zstd a special case. I did not like either approach (but highly appreciate both views on the matter): to me, it is more than fine to have a clear opinion on the best settings, but I also like to think a bit about the future, in case we ever add more compression libraries; it is likely that in the next few years even better and faster compression algorithms will be found, looking at amount of new methods I already see around us.

Finally, I had the realisation that it is not network games or zstd that makes this special. But it is the fact that sometimes we want to prioritize on savegame filesize, and sometimes on savegame creation time. Later I adjusted this (tnx the conversation below), that it is more about temporary vs persistent storage: one can be a very new compression method, but the persistent storage should be able to survive for years to come.

With this in hand, I present you this alternative. It combines all the above work in a single Pull Request:

  • It introduces zstandard as possible compression
  • It allows any save function to select either a compressor that produces savegames that persist for a long time and work most likely on other devices, or savegames that are temporary, like network savegames. The latter can use newer compression methods, where the first needs to be a bit reluctant to accept new ones. Currently, temporary savegames can use zstd, where persistent cannot. This is heavily opinionated, of course.
  • It handshakes a bitmask between client and server so the server knows what compression algorithms the client supports; and uses the best one for savegames.

Description

After inspiration on the work in JGRPP, I did some rewriting. Instead of passing threaded everywhere, and with this PR that would also include temporary / persistent and bitmask, they are now stored in _sl instead. Additionally, JGRPP uses flags, which is a good idea. But I took a slightly different implementation choice here, and basically make them bitwise switches: either SGF_TEMPORARY or SGF_PERSISTENT and either SGF_NO_THREAD or SGF_THREADED. This makes calling save functions a lot easier and more descriptive, instead of a list of true, false, etc.

I integrated several fixes JGRPP already made for the zstandard implementation from #8773, fixes some coding style, made compression_level signed, and other minor issues, as while I was reading the code why not integrate the review. So it is "mostly" #8773, with some changes.

Limitations

  • If you set savegame_format in your configuration, it overwrites auto-detection completely. So if a server would be to do that, the handshake is fully ignored and the savegame is made in what-ever format is described. Feature: Allow client and server to negotiate on compression to use for the savegame #8805 in contrast allows multiple formats to be set, which are all tried for network games. I think this is unneeded and only complicates the code. If a server admin really does not want to support a compression or experiment with different compression levels, he might as well recompile OpenTTD with these changes. Personally, I am not a big fan of these endless amount of settings and configuration options, and I think it is fine we have an opinion about what is best. However, if there is truly a demand for this, an additional PR can be created to add support for this.

Checklist for review

Some things are not automated, and forgotten often. This list is a reminder for the reviewers.

  • The bug fix is important enough to be backported? (label: 'backport requested')
  • This PR affects the save game format? (label 'savegame upgrade')
  • This PR affects the GS/AI API? (label 'needs review: Script API')
    • ai_changelog.hpp, gs_changelog.hpp need updating.
    • The compatibility wrappers (compat_*.nut) need updating.
  • This PR affects the NewGRF API? (label 'needs review: NewGRF')

@TrueBrain TrueBrain force-pushed the zstd-compression branch 2 times, most recently from 32a4c85 to 2441a28 Compare March 11, 2021 14:19
@TrueBrain TrueBrain changed the title Zstd compression Feature: zstandard compression for network games Mar 11, 2021
@JGRennison
Copy link
Contributor

I've not combed through this yet, but I've got no objections to what I see so far. It seems well thought out.

However, as well speed and size, there is a third axis to think about which is compatibility of persistent save files.
Users do things like manually send savegame files to each other, upload scenarios to bananas, or share savegame files between two machines with network shares, etc.
Part of the rational for the slightly odd design I went for is that I assumed that zstd is not universally available but lzma is.
If a user manually presses the save button, they implicitly expect that the resulting file is portable. Not meeting this would probably create user support issues.
If in the future a further compression algorithm was added which outperformed lzma in the "small" mode, this would also need to be taken into account.

@TrueBrain
Copy link
Member Author

TrueBrain commented Mar 11, 2021

I've not combed through this yet, but I've got no objections to what I see so far. It seems well thought out.

However, as well speed and size, there is a third axis to think about which is compatibility of persistent save files.
Users do things like manually send savegame files to each other, upload scenarios to bananas, or share savegame files between two machines with network shares, etc.
Part of the rational for the slightly odd design I went for is that I assumed that zstd is not universally available but lzma is.
If a user manually presses the save button, they implicitly expect that the resulting file is portable. Not meeting this would probably create user support issues.
If in the future a further compression algorithm was added which outperformed lzma in the "small" mode, this would also need to be taken into account.

A very solid point. And looking back, I think indeed that SGF_SMALLEST is misleading. As the intention indeed is to have a format that survives for years to come and works on as many clients as possible. So possibly it should be named SGF_PERSISTENT or something similar. That might avoid future-us making the mistake of deprecating LZMA without thinking about this pretty important fact :D So SGF_VOLATILE vs SGF_PERSISTENT or something. And that can lead to selecting other compressions.
Or maybe add this in addition to fastest/smallest. I need to give this some more brain-cycles :D

@TrueBrain
Copy link
Member Author

After some brain-cycles, I indeed ended up changing "smallest" and "fastest" for "persistent" and "temporary", as that is much more what I wanted to say anyway. I noticed that while documented that "smallest" is not really the smallest we can do, something wasn't fully right yet. I think this change makes a lot more sense, as that is what we want: we always want the smallest and fastest compression we can get, but in a balance with each other. So that is not the axis we are configuring. Indeed, the thing we want to indicate: are you allowed to use methods that are a bit less widespread and more experimental. I think SGF_PERSISTENT and SGF_TEMPORARY reflect that a lot more, as the latter can use any experimental algorithm it likes: they are temporary savegames anyway.

Tnx @JGRennison , very valuable comment :)

@TrueBrain TrueBrain force-pushed the zstd-compression branch 6 times, most recently from dc33160 to 8f86ca8 Compare March 11, 2021 22:11
@ldpl
Copy link
Contributor

ldpl commented Mar 12, 2021

Few things I noticed this pr is lacking compared to #8805. Though most important ones are easy to fix I guess.

  1. Network saves use the same (default) compression levels as persistent ones when even for the same format it would be better to choose level depending on use.
  2. (Malicious) client can request formats that are disadvantageous for the server like none or lzma(:2).
  3. Servers can't easily choose compressions that better suit their specialty (e.g. use higher compression levels for large maps). Even if they set _savegame_format it will affect their persistent saves and they lose the ability to accept multiple formats.
  4. In general there are more than 2 types of saves. Autosaves aren't quite the same as network saves and definitely even less similar to manual saves this pr seems to use. Even ignoring that there can be server controller doing saves via admin port. For example, end game saves (for the archive) prioritizing max compression or debug saves (on client desync) prioritizing speed. So ideally save command should accept savegame format, which would be quite easy to do in Feature: Allow client and server to negotiate on compression to use for the savegame #8805 but not with flags.

@TrueBrain
Copy link
Member Author

TrueBrain commented Mar 12, 2021

1. Network saves use the same (default) compression levels as persistent ones when even for the same format it would be better to choose level depending on use.

This is an alternative of what I mention at limitation. I tried to scope this PR to keep it easily reviewable; such extensions can be made in various of ways, but that shouldn't be part of this PR. For example, I considered making default_compression part of persistent/temporary too, so they could differ. But reality is, that it would first require some analysis of the other compressions too, and kick out what doesn't work for temporary saves, etc. This really is future-work, and is even more opinionated than what this PR presents :)

2. (Malicious) client can request formats that are disadvantageous for the server like none or lzma(:2).

Yes. The LZMA is of course exactly the point; clients that do not have zstd should use LZMA instead. Other formats like "none" are only bad for the bandwidth, but otherwise surprisingly fast. The only one in the list that would be somewhat bad is LZO, but not terrible bad. In general, all compressions we have set are not "bad". Just zstd is better.
Either way, the server could already use these formats to serve to clients; the only thing that changes now is that clients can force certain formats to servers. Not sure this is really a problem, honestly. Clients can waste CPU and bandwidth of servers, sure, but .. I am sure if people want to do this harm, they can find easier ways to do that. So in the balance of things, I currently do not see a reason for this to be a problem in need of addressing by this PR.

3. Servers can't easily choose compressions that better suit their specialty (e.g. use higher compression levels for large maps). Even if they set _savegame_format it will affect their persistent saves and they lose the ability to accept multiple formats.

As mentioned by limitations. I do not think this is a feature that is required for the average server admin. Expert server admins can compile their own server (and most do anyway), so I do not think it is needed for this to be a setting. But if there is really a request for this, future PRs can extend on this. For example, I considered creating savegame_persistent_format and savegame_temporary_format, which allows manually tuning both. But to keep the PR scoped and well defined, I did not do that.

4. In general there are more than 2 types of saves. Autosaves aren't quite the same as network saves and definitely even less similar to manual saves this pr seems to use. Even ignoring that there can be server controller doing saves via admin port. For example, end game saves (for the archive) prioritizing max compression or debug saves (on client desync) prioritizing speed. So ideally save command should accept savegame format, which would be quite easy to do in #8805 but not with flags.

This is already the case before this PR, so I read this more as a feature request. And if I translate it correctly, you basically would like that you can set the savegame format in the console command. I think this is a reasonable request, but for another PR of course.
Implementing this would be relative trivial, and has little to do with the flags introduced by this PR. Basically, how I would do it, is allow the console command to influence savegame format temporary, so it can enforce a certain format for that save. This to me would make the most sense, and is perfectly viable (with or without this PR landing in master). So yeah, sure, good idea. Just not for this PR of course :D (scoping is everything)

Copy link
Member

@LordAro LordAro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor amount of thoughts from scrolling through

src/lang/english.txt Outdated Show resolved Hide resolved
src/lang/english.txt Outdated Show resolved Hide resolved
src/saveload/saveload.cpp Outdated Show resolved Hide resolved
src/saveload/saveload.cpp Outdated Show resolved Hide resolved
SGF_NO_THREAD = 0, ///< Do not use threads.
SGF_THREADED = 1 << 0, ///< Save threaded.
SGF_PERSISTENT = 0, ///< Savegame should persistent through time and different clients.
SGF_TEMPORARY = 1 << 1, ///< Savegame will only be used temporary or locally.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fact that the enum contains duplicate values with entirely different meanings is a bit odd

Copy link
Member Author

@TrueBrain TrueBrain Mar 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are toggles. The idea is that you either useNO_THREAD or THREADED. We could do away with the 0-value ones, but than it is harder to see how a savegame is made and easier to think either one is on-by-default.

As I wrote in the PR description, I use this to avoid calls with true, true, which are not very descriptive and easy to mess up. But I am fine reverting back to the boolean approach; or of course if you have any other suggestions, I am all ears :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I can do 0 << 0 and 0 << 1 to make it more clear they are toggles? Not sure that helps :D

@TrueBrain TrueBrain force-pushed the zstd-compression branch 3 times, most recently from 98d2959 to a97a5f6 Compare March 14, 2021 11:13
@TrueBrain TrueBrain removed this from the 1.11.0 milestone Mar 30, 2021
src/saveload/saveload.cpp Outdated Show resolved Hide resolved
src/saveload/saveload.cpp Outdated Show resolved Hide resolved
@@ -2422,6 +2424,7 @@ struct ZSTDSaveFilter : SaveFilter {

/** The format for a reader/writer type of a savegame */
struct SaveLoadFormat {
uint8 index; ///< Index of this entry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should say that the index has to be unique, unless you actually intended to have multiple formats at one index in the bitmask.

Some compile time assert that index is unique would probably be nice, but I'm not sure it's possible. (Maybe with constexpr/consteval, but not sure).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love some compile time asserts, but I wouldn't even know where to start :D And consteval is C++20, not?

ldpl and others added 3 commits April 5, 2021 13:17
Use zstd for network savegames, and lzma for all other.
This should speed up joining of multiplayer games, especially on
large maps.
…ormats

This means that if a client doesn't have all formats, it can still
join a multiplayer server.

The client informs the server of his available formats, after which
the server finds the best format given these constraints. If no
match is found, the client is disconnected with an error.
@TrueBrain
Copy link
Member Author

Original author indicates that zstd is a lot bigger than expected in the real world, so sadly enough this first needs further looking into what is going on exactly.

@TrueBrain
Copy link
Member Author

Every time I look at this PR, I don't like what I did myself. Mainly the way I solved flags is just shit. But also the whole way this is wired in is not in a way I like. I also miss a solid way of testing this .. which just annoys me.

So I could leave this open as inspiration for others, but honestly .. not sure that is useful. So I am going to close it up .. maybe I revisit this some day. But I doubt it will be this year :D

@TrueBrain TrueBrain closed this Jun 27, 2021
@TrueBrain TrueBrain deleted the zstd-compression branch October 31, 2021 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants