Mysteriously poor performance on macOS #7644
Comments
This manifests for me on Mojave (10.14) and did not occur on High Sierra (10.13). Notably:
|
Some additional notes:
Example, taken from times measured on my Windows system running 1.9.2 64 bit:
The frame rate window (and the fps console command) also displays a line for Sound mixing. This is generally not relevant, since sound mixing runs in a separate thread and should never affect the total frame rate. (It could become relevant if someone reported crackling/popping sound.) |
As requested:
1.8.0 1.9.0
Full animation Disabled:
1.9.2
Full animation Disabled:
I'd like to add that this only occurs when playing full-screen or full window. If I reduce the window size to 1/4 of the screen, the FPS increases to ~30. |
Confirm MacBook Pro 15 2017 |
OpenTTD is causing a color space / pixel format conversion in the background, likely due to a mismatch between the backing surface format chosen by OpenTTD and the native surface format of the display/OS. The slowdown scales with OpenTTD's resolution, and the high resolution of Mac displays exacerbates the problem. This format/color space conversion happens on OpenTTD's main thread after a call to CGContextDrawImageWithOptions. I don't have time to look at this in the source, but that is where the problem comes from. Perhaps these pointers are enough information? |
I don't have any skills in Mac debugging, but I used Apple instructions to set up an Instruments run, long paste of results here: https://paste.openttdcoop.org/pef0g4kgn/mnd7ty/raw TL;DR
A cursory google search turns up a few other people with similar issue, nothing conclusive
The Apple developer guide is here: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CocoaViewsGuide/Optimizing/Optimizing.html#//apple_ref/doc/uid/TP40002978-CH11-SW4 My results are for macOS 10.14, on a 2.7 GHz i7 |
So if the rendering was changed to use explicit GPU acceleration instead of blitting 2D surfaces, it might resolve the issue? Since the GPU could receive texture data in our native pixel format, and a shader could convert it as necessary. |
You could indeed switch to a MTKView (or other Metal view) and use Metal to do the transfer while having a shader convert the image to the final format. This would work just fine. Though I don't think you have to do it via Metal as it should be possible to set things up correctly without requiring Metal. This new behaviour likely has to do with changes to Layer Backed Views introduced in 10.14 SDK, I'll see if I can find out what exactly introduces the observed behavior. In the end it's just a choice but I could imagine not wanting to introduce a new dependency if it's not necessary. |
I think we definitely want to support macOS versions as far back as reasonably possible, at least not having hard dependencies on newer features. |
Just wanted to confirm here that the issue is important and still present. Tested on 10.14 and 10.13, both showing bad results: low fps and a lagging mouse pointer. |
Trying to track this down but having issues getting a build working. Built master repo without any compression libs, build succeeded, but when executing I immediately encounter an error in libdyld: stack_not_16_byte_aligned. Probably missing some steps getting the build running. Can dig further if someone can help me out here :) |
Anything I can test? |
From my thread on Reddit:
So is the issue only Retina related? |
I think it's worth testing with both 32bpp and 8bpp blitters. Probably worth testing with driver debug enabled (level 1) to be sure the expected blitter is selected. I'm not entirely sure about how to run with a commandline on macOS, it's too many years since, but try this from a terminal: |
I'm interested in what the default blitter is when you don't override it on the commandline. |
I would guess 32bpp-anim. At least that gave me the same performance as before. |
If you run The expected framerate (when simulation is not too heavy for the CPU) is 33.33 fps, but anything between 32 and 34 is acceptable. |
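The numbers in the comment above can be made concrete with a little arithmetic: 33.33 fps corresponds to a frame period of 1000 / 33.33, or about 30 ms. Below is a minimal C++ sketch of that conversion and of the 32 to 34 fps band. The function names and the inclusive bounds are assumptions for illustration, not OpenTTD identifiers.

```cpp
#include <cassert>

// Frame period in milliseconds implied by a frame rate in frames per second.
inline double FramePeriodMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// True when a measured frame rate falls inside the 32-34 fps band that the
// comment above calls acceptable. Treating the bounds as inclusive is an
// assumption made for this sketch.
inline bool IsAcceptableFps(double fps) {
    return fps >= 32.0 && fps <= 34.0;
}
```

A measured rate can be passed straight to `IsAcceptableFps`, so a report of roughly 30 fps in a quarter-size window would fall just short of the band.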
As I mentioned earlier; this is a colourspace/format problem. The OS has detected that the pixels it is being given by OpenTTD are not in the exact colourspace/format that the monitor is accepting at that time. An external monitor likely has a different, in this case more compatible, colourspace than the built-in screen, which is why the problem doesn't exist. I've checked the colourspace code and it seems fine though. Obviously selecting a different blitter changes how the OS handles things, which is great info. It would be useful to know the exact difference between the blitters, in particular what kind of surface they request from the OS and what colourspace they choose for it. |
Oh OK. I read something about expecting 800 frames in the thread that led me here, so that put me off. |
@SoothedTau Ok, sorry. If there’s anything I can test, let me know. |
No need to apologise! All data is good data, your comment cemented what I already thought. I've seen this behavior on macOS before where external screens don't have the same issues simply due to being a different type of screen. (Retina vs Non-retina for example, but HDR/Non-HDR as well). I'm still stuck on not being able to run the game after having built it, I wanted to debug and test some minor changes but as I can't produce working executables it's a little hard :). |
That's in fast-forward. The game needs to run at 33.33 fps in normal mode to run at full speed, but should generally be capable of running much faster in fast-forward mode, depending on CPU. Definitely do test with a new game (no vehicles, 256x256 map) how fast it can run. |
Ah. Makes sense. I’ll do some more testing in my systems. |
Had some time to look at this last evening and the issue is, as expected, due to a mismatch in colourspaces.

OpenTTD queries the system for the colourspace of the monitor. On many monitors (including external ones!) this returns a pretty standard sRGB colourspace, which works just fine with how OpenTTD handles colours internally. What happens in this case, and what causes problems for OpenTTD, is that the main display of iMacs and MacBooks actually returns a P3 colourspace as its best fit, because these displays are capable of more than what sRGB supports (they are pretty good displays!). Now OpenTTD tries to use P3 even though it really shouldn't, as it's outputting in the sRGB colourspace, which means the pixel data is incompatible and thus won't look right. Additionally, the system introduces a conversion step from an internal image to the image the OS will use to show the window. This step is very costly, as the P3 colourspace requires much more data per pixel than the usual sRGB space and the pixel formats involved in P3 are far more complex to convert between.

All relevant code for this problem is in wnd_quartz.mm. Here is a breakdown of it:

- The NSWindow that OpenTTD uses (created on line 287, WindowQuartzSubdriver::SetVideoMode) is in the P3 colourspace by default when running on the main display of a Mac.
- The CGContext that is used to store OpenTTD's drawing results internally (created on line 605, WindowQuartzSubdriver::WindowResized) is created with a colourspace from QZ_GetCorrectColorSpace (line 109), which returns whatever CGDisplayCopyColorSpace (line 116) returns. When running on the main display of a Mac, that is also P3.
- These two combined result in very slow calls to CGContextDrawImage (lines 191 and 212, (void)drawRect:(NSRect)invalidRect), which is what drags down the performance of the game.

To get around the issue I made changes to 2 lines. The objective is to prevent running with a P3 colourspace.
macOS is pretty smart and will make sure everything looks right even when mixing colourspaces as long as we tell it in what space the pixels we give it are. With that in mind I changed line 116 to always return a standard sRGB colourspace:
And added a new line after line 291 that sets the colourspace of the window to match the one we use internally:
With these two changes I get much improved performance and everything looks right. I can't make the actual change in the code for reasons, so someone else will have to do that, but it should be pretty straightforward. I should add that this fix did not go through extensive testing and there are edge cases I'm likely unaware of, but I think it should work for most if not all cases. It is probably important to test on an external monitor too, to see if that's still alright. P.S.: Is anyone else seeing issues with compiling the release version on macOS? I was seeing misaligned stack errors and had to enable debug mode to compile a working binary. I was compiling with Xcode 11 beta on Catalina. |
I just had a look. Unfortunately, it doesn't seem to have resolved the issue for me. Still a significant performance issue that goes away when switching the monitor over to an sRGB profile. I built from the latest |
Interesting! I tried your build on Catalina (10.15.3), and didn't see performance issues (one core hovers at ~40% cpu). I've attached my build which works well for me if you want to try? openttd-custom-20200227-cocoa_set_colorspace-gda504c127d-OSX.zip |
Thanks, I'll try your build tomorrow. Weird that it works for you. Do you not get any performance difference based on setting the monitor profile? edit: just in case, I'm on 10.14.6, and I'll provide some more info tomorrow. |
@msikma If you don't mind me asking; what OS build are you on and what hardware setup? I may try to repro. Edit: Ah, Mojave. Hardware setup would be useful though! Edit2: Performance is a lot better, but still not as it should be. I'll investigate further when I have time. |
@SoothedTau Apologies for late response, here's my hardware setup:
As you can probably guess it's not an official Apple setup, so I think it's going to be difficult to compare actual benchmarks, but I can provide those. I am running OSX Mojave 10.14.6, and almost all of it is vanilla Apple software aside from a few things during startup. I've got a triple monitor setup, and I didn't realize this before but that might make a difference too. I'll give it a try with only one connected later. All three of my monitors have separate color profiles and none of them are sRGB. Re: @spauka, sorry but I couldn't run your build since it requires OSX Catalina. The dreaded compatibility changes are forcing me to stay on Mojave unfortunately. |
Ah right, don't have such a system lying around! Found what is causing the slowdown though:
You can prevent using drawRect entirely by using CGBitmapContextGetData on the view's context and memcpy'ing or otherwise copying the data in directly. With the format set to sRGB you shouldn't need any conversion, though you may want to implement 2x scaling to account for HiDPI when it is indeed present. When this is done, drawRect isn't necessary anymore and the only time spent is OpenTTD drawing and the upscale if HiDPI is enabled. You now have full control over how to implement the copies and you should be able to improve the performance should the need arise. All the copying now takes significant time so you may want to see what can be done. |
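The copy-with-upscale step described above could look roughly like the following C++ sketch. It is deliberately platform-neutral: on macOS the destination pointer would be whatever CGBitmapContextGetData returns for the view's context, but here it is a plain buffer. The function name, the pixel-based pitches, and the nearest-neighbour 2x duplication are all assumptions for illustration, not code from OpenTTD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a 32bpp image into a destination twice as wide and twice as tall,
// duplicating every pixel (nearest-neighbour 2x upscale for HiDPI).
// Pitches are given in pixels, not bytes. The name and signature are
// hypothetical.
inline void CopyUpscaled2x(const std::uint32_t *src, std::size_t src_pitch,
                           std::uint32_t *dst, std::size_t dst_pitch,
                           std::size_t width, std::size_t height) {
    assert(src != nullptr && dst != nullptr);
    assert(src_pitch >= width && dst_pitch >= 2 * width);
    for (std::size_t y = 0; y < height; y++) {
        const std::uint32_t *s = src + y * src_pitch;
        std::uint32_t *d = dst + (2 * y) * dst_pitch;
        // First output row: write each source pixel twice.
        for (std::size_t x = 0; x < width; x++) {
            d[2 * x] = s[x];
            d[2 * x + 1] = s[x];
        }
        // Second output row is identical to the first, so copy it wholesale.
        std::memcpy(d + dst_pitch, d, 2 * width * sizeof(std::uint32_t));
    }
}
```

Because both buffers hold the same pixel format, no per-pixel conversion is involved. The cost is purely memory bandwidth, which is the part the comment suggests optimising if it turns out to matter.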
I see this issue just got closed, but I don't understand why. I tested this fix and it didn't work for me. The last commit that got added is identical to the one I tested. Has anyone else tested it? |
I commented on the pull request here #8023 (comment) TL;DR. For me:
|
That's weird, because the patch definitely did not bring back 1.8.0 performance for me, to the point where I could tell with the naked eye easily. Did anybody else test it? And did they get the same performance as 1.8.0 or simply "better" performance without being too specific? I'm sorry if I sound a bit disappointed. I can log the difference in FPS to get hard numbers for this. The difference in colors in the screenshot is an example of colors being interpreted as the wrong profile. The screenshot has "Display P3" as the embedded profile which is the user's system profile. Here's what happens when I take the left half of the image into Photoshop, convert it to sRGB, and then assign it the Display P3 profile without converting the color values: They look identical again (minus some loss from the conversion roundtrip). The color differences could be greater or smaller than that of the user who posted the original image, depending on what color profile they use. |
FWIW I think it might be worth reopening this issue, or opening a new issue regarding the colorspace issue and continuing performance problems on OSX. The patch above makes the game playable but it doesn't seem to fix all the issues that have been raised... |
I think it would be a better use of time to work on an OpenGL based renderer which would most likely avoid all these issues. |
I am happy in principle to have this or another issue open. It would be helpful though to have quantified the issues. It's unclear whether there is one mac performance issue or several. Apple is pretty much the only vendor where we can rely on there being 2 OS versions and only a handful of hardware versions :) It should be possible to categorise and reproduce the issues more reliably. :) Currently we lack consistency, e.g. a common set of FPS numbers on a common set of savegames (or something like that). |
Ideas for what might be a useful test:
|
OpenGL is deprecated on macOS. I wouldn't recommend developing new features with it. |
You're absolutely right. I'm sorry to say I tried to get some numbers on this but I haven't quite gotten them yet, because I tried to compile 1.8.0 to have a baseline and somehow that one doesn't run well on my system either (the official release does). It ran basically like the 1.9.0 release. So I'm not sure what's going on. I need to investigate this further. I'd love to help out in some way so I'll be looking into this when I've got some more time. 🙂 |
Well, the official 1.8.0 release was built using a very old cross-compiler using and targeting the 10.6 (or maybe even 10.5) SDK. The current builds are done with a real native compiler on Azure infrastructure, using an SDK 10.10+ (not sure which exact versions Azure provides). It is entirely possible that macOS exhibits different app compatibility changes depending on the OS version the app was compiled against. |
Yeah, that's what I was thinking too. Could it be that this bug is really a result of SDK changes? I'll look into compiling with an older SDK to see what happens. |
…S performance issues (OpenTTD#7644)
… output performance degradation.
Version of OpenTTD
1.9.0 and later, at least
Expected result
All operating systems have comparable performance when running on comparable hardware.
Actual result
Several people are reporting that the macOS version of OpenTTD has unreasonably poor performance, and the framerate window does not reveal any obvious reason.
This bug covers the situation when all of the following occur at the same time:
Reported on TT-Forums and this comment on bug #7247.
Note that this is not related to #7247, as this bug manifests even when there are zero vehicles in game. The poor performance also appears to be caused by something that is not being measured, since the game loop, drawing, and video output times together do not add up to the frame time implied by the observed total frame rate.
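To illustrate the "something not being measured" argument, here is a minimal C++ sketch that subtracts the three measured components from the frame time implied by the observed frame rate. The struct, its field names, and the function name are hypothetical; they only mirror the three lines mentioned above and are not OpenTTD identifiers.

```cpp
#include <cassert>

// Per-frame timings in milliseconds, as read from the frame rate window.
// Hypothetical type for this sketch.
struct FrameTimings {
    double game_loop_ms;
    double drawing_ms;
    double video_output_ms;
};

// Milliseconds per frame that the measured components do not explain.
// A large positive value points at work happening outside the measured code.
inline double UnaccountedMs(const FrameTimings &t, double observed_fps) {
    assert(observed_fps > 0.0);
    const double frame_ms = 1000.0 / observed_fps;
    const double measured_ms = t.game_loop_ms + t.drawing_ms + t.video_output_ms;
    return frame_ms - measured_ms;
}
```

With made-up figures of 2, 3, and 5 ms measured at an observed 10 fps, the frame takes 100 ms and 90 ms of it is unexplained. These numbers are purely illustrative and are not measurements from this report.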
Steps to reproduce
Run official OpenTTD builds on a macOS 10.14 system.