New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Building on MacOS fails with "bus error: 10" #9386
Comments
I did some initial investigation into this, and there are a few things to observe: Bisecting shows the above commit is the first commit after which it breaks. This doesn't mean it is the direct cause, but I will get back to that in a bit. Why during LTO? Well, when I compile on GCC with So I would reason the two events are related. We might already have been pushing I have not looked into solutions; I just did some initial triage on this problem. I am 99.99% sure that commit is causing the issue in a direct sense, but it might already have been an issue for years, just not showing up yet as it didn't hit the thresholds :) |
I notice that the MacOS builds failing also results in the releases for Windows and Linux builds being cancelled for that specific nightly. In other words, currently (as of the time of writing), no-one can test the newest features of OpenTTD without compiling the source code themselves. This makes this a high-priority if not critical issue from the perspective of players and testers who want the latest features as soon as possible, regardless of the OS they play on. |
As experiment I removed ~200 settings from our So yeah, that kinda confirms we hit some kind of upper-limit here :D |
How is JGRPP able to still build on macOS given that JGRPP already adds dozens of new settings on top of the base game settings? Is it that @JGRennison has more memory on his MacOS compiling machines or is there a different trick used by JGRPP which lets the setting count scale indefinitely without things breaking? |
Nothing can scale indefinitely, so that is not true for sure ;) But I am not saying that the amount of settings is the issue. Just saying that since that commit there are too many variables to track. And that we are now just over the limit clang can handle. And I can proof that by removing ~50 settings, and it works again. This can have been dormant for years, or it can be that the above mentioned commit tripled the amount of variables a compiler checks. I dunno atm :) Reducing the settings is a means to an end, nothing else. |
When I tested some stuff it was very weird, I could build 8dd846b (latest know buildable nightly) using release workflow, but same commit plus a little CI workflow modification (just switching build mode to |
At present bf500c3 and many other recent commits from master are not in my branch. I have not done anything specifically on this issue. |
Would it be possible to temporarily drop support for MacOS (on the front end) until this issue can be solved, so that at least Windows and Linux users can continue to enjoy the newest features via the nightly releases? |
It should be okay to allow failure, and still upload working bundles, for AWS, but I don't know for steam. |
It is not wise to skip MacOS just because we broke it; we could always reverse the patch that broke it. But even there, I think we are better off trying to figure out what causes the problem and how to solve it. I think that fixing the GCC issue also fixes the clang issue, so there are more platforms to test solutions on, as I strongly suspect they both point to the same issue. But I can verify that claim tomorrow with some more triage :) Sadly, the GCC issue only shows up with LTO, which takes a very long time :P So debug-sessions will be slow :) |
See #9389, but basically on a hunch I made a change that seems to work. I still have 0 clues what causes this problem, and I can only guess why this change fixes the issue. So any insights would be appreciated, but this might be a stop-gap solution to get the builds going again. |
Bit out of my league here, but this I find interesting: https://godbolt.org/z/1hao5fafT Mind the When using So I think in the core of things, this is the problem we run into. The compilers fail to see that this memory will never be destructed normally, and in result start to do all kind of weird shit :P The godbolt adds a simple So the way we create the tables seems to be much more the issue than the A few commits before this commit mention above, we changed the settings system to use subclasses to indicate type. This also meant we switched to As I mentioned earlier, this really feels like we kept adding stuff up, and the above mentioned commit was just the tipping point. Although the solution in godbolt works also in our code-base, it generates a lot of warnings about a virtual dtor .. so not sure if we can apply this to our code-base, but at least I hope this analysis helps someone to say: owh, but we should just do .... :D |
…nfused SettingDesc is only used in static lists, so never destructed anyway (well, on close of course, but that the OS will clean up for us nicely). And having a dtor, which the compiler sees is never used btw, causes compilers to start optimizing that, and .. get lost somewhere. GCC complaints with: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without clang on MacOS just crashes (with bus error 10, read: out of memory). This commit produces tons of warnings, as it is not really meant to be accepted upstream. More to show that the problem of OpenTTD#9386 is not just "std::string" -> "std::string_view", but much deeper in fact.
So we have too much stuff defined in global scope basically. So many things to refactor :) |
I tried something in master...glx22:my_try_at_9386, but that just moves the location of gcc |
…ants instead of new + unique_ptr With std::variant all memory can be figured out at compile time, so the compiler needs to keep track of fewer elements. It also saves out a unique_ptr and its memory management, over a slight impact for resolving a setting.
…ants instead of new + unique_ptr With std::variant all memory can be figured out at compile time, so the compiler needs to keep track of fewer elements. It also saves out a unique_ptr and its memory management, over a slight impact for resolving a setting.
…ants instead of new + unique_ptr With std::variant all memory can be figured out at compile time, so the compiler needs to keep track of fewer elements. It also saves out a unique_ptr and its memory management, over a slight impact for resolving a setting.
…stead of new + unique_ptr With std::variant all memory can be figured out at compile time, so the compiler needs to keep track of fewer elements. It also saves out a unique_ptr and its memory management, over a slight impact for resolving a setting.
And ... this problem is back. Exact same MO, GCC complains, MacOS clang crashes. Introduced by cdb3dd0 .. and the solutions of last time are not really going to work. The I am pretty sure I am missing something here, and that the solution is far easier, but it is eluding me atm. Either way .. IT IS BAAACCCKKKKKKK :D PS: I expected the commit before this one to be the issue, but it seemly is not. So my conclusion it is the |
I did a quick (well not so quick as building release is so slow) test, defining a |
Sadly .. although we fixed the GCC warning, clang on MacOS is still crashing with the same error. I do not really know how to debug that further .. |
Version of OpenTTD
Every commit after bf500c3
Expected result
Being able to compile on MacOS with
RelWithDebInfo
.Actual result
Compiling on MacOS fails with "Bus error: 10" when using
RelWithDebInfo
.Steps to reproduce
Build on MacOS. Or see our nightly .. it fails every night.
The text was updated successfully, but these errors were encountered: