wpt crash when running ./wpt run chrome on macOS High Sierra #9007

Closed
sideshowbarker opened this issue Jan 12, 2018 · 37 comments
Labels
infra, priority:backlog, wptrunner (The automated test runner, commonly called through ./wpt run)

Comments

@sideshowbarker
Contributor

When I try to run ./wpt run chrome … in my macOS High Sierra environment, it fails:

 0:15.29 pid:63137 Full command: /Users/mike/workspace/web-platform-tests/_venv/bin/chromedriver --port=4452 --url-base=/
pid:63137 Starting ChromeDriver 2.35.528157 (4429ca2590d6988c0745c24c8858745aaaec01ef) on port 4452
 0:15.29 pid:63137 Only local connections are allowed.
 0:15.78 INFO Starting runner
objc[63138]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[63138]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

Apparently the cause is some change that was made in High Sierra in Objective-C fork() handling:

There’s a possibly-related Python bug at https://bugs.python.org/issue30837.

It’s not clear what could be changed in the wpt sources to prevent this — nor why it’s not a problem when using wpt run with Firefox though it is with Chrome — but a workaround is to call wpt like this:

OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES ./wpt run chrome …

…that is, with that environment variable set.

Or if you don’t want to have to do it that way every time you call ./wpt run, you can do this:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

…to persistently set that environment variable in that shell session.

@foolip foolip added the infra and wptrunner labels Jan 15, 2018
@foolip
Member

foolip commented Jan 15, 2018

@gsnedders @sideshowbarker, any sense of what the priority of this should be?

@gsnedders
Member

gsnedders commented Jan 16, 2018

Calling this a dupe of #6998.

@foolip
Member

foolip commented Feb 23, 2018

Reopening because I'm seeing the same problem when trying out #8979, i.e. it'll affect Safari too. And it's not a crash for me, it just seems to hang. Seems like not necessarily a dupe of #6998.

@foolip foolip reopened this Feb 23, 2018
@foolip
Member

foolip commented Feb 25, 2018

See #8979 (comment) for some findings. The best I can come up with here is to detect the version of python used, and exit with an error if OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES isn't set.
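A minimal sketch of that check, in Python since wpt itself is Python. The helper name fork_safety_error is hypothetical and not part of wpt, and the version cut-off is an assumption: it treats macOS interpreters older than Python 3.8 as affected, because 3.8 changed the default multiprocessing start method on macOS from fork to spawn.

```python
import os
import sys


def fork_safety_error(platform=sys.platform, version_info=sys.version_info,
                      environ=os.environ):
    """Return an error message if this interpreter is likely to hit the
    macOS fork-safety crash, or None if it looks safe to continue.

    Hypothetical helper: assumes only macOS ("darwin") interpreters older
    than Python 3.8, where multiprocessing still forks by default, are
    affected.
    """
    if platform != "darwin" or tuple(version_info[:2]) >= (3, 8):
        return None
    if environ.get("OBJC_DISABLE_INITIALIZE_FORK_SAFETY") == "YES":
        return None
    return ("This Python may crash or hang on macOS when wpt forks; rerun as "
            "OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES ./wpt run ...")
```

A caller would pass any non-None result to sys.exit(), which prints the string to stderr and exits with status 1.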

@jgraham
Contributor

jgraham commented Feb 27, 2018

So the low-effort solution here is presumably to set that environment variable in a wrapper (e.g. wpt run could set it and then launch the actual harness as a child process; I assume that would be enough and merely setting it before forking would not).
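A rough sketch of the low-effort wrapper, assuming (as suggested above) that the variable has to be in the environment when the child process starts, rather than being set just before forking. The marker variable WPT_RELAUNCHED and both function names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical marker telling the re-launched child not to re-launch again.
RELAUNCH_MARKER = "WPT_RELAUNCHED"


def child_env(environ):
    """Copy of environ with the Objective-C fork-safety override applied."""
    env = dict(environ)
    env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    env[RELAUNCH_MARKER] = "1"
    return env


def relaunch_if_needed(argv=None):
    """Re-run the current script as a child process whose environment has
    the override from the very start, and return the child's exit status.

    Returns None when already running inside such a child.
    """
    if os.environ.get(RELAUNCH_MARKER) == "1":
        return None
    if argv is None:
        argv = sys.argv
    return subprocess.call([sys.executable] + list(argv),
                           env=child_env(os.environ))
```

A wrapper script would call relaunch_if_needed() first and pass any non-None status to sys.exit(); when it returns None, the code is already running in the child and can start the harness.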

The high effort (i.e. good) solution is to avoid using multiprocessing since it fundamentally violates the expected semantics here by calling fork but not exec. That would involve writing a wrapper around the runner process and implementing some protocol to pass data in.

The other high effort (and also good) solution is to add Python 3 support and use the new option to multiprocessing that avoids the problematic behaviour.
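For reference, the Python 3 option is presumably the multiprocessing start method: a "spawn" context starts workers as fresh interpreters instead of fork()ed copies of the parent. The function names below are illustrative, not wptrunner's.

```python
import multiprocessing


def square(n):
    # Must live at module level: a spawned worker is a fresh interpreter that
    # re-imports this module to find the function it was asked to run.
    return n * n


def run_in_spawned_workers(values, processes=2):
    # The "spawn" context starts each worker as a brand-new Python process
    # instead of fork()ing a copy of the parent, so workers never inherit a
    # parent's partly initialised Objective-C state.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

Called from a script's if __name__ == "__main__": block, run_in_spawned_workers([1, 2, 3]) returns [1, 4, 9]. The trade-off is that every target and argument must be picklable and importable from a fresh interpreter.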

@gsnedders
Member

@foolip I'm pretty sure this is all ultimately the same issue, just the unsafe fork stuff breaking everything and randomly crashing.

@gsnedders
Member

It sounds like no_proxy='*' is a better idea than OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, given it should have far fewer repercussions.

@foolip
Member

foolip commented Mar 27, 2018

@gsnedders, what's no_proxy?

@gsnedders
Member

@foolip OBJC_DISABLE_INITIALIZE_FORK_SAFETY disables various safety checks in Obj-C code in general, which makes it a pretty bad workaround; the issue we're hitting is things looking up proxy configuration during forking, and we can disable that specifically without losing any safety guarantees.
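To illustrate why no_proxy is the narrower fix: as I understand CPython's urllib, on macOS getproxies() and proxy_bypass() use the *_proxy environment variables when any are set, and only otherwise query the system proxy configuration (through the _scproxy helper), which is the lookup described above. The sketch uses the Python 3 urllib.request helpers (Python 2 has equivalents in urllib) and only exercises the environment path, so it runs on any platform. The function name env_proxy_config is hypothetical.

```python
import os
import urllib.request


def env_proxy_config(no_proxy_value="*"):
    """Return (proxies, bypass) as urllib sees them with no_proxy set.

    Temporarily sets no_proxy in os.environ, because
    getproxies_environment() reads the real process environment.
    """
    saved = os.environ.get("no_proxy")
    os.environ["no_proxy"] = no_proxy_value
    try:
        proxies = urllib.request.getproxies_environment()
        # On macOS, getproxies() returns this dict whenever it is non-empty
        # and only otherwise asks the system configuration via _scproxy.
        bypass = urllib.request.proxy_bypass_environment(
            "web-platform.test", proxies)
        return proxies, bypass
    finally:
        # Restore the caller's environment exactly as it was.
        if saved is None:
            del os.environ["no_proxy"]
        else:
            os.environ["no_proxy"] = saved
```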

If I'm not mistaken, the places where we create processes with multiprocessing are:

  • tools/serve/serve.py: to start each server in its own Process (is there any actual need for this? could we just use threads here?)
  • tools/wptrunner/wptrunner/testrunner.py: we run each TestRunner in its own Process (again, could we just use threads here?)
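A minimal sketch of the threads-instead-of-processes idea, using only the standard library (Python 3.7+ for ThreadingHTTPServer). The PingHandler class and helper names are hypothetical stand-ins for wpt's real servers; the point is only that each server runs in a thread of one process, so nothing is forked.

```python
import http.client
import http.server
import threading


class PingHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for a real wpt server: answers every GET with b"ok"."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server_thread():
    """Start a server on an ephemeral localhost port in a daemon thread."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def stop_server_thread(server, thread):
    server.shutdown()      # stops serve_forever(); must not run on that thread
    server.server_close()  # releases the listening socket
    thread.join()


def fetch(server, path="/"):
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

The trade-off is isolation: a server that crashes the interpreter or hogs it now affects every server in the process, and they all share the GIL.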

@foolip
Copy link
Member

foolip commented Jul 12, 2018

@gsnedders, what do you think we should do here? You mentioned no_proxy='*', and I see it's mentioned in https://bugs.python.org/issue30837#msg297630. I guess that disabling proxy servers kind of makes sense anyway when trying to run wpt, so that seems fine.

I think the next step is just trying to see if it works?

@liuguanyu

liuguanyu commented Jul 18, 2018

I got this error when I tried to run Chrome with this command a second time, and the shell hung. I tried to rerun it after removing chromedriver. The macOS version is 10.13.4 and the Python version is 2.7.10.

@foolip
Member

foolip commented Sep 25, 2018

@gsnedders can you provide a status update on this? This is tracked in Resolve known important infra issues and Q3 is almost at an end.

@gsnedders
Member

Did we ever reach any agreement as to how to solve this? Just exit quickly if some environment variable isn't set?

@foolip
Member

foolip commented Sep 26, 2018

If having an environment variable set before invoking ./wpt is required, then that would be one way.

But I am curious if @jugglinmike has found it necessary to work around this issue in the Buildbot setup for Safari?

@jugglinmike
Contributor

Indeed I have

@foolip
Member

foolip commented Sep 27, 2018

Argh, that is unfortunate, but unsurprising.

I guess that just erroring out and telling the user to try again with no_proxy='*' ./wpt ... would be an improvement over just hanging. Pointing to an issue about resolving the root issue would be good too.

@gsnedders
Member

gsnedders commented Sep 30, 2018

OK, I'm very confused right now. I can't reproduce it failing now?!

$ echo $OBJC_DISABLE_INITIALIZE_FORK_SAFETY

pnin:web-platform-tests gsnedders$ echo $no_proxy

pnin:web-platform-tests gsnedders$ ./wpt run safari infrastructure/assumptions/
…
web-platform-test
~~~~~~~~~~~~~~~~~
Ran 23 checks (8 tests, 15 subtests)
Expected results: 21
Unexpected results: 2
  subtest: 2 (2 fail)
…

Both Safari and Chrome WFM now?!

@jugglinmike
Contributor

macOS 10.13.6 was released on 2018-08-12, so this may have been fixed at the operating system level. Any chance you've recently upgraded?

@gsnedders
Member

@jugglinmike that seems… unlikely? This was a deliberate behaviour change at the OS level to avoid race conditions.

@jugglinmike
Contributor

In that case, "fixed" is the wrong word. I'm still curious if you're using a different OS version, though.

@gsnedders
Member

I'm on 10.13.6 (17G65), to be clear.

@foolip foolip changed the title from "wpt crash when running './wpt run chrome …' on MacOS High Sierra" to "wpt crash when running ./wpt run chrome on macOS High Sierra" Oct 8, 2018
@mdittmer
Contributor

Ping from your friendly neighbourhood ecosystem infra rotation

Any updates on this issue, @gsnedders?

@foolip
Member

foolip commented Mar 22, 2019

Ping @gsnedders. I think all we need here is a check that no_proxy is set and otherwise exit with a message saying it has to be set.
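A minimal sketch of that check. The function name, the message text, and treating any non-empty no_proxy as sufficient are all assumptions, not anything wpt does today.

```python
import os
import sys

# Hypothetical wording; a link to the root-cause issue could be appended.
NO_PROXY_MESSAGE = (
    "./wpt run can crash or hang on macOS while forking if it looks up the "
    "system proxy configuration. Please rerun as: no_proxy='*' ./wpt run ..."
)


def check_no_proxy(platform=sys.platform, environ=os.environ):
    """Hypothetical pre-flight check: on macOS, exit with a helpful message
    unless no_proxy is set to a non-empty value."""
    if platform == "darwin" and not environ.get("no_proxy"):
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(NO_PROXY_MESSAGE)
```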

@whimboo
Contributor

whimboo commented Jun 14, 2019

I can still reproduce this crash with macOS Mojave (latest patch level). When I set no_proxy the Python crash is gone, but I then see the hang also mentioned before. So whether it crashes or hangs doesn't make a difference. We might want to find out which multiprocessing usage is causing the hang. For me it always hangs right after Using 1 client processes.

@foolip
Member

foolip commented Jun 17, 2019

That's interesting, I've had the crash a bunch of times but never the hanging IIRC. But maybe I don't recall correctly.

@whimboo can you print python --version and the relevant bits of sw_vers?

@gsnedders
Member

@whimboo What happens with OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES? I presume still a hang?

@whimboo
Contributor

whimboo commented Jun 25, 2019

Sorry, I have seen the problem with Fennec but not Chrome, so maybe my hang is not that closely related.

But we got more insight into the hang via bug 1560960, and James landed a patch for it which will soon reach mozilla-central. The reason was that the HTTP servers on ports 8000/8001 could not be started, so s.connect() hung forever.
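The general fix for that kind of hang is to bound the wait. A minimal sketch, assuming a simple poll-until-deadline loop; wait_for_port is a hypothetical name, and the patch from bug 1560960 may well do something different.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Return True once something accepts TCP connections on (host, port),
    or False if the deadline passes, instead of blocking forever."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The per-attempt timeout stops a single connect() from hanging.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```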

@whimboo
Contributor

whimboo commented Jun 25, 2019

There is also https://bugzilla.mozilla.org/show_bug.cgi?id=1561224 now, which I will use to do some investigation around the crash or hang. OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES doesn't actually help in all cases.

@whimboo
Contributor

whimboo commented Jun 25, 2019

Is anyone still able to replicate this issue now that #17488 has been merged? I'm having a hard time doing so.

@marcoscaceres
Contributor

Quickly trying this, I get:

$ ./wpt run chrome
dyld: Library not loaded: @executable_path/../.Python
  Referenced from: /Users/mcaceres/dev/web-platform-tests/_venv/bin/python2.7
  Reason: image not found
CRITICAL:tools.wpt.utils:('/Users/mcaceres/dev/web-platform-tests/_venv/bin/pip', 'install', '--prefer-binary', u'zstandard') exited with return code -6
CRITICAL:tools.wpt.utils:
Traceback (most recent call last):
  File "./wpt", line 5, in <module>
    wpt.main()
  File "/Users/mcaceres/dev/web-platform-tests/tools/wpt/wpt.py", line 143, in main
    venv = setup_virtualenv(main_args.venv, main_args.skip_venv_setup, props)
  File "/Users/mcaceres/dev/web-platform-tests/tools/wpt/wpt.py", line 121, in setup_virtualenv
    venv.install(name)
  File "/Users/mcaceres/dev/web-platform-tests/tools/wpt/virtualenv.py", line 105, in install
    call(self.pip_path, "install", "--prefer-binary", *requirements)
  File "/Users/mcaceres/dev/web-platform-tests/tools/wpt/utils.py", line 49, in call
    return subprocess.check_output(args)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '('/Users/mcaceres/dev/web-platform-tests/_venv/bin/pip', 'install', '--prefer-binary', u'zstandard')' returned non-zero exit status -6

@gsnedders
Member

@marcoscaceres oh, yeah, that's unusually common on macOS with Homebrew Python, because once Homebrew cleans up old versions the paths the virtualenv points to no longer exist; we probably ought to deal with that more obviously somehow.

@gsnedders
Member

@marcoscaceres #17522 should force regeneration in that case, finally.
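The broken venv's python2.7 still exists on disk, but dyld aborts when it loads (the -6 exit status above means the process was killed by SIGABRT), so checking for the file is not enough. A minimal sketch of a liveness check that actually runs the interpreter; interpreter_works is a hypothetical name, and #17522 may detect the problem differently.

```python
import subprocess


def interpreter_works(python_path):
    """Return True if the interpreter at python_path can start at all.

    Running it catches the dyld "Library not loaded" abort, which an
    os.path.exists() check would miss.
    """
    try:
        result = subprocess.run([python_path, "-c", "pass"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # Missing or non-executable file, or an interpreter that never exits.
        return False
    # A negative returncode means the process was killed by a signal.
    return result.returncode == 0
```

A setup step could then delete and recreate the virtualenv whenever interpreter_works() returns False for its bin/python.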

@marcoscaceres
Contributor

Appreciate that @gsnedders. Saves me a bunch of time and pain as I don’t know anything about how python works.

@foolip
Member

foolip commented May 20, 2020

Does anyone still have no_proxy=* in their environment? Could you test if you still need it? I see I don't have it, and I haven't come across this problem in a long time despite running ./wpt run with all browsers from time to time. I'm now on macOS 10.15.4 and using the built-in Python 2.7.16.

@gsnedders
Member

It's doing guaranteed-to-be-unsafe things at the OS level; regardless, this will be fixed by moving to Python 3, and I think the rarity of occurrences nowadays further suggests this is a backlog issue that will remain as such until we drop Python 2 support.

@foolip
Member

foolip commented Aug 12, 2020

OK, let's just leave this open to make it easier to find for anyone who runs into it until we drop Python 2.

@foolip
Member

foolip commented May 6, 2021

We have now dropped Python 2, closing. Issues like this can still occur, but they'll be new problems, like #28663.

@foolip foolip closed this as completed May 6, 2021

10 participants