Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safari stable runs produce duplicate, conflicting results for some tests #20607

Closed
stephenmcgruer opened this issue Dec 4, 2019 · 15 comments
Closed

Comments

@stephenmcgruer
Copy link
Contributor

stephenmcgruer commented Dec 4, 2019

There is no Safari stable run for December 3rd, 2019 (https://wpt.fyi/runs?label=master&label=stable&from=2019-12-02&to=2019-12-05).

Looking at azure pipelines, the build for epochs/daily for that day was https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=37728

wpt.fyi shows that as INVALID in the status page (having grabbed the run id from internal logs): https://wpt.fyi/status

Checking internal server logs, seems there are two errors that were logged on Dec 3rd:

ConflictingDataError: Conflicting '/webrtc/RTCRtpParameters-rtcp.html' found in the merged report
summarize (/home/vmagent/app/wptreport.py:318)

and

ConflictingDataError: Conflicting 'browser_version' found in the merged report
update_property (/home/vmagent/app/wptreport.py:167)

The webrtc conflict was seen 3 times on Dec 3rd, the browser_version conflict once.

The results link for this build was https://dev.azure.com/web-platform-tests/b14026b4-9423-4454-858f-bf76cf6d1faa/_apis/build/builds/37728/artifacts?artifactName=safari-results&api-version=5.1&%24format=zip, I'm going to check now to see which of the two happened to this report.

@stephenmcgruer
Copy link
Contributor Author

$ grep -r /webrtc/RTCRtpParameters-rtcp.html --files-with-matches
wpt_report_2.json
$ python3 -m json.tool < wpt_report_2.json > wpt_report_2_pretty.json
$ grep -r /webrtc/RTCRtpParameters-rtcp.html wpt_report_2_pretty.json
            "test": "/webrtc/RTCRtpParameters-rtcp.html",
            "test": "/webrtc/RTCRtpParameters-rtcp.html",

Checking with vim, there are two entries for /webrtc/RTCRtpParameters-rtcp.html beside each other:

        {
            "status": "TIMEOUT",
            "known_intermittent": [],
            "test": "/webrtc/RTCRtpParameters-rtcp.html",
            "subtests": [],
            "duration": 27096,
            "message": "TestRunner hit external timeout (this may indicate a hang)",
            "expected": "OK"
        },
        {
            "status": "OK",
            "known_intermittent": [],
            "test": "/webrtc/RTCRtpParameters-rtcp.html",
            "subtests": [
                {
                    "status": "FAIL",
                    "expected": "PASS",
                    "message": "assert_equals: Expect dictionary.rtcp to be plain object expected \"object\" but got \"undefined\"",
                    "name": "setParameters() with modified rtcp.cname should reject with InvalidModificationError",
                    "known_intermittent": []
                },
                {
                    "status": "FAIL",
                    "expected": "PASS",
                    "message": "assert_equals: Expect dictionary.rtcp to be plain object expected \"object\" but got \"undefined\"",
                    "name": "setParameters() with modified rtcp.reducedSize should reject with InvalidModificationError",
                    "known_intermittent": []
                }
            ],
            "duration": 693,
            "message": null
        },

@foolip
Copy link
Member

foolip commented Dec 4, 2019

This is the first time I've seen this. @Hexcles have you ever come across it?

My initial guess for how this could happen is if the manifest generated in each job isn't identical, but even then hashing should be by test name...

@foolip
Copy link
Member

foolip commented Dec 4, 2019

Oh wait, the duplication was in a single report. Then I don't have a guess.

@foolip
Copy link
Member

foolip commented Dec 4, 2019

The message "TestRunner hit external timeout (this may indicate a hang)" was added in #20321 so adding @jgraham @LukeZielinski who were on that PR in addition to @Hexcles.

Was it expected from those changes that it could result in duplicate entries in the report?

@stephenmcgruer
Copy link
Contributor Author

stephenmcgruer commented Dec 4, 2019

According to stackdriver, duplicate "/webrtc/RTCRtpParameters-rtcp.html", was first seen 8 days ago (Nov 26th). There have been 8 instances in that time.

Conflicting 'browser_name' was first seen on the 25th June 2019. There have been 2 instances in the last 30 days.

@foolip
Copy link
Member

foolip commented Dec 4, 2019

@stephenmcgruer do we have an easy way to see when the browser or OS version has changed in the history of runs?

@Hexcles
Copy link
Member

Hexcles commented Dec 4, 2019

This looks like a race condition related to #20321 , but I'm not sure why it only happens to that particular test. Note that the duration of the second entry isn't even close to timeout.

@Hexcles
Copy link
Member

Hexcles commented Dec 4, 2019

ConflictingDataError: Conflicting 'browser_version' found in the merged report

That's an Edge Canary run. Filed #20613.

@Hexcles Hexcles changed the title Safari stable run missing Dec 3rd - status INVALID Safari stable runs produce duplicate, conflicting results for /webrtc/RTCRtpParameters-rtcp.html Dec 4, 2019
@Hexcles Hexcles changed the title Safari stable runs produce duplicate, conflicting results for /webrtc/RTCRtpParameters-rtcp.html Safari stable runs produce duplicate, conflicting results for some tests Dec 4, 2019
@Hexcles
Copy link
Member

Hexcles commented Dec 4, 2019

Actually, this is happening to other tests as well. Here's a list of tests with conflicting results we've seen so far:

ConflictingDataError: Conflicting '/media-source/mediasource-redundant-seek.html' found in the merged report
ConflictingDataError: Conflicting '/webrtc/RTCRtpParameters-rtcp.html' found in the merged report
ConflictingDataError: Conflicting '/infrastructure/assumptions/allowed-to-play.html' found in the merged report
ConflictingDataError: Conflicting '/referrer-policy/gen/req.attr/no-referrer/script-tag/cross-https.swap-origin.http.html' found in the merged report
ConflictingDataError: Conflicting '/css/motion/offset-path-ray-003.html' found in the merged report
ConflictingDataError: Conflicting '/webstorage/storage_session_setitem_quotaexceedederr.html' found in the merged report
ConflictingDataError: Conflicting '/webstorage/storage_setitem.html' found in the merged report
ConflictingDataError: Conflicting '/remote-playback/disable-remote-playback-watch-availability-throws.html' found in the merged report

@Hexcles
Copy link
Member

Hexcles commented Dec 4, 2019

Reverting #20321 is difficult -- we have tests that can hang Firefox and wptrunner indefinitely, at least #20245 and probably more. We'd need to revert them all.

@stephenmcgruer
Copy link
Contributor Author

@stephenmcgruer do we have an easy way to see when the browser or OS version has changed in the history of runs?

Offhand I'm not sure; what is your underlying thought here @foolip?

@stephenmcgruer
Copy link
Contributor Author

Reverting #20321 is difficult -- we have tests that can hang Firefox and wptrunner indefinitely, at least #20245 and probably more. We'd need to revert them all.

Agreed, I think we want to come up with a fix instead. Do we understand yet what the race condition is? ('No' is a reasonable answer here!)

@foolip
Copy link
Member

foolip commented Dec 5, 2019

I just wanted to check if there was a change in browser or OS version around the time this started happening.

@foolip foolip assigned stephenmcgruer and unassigned foolip and mustjab Dec 5, 2019
@Hexcles
Copy link
Member

Hexcles commented Dec 6, 2019

Do we understand yet what the race condition is?

I did not, but after some investigation today I have a clue.

93:29.57 TEST_START: /webrtc/RTCRtpParameters-rtcp.html
93:54.72 INFO Got timeout in harness
93:56.67 TEST_END: TIMEOUT, expected OK - TestRunner hit external timeout (this may indicate a hang)
94:06.46 ERROR test_end for /webrtc/RTCRtpParameters-rtcp.html logged while not in progress. Logged with data: {"status": "TIMEOUT", "extra": {"test_timeout": 10}, "known_intermittent": [], "test": "/webrtc/RTCRtpParameters-rtcp.html", "message": "Executor hit external timeout (this may indicate a hang)\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py\", line 774, in __bootstrap\n    self.__bootstrap_inner()\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py\", line 801, in __bootstrap_inner\n    self.run()\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py\", line 754, in run\n    self.__target(*self.__args, **self.__kwargs)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/wptrunner/wptrunner/executors/executorwebdriver.py\", line 311, in run_func\n    self.result = True, self.func(self.protocol, self.url, self.timeout)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/wptrunner/wptrunner/executors/executorwebdriver.py\", line 382, in do_testharness\n    protocol.webdriver.url = url\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py\", line 19, in inner\n    return func(self, *args, **kwargs)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py\", line 528, in url\n    return self.send_session_command(\"POST\", \"url\", body)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py\", line 515, in send_session_command\n    return self.send_command(method, url, body)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py\", line 470, in send_command\n    session=self)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/transport.py\", line 175, in send\n    response = self._request(method, uri, payload, headers)\n  File \"/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/transport.py\", line 191, in _request\n    return self.connection.getresponse()\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py\", line 1148, in getresponse\n    response.begin()\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py\", line 448, in begin\n    version, status, reason = self._read_status()\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py\", line 404, in _read_status\n    line = self.fp.readline(_MAXLINE + 1)\n  File \"/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py\", line 480, in readline\n    data = self._sock.recv(self._rbufsize)\n", "expected": "OK"}
94:27.64 INFO Closing logging queue
94:27.65 INFO queue closed
94:39.74 INFO Starting runner
94:40.93 TEST_START: /webrtc/RTCRtpParameters-rtcp.html
94:41.62 TEST_END: Test OK. Subtests passed 0/2. Unexpected 2
TIMEOUT /webrtc/RTCRtp    data = self._sock.recv(self._rbufsize)
Parameters-rtcp.html - TestRunner hit external timeout (this may indicate a hang)FAIL setParameters() with modified rtcp.cname should reject with InvalidModificationError - assert_equals: Expect dictionary.rtcp to be plain object expected "object" but got "undefined"

Two things stood out:

  1. ERROR test_end for /webrtc/RTCRtpParameters-rtcp.html logged while not in progress This seems to be mozlog called in:
    def _timeout(self):
    self.logger.info("Got timeout in harness")
    test = self.state.test
    self.test_ended(test,
    (test.result_cls("EXTERNAL-TIMEOUT",
    "TestRunner hit external timeout "
    "(this may indicate a hang)"), []))

    which is running in a different thread. Perhaps its logger state is wrong somehow?
  2. The message of the first timeout already says Executor hit external timeout. I'm not sure why Executor does not kill the browser and sends test_ended.
Executor hit external timeout (this may indicate a hang)
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 311, in run_func
    self.result = True, self.func(self.protocol, self.url, self.timeout)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/wptrunner/wptrunner/executors/executorwebdriver.py", line 382, in do_testharness
    protocol.webdriver.url = url
  File "/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py", line 19, in inner
    return func(self, *args, **kwargs)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py", line 528, in url
    return self.send_session_command("POST", "url", body)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py", line 515, in send_session_command
    return self.send_command(method, url, body)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/client.py", line 470, in send_command
    session=self)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/transport.py", line 175, in send
    response = self._request(method, uri, payload, headers)
  File "/Users/runner/runners/2.160.1/work/1/s/tools/webdriver/webdriver/transport.py", line 191, in _request
    return self.connection.getresponse()
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1148, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 448, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 404, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/local/Cellar/python@2/2.7.17/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)

cc @jgraham

@stephenmcgruer
Copy link
Contributor Author

Based on @Hexcles work in #20664, assigning this to them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants