
Azure pipelines looks unhealthy for epoch branches #21200

Closed
stephenmcgruer opened this issue Jan 15, 2020 · 5 comments

stephenmcgruer commented Jan 15, 2020

I noticed because the stable runs check went red: Safari results are missing for bd72596 (https://wpt.fyi/runs?label=master&label=stable&max-count=100&product=chrome&product=firefox&product=safari).
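(Sketch for anyone triaging a similar gap later: this kind of check can be scripted against the wpt.fyi runs API. The /api/runs endpoint, its sha/label query parameters, and the browser_name field in the response are assumptions based on how the /runs page filters behave, so verify against the live API before relying on it.)

```python
# Hypothetical helper, not part of wpt: report which expected products
# have no stable run on wpt.fyi for a given (short) revision SHA.
import requests

WPT_FYI_RUNS_API = "https://wpt.fyi/api/runs"  # assumed endpoint
EXPECTED_PRODUCTS = {"chrome", "firefox", "safari"}

def missing_stable_products(sha):
    response = requests.get(
        WPT_FYI_RUNS_API,
        params={"sha": sha, "label": "stable", "max-count": 100},
        timeout=30,
    )
    response.raise_for_status()
    # Collect the products that do have a run for this SHA.
    seen = {run.get("browser_name") for run in response.json()}
    return EXPECTED_PRODUCTS - seen

if __name__ == "__main__":
    # While the Safari run is missing, this would print {'safari'}.
    print(missing_stable_products("bd72596"))
```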

Looking at https://dev.azure.com/web-platform-tests/wpt/_build?definitionId=1&_a=summary&view=branches, that run failed, but runs on the epochs/three_hourly branch have also been failing for a while.

The epochs/daily run failed with: ##[error]We stopped hearing from agent Azure Pipelines 5. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610

https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=40111&view=logs&j=e8032fe0-b3da-5a76-081d-ec1db1bea50b

The epochs/three_hourly runs also seem to fail with the same error, and some of them have a similar error for Edge jobs (##[error]We stopped hearing from agent w10c00000D. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610)

@mustjab, can you take a look? Thanks!

mustjab commented Jan 15, 2020

stephenmcgruer commented

@mustjab I see that too, but I doubt it's a GitHub issue. Other branches on Azure aren't having the problem (epochs/daily looks fine, and I sampled a few random PR runs that also seem fine, ...), and I'm not aware of any other reports of GitHub issues.

Could there be a network problem with the machines assigned to epochs/three_hourly somehow?

mustjab commented Jan 16, 2020

@stephenmcgruer You're correct, one of the VMs in the Windows pool (w10c00000J) is having network issues and was the cause of most of the failures. I deleted that VM, so we should see future runs pass. If we find that other machines in the pool have these issues, then I'll open a ticket with Azure and try to re-create the VM pool to see if we can work around this issue.
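(Aside, as a sketch of how to spot unhealthy machines without clicking through the UI: the agent pool can be polled via the Azure DevOps distributedtask REST API. The pool name and PAT environment variable below are placeholders/assumptions, not the project's actual configuration.)

```python
# Sketch: list agents in an Azure DevOps agent pool and flag any that are
# offline or disabled. Authenticates with a personal access token (PAT).
import os
import requests

ORG_URL = "https://dev.azure.com/web-platform-tests"
POOL_NAME = "Windows"  # assumption: display name of the affected pool
AUTH = ("", os.environ["AZURE_DEVOPS_PAT"])  # PAT goes in the password slot

def unhealthy_agents():
    # Look up the pool(s) matching POOL_NAME, then inspect each agent's status.
    pools = requests.get(
        f"{ORG_URL}/_apis/distributedtask/pools",
        params={"poolName": POOL_NAME, "api-version": "5.1"},
        auth=AUTH, timeout=30,
    ).json()["value"]
    for pool in pools:
        agents = requests.get(
            f"{ORG_URL}/_apis/distributedtask/pools/{pool['id']}/agents",
            params={"api-version": "5.1"},
            auth=AUTH, timeout=30,
        ).json()["value"]
        for agent in agents:
            if agent.get("status") != "online" or not agent.get("enabled", True):
                yield agent["name"], agent.get("status")

if __name__ == "__main__":
    for name, status in unhealthy_agents():
        print(f"{name}: {status}")
```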

mustjab commented Jan 16, 2020

Looks like the latest three_hourly run has passed.

stephenmcgruer commented

Yes, I think we're good. There were some 500s from wpt.fyi over the last few days, but that's probably something on our end to start looking at. Thanks for your work here!
