
Azure pipelines looks unhealthy for epoch branches #21200

Closed
stephenmcgruer opened this issue Jan 15, 2020 · 5 comments

stephenmcgruer commented Jan 15, 2020

I noticed because the stable runs check went red: Safari results are missing for bd72596 (https://wpt.fyi/runs?label=master&label=stable&max-count=100&product=chrome&product=firefox&product=safari).
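(Sketch for anyone triaging a similar gap later: this kind of check can be scripted against the wpt.fyi runs API. The /api/runs endpoint, its sha/label query parameters, and the browser_name field in the response are assumptions based on how the /runs page filters behave, so verify against the live API before relying on it.)

```python
# Hypothetical helper, not part of wpt: report which expected products
# have no stable run on wpt.fyi for a given (short) revision SHA.
import requests

WPT_FYI_RUNS_API = "https://wpt.fyi/api/runs"  # assumed endpoint
EXPECTED_PRODUCTS = {"chrome", "firefox", "safari"}

def missing_stable_products(sha):
    response = requests.get(
        WPT_FYI_RUNS_API,
        params={"sha": sha, "label": "stable", "max-count": 100},
        timeout=30,
    )
    response.raise_for_status()
    # Collect the products that do have a run for this SHA.
    seen = {run.get("browser_name") for run in response.json()}
    return EXPECTED_PRODUCTS - seen

if __name__ == "__main__":
    # While the Safari run is missing, this would print {'safari'}.
    print(missing_stable_products("bd72596"))
```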

Looking at https://dev.azure.com/web-platform-tests/wpt/_build?definitionId=1&_a=summary&view=branches, that run failed, but runs on the epochs/three_hourly branch have also been failing for a while.

The epochs/daily run failed with: ##[error]We stopped hearing from agent Azure Pipelines 5. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610

https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=40111&view=logs&j=e8032fe0-b3da-5a76-081d-ec1db1bea50b

The epochs/three_hourly runs also seem to fail with the same error, and some of them have a similar error for Edge jobs (##[error]We stopped hearing from agent w10c00000D. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610)

@mustjab, can you take a look? Thanks!

mustjab commented Jan 15, 2020

stephenmcgruer commented

@mustjab I see that too, but I doubt it's a GitHub issue. Other branches on Azure aren't having the problem (epochs/daily looks fine, and I sampled a few random PR runs that also seem fine, ...), and I'm not aware of any other reports of GitHub issues.

Could there be a network problem with the machines assigned to epochs/three_hourly somehow?

mustjab commented Jan 16, 2020

@stephenmcgruer You're correct, one of the VMs in the Windows pool (w10c00000J) is having network issues and was the cause of most of the failures. I deleted that VM, so we should see future runs pass. If we find that other machines in the pool have these issues, then I'll open a ticket with Azure and try to re-create the VM pool to see if we can work around this issue.
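(Aside, as a sketch of how to spot unhealthy machines without clicking through the UI: the agent pool can be polled via the Azure DevOps distributedtask REST API. The pool name and PAT environment variable below are placeholders/assumptions, not the project's actual configuration.)

```python
# Sketch: list agents in an Azure DevOps agent pool and flag any that are
# offline or disabled. Authenticates with a personal access token (PAT).
import os
import requests

ORG_URL = "https://dev.azure.com/web-platform-tests"
POOL_NAME = "Windows"  # assumption: display name of the affected pool
AUTH = ("", os.environ["AZURE_DEVOPS_PAT"])  # PAT goes in the password slot

def unhealthy_agents():
    # Look up the pool(s) matching POOL_NAME, then inspect each agent's status.
    pools = requests.get(
        f"{ORG_URL}/_apis/distributedtask/pools",
        params={"poolName": POOL_NAME, "api-version": "5.1"},
        auth=AUTH, timeout=30,
    ).json()["value"]
    for pool in pools:
        agents = requests.get(
            f"{ORG_URL}/_apis/distributedtask/pools/{pool['id']}/agents",
            params={"api-version": "5.1"},
            auth=AUTH, timeout=30,
        ).json()["value"]
        for agent in agents:
            if agent.get("status") != "online" or not agent.get("enabled", True):
                yield agent["name"], agent.get("status")

if __name__ == "__main__":
    for name, status in unhealthy_agents():
        print(f"{name}: {status}")
```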

mustjab commented Jan 16, 2020

Looks like the latest three_hourly run has passed.

stephenmcgruer commented

Yes, I think we're good. There were some 500s from wpt.fyi over the last few days, but that's probably something on our end to start looking at. Thanks for your work here!
