Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Taskcluster] Switch to repeat-only for stability checks #25149

Merged
merged 1 commit into from Sep 25, 2020

Conversation

stephenmcgruer
Copy link
Contributor

We frequently hear from test authors that they find it frustrating when
the stability checks turn up failures that they cannot reproduce. One
common case comes from the fact that stability checks run the same test
repeatedly without restarting the browser (the AAABBBCCC behavior). This
is unlike any other way we run tests, and can cause some tests to
consistently appear flaky due to global state (e.g. the
origin-isolation/ tests).

To fix this, switch the stability checks to only use 'repeat-restart'
flake detection (previously we used both 'repeat-loop' and
'repeat-restart'). This mode run the tests in entire sets, then restarts
the browser and runs them again, aka ABC[restart]ABC[restart]ABC. The
hope is that we will not lose too much flake coverage, but will reduce
the amount of non-addressable flake that is reported.

This also makes it more feasible to implement a timeout-avoiding
mechanism for the stability checks; see
https://docs.google.com/document/d/1dAlCSHUQldtgWDDTrGJR-ksm19FZZ3k8ppqc5-kSwIk/edit#

@stephenmcgruer stephenmcgruer force-pushed the smcgruer/stability-checks-repeat-onky branch from 8543e39 to a1722bd Compare September 10, 2020 11:40
@wpt-pr-bot wpt-pr-bot temporarily deployed to wpt-preview-25149 September 10, 2020 11:46 Inactive
We frequently hear from test authors that they find it frustrating when
the stability checks turn up failures that they cannot reproduce. One
common case comes from the fact that stability checks run the same test
repeatedly without restarting the browser (the AAABBBCCC behavior). This
is unlike any other way we run tests, and can cause some tests to
consistently appear flaky due to global state (e.g.  the
origin-isolation/ tests).

To fix this, switch the stability checks to only use 'repeat-restart'
flake detection (previously we used both 'repeat-loop' and
'repeat-restart'). This mode run the tests in entire sets, then restarts
the browser and runs them again, aka ABC[restart]ABC[restart]ABC. The
hope is that we will not lose too much flake coverage, but will reduce
the amount of non-addressable flake that is reported.

This also makes it more feasible to implement a timeout-avoiding
mechanism for the stability checks; see
https://docs.google.com/document/d/1dAlCSHUQldtgWDDTrGJR-ksm19FZZ3k8ppqc5-kSwIk/edit#
@stephenmcgruer stephenmcgruer force-pushed the smcgruer/stability-checks-repeat-onky branch from a1722bd to d377806 Compare September 21, 2020 20:10
@stephenmcgruer stephenmcgruer marked this pull request as ready for review September 21, 2020 20:13
@stephenmcgruer stephenmcgruer changed the title [WIP] [Taskcluster] Switch to repeat-only for stability checks [Taskcluster] Switch to repeat-only for stability checks Sep 21, 2020
@Hexcles
Copy link
Member

Hexcles commented Sep 21, 2020

Shall we hack the "verify" logic instead of adding these flags?

@stephenmcgruer
Copy link
Contributor Author

It wouldn't be a hack; we could just set the default value for --verify-repeat-loop to be 0.

Really depends on how we want developers to use this. With verify, we've moved away from a lot of the defaults already (e.g. no chaos mode on FF), but we've so far done that by setting the flags rather than changing the defaults, which implies that we want any developer running ./wpt run --verify today to get e.g. chaos mode, etc.

I mean personally I could even see us dropping this flag entirely and just saying you have to wpt run --rerun=X directly to get loops.

Copy link
Contributor

@jgraham jgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that bare --verify is currently used in Firefox CI, so changing the meaning of that would be a breaking change for us (technically requiring an RFC &c.). Which isn't to say that we can't do it, but it definitely feels like the default is more likely to find problems, and the flags we're passing are compromises to make the CI work better.

@jgraham jgraham merged commit fdee354 into master Sep 25, 2020
@jgraham jgraham deleted the smcgruer/stability-checks-repeat-onky branch September 25, 2020 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants