[Taskcluster] Switch to repeat-only for stability checks #25149

stephenmcgruer · 2020-08-20T14:00:58Z

We frequently hear from test authors that they find it frustrating when
the stability checks turn up failures that they cannot reproduce. One
common case comes from the fact that stability checks run the same test
repeatedly without restarting the browser (the AAABBBCCC behavior). This
is unlike any other way we run tests, and can cause some tests to
consistently appear flaky due to global state (e.g. the
origin-isolation/ tests).

To fix this, switch the stability checks to use only 'repeat-restart'
flake detection (previously we used both 'repeat-loop' and
'repeat-restart'). This mode runs the tests as an entire set, then restarts
the browser and runs the set again, aka ABC[restart]ABC[restart]ABC. The
hope is that we will not lose too much flake coverage, but will reduce
the amount of non-addressable flake that is reported.
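
To make the difference between the two orderings concrete, here is a minimal
sketch in Python. It only models the order in which tests execute; the function
names and the `RESTART` marker are hypothetical, chosen for illustration, and
are not part of wptrunner.

```python
from typing import List

# Hypothetical marker for a browser restart between runs (illustration only).
RESTART = "[restart]"


def repeat_loop_order(tests: List[str], repeats: int) -> List[str]:
    """Model 'repeat-loop': each test runs `repeats` times back to back,
    with no browser restart in between."""
    return [test for test in tests for _ in range(repeats)]


def repeat_restart_order(tests: List[str], repeats: int) -> List[str]:
    """Model 'repeat-restart': the whole set runs once, the browser
    restarts, and the whole set runs again."""
    order: List[str] = []
    for i in range(repeats):
        if i > 0:
            order.append(RESTART)
        order.extend(tests)
    return order


print("".join(repeat_loop_order(["A", "B", "C"], 3)))
# AAABBBCCC
print("".join(repeat_restart_order(["A", "B", "C"], 3)))
# ABC[restart]ABC[restart]ABC
```

In the first ordering, any global state left behind by one run of A is visible
to the next run of A; in the second, every repeat of A starts in a fresh
browser.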

This also makes it more feasible to implement a timeout-avoiding
mechanism for the stability checks; see
https://docs.google.com/document/d/1dAlCSHUQldtgWDDTrGJR-ksm19FZZ3k8ppqc5-kSwIk/edit#

Hexcles · 2020-09-21T20:19:28Z

Shall we hack the "verify" logic instead of adding these flags?

stephenmcgruer · 2020-09-21T20:23:25Z

It wouldn't be a hack; we could just set the default value for --verify-repeat-loop to be 0.

Really depends on how we want developers to use this. With verify, we've moved away from a lot of the defaults already (e.g. no chaos mode on FF), but we've so far done that by setting the flags rather than changing the defaults, which implies that we want any developer running ./wpt run --verify today to get e.g. chaos mode, etc.

I mean personally I could even see us dropping this flag entirely and just saying you have to wpt run --rerun=X directly to get loops.

jgraham

Note that bare --verify is currently used in Firefox CI, so changing the meaning of that would be a breaking change for us (technically requiring an RFC &c.). Which isn't to say that we can't do it, but it definitely feels like the default is more likely to find problems, and the flags we're passing are compromises to make the CI work better.

stephenmcgruer force-pushed the smcgruer/stability-checks-repeat-onky branch from 8543e39 to a1722bd Compare September 10, 2020 11:40

wpt-pr-bot temporarily deployed to wpt-preview-25149 September 10, 2020 11:46 Inactive

stephenmcgruer force-pushed the smcgruer/stability-checks-repeat-onky branch from a1722bd to d377806 Compare September 21, 2020 20:10

stephenmcgruer marked this pull request as ready for review September 21, 2020 20:13

wpt-pr-bot added ci infra labels Sep 21, 2020

wpt-pr-bot assigned LukeZielinski Sep 21, 2020

wpt-pr-bot requested review from Hexcles, jgraham and LukeZielinski September 21, 2020 20:13

stephenmcgruer assigned jgraham and unassigned LukeZielinski Sep 21, 2020

stephenmcgruer changed the title ~~[WIP] [Taskcluster] Switch to repeat-only for stability checks~~ [Taskcluster] Switch to repeat-only for stability checks Sep 21, 2020

wpt-pr-bot deployed to wpt-preview-25149 September 21, 2020 20:17 View deployment

jgraham approved these changes Sep 25, 2020

jgraham merged commit fdee354 into master Sep 25, 2020

jgraham deleted the smcgruer/stability-checks-repeat-onky branch September 25, 2020 10:13
