feat: implement flag to fail flaky tests #30618
Conversation
@microsoft-github-policy-service agree company="BJSS"
Thank you for the PR!
I'd like to understand the feature a little better. Do we want to stop recognizing flaky tests at all? Are we after the test runner exit code? Or some changes in the terminal/html report?
Looking at the issue, my initial understanding is "introduce --fail-on-flaky cli option that will set exitCode=1 when at least one test was flaky". However, this PR introduces many more changes, so let's align on the intended behavior first.
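The exit-code behavior described above can be sketched as follows. This is a hypothetical illustration of the agreed semantics, not Playwright's actual implementation; the `failOnFlaky` parameter and `RunStats` shape are assumptions for the sketch.

```typescript
// Minimal sketch of the discussed semantics: flaky tests only fail the
// run when the opt-in flag is set. Names are illustrative, not the
// actual Playwright internals.
interface RunStats {
  failed: number;
  flaky: number;
}

function computeExitCode(stats: RunStats, failOnFlaky: boolean): number {
  if (stats.failed > 0) return 1;
  if (failOnFlaky && stats.flaky > 0) return 1;
  return 0;
}

console.log(computeExitCode({ failed: 0, flaky: 2 }, false)); // 0: default, flaky run passes
console.log(computeExitCode({ failed: 0, flaky: 2 }, true));  // 1: opt-in fails the run
```

Because the flag only tightens the success condition, enabling it can never turn a failing run into a passing one.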
Hi there! The intended behaviour of this PR matches your understanding of the issue; any extra changes in the code reflect my unfamiliarity with the codebase. I think Russell's comment and the discussion around behaviour reflect the user / manual tester perspective, where the change in exit code isn't immediately visible.
Sounds great! In this case, I think we should plumb the new CLI option in a similar way.

@BJSS-russell-pollock I think this should be clear, since the reporter shows the "Flaky" count as non-zero. I'd assume anyone using this option knows that flakiness means "failed test run", because the behavior is opt-in. Let me know what you think.
Absolutely makes sense.
Excellent. I'll have a look and update it accordingly.
Force-pushed from 5e27e31 to cf2dada.
Force-pushed from be62f16 to 013b61f.
Force-pushed from 2af62ce to faf8d5d.
Okay, reworked as suggested - much less additional code!
Force-pushed from faf8d5d to f60861e.
Force-pushed from f60861e to 2b7ee0a.
Force-pushed from 2b7ee0a to 26c32a2.
@dgozman This should be ready for re-review when you are. The failed tests in previous pipeline runs look like noise to me, but I'm happy to dig in if you think it's something deeper.
Test results for "tests 1": 1 failed, 2 flaky, 27346 passed, 662 skipped. Merge workflow run.
Looks great, merging in. Thank you for the PR!
Implements feature requested in #30457
When the flag is enabled, the test runner treats flaky tests as failures but still labels them as flaky in the reporting interface. This feels worth discussing: the behaviour makes sense to me, but it looked a bit odd to @BJSS-russell-pollock when I ran it past him.
Closes #30457.