
Newly introduced tests that are flaky: what to do? #4370

Open
@apostasie

Description


What is the problem you're trying to solve

It looks like some recently introduced tests are flaky.

Some of this is because the tests were written incorrectly (failing to call EnsureContainerStarted after a run -d, delays that are too short to account for slow environments, etc.).
Some of it is due to bugs in the code itself (mostly TOCTOU races on the filesystem, or other concurrency issues).
Overall, issues like these can easily escape pure code review.
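
As an illustration of the first category, here is a minimal sketch of the pattern that avoids the fixed-delay trap: instead of sleeping for an arbitrary duration after a detached run, poll the container state until it is actually running (which is what EnsureContainerStarted is there for). The helper below is hypothetical, not the test suite's actual implementation; it only assumes the nerdctl binary is on PATH and that the docker-compatible inspect format is available.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// waitForRunning is a hypothetical helper illustrating the idea:
// instead of sleeping for a fixed duration after `nerdctl run -d`,
// poll the container state until it is actually running (or time out).
// Assumes docker-compatible inspect output ({{.State.Running}}).
func waitForRunning(ctx context.Context, containerName string) error {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		out, err := exec.CommandContext(ctx, "nerdctl", "inspect",
			"--format", "{{.State.Running}}", containerName).Output()
		if err == nil && strings.TrimSpace(string(out)) == "true" {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("container %q did not start: %w", containerName, ctx.Err())
		case <-ticker.C:
			// keep polling until the deadline
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := waitForRunning(ctx, "my-test-container"); err != nil {
		fmt.Println(err)
	}
}
```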

This is of course a problem: these issues require follow-up work to fix (which no one wants to do, because it is objectively painful and thankless grunt work :-)), and in the meantime they contribute to growing CI instability, which prevents clean testing of new code and makes it more likely that further problematic tests get merged, and so on.

Of course, it is tempting for maintainers to just rerun failed pipelines until they pass, so that PRs that look fine in code review can be merged.

But that unfortunately just makes the problem worse...

So, what can we do?

Describe the solution you'd like

I suggest we adopt a new (informal) policy for PRs: if a CI run fails, make sure the failure is looked into and documented, instead of being ignored and re-run.

Authors can be encouraged to identify the failure themselves, or a reviewer can do that.

There is a (hopefully) comprehensive list of known test failures in #4120 that can be used as a quick reference.

If the failure is already known (hence likely unrelated to the PR):

  • systematically add a comment saying so and linking to the ticket (this helps keep a tally)
  • then rerun the pipeline

If the failure is not known, I suggest we do not rerun the pipeline.
There is a high probability that the issue comes from the PR, and (especially if it is flaky) the pipeline should not be re-run until the root cause is pinpointed.
Such a situation should either block merging the PR or at least subject it to more scrutiny.

I am always happy to help with that when I spot such failures (e.g. in #4368 (comment)).

What do you folks think?

Additional context

Alternatively, maybe we could consider building some form of reporting into the pipeline that would automatically add a comment on the PR with the names of the tests that failed (or, ideally, also retrieve a link to the documented failure and post it).

Not quite sure how feasible that is, but it would go a long way toward surfacing these problems.
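
For what it's worth, here is a rough sketch (in Go) of what such reporting could look like, assuming the CI step saves a `go test -json` stream to a file and has a token that is allowed to comment on the PR. The file name, the PR_NUMBER variable and the repository coordinates are placeholders for whatever the workflow would actually provide; looking up a link to the documented failure in #4120 would be an extra step on top of this.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// testEvent mirrors the fields we need from the `go test -json` (test2json) event format.
type testEvent struct {
	Action  string `json:"Action"`
	Package string `json:"Package"`
	Test    string `json:"Test"`
}

func main() {
	// Placeholder input: a file containing the `go test -json` stream produced by the CI run.
	f, err := os.Open("test-output.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Collect the names of tests that reported a "fail" action.
	var failed []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var ev testEvent
		if json.Unmarshal(scanner.Bytes(), &ev) != nil {
			continue // ignore non-JSON lines (build output, etc.)
		}
		if ev.Action == "fail" && ev.Test != "" {
			failed = append(failed, fmt.Sprintf("%s.%s", ev.Package, ev.Test))
		}
	}
	if len(failed) == 0 {
		return
	}

	body := "CI reported failing tests:\n"
	for _, name := range failed {
		body += "- " + name + "\n"
	}

	// Post the comment on the PR via the GitHub REST API.
	// GITHUB_REPOSITORY and GITHUB_TOKEN come from the workflow environment;
	// PR_NUMBER is a placeholder the workflow would have to provide.
	url := fmt.Sprintf("https://api.github.com/repos/%s/issues/%s/comments",
		os.Getenv("GITHUB_REPOSITORY"), os.Getenv("PR_NUMBER"))
	payload, _ := json.Marshal(map[string]string{"body": body})
	req, _ := http.NewRequest("POST", url, bytes.NewReader(payload))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("comment posted, status:", resp.Status)
}
```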
