
Newly introduced tests that are flaky: what to do? #4370

Open
@apostasie

Description


What is the problem you're trying to solve

It looks like some recently introduced tests are flaky.

Some of this is because the tests were written incorrectly (failing to call EnsureContainerStarted after a run -d, delays that are too short to account for slow environments, etc.).
Some of it is due to bugs in the code itself (mostly TOCTOU races on the filesystem, or other concurrency issues).
Overall, issues like these can easily escape pure code review.
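
As an illustration of the first category, here is a minimal sketch of the pattern that avoids the fixed-delay trap: instead of sleeping for an arbitrary duration after a detached run, poll the container state until it is actually running (which is what EnsureContainerStarted is there for). The helper below is hypothetical, not the test suite's actual implementation; it only assumes the nerdctl binary is on PATH and that the docker-compatible inspect format is available.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// waitForRunning is a hypothetical helper illustrating the idea:
// instead of sleeping for a fixed duration after `nerdctl run -d`,
// poll the container state until it is actually running (or time out).
// Assumes docker-compatible inspect output ({{.State.Running}}).
func waitForRunning(ctx context.Context, containerName string) error {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		out, err := exec.CommandContext(ctx, "nerdctl", "inspect",
			"--format", "{{.State.Running}}", containerName).Output()
		if err == nil && strings.TrimSpace(string(out)) == "true" {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("container %q did not start: %w", containerName, ctx.Err())
		case <-ticker.C:
			// keep polling until the deadline
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := waitForRunning(ctx, "my-test-container"); err != nil {
		fmt.Println(err)
	}
}
```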

This is of course a problem: these issues require follow-up work to fix (which no one wants to do, because it is objectively painful and thankless grunt work :-)), and in the meantime they contribute to growing CI instability, which prevents clean testing of new code and makes it more likely that further problematic tests get merged, and so on.

Of course, it is tempting for maintainers to just rerun failed pipelines until they pass, so that PRs that look fine in code review can be merged.

But that unfortunately just makes the problem worse...

So, what can we do?

Describe the solution you'd like

I suggest we adopt a new (informal) policy for PRs: if a CI run fails, make sure the failure is looked into and documented, instead of being ignored and re-run.

Authors can be encouraged to identify the failure themselves, or a reviewer can do that.

There is a (hopefully) comprehensive list of known test failures in #4120 that can be used as a quick reference.

If the failure is already known (hence likely unrelated to the PR):

  • systematically add a comment saying so and linking to the ticket (this helps keep a tally)
  • then rerun the pipeline

If the failure is not known, I suggest we do not rerun the pipeline.
There is a high probability that the issue comes from the PR, and (especially if it is flaky) the pipeline should not be re-run until the root cause is pinpointed.
Such a situation should either block merging the PR or at least subject it to more scrutiny.

I am always happy to help with that when I spot such failures (e.g. in #4368 (comment)).

What do you folks think?

Additional context

Alternatively, maybe we could consider building some form of reporting into the pipeline that would automatically add a comment on the PR with the names of the tests that failed (or, ideally, also retrieve a link to the documented failure and post it).

Not quite sure how feasible that is, but it would go a long way toward surfacing these problems.
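
For what it's worth, here is a rough sketch (in Go) of what such reporting could look like, assuming the CI step saves a `go test -json` stream to a file and has a token that is allowed to comment on the PR. The file name, the PR_NUMBER variable and the repository coordinates are placeholders for whatever the workflow would actually provide; looking up a link to the documented failure in #4120 would be an extra step on top of this.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// testEvent mirrors the fields we need from the `go test -json` (test2json) event format.
type testEvent struct {
	Action  string `json:"Action"`
	Package string `json:"Package"`
	Test    string `json:"Test"`
}

func main() {
	// Placeholder input: a file containing the `go test -json` stream produced by the CI run.
	f, err := os.Open("test-output.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Collect the names of tests that reported a "fail" action.
	var failed []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		var ev testEvent
		if json.Unmarshal(scanner.Bytes(), &ev) != nil {
			continue // ignore non-JSON lines (build output, etc.)
		}
		if ev.Action == "fail" && ev.Test != "" {
			failed = append(failed, fmt.Sprintf("%s.%s", ev.Package, ev.Test))
		}
	}
	if len(failed) == 0 {
		return
	}

	body := "CI reported failing tests:\n"
	for _, name := range failed {
		body += "- " + name + "\n"
	}

	// Post the comment on the PR via the GitHub REST API.
	// GITHUB_REPOSITORY and GITHUB_TOKEN come from the workflow environment;
	// PR_NUMBER is a placeholder the workflow would have to provide.
	url := fmt.Sprintf("https://api.github.com/repos/%s/issues/%s/comments",
		os.Getenv("GITHUB_REPOSITORY"), os.Getenv("PR_NUMBER"))
	payload, _ := json.Marshal(map[string]string{"body": body})
	req, _ := http.NewRequest("POST", url, bytes.NewReader(payload))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("comment posted, status:", resp.Status)
}
```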
