Initial stab at front end cache checks #522
We might be able to use …
Right, this would save us the pull to check the image. As we discussed, that might result in cache misses compared to the compiler version string, as …

If I am parsing this correctly, yes, I believe it is possible that the digest of the image changes while the compiler does not (think old …). We could use …
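For context, a hedged sketch of deriving a compiler version string to key the cache on, as opposed to the image digest; the helper name here is hypothetical and not from this PR:

```python
# Illustrative only: derive a compiler version string to use as a cache key,
# instead of keying on the container image digest. Not the actual PR code.
import subprocess

def compiler_string(clang: str = "clang") -> str:
    # First line of "clang --version", e.g. "Debian clang version 17.0.6 (...)"
    proc = subprocess.run([clang, "--version"], capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()[0].strip()
```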
Force-pushed from 1570b7d to 9af81c8.
Superseded by #664. @nathanchance safe to close?
Yup!
NOTE: This is currently based on #519 for ease of development; I will rebase this on main once that PR is merged.
There are times where neither the kernel revision nor the compiler have
changed since the last run, which means there is little point running the
build because it is unlikely anything would change (except due to
flakiness with GitHub Actions, more on that later). While tuxsuite does
have compiler caching enabled, it still relies on spinning up all the
build machines, letting the cache work, then firing off the boot tests,
which can be expensive.
By caching the compiler string and the kernel revision, it is possible
to avoid even spinning up the tuxsuite jobs if nothing has changed.
`should_run.py` takes care of this by exiting:

* 0 if the build should run (something changed or there were no previous results)
* 1 for any internal assertion failure
* 2 if nothing changed from the previous run

If there is any internal assertion failure, the cache step fails,
failing the whole workflow, so that those situations can be properly
dealt with. The script should never do this, but it is possible there
are situations I have not considered yet.
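For illustration, a minimal sketch of that exit-code contract (the cache file name and helper functions below are assumptions, not the actual `should_run.py`):

```python
#!/usr/bin/env python3
# Hypothetical sketch of the exit-code contract described above; the state
# gathering is stubbed out and the file name is an assumption.
import json
import sys
from pathlib import Path

CACHE_FILE = Path("last_run_info.json")  # assumed name of the per-branch JSON file


def load_previous_state():
    # No previous results means the build should run.
    if not CACHE_FILE.exists():
        return None
    return json.loads(CACHE_FILE.read_text())


def get_current_state():
    # Placeholder: in practice this would query the kernel tree and toolchain.
    return {
        "kernel_revision": "deadbeef",
        "compiler_string": "Debian clang version 17.0.6",
    }


def main() -> int:
    current = get_current_state()
    if not all(current.values()):
        # Internal assertion failure: fail loudly so the whole workflow goes red.
        print("could not determine current state", file=sys.stderr)
        return 1
    if load_previous_state() != current:
        return 0  # something changed (or no previous results): run the build
    return 2  # nothing changed since the last run: skip the tuxsuite jobs


if __name__ == "__main__":
    sys.exit(main())
```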
To avoid contention with pushing and pulling, each `should_run.py` job
gets its own branch. While this will result in a lot of branches, they
should not cause too many issues because they will just contain the JSON
file with the last run info.
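As a rough illustration of how that state might be persisted (the branch naming scheme and file name here are assumptions on my part):

```python
# Hypothetical sketch of persisting the last run info to a per-job branch.
import json
import subprocess
from pathlib import Path

def save_state(job_name: str, state: dict) -> None:
    branch = f"last-run/{job_name}"  # assumed one-branch-per-job naming scheme
    Path("last_run_info.json").write_text(json.dumps(state, indent=4) + "\n")
    subprocess.run(["git", "checkout", "-B", branch], check=True)
    subprocess.run(["git", "add", "last_run_info.json"], check=True)
    subprocess.run(["git", "commit", "-m", f"Update last run info for {job_name}"], check=True)
    # Force push is acceptable because the branch only ever holds this one JSON file.
    subprocess.run(["git", "push", "--force", "origin", branch], check=True)
```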
`should_run.py` does not try to account for flakiness on the GitHub
Actions or TuxSuite side, so it is possible that a previous build will
fail due to flakiness and not be retried on the next cron if nothing
changes since then. To attempt to account for a situation where we
re-run a known flaky build, the script gets out of the way when the
GitHub Actions event is `workflow_dispatch`, meaning a workflow was
manually run. Additionally, if the previous run was only flaky during
the QEMU boots (rather than during the TuxSuite stage), the boots can
generally just be re-run right away, since the kernels do not need to be
rebuilt. I do not think this will happen too often but if it does, we
can try to come up with a better heuristic.
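A small sketch of that escape hatch (`GITHUB_EVENT_NAME` is the standard GitHub Actions environment variable; everything else is illustrative):

```python
# Sketch of the "get out of the way on manual runs" check described above.
import os
import sys

# GITHUB_EVENT_NAME is set by GitHub Actions; "workflow_dispatch" means a human
# triggered the workflow, so skip the cache comparison and let the build run.
if os.environ.get("GITHUB_EVENT_NAME") == "workflow_dispatch":
    sys.exit(0)
```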
Closes: #308