Initial stab at front end cache checks #522
We might be able to use …
Right, this would save us the pull to check the image. As we discussed, that might result in cache misses compared to the compiler version string, as …

If I am parsing this correctly, yes, I believe it is possible that the digest of the image changes while the compiler does not (think old …). We could use …
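For context, a hedged sketch of deriving a compiler version string to key the cache on, as opposed to the image digest; the helper name here is hypothetical and not from this PR:

```python
# Illustrative only: derive a compiler version string to use as a cache key,
# instead of keying on the container image digest. Not the actual PR code.
import subprocess

def compiler_string(clang: str = "clang") -> str:
    # First line of "clang --version", e.g. "Debian clang version 17.0.6 (...)"
    proc = subprocess.run([clang, "--version"], capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()[0].strip()
```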
Force-pushed from 1570b7d to 9af81c8.
Superseded by #664. @nathanchance safe to close?
Yup!
NOTE: This is currently based on #519 for ease of development; I will rebase this on main once that PR is merged.
There are times where neither the kernel revision nor the compiler have
changed since the last run, which means there is little point running the
build because it is unlikely anything would change (except due to
flakiness with GitHub Actions, more on that later). While tuxsuite does
have compiler caching enabled, it still relies on spinning up all the
build machines, letting the cache work, then firing off the boot tests,
which can be expensive.
By caching the compiler string and the kernel revision, it is possible
to avoid even spinning up the tuxsuite jobs if nothing has changed.
`should_run.py` takes care of this by exiting:

* 0 if the build should run (something changed or there were no previous results)
* 1 for any internal assertion failure
* 2 if nothing changed from the previous run

If there is any internal assertion failure, the cache step fails,
failing the whole workflow, so that those situations can be properly
dealt with. The script should never do this, but it is possible there
are situations I have not considered yet.
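For illustration, a minimal sketch of that exit-code contract (the cache file name and helper functions below are assumptions, not the actual `should_run.py`):

```python
#!/usr/bin/env python3
# Hypothetical sketch of the exit-code contract described above; the state
# gathering is stubbed out and the file name is an assumption.
import json
import sys
from pathlib import Path

CACHE_FILE = Path("last_run_info.json")  # assumed name of the per-branch JSON file


def load_previous_state():
    # No previous results means the build should run.
    if not CACHE_FILE.exists():
        return None
    return json.loads(CACHE_FILE.read_text())


def get_current_state():
    # Placeholder: in practice this would query the kernel tree and toolchain.
    return {
        "kernel_revision": "deadbeef",
        "compiler_string": "Debian clang version 17.0.6",
    }


def main() -> int:
    current = get_current_state()
    if not all(current.values()):
        # Internal assertion failure: fail loudly so the whole workflow goes red.
        print("could not determine current state", file=sys.stderr)
        return 1
    if load_previous_state() != current:
        return 0  # something changed (or no previous results): run the build
    return 2  # nothing changed since the last run: skip the tuxsuite jobs


if __name__ == "__main__":
    sys.exit(main())
```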
To avoid contention with pushing and pulling, each `should_run.py` job
gets its own branch. While this will result in a lot of branches, they
should not cause too many issues because they will just contain the JSON
file with the last run info.
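As a rough illustration of how that state might be persisted (the branch naming scheme and file name here are assumptions on my part):

```python
# Hypothetical sketch of persisting the last run info to a per-job branch.
import json
import subprocess
from pathlib import Path

def save_state(job_name: str, state: dict) -> None:
    branch = f"last-run/{job_name}"  # assumed one-branch-per-job naming scheme
    Path("last_run_info.json").write_text(json.dumps(state, indent=4) + "\n")
    subprocess.run(["git", "checkout", "-B", branch], check=True)
    subprocess.run(["git", "add", "last_run_info.json"], check=True)
    subprocess.run(["git", "commit", "-m", f"Update last run info for {job_name}"], check=True)
    # Force push is acceptable because the branch only ever holds this one JSON file.
    subprocess.run(["git", "push", "--force", "origin", branch], check=True)
```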
`should_run.py` does not try to account for flakiness on the GitHub
Actions or TuxSuite side, so it is possible that a previous build will
fail due to flakiness and not be retried on the next cron if nothing
changes since then. To attempt to account for a situation where we
re-run a known flaky build, the script gets out of the way when the
GitHub Actions event is `workflow_dispatch`, meaning a workflow was
manually run. Additionally, if the previous run was only flaky during
the QEMU boots (rather than during the TuxSuite stage), the boots can
generally just be re-run right away, since the kernels do not need to be
rebuilt. I do not think this will happen too often but if it does, we
can try to come up with a better heuristic.
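A small sketch of that escape hatch (`GITHUB_EVENT_NAME` is the standard GitHub Actions environment variable; everything else is illustrative):

```python
# Sketch of the "get out of the way on manual runs" check described above.
import os
import sys

# GITHUB_EVENT_NAME is set by GitHub Actions; "workflow_dispatch" means a human
# triggered the workflow, so skip the cache comparison and let the build run.
if os.environ.get("GITHUB_EVENT_NAME") == "workflow_dispatch":
    sys.exit(0)
```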
Closes: #308