Add artifact cache to track artifacts for faster restart #1632
Conversation
I added an artifact cache (stored in ~/.skaffold/cache) which maps a hash of the artifact's dependencies to an image name and image tag. I used kaniko's snapshotting library, which does exactly this for kaniko caching. Before building an artifact, we check the cache to see if an image with the same hash and image name has already been built. If it has, we skip rebuilding that image. If either the hash or the image name differs, skaffold will rebuild the image. The cache can be disabled with the --cache-artifacts=false flag. Future work could include optimization around retagging cached images (if we find the hash for an image but the image name is different, perhaps we can retag the image instead of rebuilding).
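A minimal sketch of the cache shape and lookup described here; the type names, YAML on-disk format, and yaml.v2 dependency are illustrative assumptions, not skaffold's actual implementation:

```go
package cache

import (
	"io/ioutil"
	"os"
	"path/filepath"

	yaml "gopkg.in/yaml.v2"
)

// ArtifactCache maps a hash of an artifact's dependencies to details
// about the image built from them. (Illustrative names.)
type ArtifactCache map[string]ImageDetails

type ImageDetails struct {
	Name string `yaml:"name"`
	Tag  string `yaml:"tag"`
}

// retrieveCache reads ~/.skaffold/cache, returning an empty cache if
// the file doesn't exist yet.
func retrieveCache() (ArtifactCache, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	contents, err := ioutil.ReadFile(filepath.Join(home, ".skaffold", "cache"))
	if os.IsNotExist(err) {
		return ArtifactCache{}, nil
	}
	if err != nil {
		return nil, err
	}
	var c ArtifactCache
	if err := yaml.Unmarshal(contents, &c); err != nil {
		return nil, err
	}
	return c, nil
}
```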
Fixed linting errors. Also explicitly hid the --azure-container-registry-config flag, which is coming in through a dependency that uses pflag.String directly. See GoogleContainerTools/kaniko#445 for more details.
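For reference, hiding a flag that a dependency registered on pflag's global FlagSet uses pflag's MarkHidden; a sketch (the wrapper function name is illustrative):

```go
package cmd

import (
	"github.com/spf13/pflag"
)

// hideInheritedFlags hides flags that transitive dependencies register
// directly on pflag.CommandLine so they don't clutter skaffold's --help.
func hideInheritedFlags() {
	// Registered by the Azure credential helper pulled in through kaniko's
	// dependencies; see GoogleContainerTools/kaniko#445.
	if err := pflag.CommandLine.MarkHidden("azure-container-registry-config"); err != nil {
		// The flag may disappear if the dependency changes; ignoring is fine.
		_ = err
	}
}
```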
Slightly modified the hasher from kaniko to use with skaffold. I couldn't use kaniko's hasher directly because kaniko doesn't support building Windows containers, so the Windows skaffold build was failing since syscall.Stat_t is used.
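A platform-independent file hasher in the spirit of that change might look like the following; it sticks to portable os.FileInfo fields instead of syscall.Stat_t. This is a sketch, not the modified kaniko hasher itself:

```go
package hash

import (
	"crypto/md5"
	"encoding/hex"
	"io"
	"os"
	"strconv"
)

// fileHash hashes a file's contents plus portable metadata (mode and
// size), avoiding Linux-only syscall.Stat_t fields such as uid/gid.
func fileHash(path string) (string, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return "", err
	}
	h := md5.New()
	h.Write([]byte(fi.Mode().String()))
	h.Write([]byte(strconv.FormatInt(fi.Size(), 10)))
	if fi.Mode().IsRegular() {
		f, err := os.Open(path)
		if err != nil {
			return "", err
		}
		defer f.Close()
		if _, err := io.Copy(h, f); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```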
Cache is now a map of workspace-hash : image ID. When there is a cache hit, skaffold will check whether an image with the same ID already exists locally. If it exists locally but not remotely, skaffold will retag and push the image, and then continue with the deploy. Also added a unit test to check this logic.
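The retag-and-push step on a cache hit could look roughly like this using the Docker client API; the function name and auth plumbing are illustrative assumptions:

```go
package build

import (
	"context"
	"io"
	"io/ioutil"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// retagAndPush tags the locally found image with the expected name and
// pushes it so a remote cluster can pull it.
func retagAndPush(ctx context.Context, cli client.APIClient, localImage, target, registryAuth string) error {
	if err := cli.ImageTag(ctx, localImage, target); err != nil {
		return err
	}
	rc, err := cli.ImagePush(ctx, target, types.ImagePushOptions{RegistryAuth: registryAuth})
	if err != nil {
		return err
	}
	defer rc.Close()
	// Drain the progress stream; the push isn't complete until EOF.
	_, err = io.Copy(ioutil.Discard, rc)
	return err
}
```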
Codecov Report
@@ Coverage Diff @@
## master #1632 +/- ##
==========================================
+ Coverage 46.54% 46.76% +0.22%
==========================================
Files 125 126 +1
Lines 5657 5919 +262
==========================================
+ Hits 2633 2768 +135
- Misses 2751 2860 +109
- Partials 273 291 +18
Continue to review full report at Codecov.
The code looks good to me, but it doesn't seem to work for local builds. I get this error:
DEBU[0000] error getting id for gcr.io/k8s-skaffold/skaffold-example:d8651272889c8ba6206df06411238ec9bb12ed6962e7975dade1c6b5e4a1d633: a digest must contain exactly one '@' separator (e.g. registry/repository@digest) saw: , skipping caching
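That error string comes from parsing an image reference as a digest reference when the digest is empty, which happens for local-only images that were never pushed. A sketch of the failure mode, assuming the go-containerregistry name package is the parser involved (the error text suggests it, and API details vary by library version):

```go
package main

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/name"
)

func main() {
	// A local-only image has no digest yet, so the digest string is "".
	// Parsing it as a digest reference fails with exactly the error above:
	// a digest must contain exactly one '@' separator (...) saw:
	_, err := name.NewDigest("")
	fmt.Println(err)
}
```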
Also, it always pushes images that are already pushed.
@dgageot could you provide the skaffold.yaml you're using? I haven't seen either of these issues yet, but I've only tested the getting-started and microservices examples.
As discussed, I had the same issue using minikube - we have a repro.
Do we need to do another hash of the workspace here? The workspace hash is already calculated in the (git) tag |
Store image ID as well as digest. When using the cache with minikube, there were no image digests available, since the images had not been pushed to and pulled from a registry. To get around this, I store the image ID as well and search for a local match to both the digest and the ID. I also check if we're running against a local cluster, and only push cached images if not.
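The local-cluster check could be as simple as inspecting the current kube-context; a heuristic sketch with common local context names (illustrative, not an exhaustive or authoritative list):

```go
package config

// isLocalCluster reports whether the active kube-context points at a
// cluster that shares the local Docker daemon, in which case pushing
// cached images is unnecessary. (Illustrative heuristic.)
func isLocalCluster(kubeContext string) bool {
	switch kubeContext {
	case "minikube", "docker-for-desktop", "docker-desktop":
		return true
	}
	return false
}
```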
It's a bit confusing to see contradicting messages like this in the logs at the default log level:
...
Checking cache...
Found gcr.io/k8s-skaffold/leeroy-web locally...
gcr.io/k8s-skaffold/leeroy-web:b24f9a316d1640623810de5b822a49bdd99386897459bb9e3e967b816a3f5703 ready, skipping rebuild
Found gcr.io/k8s-skaffold/leeroy-app locally...
gcr.io/k8s-skaffold/leeroy-app:34ac91a8021972e0b025fe0c641ac21f0f2bbc2e3852be86b20626abdef0d758 ready, skipping rebuild
Starting build...
Found [minikube] context, using local docker daemon.
Build complete in 107.123µs
Retagging cached images...
...
We should make those logs conditional on whether there is any build happening or all artifacts are cached.
Plus, I added some nits.
I'm still testing on Windows now... but the minikube issue did go away - it is really fast, I like it a lot!
Works on Windows + GKE.
Build logs should only appear if there is anything to build now, and similarly cache retag logs should only appear if there is anything to retag!
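A sketch of that gating, assuming the artifacts have already been split into to-build and to-retag sets (names illustrative):

```go
package build

import (
	"fmt"
	"io"
)

// logPhases prints phase banners only when the phase has work to do,
// so fully cached runs don't show "Starting build..." at all.
func logPhases(out io.Writer, toBuild, toRetag []string) {
	if len(toBuild) > 0 {
		fmt.Fprintln(out, "Starting build...")
	}
	if len(toRetag) > 0 {
		fmt.Fprintln(out, "Retagging cached images...")
	}
}
```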
pkg/skaffold/build/cache.go (outdated)
}
hashTag := fmt.Sprintf("%s:%s", a.ImageName, hash)
// Check if tagged image exists remotely with the same digest
existsRemotely := imageExistsRemotely(hashTag, imageDetails.Digest)
It looks to me like this is being called for each artifact. With n artifacts defined in the skaffold.yaml, this results in n calls to remote.Image(), and possibly an image push if we're on a remote cluster, which can take quite a while (especially with a slow network connection).
I think we should at the very least retrieve each artifact in parallel so these calls aren't blocking each other, but I also think that we should rethink this logic: if users are building their images locally, we probably shouldn't be checking a remote registry at all.
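A sketch of the suggested parallelization using errgroup, reusing the imageExistsRemotely helper visible in the diff above; the surrounding names are illustrative:

```go
package build

import (
	"golang.org/x/sync/errgroup"
)

type artifactDetails struct {
	HashTag string
	Digest  string
}

// checkRemotesInParallel runs the remote-existence check for all
// artifacts concurrently instead of one blocking call per artifact.
func checkRemotesInParallel(artifacts []artifactDetails) ([]bool, error) {
	exists := make([]bool, len(artifacts))
	var g errgroup.Group
	for i, a := range artifacts {
		i, a := i, a // capture loop variables (pre-Go 1.22 semantics)
		g.Go(func() error {
			exists[i] = imageExistsRemotely(a.HashTag, a.Digest)
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return exists, nil
}
```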
So I've changed this logic to only check for a remote image when using a remote cluster.
Parallelizing this could definitely make it faster, but I do think we need the check, because it can save a retag/push later on if the image already exists.
Also, only check if image exists remotely when using a remote cluster.
Get details about an artifact first (if it needs to be rebuilt, retagged or pushed), and then execute. Also added a unit test for this.
@priyawadhwa Here's some feedback:
- I find the logs in case of a cache hit too verbose. We say "retagging and pushing if necessary". Could we print that only if it's necessary?
- Also, I think a cache hit should be in blue.
- Checking the cache sometimes takes a few seconds. Do we list images? Could we avoid listing all images and instead look up the image by digest, image ID, or tag?
- When there's a cache hit, we miss the build timings. It would be nice to see that it's actually faster than without the cache.
- It seems to take some time after the "Resolved ..., skipping rebuild" message. What is it doing?
- Reading the logs, it seems Skaffold computes the list of dependencies twice on a skaffold build. Is that expected?
- When changing the name of the artifact's image, Skaffold thinks the image is already pushed. But it's not.
@dgageot -- updated the logging, colors, and added timing to the cache.
We do have to list all images; AFAICT there isn't a filter for image ID, and the digest filter isn't reliable 👎 I'll keep looking for a faster way, but right now I think this is the best option.
If it's taking some time, it's probably moved on to checking whether the next artifact is in the cache.
Yes, since we need to get the dependencies for an artifact to determine its associated hash.
I haven't been able to repro this; could you provide more details about your build?
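For context, the list-then-match lookup described in that first answer might look like this with the Docker client API (a sketch; the function name is illustrative):

```go
package build

import (
	"context"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// findLocalImage lists all local images and matches by image ID or
// repo digest, since the daemon offers no reliable server-side filter
// for either.
func findLocalImage(ctx context.Context, cli client.APIClient, id, digest string) (*types.ImageSummary, error) {
	images, err := cli.ImageList(ctx, types.ImageListOptions{All: true})
	if err != nil {
		return nil, err
	}
	for i, img := range images {
		if img.ID == id {
			return &images[i], nil
		}
		for _, d := range img.RepoDigests {
			if d == digest {
				return &images[i], nil
			}
		}
	}
	return nil, nil // not found
}
```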
@priyawadhwa It seems to work better now. Maybe I did something wrong or maybe it's because I have fewer images. Could you rebase so that I can run another round of tests? Thanks!
@dgageot done!
Old PR for this: #1532
I added an artifact cache, which stores artifacts at ~/.skaffold/cache in the format [workspace hash : image digest].
Before building an artifact, skaffold checks if the associated workspace hash is in the cache. If it isn't, the artifact will be rebuilt. If it is, then skaffold will check if the image imageName:workspaceHash (with the same digest as in the cache) exists. If it does, skaffold will push it if it doesn't exist remotely. If imageName:workspaceHash doesn't exist locally, skaffold will see if another image with the same digest exists locally. If it finds one, it will retag that image as imageName:workspaceHash and push it.
TODO (I'll probably implement this in another PR): a way to add an image in imageName:workspaceHash format to the local cache file. This way, if one dev builds the image, another dev can easily add it to their cache.
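Putting that description together, the per-artifact decision flow reads roughly like this (a sketch with illustrative names, not the PR's actual code):

```go
package build

type action int

const (
	skipBuild    action = iota // cached image exists locally and remotely
	pushOnly                   // right tag exists locally, missing remotely
	retagAndPush               // another local image has the digest; retag it
	rebuild                    // no usable cache entry
)

// resolve follows the decision flow in the description above; the four
// inputs correspond to the checks skaffold performs. (Illustrative.)
func resolve(hashInCache, tagExistsLocally, existsRemotely, digestFoundLocally bool) action {
	switch {
	case !hashInCache:
		return rebuild
	case tagExistsLocally && existsRemotely:
		return skipBuild
	case tagExistsLocally:
		return pushOnly
	case digestFoundLocally:
		return retagAndPush
	default:
		return rebuild
	}
}
```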