Add artifact cache to track artifacts for faster restart #1632


Merged
merged 27 commits into from
Feb 28, 2019

Conversation

priyawadhwa
Contributor

Old PR for this: #1532

I added an artifact cache, which stores artifacts at ~/.skaffold/cache in the format [workspace hash : image digest]

Before building an artifact, skaffold checks if the associated workspace hash is in the cache. If it isn't, the artifact will be rebuilt. If it is, then skaffold will check if the image imageName:workspaceHash (with the same digest in the cache) exists. If it does, skaffold will push it if it doesn't already exist remotely.

If imageName:workspaceHash doesn't exist locally, skaffold will see if another image with the same digest exists locally. If it finds one, it will retag that image as imageName:workspaceHash and push it.
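For reference, here is a rough sketch of that lookup flow in Go. The cache type and the helpers imageExistsLocally, imageExistsRemotely, findImageByDigest, retag, and push are illustrative stand-ins under my assumptions, not the PR's actual code:

package cache // illustrative sketch only

import "fmt"

// ImageDetails is what the cache records for a built artifact.
type ImageDetails struct {
	Digest string // registry digest, when available
	ID     string // local image ID
}

// ArtifactCache maps a workspace hash to image details, persisted at ~/.skaffold/cache.
type ArtifactCache map[string]ImageDetails

// Stand-ins for the docker/registry helpers (assumptions, not real skaffold APIs).
func imageExistsLocally(ref, digest string) bool  { return false }
func imageExistsRemotely(ref, digest string) bool { return false }
func findImageByDigest(digest string) string      { return "" }
func retag(src, dst string)                       {}
func push(ref string)                             {}

// needsRebuild mirrors the flow described above: a cache miss means rebuild;
// on a cache hit, reuse the image and retag/push only when required.
func needsRebuild(c ArtifactCache, imageName, workspaceHash string) bool {
	details, ok := c[workspaceHash]
	if !ok {
		return true
	}
	hashTag := fmt.Sprintf("%s:%s", imageName, workspaceHash)
	if imageExistsLocally(hashTag, details.Digest) {
		if !imageExistsRemotely(hashTag, details.Digest) {
			push(hashTag)
		}
		return false
	}
	if local := findImageByDigest(details.Digest); local != "" {
		retag(local, hashTag)
		push(hashTag)
		return false
	}
	return true
}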

TODO (I'll probably implement this in another PR):

  • Add "skaffold cache" command, where a user can specify an image in the imageName:workspaceHash format, which will add that image to the local cache file. This way, if one dev builds the image another dev can easily add it to their cache.

Priya Wadhwa added 9 commits January 24, 2019 10:23
I added an artifact cache (stored in ~/.skaffold/cache) which maps a
hash of the artifact's dependencies : image name and image tag. I used
kaniko's snapshotting library, which does exactly this for kaniko
caching.

Before building an artifact, we check the cache to see if an image with
the same hash and image name has already been built. If it has, we skip
rebuilding this image. If either the hash or the image name is
different, skaffold will rebuild the image.

The cache can be disabled with the --cache-artifacts=false flag.

Future work could include optimization around retagging cached images
(if we find the hash for an image, but the image name is different,
perhaps we can try to retag the image instead of rebuilding).
Fixed linting errors. Also explicitly hid
--azure-container-registry-config flag which is coming in through a
dependency which uses pflag.String directly. See
GoogleContainerTools/kaniko#445 for more
details.
Slightly modified the hasher from kaniko to use with skaffold. Couldn't
use kaniko's hasher directly because kaniko doesn't support building
Windows containers, so the Windows skaffold build was failing since
syscall.Stat_t is used.
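A minimal sketch of a cross-platform dependency hasher along those lines, hashing only file mode and contents so it avoids syscall.Stat_t and builds on Windows (illustrative, not the PR's exact hasher):

package hash

import (
	"crypto/md5"
	"encoding/hex"
	"io"
	"os"
)

// fileHash hashes a regular file's mode string and contents.
func fileHash(path string) (string, error) {
	h := md5.New()
	fi, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	h.Write([]byte(fi.Mode().String()))
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

Hashing each of an artifact's dependency files this way and combining the results gives the workspace hash used as the cache key.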
1. Cache is now a map of workspace-hash:image-ID

When there is a cache hit, skaffold will check if an image already
exists locally with the same ID. If it exists locally and not remotely,
skaffold will retag and push the image, and then continue with the deploy.

Also added a unit test to check this logic.
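As an illustration of the retag-and-push step using the Docker Go client (the source and target references are placeholders taken from the example logs later in this thread, and a real push also needs registry credentials):

package main

import (
	"context"
	"io"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	// src is whatever local image matched the cached digest/ID,
	// dst is imageName:workspaceHash.
	src := "gcr.io/k8s-skaffold/leeroy-web:latest"
	dst := "gcr.io/k8s-skaffold/leeroy-web:b24f9a316d1640623810de5b822a49bdd99386897459bb9e3e967b816a3f5703"
	if err := cli.ImageTag(ctx, src, dst); err != nil {
		panic(err)
	}
	// RegistryAuth omitted for brevity; a real push needs credentials.
	rc, err := cli.ImagePush(ctx, dst, types.ImagePushOptions{RegistryAuth: "..."})
	if err != nil {
		panic(err)
	}
	defer rc.Close()
	io.Copy(os.Stdout, rc) // stream push progress
}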
@codecov-io

codecov-io commented Feb 8, 2019

Codecov Report

Merging #1632 into master will increase coverage by 0.22%.
The diff coverage is 47.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1632      +/-   ##
==========================================
+ Coverage   46.54%   46.76%   +0.22%     
==========================================
  Files         125      126       +1     
  Lines        5657     5919     +262     
==========================================
+ Hits         2633     2768     +135     
- Misses       2751     2860     +109     
- Partials      273      291      +18
Impacted Files                    Coverage Δ
pkg/skaffold/config/options.go    88.23% <ø> (ø) ⬆️
pkg/skaffold/util/util.go         44.13% <0%> (-1.91%) ⬇️
cmd/skaffold/app/cmd/cmd.go       0% <0%> (ø) ⬆️
pkg/skaffold/build/cache.go       43.39% <43.39%> (ø)
pkg/skaffold/runner/timings.go    16.66% <71.42%> (+1.28%) ⬆️
pkg/skaffold/runner/runner.go     60.4% <75%> (+0.19%) ⬆️
pkg/skaffold/docker/image.go      58.38% <75.75%> (+6.95%) ⬆️
pkg/skaffold/docker/client.go     71.56% <0%> (+8.82%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3039001...5ad0980. Read the comment docs.

Contributor

@dgageot dgageot left a comment


The code looks good to me. It doesn't seem to work for local builds. I get this error:

DEBU[0000] error getting id for gcr.io/k8s-skaffold/skaffold-example:d8651272889c8ba6206df06411238ec9bb12ed6962e7975dade1c6b5e4a1d633: a digest must contain exactly one '@' separator (e.g. registry/repository@digest) saw: , skipping caching
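That parse error looks like what happens when a reference without an @sha256:... digest reaches a digest parser, which can occur when the image was only built locally and never pushed. A small illustration of the failure class, assuming go-containerregistry is doing the parsing:

package main

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/name"
)

func main() {
	// Tag-only reference: no '@' separator, so digest parsing fails.
	_, err := name.NewDigest("gcr.io/k8s-skaffold/skaffold-example:d8651272889c8ba6206df06411238ec9bb12ed6962e7975dade1c6b5e4a1d633")
	fmt.Println(err)

	// Digest reference: parses fine.
	d, err := name.NewDigest("gcr.io/k8s-skaffold/skaffold-example@sha256:d8651272889c8ba6206df06411238ec9bb12ed6962e7975dade1c6b5e4a1d633")
	fmt.Println(d, err)
}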

Contributor

@dgageot dgageot left a comment


Also, it always pushes images that are already pushed

@priyawadhwa
Contributor Author

@dgageot could you provide the skaffold.yaml you're using? I haven't seen either of these issues yet, but I've only tested the getting-started and microservices examples

@balopat
Contributor

balopat commented Feb 11, 2019

As discussed, I had the same issue using minikube - we have a repro.

@r2d4
Contributor

r2d4 commented Feb 12, 2019

Do we need to do another hash of the workspace here? The workspace hash is already calculated in the (git) tag

Priya Wadhwa added 3 commits February 11, 2019 17:14
and digest

When using the cache with minikube, there were no image digests
available since the images had not been pushed to and pulled from a
registry. To get around this, I store image ID as well and search for a
local match to both the digest and the ID.

I also check if running against a local cluster, and only push cached
images if not.
@priyawadhwa
Contributor Author

priyawadhwa commented Feb 12, 2019

@r2d4 it's actually a hash of artifact dependencies only (updated the comments in the code to say that)!

cc @balopat @dgageot I think this should work w/ minikube now

Contributor

@balopat balopat left a comment


It's a bit confusing to see these contradicting messages in the logs at the default log level:

...
Checking cache...
Found gcr.io/k8s-skaffold/leeroy-web locally...
gcr.io/k8s-skaffold/leeroy-web:b24f9a316d1640623810de5b822a49bdd99386897459bb9e3e967b816a3f5703 ready, skipping rebuild
Found gcr.io/k8s-skaffold/leeroy-app locally...
gcr.io/k8s-skaffold/leeroy-app:34ac91a8021972e0b025fe0c641ac21f0f2bbc2e3852be86b20626abdef0d758 ready, skipping rebuild
Starting build...
Found [minikube] context, using local docker daemon.
Build complete in 107.123µs
Retagging cached images...
...

We should make those logs conditional on whether there is any build happening or all artifacts are cached.
Plus, I added some nits.
I'm still testing on Windows now... but the minikube issue did go away - it is really fast, I like it a lot!

@balopat
Contributor

balopat commented Feb 13, 2019

Works on Windows + GKE

@priyawadhwa
Contributor Author

Build logs should only appear if there is anything to build now, and similarly cache retag logs should only appear if there is anything to retag!

}
hashTag := fmt.Sprintf("%s:%s", a.ImageName, hash)
// Check if tagged image exists remotely with the same digest
existsRemotely := imageExistsRemotely(hashTag, imageDetails.Digest)
Contributor


It looks to me like this is being called for each artifact. With n artifacts defined in the skaffold.yaml, this results in n calls to remote.Image(), and possibly an image push if we're on a remote cluster, which can take quite a while (especially with a slow network connection).

I think we should at the very least retrieve each artifact in parallel so these calls aren't blocking each other, but I also think that we should rethink this logic. If users are building their images locally, we probably shouldn't be checking a remote registry at all.
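For reference, a minimal sketch of running those per-artifact checks concurrently with errgroup; checkRemote and the references are stand-ins, not the PR's code:

package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// checkRemote stands in for the per-artifact remote registry lookup.
func checkRemote(ctx context.Context, ref string) error {
	// ... call the registry (e.g. remote.Image) here ...
	return nil
}

func main() {
	refs := []string{"gcr.io/k8s-skaffold/leeroy-web:abc", "gcr.io/k8s-skaffold/leeroy-app:def"}
	g, ctx := errgroup.WithContext(context.Background())
	for _, ref := range refs {
		ref := ref // capture loop variable (pre-Go 1.22)
		g.Go(func() error { return checkRemote(ctx, ref) })
	}
	if err := g.Wait(); err != nil {
		fmt.Println("cache check failed:", err)
	}
}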

Contributor Author


So I've changed this logic to only check for a remote image if using a remote cluster.

Parallelizing this could definitely make it faster, but I do think we need to keep the check because it can save a retag/push later on if the image already exists.

Also, only check if image exists remotely when using a remote cluster.
Get details about an artifact first (if it needs to be rebuilt, retagged
or pushed), and then execute. Also added a unit test for this.
Contributor

@dgageot dgageot left a comment


@priyawadhwa Here's some feedback:

  • I find the logs in case of a cache hit too verbose. We say "retagging and pushing if necessary". Could we print that only if it's necessary?
  • Also, I think a cache hit should be in blue.
  • Checking cache sometimes takes a few seconds. Do we list images? Could we not list all images and lookup the image by digest or imageID or tag?
  • When there's a cache hit, we miss the build timings. It would be nice to see that it's actually faster than without the cache.
  • It seems to take some time after the Resolved ..., skipping rebuild message. What is it doing?
  • Reading the logs, it seems Skaffold computes the list of dependencies twice on a skaffold build. Is that expected?
  • When changing the name of the artifact's image, Skaffold thinks the image is already pushed. But it's not.

@priyawadhwa
Contributor Author

@dgageot -- updated the logging, colors, and added timing to the cache.

Checking cache sometimes takes a few seconds. Do we list images? Could we not list all images and lookup the image by digest or imageID or tag?

We do have to list all images, AFAICT there isn't a filter for image ID and the digest filter isn't reliable 👎 I'll keep looking for a faster way, but right now I think this is the best option (rough sketch at the end of this comment).

It seems to take some time after the Resolved ..., skipping rebuild message. What is it doing?

If it's taking some time, it's probably moved on to checking if the next artifact is in the cache.

Reading the logs, it seems Skaffold computes the list of dependencies twice on a skaffold build. Is that expected?

Yes, since we need to get the dependencies for an artifact to determine its associated hash.

When changing the name of the artifact's image, Skaffold thinks the image is already pushed. But it's not.

I haven't been able to repro this, could you provide more details about your build?
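Sketch referenced above: listing all local images with the Docker Go client and matching client-side by repo digest or image ID. The field names come from the Docker API; the digest/ID values are placeholders and the rest is illustrative:

package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	// No reliable server-side filter for image ID or digest, so list everything
	// and match client-side.
	images, err := cli.ImageList(ctx, types.ImageListOptions{All: true})
	if err != nil {
		panic(err)
	}
	wantDigest := "sha256:d8651272889c8ba6206df06411238ec9bb12ed6962e7975dade1c6b5e4a1d633"
	for _, img := range images {
		for _, rd := range img.RepoDigests {
			if strings.HasSuffix(rd, wantDigest) {
				fmt.Println("matched by digest:", img.ID, img.RepoTags)
			}
		}
	}
}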

@dgageot
Contributor

dgageot commented Feb 25, 2019

@priyawadhwa It seems to work better now. Maybe I did something wrong or maybe it's because I have fewer images. Could you rebase so that I can run another round of tests? Thanks!

@priyawadhwa
Contributor Author

@dgageot done!

@priyawadhwa priyawadhwa requested a review from balopat February 27, 2019 18:29
@dgageot dgageot dismissed balopat’s stale review February 28, 2019 16:13

We agreed to merge the PR

@dgageot dgageot merged commit 7792e90 into GoogleContainerTools:master Feb 28, 2019