Skaffold not waiting for deployments to stabilize #5966

Closed
ananyasaxena opened this issue Jun 7, 2021 · 24 comments · Fixed by #8236
Labels: area/status-check, kind/regression, needs-reproduction (needs reproduction from the maintainers to validate the issue is truly a skaffold bug), priority/p1 (High impact feature/bug.)


@ananyasaxena

ananyasaxena commented Jun 7, 2021

Note: The Skaffold team is unable to reproduce this issue. If you see this issue, please attach a trace from running with -vtrace or provide a sample repository.

Expected behavior & Actual behavior

On version 1.23.0, skaffold would wait for deployments to stabilize correctly

Deployments stabilized in 2 minutes 43.697 seconds

After v1.24.0, it doesn't seem to be waiting for deployments to stabilize

Waiting for deployments to stabilize...
Deployments stabilized in 16.720773ms

Information

  • Skaffold version: v1.24.0
  • Operating system: linux on CI
  • Installed via: skaffold.dev
  • Contents of skaffold.yaml:
apiVersion: skaffold/v2alpha1
kind: Config
metadata:
  name: spending-api
build:
  tagPolicy:
    gitCommit:
      variant: AbbrevCommitSha
  artifacts:
    - image: XXXX
      docker:
        dockerfile: XXXX
    - image: XXXX
      docker:
        dockerfile: XXXX
deploy:
  statusCheckDeadlineSeconds: 600
  kubectl:
    manifests:
      - k8s-manifest.yaml
@tejal29 tejal29 added priority/awaiting-more-evidence Lowest Priority. May be useful, but there is not yet enough supporting evidence. kind/regression labels Jun 8, 2021
@tejal29
Contributor

tejal29 commented Jun 8, 2021

@ananyasaxena Can you please add some trace logs?
Could it be that your application stabilized quickly?

@tejal29 tejal29 self-assigned this Jun 8, 2021
@tejal29 tejal29 added the needs-reproduction needs reproduction from the maintainers to validate the issue is truly a skaffold bug label Jun 8, 2021
@tejal29 tejal29 added this to the v1.27.0 milestone Jun 8, 2021
@ananyasaxena
Author

ananyasaxena commented Jun 8, 2021

@tejal29

@ananyasaxena Can you please add some trace logs?


skaffold deploy --kubeconfig ~/.kube/config --kube-context stage  --build-artifacts skaffold-tags.json --verbosity trace
INFO[0000] Skaffold &{Version:v1.24.0 ConfigVersion:skaffold/v2beta16 GitVersion: GitCommit:XXXX BuildDate:2021-05-11T22:51:04Z GoVersion:go1.14.14 Compiler:gc Platform:linux/amd64 User:} 
INFO[0000] Loaded Skaffold defaults from "/home/circleci/.skaffold/config" 
DEBU[0000] config version out of date: upgrading to latest "skaffold/v2beta16" 
DEBU[0000] parsed 1 configs from configuration file /home/circleci/project/skaffold.yaml 
DEBU[0000] Defaulting build type to local build         
INFO[0000] Activated kube-context "stage"               
TRAC[0000] validating yamltags of struct SkaffoldConfig 
TRAC[0000] validating yamltags of struct Metadata       
TRAC[0000] validating yamltags of struct Pipeline       
TRAC[0000] validating yamltags of struct BuildConfig    
TRAC[0000] validating yamltags of struct Artifact       
TRAC[0000] validating yamltags of struct ArtifactType   
TRAC[0000] validating yamltags of struct DockerArtifact 
TRAC[0000] validating yamltags of struct Artifact       
TRAC[0000] validating yamltags of struct ArtifactType   
TRAC[0000] validating yamltags of struct DockerArtifact 
TRAC[0000] validating yamltags of struct TagPolicy      
TRAC[0000] validating yamltags of struct GitTagger      
TRAC[0000] validating yamltags of struct BuildType      
TRAC[0000] validating yamltags of struct LocalBuild     
TRAC[0000] validating yamltags of struct DeployConfig   
TRAC[0000] validating yamltags of struct DeployType     
TRAC[0000] validating yamltags of struct KubectlDeploy  
TRAC[0000] validating yamltags of struct KubectlFlags   
TRAC[0000] validating yamltags of struct LogsConfig     
INFO[0000] Using kubectl context: stage                 
DEBU[0000] Running command: [minikube version --output=json] 
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH 
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH 
DEBU[0000] setting Docker user agent to skaffold-v1.24.0 
DEBU[0000] Using builder: local                         
DEBU[0000] push value not present in NewBuilder, defaulting to true because cluster.PushImages is true 
INFO[0000] build concurrency first set to 1 parsed from *local.Builder[0] 
INFO[0000] final build concurrency value is 1           
Tags used in deployment:
 - spending-api-db-migrations -> ********************************************/XXXX
 - spending-api -> ********************************************/XXXX
DEBU[0000] push value not present in isImageLocal(), defaulting to true because cluster.PushImages is true 
DEBU[0000] push value not present in isImageLocal(), defaulting to true because cluster.PushImages is true 
DEBU[0000] getting client config for kubeContext: `stage` 
TRAC[0000] latest skaffold version: v1.25.0             
Starting deploy...
DEBU[0000] Running command: [kubectl version --client -ojson] 
DEBU[0000] Command output: [{
  "clientVersion": {
    "major": "1",
    "minor": "21",
    "gitVersion": "v1.21.1",
    "gitCommit": "XXXX",
    "gitTreeState": "clean",
    "buildDate": "2021-05-12T14:18:45Z",
    "goVersion": "go1.16.4",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
] 
DEBU[0000] Running command: [kubectl --context stage --kubeconfig /home/circleci/.kube/config create --dry-run=client -oyaml -f /home/circleci/project/k8s-manifest.yaml] 
DEBU[0002] Command output: [apiVersion: v1
kind: Service
metadata:
...........
] 
DEBU[0002] manifests with tagged images: apiVersion: v1
kind: Service
......
DEBU[0002] Running command: [kubectl --context stage --kubeconfig /home/circleci/.kube/config get -f - --ignore-not-found -ojson] 
DEBU[0003] Command output: [{
    "apiVersion": "v1",
    "items": [
        {
.......
] 
DEBU[0003] 6 manifests to deploy. 6 are updated or new  
DEBU[0003] Running command: [kubectl --context stage --kubeconfig /home/circleci/.kube/config apply -f -] 
 - XXXX configured
 - XXXX configured
 - XXXX configured
 - XXXX configured
 - XXXX created
 - XXXX configured
INFO[0004] Deploy completed in 4.081 seconds            
Waiting for deployments to stabilize...
DEBU[0004] getting client config for kubeContext: `stage` 
Deployments stabilized in 19.143569ms
There is a new version (1.25.0) of Skaffold available. Download it from:
  https://github.com/GoogleContainerTools/skaffold/releases/tag/v1.25.0

Help improve Skaffold with our 2-minute anonymous survey: run 'skaffold survey'
To help improve the quality of this product, we collect anonymized usage data for details on what is tracked and how we use this data visit <https://skaffold.dev/docs/resources/telemetry/>. This data is handled in accordance with our privacy policy <https://policies.google.com/privacy>

You may choose to opt out of this collection by running the following command:
	skaffold config set --global collect-metrics false

Could it be that your application stabilized quickly?

Nope, I can see it taking time on the k8s dashboard and also using v1.23.0 waits correctly for it to stabilize
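
The symptom reported above (a "Deployments stabilized in ..." line showing only a few milliseconds) can be caught automatically in CI. The following is a minimal sketch, not part of Skaffold: the function names and the one-second threshold are assumptions chosen for illustration, and the parser only understands the duration spellings that appear in the logs in this thread ("16.720773ms", "21.166 seconds", "2 minutes 43.697 seconds").

```python
import re

# Summary line printed after the status check, as seen in the logs in this thread.
_STABILIZED = re.compile(r"Deployments stabilized in (?P<duration>.+)$")

# One "<number> <unit>" component of the duration.
_PART = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|minutes?|seconds?)")

# Unit spellings observed in this thread, converted to seconds.
_UNIT_SECONDS = {
    "ms": 0.001,
    "second": 1.0,
    "seconds": 1.0,
    "minute": 60.0,
    "minutes": 60.0,
}


def stabilized_seconds(line):
    """Return the reported stabilization time in seconds.

    Returns None when the line is not a summary line or uses a
    duration spelling this sketch does not know about.
    """
    match = _STABILIZED.search(line.strip())
    if match is None:
        return None
    parts = _PART.findall(match.group("duration"))
    if not parts:
        return None
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def looks_skipped(line, threshold_seconds=1.0):
    """Flag a summary line whose duration is implausibly short.

    threshold_seconds is an arbitrary assumption; tune it per project.
    """
    seconds = stabilized_seconds(line)
    return seconds is not None and seconds < threshold_seconds
```

A CI step could pipe the skaffold deploy output through `looks_skipped` line by line and fail the job when it returns True, which would have surfaced the regression described here without relying on the Kubernetes dashboard.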

@gsquared94
Contributor

@ananyasaxena is it possible to provide a small example to reproduce your issue? I think PR #6010 might help here, but I need a way to test it. Otherwise, you can build my branch from source, or wait for it to be merged and use the master branch build to verify whether that fixes it.

@ananyasaxena
Author

@gsquared94 I'll try building from your branch. If I run into any challenges, I'll just wait for the merge and release to test this out and report back.

@gsquared94
Contributor

gsquared94 commented Jun 15, 2021

@ananyasaxena my PR was merged so you can also try with the bleeding edge version (https://skaffold.dev/docs/install/)
For macOS:

curl -Lo skaffold https://storage.googleapis.com/skaffold/builds/latest/skaffold-darwin-amd64 && chmod +x skaffold && sudo mv skaffold /usr/local/bin

@tejal29
Contributor

tejal29 commented Jun 22, 2021

@ananyasaxena is this still an issue?

@ananyasaxena
Author

@gsquared94 @tejal29 I used https://storage.googleapis.com/skaffold/builds/latest/skaffold-linux-amd64 but still ran into the same issue

Waiting for deployments to stabilize...
Deployments stabilized in 24.511647ms

@tejal29
Contributor

tejal29 commented Jun 24, 2021

Thanks for confirming @ananyasaxena. I will look into this

@tejal29 tejal29 removed the priority/awaiting-more-evidence Lowest Priority. May be useful, but there is not yet enough supporting evidence. label Jun 24, 2021
@tejal29 tejal29 removed this from the v1.27.0 milestone Jul 16, 2021
@somnistudio

somnistudio commented Jul 17, 2021

Same problem here

Waiting for deployments to stabilize...
Deployments stabilized in 3.306427ms

after upgrading from v1.27.0 to v1.28.0, on macOS.

@nkubala nkubala added priority/p1 High impact feature/bug. area/status-check labels Aug 2, 2021
@nkubala
Contributor

nkubala commented Aug 2, 2021

@ananyasaxena @somnistudio we haven't been seeing this in any of our testing, so it seems like this might be specific to your project setups. Would either of you be able and willing to provide a small sample project for us to reproduce this issue?

@cmdjulian

cmdjulian commented Sep 7, 2021

I'm facing the same issue with v1.30.0 and a kustomize deployment.

apiVersion: skaffold/v2beta5
kind: Config
profiles:
  - name: dev-skaffold
    build:
      tagPolicy:
        sha256: { }
      artifacts:
        - image: someDevRegistryWithRepo
          buildpacks:
            builder: paketobuildpacks/builder:base
    deploy:
      kustomize:
        paths: [ k8s/overlays/dev-skaffold ]

  - name: dev-cluster
    deploy:
      kustomize:
        paths: [ k8s/overlays/dev-cluster ]

skaffold -p dev-cluster deploy --images=registry.gitlab.com/XXX --tag=0.7.0 --status-check

Which yields:

Tags used in deployment:
 - registry.gitlab.com/XXX -> registry.gitlab.com/XXX:0.7.0
Starting deploy...
 - configmap/config-4mcf2g2ghh created
 - service/svc created
 - deployment.apps/deployment created
 - ingress.networking.k8s.io/ingress created
Waiting for deployments to stabilize...
Deployments stabilized in 28.180575ms

When using run instead of deploy, everything works as expected.
It then yields the following:

skaffold -p dev-cluster run
Generating tags...
Checking cache...
Starting test...
Tags used in deployment:
Starting deploy...
 - configmap/config-4mcf2g2ghh configured
 - service/svc configured
 - deployment.apps/deployment configured
 - ingress.networking.k8s.io/ingress configured
Waiting for deployments to stabilize...
 - default:deployment/deployment: waiting for rollout to finish: 1 old replicas are pending termination...
 - default:deployment/deployment is ready.
Deployments stabilized in 21.166 seconds
You can also run [skaffold run --tail] to get the logs

I also wonder if there is an option to silence the "You can also run [skaffold run --tail] to get the logs" line, because setting --tail=false doesn't silence it.

@briandealwis
Member

@cmdjulian could you provide a log running with -vtrace (suitably redacted)?

@cmdjulian

Sure, the log yields the following for skaffold -p dev-cluster deploy --images=registry.gitlab.com/XXX --tag=0.7.0 --status-check=true:

INFO[0000] Loaded Skaffold defaults from "!!REDACTED!!" 
DEBU[0000] config version out of date: upgrading to latest "skaffold/v2beta21" 
DEBU[0000] parsed 1 configs from configuration file !!REDACTED!!/skaffold.yaml 
INFO[0000] applying profile: dev-cluster                    
DEBU[0000] overlaying profile on config for field Build 
DEBU[0000] overlaying profile on config for field artifacts 
DEBU[0000] overlaying profile on config for field insecureRegistries 
DEBU[0000] overlaying profile on config for field tagPolicy 
INFO[0000] no values found in profile for field TagPolicy, using original config values 
DEBU[0000] overlaying profile on config for field BuildType 
INFO[0000] no values found in profile for field BuildType, using original config values 
DEBU[0000] overlaying profile on config for field Test  
DEBU[0000] overlaying profile on config for field Deploy 
DEBU[0000] overlaying profile on config for field DeployType 
DEBU[0000] overlaying profile on config for field -     
DEBU[0000] overlaying profile on config for field helm  
DEBU[0000] overlaying profile on config for field kpt   
DEBU[0000] overlaying profile on config for field kubectl 
DEBU[0000] overlaying profile on config for field kustomize 
DEBU[0000] overlaying profile on config for field statusCheck 
DEBU[0000] overlaying profile on config for field statusCheckDeadlineSeconds 
DEBU[0000] overlaying profile on config for field kubeContext 
DEBU[0000] overlaying profile on config for field logs  
DEBU[0000] overlaying profile on config for field prefix 
DEBU[0000] overlaying profile on config for field PortForward 
DEBU[0000] Defaulting build type to local build         
TRAC[0000] validating yamltags of struct SkaffoldConfig 
TRAC[0000] validating yamltags of struct Metadata       
TRAC[0000] validating yamltags of struct Pipeline       
TRAC[0000] validating yamltags of struct BuildConfig    
TRAC[0000] validating yamltags of struct TagPolicy      
TRAC[0000] validating yamltags of struct GitTagger      
TRAC[0000] validating yamltags of struct BuildType      
TRAC[0000] validating yamltags of struct LocalBuild     
TRAC[0000] validating yamltags of struct DeployConfig   
TRAC[0000] validating yamltags of struct DeployType     
TRAC[0000] validating yamltags of struct KustomizeDeploy 
TRAC[0000] validating yamltags of struct KubectlFlags   
TRAC[0000] validating yamltags of struct DeployHooks    
TRAC[0000] validating yamltags of struct LogsConfig     
INFO[0000] Using kubectl context: !!REDACTED!!                  
DEBU[0000] Running command: [minikube version --output=json] 
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH 
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH 
DEBU[0000] setting Docker user agent to skaffold-v1.30.0 
DEBU[0000] Using builder: local                         
DEBU[0000] push value not present in NewBuilder, defaulting to true because cluster.PushImages is true 
INFO[0000] build concurrency first set to 0 parsed from *local.Builder[0] 
INFO[0000] final build concurrency value is 0           
Tags used in deployment:
 - registry.gitlab.com/XXX -> registry.gitlab.com/XXX:0.7.1
DEBU[0000] push value not present in isImageLocal(), defaulting to true because cluster.PushImages is true 
Starting deploy...
DEBU[0000] getting client config for kubeContext: `!!REDACTED!!` 
DEBU[0000] Running command: [kubectl version --client -ojson] 
TRAC[0000] latest skaffold version: v1.31.0             
DEBU[0000] Command output: [{
  "clientVersion": {
    "major": "1",
    "minor": "21",
    "gitVersion": "v1.21.3",
    "gitCommit": "ca643a4d1f7bfe34773c74f79527be4afd95bf39",
    "gitTreeState": "archive",
    "buildDate": "2021-07-16T17:16:46Z",
    "goVersion": "go1.16.5",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
] 
DEBU[0000] Running command: [kustomize build k8s/overlays/dev-cluster] 
DEBU[0000] Command output: 
!!REDACTED MANIFESTS!!
DEBU[0000] Running command: [kubectl --context !!REDACTED!! get -f - --ignore-not-found -ojson] 
DEBU[0001] Command output: []                           
DEBU[0001] 4 manifests to deploy. 4 are updated or new  
DEBU[0001] Running command: [kubectl --context !!REDACTED!! apply -f -] 
 - configmap/config-4mcf2g2ghh created
 - service/svc created
 - deployment.apps/deployment created
 - ingress.networking.k8s.io/ingress created
INFO[0001] Deploy completed in 1.788 second             
Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!` 
Deployments stabilized in 15.963637ms
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!` 

DEBU[0001] exporting metrics

@cmdjulian

For the run command I'm seeing the following:

nearly same output
...
INFO[0001] Deploy completed in 1.149 second             
Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!` 
DEBU[0001] checking status default:deployment/deployment 
DEBU[0002] Running command: [kubectl --context !!REDACTED!! rollout status deployment deployment --namespace default --watch=false] 
DEBU[0002] Command output: [Waiting for deployment "deployment" rollout to finish: 0 of 1 updated replicas are available...
...
loops over and over again
...
DEBU[0021] Pod "deployment-75c9dd798b-znm2m" scheduled but not ready: checking container statuses 
DEBU[0021] Fetching events for pod "deployment-75c9dd798b-znm2m" 
DEBU[0022] Running command: [kubectl --context !!REDACTED!! rollout status deployment deployment --namespace !!REDACTED!! --watch=false] 
DEBU[0022] Command output: [deployment "deployment" successfully rolled out
] 
DEBU[0022] Fetching events for pod "deployment-75c9dd798b-znm2m" 
 - default:deployment/deployment is ready.
Deployments stabilized in 21.183 seconds
DEBU[0022] getting client config for kubeContext: `!!REDACTED!!` 
You can also run [skaffold run --tail] to get the logs
WARN[0022] got unexpected event of type ERROR           

DEBU[0022] exporting metrics

@cmdjulian

When using the latest version from https://storage.googleapis.com/skaffold/builds/latest/skaffold-linux-amd64 I'm also seeing the following for the deploy task:

Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!`  subtask=-1 task=DevLoop
Deployments stabilized in 18.632405ms
INFO[0001] Deploy completed in 1.064 second              subtask=-1 task=Deploy
Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!`  subtask=-1 task=DevLoop
Deployments stabilized in 2.88494ms
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!`  subtask=-1 task=DevLoop
WARN[0001] got unexpected event of type ERROR            subtask=-1 task=DevLoop

DEBU[0001] exporting metrics                             subtask=-1 task=DevLoop
DEBU[0001] metrics uploading complete in 804.018095ms    subtask=-1 task=DevLoop

I'm using k3s if this helps and my kustomize version is {Version:4.2.0 GitCommit:$Format:%H$ BuildDate:2021-07-22T22:12:15Z GoOs:linux GoArch:amd64}

@simonjpartridge

I'm also experiencing this problem using skaffold 1.32.0.

My deployments report being stabilised in a few milliseconds when using skaffold deploy even when the deployments haven't rolled out yet and kubectl rollout status still reports "waiting for rollout to finish".

Strangely skaffold run has the correct behaviour and waits for the deployments to properly stabilise (takes about 20s for us) before reporting success. A failing deployment correctly reports an error when using skaffold run but reports all successful when using skaffold deploy.

Using kubernetes v1.20.9, istio 1.11.2, and kustomize 4.3
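
Until a fixed release is available, one possible workaround for the behaviour described above is to follow skaffold deploy with an explicit rollout check. The sketch below is an assumption-laden illustration, not Skaffold's implementation: it reuses the kubectl rollout status ... --watch=false invocation and the "successfully rolled out" message that appear in the trace logs earlier in this thread, and the 600-second default mirrors the statusCheckDeadlineSeconds value from the original report. The function names, the polling interval, and the injectable run, sleep, and clock parameters are hypothetical.

```python
import subprocess
import time


def rollout_status_command(name, namespace="default", context=None):
    """Build the kubectl invocation seen in the trace logs in this thread."""
    cmd = ["kubectl"]
    if context:
        cmd += ["--context", context]
    cmd += ["rollout", "status", "deployment", name,
            "--namespace", namespace, "--watch=false"]
    return cmd


def is_rolled_out(output):
    """True when kubectl reports the rollout as finished."""
    return "successfully rolled out" in output


def _run(cmd):
    # Runs kubectl and returns its stdout; a non-zero exit code is not raised.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def wait_for_rollout(name, namespace="default", context=None,
                     deadline_seconds=600, interval_seconds=2.0,
                     run=_run, sleep=time.sleep, clock=time.monotonic):
    """Poll the rollout status until it succeeds or the deadline passes.

    Returns True on success and False on timeout.
    """
    cmd = rollout_status_command(name, namespace, context)
    deadline = clock() + deadline_seconds
    while True:
        if is_rolled_out(run(cmd)):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

Calling `wait_for_rollout("deployment", namespace="default", context="stage")` after the deploy step would block until the rollout finishes, and a False return can be turned into a failing exit code in CI.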

@simonjpartridge

This appears to have been fixed for me in 1.33.0. I suspect #6674 was the change that fixed it. Thanks :)

@cmdjulian

Hey @simonjpartridge, after updating to the newest version 1.33.0 I see normal deployment times. It appears to be fixed for me as well.
Thanks

@gsquared94
Contributor

Closing for now; please comment to reopen if it recurs.

@afallou

afallou commented Oct 24, 2022

Seeing this issue again with the Skaffold 2.0 release. Pinning to v1.39.2 fixes it.

@aaron-prindle aaron-prindle reopened this Nov 29, 2022
@aaron-prindle aaron-prindle added this to the v2.1.0 milestone Nov 29, 2022
@aaron-prindle aaron-prindle assigned gsquared94 and unassigned tejal29 Dec 1, 2022
@aaron-prindle
Contributor

aaron-prindle commented Dec 14, 2022

@afallou can you add some more information on the skaffold.yaml used when you encountered this (esp. what k8s objects were deployed, such as Deployment, StatefulSet, etc., and what deployer was used)? Also, have you been able to try v2.0.3, and is the issue still present there? Thanks

@aaron-prindle
Contributor

Here is a sample deployment that this seems to occur for:

# skaffold apply -f skaffold.yaml manifest.yaml 
Starting deploy...
 - deployment.apps/blah-deployment created
Waiting for deployments to stabilize...
Deployments stabilized in 307.114876ms
# kubectl get deployments
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
blah-deployment                       0/3     3            0           13s

apiVersion: apps/v1
kind: Deployment
metadata:
 labels:
   app: blah
   skaffold.dev/run-id: static
 name: blah-deployment
spec:
 replicas: 3
 selector:
   matchLabels:
     app: blah
 strategy:
   rollingUpdate:
     maxSurge: 1
     maxUnavailable: 0
   type: RollingUpdate
 template:
   metadata:
     labels:
       app: blah
   spec:
     containers:
     - image: us-east1-docker.pkg.dev/sample-app/sample-repo/hello-app:556538f3-0569-430e-9856-4ca8ed770646
       imagePullPolicy: Always
       name: hello-app
       readinessProbe:
         initialDelaySeconds: 10
         periodSeconds: 30
         tcpSocket:
           port: 80
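
The mismatch in the sample above (Skaffold reports stabilization while kubectl get deployments still shows READY 0/3) can be checked mechanically when collecting a reproduction. This is a minimal sketch that assumes the default column layout of kubectl get deployments shown above; the function name unready_deployments is hypothetical and the parsing is deliberately naive.

```python
def unready_deployments(table):
    """Return the names of deployments whose READY column is below the desired count.

    `table` is the plain-text output of `kubectl get deployments`,
    including its header row.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    ready_idx = header.index("READY")
    unready = []
    for line in lines[1:]:
        cols = line.split()
        # READY is printed as "<ready>/<desired>", for example "0/3".
        ready, desired = (int(part) for part in cols[ready_idx].split("/"))
        if ready < desired:
            unready.append(cols[name_idx])
    return unready
```

Run against the output above, a non-empty result immediately after a "Deployments stabilized" message is evidence that the status check returned before the rollout completed.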

@gsquared94
Contributor

I could not repro this issue. I tried against pods and deployments and this app mentioned above.

I tried against skaffold main branch, along with v2.0.2 and v2.0.3, and against minikube and GKE clusters. It is possible that this regression existed in v2.0.0 release but I think that release has been archived and the earliest available version is now v2.0.2.

Closing it again. To reopen, please provide the exact Kubernetes manifest, with a prebuilt image that I can pull and run as a repro.

@renzodavid9
Contributor

Making triage party happy
