
Skaffold dev uses older deployment pod state as new deployment error #4947


Closed

taisph opened this issue Oct 26, 2020 · 9 comments
Labels
area/status-check · kind/bug · needs-reproduction · priority/p3

Comments

@taisph

taisph commented Oct 26, 2020

Expected behavior

While skaffold dev is running, a new deployment should not be marked as failed if a previous deployment is in backoff state or exits with an error when terminated.

Actual behavior

As seen below, pod/my-service-v1-6d576c8f74-48qhc is the new pod created by the latest deployment cycle, while the pods from the two previous attempts are in backoff. Kubernetes terminates those older pods as soon as the new pod enters the running state, as can be seen in the kubectl output below (a short sketch after that output shows how to verify which ReplicaSet owns each pod). Skaffold nevertheless counts their backoff state against the new deployment and marks it as failed.

Starting deploy...
 - deployment.apps/my-service-v1 configured
Waiting for deployments to stabilize...
 - deployment/my-service-mydb-v1 is ready. [1/2 deployment(s) still pending]
 - deployment/my-service-v1: container mycontainer is backing off waiting to restart
    - pod/my-service-v1-6bb8d85cd6-t527f: container mycontainer is backing off waiting to restart
      > Error retrieving logs for pod my-service-v1-6bb8d85cd6-t527f. Try `kubectl logs my-service-v1-6bb8d85cd6-t527f -n default -c mycontainer`
    - pod/my-service-v1-6d576c8f74-48qhc: creating container mycontainer
    - pod/my-service-v1-9d46d76b5-l6279: container mycontainer is backing off waiting to restart
      > Error retrieving logs for pod my-service-v1-9d46d76b5-l6279. Try `kubectl logs my-service-v1-9d46d76b5-l6279 -n default -c mycontainer`
 - deployment/my-service-v1 failed. Error: container mycontainer is backing off waiting to restart.
WARN[1832] Skipping deploy due to error: 1/2 deployment(s) failed 
Watching for changes...
# kubectl --context kind-project get pods
NAME                                 READY   STATUS        RESTARTS   AGE
my-service-mydb-v1-85554dc6b-zpxkr   1/1     Running       0          30m
my-service-v1-6bb8d85cd6-t527f       0/1     Terminating   6          6m13s
my-service-v1-6d576c8f74-48qhc       1/1     Running       0          5s
my-service-v1-9d46d76b5-l6279        0/1     Terminating   6          7m59s
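
For completeness, a sketch of how one could confirm that the two terminating pods belong to older ReplicaSets rather than the new one (pod name taken from the output above):

# Sketch only: list the ReplicaSets, then read the owner of one backing-off pod.
kubectl --context kind-project get replicasets
kubectl --context kind-project get pod my-service-v1-6bb8d85cd6-t527f \
  -o jsonpath='{.metadata.ownerReferences[0].name}'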

Information

  • Skaffold version: v1.14.0
  • Operating system: Ubuntu 20.04.1 LTS
  • Contents of skaffold.yaml:
apiVersion: skaffold/v2beta7
kind: Config

build:
  artifacts:
  - image: project-my-service
    context: .
    docker:
      dockerfile: build/Dockerfile

deploy:
  kustomize:
    paths:
    - deployments/my_service
  kubeContext: kind-project
  statusCheckDeadlineSeconds: 300

profiles:
- name: development
  activation:
  - command: dev
  deploy:
    kustomize:
      paths:
      - deployments/my_service
      - deployments/dev
    kubeContext: kind-project

portForward:
- resourceType: service
  resourceName: my-service
  port: 8080
tejal29 added the area/status-check, kind/bug, and priority/p2 labels on Oct 26, 2020
@tejal29
Contributor

tejal29 commented Oct 27, 2020

Skaffold fetches pods and services based on a label that is new for every dev iteration. This should not happen and needs some investigation.
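
For anyone trying to reproduce this: Skaffold stamps deployed resources with a skaffold.dev/run-id label, so a quick way to see which dev iteration the lingering pods belong to could look like this (a sketch only; the kube context name is borrowed from the report above):

# Sketch: show the run-id label on each pod to compare old and new iterations.
kubectl --context kind-project get pods -L skaffold.dev/run-id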

tejal29 added the needs-reproduction label on Oct 27, 2020
@Jrenk

Jrenk commented Dec 2, 2020

I'm experiencing the same behavior with skaffold dev.

When Skaffold tries to update one of my pods, the deployment fails with the following error message:

 - deployment/router: creating container router
    - pod/router-dc9d8c748-qhpkf: creating container router
    - pod/router-76bf54f974-xcgs8: container router terminated with exit code 2
      > Error retrieving logs for pod router-76bf54f974-xcgs8. Try `kubectl logs router-76bf54f974-xcgs8 -n default -c router`
    - pod/router-dc9d8c748-rjcjb: creating container router
 - deployment/router failed. Error: creating container router.
WARN[0105] Skipping deploy due to error: 1/5 deployment(s) failed 

My whole application becomes unresponsive at this point.
The new pod is created successfully and the old one is terminated, but Skaffold does not recover from this error and needs a restart before the dev loop works again.

Information

  • Skaffold version: v1.17.0
  • Operating system: Ubuntu 18.04.5 LTS

@briandealwis
Member

I saw this same behaviour with kind: I had deployed the getting-started into a different namespace, and yet it was being latched onto by Skaffold. Will try to reproduce.

nkubala added the priority/p3 label and removed the priority/p2 label on Feb 12, 2021
@wojtek-viirtue

I'm seeing the same behavior in 1.20.0. Has anyone come across any workarounds? Currently the only thing I can do is kill Skaffold and relaunch, at which point the deployment is detected as stabilized (once the new pod is spun up). That defeats the purpose of dev mode.

@pot-code

Similar issue, but in my case the old deployment starves my CPU resources because it isn't pruned while Skaffold is creating the new one.

@briandealwis
Member

@pot-code just to be clear, the deployment management is performed by Kubernetes, not Skaffold. It sounds like you should look at the Deployment Recreate strategy.
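
A minimal sketch of what that could look like (the Deployment name is borrowed from the original report for illustration; the rest of the spec is unchanged):

# Sketch only: Recreate terminates the old pods before new ones are created,
# so their CPU requests are released first.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-v1
spec:
  strategy:
    type: Recreate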

@briandealwis
Member

There have been a number of improvements to Skaffold's status checking since this issue was first opened. In particular, Skaffold changed its default status check timeout in v1.18.0 to 10 minutes to match Kubernetes' default (#5247).
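
If you need a different deadline, it can still be set explicitly in skaffold.yaml; a sketch mirroring the field already used in the report above:

deploy:
  statusCheckDeadlineSeconds: 600  # e.g. match the new 10-minute default explicitly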

I'm going to close this issue: if you're seeing errors relating to redeploys then please open a new issue with details to reproduce.

@taisph
Author

taisph commented Jun 8, 2021

This is still an issue with skaffold v1.25.0. I'll see if I can find time to create a new issue.

@taisph
Author

taisph commented Jul 14, 2021

This seems to be fixed in skaffold v1.27.0. Thank you. ❤️
