
Skaffold dev uses older deployment pod state as new deployment error #4947


Closed

taisph opened this issue Oct 26, 2020 · 9 comments
Labels
area/status-check · kind/bug · needs-reproduction · priority/p3

Comments

@taisph

taisph commented Oct 26, 2020

Expected behavior

While skaffold dev is running, a new deployment should not be marked as failed if a previous deployment is in backoff state or exits with an error when terminated.

Actual behavior

As seen below, pod/my-service-v1-6d576c8f74-48qhc is the new pod created by the latest deployment cycle, while the pods from the two previous attempts are in backoff. Kubernetes terminates those older pods as soon as the new pod enters the running state, as can be seen in the kubectl output below (a short sketch after that output shows how to verify which ReplicaSet owns each pod). Skaffold nevertheless counts their backoff state against the new deployment and marks it as failed.

Starting deploy...
 - deployment.apps/my-service-v1 configured
Waiting for deployments to stabilize...
 - deployment/my-service-mydb-v1 is ready. [1/2 deployment(s) still pending]
 - deployment/my-service-v1: container mycontainer is backing off waiting to restart
    - pod/my-service-v1-6bb8d85cd6-t527f: container mycontainer is backing off waiting to restart
      > Error retrieving logs for pod my-service-v1-6bb8d85cd6-t527f. Try `kubectl logs my-service-v1-6bb8d85cd6-t527f -n default -c mycontainer`
    - pod/my-service-v1-6d576c8f74-48qhc: creating container mycontainer
    - pod/my-service-v1-9d46d76b5-l6279: container mycontainer is backing off waiting to restart
      > Error retrieving logs for pod my-service-v1-9d46d76b5-l6279. Try `kubectl logs my-service-v1-9d46d76b5-l6279 -n default -c mycontainer`
 - deployment/my-service-v1 failed. Error: container mycontainer is backing off waiting to restart.
WARN[1832] Skipping deploy due to error: 1/2 deployment(s) failed 
Watching for changes...
# kubectl --context kind-project get pods
NAME                                 READY   STATUS        RESTARTS   AGE
my-service-mydb-v1-85554dc6b-zpxkr   1/1     Running       0          30m
my-service-v1-6bb8d85cd6-t527f       0/1     Terminating   6          6m13s
my-service-v1-6d576c8f74-48qhc       1/1     Running       0          5s
my-service-v1-9d46d76b5-l6279        0/1     Terminating   6          7m59s
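
For completeness, a sketch of how one could confirm that the two terminating pods belong to older ReplicaSets rather than the new one (pod name taken from the output above):

# Sketch only: list the ReplicaSets, then read the owner of one backing-off pod.
kubectl --context kind-project get replicasets
kubectl --context kind-project get pod my-service-v1-6bb8d85cd6-t527f \
  -o jsonpath='{.metadata.ownerReferences[0].name}'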

Information

  • Skaffold version: v1.14.0
  • Operating system: Ubuntu 20.04.1 LTS
  • Contents of skaffold.yaml:
apiVersion: skaffold/v2beta7
kind: Config

build:
  artifacts:
  - image: project-my-service
    context: .
    docker:
      dockerfile: build/Dockerfile

deploy:
  kustomize:
    paths:
    - deployments/my_service
  kubeContext: kind-project
  statusCheckDeadlineSeconds: 300

profiles:
- name: development
  activation:
  - command: dev
  deploy:
    kustomize:
      paths:
      - deployments/my_service
      - deployments/dev
    kubeContext: kind-project

portForward:
- resourceType: service
  resourceName: my-service
  port: 8080
tejal29 added the area/status-check, kind/bug, and priority/p2 labels on Oct 26, 2020
@tejal29
Contributor

tejal29 commented Oct 27, 2020

Skaffold fetches pods and services based on a label that is new for every dev iteration. This should not happen and needs some investigation.
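
For anyone trying to reproduce this: Skaffold stamps deployed resources with a skaffold.dev/run-id label, so a quick way to see which dev iteration the lingering pods belong to could look like this (a sketch only; the kube context name is borrowed from the report above):

# Sketch: show the run-id label on each pod to compare old and new iterations.
kubectl --context kind-project get pods -L skaffold.dev/run-id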

tejal29 added the needs-reproduction label on Oct 27, 2020
@Jrenk

Jrenk commented Dec 2, 2020

I'm experiencing the same behavior with skaffold dev.

When Skaffold tries to update one of my pods, the deployment fails with the following error message:

 - deployment/router: creating container router
    - pod/router-dc9d8c748-qhpkf: creating container router
    - pod/router-76bf54f974-xcgs8: container router terminated with exit code 2
      > Error retrieving logs for pod router-76bf54f974-xcgs8. Try `kubectl logs router-76bf54f974-xcgs8 -n default -c router`
    - pod/router-dc9d8c748-rjcjb: creating container router
 - deployment/router failed. Error: creating container router.
WARN[0105] Skipping deploy due to error: 1/5 deployment(s) failed 

My whole application becomes unresponsive at this point.
The new pod is created successfully and the old one is terminated, but Skaffold does not recover from this error and needs a restart before the dev loop works again.

Information

  • Skaffold version: v1.17.0
  • Operating system: Ubuntu 18.04.5 LTS

@briandealwis
Member

I saw this same behaviour with kind: I had deployed the getting-started into a different namespace, and yet it was being latched onto by Skaffold. Will try to reproduce.

nkubala added the priority/p3 label and removed the priority/p2 label on Feb 12, 2021
@wojtek-viirtue

I'm seeing the same behavior in 1.20.0. Has anyone come across any workarounds? Currently the only thing I can do is kill Skaffold and relaunch, at which point the deployment is detected as stabilized (once the new pod is spun up). That defeats the purpose of dev mode.

@pot-code

Similar issue, but in my case the old deployment starves my CPU resources because it isn't pruned while Skaffold is creating the new one.

@briandealwis
Member

@pot-code just to be clear, the deployment management is performed by Kubernetes, not Skaffold. It sounds like you should look at the Deployment Recreate strategy.
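
A minimal sketch of what that could look like (the Deployment name is borrowed from the original report for illustration; the rest of the spec is unchanged):

# Sketch only: Recreate terminates the old pods before new ones are created,
# so their CPU requests are released first.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-v1
spec:
  strategy:
    type: Recreate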

@briandealwis
Member

There have been a number of improvements to Skaffold's status checking since this issue was first opened. In particular, Skaffold changed its default status check timeout in v1.18.0 to 10 minutes to match Kubernetes' default (#5247).
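
If you need a different deadline, it can still be set explicitly in skaffold.yaml; a sketch mirroring the field already used in the report above:

deploy:
  statusCheckDeadlineSeconds: 600  # e.g. match the new 10-minute default explicitly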

I'm going to close this issue: if you're seeing errors relating to redeploys then please open a new issue with details to reproduce.

@taisph
Author

taisph commented Jun 8, 2021

This is still an issue with skaffold v1.25.0. I'll see if I can find time to create a new issue.

@taisph
Author

taisph commented Jul 14, 2021

This seems to be fixed in skaffold v1.27.0. Thank you. ❤️
