Clean up all task/job executions does not clean up tasks on unknown state #6110

Open
juanpablo-santos opened this issue Feb 19, 2025 · 4 comments
Labels
type/enhancement Is an enhancement request

Comments

@juanpablo-santos

Description:
Not sure whether this is a bug or an improvement request. Currently, the "Clean up all task/job executions" menu option in the Tools section requires the pods associated with an execution to be present on the cluster where they were run in order to remove the task execution from the SCDF database.

In our case, our platform team runs a pipeline on every k8s cluster that wipes every pod that has been in a finished/error state for more than 6 hours, so when we run 'Clean up all task/job executions', any execution whose pod is no longer present on the cluster doesn't get deleted. The cleanup process tries to fetch the pod (to delete it, I presume), raises an exception that appears in the SCDF server logs, and then carries on with the next execution.

The current workaround is to manually delete the rows at the database level.

Release versions:

{
  "versions": {
    "implementation": {
      "name": "spring-cloud-dataflow-server",
      "version": "2.11.5"
    },
    "core": {
      "name": "Spring Cloud Data Flow Core",
      "version": "2.11.5"
    },
    "dashboard": {
      "name": "Spring Cloud Dataflow UI",
      "version": "3.4.6"
    },
    "shell": {
      "name": "Spring Cloud Data Flow Shell",
      "version": "2.11.5",
      "url": "https://repo.maven.apache.org/maven2/org/springframework/cloud/spring-cloud-dataflow-shell/2.11.5/spring-cloud-dataflow-shell-2.11.5.jar"
    }
  },
  "features": {
    "streams": true,
    "tasks": true,
    "schedules": true,
    "monitoringDashboardType": "GRAFANA"
  },
  "runtimeEnvironment": {
    "appDeployer": {
      "deployerImplementationVersion": "2.11.5",
      "deployerName": "Spring Cloud Skipper Server",
      "deployerSpiVersion": "2.11.5",
      "javaVersion": "21.0.5",
      "platformApiVersion": "",
      "platformClientVersion": "",
      "platformHostVersion": "",
      "platformSpecificInfo": {
        "default": "kubernetes"
      },
      "platformType": "Skipper Managed",
      "springBootVersion": "2.7.18",
      "springVersion": "5.3.39"
    },
    "taskLaunchers": [
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-x2sfc28s"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-x2sfc28s/"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-n666tnnf"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-ghbjhsss"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      }
    ]
  },
  "monitoringDashboardInfo": {
    "url": "https://grafana.sanitas.dom",
    "source": "default-scdf-source",
    "refreshInterval": 15
  },
  "security": {
    "isAuthentication": true,
    "isAuthenticated": true,
    "username": "jpsantos",
    "roles": [
      "ROLE_CREATE",
      "ROLE_DEPLOY",
      "ROLE_DESTROY",
      "ROLE_MANAGE",
      "ROLE_MODIFY",
      "ROLE_SCHEDULE",
      "ROLE_VIEW"
    ]
  },
  "git": {
    "commit": "edc71ff"
  }
}

Custom apps:
N/A.

Steps to reproduce:
N/A.

Screenshots:
N/A.

Additional context:
N/A.

@github-actions github-actions bot added the status/need-triage Team needs to triage and take a first look label Feb 19, 2025
@cppwfs
Contributor

cppwfs commented Feb 19, 2025

Hello @juanpablo-santos,
Can you share the stack trace? Thanks.

@cppwfs cppwfs added status/need-feedback Calling participant to provide feedback and removed status/need-triage Team needs to triage and take a first look labels Feb 19, 2025
@juanpablo-santos
Author

Hi,

my bad, it's not a stack trace but a WARN message in the log, like

2025-02-19 19:19:12.282  WARN 1 --- [nio-8080-exec-1] o.s.c.d.s.k.KubernetesTaskLauncher       : Cannot delete pod for task "TASK_NAME_HERE-xexqgrpd6e" (reason: pod does not exist)

one per task execution without its corresponding pod

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Feb 19, 2025
@juanpablo-santos
Author

Ouch, that's not exactly what I reported. Most executions do get deleted.

However, executions that were unable to spin up a pod for whatever reason (in our case, for example, a missing init container) are not deleted. Our first couple of executions pages look something like this:

Image

I'm so used to seeing it that I expected the cleanup to wipe it, and incorrectly thought that it wasn't deleting executions at all, but it is deleting executions in the FAILED or SUCCESS state. Apologies for the noise; I will update the issue title accordingly.

@juanpablo-santos juanpablo-santos changed the title Clean up all task/job executions requires the pod to be present Clean up all task/job executions does not clean up tasks on unknown state Feb 19, 2025
@cppwfs cppwfs added type/enhancement Is an enhancement request and removed for/team-attention For team attention labels Feb 19, 2025
@cppwfs cppwfs added this to the General Backlog milestone Feb 19, 2025
@cppwfs
Contributor

cppwfs commented Feb 19, 2025

We could support an option that deletes executions with a status of UNKNOWN. But keep in mind that it could delete pending task runs, so that will need to be documented.
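
To make the trade-off concrete, here is a rough sketch of what such an opt-in selection could look like. It is not SCDF code: the `Execution` record, its `createdAt` field, the `selectForCleanup` method and the minimum-age guard are all hypothetical names introduced for illustration, and it assumes that an execution with no recorded start time is what shows up as UNKNOWN. The age guard is one possible way to avoid wiping runs that are merely pending.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these types are part of the SCDF API.
final class UnknownCleanupSketch {

    enum Status { COMPLETE, ERROR, RUNNING, UNKNOWN }

    // Minimal stand-in for a task execution row (createdAt is an assumed field).
    record Execution(long id, Instant createdAt, Instant startTime,
                     Instant endTime, Integer exitCode) {

        // Assumption: no recorded start time means the status is UNKNOWN.
        Status status() {
            if (startTime == null) return Status.UNKNOWN;
            if (endTime == null || exitCode == null) return Status.RUNNING;
            return exitCode == 0 ? Status.COMPLETE : Status.ERROR;
        }
    }

    // Returns the ids that a cleanup would remove. UNKNOWN executions are only
    // selected when the caller opts in AND they are older than minUnknownAge,
    // so that recently submitted (possibly pending) runs are left alone.
    static List<Long> selectForCleanup(List<Execution> executions,
                                       boolean includeUnknown,
                                       Duration minUnknownAge,
                                       Instant now) {
        return executions.stream()
            .filter(e -> switch (e.status()) {
                case COMPLETE, ERROR -> true;
                case RUNNING -> false;
                case UNKNOWN -> includeUnknown
                    && Duration.between(e.createdAt(), now).compareTo(minUnknownAge) >= 0;
            })
            .map(Execution::id)
            .collect(Collectors.toList());
    }
}
```

With `includeUnknown` set to `false` this keeps today's behaviour (only finished executions are selected); setting it to `true` with, say, a 6 hour threshold would also pick up executions that never started and are old enough that they are unlikely to still be pending.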
