Clean up all task/job executions does not clean up tasks on unknown state #6110

juanpablo-santos · 2025-02-19T12:09:31Z

Description:
I'm not sure whether this is a bug or an improvement request. Currently, the "Clean up all task/job executions" menu option in the Tools section requires the pods associated with an execution to be present on the cluster where they were run in order to remove the task execution from the SCDF database.

In our case, our platform team runs a pipeline on every k8s cluster that wipes every pod that has been in a finished/error state for more than 6 hours, so when we run 'Clean up all task/job executions', any execution whose pod is no longer present on the cluster doesn't get deleted. The cleanup process tries to fetch the pod (to delete it, I presume), raises an exception that appears in the SCDF server logs, and then carries on with the next execution.

The current workaround is to manually delete the rows at the database level.

Release versions:

{
  "versions": {
    "implementation": {
      "name": "spring-cloud-dataflow-server",
      "version": "2.11.5"
    },
    "core": {
      "name": "Spring Cloud Data Flow Core",
      "version": "2.11.5"
    },
    "dashboard": {
      "name": "Spring Cloud Dataflow UI",
      "version": "3.4.6"
    },
    "shell": {
      "name": "Spring Cloud Data Flow Shell",
      "version": "2.11.5",
      "url": "https://repo.maven.apache.org/maven2/org/springframework/cloud/spring-cloud-dataflow-shell/2.11.5/spring-cloud-dataflow-shell-2.11.5.jar"
    }
  },
  "features": {
    "streams": true,
    "tasks": true,
    "schedules": true,
    "monitoringDashboardType": "GRAFANA"
  },
  "runtimeEnvironment": {
    "appDeployer": {
      "deployerImplementationVersion": "2.11.5",
      "deployerName": "Spring Cloud Skipper Server",
      "deployerSpiVersion": "2.11.5",
      "javaVersion": "21.0.5",
      "platformApiVersion": "",
      "platformClientVersion": "",
      "platformHostVersion": "",
      "platformSpecificInfo": {
        "default": "kubernetes"
      },
      "platformType": "Skipper Managed",
      "springBootVersion": "2.7.18",
      "springVersion": "5.3.39"
    },
    "taskLaunchers": [
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-x2sfc28s"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-x2sfc28s/"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-n666tnnf"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      },
      {
        "deployerImplementationVersion": "unknown",
        "deployerName": "KubernetesTaskLauncher",
        "deployerSpiVersion": "unknown",
        "javaVersion": "21.0.5",
        "platformApiVersion": "v1",
        "platformClientVersion": "unknown",
        "platformHostVersion": "unknown",
        "platformSpecificInfo": {
          "namespace": "scdf",
          "master-url": "https://rancher.sanitas.dom/k8s/clusters/c-m-ghbjhsss"
        },
        "platformType": "Kubernetes",
        "springBootVersion": "2.7.18",
        "springVersion": "5.3.39"
      }
    ]
  },
  "monitoringDashboardInfo": {
    "url": "https://grafana.sanitas.dom",
    "source": "default-scdf-source",
    "refreshInterval": 15
  },
  "security": {
    "isAuthentication": true,
    "isAuthenticated": true,
    "username": "jpsantos",
    "roles": [
      "ROLE_CREATE",
      "ROLE_DEPLOY",
      "ROLE_DESTROY",
      "ROLE_MANAGE",
      "ROLE_MODIFY",
      "ROLE_SCHEDULE",
      "ROLE_VIEW"
    ]
  },
  "git": {
    "commit": "edc71ff"
  }
}

Custom apps:
N/A.

Steps to reproduce:
N/A.

Screenshots:
N/A.

Additional context:
N/A.


cppwfs · 2025-02-19T15:08:10Z

Hello @juanpablo-santos ,
Can you share the stack trace? Thanks

juanpablo-santos · 2025-02-19T18:22:52Z

Hi,

My bad, it's not a stack trace but a WARN message in the log, like:

2025-02-19 19:19:12.282  WARN 1 --- [nio-8080-exec-1] o.s.c.d.s.k.KubernetesTaskLauncher       : Cannot delete pod for task "TASK_NAME_HERE-xexqgrpd6e" (reason: pod does not exist)

There is one such message per task execution without its corresponding pod.

juanpablo-santos · 2025-02-19T18:34:47Z

Ouch, that's not exactly what I reported. Most executions do get deleted.

However, executions that were unable to spin up a pod for whatever reason (in our case, for example, a missing init container) are not deleted. Our first couple of executions pages look something like this:

I'm so used to seeing them that I expected the cleanup to wipe them, and incorrectly concluded that it wasn't deleting executions at all; it is in fact deleting executions in FAILED or SUCCESS state. Apologies for the noise, I will update the issue title accordingly.

cppwfs · 2025-02-19T18:55:12Z

We could support an option that also deletes executions with a status of UNKNOWN. Keep in mind, though, that it could delete pending task runs, so that will need to be documented.
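For illustration only, the opt-in behaviour suggested above could look roughly like the following. This is a minimal Python sketch, not SCDF code: the `Execution` class, its `launched_at` field, the `select_for_cleanup` function, and the `include_unknown` and `min_age` parameters are all hypothetical names made up for this example. The age guard is one possible way to avoid wiping task runs that are merely pending; the 6-hour default is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class Execution:
    """Hypothetical stand-in for a task execution record."""
    id: int
    status: str                      # "SUCCESS", "FAILED" or "UNKNOWN"
    launched_at: Optional[datetime]  # hypothetical field: when the launch was requested


def select_for_cleanup(
    executions: Iterable[Execution],
    include_unknown: bool = False,
    min_age: timedelta = timedelta(hours=6),
    now: Optional[datetime] = None,
) -> List[Execution]:
    """Return the executions a cleanup run would delete.

    SUCCESS and FAILED executions are always selected. UNKNOWN executions
    are selected only when include_unknown is set, and only if they are
    older than min_age, so that recent (possibly pending) runs survive.
    """
    now = now or datetime.now(timezone.utc)
    selected = []
    for execution in executions:
        if execution.status in ("SUCCESS", "FAILED"):
            selected.append(execution)
        elif execution.status == "UNKNOWN" and include_unknown:
            # Without a launch timestamp the age cannot be established,
            # so the execution is conservatively kept.
            if execution.launched_at is not None and now - execution.launched_at >= min_age:
                selected.append(execution)
    return selected
```

With the flag left at its default, the selection matches what the thread describes today (only FAILED or SUCCESS executions are removed); passing `include_unknown=True` additionally picks up stale UNKNOWN executions.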

github-actions bot added the status/need-triage Team needs to triage and take a first look label Feb 19, 2025

cppwfs added status/need-feedback Calling participant to provide feedback and removed status/need-triage Team needs to triage and take a first look labels Feb 19, 2025

github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Feb 19, 2025

juanpablo-santos changed the title ~~Clean up all task/job executions requires the pod to be present~~ Clean up all task/job executions does not clean up tasks on unknown state Feb 19, 2025

cppwfs added type/enhancement Is an enhancement request and removed for/team-attention For team attention labels Feb 19, 2025

cppwfs added this to the General Backlog milestone Feb 19, 2025
