Removing All Labels on GKE Node Pool Fails #15848

Closed
drcapulet opened this issue Sep 14, 2023 · 16 comments · Fixed by GoogleCloudPlatform/magic-modules#12877, #21082 or hashicorp/terraform-provider-google-beta#9171

@drcapulet

drcapulet commented Sep 14, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

$ terraform -v
Terraform v1.5.7
on darwin_arm64
+ provider registry.terraform.io/hashicorp/google v4.82.0
+ provider registry.terraform.io/hashicorp/random v3.5.1

Affected Resource(s)

  • google_container_node_pool

Terraform Configuration Files

resource "google_service_account" "default" {
  account_id   = "service-account-id"
  display_name = "Service Account"
}

resource "google_container_cluster" "primary" {
  name     = "primary"
  location = "us-central1"

  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "main" {
  name       = "main"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"

    labels = {
      example = "label"
    }

    service_account = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

Debug Output

2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: ---[ REQUEST ]---------------------------------------
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: PUT /v1/projects/example-project/locations/us-central1/clusters/primary/nodePools/main?alt=json&prettyPrint=false HTTP/1.1
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Host: container.googleapis.com
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: User-Agent: google-api-go-client/0.5 Terraform/1.5.7 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/4.82.0
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Content-Length: 19
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Content-Type: application/json
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: X-Goog-Api-Client: gl-go/1.19.9 gdcl/0.138.0
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Accept-Encoding: gzip
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: {
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:  "name": "main"
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: }
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:
2023-09-14T13:47:00.021-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: -----------------------------------------------------
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: 2023/09/14 13:47:00 [DEBUG] Google API Response Details:
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: ---[ RESPONSE ]--------------------------------------
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: HTTP/2.0 400 Bad Request
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Cache-Control: private
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Content-Type: application/json; charset=UTF-8
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Date: Thu, 14 Sep 2023 18:47:00 GMT
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Server: ESF
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Vary: Origin
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Vary: X-Origin
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: Vary: Referer
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: X-Content-Type-Options: nosniff
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: X-Frame-Options: SAMEORIGIN
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: X-Xss-Protection: 0
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: {
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:   "error": {
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     "code": 400,
2023-09-14T13:47:00.381-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     "message": "At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'containerd_config', 'resource_manager_tags'] must be specified.",
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     "errors": [
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:       {
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:         "message": "At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'containerd_config', 'resource_manager_tags'] must be specified.",
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:         "domain": "global",
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:         "reason": "badRequest"
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:       }
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     ],
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     "status": "INVALID_ARGUMENT",
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     "details": [
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:       {
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:         "@type": "type.googleapis.com/google.rpc.RequestInfo",
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:         "requestId": "0x3847051cd04685a"
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:       }
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:     ]
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5:   }
2023-09-14T13:47:00.382-0500 [DEBUG] provider.terraform-provider-google_v4.82.0_x5: }

Expected Behavior

Labels on the node pool are removed.

Actual Behavior

│ Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'containerd_config', 'resource_manager_tags'] must be specified.

Steps to Reproduce

  1. Remove node_config.labels, or set it to {}
  2. terraform apply
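
As a sketch of step 1, the node pool from the configuration above would look roughly like this once the labels block is removed. Nothing else is changed, and the comment marks where the block used to be:

```hcl
resource "google_container_node_pool" "main" {
  name       = "main"
  cluster    = google_container_cluster.primary.id
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-medium"

    # The labels block was here (setting labels = {} behaves the same way).
    # With no other change, the update request body carries only the pool
    # name, as seen in the debug output, and the API rejects it with a 400.

    service_account = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}
```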

b/300616676

@drcapulet drcapulet added the bug label Sep 14, 2023
@edwardmedia edwardmedia self-assigned this Sep 14, 2023
@github-actions github-actions bot added the forward/review and service/container labels Sep 14, 2023
@edwardmedia edwardmedia removed the forward/review label Sep 14, 2023
@edwardmedia edwardmedia removed their assignment Sep 14, 2023
@880831ian

Hello, is there a solution or workaround for this at present?

@trenslow

This is still an issue on provider registry.terraform.io/hashicorp/google v5.27.0.

@wyardley

I just tested on 6.8.0, and the problem still seems to be an issue, even with the changes I was testing locally for handling updating these values in-place.

@rd-jonas-luebke

This is still an issue...

@michaellzc

michaellzc commented Jan 25, 2025

We're running into the same problem when provisioning a node pool using https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/main/modules/beta-private-cluster

I believe this is a new change; we have only observed it in newly provisioned GKE clusters.

Upon initial provisioning, there was immediate drift: GCP automatically added goog-gke-node-pool-provisioning-model=on-demand to resourceLabels on the node pool. Below is the output of terraform plan:

  # module.gke_self_6A89A423.google_container_node_pool.pools["primary"] will be updated in-place
  ~ resource "google_container_node_pool" "pools" {
        id                          = "projects/<redacted>/locations/<redacted>/clusters/<redacted>/nodePools/primary"
        name                        = "primary"
        # (10 unchanged attributes hidden)

      ~ node_config {
          ~ resource_labels             = {
              - "goog-gke-node-pool-provisioning-model" = "on-demand" -> null
            }
            tags                        = [
                "gke-src-3dfa2a497e1995ac",
                "gke-src-3dfa2a497e1995ac-primary",
            ]
            # (16 unchanged attributes hidden)

            # (4 unchanged blocks hidden)
        }

        # (5 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

However, running terraform apply results in the same error:

╷
│ Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.RequestInfo",
│     "requestId": "0x14548dde169f71bf"
│   }
│ ]
│ , badRequest
│
│   with module.gke_self_6A89A423.google_container_node_pool.pools["primary"],
│   on .terraform/modules/gke_self_6A89A423/modules/beta-private-cluster/cluster.tf line 523, in resource "google_container_node_pool" "pools":
│  523: resource "google_container_node_pool" "pools" {
│
╵
Operation failed: failed running terraform apply (exit 1)

@alina-frolova

We found a working solution by using Terraform's ability to ignore changes to specific map elements. Adding this to the node pool resource configuration works:

lifecycle {
  ignore_changes = [
    node_config[0].resource_labels["goog-gke-node-pool-provisioning-model"]
  ]
}

This prevents Terraform from trying to remove the automatically added label while still maintaining control over other resource labels.

Reference to the relevant Terraform docs: https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes

@michaellzc

michaellzc commented Jan 27, 2025

We found a working solution by using Terraform's ability to ignore changes to specific map elements. Adding this to the node pool resource configuration works:

lifecycle {
  ignore_changes = [
    node_config[0].resource_labels["goog-gke-node-pool-provisioning-model"]
  ]
}

This prevents Terraform from trying to remove the automatically added label while still maintaining control over other resource labels.

Reference to the relevant Terraform docs: https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes

Sadly, that won't work if the node pool lives in a child module, due to a Terraform limitation (hashicorp/terraform#21546), unless the module itself hardcodes the ignore_changes entry.

Our current workaround is to always provide some dummy resource labels in the configuration. Thanks to the non-authoritative nature of resourceLabels, the provider does not run into infinite drift, since the labels are additive only.
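
A minimal sketch of that workaround, assuming a node pool managed directly rather than through a module. The label key and value are hypothetical placeholders; neither GKE nor the provider requires them:

```hcl
resource "google_container_node_pool" "main" {
  name    = "main"
  cluster = google_container_cluster.primary.id

  node_config {
    # Hypothetical placeholder label. Keeping the map non-empty means the
    # update request always includes a resource_labels value, which avoids
    # the empty-request 400 described in this issue.
    resource_labels = {
      "managed-by" = "terraform"
    }
  }
}
```

The idea is only that the configured map is never empty. Whether server-applied labels then survive the update depends on GKE's behavior, which this sketch does not verify.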

I believe the issue is more about poor handling of zero values in the GKE management API; the provider is working as intended.

For example, if we try to update resource labels using the Go SDK directly, we can reproduce the exact same error:

// containerv1 is cloud.google.com/go/container/apiv1;
// containerpb is cloud.google.com/go/container/apiv1/containerpb.
client, err := containerv1.NewClusterManagerClient(ctx)
if err != nil {
	log.Fatal(err)
}
op, err := client.UpdateNodePool(ctx, &containerpb.UpdateNodePoolRequest{
	Name: fmt.Sprintf("projects/%s/locations/%s/clusters/%s/nodePools/%s", projectID, location, clusterName, nodePoolName),
	ResourceLabels: &containerpb.ResourceLabels{
		Labels: map[string]string{}, // nil, or omitting the Labels struct field, also results in the same error
	},
	NodeVersion: "1.28.15-gke.1641000",
})

At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings', 'kubelet_config', 'linux_node_config', 'tags', 'taints', 'labels', 'node_network_config', 'gcfs_config', 'gvnic', 'confidential_nodes', 'logging_config', 'fast_socket', 'resource_labels', 'accelerators', 'windows_node_config', 'machine_type', 'disk_type', 'disk_size_gb', 'storage_pools', 'containerd_config', 'resource_manager_tags', 'performance_monitoring_unit', 'queued_provisioning', 'max_run_duration'] must be specified.

@rileykarson
Collaborator

rileykarson commented Jan 27, 2025

There are two issues going on here:

  • The initial issue is that we can't null out the list of labels. This is because the Go client library doesn't send empty values in the request message, and we need to use ForceSendFields to force-send an empty value
  • There is a recent server-controlled label added to resource_labels that is causing a diff (and it can't be corrected because of the first issue, although it's possible it would remain a permadiff afterwards if GKE preserved the label when unset)
    • We also likely don't want to / can't remove the server-applied label anyways

@maheshglm

We hit this problem with new node pools created just a few days ago.

This is from terraform plan:

 node_config {
          ~ resource_labels             = {
              - "goog-gke-node-pool-provisioning-model" = "spot" -> null
            }

Adding this to the Terraform module configuration solved the issue:

node_pools_resource_labels = {
  all = {}

  standard = {
    "goog-gke-node-pool-provisioning-model" = "spot"
  }
}

@slevenick
Collaborator

Reopening this, as I didn't fix the "removing all labels" issue. I only fixed the issue with diffs showing up for the server-added label.

@slevenick
Collaborator

I recently released versions 6.18.1 and 5.45.1 of both the GA and beta provider to address the server-side applied labels. It doesn't necessarily fix the remove all labels problem as I didn't want to make such a change in a backport fix, but that problem may actually go away once the server-side labels are rolled out.

@drcapulet
Author

@slevenick Appreciate the fix for the goog-gke labels diff. For anyone reading: that patch was actually released in 6.19.0 of the GA provider. Unfortunately, we're still seeing drift on newly provisioned pools:

  ~ resource "google_container_node_pool" "main" {
      ~ node_config {
          ~ resource_labels             = {
              - "goog-gke-accelerator-type"             = "nvidia-tesla-t4" -> null
              - "goog-gke-node-pool-provisioning-model" = "on-demand" -> null
            }
            tags                        = []
        }
    }

@slevenick
Collaborator

Hey @drcapulet, that's worrying. I'm not able to reproduce this locally with 6.19.0, but I'm also not getting the labels applied to my node pools.

Can you make sure that you're on 6.18.1 or 6.19.0? If so, can you share your config and debug logs so that I can take a closer look?

@drcapulet
Author

@slevenick Apologies - I missed that we were using the beta provider there, can confirm we're not seeing drift there anymore with 6.19.0.

@slevenick
Collaborator

Given that the "removing all labels" problem is resolved once the server-side labels are rolled out, I'm going to close this.

There are two separate problems in this issue, but both are solved once the GKE node pool has server-side applied labels and the user is on 6.18.1+.

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2025