Permadiff and / or force replacement on `node.config.gcfs_config.enabled` #2100

wyardley · 2024-09-14T01:19:30Z

TL;DR

I'm seeing permadrift and an inability to suppress trying to update gcfs_config.enabled:

          + gcfs_config {
              + enabled = false
            }

Simple repro case below. I created the cluster with an older module version, then updated to 33.0.3. Creating with the new version creates no issue.

This could end up being a provider issue, but I would think at least when it's null, the module shouldn't try to set it.

Expected behavior

Terraform to not see a diff, and to apply changes in the case where the setting is changed in the module.

Observed behavior

Terraform keeps trying to set the value (and, in the case of < 6.2.0, also tries to force replace the nodepool)
5.43.0

          + gcfs_config { # forces replacement
              + enabled = false # forces replacement
            }

6.2.0

  ~ resource "google_container_node_pool" "pools" {
        id                          = "projects/zelig-tfprovidertest-sandbox/locations/us-central1/clusters/test-cluster/nodePools/primary"
        name                        = "primary"
        # (10 unchanged attributes hidden)

      ~ node_config {
            tags                        = [
                "gke-test-cluster",
                "gke-test-cluster-primary",
            ]
            # (16 unchanged attributes hidden)

          + gcfs_config {
              + enabled = false
            }

            # (3 unchanged blocks hidden)
        }

        # (5 unchanged blocks hidden)
    }

Terraform Configuration

provider "google" {
  region  = "us-central1"
  zone    = "us-central1-f"
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "5.43.0"
    }
  }
}

module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version                    = "32.0.0" ## update this to 33.0.3 after
  project_id                 = [project id]
  name                       = "test-cluster"
  service_account_name       = "foobarbaz"
  grant_registry_access      = true
  kubernetes_version         = "1.30.3-gke.1639000"
  release_channel            = "UNSPECIFIED"
  region                     = "us-central1"
  zones                      = ["us-central1-f"]
  network                    = "default"
  subnetwork                 = "default"
  horizontal_pod_autoscaling = true
  enable_private_nodes       = true
  dns_cache                  = true
  ip_range_pods              = "foo-pods"
  ip_range_services          = "foo-services"
  master_ipv4_cidr_block     = "10.10.0.0/28"
  remove_default_node_pool = true
  node_pools = [
    # Note: this is intentionally different from the actual default,
    # "default-pool"
    {
      name                      = "primary"
      machine_type              = "e2-standard-4"
      total_min_count           = 1
      total_max_count           = 2
      local_ssd_count           = 0
      spot                      = false
      local_ssd_ephemeral_count = 0
      disk_size_gb              = 100
      disk_type                 = "pd-balanced"
      image_type                = "COS_CONTAINERD"
      enable_gcfs               = false
      enable_gvnic              = false
      logging_variant           = "DEFAULT"
      auto_upgrade              = false
      preemptible               = false
      pod_pids_limit            = 0
      cpu_manager_policy = ""
    },
  ]
}

Terraform Version

OpenTofu v1.8.2
on darwin_arm64
+ provider registry.opentofu.org/hashicorp/google v5.43.0
+ provider registry.opentofu.org/hashicorp/kubernetes v2.32.0
+ provider registry.opentofu.org/hashicorp/random v3.6.2

Additional information

The text was updated successfully, but these errors were encountered:

wyardley · 2024-09-14T02:53:37Z

side note: I tried deleting and re-importing the resource, in hopes that would resolve the problem, but it did not.

Looks like what it's getting back from the API is:

2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:  "nodePoolDefaults": {
2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:   "nodeConfigDefaults": {
2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:    "gcfsConfig": {},

But I don't actually see the provider trying to do anything when it "updates" the resource, and the fact that it sees the empty response as the value not being set is probably the underlying cause for this issue.

So this may be a provider issue, possibly related to GoogleCloudPlatform/magic-modules#11553

As I mentioned to the author here, the test doesn't actually test updates: https://github.com/dominykasn/magic-modules/blob/22ea4977c565faa522b501f380e1461604d67645/mmv1/third_party/terraform/services/container/go/resource_container_node_pool_test.go.tmpl#L1739-L1743

wyardley · 2024-09-14T17:14:24Z

@apeabody I think this is the right fix (upstream provider level issue introduced by vs. something with this module):
GoogleCloudPlatform/magic-modules#11717
Hopefully, someone can see if it's possible to get that approved quickly early next week if it looks good.

Unfortunately, I think this affects both 5.x and 6.x providers (albeit in different ways) - earlier ones will have the force replacement issue, and 6.2 will have the harmless (because the update doesn't actually succeed), but ugly, permadiff

wyardley · 2024-09-20T03:47:41Z

hashicorp/terraform-provider-google#19534
My fix should be going out in 6.4.0 🎉

Not sure if it will get cherry-picked back to 5.x or not.

apeabody · 2024-09-20T15:38:40Z

hashicorp/terraform-provider-google#19534 My fix should be going out in 6.4.0 🎉

Not sure if it will get cherry-picked back to 5.x or not.

Thanks @wyardley! - Keeping an eye out. - We might need to consider bumping the minimum to v6.4, at least for Autopilot sub-modules.

Fixes terraform-google-modules#2100 This basically replicates the fixes from terraform-google-modules#2093, terraform-google-modules#2095, but at the scope of implicitly defined nodepools.

wyardley · 2024-09-23T22:32:54Z

@apeabody after updating to TPG 6.4.0, had to make the changes in #2111 locally to fully resolve this issue.

wyardley added the bug Something isn't working label Sep 14, 2024

This was referenced Sep 14, 2024

Updates failing and not detecting false value for node.config.gcfs_config.enabled hashicorp/terraform-provider-google#19482

Closed

container: fix updates for node_config.gcfs_config and make optional GoogleCloudPlatform/magic-modules#11717

Merged

wyardley mentioned this issue Sep 23, 2024

fix: allow null enable_gcfs setting in defined nodepools #2111

Merged

apeabody closed this as completed in #2111 Sep 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Permadiff and / or force replacement on `node.config.gcfs_config.enabled` #2100

Permadiff and / or force replacement on `node.config.gcfs_config.enabled` #2100

wyardley commented Sep 14, 2024

wyardley commented Sep 14, 2024 •

edited

Loading

wyardley commented Sep 14, 2024

wyardley commented Sep 20, 2024

apeabody commented Sep 20, 2024

wyardley commented Sep 23, 2024 •

edited

Loading

Permadiff and / or force replacement on node.config.gcfs_config.enabled #2100

Permadiff and / or force replacement on node.config.gcfs_config.enabled #2100

Comments

wyardley commented Sep 14, 2024

TL;DR

Expected behavior

Observed behavior

Terraform Configuration

Terraform Version

Additional information

wyardley commented Sep 14, 2024 • edited Loading

wyardley commented Sep 14, 2024

wyardley commented Sep 20, 2024

apeabody commented Sep 20, 2024

wyardley commented Sep 23, 2024 • edited Loading

Permadiff and / or force replacement on `node.config.gcfs_config.enabled` #2100

Permadiff and / or force replacement on `node.config.gcfs_config.enabled` #2100

wyardley commented Sep 14, 2024 •

edited

Loading

wyardley commented Sep 23, 2024 •

edited

Loading