Skip to content

Permadiff and / or force replacement on node.config.gcfs_config.enabled #2100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wyardley opened this issue Sep 14, 2024 · 5 comments · Fixed by #2111
Closed

Permadiff and / or force replacement on node.config.gcfs_config.enabled #2100

wyardley opened this issue Sep 14, 2024 · 5 comments · Fixed by #2111
Labels
bug Something isn't working

Comments

@wyardley
Copy link
Contributor

TL;DR

I'm seeing permadrift and an inability to suppress trying to update gcfs_config.enabled:

          + gcfs_config {
              + enabled = false
            }

Simple repro case below. I created the cluster with an older module version, then updated to 33.0.3. Creating with the new version creates no issue.

This could end up being a provider issue, but I would think at least when it's null, the module shouldn't try to set it.

Expected behavior

Terraform to not see a diff, and to apply changes in the case where the setting is changed in the module.

Observed behavior

Terraform keeps trying to set the value (and, in the case of < 6.2.0, also tries to force replace the nodepool)
5.43.0

          + gcfs_config { # forces replacement
              + enabled = false # forces replacement
            }

6.2.0

  ~ resource "google_container_node_pool" "pools" {
        id                          = "projects/zelig-tfprovidertest-sandbox/locations/us-central1/clusters/test-cluster/nodePools/primary"
        name                        = "primary"
        # (10 unchanged attributes hidden)

      ~ node_config {
            tags                        = [
                "gke-test-cluster",
                "gke-test-cluster-primary",
            ]
            # (16 unchanged attributes hidden)

          + gcfs_config {
              + enabled = false
            }

            # (3 unchanged blocks hidden)
        }

        # (5 unchanged blocks hidden)
    }

Terraform Configuration

provider "google" {
  region  = "us-central1"
  zone    = "us-central1-f"
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "5.43.0"
    }
  }
}

module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version                    = "32.0.0" ## update this to 33.0.3 after
  project_id                 = [project id]
  name                       = "test-cluster"
  service_account_name       = "foobarbaz"
  grant_registry_access      = true
  kubernetes_version         = "1.30.3-gke.1639000"
  release_channel            = "UNSPECIFIED"
  region                     = "us-central1"
  zones                      = ["us-central1-f"]
  network                    = "default"
  subnetwork                 = "default"
  horizontal_pod_autoscaling = true
  enable_private_nodes       = true
  dns_cache                  = true
  ip_range_pods              = "foo-pods"
  ip_range_services          = "foo-services"
  master_ipv4_cidr_block     = "10.10.0.0/28"
  remove_default_node_pool = true
  node_pools = [
    # Note: this is intentionally different from the actual default,
    # "default-pool"
    {
      name                      = "primary"
      machine_type              = "e2-standard-4"
      total_min_count           = 1
      total_max_count           = 2
      local_ssd_count           = 0
      spot                      = false
      local_ssd_ephemeral_count = 0
      disk_size_gb              = 100
      disk_type                 = "pd-balanced"
      image_type                = "COS_CONTAINERD"
      enable_gcfs               = false
      enable_gvnic              = false
      logging_variant           = "DEFAULT"
      auto_upgrade              = false
      preemptible               = false
      pod_pids_limit            = 0
      cpu_manager_policy = ""
    },
  ]
}

Terraform Version

OpenTofu v1.8.2
on darwin_arm64
+ provider registry.opentofu.org/hashicorp/google v5.43.0
+ provider registry.opentofu.org/hashicorp/kubernetes v2.32.0
+ provider registry.opentofu.org/hashicorp/random v3.6.2

Additional information

@wyardley wyardley added the bug Something isn't working label Sep 14, 2024
@wyardley
Copy link
Contributor Author

wyardley commented Sep 14, 2024

side note: I tried deleting and re-importing the resource, in hopes that would resolve the problem, but it did not.

Looks like what it's getting back from the API is:

2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:  "nodePoolDefaults": {
2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:   "nodeConfigDefaults": {
2024-09-13T19:47:25.138-0700 [DEBUG] provider.terraform-provider-google:    "gcfsConfig": {},

But I don't actually see the provider trying to do anything when it "updates" the resource, and the fact that it sees the empty response as the value not being set is probably the underlying cause for this issue.

So this may be a provider issue, possibly related to GoogleCloudPlatform/magic-modules#11553

As I mentioned to the author here, the test doesn't actually test updates: https://github.com/dominykasn/magic-modules/blob/22ea4977c565faa522b501f380e1461604d67645/mmv1/third_party/terraform/services/container/go/resource_container_node_pool_test.go.tmpl#L1739-L1743

@wyardley
Copy link
Contributor Author

@apeabody I think this is the right fix (upstream provider level issue introduced by vs. something with this module):
GoogleCloudPlatform/magic-modules#11717
Hopefully, someone can see if it's possible to get that approved quickly early next week if it looks good.

Unfortunately, I think this affects both 5.x and 6.x providers (albeit in different ways) - earlier ones will have the force replacement issue, and 6.2 will have the harmless (because the update doesn't actually succeed), but ugly, permadiff

@wyardley
Copy link
Contributor Author

hashicorp/terraform-provider-google#19534
My fix should be going out in 6.4.0 🎉

Not sure if it will get cherry-picked back to 5.x or not.

@apeabody
Copy link
Collaborator

hashicorp/terraform-provider-google#19534 My fix should be going out in 6.4.0 🎉

Not sure if it will get cherry-picked back to 5.x or not.

Thanks @wyardley! - Keeping an eye out. - We might need to consider bumping the minimum to v6.4, at least for Autopilot sub-modules.

wyardley added a commit to wyardley/terraform-google-kubernetes-engine that referenced this issue Sep 23, 2024
Fixes terraform-google-modules#2100
This basically replicates the fixes from terraform-google-modules#2093, terraform-google-modules#2095, but at the scope
of implicitly defined nodepools.
wyardley added a commit to wyardley/terraform-google-kubernetes-engine that referenced this issue Sep 23, 2024
Fixes terraform-google-modules#2100
This basically replicates the fixes from terraform-google-modules#2093, terraform-google-modules#2095, but at the scope
of implicitly defined nodepools.
@wyardley
Copy link
Contributor Author

wyardley commented Sep 23, 2024

@apeabody after updating to TPG 6.4.0, had to make the changes in #2111 locally to fully resolve this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants