Upgrading the Terraform module for a public GKE cluster


From time to time Google introduces new features and changes that force Terraform modules to be upgraded as well. That was our case at Geko, where we were using version 5.x of the GKE module to deploy and manage a public cluster. A few days ago, when we planned to update some parameters, it turned out that Google had removed support for the Kubernetes dashboard. The feature was completely deprecated and the module was failing because of it, so we were forced to upgrade the module to meet the new conditions. Up to three major version upgrades were available, so we decided to go straight to the latest one. However, the upgrade alone was not enough, as it also required fixing inconsistencies in the Terraform state.

The aim of this lab is to learn how to upgrade the official Terraform module for deploying and managing a public GKE cluster. We will deal in particular with the breaking changes in the module (kubernetes-engine.beta-public-cluster), and we will restore the consistent state we had before the failure that led to the upgrade.

Estimated time to finish this lab: ~20 minutes

1. Remove the previous resources

It’s strongly encouraged to back up the tfstate file before continuing!
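A backup can be taken with `terraform state pull`, which prints the current state to standard output whether the backend is local or remote. The following is a minimal sketch; the `backup_state` helper and the file name pattern are illustrative assumptions, not part of the original setup.

```shell
# Hypothetical helper: the function name and file name pattern are assumptions.
backup_state() {
  backup_file="${1:-tfstate-backup-$(date +%Y%m%d-%H%M%S).json}"

  # `terraform state pull` writes the current state (local or remote) to stdout.
  terraform state pull > "$backup_file" || return 1

  # Refuse to continue if the backup turned out empty.
  if [ ! -s "$backup_file" ]; then
    echo "backup is empty: $backup_file" >&2
    return 1
  fi

  # Print the file name so the caller can keep track of it.
  echo "$backup_file"
}
```

Run `backup_state` from the directory that holds the Terraform configuration before touching the state. If something goes wrong later, `terraform state push <file>` can restore the saved copy.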

It’s especially important to remove all the conflicting resources from the Terraform state, since they are bound to each other through dependencies. The goal here is to remove any deprecated binding before importing the resources again from the current “picture” of what is already deployed.

The main components of a Kubernetes cluster are the networks (and subnetworks), the node pool and the cluster itself. Let’s focus on them.

terraform state rm 'module.gke.google_container_cluster.primary'
terraform state rm 'module.gke.google_container_node_pool.pools[0]'
terraform state rm 'module.vpc.google_compute_network.network'
terraform state rm 'module.vpc.google_compute_subnetwork.subnetwork[0]'
terraform state rm 'module.vpc.google_compute_subnetwork.subnetwork[1]'
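Before removing anything, it can help to confirm that each address really exists in the state, so that a typo does not go unnoticed halfway through the list. A minimal sketch, assuming a hypothetical `remove_from_state` helper that wraps `terraform state list` and `terraform state rm`:

```shell
# Hypothetical helper: removes each given address only if the state contains it.
remove_from_state() {
  for addr in "$@"; do
    # `terraform state list <address>` prints matching resources, or nothing.
    if terraform state list "$addr" 2>/dev/null | grep -q .; then
      terraform state rm "$addr"
    else
      echo "skipping (not in state): $addr" >&2
    fi
  done
}

# Example call with the addresses used in this lab. The single quotes keep
# the shell from treating [0] as a glob pattern.
# remove_from_state \
#   'module.gke.google_container_cluster.primary' \
#   'module.gke.google_container_node_pool.pools[0]' \
#   'module.vpc.google_compute_network.network' \
#   'module.vpc.google_compute_subnetwork.subnetwork[0]' \
#   'module.vpc.google_compute_subnetwork.subnetwork[1]'
```

The example call is left commented out on purpose, so that pasting the block does not modify any state by accident.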

2. Upgrade versions

Once the previous entries have been removed from the state, the next step is to set the required modules to their current latest versions. For the GKE module the latest version is now 8.1.0, and the “~>” constraint allows it to pick up minor upgrades automatically.

Upgrade the GKE cluster module

 module "gke" {
   source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-public-cluster"
-  version = "~> 5.0"
+  version = "~> 8.1"

Upgrade the VPC module

 module "vpc" {
-  source  = "github.com/terraform-google-modules/terraform-google-network?ref=v1.1.0"
+  source  = "github.com/terraform-google-modules/terraform-google-network?ref=v2.3.0"

Check the new resources

To find out whether the module upgrade has changed any resource addresses, running a Terraform plan is strongly encouraged.

In this case, part of the modules’ internal hierarchy has changed, and the numeric list indexes have been replaced by string keys.

- module.gke.google_container_node_pool.pools[0]
+ module.gke.google_container_node_pool.pools["default-node-pool"]

- module.vpc.google_compute_subnetwork.subnetwork[0]
+ module.vpc.module.subnets.google_compute_subnetwork.subnetwork["southamerica-east1/my-cluster-public"]

- module.vpc.google_compute_subnetwork.subnetwork[1]
+ module.vpc.module.subnets.google_compute_subnetwork.subnetwork["southamerica-east1/my-cluster-private"]
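To get a comparison like the one above, the new module versions first have to be downloaded with `terraform init -upgrade`, and the plan output can then be reduced to the resource header lines. The sketch below assumes a hypothetical `summarize_plan` filter and relies on the usual wording of plan headers ("will be created", "will be destroyed", "must be replaced"), which may vary between Terraform versions.

```shell
# Hypothetical filter: keeps only the "# <address> will be ..." header lines
# of a plan read from stdin, and strips the leading "# " marker.
summarize_plan() {
  grep -E '^[[:space:]]*# .+ (will be|must be) ' | sed -E 's/^[[:space:]]*# //'
}

# Typical use after editing the module versions:
# terraform init -upgrade
# terraform plan -no-color | summarize_plan
```

Resources that show up as "will be created" under a new address are the ones that need to be imported in the next step.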

3. Import fresh resources

Keep in mind that the location depends on your kind of cluster. If it’s zonal you must use the master zone (e.g. southamerica-east1-a). If it’s a regional cluster you must use the region (e.g. southamerica-east1). The following example assumes a regional cluster located in southamerica-east1, in the project “my-project”, with the cluster name “my-cluster”. The network names were derived from the cluster’s name, adding the suffixes “private” and “public” to the subnets to tell them apart.

Note also the new module hierarchy and indexing.

# Global vars

# Cluster

# Node pool
terraform import "$POOL_LOCAL" "$POOL_REMOTE"

# Subnetworks

## Public

## Private

# Network

4. Update parameters

After a Terraform plan, you will very likely find that the google_container_cluster resource still needs to be updated because of a change in the subnetwork parameter. The new subnet keys have changed the order of the indexes. Just edit your GKE module to replace the subnetwork parameter as below.

- subnetwork = module.vpc.subnets_names[0]
+ subnetwork = module.vpc.subnets_names[1]
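The index swap is what sorting the new string keys would produce: “private” sorts before “public”, so the public subnet moves from position 0 to position 1. The sketch below only illustrates that ordering. It assumes the module’s `subnets_names` output follows the lexical order of the keys, and `subnet_order` is a hypothetical helper.

```shell
# Hypothetical helper: prints the given subnet keys in byte-wise sorted order,
# numbered from zero like a Terraform list.
subnet_order() {
  printf '%s\n' "$@" | LC_ALL=C sort | awk '{ printf "%d: %s\n", NR - 1, $0 }'
}

subnet_order \
  "southamerica-east1/my-cluster-public" \
  "southamerica-east1/my-cluster-private"
# 0: southamerica-east1/my-cluster-private
# 1: southamerica-east1/my-cluster-public
```

To check the real values instead of this approximation, `echo 'module.vpc.subnets_names' | terraform console` prints the output as Terraform evaluates it.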


As you have read above, when relying on third parties it can happen that a breaking change is introduced and you have trouble getting the service back. Besides this, the solution may cause collateral damage that requires additional fixes. In this particular case, manipulating the Terraform state by hand is neither common nor recommended, but it turned out to be the only method available in our toolset to resolve the inconsistencies.

I hope you’ve enjoyed this post and I encourage you to check our blog for other posts that you might find helpful. Do not hesitate to contact us if you would like us to help you with your projects.

See you in the next post!
