Blog

Alright Node Pools, Which One of You is Dave?

July 22, 2020

Terraform’s create_before_destroy, random_id and keepers

Zoe Higgins
12 Minute Read

I recently found myself in a bit of a pickle managing GKE node pools. I was tasked with minimising node downtime by introducing the create_before_destroy lifecycle rule in Terraform, and it proved more complicated than I assumed going in.

Google Kubernetes Engine (GKE) node pools can take upwards of twenty minutes to create, and Terraform defaults to destroying the old resource before creating the new one when changes are made. By instead creating the updated resource before destroying the old one, we can avoid those twenty minutes of downtime, so being able to set this lifecycle rule on the resource matters. It comes with caveats, though: Terraform will try to create a node pool with the same name as the original, and if two node pools have the same name (for example, Dave), Terraform will hit conflicts when it tries to destroy the node pool with that name. We need to be able to destroy the right Dave.

Ok, it seems simple then, right?

One approach to avoiding two Daves (more generically known as a naming collision) is appending a random ID to each name, and setting a label on the node pools with their base name (without the ID). This allows us to point our workloads at those pools no matter what the actual name is. This is Terraform’s recommended approach. However, the random_id resource doesn’t behave the way you might expect from random values: we don’t get a new random ID each time we run a Terraform plan or apply. So, on its own, this will not work for us.

What do we do then?

Instead, you need to define when to produce a new ID with keepers, an argument within the random_id resource. Without keepers, you will find yourself in the same name conflict as before, but with two node pools that share the same random ID, making it not so random after all.

I hadn’t heard of keepers prior to this problem and, in this case, the Terraform documentation leaves much to be desired. Even with some understanding of how they work, the main cracks I fell through were:

  • When do I need to use keepers?
  • How do I use them when the object types get more complicated? 
  • How do we make sure we don’t kill the wrong Dave?! 

I’m going to focus on the above points as I walk you through my personal solution.

Keepers: What are they in the context of Terraform?

For a quick rundown on Terraform’s random_id resource, you can have a look at the documentation. An important note is that, by default, the value of a random ID does not change on subsequent applies. Instead, we need to specify when a new ID should be triggered (i.e. what the rules are for when the existing random_id resource should be replaced with a new one). That is to say, this random_id:

resource "random_id" "my_id" {
  byte_length = 8
}

will keep the same ID it was given at creation (unless we manually destroy and recreate the resource). That’s fine, but often not particularly useful when we need the ID to change. Say we need a new ID every time local.my_variable changes:

locals {
  my_variable = 1 // previously 0
}

resource "random_id" "my_id" {
  byte_length = 8
}

We can’t do that without introducing keepers, an argument in the random_id resource. The argument is a map of values which determine when to make a new random_id. If any of the values have changed, a new ID is created. Using keepers, we can now trigger a new random_id:

locals {
  my_variable = 1 // previously 0
}

resource "random_id" "my_id" {
  byte_length = 8
  keepers = {
    my_var_check = local.my_variable // triggers new random_id
  }
}

The above is important when introducing resources with the create_before_destroy lifecycle rule, which we are going to use in the context of node pools. Certain alterations to node pools cannot be made in place and instead require the destruction of the old node pool (Dave) and creation of the new one (also Dave). The create_before_destroy lifecycle rule creates the new Dave before it destroys the old Dave. To avoid a naming collision, we need to append a random_id to the name. So, whenever an alteration would cause a node pool destruction, we need a new random_id to be triggered.

A Simple Example (Multiple node pools)

Okay, so we are going to be working with a very basic set of node pools to begin with, consisting only of machine types and a cluster name (I am setting these configurations in locals, but you could just as easily use variables set in a tfvars file):

locals {
  cluster_name  = "names-cluster"
  node_config   = {
    dave        = {
      machine_type = "n1-standard-4"
    }
    joanna      = {
      machine_type = "f1-micro"
    }
  }              
}

resource "google_container_node_pool" "my_node_pools" {
  for_each = local.node_config
  
  name    = each.key
  cluster = local.cluster_name 

  node_config {
    machine_type = each.value.machine_type
  } 
}

We want to be able to append a random ID to the node pool name so we can set the create_before_destroy lifecycle rule to them. The label is also set to be the name of the node pool without the random ID, so that we can always point our workloads to the node pool via the label, without needing to know the random ID appended to the name.

locals {
  cluster_name  = "names-cluster"
  node_config   = {
    dave        = {
      machine_type = "n1-standard-4"
    }
    joanna      = {
      machine_type = "f1-micro"
    }
  }              
}

// this is new
resource "random_id" "suffix" {
  for_each    = local.node_config
  byte_length = 8
}

resource "google_container_node_pool" "my_node_pools" {
  for_each = local.node_config
  
  // name is new
  name    = "${each.key}-${random_id.suffix[each.key].hex}"
  cluster = local.cluster_name 

  node_config {
    machine_type = each.value.machine_type
  } 
  
  // this is new
  labels = {
    nodepool-name = each.key
  }

  // this is new
  lifecycle {
    create_before_destroy = true
  }
}
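For completeness, here is how a workload might follow the pool through that label rather than through the pool’s now-randomised name. This is a minimal sketch using the Terraform Kubernetes provider; the deployment name and image are placeholders of my own, not part of the original setup:

```hcl
// Hypothetical workload pinned to the "dave" pool via the stable label,
// regardless of the random suffix on the pool's actual name.
resource "kubernetes_deployment" "on_dave" {
  metadata {
    name = "on-dave" // placeholder name
  }
  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "on-dave"
      }
    }
    template {
      metadata {
        labels = {
          app = "on-dave"
        }
      }
      spec {
        // matches the nodepool-name label, not the suffixed pool name
        node_selector = {
          nodepool-name = "dave"
        }
        container {
          name  = "app"
          image = "nginx" // placeholder image
        }
      }
    }
  }
}
```

When the pool is recreated with a new suffix, the label stays the same, so the workload reschedules onto the new Dave without any changes here.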

And finally, we should add the keepers necessary so that a new ID is created whenever the node pools would otherwise be destroyed and recreated on change. The references to the cluster and machine types have also changed, as Terraform recommends we read values “through” the random_id resource where possible.

locals {
  cluster_name  = "names-cluster"
  node_config   = {
    dave        = {
      machine_type = "n1-standard-4"
    }
    joanna      = {
      machine_type = "f1-micro"
    }
  }              
}

resource "random_id" "suffix" {
  for_each    = local.node_config
  byte_length = 8
  // this is new
  keepers     = {
    cluster_name = local.cluster_name
    machine_type = each.value.machine_type
  }
}

resource "google_container_node_pool" "my_node_pools" {
  for_each = local.node_config
  
  name    = "${each.key}-${random_id.suffix[each.key].hex}"
  // this is new
  cluster = random_id.suffix[each.key].keepers.cluster_name 

  node_config {
    // this is new
    machine_type = random_id.suffix[each.key].keepers.machine_type
  } 

  labels = {
    nodepool-name = each.key
  }

  lifecycle {
    create_before_destroy = true
  }
}

And that’s it! If your use case consists of arguments with types similar to the above, you are golden. The problem really arises when you are dealing with structures more complicated than string or number values. Hence:

A Not-So-Simple Example (Multiple node pools with the same list)

My team and I realised that we needed to be able to add a list of oauth_scopes to the config, and that the list would be subject to change. Super. So I now needed to pass a list through keepers, which only accepts string values. I decided to sort the list, so that re-ordering it doesn’t invoke node pool destruction, and to convert the list to a string that is then split within the node pool resource:

locals {
  cluster_name  = "names-cluster"
  node_config   = {
    dave        = {
      machine_type = "n1-standard-4"
    }
    joanna      = {
      machine_type = "f1-micro"
    }
  }
  // join() is a list-to-string function
  // this is new
  oauth_scopes = join(",", sort([
    "logging-write",
    "monitoring",
    "trace-append",
  ]))
}

resource "random_id" "suffix" {
  for_each    = local.node_config
  byte_length = 8
  keepers     = {
    cluster_name = local.cluster_name
    machine_type = each.value.machine_type
    // this is new
    oauth_scopes = local.oauth_scopes
  }
}

resource "google_container_node_pool" "my_node_pools" {
  for_each = local.node_config
  
  name    = "${each.key}-${random_id.suffix[each.key].hex}"
  cluster = random_id.suffix[each.key].keepers.cluster_name

  node_config {
    machine_type = random_id.suffix[each.key].keepers.machine_type
    // this is new
    oauth_scopes = split(
      ",", random_id.suffix[each.key].keepers.oauth_scopes
    )
  }

  labels = {
    nodepool-name = each.key
  }

  lifecycle {
    create_before_destroy = true
  }
}

Looks good to me! I can’t imagine that the team will need anything more complex.

An Absolutely Wild Example (Multiple node pools, each with their own list of maps)

So a couple of weeks later, the team needed something much more complex. They wanted to be able to set Kubernetes taints on the node pools, so that they could repel or attract workloads on particular nodes or node pools. The taint structure is a list of maps per node pool, and each map has a fixed set of keys, complying with the Kubernetes taint structure as defined here.

Some considerations before going forward:

  • The join() function only works on a list of strings; it will not work on a list of maps
  • We are using a for_each loop in our google_container_node_pool resource block, and the taint structure for each node pool may differ in length and values
  • We need to pass the taints through the keepers (which only accept strings), because a change to the taint structure will cause a destruction and recreation of the node pool.

We needed to pull the taint structure apart into lists of strings in some way, so that it could be passed to the keepers. This is what we went with:

locals {
  cluster_name  = "names-cluster"
  node_config   = {
    dave        = {
      machine_type = "n1-standard-4"
      // this is new
      taints       = [
        {
          key    = "first-pool"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      ]
    }
    joanna      = {
      machine_type = "f1-micro"
      //  this is new
      taints       = []
    }
  }
  // join() is a list to string function
  oauth_scopes = join(",", sort([
    "logging-write",
    "monitoring",
    "trace-append",
  ]))              
}

resource "random_id" "suffix" {
  for_each    = local.node_config
  byte_length = 8
  keepers     = {
    cluster_name  = local.cluster_name
    machine_type  = each.value.machine_type
    oauth_scopes  = local.oauth_scopes
    // this is new
    taint_keys    = join(",", [
      for taint in each.value.taints :
        taint.key
    ])
    taint_values  = join(",", [
      for taint in each.value.taints :
        taint.value
    ])
    taint_effects = join(",", [
      for taint in each.value.taints :
        taint.effect
    ])
  }
}

resource "google_container_node_pool" "my_node_pools" {
  for_each = local.node_config
  
  name    = "${each.key}-${random_id.suffix[each.key].hex}"
  cluster = random_id.suffix[each.key].keepers.cluster_name

  node_config {
    machine_type = random_id.suffix[each.key].keepers.machine_type
    oauth_scopes = split(
      ",", random_id.suffix[each.key].keepers.oauth_scopes
    )
    // this is new
    taint = each.value.taints
  }
  } 

  labels = {
    nodepool-name = each.key
  }

  lifecycle {
    create_before_destroy = true
  }
}

Essentially, the above collects each attribute of the taints into a list per node pool and converts that list to a string. If any of those lists have altered, a new ID is forced. There are drawbacks to this solution:

  • If someone simply changes the order of the list without deleting or adding elements in the taint config, this will cause a new ID to be made. We decided this risk was small, as a reordering would usually be to add a new taint or to remove one.
  • We aren’t referencing the final taint values through the keepers, which is what is recommended to “ensure that both resources will change together”. However, our google_container_node_pool resource already depends on our random_id resource via its name, so we also saw this risk as quite low.
  • There is a case where the compiled lists could fail to register a change. For example, say the 1st map didn’t have a value and the 2nd one did, and we later added that same value to the 1st map and removed it from the 2nd: the compiled lists would come out in the same order and look exactly the same. To be safe, I think we should instead build strings that combine each map’s key with the attribute, e.g. "${taint.key}-${taint.effect}" instead of just taint.effect.
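A sketch of what that safer keeper might look like. It combines key, value and effect into one string per taint, so an attribute moving between maps can’t go unnoticed, and sorting the combined strings also covers the pure-reordering drawback from the first bullet. This is my own variation, not what we shipped:

```hcl
resource "random_id" "suffix" {
  for_each    = local.node_config
  byte_length = 8
  keepers     = {
    cluster_name = local.cluster_name
    machine_type = each.value.machine_type
    oauth_scopes = local.oauth_scopes
    // one string per taint ties key/value/effect together,
    // and sort() makes reordering the taint list a no-op
    taints = join(",", sort([
      for taint in each.value.taints :
        "${taint.key}=${taint.value}:${taint.effect}"
    ]))
  }
}
```

The node pool resource would still read its taints from each.value.taints as before; only the trigger for a new ID changes.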

To Recap

The create_before_destroy lifecycle rule is hard, much harder than I assumed it would be when I saw it as an option. In particular, the restriction of keepers to string values within the random_id resource adds complexity to what would otherwise be a pretty simple problem. My hope is that Terraform will eventually broaden the accepted types of keepers and abstract the conversion to string away from us. Alternatively, since Terraform can recognise when a resource needs to be destroyed based on changes, it would be cool to have a more generic flag on Terraform’s random resources that creates a new random value whenever one is required. Until then, though, go ahead and stringify your data types as needed. The above is not the only way to avoid node pool naming collisions, but it certainly keeps my Daves distinguishable and alive (while they are necessary). I hope you can employ and alter it for your use cases as you see fit.

Do you have a better way of solving the problem? Let me know! We grappled with this for a few hours before deciding on the above!

Zoe Higgins
Cloud Engineer at Kasna
zoe.higgins@kasna.com.au

Extra Readings

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/256/files – GitHub user morgante’s much more complex and thorough take on this problem

https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/id – Terraform documentation on random_id 

https://registry.terraform.io/providers/hashicorp/random/latest/docs#resource-keepers – Terraform documentation on resource keepers

https://www.terraform.io/docs/configuration/resources.html#create_before_destroy – Terraform documentation on create_before_destroy