My role with my employer is frequently best described as "smoke jumper". That is, a given project my employer is either priming or, more frequently, subbing on is struggling, and the end customer has requested further assistance getting things back on the track they'd originally expected to be on. I'm usually first brought onto such projects as automation-support "surge".
In this context, "surge" means resources brought in to either fill gaps in the existing project-team created by turnover or augmenting that team with additional expertise. Most frequently, that's helping them either update and improve existing automation or write new automation. Either way, the code I tend to deliver tends to be both fairly compact and dense as well as flexible compared to what they've typically delivered to date.
One of my first principles in delivering new functionality is to attempt to do so in a way that is easily deactivated or backed out. This project team, like others I've helped, uses Terraform, but in a not especially modular or function-isolating way. All of their deployments consist of main.tf, vars.tf and outputs.tf files and occasional "template" files (usually simple HERE documents with simple variable-substitution actions). While they do (fortunately) make use of some data providers, they're not especially disciplined about where they implement them: they embed them in either or both of a given service's main.tf and vars.tf files. Me? I generally like all of my data-providers in data.tf-type files, as doing so aids consistency and keeps the contents of the various, individual files "clean" from an offered-functionality standpoint.
Similarly, if I'm using templated content, I prefer to deliver it in ways that sufficiently externalize the content to allow appropriate linters to be run on it. This kind of externalization not only makes such files easier to lint but, because it tends to remove encapsulation effects, also makes debugging or extending the externalized content easier.
On a recent project, I was tasked with helping them automate the deployment of VPC endpoints into their AWS accounts. The customer was trying, to the greatest extent possible, to prevent their project-traffic from leaving their VPCs.
When I started the coding-task, the customer wasn't able to tell me which specific services they wanted or needed so-enabled. Knowing that each such service-endpoint comes with recurring costs, and not wanting them to accidentally break the bank, I opted to write my code in a way that, absent operator input, would deploy all AWS endpoint services into their VPCs, but that would also allow them to easily dial things back when the first (shocking) bills came due.
The code I delivered worked well. However, familiar as the incumbent team was with the framework, they were left a bit perplexed by the code I delivered and asked me to do a walkthrough of it. Knowing the history of the project – both from a paucity-of-documentation and a staff-churn perspective – I opted to write an explainer document. What follows is that explanation.
Firstly, I delivered my contents as four additional files rather than injecting my code into their existing main.tf, vars.tf and outputs.tf file-set. Doing so allows them to wholly disable the functionality simply by nuking the files I delivered rather than having to do file-surgery on their normal file-set. Because my customer operates in multiple AWS partitions, this separation also makes it easier to roll back changes if the deployment-partition's APIs turn out to be older than the development-partition's. The file-set I delivered was an endpoints_main.tf, endpoints_data.tf, endpoints_vars.tf and an endpoints_services.tpl.hcl file. Respectively, these files provide: the primary functionality; the data-provider definitions; the definitions of variables used in the "main" and "data" files; and the HCL-formatted default-endpoints template content.
The most basic/easily-explained file is the default-endpoints template file, endpoints_services.tpl.hcl. The file consists of map-objects encapsulated in a larger list structure. Each map-object consists of a name and a type attribute-pair. The name values were derived by executing:
aws ec2 describe-vpc-endpoint-services \
  --query 'ServiceDetails[].{Name:ServiceName,Type:ServiceType[].ServiceType}' | \
  sed -e '/\[$/{N;s/\[\n *"/"/;}' -e '/^[ ][ ]*]$/d' | \
  tr '[:upper:]' '[:lower:]'
And then changing the literal region-names to "${endpoint_region}". This change allows Terraform's templatefile() function to substitute in the desired value when the template file is read – making the automation portable across both regions and partitions. The template-file's contents are also encapsulated with Terraform's jsonencode() function. This encapsulation is necessary to allow the templatefile() function to properly read the file in (so that the variable-substitution can occur).
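To make that structure concrete, here's an abbreviated, hypothetical excerpt of the resulting file (the real file carries one map-object per service returned by the command above):

${jsonencode(
  [
    {
      "name": "com.amazonaws.${endpoint_region}.ec2",
      "type": "interface"
    },
    {
      "name": "com.amazonaws.${endpoint_region}.s3",
      "type": "gateway"
    }
  ]
)}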
Because I wanted the use of this template-file to be the fallback (default) behavior, I needed to declare its use as a fallback. This was done in the endpoints_data.tf file's locals {} section:
locals {
  vpc_endpoint_services = length(var.vpc_endpoint_services) == 0 ? jsondecode(
    templatefile(
      "./endpoints_services.tpl.hcl",
      {
        endpoint_region = var.region
      }
    )
  ) : var.vpc_endpoint_services
}
In the above, we're using a ternary evaluation to set the value of the locally-scoped vpc_endpoint_services variable. If the size of the globally-scoped vpc_endpoint_services variable is "0", then the template file is used; otherwise, the content of the globally-scoped vpc_endpoint_services variable is used. The template-file's use is effected by the templatefile() function reading the file in while substituting all occurrences of "${endpoint_region}" in the file with the value of the globally-scoped "region" variable.
Note: The surrounding jsondecode() function is used to convert the file-stream from the format previously set using the jsonencode() function at the beginning of the file. I'm not a fan of having to resort to this kind of kludgery, but, without it, the templatefile() function would error out when trying to populate the vpc_endpoint_services variable. If any reader has a better idea of how to attain the functionality desired in a less-kludgey way, please comment.
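For context – and as a minimal sketch only, since the delivered endpoints_vars.tf carries fuller description content – the globally-scoped variable driving that ternary is declared with an empty-list default, which is what triggers the template-file fallback:

variable "vpc_endpoint_services" {
  description = "List of name/type map-objects for the endpoints to create (empty list = deploy the full default set)"
  type        = list(map(string))
  default     = []
}

Dialing things back when the bills arrive is then just a matter of supplying a short list – hypothetical values shown – via a terraform.tfvars file or equivalent:

vpc_endpoint_services = [
  { name = "s3",  type = "gateway" },
  { name = "sts", type = "interface" }
]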
Where my customer needed the most explanation was the logic in the section:
data "aws_vpc_endpoint_service" "this" { for_each = { for service in local.vpc_endpoint_services : "${service.name}:${service.type}" => service } service_name = length( regexall( var.region, each.value.name ) ) == 1 ? each.value.name : "com.amazonaws.${var.region}.${each.value.name}" service_type = title(each.value.type) }
This section leverages Terraform's aws_vpc_endpoint_service data-source. My code gives it the reference id "this". Not a terribly original or otherwise noteworthy label, but, absent the need for multiple such references, it will do.
The for_each meta-argument iterates over the values stored in the locally-scoped vpc_endpoint_services object-variable. As it loops, it assigns each map-object – the name and type attribute-pairs – to the service loop-variable. In turn, the loop iteratively exports an each.value.name and an each.value.type variable.
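To illustrate with two hypothetical short-form entries, { name = "s3", type = "gateway" } and { name = "sts", type = "interface" } produce a for_each map along the lines of:

{
  "s3:gateway"    = { name = "s3",  type = "gateway" }
  "sts:interface" = { name = "sts", type = "interface" }
}

The "${service.name}:${service.type}" compound-keys ensure that each data-source instance gets a unique, stable address in the state-file.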
I could have more simply set the service_name variable equal to the each.value.name variable's value; however, I wanted to make life a bit less onerous for the automation-user: instead of needing to specify the full service-name path-string, they can specify just the short-name. Using the regexall() function to check whether the value of the globally-scoped region variable is present in the each.value.name variable's value allows the length() function to be used as part of a ternary definition for the service_name variable. If the returned length is "0", the operator-passed service-name is prefixed with the fully-qualified service-path typically valid for the partition's region; if the returned length is "1", then the value already stored in the each.value.name variable is used as-is.
Similarly, I didn't want the operator to need to care about the case of the service-type they were specifying. As such, I let Terraform's title() function take care of setting the proper case of the each.value.type variable's value.
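A quick, illustrative terraform console session – assuming a region value of "us-east-1" – demonstrates both of the above behaviors:

$ terraform console
> length(regexall("us-east-1", "s3"))
0
> length(regexall("us-east-1", "com.amazonaws.us-east-1.s3"))
1
> title("gateway")
"Gateway"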
The service_type and service_name values are then returned by the data-provider as the endpoints_main.tf file's locals {} block is processed:
locals {
  […elided…]

  # Split Endpoints by their type
  gateway_endpoints = toset(
    [
      for e in data.aws_vpc_endpoint_service.this :
      e.service_name if e.service_type == "Gateway"
    ]
  )
  interface_endpoints = toset(
    [
      for e in data.aws_vpc_endpoint_service.this :
      e.service_name if e.service_type == "Interface"
    ]
  )

  […elided…]
}
The gateway_endpoints and interface_endpoints locally-scoped variables are each set-variables (courtesy of the toset() function). Each is populated with the service_name values returned from the data.aws_vpc_endpoint_service.this data-provider whose service_type matches the selected-for value. These set-variables are then iteratively processed in the relevant resource "aws_vpc_endpoint" "interface_services" and resource "aws_vpc_endpoint" "gateway_services" stanzas.
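For completeness, here's a minimal sketch of how those stanzas consume the two sets. The variable-names other than the locals are hypothetical, and the delivered versions carry additional, project-specific settings (tagging, endpoint-policies and the like):

resource "aws_vpc_endpoint" "gateway_services" {
  for_each = local.gateway_endpoints

  vpc_id            = var.vpc_id          # hypothetical variable-name
  service_name      = each.value
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.route_table_ids # hypothetical variable-name
}

resource "aws_vpc_endpoint" "interface_services" {
  for_each = local.interface_endpoints

  vpc_id              = var.vpc_id              # hypothetical variable-name
  service_name        = each.value
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.subnet_ids          # hypothetical variable-name
  security_group_ids  = var.security_group_ids  # hypothetical variable-name
  private_dns_enabled = true
}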