Wednesday, July 21, 2021

What Permissions Do I Need

In recent months, I've been converting some automation I originally wrote under CloudFormation to instead work under Terraform. Ultimately, the automation I wrote is going to be used in a different account than the one I (re)developed it in. As part of the customer's "least-privileges" deployment model, I needed to be able to specify to them all of the specific AWS IAM permissions that my Terraform-based automation would need. Since the development account I've been working in doesn't provide me CloudTrail or other similarly-useful access, I had to find another way. Turns out, that "another way" is effectively built into Terraform itself!

When one sets the TF_LOG environment variable to trace, Terraform's activity-logging becomes very verbose. Buried amongst the storm of output are all of the IAM permissions that Terraform needs in order to perform its deployment, configuration and removal actions. Extracting it all was a matter of:

  1. Execute a `terraform apply` using:
      TF_LOG=trace terraform apply -auto-approve 2> apply.log
  2. Execute a refresh-only `terraform apply` using:
      TF_LOG=trace terraform apply -auto-approve \
        -refresh-only 2> refresh.log
  3. Execute a `terraform destroy` using:
      TF_LOG=trace terraform destroy -auto-approve 2> destroy.log
Once each of the above completes successfully, one has three looooong output files. To extract the information (and put it in a format IAM administrators are more used to), a simple set of filters can be applied:

cat *.log | \
grep 'DEBUG: Request ' | \
sed -e 's/.*: Request//' \
    -e 's/ Details:.*$//' \
    -e 's#/#:#' | \
sort -u
This filter-set gives you a list that looks something like:
 ec2:AuthorizeSecurityGroupEgress
 ec2:AuthorizeSecurityGroupIngress
 ec2:CreateSecurityGroup
 ec2:DescribeImages
 ec2:DescribeInstanceAttribute
 ec2:DescribeInstanceCreditSpecifications
 ec2:DescribeInstances
 ec2:DescribeSecurityGroups
 ec2:DescribeTags
 ec2:DescribeVolumes
 ec2:DescribeVpcs
 ec2:RevokeSecurityGroupEgress
 ec2:RunInstances
 elasticloadbalancing:AddTags
 elasticloadbalancing:CreateListener
 elasticloadbalancing:CreateLoadBalancer
 elasticloadbalancing:CreateTargetGroup
 elasticloadbalancing:DescribeListeners
 elasticloadbalancing:DescribeLoadBalancerAttributes
 elasticloadbalancing:DescribeLoadBalancers
 elasticloadbalancing:DescribeTags
 elasticloadbalancing:DescribeTargetGroupAttributes
 elasticloadbalancing:DescribeTargetGroups
 elasticloadbalancing:ModifyLoadBalancerAttributes
 elasticloadbalancing:ModifyTargetGroup
 elasticloadbalancing:ModifyTargetGroupAttributes
 elasticloadbalancing:SetSecurityGroups
 s3:GetObject
 s3:ListObjects
You can then pass that list on to the parties that set up your IAM roles.
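
If the folks handling IAM would rather receive a ready-made policy document than a bare list, that output can be wrapped into a policy skeleton. Below is a minimal sketch, assuming the filtered list has been saved to a (hypothetical) permissions.txt file and that jq is installed; note, too, that a couple of the logged names (s3:ListObjects, for example) are SDK operation names rather than true IAM actions, so a little hand-editing may still be needed:

# Strip the leading whitespace the filter pipeline leaves behind, then wrap
# the actions into a single-statement policy skeleton
sed 's/^[[:space:]]*//' permissions.txt | \
jq -Rn '{
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "TerraformDeployment",
      Effect: "Allow",
      Action: [inputs],
      Resource: "*"
    }
  ]
}' > terraform-policy.json

Obviously, the Resource stanza is left wide open in the sketch; scoping it down is still up to whoever owns the roles.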

Friday, February 19, 2021

Working Around Errors Caused By Poorly-Built AMIs (Networking Edition)

Over the past several years, the team I work on created a set of provisioning-automation tools that we've used with/for a NUMBER of customers. The automation is pretty well designed to run "anywhere".

Cue current customer/project. They're an AWS-using customer and they maintain their own AMIs. Unfortunately, our automation would break during the hardening phase of the deployment. After wasting more than a man-day, I discovered the root cause of the problem: when they build their EL7 AMIs, they don't do an adequate cleanup job.

I discovered that there were spurious ifcfg-* files in the resultant EC2s' /etc/sysconfig/network-scripts directory. The customer's AMI users had never really noticed this oversight. All they really knew was that "networking appears to work", so they had never noticed that the network.service systemd unit was actually in a fault state. I whipped out journalctl and found that the systemd unit was attempting to bring up interfaces that didn't exist on their EC2s ...because, while there were ifcfg-* files present, the corresponding interface-directories in /sys/class/net didn't actually exist.
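
For anyone who wants to reproduce that bit of spelunking, the checks boil down to something like the following (the unit and path names are just the stock EL7 ones described above):

systemctl status network.service            # unit reports a failed state
journalctl -u network.service --no-pager    # logs name the interface(s) it couldn't bring up
ls /etc/sysconfig/network-scripts/ifcfg-*   # interface configs baked into the AMI
ls /sys/class/net/                          # interfaces that actually exist on the EC2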

Our hardening tools, as part of ensuring that the network-related hardenings all get applied, do (the equivalent of) systemctl restart network.service. Unfortunately, due to the aforementioned problem, this action resulted in a non-zero exit. Consequently, our tools were aborting.

So, how to pre-clean the system so that the standard provisioning automation would work? Fortunately, AWS lets you inject boot-time logic via cloud-init scripts. I whipped up a quick script to eliminate the superfluous ifcfg-* files:  

# Walk the ifcfg-* files baked into the AMI...
for IFFILE in $( echo /etc/sysconfig/network-scripts/ifcfg-* )
do
   # ...and delete any whose interface has no entry under /sys/class/net
   [[ -e /sys/class/net/${IFFILE//*ifcfg-/} ]] || (
      printf "Device %s not found. Nuking... " "${IFFILE//*ifcfg-/}" &&
      rm "${IFFILE}" || ( echo FAILED ; exit 1 )
      echo "Success!"
   )
done
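
To make that logic actually run at first boot, one option is to hand it to the instance as a shebang-prefixed userData script, which cloud-init executes on initial launch. A minimal sketch (the shebang, the nullglob guard and the ${IFFILE##*ifcfg-} spelling are my additions; the loop is otherwise the same as above):

#!/bin/bash
# userData sketch: remove any ifcfg-* file whose interface doesn't exist on this instance
shopt -s nullglob    # if no ifcfg-* files are present, the loop simply does nothing
for IFFILE in /etc/sysconfig/network-scripts/ifcfg-*
do
   IFACE="${IFFILE##*ifcfg-}"     # e.g., "eth1" from ".../ifcfg-eth1"
   if [[ ! -e /sys/class/net/${IFACE} ]]
   then
      printf 'Device %s not found. Nuking... ' "${IFACE}"
      rm -f "${IFFILE}" && echo "Success!" || echo "FAILED"
   fi
done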

I launched a new EC2 with the userData addition. When the "real" provisioning automation ran, there were no more errors. Dandy.

Ugh... Hate having to kludge to work around error-conditions that simply should not occur.