Thursday, June 20, 2019

Crib-Notes: EC2 UserData Audit

Sometimes, I find that I'll return to a customer/project and forget what's "normal" for them in how they deploy their EC2s. If I know a given customer/project tends to deploy EC2s that include UserData, but they don't keep good records of what they tend to do for said UserData, I find the following BASH scriptlet to be useful for getting myself back into the swing of things:

# Iterate over every instance-ID in the account-region
for INSTANCE in $( aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' | \
                   sed -e '/^\[/d' -e '/^]/d' -e 's/^ *"//' -e 's/".*//' )
do
   printf "###############\n# %s\n###############\n" "${INSTANCE}"
   # "// empty" keeps jq from printing "null" for instances that have no UserData
   aws ec2 describe-instance-attribute --instance-id "${INSTANCE}" --attribute userData | \
   jq -r '.UserData.Value // empty' | base64 -d
   echo
done | tee /tmp/DiceLab-EC2-UserData.log

To explain, what the above does is:
  1. Initiates a for-loop using ${INSTANCE} as the iterated-value
  2. With each iteration, the value injected into ${INSTANCE} is derived from a line of output from the aws ec2 describe-instances command. Normally, this command outputs a JSON document containing a bunch of information about each instance in the account-region. Using the --query option, the output is constrained to only output each EC2 instance's InstanceId value. This is then piped through sed so that the extraneous characters are removed, resulting in a clean list of EC2 instance-IDs.
  3. The initial printf line creates a bit of an output-header. This makes it easier to pore through the output and keep each iterated instance's individual UserData content separate.
  4. Instance UserData is considered to be an attribute of a given EC2 instance. The aws ec2 describe-instance-attribute command is what actually pulls this content from the target EC2. I could have used a --query filter to constrain my output. However, I instead chose to use jq, as it lets me both constrain my output and clean it up, eliminating the need for the kind of complex sed statement I used in the loop initialization (in case you were wondering about the inconsistent constraint/cleanup methods: cygwin's jq was crashing this morning when I attempted to use it in the loop-initialization phase). Because the UserData output is stored as a BASE64-encoded string, I have to pipe the cleaned-up output through the base64 utility to get my plain-text data back.
  5. I inject a closing blank line into my output stream (via the echo command) to make the captured output slightly easier to scan.
  6. I like to watch my scriptlet's progress, but still like to capture that output into a file for subsequent perusal, so I pipe the entire loop's output through tee to capture as I view.
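
To make steps 2 and 4 a bit more concrete, here is a small offline sketch that needs neither the AWS CLI nor jq. The instance-IDs and the UserData script are made up for illustration, and clean_ids is a hypothetical helper name that simply wraps the same sed expressions used in the loop initialization:

```shell
#!/usr/bin/env bash
# Made-up sample of what the --query'd describe-instances call prints (a JSON array)
sample_ids='[
    "i-0123456789abcdef0",
    "i-0fedcba9876543210"
]'

# Hypothetical helper wrapping the sed cleanup: drop the bracket lines, then
# strip everything up to the opening quote and from the closing quote onward
clean_ids() {
   sed -e '/^\[/d' -e '/^]/d' -e 's/^ *"//' -e 's/".*//'
}

# Prints one bare instance-ID per line
printf '%s\n' "${sample_ids}" | clean_ids

# UserData round-trip: BASE64-encode a made-up script (roughly what EC2 stores),
# then decode it the same way the loop does
sample_userdata='#!/bin/bash
yum -y update'
encoded="$( printf '%s' "${sample_userdata}" | base64 | tr -d '\n' )"
printf '%s' "${encoded}" | base64 -d
echo
```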

I could have set it up so that each instance's data was dumped to an individual output-file. This would have eliminated the need for the printf and echo lines. However, I like having one big file to peruse (rather than having to hunt through scads of individual files), and a single file-open/close action is marginally faster than scads of open/closes.
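
For comparison, a one-file-per-instance variant might look something like the sketch below. This is not what I actually ran: dump_userdata_per_file is a hypothetical function name, the .userdata file suffix is arbitrary, and the --query/--output text pair stands in for the jq step (with text output, the CLI prints "None" when an instance has no UserData).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read instance-IDs on stdin (one per line) and write
# each instance's decoded UserData to its own file under the given directory
dump_userdata_per_file() {
   local outdir="${1}"
   local instance encoded

   mkdir -p "${outdir}" || return 1

   while read -r instance
   do
      # --query constrains the output to the BASE64 string; stdin is redirected
      # so the CLI cannot swallow the remaining instance-IDs
      encoded="$( aws ec2 describe-instance-attribute --instance-id "${instance}" \
                  --attribute userData --query 'UserData.Value' \
                  --output text < /dev/null )"

      # Skip instances that have no UserData rather than writing junk files
      if [ -z "${encoded}" ] || [ "${encoded}" = "None" ]
      then
         continue
      fi

      printf '%s' "${encoded}" | base64 -d > "${outdir}/${instance}.userdata"
   done
}
```

Feeding it the cleaned-up instance-ID list on stdin, along with a target directory as the sole argument, would leave one decoded file per instance that actually has UserData.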

In an account-region that had hundreds of EC2s, I'd probably have been more selective about which instance-IDs I fed into my loop. I would have used the --filters option in my aws ec2 describe-instances command, likely filtering by VPC-ID and one or two other selectors.
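
A minimal sketch of that more selective loop initialization follows. The function name list_instance_ids is hypothetical, and pairing the vpc-id filter with instance-state-name is just one plausible choice of "other selector". Using --output text also sidesteps the sed cleanup, since the IDs come back tab-separated rather than as a JSON array:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of running instances in one VPC,
# one instance-ID per line
list_instance_ids() {
   local vpc_id="${1}"

   # Both filters must match; text output is tab-separated, so tr converts
   # the tabs to newlines
   aws ec2 describe-instances \
      --filters "Name=vpc-id,Values=${vpc_id}" \
                "Name=instance-state-name,Values=running" \
      --query 'Reservations[].Instances[].InstanceId' \
      --output text | tr -s '\t' '\n'
}
```

The loop would then start with something like for INSTANCE in $( list_instance_ids "${VPC_ID}" ), where VPC_ID holds whichever VPC I was auditing.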