Friday, August 3, 2018

JMSE Query Fu

Yesterday, I posted up an article illustrating the pain of LDAP's query-language. Today, I had to dick around with JMSE reporting.

The group I work for manages a number of AWS accounts in a number of regions. We also provide support to a number of tenants and their AWS accounts.

This support is necessary because, in much the same way AWS strives to make it easy to adopt "cloud", a lot of people flocking aboard don't necessarily know how to do so cost-effectively. Using AWS can be a cost-effective way to do things, but failure to exercise adequate house-keeping can bite you right in the wallet. Unfortunately, part of the low bar to entry means that a lot of users coming into AWS don't really comprehend that housekeeping is necessary. And, even if they did, they don't necessarily understand how to implement automated housekeeping methods. So, stuff tends to build up over time ...leading to unnecessary cost-exposures.

Inevitably, this leads to a "help: our expenses are running much higher than expected" types of situation. What usually turns out to be a majar cause is disused storage (and orphaned EC2s left running 24/7/365 ...but that's another story). Frequently, this comes out to some combination of orphaned (long-detached) EBS volumes, poorly maintained S3 buckets and elderly EBS snapshots.

IDin'g orphaned EBS volumes is pretty straight forward. Not a lot of query-fu is needed. To see it.

Poorly maintained S3 buckets are a skosh harder to suss out. But, you can usually just enable bucket inventory-reporting an lifecycle policies to help automate your cost-exposure reduction.

While EBS snapshots are relatively low-cost, when enough of them build up, you start to notice the expenses. Reporting can be relatively straightforward. However, if you happen to be maintaining fleets of custom AMIs, doing a simple "what can I delete" report becomes an exercise in query-filtering. Fortunately, AMI-related snapshots are generally identifiable (excludable) with a couple of filters, leaving you a list of snapshots to futher investigate. Unfortunately, writing the query-filters means dicking with JMSE filter syntax ...which is kind of horrid. Dunno if it's more or less horrid than LDAP's.

At any rate, I ended up writing a really simple reporting script to give me a basic idea of whether stale snapshots are even a problem ...and the likely scope of said problem if it existed. While few of our tenants maintain their own AMIs, we do in the account I primarily work with. So, I wrote my script to exclude their snapshots from relevant reports:

for YEAR in $( seq 2015 2018 )
do
   for MONTH in $( seq -w 1 12 )
   do
      for REGION in us-{ea,we}st-{1,2}
      do
         printf "%s-%s (%s): " "${YEAR}" "${MONTH}" "${REGION}"
         aws --region "${REGION}" ec2 describe-snapshots --owner <ACCOUNT> \
           --filter "Name=start-time,Values=${YEAR}-${MONTH}*" --query \
           'Snapshots[?starts_with(Description, `Created by CreateImage`) == `false`]|[?starts_with(Description, `Copied for DestinationAmi`) == `false`].[StartTime,SnapshotId,VolumeSize,Description]' \
           --output text | wc -l
      done
    done
done | sed '/ 0$/d'

The first suck part with a JMSE query is it can be really long. Worse: it  really doesn't tolerate line-breaking to make scripts less "sprawly". If you're like me, you like to keep a given line of code in a script no longer than "X" characters. I usually prefer X=80. JMSE queries pretty much say, "yeah: screw all that prettiness nonsense".

At any rate to quickly explain the easier parts of  the above:

  • First, this is a basic BASH wrapper (should work in other shells that are aware of Bourne-ish syntax)
  • I'm using nested loops: first level is year, second level is month, third level is AWS region. This allows easy grouping of output.
  • I'm using `printf` to give my output meaning
  • At the end of my `aws` command, I'm using `wc` to provide a simple count of lines returned: when one uses the `--output text` argument to the `aws` command, each object's returned data is done as a single line ...allowing `wc` to provide a quick tally of lines meeting the JMSE selection-criteria
  • At the end of all my looping, I'm using `sed` to suppress any lines where a given region/month/year has no snapshots found
Notice I don't have a bullet answering why I'm bothering to define an output string ...and then, essentially, throwing that string away. Simply put, I have to output something for `wc` to  count and I may as well output several useful items in case I want to `tee` it off and use the data in an extension to the script.

The fun part of the above is the JMSE horror-show:
  • When you query a EBS snapshot object, it outputs a JSON document. That document's structure looks like:
    {
        "Snapshots": [
            {
                "Description": "Snapshot Description Strin",
                "Tags": [
                    {
                        "Value": "TAG1 Value",
                        "Key": "TAG1"
                    },
                    {
                        "Value": "TAG2 Value",
                        "Key": "TAG2"
                    },
                    {
                        "Value": "TAG3 Value",
                        "Key": "TAG3"
                    }
                ],
                "Encrypted": false,
                "VolumeId": "",
                "State": "completed",
                "VolumeSize": ,
                "StartTime": "--
    T::.000Z", "Progress": "100%", "OwnerId": "", "SnapshotId": "" } ] }
  • The ".[StartTime,SnapshotId,VolumeSize,Description]" portion of the "--query" string constrains the output to containing the "StartTime", "SnapshotId", "VolumeSize" and "Description" elements from the object's JSON document. Again, not of immediate use when doing a basic element-count census but is convertible to something useful in later tasks.
  • When constructing a JMSE query, you're informing the tool, which elements of the JSON node-tree you want to work on. You pretty much always have to specify the top-level document node. In this case, that's "Snapshots". Thus, the initial part of the query-string is "Snapshots[]". Not directly germane to this document, but likely worth knowing:
    • In JMSE, "Snapshots[]" basically means "give me everything from 'Snapshots' on down".
    • An equivalent to this is "Snapshots[*]".
    • If you wanted just the first object returned, you'd use  "Snapshots[0]"
    • If you wanted just a sub-range of objects returned, you'd use  "Snapshots[X:Y]"
  • If you want to restrict your output based on selection criteria, JMSE provides the "[?]" construct. In this particular case, I'm using the "starts_with" query, or  "[?starts_with]". This query takes arguments: what sub-attribute you're querying; the string to match against and whether your positively or negatively selecting. Thus:
    • To eliminate snapshots with descriptions starting "Created by CreateImage", my query looks like: "?starts_with(Description, `Created by CreateImage`) == `false`"
    • To eliminate snapshots with descriptions starting "Copied for DestinationAmi", my query looks like: "?starts_with(Description, `Copied for DestinationAmi`) == `false`"
    • Were I selecting for either of these strings — rather than the current against-match — I would need to change my "false" selectors to "true".
  • To do a compound query of an OR type, one does "[query1]|[query2]". Notionally, I could string together as many of these as I wanted if I needed to select for/against any arbitrary number of attribute values by adding in further "|" operators.
At the end of running my script, I ended up with output that looked like:

    2017-07 (us-east-1): 61
    2017-08 (us-east-1): 746
    2017-09 (us-east-1): 196
    2017-10 (us-east-1): 4
    2017-11 (us-east-1): 113
    2017-12 (us-east-1): 600
    2018-01 (us-east-1): 149
    2018-03 (us-east-1): 6
    2018-04 (us-east-1): 3
    2018-06 (us-east-1): 302
    2018-07 (us-east-1): 1620
    2018-08 (us-east-1): 206
    2018-08 (us-west-2): 3

Obviously, there's some cleanup to be done in the queried account: it's unlikely that any snapshots older than 30 days — stuff that's from 2017 is almost a lock to be the result of "bad housekeeping".

No comments:

Post a Comment