Tuesday, December 5, 2017

I've Used How Much Space??

A customer of mine needed me to help them implement a full CI/CD tool-chain in AWS. As part of that implementation, they wanted to have daily backups. Small problem: the enterprise backup software that their organization normally uses isn't available in their AWS-hosted development account/environment. That environment is mostly "support it yourself".

Fortunately, AWS has a number of tools that can help with things like backup tasks. The customer didn't have strong specifications on how they wanted things backed up, retention periods, etc. Just "we need daily backups". So I threw together some basic "pump it into S3" type jobs, with the caveat "you'll want to keep an eye on this because, right now, there are no data-lifecycle elements in place".
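
To give a flavor of what those jobs amount to, here's a minimal sketch of one. The bucket name, source directory, and archive name below are placeholders rather than the customer's actual values; the Backups/YYYYMMDD/ prefix layout matches what shows up later in this post.

#!/bin/sh
# Hypothetical nightly "pump it into S3" job: tar up an application's
# data directory and stream it to a dated prefix in the backup bucket.
BUCKET="my-backup-bucket"          # placeholder bucket name
SRCDIR="/var/lib/my-cicd-app"      # placeholder source directory
DATESTAMP="$(date '+%Y%m%d')"

# Stream the tarball to S3 rather than staging it on local disk first
tar czf - "${SRCDIR}" | \
    aws s3 cp - "s3://${BUCKET}/Backups/${DATESTAMP}/my-cicd-app.tgz"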

For the first several months, things ran fine. Then, as they often do, problems began popping up: their backup jobs started experiencing periodic errors. I wasn't able to find the underlying causes. However, in my searching around, it occurred to me to wonder, "have these guys been aging stuff off like I warned them they'd probably want to do?"

AWS provides a nifty GUI option in the S3 console that will show you storage utilization. A quick look in their S3 backup buckets told me, "doesn't look like they have".

Not being much of a GUI-jockey, I wanted something I could run from the CLI and feed to an out-of-band notifier. The AWS CLI offers the `s3api` tool-set, which comes in handy for such actions. On my first dig through (and some Googling), I sorted out "how do I get a total-utilization view for this bucket?" It looks something like:

aws s3api list-objects --bucket toolbox-s3res-12wjd9bihhuuu-backups-q5l4kntxp35k \
    --output json --query "[sum(Contents[].Size), length(Contents[])]" | \
    awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
[
1671.5 GB
    423759
]
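
For what it's worth, the plain `s3` side of the CLI can produce roughly the same totals if you don't need JSON for downstream parsing. It still has to list every object under the hood, so it's no quicker on a bucket this size:

aws s3 ls s3://toolbox-s3res-12wjd9bihhuuu-backups-q5l4kntxp35k \
    --recursive --summarize --human-readable | tail -2

The last two lines of that output are the "Total Objects" and "Total Size" summary lines.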

The above agreed with the GUI and was more space than I'd assumed they'd be using at this point. So, I wanted to see whether I could clean anything up, starting with a single day's worth of backups:

aws s3api list-objects --bucket toolbox-s3res-12wjd9bihhuuu-backups-q5l4kntxp35k \
    --prefix Backups/YYYYMMDD/ --output json \
    --query "[sum(Contents[].Size), length(Contents[])]" | \
    awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
[
198.397 GB
    50048
]

That "one day's worth of backups" was also more than expected. The last time I'd censused their backups (earlier in the summer), they had maybe 40GiB worth of data. They wanted a week's worth of backups retained; at roughly 200GiB/day, that comes to about 1,400GiB, which accounts for most of the 1,671GiB total. In other words, I really wasn't going to be able to trim the utilization much. It also meant that maybe they were keeping on top of aging things off after all.
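
To see how the dailies had grown since that summer census, the same per-prefix query can be wrapped in a quick loop. A rough sketch, assuming GNU date, the Backups/YYYYMMDD/ layout shown above, and that each of the last seven days actually has objects under its prefix:

BUCKET="toolbox-s3res-12wjd9bihhuuu-backups-q5l4kntxp35k"
for DAYSAGO in $(seq 0 6)
do
    STAMP="$(date -d "-${DAYSAGO} days" '+%Y%m%d')"
    printf '%s: ' "${STAMP}"
    # sum(Contents[].Size) is the prefix's total bytes; convert to GiB
    aws s3api list-objects --bucket "${BUCKET}" \
        --prefix "Backups/${STAMP}/" --output text \
        --query "sum(Contents[].Size)" | \
        awk '{printf "%.1f GiB\n", $1/1024/1024/1024}'
done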

Note: yes, S3 has lifecycle policies that allow you to automate moving things to lower-cost tiers. Unfortunately, the auto-tiering (at least from regular S3 to S3-IA) has a minimum age of 30 days. Not helpful, here.
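
For completeness: that 30-day floor applies to the storage-class transitions; plain expiration rules aren't subject to it. Something along these lines (a sketch; the seven-day retention and the Backups/ prefix are assumptions based on what they said they wanted) would automate the age-off rather than leaving it to humans:

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "ExpireOldBackups",
      "Filter": { "Prefix": "Backups/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
    --bucket toolbox-s3res-12wjd9bihhuuu-backups-q5l4kntxp35k \
    --lifecycle-configuration file://lifecycle.json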

Saving grace: at least I snuffled up a way to get these metrics without the web GUI. As a side effect, it also meant I had a way to verify that the amount of data reaching S3 matches the amount being exported from their CI/CD applications.
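
That cross-check is scriptable, too. A rough sketch, assuming GNU du and that the exports get staged under a local directory named for the same YYYYMMDD stamp (the staging path is a placeholder):

# Bytes staged for export on the CI/CD host (placeholder path)
du -sb /export/staging/YYYYMMDD | awk '{print $1" bytes staged locally"}'

# Bytes that actually landed under the matching S3 prefix
aws s3api list-objects \
    --bucket toolbox-s3res-12wjd9bihhuuu-backups-q5l4kntxp35k \
    --prefix Backups/YYYYMMDD/ --output text \
    --query "sum(Contents[].Size)" | \
    awk '{printf "%.0f bytes in S3\n", $1}'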