
Wednesday, August 28, 2024

Getting the Most Out of EC2 Transfers With Private S3 Endpoints

Recently, I was given a project to help a customer migrate an on-premises GitLab installation into AWS. The current GitLab was pretty large: a full export of the configuration was nearly 500GiB in size.

It turned out a good chunk of that 500GiB was due to disk-hosted artifacts and LFS objects. Since I was putting it all into AWS, I opted to make use of GitLab's ability to store BLOBs in S3. Ultimately, that turned out to be nearly 8,000 LFS objects and nearly 150,000 artifacts (plus several hundred "uploads").

The first challenge was getting the on-premises data into my EC2. Customer didn't want to give me access to their on-premises network, so I needed to have them generate the export TAR-file and upload it to S3. Once in S3, I needed to get it into an EC2.

Wanting to make sure that the S3→EC2 task was as quick as possible, I selected an instance-type rated to 12.5Gbps of network bandwidth and 10Gbps of EBS bandwidth. However, my first attempt at downloading the TAR-file from S3 took nearly an hour to run: it was barely creeping along at 120MiB/s. Abysmal.

I broke out `iostat` and found that my target EBS was reporting 100% utilization and a bit less than 125MiB/s of average throughput. That seemed "off" to me, so I looked at the EBS volume. It was then that I noticed that the default volume-throughput was only 125MiB/s. So, I upped the setting to its maximum: 1000MiB/s. I re-ran the transfer only to find that, while the transfer-speed had improved, it had only improved to a shade under 150MiB/s. Still abysmal.

So, I started rifling through the AWS documentation to see what CLI settings I could change to improve things. First mods were:

max_concurrent_requests = 40
multipart_chunksize = 10MB
multipart_threshold = 10MB
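
For anyone who hasn't tinkered with these knobs: they live under the s3 key of the relevant profile in ~/.aws/config (or can be set with `aws configure set`). A minimal sketch, assuming the default profile:

# ~/.aws/config
[default]
s3 =
  max_concurrent_requests = 40
  multipart_chunksize = 10MB
  multipart_threshold = 10MB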

This didn't really make much difference. `iostat` was showing really variable utilization-numbers, but mostly that my target-disk was all but idle. Similarly, `netstat` was showing only a handful of simultaneous-streams between my EC2 and S3.

Contacted AWS support. They let me know that S3 multi-part upload and download is limited to 10,000 parts. So, I did the math (<FILE_SIZE> / <MAX_CHUNKS>) and changed the above to:

max_concurrent_requests = 40
multipart_chunksize = 55MB
multipart_threshold = 64MB
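
For concreteness, the arithmetic behind that chunk-size works out roughly like the following (assuming the roughly-500GiB export mentioned earlier):

# Rough shell-math for a ~500GiB object and S3's 10,000-part limit
FILE_SIZE_MIB=$(( 500 * 1024 ))                    # 512000 MiB
MIN_CHUNK_MIB=$(( ( FILE_SIZE_MIB / 10000 ) + 1 )) # 52MiB minimum per part
echo "multipart_chunksize needs to be at least ${MIN_CHUNK_MIB}MiB (55MB leaves headroom)"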

This time, the transfers were running about 220-250MiB/s. While that was a 46% throughput increase, it was still abysmal. While `netstat` was finally showing the expected number of simultaneous connections, my `iostat` was still saying that my EBS was mostly idle.

Reached back out to AWS support. They had the further suggestion of adding:

preferred_transfer_client = crt
target_bandwidth = 10GB/s

to my S3 configuration. Re-ran my test and was getting ≈990MiB/s of continuous throughput for the transfer! This knocked the transfer time down from fifty-five minutes to a shade over eight minutes. In other words, I was going to be able to knock nearly an hour off the upcoming migration-task.

In digging back through the documentation, it seems that, when one doesn't specify a preferred_transfer_client value, the CLI will select the `classic` (`python`) client. And, depending on your Python version, the performance ranges from merely-horrible to ungodly-bad: using RHEL 9 for my EC2, it was pretty freaking bad, but it had been less-bad when using Amazon Linux for my EC2's OS. Presumably a difference in the two distros' respective Python versions?

Specifying a preferred_transfer_client value of `crt` (C run-time client) unleashed the full might and fury of my EC2's and GP3's capabilities.

Interestingly, this "use 'classic'" behavior isn't a universal auto-selection. If you've selected an EC2 with any of the instance-types:

  • p4d.24xlarge
  • p4de.24xlarge
  • p5.48xlarge
  • trn1n.32xlarge
  • trn1.32xlarge

The auto-selection gets you `crt`. Not sure why `crt` isn't the auto-selected value for Nitro-based instance-types. But, "it is what it is".

Side note: just selecting `crt` probably wouldn't have completely roided-out the transfer. I assume the further setting of `target_bandwidth` to `10GB/s` probably fully-unleashed things. There definitely wasn't much bandwidth leftover for me to actually monitor the transfer. I assume that the `target_bandwidth` parameter has a default value that's less than "all the bandwidth". However, I didn't actually bother to verify that.
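
If you'd rather flip those two settings from the command line than edit ~/.aws/config by hand, something like the following should do it (the 10GB/s figure is simply the value I used; tune to taste):

# Switch the S3 transfer-client to the CRT implementation and raise its
# bandwidth target (applied here to the "default" profile)
aws configure set default.s3.preferred_transfer_client crt
aws configure set default.s3.target_bandwidth 10GB/s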

Update: 

After asking support "why isn't `crt` the default for more instance-types", I got back the reply:

Thank you for your response. I see that these particular P5, P4d and Trn1 instances are purpose built for high-performance ML training. Hence I assume the throughput needed for these ML applications needs to be high and CRT is auto enabled for these instance types.

Currently, the CRT transfer client does not support all of the functionality available in the classic transfer client.

These are a few limitations for CRT configurations:

  • Region redirects - Transfers fail for requests sent to a region that does not match the region of the targeted S3 bucket.
  • max_concurrent_requests, max_queue_size, multipart_threshold, and max_bandwidth configuration values - Ignores these configuration values.
  • S3 to S3 copies - Falls back to using the classic transfer client


All of which is to say that, once I set `preferred_transfer_client = crt`, all of my other, prior settings got ignored.

Thursday, May 16, 2024

So You Work in Private VPCs and Want CLI Access to Your Linux EC2s?

Most of the AWS projects I work on, both currently and historically, have deployed most, if not all, of their EC2s into private VPC subnets. This means that, if one wants to be able to log in directly to their Linux EC2s' interactive shells, they're out of luck. Historically, to get something akin to direct access, one had to set up bastion-hosts in a public VPC subnet, and then jump through to the EC2s one actually wanted to log in to. How well one secured those bastion-hosts could make-or-break how well-isolated their private VPC subnets – and associated resources – were.

If you were the sole administrator or part of a small team, or were part of an arbitrary-sized administration-group that all worked from a common network (i.e., from behind a corporate firewall or through a corporate VPN), keeping a bastion-host secure was fairly easy. All you had to do was set up a security-group that allowed only SSH connections and only allowed them from one or a few source IP addresses (e.g. your corporate firewall's outbound NAT IP address). For a bit of extra security, one could even prohibit password-based logins on the Linux bastions (instead, using SSH key-based login, SmartCards, etc. for authenticating logins). However, if you were a member of a team of non-trivial size and your team members were geographically-distributed, maintaining whitelists to protect bastion-hosts could become painful. That painfulness would be magnified if that distributed team's members were either frequently changing-up their work locations or were coming from locations where their outbound IP address would change with any degree of frequency (e.g., work-from-home staff whose ISPs would frequently change their routers' outbound IPs).

A few years ago, AWS introduced SSM and the ability to tunnel SSH connections through SSM (see the re:Post article for more). With appropriate account-level security-controls, the need for dedicated bastion-hosts and maintenance of whitelists effectively vanished. Instead, all one had to do was:

  • Register an SSH key to the target EC2s' account
  • Set up their local SSH client to allow SSH-over-SSM (a minimal client-config sketch appears below)
  • Then SSH "directly" to their target EC2s

SSM would, effectively, "take care of the rest" …including logging of connections. If one were feeling really enterprising, one could enable key-logging for those SSM-tunneled SSH connections (a good search-engine query should turn up configuration guides; one such guide is Toptal's). This would, undoubtedly, make your organization's IA team really happy (and may even be required, depending on the security-requirements your organization is legally bound to adhere to) – especially if they don't yet have an enterprise session-logging tool purchased.
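
By way of illustration, the client-side setup (the second step above) usually amounts to a stanza like the following in one's ~/.ssh/config. This sketch assumes the AWS CLI and the Session Manager plugin are already installed and that your credentials/profile are squared away:

# ~/.ssh/config: route SSH for EC2 instance-IDs through an SSM session
Host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"

With that in place, `ssh <USERID>@<EC2_INSTANCE_ID>` behaves like a "direct" connection, even though everything is riding over SSM.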

But what if your EC2s are hosting applications that require GUI-based access to set up and/or administer? Generally, you have two choices:

  • X11 display-redirection
  • SSH port-forwarding

Unfortunately, SSM is a fairly low-throughput solution. So, while doing X11 display-redirection from an EC2 in a public VPC subnet may be more than adequately performant, the same cannot be said when done through an SSH-over-SSM tunnel. Doing X11 display-redirection of a remote browser session – or, worse, an entire graphical desktop session (e.g., KDE or Gnome desktops) – is paaaaaainfully slow. For my own tastes, it's uselessly slow. 

Alternately, one can use SSH port-forwarding as part of that SSH-over-SSM session. Then, instead of trying to send rendered graphics over the tunnel, one only sends the pre-rendered data. It's a much lighter traffic load with the result being a much quicker/livelier response. It's also pretty easy to set up. Something like:

ssh -L localhost:8080:$( aws ec2 describe-instances \
        --query 'Reservations[].Instances[].PrivateIpAddress' \
        --output text \
        --instance-ids <EC2_INSTANCE_ID> ):80 <USERID>@<EC2_INSTANCE_ID>

Is all you need. In the above, the argument to the -L flag is saying, "set up a tcp/8080 listener on my local machine and have it forward connections to the remote machine's tcp/80". The local and remote ports can be varied for your specific needs. You can even set up dynamic-forwarding by creating a SOCKS proxy (but this document is meant to be a starting point, not dive into the weeds).

Note that, while the above is using a subshell (via the $( … ) shell-syntax) to snarf the remote EC2's private IP address, one should be able to simply substitute "localhost". I simply prefer to try to speak to the remote's ethernet, rather than loopback, interface, since doing so can help identify firewall-type issues that might interfere with others' use of the target service.
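
For completeness, the dynamic-forwarding variant mentioned above looks something like this (tcp/1080 is just an arbitrary local-port choice):

# Stand up a local SOCKS proxy on tcp/1080 that tunnels through the
# SSH-over-SSM session
ssh -D localhost:1080 <USERID>@<EC2_INSTANCE_ID>

Point your browser's (or other client's) SOCKS-proxy setting at localhost:1080 and its traffic will egress from the far-end EC2.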

Tuesday, September 20, 2022

Crib Notes: Quick Audit of EC2 Instance-Types

Was recently working on a project for a customer who was having performance issues. Noticed the customer was using t2.* for the problematic system. Also knew that I'd seen them using pre-Nitro instance-types on some other systems they'd previously complained about performance problems with. Wanted to put together a quick list of "you might want to consider updating these guys" EC2s. Ended up executing:


$ aws ec2 describe-instances \
   --query 'Reservations[].Instances[].{Name:Tags[?Key == `Name`].Value,InstanceType:InstanceType}' \
   --output text | \
sed -e 'N;s/\nNAME//;P;D'

Because the describe-instances command's output is multi-line – even with the applied --query filter – adding the sed filter was necessary to provide a nice, table-like output:

t3.medium       ingress.dev-lab.local
t2.medium       etcd1.dev-lab.local
m5.xlarge       k8snode.dev-lab.local
m6i.large       runner.dev-lab.local
t2.small        dns1.dev-lab.local
t3.medium       k8smaster.dev-lab.local
t2.medium       bastion.dev-lab.local
t3.medium       ingress.dev-lab.local
t2.medium       etcd0.dev-lab.local
m5.xlarge       k8snode.dev-lab.local
m6i.large       runner.dev-lab.local
m5.xlarge       k8snode.dev-lab.local
t2.xlarge       workstation.dev-lab.local
t2.medium       proxy.dev-lab.local
t2.small        dns0.dev-lab.local
t3.medium       ingress.dev-lab.local
t2.medium       etcd2.dev-lab.local
m5.xlarge       k8snode.dev-lab.local
t2.medium       mail.dev-lab.local
m6i.large       runner.dev-lab.local
t2.small        dns2.dev-lab.local
t3.medium       k8smaster.dev-lab.local
t2.medium       bastion.dev-lab.local
t2.medium       proxy.dev-lab.local
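
If the goal is just to flag the oldest generations, the same report can also be narrowed API-side with a filter. A sketch (wildcards in filter Values have worked for me, but verify against your CLI version):

$ aws ec2 describe-instances \
     --filters 'Name=instance-type,Values=t2.*' \
     --query 'Reservations[].Instances[].{Name:Tags[?Key == `Name`].Value,InstanceType:InstanceType}' \
     --output text | \
  sed -e 'N;s/\nNAME//;P;D'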

Friday, February 19, 2021

Working Around Errors Caused By Poorly-Built AMIs (Networking Edition)

Over the past several years, the team I work on created a set of provisioning-automation tools that we've used with/for a NUMBER of customers. The automation is pretty well designed to run "anywhere".

Cue current customer/project. They're an AWS-using customer. They maintain their own AMIs. Unfortunately, our automation would break during the hardening phase of the deployment automation. After wasting more than a man-day, I discovered the root cause of the problem: when they build their EL7 AMIs, they don't do an adequate cleanup job.

Discovered that there were spurious ifcfg-* files in the resultant EC2s' /etc/sysconfig/network-scripts directory. Customer's AMI-users had never really noticed this oversight. All they really knew was that "networking appears to work", so had never noticed that the network.service systemd unit was actually in a fault state. Whipped out journalctl to find that the systemd unit was attempting to online interfaces that didn't exist on their EC2s ...because, while there were ifcfg-* files present, corresponding interface-directories in /sys/class/net didn't actually exist.

Our hardening tools, as part of ensuring that network-related hardenings all get applied, do (the equivalent of) a systemctl restart network.service. Unfortunately, due to the aforementioned problem, this action resulted in a non-zero exit. Consequently, our tools were aborting.

So, how to pre-clean the system so that the standard provisioning automation would work? Fortunately, AWS lets you inject boot-time logic via cloud-init scripts. I whipped up a quick script to eliminate the superfluous ifcfg-* files:  

# Remove any ifcfg-* file that has no matching device under /sys/class/net
for IFFILE in /etc/sysconfig/network-scripts/ifcfg-*
do
   [[ -e /sys/class/net/${IFFILE//*ifcfg-/} ]] || (
      printf "Device %s not found. Nuking... " "${IFFILE//*ifcfg-/}" &&
      rm "${IFFILE}" || ( echo FAILED ; exit 1 )
      echo "Success!"
   )
done
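
For the record, userData scripts need a "#!"-style shebang for cloud-init to execute them at first boot. A hypothetical launch invocation (placeholder IDs; assume the script above was saved locally as clean-ifcfg.sh) might look like:

# Launch an instance from the customer's AMI, injecting the cleanup script
# as userData
aws ec2 run-instances \
   --image-id <AMI_ID> \
   --instance-type t3.large \
   --subnet-id <SUBNET_ID> \
   --security-group-ids <SECURITY_GROUP_ID> \
   --user-data file://clean-ifcfg.sh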

Launched a new EC2 with the userData addition. When the "real" provisioning automation ran, no more errors. Dandy.

Ugh... Hate having to kludge to work around error-conditions that simply should not occur.

Thursday, June 20, 2019

Crib-Notes: EC2 UserData Audit

Sometimes, I find that I'll return to a customer/project and forget what's "normal" for them in how they deploy their EC2s. If I know a given customer/project tends to deploy EC2s that include UserData, but they don't keep good records of what they tend to do for said UserData, I find the following BASH scriptlet to be useful for getting myself back into the swing of things:

for INSTANCE in $( aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' | \
                   sed -e '/^\[/'d -e '/^]/d' -e 's/^ *"//' -e 's/".*//' )
do
   printf "###############\n# %s\n###############\n" "${INSTANCE}"
   aws ec2 describe-instance-attribute --instance-id "${INSTANCE}" --attribute userData | \
   jq -r .UserData.Value | base64 -d
   echo
done | tee /tmp/DiceLab-EC2-UserData.log

To explain, what the above does is:
  1. Initiates a for-loop using ${INSTANCE} as the iterated-value
  2. With each iteration, the value injected into ${INSTANCE} is derived from a line of output from the aws ec2 describe-instances command. Normally, this command outputs a JSON document containing a bunch of information about each instance in the account-region. Using the --query option, the output is constrained to only output each EC2 instance's InstanceId value. This is then piped through sed so that the extraneous characters are removed, resulting in a clean list of EC2 instance-IDs.
  3. The initial printf line creates a bit of an output-header. This will make it easier to pore through the output and keep each iterated instance's individual UserData content separate
  4. Instance UserData is considered to be an attribute of a given EC2 instance. The aws ec2 describe-instance-attribute command is what is used to actually pull this content from the target EC2. I could have used a --query filter to constrain my output. However, I instead chose to use jq as it allows me to both constrain my output as well as do output-cleanup, eliminating the need for the kind of complex sed statement I used in the loop initialization (cygwin's jq was crashing this morning when I was attempting to use it in the loop-initialization phase - in case you were wondering about the inconsistent constraint/cleanup methods). Because the UserData output is stored as a BASE64-encoded string, I have to pipe the cleaned-up output through the base64 utility to get my plain-text data back.
  5. I inject a closing blank line into my output stream (via the echo command) to make the captured output slightly easier to scan.
  6. I like to watch my scriptlet's progress, but still like to capture that output into a file for subsequent perusal, thus I pipe the entire loop's output through tee so I can capture as I view.
I could have set it up so that each instance's data was dumped to an individual output-file. This would have saved the need for the printf and echo lines. However, I like having one, big file to peruse (rather than having to hunt through scads of individual files) ...and a single file-open/close action is marginally faster than scads of open/closes.

In an account-region that had hundreds of EC2s, I'd probably have been more selective with which instance-IDs I initiated my loop. I would have used a --filter statement in my aws ec2 describe-instances command - likely filtering by VPC-ID and one or two other selectors.
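
Such a filter might look something like the following (the VPC-ID and the tag-based selector are placeholders, not values from a real account):

aws ec2 describe-instances \
   --filters 'Name=vpc-id,Values=<VPC_ID>' \
             'Name=tag:Project,Values=<PROJECT_NAME>' \
   --query 'Reservations[].Instances[].InstanceId'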

Wednesday, April 17, 2019

Crib-Notes: Validating Consistent ENA and SRIOV Support in AMIs

One of the contracts I work on, we're responsible for producing the AMIs used for the entire enterprise. At this point, the process is heavily automated. Basically, we use a pipeline that leverages some CI tools, Packer and a suite of BASH scripts to do all the grunt-work and produce not only an AMI but artifacts like AMI configuration- and package-manifests.

When we first adopted Packer, it had some limitations on how it registered AMIs (or, maybe, we just didn't find the extra flags back when we first selected Packer – who knows, it's lost to the mists of time at this point). If you wanted the resultant AMIs to have ENA and/or SRIOV support baked in (we do), your upstream AMI needed to have it baked in as well. This necessitated creating our own "bootstrap" AMIs as you couldn't count on these features being baked in – not even within the upstream vendor's (in our case, Red Hat's and CentOS's) AMIs.

At any rate, because the overall process has been turned over from the people that originated the automation to people that basically babysit automated tasks, the people running the tools don't necessarily have a firm grasp of everything that the automation's doing. Further, the people that are tasked with babysitting the automation differ from run-to-run. While automation should see to it that this doesn't matter, sometimes it pays to be paranoid. So, a quick way to assuage that paranoia is to run quick reports from the AWS CLI. The following snippet makes for an adequate, "fifty-thousand foot" consistency-check:

aws ec2 describe-images --owner <AWS_ACCOUNT_ID> --query \
      'Images[].[ImageId,Name,EnaSupport,SriovNetSupport]' \
      --filters 'Name=name,Values=<SEARCH_STRING_PATTERN>' \
      --output text | \
   awk 'BEGIN {
         printf("%-18s%-60s%-14s%-10s\n","AMI ID","AMI Name","ENA Support","SRIOV Support")
      } {
         printf("%-18s%-60s%-14s%-10s\n",$1,$2,$3,$4)
      }'
  • There's a lot of organizations and individuals publishing AMIs. Thus, we use the --owner flag to search only for AMIs we've published.
  • We produce a couple of different families of AMIs. Thus, we use the --filter statement to only show the subset of our AMIs we're interested in.
  • I really only care about four attributes of the AMIs being reported on: ImageId, Name, EnaSupport and SriovNetSupport. Thus, the use of the JMESPath --query statement to suppress all output except for that in which I'm interested.
  • Since I want the output to be pretty, I used the compound awk statement to create a formatted header and apply the same formatting to the output from the AWS CLI (using but a tiny bit of the printf routine's many capabilities).

This will produce output similar to:

   AMI ID                 AMI Name                                        ENA Support  SRIOV Support
   ami-187af850f113c24e1  spel-minimal-centos-7-hvm-2019.03.1.x86_64-gp2  True         simple
   ami-91b38c446d188643e  spel-minimal-centos-7-hvm-2019.02.1.x86_64-gp2  True         simple
   ami-22867cf08bb264ac4  spel-minimal-centos-7-hvm-2019.01.1.x86_64-gp2  True         simple
   [...elided...]
   ami-71c3822ed119c3401  spel-minimal-centos-7-hvm-2018.03.1.x86_64-gp2  None         simple
   [...elided...]
   ami-8057c2bf443dc01f5  spel-minimal-centos-7-hvm-2016.06.1.x86_64-gp2  None         None

As you can see, not all of the above AMIs are externally alike. While this could indicate a process or personnel problem, what my output actually shows is evolution in our AMIs. Originally, we weren't doing anything to support SRIOV or ENA. Then we added SRIOV support (because our AMI users were finally asking for it). Finally, we added ENA support (mostly so we could use the full range and capabilities of the fifth-generation EC2 instance-types).

At any rate, running a report like the above, we can identify if there are unexpected differences and, if a sub-standard AMI slips out, we can alert our AMI users "don't use <AMI> if you have need of ENA and/or SRIOV".

Monday, May 7, 2018

Streamed Backups to S3

Introduction/Background


Many would-be users of AWS come to AWS from a legacy hosting background. Often times, when moving to AWS, the question, "how do I back my stuff up when I no longer have access to my enterprise backup tools," is asked. If not, it's a question that would-be AWS users should be asking.

AWS provides a number of storage options. Each option has use-cases that it is optimized for. Each also has a combination of performance, feature and pricing tradeoffs (see my document for a quick summary of these tradeoffs). The lowest-cost - and therefore most attractive for data-retention use-cases typical of backups-related activities - is S3. Further, within S3, there are pricing/capability tiers that are appropriate to different types of backup needs (the following list is organized by price, highest to lowest):
  • If there is a need to perform frequent full or partial recoveries, the S3 Standard tier is probably the best option
  • If recovery-frequency is pretty much "never" — but needs to be quick if there actually is a need to perform recoveries — and the policies governing backups mandates up to a thirty-day recoverability window, the best option is likely the S3 Infrequent Access (IA) tier.
  • If there's generally no need for recovery beyond legal-compliance capabilities, or the recovery-time objectives (RTO) for backups will tolerate a multi-hour wait for data to become available, the S3 Glacier tier is probably the best option.
Further, if a project's backup needs span the usage profiles of the previous list, data lifecycle policies can be created that will move data from a higher-cost tier to a lower-cost tier based on time thresholds. And, to prevent being billed for data that has no further utility, the lifecycle policies can include an expiration-age at which AWS will simply delete and stop charging for the backed-up data.

There are a couple of ways to get backup data into S3:
  • Copy: The easiest — and likely most well known — is to simply copy the data from a host into an S3 bucket. Every file on disk that's copied to S3 exists as an individually downloadable file in S3. Copy operations can be iterative or recursive. If the copy operation takes the form of a recursive-copy, basic location relationship between files is preserved (though, things like hard- or soft-links get converted into multiple copies of a given file). While this method is easy, it includes a loss of filesystem metadata — not just the previously-mentioned loss of link-style file-data but ownerships, permissions, MAC-tags, etc.
  • Sync: Similarly easy is the "sync" method. Like the basic copy, every file on disk that's copied to S3 exists as an individually downloadable file in S3. The sync operation is inherently recursive. Further, if an identical copy of a file exists within S3 at a given location, the sync operation will only overwrite the S3-hosted file if the to-be-copied file is different. This provides good support for incremental-style backups. As with the basic copy-to-S3 method, this method results in the loss of file-link and other filesystem metadata.

    Note: if using this method, it is probably a good idea to turn on bucket-versioning to ensure that each version of an uploaded file is kept. This allows a recovery operation to recover a given point-in-time's version of the backed-up file.
  • Streaming copy: This method is the least well-known. However, it can be leveraged to overcome the problem of lost filesystem metadata. If the stream-to-S3 operation includes an inlined data-encapsulation operation (e.g., piping the stream through the tar utility), filesystem metadata will be preserved.

    Note: the cost of preserving metadata via encapsulation is that the encapsulated object is opaque to S3. As such, there's no (direct) means by which to emulate an incremental backup operation.

Technical Implementation

As the title of this article suggests, the technical-implementation focus of this article is on streamed backups to S3.

Most users of S3 are aware of its static file-copy options - that is, copying a file from an EC2 instance directly to S3. Most such users, when they want to store files from EC2 in S3 and need to retain filesystem metadata, either look to things like s3fs or do staged encapsulation.

The former allows you to treat S3 as though it were a local filesystem. However, for various reasons, many organizations are not comfortable using FUSE-based filesystem-implementations - particularly ones from opensource projects (usually due to fears about support if something goes awry).

The latter means using an archiving tool to create a pre-packaged copy of the data first staged to disk as a complete file and then copying that file to S3. Common archiving tools include the Linux Tape ARchive utility (`tar`), cpio or even `mkisofs`/`genisoimage`. However, if the archiving tool supports reading from STDIN and/or writing to STDOUT, the tool can be used to create an archive directly within S3 using S3's streaming-copy capabilities.
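
To make that concrete, here is a minimal sketch of a streamed, encapsulated copy (the bucket-name and paths are placeholders; `aws s3 cp` accepts "-" to mean "read from STDIN" or "write to STDOUT"):

# Encapsulate /var/www with tar and stream it straight into S3, with no
# intermediate staging-file
tar -C / -cpf - var/www | aws s3 cp - s3://<BUCKET_NAME>/Backups/var-www.tar

# The reverse direction: stream the object back down and unpack it
aws s3 cp s3://<BUCKET_NAME>/Backups/var-www.tar - | tar -C / -xpf -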

Best practice for backups is to ensure that the target data-set is in a consistent state. Generally, this means that the data to be archived is non-changing. This can be done by quiescing a filesystem ...or by snapshotting a filesystem and backing up the snapshot. Use of LVM snapshots will be used to illustrate how to take a consistent backup of a live filesystem (like those used to host the operating system).

Note: this illustration assumes that the filesystem to be  backed up is built on top of LVM. If the filesystem is built on a bare (EBS-provided) device, the filesystem will need to be stopped before it can be consistently streamed to S3.

The high-level procedure is as follows:
  1. Create a snapshot of the logical volume hosting the filesystem to be backed up (note that LVM issues an `fsfreeze` operation before creating the snapshot: this flushes all pending I/Os before making the snapshot, ensuring that the resultant snapshot is in a consistent state). Thin or static-sized snapshots may be selected (thin snapshots are especially useful when snapshotting multiple volumes within the same volume-group, as one has less need to worry about getting the snapshot volume's size-specification correct).
  2. Mount the snapshot
  3. Use the archiving-tool to stream the filesystem data to standard output
  4. Pipe the stream to S3's `cp` tool, specifying to read from a stream and to write to object-name in S3
  5. Unmount the snapshot
  6. Delete the snapshot
  7. Validate the backup by using S3's `cp` tool, specifying to write to a stream and then read the stream using the original archiving tool's capability to read from standard input. If the archiving tool has a "test" mode, use that; if it does not, it is likely possible to specify /dev/null as its output destination.
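
A minimal sketch of steps 1 through 7, assuming a volume-group named VolGroup00, a logical volume varVol mounted at /var, an ext4 filesystem and a bucket-name in ${BUCKET} (all of which are stand-ins, not values taken from the linked-to tool):

# 1-2: snapshot the LV and mount the snapshot read-only
lvcreate --snapshot --size 2G --name varVol-snap /dev/VolGroup00/varVol
mkdir -p /mnt/snap/var
mount -o ro /dev/VolGroup00/varVol-snap /mnt/snap/var

# 3-4: encapsulate with tar and stream the archive straight into S3
tar -C /mnt/snap/var -cpf - . | aws s3 cp - "s3://${BUCKET}/Backups/var.tar"

# 5-6: tear the snapshot back down
umount /mnt/snap/var
lvremove -f /dev/VolGroup00/varVol-snap

# 7: validate by streaming the object back through tar's table-of-contents mode
aws s3 cp "s3://${BUCKET}/Backups/var.tar" - | tar -tf - > /dev/null
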
For a basic, automated implementation of the above, see the linked-to tool. Note that this tool is "dumb": it assumes that all logical volumes hosting a filesystem should be backed up. The only argument it takes is the name of the S3 bucket to upload to. The script does only very basic "pre-flight" checking:
  • Ensure that the AWS CLI is found within the script's inherited PATH env.
  • Ensure that either an AWS IAM instance-role is attached to the instance or that an IAM user-role is defined in the script's execution environment (${HOME}/.aws/credential files not currently supported). No attempt is made to ensure the instance- or IAM user-role has sufficient permissions to write to the selected S3 bucket.
  • Ensure that a bucket-name has been passed (the name is not checked for validity).
Once the pre-flights pass: the script will attempt to snapshot all volumes hosting a filesystem; mount the snapshots under the /mnt hierarchy — recreating the original volumes' mount-locations, but rooted in /mnt; use the `tar` utility to encapsulate and stream the to-be-archived data to the s3 utility; use the S3 cp utility to write tar's streamed, encapsulated output to the named S3 bucket's "/Backups/" folder. Once the S3 cp utility closes the stream without errors, the script will then dismount and delete the snapshots.

Alternatives

As mentioned previously, it's possible to do similar actions to the above for filesystems that do not reside on LVM2 logical volumes. However, doing so will either require different methods for creating a consistent state for the backup-set or backing up potentially inconsistent data (and possibly even wholly missing "in flight" data).

EBS has the native ability to create copy-on-write snapshots. However, the EBS volume's snapshot capability is generally decoupled from the OS'es ability to "pause" a filesystem. One can use a tool — like those in the LxEBSbackups project — to coordinate the pausing of the filesystem so that the EBS snapshot can create a consistent copy of the data (and then unpause the filesystem as soon as the EBS snapshot has been started).

One can leave the data "as is" in the EBS snapshot or one can then mount the snapshot to the EC2 and execute a streamed archive operation to S3. The former has the value of being low effort. The latter has the benefit of storing the data to lower-priced tiers (even S3 standard is cheaper than snapshots of EBS volumes) and allowing the backed up data to be placed under S3 lifecycle policies.

Monday, December 29, 2014

Custom EL6 AMIs With a Root LVM

One of the customers I work for is a security-focused organization. As such, they try to follow the security guidelines laid out within the SCAP guidelines for the operating systems they deploy. This particular customer is also engaged in a couple of "cloud" initiatives - a couple privately-hosted and one publicly-hosted option. For the publicly-hosted cloud initiative, they make use of Amazon Web Services' EC2 services.

The current SCAP guidelines for Red Hat Enterprise Linux (RHEL) 6 draw the bulk of their content straight from the DISA STIGs for RHEL 6. There are a few differences, here and there, but the commonality between the SCAP and STIG guidance - at least as of the SCAP XCCDF 1.1.4 and STIG Version 1, Release 5, respectively - is probably just shy of 100% when measured on the recommended tests and fixes. In turn, automating the guidance in these specifications allows you to quickly crank out predictably-secure Red Hat, CentOS, Scientific Linux or Amazon Linux systems.

For the privately-hosted cloud initiatives, supporting this guidance was a straight-forward matter. The solutions my customer uses all support the capability to network-boot and provision a virtual machine  (VM) from which to create a template. Amazon didn't provide similar functionality to my customer, somewhat limiting some of the things that can be done to create a fully-customized instance or resulting template (Amazon Machine Image - or "AMI" - in EC2 terminology).

For the most part this wasn't a problem to my customer. Perhaps the biggest sticking-point was that it meant that, at least initially, partitioning schemes used on the privately-hosted VMs couldn't be easily replicated on the EC2 instances.

Section 2.1.1 of the SCAP guidance calls for "/tmp", "/var", "/var/log", "/var/log/audit", and "/home" to each be on their own, dedicated partitions, separate from the "/" partition. On the privately-hosted cloud solutions, use of a common, network-based KickStart was used to carve the boot-disk into a /boot partition and an LVM volume-group (VG). The boot VG was then carved up to create the SCAP-mandated partitions.

With the lack of network-booting/provisioning support, it meant we didn't have the capability to extend our KickStart methodologies to the EC2 environment. Further, at least initially, Amazon didn't provide support for use of LVM on boot disks. The combination of the two limitations meant my customer couldn't easily meet the SCAP partitioning requirements. Lack of LVM meant that the boot disk had to be carved up using bare /dev/sdX devices. Lack of console access defeated the ability to repartition an already-built system to create the requisite partitions on the boot disk. Initially, this meant that the AMIs we could field were limited to "/boot" and "/" partitions. This meant config-drift between the hosting environments and meant we had to get security-waivers for the Amazon-hosted environment.

Not being one who well-tolerates these kind of arbitrary-feeling deviances, I got to cracking with my Google searches. Most of what I found were older documents that focussed on how to create LVM-enabled, S3-backed AMIs. These weren't at all what I wanted - they were a pain in the ass to create, were stoopidly time-consuming to transfer into EC2 and the resultant AMIs hamstrung me on the instance-types I could spawn from them. So, I kept scouring around. In the comments section to one of the references for S3-backed AMIs, I saw a comment about doing a chroot() build. So, I used that as my next branch of Googling about.

Didn't find a lot for RHEL-based distros - mostly Ubuntu and some others. That said, it gave me the starting point that I needed to find my ultimate solution. Basically, that solution comes down to:

  1. Pick an EL-based AMI from the Amazon Marketplace (I chose a CentOS one - I figured that using an EL-based starting point would ease creating my EL-based AMI since I'd already have all the tools I needed and in package names/formats I was already familiar with)
  2. Launch the smallest instance-size possible from the Marketplace AMI (8GB when I was researching the problem)
  3. Attach an EBS volume to the running instance - I set mine to the minimum size possible (8GB) figuring I could either grow the resultant volumes or, once I got my methods down/automated, use a larger EBS for my custom AMI.
  4. Carve the attached EBS up into two (primary) partitions. I like using `parted` for this, since I can specify the desired, multi-partition layout (and all the offsets, partition types/labels, etc.) in one long command-string (a sketch of such an invocation appears after this list).
    • I kept "/boot" in the 200-400MB range. Could probably keep it smaller since the plans weren't so much to patch instantiations as much as periodically use automated build tools to launch instances from updated AMIs and re-deploy the applications onto the new/updated instances.
    • I gave the rest of the disk to the partition that would host my root VG.
  5. I `vgcreate`d my root volume group, then carved it up into the SCAP-mandated partitions (minus "/tmp" - we do that as a tmpfs filesystem since the A/V tools that SCAP wants you to have tend to kill system performance if "/tmp" is on disk - probably not relevant in EC2, but consistency across environments was a goal of the exercise)
  6. Create ext4 filesystems on each of my LVs and my "/boot" partition.
  7. Mount all of the filesystems under "/mnt" to support a chroot-able install (i.e., "/mnt/root", "/mnt/root/var", etc.)
  8. Create base device-nodes within my chroot-able install-tree (you'll want/need "/dev/console", "/dev/null", "/dev/zero", "/dev/random", "/dev/urandom", "/dev/tty" and "/dev/ptmx" - modes, ownerships and major/minor numbers should match what's in your live OS's)
  9. Set up bind mounts for "/proc", "/sys", "/dev/pts" and "/dev/shm" within the chroot-able install-tree.
  10. Create "/etc/fstab" and "/etc/mtab" files within my chroot-able install-tree (should resemble the mount-scheme you want in your final AMI - dropping the "/mnt/root" from the paths)
  11. Use `yum` to install the same package-sets to the chroot that our normal KickStart processes would install.
  12. The `yum` install should have created all of your "/boot" files with the exception of your "grub.conf" type files. 
    • Create a "/mnt/boot/grub.conf" file with vmlinuz/initramfs references matching the ones installed by `yum`.
    • Create links to your "grub.conf" file:
      • You should have an "/mnt/root/etc/grub.conf" file that's a sym-link to your "/mnt/root/boot/grub.conf" file (be careful how you create this sym-link so you don't create an invalid link)
      • Similarly, you'll want a "/mnt/root/boot/grub/grub.conf" linked up to "/mnt/root/boot/grub.conf" (not always necessary, but it's a belt-and-suspenders solution to some issues related to creating PVM AMIs)
  13. Create a basic eth0 config file at "/mnt/root/etc/sysconfig/network-scripts/ifcfg-eth0". EC2 instances require the use of DHCP for networking to work properly. A minimal network config file should look something like:
    DEVICE=eth0
    BOOTPROTO=dhcp
    ONBOOT=yes
    IPV6INIT=no
    
  14. Create a basic network-config file at "/mnt/root/etc/sysconfig/network". A minimal network config file should look something like:
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=localhost.localdomain
    
  15. Append "UseDNS no" and "PermitRootLogin without-password" to the end of your "/mnt/root/etc/ssh/sshd_config" file. The former fixes connect-speed problems related to EC2's use of private IPs on their hosted instances. The latter allows you to SSH in as root for the initial login - but only with a valid SSH key (don't want to make newly-launched instances instantly ownable!)
  16. Assuming you want instances started from your AMI to use SELinux:
    • Do a `touch /mnt/root/.autorelabel`
    • Make sure that the "SELINUX" value in "/mnt/root/etc/selinux/config" is set to either "permissive" or "enforcing"
  17. Create an unprivileged login user within the chroot-able install-tree. Make sure a password is set and that the user is able to use `sudo` to access root (since I recommend setting root's password to a random value).
  18. Create boot init script that will download your AWS public key into the root and/or maintenance user's ${HOME}/.ssh/authorized_keys file. At its most basic, this should be a run-once script that looks like:
    curl -f http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key > /tmp/pubkey
    install --mode 0700 -d ${KEYDESTDIR}
    install --mode 0600 /tmp/pubkey ${KEYDESTDIR}/authorized_keys
    Because "/tmp" is an ephemeral filesystem, the next time the instance is booted, the "/tmp/pubkey" will self-clean. Note that an appropriate destination-directory will need to exist
  19. Clean up the chroot-able install-tree:
    yum --installroot=/mnt/root/ -y clean packages
    rm -rf /mnt/root/var/cache/yum
    rm -rf /mnt/root/var/lib/yum
    cat /dev/null > /mnt/root/root/.bash_history
    
  20. Unmount all of the chroot-able install-tree's filesystems.
  21. Use `vgchange` to deactivate the root VG
  22. Using the AWS console, create a snapshot of the attached EBS.
  23. Once the snapshot completes, you can then use the AWS console to create an AMI from the EBS-snapshot using the "Create Image" option. It is key that you set the "Root Device Name", "Virtualization Type" and "Kernel ID" parameters to appropriate values.
    • The "Root Device Name" value will auto-populate as "/dev/sda1" - change this to "/dev/sda"
    • The "Virtualization Type" should be set as "Paravirtual".
    • The appropriate value for the "Kernel ID" parameter will vary from AWS availability-region to AWS availability-region (for example, the value for "US-East (N. Virginia)" will be different from the value for "EU (Ireland)"). In the drop-down, look for a description field that contains "pv-grub-hd00". There will be several. Look for the highest-numbered option that matches your AMIs architecture (for example, I would select the kernel with the description "pv-grub-hd00_1.04-x86_64.gz" for my x86_64-based EL 6.x custom AMI).
    The other Parameters can be tweaked, but I usually leave them as is.
  24. Click the "Create" button, then wait for the AMI-creation to finish.
  25. Once the "Create" finishes, the AMI should be listed in your "AMIs" section of the AWS console.
  26. Test the new AMI by launching an instance. If the instance successfully completes its launch checks and you are able to SSH into it, you've successfully created a custom, PVM AMI (HVM AMIs are fairly easily created, as well, but require some slight deviations that I'll cover in another document).
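
As teased in item 4, a hypothetical one-shot `parted` invocation for the layout described above might look like the following (the /dev/xvdf device-name is an assumption about where the secondary EBS shows up; adjust to match reality):

# Label the attached EBS, create a ~300MiB /boot partition, then give the
# rest of the disk to a second partition flagged for LVM
parted -s /dev/xvdf \
   mklabel msdos \
   mkpart primary ext4 1MiB 301MiB \
   mkpart primary 301MiB 100% \
   set 2 lvm on
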
I've automated many of the above tasks using some simple shell scripts and the Amazon EC2 tools. Use of the EC2 tools is well documented by Amazon. Their use allows me to automate everything within the instance launched from the Marketplace AMI (I keep all my scripts in Git, so prepping a Marketplace AMI for building custom AMIs takes maybe two minutes on top of launching the generic Marketplace AMI). When automated as I have, you can go from launching your Marketplace AMI to having a launchable custom AMI in as little as twenty minutes.

Properly automated, generating updated AMIs as security fixes or other patch bundles come out is as simple as kicking off a script, hitting the vending machine for a fresh Mountain Dew, then coming back to launch new, custom AMIs.