Wednesday, May 24, 2017

Barely There

So, this morning I get an IM from one of my teammates asking, "you don't happen to have your own copy of <GitHub_Hosted_Project>, do you? I had kind of an 'oops' moment a few minutes ago." Unfortunately, because my A/V had had its own "oops" moment two days prior, all of my local project copies had gone *poof!*, as well.

Fortunately, I'd been making a habit of configuring our RedMine to do bare-syncs of all of our projects. So, I replied back, "hold on, I'll see what I can recover from RedMine." I update the security rules for our RedMine VM and then SSH in. I escalate my privileges and navigate to where RedMine's configured to create repository copies. I look at the repository copy and remember, "shit, this only does a bare copy. None of the actual files I need is here."

So, I consult the Googles to see if there are any articles on "how to recreate a full git repository from a bare copy" (and permutations thereof). Pretty much all the results are about "how to create a bare repository" ...not really helpful.

I go to my RedMine project's repository page and notice the "download" link when I click on one of the constituent files. I click on it to see whether I actually get the file's contents or whether the download link is just a (now broken) referral to the file on GitHub. Lo and behold, the file downloads. It's a small project, so, absolute worst case, I can download all of the individual files and manually recreate the GitHub project, losing only my project-history.

That said, the fact that I'm able to download the files tells me, "RedMine has a copy of those files somewhere." So I think to myself, "mebbe I need to try another search: clearly RedMine is allowing me to check files out of this 'bare' repository, so perhaps there's something similar I can do more directly." I return to Google and punch in "git bare repository checkout". Voilà. A couple useful-looking hits. I scan through a couple of them and find that I can create a full clone from a bare repo. All I have to do is go into my (RedMine's) filesystem, copy my bare repository to a safe location (just in case) and then clone from it:

# find <BARE_REPOSITORY> -print | cpio -vpmd /tmp
# cd /tmp
# git clone <BARE_REPOSITORY_COPY> <FULL_REPOSITORY_TARGET>
# find <FULL_REPOSITORY_TARGET>

That final find shows me that I now have a full repository (there's now a fully populated .git subdirectory in it). I chown the directory to my SSH user-account, then exit my sudo session (I'd ssh'ed in with key-forwarding enabled).

I go to GitHub and (re)create the nuked project, then configure the on-disk copy of my files and git metadata to be able to push everything back to GitHub. I execute my git push and all looks good from the ssh session. I hop back to GitHub and there is all my code and all of my commit-history and other metadata. Huzzah!
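For anyone reconstructing the same situation, that re-push amounted to something like the following (the GitHub URL and project name here are placeholders, not the real ones):

$ cd /tmp/<FULL_REPOSITORY_TARGET>
$ git remote set-url origin git@github.com:<GITHUB_USER>/<PROJECT_NAME>.git   # point at the recreated GitHub project
$ git push --all origin     # branches and their history
$ git push --tags origin    # any tags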

I finish out by going back and setting up branch-protection and my push-rules and CI-test integrations. Life is good again.

Wednesday, May 17, 2017

The Savings Are In The POST

Been playing around with Artifactory for a client looking to implement a full CI/CD toolchain. My customer has an interesting budgeting method: they're better able to afford sunk costs than recurring costs. So, they splurged for the Enterprise Edition pricing but asked me to try to deploy it in a "cost-aware" fashion.

Two nice things that the Enterprise Edition of Artifactory gives you: the ability to store artifacts directly to lower-cost "cloud" storage tiers and the ability to cluster. Initially, I wasn't going to bother with the latter: while the customer is concerned about reliability, the "cost-aware" method for running things means that design-resiliency is more critical than absolute availability/uptime. Having two or more nodes running didn't initially make sense, so I set the clustering component aside and explored other avenues for resiliency.

The first phase of resiliency was externalizing stuff that was easily persisted.

Artifactory keeps much of its configuration and tracking information in a database. We're deploying the toolchain into AWS, so offloading that database (and its management overhead) to RDS was pretty much a no-brainer.
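Purely for illustration, standing up that external database might look something like the following (engine choice, sizes, names and credentials are all placeholders rather than the customer's actual values):

aws rds create-db-instance \
    --db-instance-identifier artifactory-db \
    --engine postgres \
    --db-instance-class db.t2.medium \
    --allocated-storage 20 \
    --master-username artifactory \
    --master-user-password '<DB_PASSWORD>' \
    --vpc-security-group-ids <SECURITY_GROUP_ID> \
    --db-subnet-group-name <DB_SUBNET_GROUP> \
    --no-publicly-accessible \
    --backup-retention-period 7

Point Artifactory's database configuration at the resulting endpoint and the application host itself stays disposable.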

When you have the Enterprise Edition entitlements, Artifactory lets you externalize the storage of artifacts to cloud-storage. For a cost-aware deployment, storing gigabytes of data in S3 is much more economical than storing it in an EBS volume. Storing it in S3 also means that the data has a high degree of availability and protection right out of the box. Artifactory also makes it fairly trivial to set up storage tiering. This meant I was able to configure the application to stage recently-written or -fetched data on either an SSD-backed EBS volume or instance storage (fast, ephemeral storage), then let the tiering move data to S3 as the local filesystem filled up or as that data aged.

With some of the other tools I'd automated, once configuration and object data were externalized, resiliency was fairly easy to accommodate. You could kill the running instance (or let a faulty one die), spin up a new instance, automate the install of the binaries and the sucking-down of any stub-data needed to get the replacement host knitted back to the external data-services. I assumed that Artifactory was going to be the same case.

For better or worse, Artifactory's relationship with its database is a bit more complicated than that of some of the other CI/CD components. Specifically, just giving Artifactory the host:port and credentials for the database doesn't get your new instance talking to the database. No. It wants all that and it wants some authentication tokens and certificates to more-fully secure the Artifactory/database communications.

While backing these tokens and certificates up and accounting for them in the provisioning-time stub-data pull-down is a potential approach, when you have Enterprise Edition, you're essentially reinventing the wheel. You can, instead, take advantage of EE's clustering capabilities ...and run the Artifactory service as a single-node cluster. Doing so requires generating a configuration bundle-file via an API call and then putting that in your stub-data set. The first time you fire up a new instantiation of Artifactory, if that bundle-file is found in the application's configuration-root, it will "magic" the rest (automatically extract the bundle contents and update the factory config files to enable communication with the externalized services).

As of the writing of this post, the documentation for the HA setup is a bit wanting. They tell you "use this RESTful API call to generate the bundle file". Unfortunately, they don't go into great depth on how to convince Artifactory to generate the bundle. Yeah, there are other links in the documents for how to use the API calls to store data in Artifactory, but no "for this use-case, execute this". Ultimately, what it ended up being was:
# curl -u <ADMIN_USER> -X POST http://localhost:8081/artifactory/api/system/bootstrap_bundle
If you got an error or no response, you need to look in the Artifactory access logs for clues.

If your call was successful, you'd get a (JSON) return similar to:
Enter host password for user '<ADMIN_USER>':
{
    "file" : "/var/opt/jfrog/artifactory/etc/bootstrap.bundle.tar.gz"
}
The reason I gave the article the title I did is that, prior to this exercise, I'd not had to muck with explicitly setting the method for passing API calls to the command-endpoint. It's that "-X POST" bit that's critical. I wouldn't have known to use the documentation's search-function for the call-method had I not looked at the access logs and seen that Artifactory didn't like me using curl's default, GET-based method.

At any rate, once you have that bootstrap-bundle generated and stored with your other stub-data, all that's left is automating the instantiation-time creation of the ha-node.properties file. After a generic install, when those files are present, the newly-launched Artifactory instance joins itself back to the single-node cluster and all your externalized-data becomes available. All that taken care of, running "cost-aware" means that:

  • When you reach the close of service hours, you terminate your Artifactory host.
  • Just before your service hours start, you instantiate a new Artifactory host.
If you offer your service Mon-Fri from 06:00 to 20:00, you've saved yourself 98 hours of EC2 charges per calendar-week (the instance only runs 70 of the week's 168 hours). If you're really aggressive about running cost-aware, you can apply similar "business hours" logic to your RDS (though, given the size of the RDS, the cost to create the automation may be more than you'll save in a year's running of the RDS instance).
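A minimal sketch of that schedule, expressed as cron entries on whatever box runs your automation (times, paths and the wrapper-script names are all hypothetical):

# min hr dom mon dow  command
55   05  *   *  1-5   /usr/local/bin/launch-artifactory.sh      # hypothetical wrapper around your provisioning automation
00   20  *   *  1-5   /usr/local/bin/terminate-artifactory.sh   # hypothetical wrapper around "aws ec2 terminate-instances"

Because all of the state lives in RDS and S3, the evening termination costs you nothing but the EC2 hours you no longer pay for.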

Wednesday, November 23, 2016

Manually Mirroring Git Repos

If you're like me, you've had occasions where you need to replicate Git-managed projects from one Git service to another. If you're not like me, you use paid services that make such mundane tasks a matter of clicking a few buttons in a GUI. If, however, you need to copy projects from one repository-service to another and no one has paid to make a GUI button/config-page available to you, then you need to find other methods to get things done.

The following assumes that you have a git project hosted in one repository service (e.g., GitHub) that you wish to mirror to another repository service (e.g., BitBucket, AWS CodeCommit, etc.). The basic workflow looks like the following:

Procedure Outline:

  1. Login to a git-enabled host
  2. Create a copy of your "source-of-truth" repository, depositing its contents to a staging-directory:
    git clone --mirror \
       <REPOSITORY_USER>@<REPOSITORY1.DNS.NAME>:<PROJECT_USER_OR_GROUP>/<PROJECT_NAME>.git \
       stage
  3. Navigate into the staging-directory:
    cd stage
  4. Set the push-destination to the copy-repository:
    git remote set-url --push origin \
       <REPOSITORY_USER>@<REPOSITORY2.DNS.NAME>:<PROJECT_USER_OR_GROUP>/<PROJECT_NAME>.git
  5. Ensure the staging-directory's data is still up to date:
    git fetch -p origin
  6. Push the copied source-repository's data to the copy-repository:
    git push --mirror

Using an example configuration (the AMIgen6 project):
$ git clone --mirror git@github.com:ferricoxide/AMIgen6.git stage && \
  cd stage && \
  git remote set-url --push origin git@bitbucket.org:ferricoxide/amigen6-copy.git && \
  git fetch -p origin && \
  git push --mirror
Cloning into bare repository 'stage'...
remote: Counting objects: 789, done.
remote: Total 789 (delta 0), reused 0 (delta 0), pack-reused 789
Receiving objects: 100% (789/789), 83.72 MiB | 979.00 KiB/s, done.
Resolving deltas: 100% (409/409), done.
Checking connectivity... done.
Counting objects: 789, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (369/369), done.
Writing objects: 100% (789/789), 83.72 MiB | 693.00 KiB/s, done.
Total 789 (delta 409), reused 789 (delta 409)
To git@bitbucket.org:ferricoxide/amigen6-copy.git
 * [new branch]      ExtraRPMs -> ExtraRPMs
 * [new branch]      SELuser-fix -> SELuser-fix
 * [new branch]      master -> master
 * [new branch]      refs/pull/38/head -> refs/pull/38/head
 * [new branch]      refs/pull/39/head -> refs/pull/39/head
 * [new branch]      refs/pull/40/head -> refs/pull/40/head
 * [new branch]      refs/pull/41/head -> refs/pull/41/head
 * [new branch]      refs/pull/42/head -> refs/pull/42/head
 * [new branch]      refs/pull/43/head -> refs/pull/43/head
 * [new branch]      refs/pull/44/head -> refs/pull/44/head
 * [new branch]      refs/pull/52/head -> refs/pull/52/head
 * [new branch]      refs/pull/53/head -> refs/pull/53/head
 * [new branch]      refs/pull/54/head -> refs/pull/54/head
 * [new branch]      refs/pull/55/head -> refs/pull/55/head
 * [new branch]      refs/pull/56/head -> refs/pull/56/head
 * [new branch]      refs/pull/57/head -> refs/pull/57/head
 * [new branch]      refs/pull/62/head -> refs/pull/62/head
 * [new branch]      refs/pull/64/head -> refs/pull/64/head
 * [new branch]      refs/pull/65/head -> refs/pull/65/head
 * [new branch]      refs/pull/66/head -> refs/pull/66/head
 * [new branch]      refs/pull/68/head -> refs/pull/68/head
 * [new branch]      refs/pull/71/head -> refs/pull/71/head
 * [new branch]      refs/pull/73/head -> refs/pull/73/head
 * [new branch]      refs/pull/76/head -> refs/pull/76/head
 * [new branch]      refs/pull/77/head -> refs/pull/77/head

Updating (and Automating)

To keep your copy-repository's project in sync with your source-repository's project, periodically do:
cd stage && \
  git fetch -p origin && \
  git push --mirror
This can be accomplished by logging into a host and executing the steps manually or placing them into a cron job.
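For example, a single crontab entry along these lines (the path and schedule are placeholders) keeps the copy refreshed every six hours:

0 */6 * * * cd /path/to/stage && git fetch -p origin && git push --mirror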

Thursday, November 3, 2016

Automation in a Password-only UNIX Environment

Occasionally, my organization has a need to run ad hoc queries against a large number of Linux systems. Usually, this is a great use-case for an enterprise CM tool like HPSA, Satellite, etc. Unfortunately, the organization I consult to is between solutions (their legacy tool burned down and their replacement has yet to reach a working state). The upshot is that one needs to do things a bit more on-the-fly.

My preferred method for accessing systems is using some kind of token-based authentication framework. When hosts are AD-integrated, you can often use Kerberos for this. Failing that, you can sub in key-based logins (if all of your targets have your key as an authorized key). While my customer's systems are AD-integrated, their security controls preclude the use of both AD/Kerberos's single-signon capabilities and the use of SSH key-based logins (and, to be honest, almost none of the several hundred targets I needed to query had my key configured as an authorized key).

Because (tunneled) password-based logins are forced, I was initially looking at the prospect of having to write an expect script to avoid having to type in my password several hundred times. Fortunately, there's an alternative to this in the tool "sshpass".

sshpass lets you supply your password via a number of methods: a command-line argument, a password-containing file, an environment variable or even a read from STDIN. I'm not a fan of text files containing passwords (they've a bad tendency to be forgotten and left on a system - bad juju). I'm not particularly a fan of command-line arguments, either - especially on a multi-user system where others might see your password if they `ps` at the wrong time (the probability of which increases the longer your job runs). The STDIN method is marginally less awful than the command-arg method (for similar reasons). At least with an environment variable, the value really only sits in memory (especially if you've set your HISTFILE location to someplace non-persistent).
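A low-drama way to get the password into that variable without it touching disk or your shell history is something like the following (assuming a Bash-ish shell):

$ read -rsp 'Password: ' SSHPASS && export SSHPASS   # prompts silently; nothing is echoed or written to disk
$ export HISTFILE=/dev/null                          # belt-and-suspenders: this session's history never persists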

The particular audit I was doing was an attempt to determine the provenance of a few hundred VMs. Over time, the organization has used templates authored by several groups - and different personnel within one of the groups. I needed to scan all of the systems to see which template they might have been using since the template information had been deleted from the hosting environment. Thus, I needed to run an SSH-encapsulated command to find the hallmarks of each template. Ultimately, what I ended up doing was:

  1. Pushed my query-account's password into the environment variable used by sshpass, "SSHPASS"
  2. Generated a file containing the IPs of all the VMs in question.
  3. Set up a for loop to iterate that list
  4. Looped `sshpass -e ssh -o PubkeyAuthentication=no -o StrictHostKeyChecking=no <USER>@${HOST} <AUDIT_COMMAND_SEQUENCE> 2>&1` over that list (a fuller sketch follows this list)
  5. Jammed STDOUT through a sed filter to strip off the crap I wasn't interested in and put CSV-appropriate delimiters into each queried host's string
  6. Captured the lot to a text file
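Put together, the loop looked roughly like the following; the host-list filename, audit command and sed expressions shown here are stand-ins for the real ones:

export SSHPASS='<QUERY_ACCOUNT_PASSWORD>'        # or use a silent read, as described earlier
for HOST in $(cat /tmp/target-ips.txt)           # hypothetical file of target IPs
do
   printf '%s,' "${HOST}"
   sshpass -e ssh -o PubkeyAuthentication=no -o StrictHostKeyChecking=no \
      "<USER>@${HOST}" '<AUDIT_COMMAND_SEQUENCE>' 2>&1 | \
      sed -e 's/<UNINTERESTING_BITS>//g' -e 's/  */,/g'
done > /tmp/template-audit.csv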
The "PubkeyAuthentication=no" option was required because I pretty much always have SSH-agent (or agent-forwarding) enabled. This causes my key to be passed. With the targets' security settings, this causes the connection to be aborted unless I explicitly suppress the passing of my agent-key.

The "StrictHostKeyChecking=no" option was required because I'd never logged into these hosts before. Our standard SSH client config is to require confirmation before accepting the remote key (and shoving it into ${HOME}/.ssh/known_hosts). Without this option, I'd be required to confirm acceptance of each key ...which is just about as automation-breaking as having to retype your password hundreds of times is.

Once the above was done, I had a nice CSV that could be read into Excel and a spreadsheet turned over to the person asking "who built/owns these systems". 

This method also meant that for the hosts that refused the audit credentials, it was easy enough to report "...and this lot aren't properly configured to work with AD".

Monday, October 17, 2016

Update to EL6 and the Pain of 10Gbps Networking in AWS

In a previous post, EL6 and the Pain of 10Gbps Networking in AWS, I discussed how to use third-party ixgbevf drivers to support 10Gbps networking in AWS-hosted RHEL 6 and CentOS 6 instances. Upon further testing, it turns out that this is overkill: the native drivers may be used, instead, with minimal hassle.

It turns out that, in my quest to ensure that my CentOS and RHEL AMIs contained as many of the AWS utilities present in the Amazon Linux AMIs as possible, I included two RPMs - ec2-net and ec2-net-util - that were preventing use of the native drivers. Skipping those two RPMs (and possibly sacrificing ENI hot-plug capabilities) allows much lower-effort support of 10Gbps networking in AWS-hosted EL6 instances.

Absent those RPMs, 10Gbps support becomes a simple matter of:
  1. Add add_drivers+="ixgbe ixgbevf" to the AMI's /etc/dracut.conf file
  2. Use dracut to regenerate the AMI's initramfs.
  3. Ensure that there are no persistent network device mapping entries in /etc/udev/rules.d.
  4. Ensure that there are no ixgbe or ixgbevf config directives in /etc/modprobe.d/* files.
  5. Enable sr-iov support in the instance-to-be-registered
  6. Register the AMI 
The resulting AMIs (and instances spawned from them) should support 10Gbps networking and be compatible with M4-generation instance-types.
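For reference, a condensed sketch of that prep, assuming you're imaging a stopped instance and have the AWS CLI handy (IDs and names are placeholders):

## Inside the instance, as root:
echo 'add_drivers+="ixgbe ixgbevf"' >> /etc/dracut.conf
rpm -qa kernel | sed 's/^kernel-//' | xargs -I {} dracut -v -f /boot/initramfs-{}.img {}
rm -f /etc/udev/rules.d/*persistent-net*.rules       # no persistent NIC-name mappings
grep -l ixgbe /etc/modprobe.d/* 2>/dev/null          # expect no output

## From the AWS CLI, with the instance stopped:
aws ec2 modify-instance-attribute --instance-id <INSTANCE_ID> --sriov-net-support simple
aws ec2 create-image --instance-id <INSTANCE_ID> --name <AMI_NAME>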

Friday, October 7, 2016

Using DKMS to maintain driver modules

In my prior post, I noted that maintaining custom drivers for the kernel in RHEL and CentOS hosts can be a bit painful (and prone to leaving you with an unreachable or even unbootable system). One way to take some of the pain out of owning a system with custom drivers is to leverage DKMS. In general, DKMS is the recommended way to ensure that, as kernels are updated, required kernel modules are also (automatically) updated.

Unfortunately, use of the DKMS method will require that developer tools (i.e., the GNU C-compiler) be present on the system - either in perpetuity or just any time kernel updates are applied. It is very likely that your security team will object to - or even prohibit - this. If the objection/prohibition cannot be overridden, use of the DKMS method will not be possible.
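If you can get that exception approved, getting the toolchain in place on a yum-based EL host is a one-liner:

yum install -y gcc make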

Steps

  1. Set an appropriate version string into the shell-environment:
    export VERSION=3.2.2
  2. Make sure that appropriate header files for the running-kernel are installed
    yum install -y kernel-devel-$(uname -r)
  3. Ensure that the dkms utilities are installed:
    yum --enablerepo=epel install dkms
  4. Download the driver sources and unarchive into the /usr/src directory:
    wget https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/${VERSION}/ixgbevf-${VERSION}.tar.gz/download \
        -O /tmp/ixgbevf-${VERSION}.tar.gz && \
       ( cd /usr/src && \
          tar zxf /tmp/ixgbevf-${VERSION}.tar.gz )
  5. Create an appropriate DKMS configuration file for the driver:
    cat > /usr/src/ixgbevf-${VERSION}/dkms.conf << EOF
    PACKAGE_NAME="ixgbevf"
    PACKAGE_VERSION="${VERSION}"
    CLEAN="cd src/; make clean"
    MAKE="cd src/; make BUILD_KERNEL=\${kernelver}"
    BUILT_MODULE_LOCATION[0]="src/"
    BUILT_MODULE_NAME[0]="ixgbevf"
    DEST_MODULE_LOCATION[0]="/updates"
    DEST_MODULE_NAME[0]="ixgbevf"
    AUTOINSTALL="yes"
    EOF
  6. Register the module to the DKMS-managed kernel tree:
    dkms add -m ixgbevf -v ${VERSION}
  7. Build the module against the currently-running kernel:
    dkms build ixgbevf/${VERSION}
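
If you'd rather not wait for the next kernel update to trigger the AUTOINSTALL behavior, dkms can also install the freshly-built module for the currently-running kernel right away; a quick sketch:

dkms install ixgbevf/${VERSION}
dkms status ixgbevf        # should report the module as installed for the running kernel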

Verification

The easiest way to verify the correct functioning of DKMS is to:
  1. Perform a `yum update -y`
  2. Check that the new drivers were created by executing `find /lib/modules -name ixgbevf.ko`. Output should be similar to the following:
    find /lib/modules -name ixgbevf.ko | grep extra
    /lib/modules/2.6.32-642.1.1.el6.x86_64/extra/ixgbevf.ko
    /lib/modules/2.6.32-642.6.1.el6.x86_64/extra/ixgbevf.ko
    There should be at least two output-lines: one for the currently-running kernel and one for the kernel update. If more kernels are installed, there may be more than just two output-lines.
     
  3. Reboot the system, then check what version is active:
    modinfo ixgbevf | grep extra
    filename:       /lib/modules/2.6.32-642.1.1.el6.x86_64/extra/ixgbevf.ko
    If the output is null, DKMS didn't build the new module.

Wednesday, October 5, 2016

EL6 and the Pain of 10Gbps Networking in AWS

AWS-hosted instances with optimized networking-support enabled see the 10Gbps interface as an Intel 10Gbps Ethernet adapter (`lspci | grep Ethernet` will display a string similar to Intel Corporation 82599 Virtual Function). This interface makes use of the ixgbevf network-driver. Enterprise Linux 6 and 7 bundle version 2.12.x of the driver into the kernel RPM. Per the Enabling Enhanced Networking with the Intel 82599 VF Interface on Linux Instances in a VPC document, AWS recommends version 2.14.2 or higher of the ixgbevf network-driver for enhanced networking. To meet AWS's recommendations for 10Gbps support within an EL6 instance, it will be necessary to update the ixgbevf driver to at least version 2.14.2.

The ixgbevf network-driver source-code can be found on SourceForge. It should be noted that not every AWS-compatible version will successfully compile on EL 6. Version 3.2.2 is known to successfully compile, without intervention, on EL6.


Notes:


>>>CRITICAL ITEM<<<

It is necessary to recompile the ixgbevf driver and inject it into the kernel each time the kernel version changes. This needs to be done between changing the kernel version and rebooting into the new kernel version. Failure to update the driver each time the kernel changes will result in the instance failing to return to the network after a reboot event.

Step #10 from the implementation procedure:
rpm -qa kernel | sed 's/^kernel-//' | xargs -I {} dracut -v -f /boot/initramfs-{}.img {}
is the easiest way to ensure that every installed kernel gets an initramfs containing the updated ixgbevf driver.

>>>CRITICAL ITEM<<<

It is possible that the above process can be avoided by installing DKMS and letting it coordinate the insertion of the ixgbevf driver modules into updated kernels.


Procedure:


The following assumes the instance-owner has privileged access to the instance OS and can make AWS-level configuration changes to the instance-configuration:
  1. Login to the instance
  2. Escalate privileges to root
  3. Install the up-to-date ixgbevf driver. This can be installed either by compiling from source or using pre-compiled binaries.
  4. Delete any `*persistent-net*.rules` files found in the /etc/udev/rules.d directory (one or both of 70-persistent-net.rules and 75-persistent-net-generator.rules may be present)
  5. Ensure that an `/etc/modprobe.d` file with the following minimum contents exists:
    alias eth0 ixgbevf
    
    Recommend creating/placing in /etc/modprobe.d/ifaliases.conf
  6. Unload any ixgbevf drivers that may be in the running kernel:
    modprobe -rv ixgbevf
    
  7. Load the updated ixgbevf driver into the running kernel:
    modprobe -v ixgbevf
    
  8. Ensure that the /etc/modprobe.d/ixgbevf.conf file exists. Its contents should resemble:
    options ixgbevf InterruptThrottleRate=1
    
  9. Update the /etc/dracut.conf. Ensure that the add_drivers+="" directive is uncommented and contains reference to the ixgbevf modules (i.e., `add_drivers+="ixgbevf"`)
  10. Recompile all installed kernels:
    rpm -qa kernel | sed 's/^kernel-//'  | xargs -I {} dracut -v -f /boot/initramfs-{}.img {}
    
  11. Shut down the instance
  12. When the instance has stopped, use the AWS CLI tool to enable optimized networking support:
    aws ec2 --region <REGION> modify-instance-attribute --instance-id <INSTANCE_ID> --sriov-net-support simple
    
  13. Power the instance back on
  14. Verify that 10Gbps capability is available:
    1. Check that the ixgbevf module is loaded
      $ sudo lsmod
      Module                  Size  Used by
      ipv6                  336282  46
      ixgbevf                63414  0
      i2c_piix4              11232  0
      i2c_core               29132  1 i2c_piix4
      ext4                  379559  6
      jbd2                   93252  1 ext4
      mbcache                 8193  1 ext4
      xen_blkfront           21998  3
      pata_acpi               3701  0
      ata_generic             3837  0
      ata_piix               24409  0
      dm_mirror              14864  0
      dm_region_hash         12085  1 dm_mirror
      dm_log                  9930  2 dm_mirror,dm_region_hash
      dm_mod                102467  20 dm_mirror,dm_log
      
    2. Check that `ethtool` is showing that the default interface (typically "`eth0`") is using the ixgbevf driver:
      $ sudo ethtool -i eth0
      driver: ixgbevf
      version: 3.2.2
      firmware-version: N/A
      bus-info: 0000:00:03.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: yes
      supports-priv-flags: no
      
    3. Verify that the interface is listed as supporting a link mode of `10000baseT/Full` and a speed of `10000Mb/s`:
      $ sudo ethtool eth0
      Settings for eth0:
              Supported ports: [ ]
              Supported link modes:   10000baseT/Full
              Supported pause frame use: No
              Supports auto-negotiation: No
              Advertised link modes:  Not reported
              Advertised pause frame use: No
              Advertised auto-negotiation: No
              Speed: 10000Mb/s
              Duplex: Full
              Port: Other
              PHYAD: 0
              Transceiver: Unknown!
              Auto-negotiation: off
              Current message level: 0x00000007 (7)
                                     drv probe link
              Link detected: yes
      

Thursday, August 25, 2016

Use the Force, LUKS

Not like there aren't a bunch of LUKS guides out there already ...mostly posting this one for myself.

Today, I was working on turning the (atrocious - other than a long-past deadline, DISA, do you even care what you're publishing?) RHEL 7 V0R2 STIG specifications into configuration-management elements for our enterprise CM system. Got to the STIG item for "ensure that data-at-rest is encrypted as appropriate". This particular element is only semi-automatable ...since it's one of those "context" rules that has an "if local policy requires it" back-biting element to it. At any rate, this particular STIG-item prescribes the use of LUKS.

As I set about to write the code for this security-element, it occurred to me, "we typically use array-based storage encryption - or things like KMS in cloud deployments - so often that I can't remember how to configure LUKS ...least of all how to configure it so it doesn't require human intervention to mount volumes." So, like any good Linux-tech, I petitioned the gods of Google. Lo, there were many results — most falling into either the "here's how you encrypt a device" or the "here's how you take an encrypted device and make the OS automatically remount it at boot" camps. I was looking to do both, so that my test-rig could be rebooted and just have the volume there. I was worried about testing whether devices were encrypted, not whether leaving keys on a system was adequately secure.

At any rate, at least for testing purposes (and in case I need to remember these later), here's what I synthesized from my Google searches.

  1. Create a directory for storing encryption key-files. Ensure that directory is readable only by the root user:
    install -d -m 0700 -o root -g root /etc/crypt.d
  2. Create a 4KB key from randomized data (stronger encryption key than typical, password-based unlock mechanisms):
    # dd if=/dev/urandom of=/etc/crypt.d/cryptFS.key bs=1024 count=4
    ...writing the key to the previously-created, protected directory. Up the key-length by increasing the value of the count parameter.
     
  3. Use the key to create an encrypted raw device:
    # cryptsetup --key-file /etc/crypt.d/cryptFS.key \
    --cipher aes-cbc-essiv:sha256 luksFormat /dev/CryptVG/CryptVol
  4. Activate/open the encrypted device for writing:
    # cryptsetup luksOpen --key-file /etc/crypt.d/cryptFS.key \
    /dev/CryptVG/CryptVol CryptVol_crypt
    Pass the location of the encryption-key using the --key-file parameter.
     
  5. Add a mapping to the crypttab file:
    # ( printf "CryptVol_crypt\t/dev/CryptVG/CryptVol\t" ;
       printf "/etc/crypt.d/cryptFS.key\tluks\n" ) >> /etc/crypttab
    The OS will use this mapping-file at boot-time to open the encrypted device and ready it for mounting. The four column-values to the map are:
    1. Device-mapper Node: this is the name of the writable block-device used for creating filesystem structures and for mounting. The value is relative. When the device is activated, it will be assigned the device name /dev/mapper/<key_value>
    2. Hosting-Device: The physical device that hosts the encrypted pseudo-device. This can be a basic hard disk, a partition on a disk or an LVM volume.
    3. Key Location: Where the device's decryption-key is stored.
    4. Encryption Type: What encryption-method was used to encrypt the device (typically "luks")
     
  6. Create a filesystem on the opened encrypted device:
    # mkfs -t ext4 /dev/mapper/CryptVol_crypt
  7. Add the encrypted device's mount-information to the host's /etc/fstab file:
    # ( printf "/dev/mapper/CryptVol_crypt\t/cryptfs\text4" ;
       printf "defaults\t0 0\n" ) >> /etc/fstab
  8. Verify that everything works by hand-mounting the device (`mount -a`)
  9. Reboot the system (`init 6`) to verify that the encrypted device(s) automatically mount at boot-time
Keys and mappings in place, the system will reboot with the LUKSed devices opened and mounted. The above method's also good if you wanted to give each LUKS-protected device its own, device-specific key-file.

Note: You will really want to back up these key files. If you somehow lose the host OS but not the encrypted devices, the only way you'll be able to re-open those devices is if you're able to restore the key-files to the new system. Absent those keys, you'd better have good backups of the unencrypted data - because you're starting from scratch.
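One low-tech way to handle that backup is to wrap the key directory in a passphrase-protected archive before copying it somewhere off-host (an unencrypted copy of the keys rather defeats the purpose); the paths here are illustrative:

# tar czf - /etc/crypt.d | gpg --symmetric --cipher-algo AES256 -o /root/crypt-keys.tgz.gpg
# # ...then copy /root/crypt-keys.tgz.gpg to wherever your other backups live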

Tuesday, August 2, 2016

Supporting Dynamic root-disk in LVM-enabled Templates - EL7 Edition

In my previous article, Supporting Dynamic root-disk in LVM-enabled Templates, I discussed the challenges around supporting LVM-enabled VM-templates in cloud-based deployments of Enterprise Linux 6 VMs. The kernel used for Enterprise Linux 7 distributions makes template-based deployment of LVM-enabled VMs a bit easier. Instead of having to add an RPM from EPEL and then hack that RPM to make it support LVM2-encapsulated root volumes/filesystems, one need only ensure that the cloud-utils-growpart RPM is installed and do some launch-time massaging via cloud-init. By way of example:
#cloud-config
runcmd:
  - /usr/bin/growpart /dev/xvda 2
  - pvresize /dev/xvda2
  - lvresize -r -L +2G VolGroup00/logVol
  - lvresize -r -L +2G VolGroup00/auditVol
The above will cause the launched instance to:
  1. Grow the second partition on the boot disk to the end of the disk
  2. Instruct LVM to resize the PV to match the new partition-size
  3. Instruct LVM to grow the VolGroup00/logVol volume — and the filesystem on top of it — by 2GiB
  4. Instruct LVM to grow the VolGroup00/auditVol volume — and the filesystem on top of it — by 2GiB
Upon login, the above launch-time configuration-actions can be verified by using `vgdisplay -s` and `lvs --segments -o +devices`:
# vgdisplay -s
  "VolGroup00" 29.53 GiB [23.53 GiB used / 6.00 GiB free]
# lvs --segments -o +devices
  LV       VG         Attr       #Str Type   SSize Devices
  auditVol VolGroup00 -wi-ao----    1 linear 8.53g /dev/xvda2(2816)
  auditVol VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(5512)
  homeVol  VolGroup00 -wi-ao----    1 linear 1.00g /dev/xvda2(1536)
  logVol   VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(2304)
  logVol   VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(5000)
  rootVol  VolGroup00 -wi-ao----    1 linear 4.00g /dev/xvda2(0)
  swapVol  VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(1024)
  varVol   VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(1792)

Supporting Dynamic root-disk in LVM-enabled Templates

One of the main customers I support has undertaken adoption of cloud-based services. This customer's IA team also requires that the OS drive be carved up to keep logging and audit activities separate from the rest of the OS disks. Previous to adoption of cloud-based services, this was a non-problem.

Since moving to the cloud — and using a build-method that generates launch-templates directly in the cloud (EL6 and EL7) — the use of LVM has proven problematic - particularly with EL6. Out-of-the-box, EL6 does not support dynamic resizing of the boot disk. This means that specifying a larger-than-default root-disk when launching a template is pointless if using a "stock" EL6 template. This can be overcome by creating a custom launch-template that uses the dracut-modules-growroot RPM from EPEL.

Unfortunately, this EPEL RPM is only part of the picture. The downloaded dracut-modules-growroot RPM only supports growing the "/" partition to the size of the larger-than-default disk if the template's root disk is either wholly unpartitioned or partitioned such that the "/" partition is the last partition on the disk. It does not support the case where the "/" filesystem is hosted within an LVM2 volume-group. To get around this, it is necessary to patch the growroot.sh script that the dracut-modules-growroot RPM installs:
--- /usr/share/dracut/modules.d/50growroot/growroot.sh  2013-11-22 13:32:42.000000000 +0000
+++ growroot.sh 2016-08-02 15:56:54.308094011 +0000
@@ -18,8 +18,20 @@
 }

 _growroot() {
-       # Remove 'block:' prefix and find the root device
-       rootdev=$(readlink -f "${root#block:}")
+       # Compute root-device
+       if [ -z "${root##*mapper*}" ]
+       then
+               set -- "${root##*mapper/}"
+               VOLGRP=${1%-*}
+               ROOTVOL=${1#*-}
+               rootdev=$(readlink -f $(pvs --noheadings | awk '/'${VOLGRP}'/{print $1}'))
+               _info "'/' is hosted on an LVM2 volume: setting \$rootdev to ${rootdev}"
+       else
+               # Remove 'block:' prefix and find the root device
+               rootdev=$(readlink -f "${root#block:}")
+       fi
+
+       # root arg was nulled at some point...
        if [ -z "${rootdev}" ] ; then
                _warning "unable to find root device"
                return
Once the growroot.sh script has been patched, it will be necessary to regenerate the template's initramfs so that it includes the grow functionality. If the template has multiple kernels installed, it will be desirable to ensure that each kernel's initramfs is regenerated. A quick way to ensure that all of the initramfs files in the template are properly enabled is to execute:

rpm -qa kernel | sed 's/^kernel-//' | \
   xargs -I {} dracut -f /boot/initramfs-{}.img
Note1: The above will likely put data into the template's /var/log/dracut.log file. It is likely desirable to null-out this file (along with all other log files) prior to sealing the template.

Note2: Patching the growroot.sh script will cause RPM-verification to fail in VMs launched from the template. This can either be handled as a known/expected exception or can be averted by performing a `yum reinstall dracut-modules-growroot` in the template or in the VMs launched from the template.

Credit: The above is an extension to a solution that I found at Backslasher.Net (my Google-fu was strong the day that I wanted to solve this problem!)