Wednesday, November 23, 2016

Manually Mirroring Git Repos

If you're like me, you've had occasions where you need to replicate Git-managed projects from one Git service to another. If you're not like me, you use paid services that make such mundane tasks a matter of clicking a few buttons in a GUI. If, however, you need to copy projects from one repository-service to another and no one has paid to make a GUI button/config-page available to you, then you need to find other methods to get things done.

The following assumes that you have a git project hosted in one repository service (e.g., GitHub) that you wish to mirror to another repository service (e.g., BitBucket, AWS CodeCommit, etc.). The basic workflow looks like the following:

Procedure Outline:

  1. Login to a git-enabled host
  2. Create a copy of your "source-of-truth" repository, depositing its contents to a staging-directory:
    git clone --mirror \
       <REPOSITORY_USER>@<REPOSITORY1.DNS.NAME>:<PROJECT_USER_OR_GROUP>/<PROJECT_NAME>.git \
       stage
  3. Navigate into the staging-directory:
    cd stage
  4. Set the push-destination to the copy-repository:
    git remote set-url --push origin \
       <REPOSITORY_USER>@<REPOSITORY2.DNS.NAME>:<PROJECT_USER_OR_GROUP>/<PROJECT_NAME>.git
  5. Ensure the staging-directory's data is still up to date:
    git fetch -p origin
  6. Push the copied source-repository's data to the copy-repository:
    git push --mirror

Procedure Example:

Using an example configuration (the AMIgen6 project):
$ git clone --mirror git@github.com:ferricoxide/AMIgen6.git stage && \
  cd stage && \
  git remote set-url --push origin git@bitbucket.org:ferricoxide/amigen6-copy.git && \
  git fetch -p origin && \
  git push --mirror
Cloning into bare repository 'stage'...
remote: Counting objects: 789, done.
remote: Total 789 (delta 0), reused 0 (delta 0), pack-reused 789
Receiving objects: 100% (789/789), 83.72 MiB | 979.00 KiB/s, done.
Resolving deltas: 100% (409/409), done.
Checking connectivity... done.
Counting objects: 789, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (369/369), done.
Writing objects: 100% (789/789), 83.72 MiB | 693.00 KiB/s, done.
Total 789 (delta 409), reused 789 (delta 409)
To git@bitbucket.org:ferricoxide/amigen6-copy.git
 * [new branch]      ExtraRPMs -> ExtraRPMs
 * [new branch]      SELuser-fix -> SELuser-fix
 * [new branch]      master -> master
 * [new branch]      refs/pull/38/head -> refs/pull/38/head
 * [new branch]      refs/pull/39/head -> refs/pull/39/head
 * [new branch]      refs/pull/40/head -> refs/pull/40/head
 * [new branch]      refs/pull/41/head -> refs/pull/41/head
 * [new branch]      refs/pull/42/head -> refs/pull/42/head
 * [new branch]      refs/pull/43/head -> refs/pull/43/head
 * [new branch]      refs/pull/44/head -> refs/pull/44/head
 * [new branch]      refs/pull/52/head -> refs/pull/52/head
 * [new branch]      refs/pull/53/head -> refs/pull/53/head
 * [new branch]      refs/pull/54/head -> refs/pull/54/head
 * [new branch]      refs/pull/55/head -> refs/pull/55/head
 * [new branch]      refs/pull/56/head -> refs/pull/56/head
 * [new branch]      refs/pull/57/head -> refs/pull/57/head
 * [new branch]      refs/pull/62/head -> refs/pull/62/head
 * [new branch]      refs/pull/64/head -> refs/pull/64/head
 * [new branch]      refs/pull/65/head -> refs/pull/65/head
 * [new branch]      refs/pull/66/head -> refs/pull/66/head
 * [new branch]      refs/pull/68/head -> refs/pull/68/head
 * [new branch]      refs/pull/71/head -> refs/pull/71/head
 * [new branch]      refs/pull/73/head -> refs/pull/73/head
 * [new branch]      refs/pull/76/head -> refs/pull/76/head
 * [new branch]      refs/pull/77/head -> refs/pull/77/head

Updating (and Automating)

To keep your copy-repository's project in sync with your source-repository's project, periodically do:
cd stage && \
  git fetch -p origin && \
  git push --mirror
This can be accomplished by logging into a host and executing the steps manually or placing them into a cron job.
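If you opt for cron, an entry along the following lines would re-sync the mirror hourly (the stage-directory path and log location here are illustrative assumptions):
0 * * * * cd /path/to/stage && git fetch -p origin && git push --mirror >> /var/log/git-mirror.log 2>&1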

Thursday, November 3, 2016

Automation in a Password-only UNIX Environment

Occasionally, my organization has need to run ad hoc queries against a large number of Linux systems. Usually, this is a great use-case for an enterprise CM tool like HPSA, Satellite, etc. Unfortunately, the organization I consult to is between solutions (their legacy tool burned down and their replacement has yet to reach a working state). The upshot is that, one needs to do things a bit more on-the-fly.

My preferred method for accessing systems is using some kind of token-based authentication framework. When hosts are AD-integrated, you can often use Kerberos for this. Failing that, you can sub in key-based logins (if all of your targets have your key as an authorized key). While my customer's systems are AD-integrated, their security controls preclude the use of both AD/Kerberos's single-signon capabilities and the use of SSH key-based logins (and, to be honest, almost none of the several hundred targets I needed to query had my key configured as an authorized key).

Because (tunneled) password-based logins are forced, I was initially looking at the prospect of having to write an expect script to avoid having to type in my password several hundred times. Fortunately, there's an alternative to this in the tool "sshpass".

The sshpass utility lets you supply your password with a number of methods: command-line argument, a password-containing file, an environment variable value or even a read from STDIN. I'm not a fan of text files containing passwords (they've a bad tendency to be forgotten and left on a system - bad juju). I'm not particularly a fan of command-line arguments, either - especially on a multi-user system where others might see your password if they `ps` at the wrong time (which increases in probability as the length of time your job runs goes up). The STDIN method is marginally less awful than the command arg method (for similar reasons). At least with an environment variable, the value really only sits in memory (especially if you've set your HISTFILE location to someplace non-persistent).

The particular audit I was doing was an attempt to determine the provenance of a few hundred VMs. Over time, the organization has used templates authored by several groups - and different personnel within one of the groups. I needed to scan all of the systems to see which template they might have been using since the template information had been deleted from the hosting environment. Thus, I needed to run an SSH-encapsulated command to find the hallmarks of each template. Ultimately, what I ended up doing was:

  1. Pushed my query-account's password into the environment variable used by sshpass, "SSHPASS"
  2. Generated a file containing the IPs of all the VMs in question.
  3. Set up a for loop to iterate that list
  4. Looped `sshpass -e ssh -o PubkeyAuthentication=no -o StrictHostKeyChecking=no <USER>@${HOST} <AUDIT_COMMAND_SEQUENCE>` 2>&1
  5. Jammed STDOUT through a sed filter to strip off the crap I wasn't interested in and put CSV-appropriate delimiters into each queried host's string.
  6. Captured the lot to a text file (a consolidated sketch of the loop follows this list)
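Pulled together, the loop looked broadly like the following. This is only a sketch: the host-list file, audit command and sed expressions are stand-ins for whatever you're actually querying and filtering:
read -s SSHPASS && export SSHPASS
for HOST in $(cat host_list.txt)
do
   printf "%s," "${HOST}"
   sshpass -e ssh -o PubkeyAuthentication=no -o StrictHostKeyChecking=no \
      <USER>@${HOST} '<AUDIT_COMMAND_SEQUENCE>' 2>&1 | \
      sed -e 's/<JUNK_PATTERN>//g' -e 's/[[:space:]]\{1,\}/,/g'
done > audit_results.csv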
The "PubkeyAuthentication=no" option was required because I pretty much always have SSH-agent (or agent-forwarding) enabled. This causes my key to be passed. With the targets' security settings, this causes the connection to be aborted unless I explicitly suppress the passing of my agent-key.

The "StrictHostKeyChecking=no" option was required because I'd never logged into these hosts before. Our standard SSH client config is to require confirmation before accepting the remote key (and shoving it into ${HOME}/.ssh/known_hosts). Without this option, I'd be required to confirm acceptance of each key ...which is just about as automation-breaking as having to retype your password hundreds of times is.

Once the above was done, I had a nice CSV that could be read into Excel and a spreadsheet turned over to the person asking "who built/owns these systems". 

This method also meant that for the hosts that refused the audit credentials, it was easy enough to report "...and this lot aren't properly configured to work with AD".

Monday, October 17, 2016

Update to EL6 and the Pain of 10Gbps Networking in AWS

In a previous post, EL6 and the Pain of 10Gbps Networking in AWS, I discussed how to enable using third-party ixgbevf drivers to support 10Gbps networking in AWS-hosted RHEL 6 and CentOS 6 instances. Upon further testing, it turns out that this is overkill. The native drivers may be used, instead, with minimal hassle.

It turns out that, in my quest to ensure that my CentOS and RHEL AMIs contained as many of the AWS utilities present in the Amazon Linux AMIs as possible, I included two RPMs - ec2-net and ec2-net-util - that were preventing use of the native drivers. Skipping these two RPMs (and possibly sacrificing ENI hot-plug capabilities) allows much lower-effort support of 10Gbps networking in AWS-hosted EL6 instances.

Absent those RPMs, 10Gbps support becomes a simple matter of the following (a consolidated sketch appears after the list):
  1. Add add_drivers+="ixgbe ixgbevf" to the AMI's /etc/dracut.conf file
  2. Use dracut to regenerate the AMI's initramfs.
  3. Ensure that there are no persistent network device mapping entries in /etc/udev/rules.d.
  4. Ensure that there are no ixgbe or ixgbevf config directives in /etc/modprobe.d/* files.
  5. Enable sr-iov support in the instance-to-be-registered
  6. Register the AMI 
The resulting AMIs (and instances spawned from them) should support 10Gbps networking and be compatible with M4-generation instance-types.
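Consolidated, the instance-side portion of the above (steps 1 through 4) looks something like the following sketch, run as root inside the instance being prepped; the ls and grep checks are just illustrative ways of spotting leftovers:
# Step 1: have dracut pull the Intel 10Gbps drivers into the initramfs
echo 'add_drivers+="ixgbe ixgbevf"' >> /etc/dracut.conf

# Step 2: regenerate the initramfs for every installed kernel
rpm -qa kernel | sed 's/^kernel-//' | \
   xargs -I {} dracut -v -f /boot/initramfs-{}.img {}

# Steps 3 and 4: confirm there are no lingering udev or modprobe overrides
ls /etc/udev/rules.d/*persistent-net*.rules 2>/dev/null
grep -l ixgbe /etc/modprobe.d/* 2>/dev/null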

Friday, October 7, 2016

Using DKMS to maintain driver modules

In my prior post, I noted that maintaining custom drivers for the kernel in RHEL and CentOS hosts can be a bit painful (and prone to leaving you with an unreachable or even unbootable system). One way to take some of the pain out of owning a system with custom drivers is to leverage DKMS. In general, DKMS is the recommended way to ensure that, as kernels are updated, required kernel modules are also (automatically) updated.

Unfortunately, use of the DKMS method will require that developer tools (i.e., the GNU C-compiler) be present on the system - either in perpetuity or just any time kernel updates are applied. It is very likely that your security team will object to - or even prohibit - this. If the objection/prohibition cannot be overridden, use of the DKMS method will not be possible.

Steps

  1. Set an appropriate version string into the shell-environment:
    export VERSION=3.2.2
  2. Make sure that appropriate header files for the running-kernel are installed
    yum install -y kernel-devel-$(uname -r)
  3. Ensure that the dkms utilities are installed:
    yum --enablerepo=epel install dkms
  4. Download the driver sources and unarchive into the /usr/src directory:
    wget https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/${VERSION}/ixgbevf-${VERSION}.tar.gz/download \
        -O /tmp/ixgbevf-${VERSION}.tar.gz && \
       ( cd /usr/src && \
          tar zxf /tmp/ixgbevf-${VERSION}.tar.gz )
  5. Create an appropriate DKMS configuration file for the driver:
    cat > /usr/src/ixgbevf-${VERSION}/dkms.conf << EOF
    PACKAGE_NAME="ixgbevf"
    PACKAGE_VERSION="${VERSION}"
    CLEAN="cd src/; make clean"
    MAKE="cd src/; make BUILD_KERNEL=\${kernelver}"
    BUILT_MODULE_LOCATION[0]="src/"
    BUILT_MODULE_NAME[0]="ixgbevf"
    DEST_MODULE_LOCATION[0]="/updates"
    DEST_MODULE_NAME[0]="ixgbevf"
    AUTOINSTALL="yes"
    EOF
  6. Register the module to the DKMS-managed kernel tree:
    dkms add -m ixgbevf -v ${VERSION}
  7. Build the module against the currently-running kernel:
    dkms build ixgbevf/${VERSION}
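With the module built, it can also be installed for the currently-running kernel right away (the AUTOINSTALL directive only covers kernels installed later). A minimal sketch:
dkms install ixgbevf/${VERSION}
dkms status ixgbevf
The `dkms status` output should report the module as installed for the running kernel.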

Verification

The easiest way to verify the correct functioning of DKMS is to:
  1. Perform a `yum update -y`
  2. Check that the new drivers were created by executing `find /lib/modules -name ixgbevf.ko`. Output should be similar to the following:
    find /lib/modules -name ixgbevf.ko | grep extra
    /lib/modules/2.6.32-642.1.1.el6.x86_64/extra/ixgbevf.ko
    /lib/modules/2.6.32-642.6.1.el6.x86_64/extra/ixgbevf.ko
    There should be at least two output-lines: one for the currently-running kernel and one for the kernel update. If more kernels are installed, there may be more than just two output-lines
     
  3. Reboot the system, then check what version is active:
    modinfo ixgbevf | grep extra
    filename:       /lib/modules/2.6.32-642.1.1.el6.x86_64/extra/ixgbevf.ko
    If the output is null, DKMS didn't build the new module.

Wednesday, October 5, 2016

EL6 and the Pain of 10Gbps Networking in AWS

AWS-hosted instances with optimized networking-support enabled see the 10Gbps interface as an Intel 10Gbps ethernet adapter (`lspci | grep Ethernet` will display a string similar to Intel Corporation 82599 Virtual Function). This interface makes use of the ixgbevf network-driver. Enterprise Linux 6 and 7 bundle version 2.12.x of the driver into the kernel RPM. Per the Enabling Enhanced Networking with the Intel 82599 VF Interface on Linux Instances in a VPC document, AWS enhanced networking recommends version 2.14.2 or higher of the ixgbevf network-driver. To meet AWS's recommendations for 10Gbps support within an EL6 instance, it will be necessary to update the ixgbevf driver to at least version 2.14.2.

The ixgbevf network-driver source-code can be found on SourceForge. It should be noted that not every AWS-compatible version will successfully compile on EL 6. Version 3.2.2 is known to successfully compile, without intervention, on EL6.
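For reference, compiling from source generally looks something like the following. This is a sketch: it assumes the SourceForge tarball layout (sources under a src/ subdirectory), that the out-of-tree Makefile provides an install target (as Intel's drivers typically do), and that kernel headers and a compiler are present:
VERSION=3.2.2
yum install -y gcc make kernel-devel-$(uname -r)
wget https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/${VERSION}/ixgbevf-${VERSION}.tar.gz/download \
    -O /tmp/ixgbevf-${VERSION}.tar.gz
tar -C /usr/src -zxf /tmp/ixgbevf-${VERSION}.tar.gz
cd /usr/src/ixgbevf-${VERSION}/src
make && make install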


Notes:


>>>CRITICAL ITEM<<<

It is necessary to recompile the ixgbevf driver and inject it into the kernel each time the kernel version changes. This needs to be done between changing the kernel version and rebooting into the new kernel version. Failure to update the driver each time the kernel changes will result in the instance failing to return to the network after a reboot event.

Step #10 from the implementation procedure:
rpm -qa kernel | sed 's/^kernel-//' | xargs -I {} dracut -v -f /boot/initramfs-{}.img {}
is the easiest way to ensure any available kernels are properly linked against the ixgbevf driver.

>>>CRITICAL ITEM<<<

It is possible that the above process can be avoided by installing DKMS and letting it coordinate the insertion of the ixgbevf driver modules into updated kernels.


Procedure:


The following assumes the instance-owner has privileged access to the instance OS and can make AWS-level configuration changes to the instance-configuration:
  1. Login to the instance
  2. Escalate privileges to root
  3. Install the up-to-date ixgbevf driver. This can be installed either by compiling from source or using pre-compiled binaries.
  4. Delete any `*persistent-net*.rules` files found in the /etc/udev/rules.d directory (one or both of 70-persistent-net.rules and 75-persistent-net-generator.rules may be present)
  5. Ensure that an `/etc/modprobe.d` file with the following minimum contents exists:
    alias eth0 ixgbevf
    
    Recommend creating/placing in /etc/modprobe.d/ifaliases.conf
  6. Unload any ixgbevf drivers that may be in the running kernel:
    modprobe -rv ixgbevf
    
  7. Load the updated ixgbevf driver into the running kernel:
    modprobe -v ixgbevf
    
  8. Ensure that the /etc/modprobe.d/ixgbevf.conf file exists. Its contents should resemble:
    options ixgbevf InterruptThrottleRate=1
    
  9. Update the /etc/dracut.conf. Ensure that the add_drivers+="" directive is uncommented and contains reference to the ixgbevf modules (i.e., `add_drivers+="ixgbevf"`)
  10. Recompile all installed kernels:
    rpm -qa kernel | sed 's/^kernel-//'  | xargs -I {} dracut -v -f /boot/initramfs-{}.img {}
    
  11. Shut down the instance
  12. When the instance has stopped, use the AWS CLI tool to enable optimized networking support:
    aws ec2 --region <REGION> modify-instance-attribute --instance-id <INSTANCE_ID> --sriov-net-support simple
    
  13. Power the instance back on
  14. Verify that 10Gbps capability is available:
    1. Check that the ixgbevf module is loaded
      $ sudo lsmod
      Module                  Size  Used by
      ipv6                  336282  46
      ixgbevf                63414  0
      i2c_piix4              11232  0
      i2c_core               29132  1 i2c_piix4
      ext4                  379559  6
      jbd2                   93252  1 ext4
      mbcache                 8193  1 ext4
      xen_blkfront           21998  3
      pata_acpi               3701  0
      ata_generic             3837  0
      ata_piix               24409  0
      dm_mirror              14864  0
      dm_region_hash         12085  1 dm_mirror
      dm_log                  9930  2 dm_mirror,dm_region_hash
      dm_mod                102467  20 dm_mirror,dm_log
      
    2. Check that `ethtool` is showing that the default interface (typically "`eth0`") is using the ixgbevf driver:
      $ sudo ethtool -i eth0
      driver: ixgbevf
      version: 3.2.2
      firmware-version: N/A
      bus-info: 0000:00:03.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: yes
      supports-priv-flags: no
      
    3. Verify that the interface is listed as supporting a link mode of `10000baseT/Full` and a speed of `10000Mb/s`:
      $ sudo ethtool eth0
      Settings for eth0:
              Supported ports: [ ]
              Supported link modes:   10000baseT/Full
              Supported pause frame use: No
              Supports auto-negotiation: No
              Advertised link modes:  Not reported
              Advertised pause frame use: No
              Advertised auto-negotiation: No
              Speed: 10000Mb/s
              Duplex: Full
              Port: Other
              PHYAD: 0
              Transceiver: Unknown!
              Auto-negotiation: off
              Current message level: 0x00000007 (7)
                                     drv probe link
              Link detected: yes
      

Thursday, August 25, 2016

Use the Force, LUKS

Not like there aren't a bunch of LUKS guides out there already ...mostly posting this one for myself.

Today, I was working on turning the (atrocious - other than a long-past deadline, DISA, do you even care what you're publishing?) RHEL 7 V0R2 STIGs specifications into configuration management elements for our enterprise CM system. Got to the STIG item for "ensure that data-at-rest is encrypted as appropriate". This particular element is only semi-automatable ...since it's one of those "context" rules that has a "if local policy requires it" back-biting element to it. At any rate, this particular STIG-item prescribes the use of LUKS.

As I set about to write the code for this security-element, it occurred to me, "we typically use array-based storage encryption - or things like KMS in cloud deployments - so much that I can't remember how to configure LUKS ...least of all configure it so it doesn't require human intervention to mount volumes." So, like any good Linux-tech, I petitioned the gods of Google. Lo, there were many results — most falling into either the "here's how you encrypt a device" or the "here's how you take an encrypted device and make the OS automatically remount it at boot" camps. I was looking to do both so that my test-rig could be rebooted and just have the volume there. I was worried about testing whether devices were encrypted, not whether leaving keys on a system was adequately secure.

At any rate, at least for testing purposes (and in case I need to remember these later), here's what I synthesized from my Google searches.

  1. Create a directory for storing encryption key-files. Ensure that directory is readable only by the root user:
    install -d -m 0700 -o root -g root /etc/crypt.d
  2. Create a 4KB key from randomized data (a stronger encryption key than typical password-based unlock mechanisms):
    # dd if=/dev/urandom of=/etc/crypt.d/cryptFS.key bs=1024 count=4
    ...writing the key to the previously-created, protected directory. Up the key-length by increasing the value of the count parameter.
     
  3. Use the key to create an encrypted raw device:
    # cryptsetup --key-file /etc/crypt.d/cryptFS.key \
    --cipher aes-cbc-essiv:sha256 luksFormat /dev/CryptVG/CryptVol
  4. Activate/open the encrypted device for writing:
    # cryptsetup luksOpen --key-file /etc/crypt.d/cryptFS.key \
    /dev/CryptVG/CryptVol CryptVol_crypt
    Pass the location of the encryption-key using the --key-file parameter.
     
  5. Add a mapping to the crypttab file:
    # ( printf "CryptVol_crypt\t/dev/CryptVG/CryptVol\t" ;
       printf "/etc/crypt.d/cryptFS.key\tluks\n" ) >> /etc/crypttab
    The OS will use this mapping-file at boot-time to open the encrypted device and ready it for mounting. The four column-values to the map are:
    1. Device-mapper Node: this is the name of the writable block-device used for creating filesystem structures and for mounting. The value is relative. When the device is activated, it will be assigned the device name /dev/mapper/<key_value>
    2. Hosting-Device: The physical device that hosts the encrypted psuedo-device. This can be a basic hard disk, a partition on a disk or an LVM volume.
    3. Key Location: Where the device's decryption-key is stored.
    4. Encryption Type: What encryption-method was used to encrypt the device (typically "luks")
     
  6. Create a filesystem on the opened encrypted device:
    # mkfs -t ext4 /dev/mapper/CryptVol_crypt
  7. Add the encrypted device's mount-information to the host's /etc/fstab file (the resulting crypttab and fstab entries are shown after this procedure):
    # ( printf "/dev/mapper/CryptVol_crypt\t/cryptfs\text4\t" ;
       printf "defaults\t0 0\n" ) >> /etc/fstab
  8. Verify that everything works by hand-mounting the device (`mount -a`)
  9. Reboot the system (`init 6`) to verify that the encrypted device(s) automatically mount at boot-time
Keys and mappings in place, the system will reboot with the LUKSed devices opened and mounted. The above method's also good if you want to give each LUKS-protected device its own, device-specific key-file.
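For reference, with the example names used above, the resulting map entries look like the following:
# /etc/crypttab
CryptVol_crypt  /dev/CryptVG/CryptVol   /etc/crypt.d/cryptFS.key        luks

# /etc/fstab
/dev/mapper/CryptVol_crypt      /cryptfs        ext4    defaults        0 0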

Note: You will really want to back up these key files. If you somehow lose the host OS but not the encrypted devices, the only way you'll be able to re-open those devices is if you're able to restore the key-files to the new system. Absent those keys, you better have good backups of the unencrypted data - because you're starting from scratch.

Tuesday, August 2, 2016

Supporting Dynamic root-disk in LVM-enabled Templates - EL7 Edition

In my previous article, Supporting Dynamic root-disk in LVM-enabled Templates, I discussed the challenges around supporting LVM-enabled VM-templates in cloud-based deployments of Enterprise Linux 6 VMs. The kernel used for Enterprise Linux 7 distributions makes template-based deployment of LVM-enabled VMs a bit easier. Instead of having to add an RPM from EPEL and then hack that RPM to make it support LVM2-encapsulated root volumes/filesystems, one need only ensure that the cloud-utils-growpart RPM is installed and do some launch-time massaging via cloud-init. By way of example:
#cloud-config
runcmd:
  - /usr/bin/growpart /dev/xvda 2
  - pvresize /dev/xvda2
  - lvresize -r -L +2G VolGroup00/logVol
  - lvresize -r -L +2G VolGroup00/auditVol
Will cause the launched instance to:
  1. Grow the second partition on the boot disk to the end of the disk
  2. Instruct LVM to resize the PV to match the new partition-size
  3. Instruct LVM to grow the VolGroup00/logVol volume — and the filesystem on top of it — by 2GiB
  4. Instruct LVM to grow the VolGroup00/auditVol volume — and the filesystem on top of it — by 2GiB
Upon login, the above launch-time configuration-actions can be verified by using `vgdisplay -s` and `lvs --segments -o +devices`:
# vgdisplay -s
  "VolGroup00" 29.53 GiB [23.53 GiB used / 6.00 GiB free]
# lvs --segments -o +devices
  LV       VG         Attr       #Str Type   SSize Devices
  auditVol VolGroup00 -wi-ao----    1 linear 8.53g /dev/xvda2(2816)
  auditVol VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(5512)
  homeVol  VolGroup00 -wi-ao----    1 linear 1.00g /dev/xvda2(1536)
  logVol   VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(2304)
  logVol   VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(5000)
  rootVol  VolGroup00 -wi-ao----    1 linear 4.00g /dev/xvda2(0)
  swapVol  VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(1024)
  varVol   VolGroup00 -wi-ao----    1 linear 2.00g /dev/xvda2(1792)

Supporting Dynamic root-disk in LVM-enabled Templates

One of the main customers I support has undertaken adoption of cloud-based services. This customer's IA team also requires that the OS drive be carved up to keep logging and audit activities separate from the rest of the OS disks. Previous to adoption of cloud-based services, this was a non-problem.

Since moving to the cloud — and using a build-method that generates launch-templates directly in the cloud (EL6 and EL7) — the use of LVM has proven problematic - particularly with EL6. Out-of-the-box, EL6 does not support dynamic resizing of the boot disk. This means that specifying a larger-than-default root-disk when launching a template is pointless if using a "stock" EL6 template. This can be overcome by creating a custom launch-template that uses the dracut-modules-growroot RPM from EPEL.

Unfortunately, this EPEL RPM is only part of the picture. The downloaded dracut-modules-growroot RPM only supports growing the "/" partition to the size of the larger-than-default disk if the template's root disk is either wholly unpartitioned or the disk is partitioned such that the "/" partition is the last partition on the disk. It does not support a case where the "/" filesystem is hosted within an LVM2 volume-group. To get around this, it is necessary to patch the growroot.sh script that the dracut-modules-growroot RPM installs:
--- /usr/share/dracut/modules.d/50growroot/growroot.sh  2013-11-22 13:32:42.000000000 +0000
+++ growroot.sh 2016-08-02 15:56:54.308094011 +0000
@@ -18,8 +18,20 @@
 }

 _growroot() {
-       # Remove 'block:' prefix and find the root device
-       rootdev=$(readlink -f "${root#block:}")
+       # Compute root-device
+       if [ -z "${root##*mapper*}" ]
+       then
+               set -- "${root##*mapper/}"
+               VOLGRP=${1%-*}
+               ROOTVOL=${1#*-}
+               rootdev=$(readlink -f $(pvs --noheadings | awk '/'${VOLGRP}'/{print $1}'))
+               _info "'/' is hosted on an LVM2 volume: setting \$rootdev to ${rootdev}"
+       else
+               # Remove 'block:' prefix and find the root device
+               rootdev=$(readlink -f "${root#block:}")
+       fi
+
+       # root arg was nulled at some point...
        if [ -z "${rootdev}" ] ; then
                _warning "unable to find root device"
                return
Once the growroot.sh script has been patched, it will be necessary to regenerate the template's initramfs so that the grow functionality is enabled. If the template has multiple kernels installed, it will be desirable to ensure that each is functionally-enabled. A quick way to ensure that all of the initramfs files in the template are properly-enabled is to execute:

rpm -qa kernel | sed 's/^kernel-//' | \
   xargs -I {} dracut -f /boot/initramfs-{}.img
Note1: The above will likely put data into the template's /var/log/dracut.log file. It is likely desirable to null-out this file (along with all other log files) prior to sealing the template.

Note2: Patching the growroot.sh script will cause RPM-verification to fail in VMs launched from the template. This can either be handled as a known/expected exception or can be averted by performing a `yum reinstall dracut-modules-growroot` in the template or in the VMs launched from the template.
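
To verify that a given initramfs actually picked up the grow functionality, the dracut-supplied `lsinitrd` tool can be used to inspect it (adjust the image path to the kernel of interest):
lsinitrd /boot/initramfs-$(uname -r).img | grep -i growroot
Non-empty output indicates the 50growroot module's files made it into that initramfs.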

Credit: The above is an extension to a solution that I found at Backslasher.Net (my Google-fu was strong the day that I wanted to solve this problem!)

Saturday, July 16, 2016

Retrospective Automatic Image Replication in NetBackup

In version 7.x of NetBackup, VERITAS added the Automatic Image Replication functionality. This technology is more commonly referred to as "AIR". Its primary use case is to enable a NetBackup administrator to easily configure data replication between two different — typically geographically-dispersed — NetBackup domains.

Like many tools that are designed for a given use-case, AIR can be used for things it wasn't specifically designed for. The primary down-side to these not-designed-for use-cases is that the documentation and tool-sets for such usage are generally pretty thin.

A customer I was assisting wanted to upgrade their appliance-based NetBackup system but didn't want to have to give up their old data. Because NetBackup appliances use Media Server Deduplication Pools (MSDP), it meant that I had a couple choices in how to handle their upgrade. I opted to try to use AIR to help me quickly and easily migrate data from their old appliance's MSDP to their new appliance's.

Sadly, as is typical of not-designed-for use-cases, documentation for doing it was kind of thin on the ground. Worse, because Symantec had recently spun VERITAS back off as its own entity, many of the forums that survived the transition had reference- and discussion-links that pointed to nowhere. Fortunately, I had access to a set of laboratory systems (AWS/Azure/Google Cloud/etc. is great for this - both from the standpoint of setup speed and "ready to go" OS templates). I was also able to borrow some of my customer's NetBackup 7.7 keys to use for the testing.

I typically prefer to work with UNIX/Linux-based systems to host NetBackup. However, my customer is a Windows-based shop. My customer's planned migration was also going to have the new NetBackup domain hosted on a different VLAN from their legacy NetBackup domain. This guided my lab design: I created a cloud-based "lab" configuration using two subnets and two Windows Server 2012 instance-templates. I set up each of my instances with enough storage to host the NetBackup software on one disk and the MSDPs on another disk ...and provisioned each of my test master servers with four CPUs and 16GiB of RAM. This is considerably smaller than both their old and new appliances, but I also wasn't trying to simulate an enterprise outpost's worth of backup traffic. I also set up a mix of about twenty Windows and Linux instances to act as testing clients (customer is beginning to add Linux systems as virtualization and Linux-based "appliances" have started to creep into their enterprise-stacks).

I set up two very generic NetBackup domains. Into each, I built an MSDP. I also set up a couple of very generic backup policies on the one NetBackup Master Server to backup all of the testing clients to the MSDP. I configured the policies for daily fulls and hourly incrementals, and set up each of the clients to continuously regenerate random data-sets in their filesystems. I let this run for forty-eight hours so that I could get a nice amount of seed-data into the source NBU domain's MSDP.

Note: If you're not familiar with MSDP setup, the SETTLERSOMAN website has a good, generic walkthrough.

After populating the source site's MSDP, I converted from using the MSDP by way of a direct STorage Unit definition (STU) to using it by way of a two-stage Storage Lifecycle Policy (SLP). I configured the SLP to use the source-site MSDP as the stage-one destination in the lifecycle and added the second NBU domain's MSDP as the stage-two destination in the lifecycle. I then seeded the second NBU domain's MSDP with data by executing a full backup of all clients against the SLP.

Note: For a discussion on setting up an AIR-based replication SLP, again, the SETTLERSOMAN website has a good, generic walkthrough.

All of the above is fairly straight-forward and well documented (both within the NBU documentation and sites like SETTLERSOMAN). However, it only addresses the issue of how you get newly-generated data from one NBU domain's MSDP to another's. Getting older data from an existing MSDP to a new MSDP is a bit more involved ...and not for the command-line phobic (or, in my case, PowerShell-phobic.)

At a high level, what you do is:
  1. Use the `bpimmedia` tool to enumerate all of the backup images stored on the source-site's MSDP
  2. Grab only the media-IDs of the enumerated backup images
  3. Feed that list of media-IDs to the `nbreplicate` tool so that it can copy that old data to the new MSDP
Note: The vendor documentation for the `bpimmedia` and  `nbreplicate` tools can be found at the VERITAS website.

When using the `bpimmedia` tool to automate image-ID enumeration, using the `-l` flag puts the output into a script-parsable format. The desired capture-item is the fourth field in all lines that begin 'IMAGE':
  • In UNIX/Linux shell, use an invocation similar to: `bpimmedia -l | awk '/^IMAGE/{print $4}'`
  • In PowerShell, use an invocation similar to:`bpimmedia -l | select-string -pattern "IMAGE *" | ForEach-Object { $data = $_ -split " " ; "{0}" -f $data[3] }`
The above output can then be either captured to a file — so that a single `nbreplicate` job can be launched to handle all of the images — or each individual image-ID can be passed to an individual `nbreplicate` job (typically via a command-pipeline in a foreach script). I ended up doing the latter because, even though the documentation indicates that the tool supports specifying an image-file, when executed under PowerShell, `nbreplicate` did not seem to know what to do with said file.

The `nbreplicate` command has several key flags we're interested in for this exercise:
  • -backupid: The backup-identifier captured via the `bpimmedia` tool
  • -cn: The copy-number to replicate — in most circumstances, this should be "1"
  • -rcn: The copy-number to assign to the replicated backup-image — in most circumstances, this should be "1"
  • -slp_name: the name of the SLP hosted on the destination NetBackup domain
  • -target_sts: the FQDN of the destination storage-server (use `nbemmcmd -listhosts` to verify names - or the replication jobs will fail with a status 191, sub-status 174)
  • -target_user: the username of a user that has administrative rights to the destination storage-server
  • -target_pwd: the password of the -target_user username
 If you don't care about minimizing the number of replication operations, this can all be put together similar to the following:
  • For Unix:
    for ID in $(bpimmedia -l | awk '/^IMAGE/{print $4}')
    do
       nbreplicate -backupid ${ID} -cn 1 -slp_name <REMOTE_SLP_NAME> \
         -target_sts <REMOTE_STORAGE_SERVER> -target_user <REMOTE_USER> \
         -target_pwd <REMOTE_USER_PASSWORD>
    done
    
  • For Windows:
    @(bpimmedia -l | select-string -pattern "IMAGE *" |
       ForEach-Object { $data = $_ -split " " ; "{0}" -f $data[3] }) |
       ForEach-Object { nbreplicate -backupid $_ -cn 1 `
         -slp_name <REMOTE_SLP_NAME> -target_sts <REMOTE_STORAGE_SERVER> `
         -target_user <REMOTE_USER> -target_pwd <REMOTE_USER_PASSWORD> }
    


Monday, June 13, 2016

Seriously, CentOS?

One of my (many) duties in our shop is doing cross-platform maintenance of RPMs. Previously, when I was maintaining them for EL5 and EL6, things were fairly straight-forward. You got or made a SPEC file to package your SOURCES into RPMs and you were pretty much good to go. Imagine my surprise when I went to start porting things to EL7 and all my freaking packages had el7.centos in their damned names. WTF?? These RPMs are for use on all of our EL7 systems, not just CentOS: why the hell are you dropping the implementation into my %{dist}-string?? So, now I have to make sure that my ${HOME}/.rpmmacros file has a line in it that looks like:
%dist .el7
...if I don't want my %{dist}-string to get crapped-up.
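A quick way to confirm that the override is actually being picked up:
$ rpm --eval '%{?dist}'
.el7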

Screw you, CentOS. Stuff was just fine on CentOS 5 and CentOS 6. This change is not an improvement.

Tuesday, April 26, 2016

SSH Via GateOne

If you're like me, you occasionally work from locations that have restrictive firewall rules. Those rules may even be so restrictive that you're unable to SSH to remote hosts that you need to manage or do development work on.

The solution to this scenario is to use a browser-based SSH (SSH-over-HTTPS) proxy service. In my case, I use GateOne. It's a light-weight service that's quick and easy to configure. The software maintainers even make Docker-ready bundles for it.

In my case, I create a GateOne-enabled SSH-proxy in AWS. Doing so is as simple as picking an EL6-compatible AMI (CentOS, RHEL, Scientific Linux, Amazon Linux, etc.) and giving the AMI launch-tool the following user-data:
#!/bin/sh

yum update -y
yum install -y git
pip install --upgrade pip
pip install --upgrade tornado
git clone https://github.com/liftoff/GateOne.git /tmp/GateOne
(cd /tmp/GateOne ; python setup.py install)
chkconfig gateone on
service gateone start
printf "Sleeping for 15s..."
sleep 15
echo "Done!"
pkill gateone
sed -i -e '/"https_redirect"/s/: .*$/: true,/' \
    -e '/"origins":/s/:.*$/: ["*"],/' \
    $(readlink -f /etc/gateone/conf.d/10server.conf)
sed -i '/"auth"/s/: .*$/: "pam",/' \
    $(readlink -f /etc/gateone/conf.d/20authentication.conf)
service gateone restart
When installed from its Git repository, the default configuration of GateOne only allows connections from an instance's local interfaces. The `sed` operation against the "/etc/gateone/conf.d/10server.conf" configuration-file opens up this restriction. Setting the "origins" definition to '"*"' will allow browser connections from anywhere (if this is unacceptable, you can use AWS security-groups to lock things back down a bit).

Because I like to think of myself as lazy, I also enable a port 80 → 443 automatic redirect. Saves me from having to type "https://" into my browser (every keystroke saved counts!). This redirect is created by setting the "/etc/gateone/conf.d/10server.conf" configuration-file's "https_redirect" configuration-option to "true".

Because I prefer to do key-based logins and I don't want to make my SSH keys easily snarfable, I disable anonymous logins. This also has the side-effect of making it so it's not as easy for randos to discover your SSH proxy and use it for their own purposes. Disabling anonymous logins is done by setting the "/etc/gateone/conf.d/20authentication.conf" file's "auth" parameter. By setting it to "pam", as above, you instruct GateOne to allow local user-credentials to be used for logins.

When GateOne's authentication is set to "pam", it will be necessary to set up a valid user/password at the OS layer to be able to login via GateOne. You can do this via userdata or by logging into the instance and manually creating the GateOne user-credentials. I prefer the userdata method, as it means I can fully-automate the whole GateOne deployment process (e.g., via a CLI-script, CloudFormation template, etc.).

Note on the sleep statement: GateOne's configuration files are not present until the first time the service starts. Without the sleep, the `sed` operations will fail with a "missing file" error.

Tuesday, April 19, 2016

But I Don't Like That Username

One of the clients I do work for is in the process of adopting commercial cloud solutions. Early in the process, they had a private network on which they were executing virtualization and private-cloud efforts. The maintainers of those two environments had created standard builds for use within those environments. For better or worse, my client has a number of developer teams they've contracted out to whose primary efforts are conducted either in-house (to the contractor) or within AWS, Azure or Google Compute Engine.

The group I work for has been tasked with standardizing and automating the deployment of enterprise components across all of the various environments and providing stewardship of the other contracted-out development efforts. When we were first given this task, the customer's build engineers would not provide any methods to replicate the production build - not even sufficient documentation that would allow us to accurately mimic it well enough to enable the other developers with a seamless dev -> test -> prod workflow.

Early in our involvement, I ended up creating my own build. Eventually, in the previously-described vacuum, others decided to adopt my build. Now, there's enough different groups using that build that it's pressuring the maintainers of the internal build to abandon theirs and adopt mine.

Fun part of DevOps is that it tends to be consensus-driven. If there's a slow or unresponsive link in the chain, the old "top down" approach frequently becomes a casualty when critical mass is achieved bottom-up.

At any rate, we were recently in discussions with the enterprise build maintainers to show them how to adopt the consensus build. One of their pushbacks was "but that build doesn't include the 'break-glass' account that the legacy builds do." They wanted to know if we could modify the build to be compliant with that user-account.

This struck me as odd since our group's (and other dev-groups') approach to such issues is "modify it in code". This isn't an approach familiar to the enterprise team. They're very "golden image" oriented. So, I provided them a quick set of instructions on how to take the incoming standard build and make it compatible with their tools' expectations:
#cloud-config
system_info:
  default_user:
    name: ent-adm
For enterprise components that will be migrated from private virtualization and cloud solutions to commercial cloud offerings, the above allows them to take not just my build, but any build that's enabled for automated provisioning and inject their account. Instead of launching a system with whatever the default-user is that's baked in, the above allows them to reset any system's initial username to be whatever they want (the above's 'ent-adm' is just an example - I dunno what their preferred account name is - I've simply used the above whenever I'm deploying instance-templates and don't want to remember "instance X uses userid A; instance Y uses userid B; and instance Z uses userid C").

Even more fun: if they don't like any of the attributes associated with the default user's account, they can override them with any user-parameter available in cloud-init.
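For example, a more fully-specified override might look something like the following; the attribute values are purely illustrative, and the cloud-init documentation lists the full set of supported user parameters:
#cloud-config
system_info:
  default_user:
    name: ent-adm
    gecos: Enterprise break-glass account
    groups: [wheel]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]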

Tuesday, February 16, 2016

Per-Sender SMTP SASL Authentication

One of the customers I do work for runs a multi-tenant environment. One of the issues this customer has been having is "how do we notify tenants that their systems want patching". For their Linux systems, the immediate answer was to use the `yum-cron` facility to handle it.

Unfortunately, the system that would receive these types of notification emails is different than the ones that handle generic SMTP relaying. Instead of being able to set up each Linux system's Postfix service to use a single Smart-relay, we needed to be able to have just the account that's used to send the ready-to-patch notifications relay through the notification gateway while all other emails get directed through a broader-scope relay.

It took several search iterations to finally uncover the "trick" for allowing this (credit to zmwangx for his GitHub Gist-posting). Overall, it's pretty straight-forward. Only thing that was not immediately obvious was which tokens mapped-through (what was the common key). In summary, to create the solution one needs to do three things:
  1. Modify the Postfix configuration:
    1. Do the standard tasks around defining a "smart" relay
    2. Do the standard tasks around enabling Postfix to act as a SASL-authenticated client
    3. Enable per-sender authentication
    4. Define a per-sender relay-map
  2. Create a file to map a local sender-address to a SASL-credential
  3. Create a file to map a local sender-address to a specific relay-host
Once that's done, it's simply a matter of verifying that things are working the way you think they should be.

Modify Postfix: Define Default Smart Relay
As with much of my experimentation the past year or two, I did my testing using Amazon Web Services resources. In this case, I used AWS's Simple Email Service (SES). Configuring Postfix to relay is trivial. Since my testing was designed to ensure that everything except my desired sender-address would be denied relaying, I configured my test system to point to its local SES relay without actually configuring an SES account to enable that relaying. For postfix, this was simply a matter of doing:
postconf -e "relayhost = [email-smtp.us-west-2.amazonaws.com]:587"
This appends a line to your /etc/postfix/main.cf that looks like:
relayhost = [email-smtp.us-west-2.amazonaws.com]:587
Note: The AWS SES relays are currently only available within a few regions (as of this writing, NoVA/us-east-1, Oregon/us-west-2 and Ireland/eu-west-1). Each relay requires a SASL credential be created to allow relaying. So, no big deal publicizing the relay's name if spammers don't have such credentials.

At this point, any mail sent via Postfix will attempt to relay through the listed relay-host. Further, because SASL client-credentials are not yet set up, those relay attempts will fail.

Modify Postfix: Define Default SASL-Client Credentials
Add a block similar to the following to begin to turn Postfix into an SMTP SASL-client:
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_mechanism_filter = plain
smtp_tls_CAfile = /etc/pki/tls/certs/ca-bundle.crt
smtp_use_tls = yes
smtp_tls_security_level = encrypt
Two critical items above are the "smtp_sasl_password_maps" and "smtp_tls_CAfile" parameters:

  1. The former instructs Postfix where to look for SASL credentials.
  2. The latter tells postfix where to look for root Certificate Authorities so it can verify that the relay host's SSL certificates are valid. The path listed in the above is the default for Red Hat-derived distributions: alter to fit your distribution's locations
The rest of the parameters instruct Postfix to use SASL-authentication over TLS-encrypted channels (required when connecting to an SMTP-relay via port 587) and to use the "plain" mechanism for sending credentials to the relay-host.

Modify Postfix: Enable Per-Sender Authentication
Use the command `postconf -e "smtp_sender_dependent_authentication = yes"` to enable Postfix's per-sender authentication modules. This will add a line to the /etc/postfix/main.cf that looks like:
smtp_sender_dependent_authentication = yes

Modify Postfix: Define SASL-Sender Map
Once per-sender authentication is enabled, Postfix needs to be instructed where to find mappings of senders to credentials. Use the command,  `postconf -e "sender_dependent_relayhost_maps = hash:/etc/postfix/sender_relay"` to enable Postfix's per-sender authentication modules. This will add a line to the /etc/postfix/main.cf that looks like:
sender_dependent_relayhost_maps = hash:/etc/postfix/sender_relay

Create a Sender-to-Credential Map
Edit/create the sender-to-credential mapping file referenced by the smtp_sasl_password_maps parameter above (/etc/postfix/sasl_passwd). Its contents should be similar to the following:
# Sender Address                                 <userid>:<password>
patch-alert@ses-test.cloudlab.xanthia.com        AKIAICAOGQX5UA0ACDSVJ:pDHM1n4uYLGN4BQOnzGcTSeQXSRDjcKCy6VkmQk+CoBV
Postfix will use the "patch-alert@ses-test.cloudlab.xanthia.com" sender-address as a common key with the value in the relay-map (following).


Create a Sender-to-Relay Map
Edit/create the sender-to-credential mapping file /etc/postfix/sender_relay. Its contents should be similar to the following:
# Sender Address                            [relay-host]:port
patch-alert@ses-test.cloudlab.xanthia.com        [email-smtp.us-west-2.amazonaws.com]:587
Postfix will use "patch-alert@ses-test.cloudlab.xanthia.com" sender-address as a common key with the value in the credential-map (preceding).
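
Compile the Lookup-Tables
Postfix reads "hash:"-type tables from compiled Berkeley-DB files rather than from the flat text files directly. After creating or editing either of the above maps, compile them and reload Postfix:
postmap /etc/postfix/sasl_passwd
postmap /etc/postfix/sender_relay
postfix reload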

Verification
A quick verification test is to send an email from a mapped user-address and a non-mapped user address:
# sendmail -f patch-alert@ses-test.cloudlab.xanthia.com -t <<EOF
To: fubar@cloudlab.xanthia.com
Subject: Per-User SASL Test
Content-type: text/html

If this arrived, things are probably set up correctly
EOF
# sendmail -f unmapped-user@ses-test.cloudlab.xanthia.com -t <<EOF
To: fubar@cloudlab.xanthia.com
Subject: Per-User SASL Test
Content-type: text/html

If this bounced, things are probably set up correctly
EOF
This should result in an SMTP log-snippet that resembles the following:
Feb 16 18:08:09 ses-test maintuser: MARK == MARK == MARK
Feb 16 18:08:22 ses-test postfix/pickup[5484]: 2B7D244AB: uid=0 from=<patch-alert@ses-test.cloudlab.xanthia.com>
Feb 16 18:08:22 ses-test postfix/cleanup[5583]: 2B7D244AB: message-id=<20160216180822.2B7D244AB@ses-test.cloudlab.xanthia.com>
Feb 16 18:08:22 ses-test postfix/qmgr[5485]: 2B7D244AB: from=<patch-alert@ses-test.cloudlab.xanthia.com>, size=403, nrcpt=1 (queue active)
Feb 16 18:08:22 ses-test postfix/smtp[5585]: 2B7D244AB: to=<thjones2@gmail.com>, relay=email-smtp.us-west-2.amazonaws.com[54.187.123.10]:587, delay=0.37, delays=0.02/0.03/0.19/0.13, dsn=2.0.0, status=sent (250 Ok 00000152eb44d396-408494a9-93f0-4f21-8985-460c057537bf-000000)
Feb 16 18:08:22 ses-test postfix/qmgr[5485]: 2B7D244AB: removed
Feb 16 18:08:32 ses-test postfix/pickup[5484]: A339E44AB: uid=0 from=<bad-sender@ses-test.cloudlab.xanthia.com>
Feb 16 18:08:32 ses-test postfix/cleanup[5583]: A339E44AB: message-id=<20160216180832.A339E44AB@ses-test.cloudlab.xanthia.com>
Feb 16 18:08:32 ses-test postfix/qmgr[5485]: A339E44AB: from=<bad-sender@ses-test.cloudlab.xanthia.com>, size=408, nrcpt=1 (queue active)
Feb 16 18:08:32 ses-test postfix/smtp[5585]: A339E44AB: to=<thjones2@gmail.com>, relay=email-smtp.us-west-2.amazonaws.com[54.69.81.169]:587, delay=0.09, delays=0.01/0/0.08/0, dsn=5.0.0, status=bounced (host email-smtp.us-west-2.amazonaws.com[54.69.81.169] said: 530 Authentication required (in reply to MAIL FROM command))
As can be seen in the snippet, the first message (from the mapped sender) was relayed while the second message (from the unmapped sender) was rejected.

Friday, January 8, 2016

Solving Root VG Collisions in LVM-Enabled Virtualization Templates

The nature of template-based Linux OS deployments means that, if a template uses LVM for its root filesystems, any system built from that template will have non-unique volume group (VG) names. In most situations, non-unique VG names are not a problem. However, if you encounter a situation where you need to repair a broken instance by correcting a problem within the instance's root filesystems, non-unique VG names can make that task more difficult.

To avoid this eventuality, the template user can easily modify each launched template by executing steps similar to the following:
#!/bin/sh

DEFIF=$(ip route show | awk '/^default/{print $5}')
BASEIP=$(printf '%02X' \
         $(ip addr show ${DEFIF} | \
           awk '/inet /{print $2}' | \
           sed -e 's#/.*$##' -e 's/\./ /g' \
          ))

vgrename -v VolGroup00 VolGroup00_${BASEIP}
sed -i 's/VolGroup00/&_'${BASEIP}'/' /etc/fstab
sed -i 's/VolGroup00/&_'${BASEIP}'/g' /boot/grub/grub.conf

for KRNL in $(awk '/initrd/{print $2}' /boot/grub/grub.conf | \
              sed -e 's/^.*initramfs-//' -e 's/\.img$//')
do
   mkinitrd -f -v /boot/initramfs-${KRNL}.img ${KRNL}
done

init 6
Note that the above script assumes that the current root VG name is "VolGroup00". If your current root VG name is different, change the value in the script above as appropriate.

This script may be executed either at instance launch-time or anywhere in the life-cycle of an instance. The above script takes the existing root VG name and tacks on a uniqueness component. In this case, the uniqueness is achieved by taking the IP address of the instance's primary interface and converting it to a hexadecimal string. So long as a group of systems does not contain any repeated primary IP addresses, this should provide a sufficient level of uniqueness for a group of deployed systems.
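After the reboot, the rename can be quickly verified (the hex suffix shown is illustrative and will reflect your instance's primary IP):
# vgs --noheadings -o vg_name
  VolGroup00_0A00021F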

Note: renaming the root VG will _not_ solve the problems caused by PV UUID non-uniqueness. Currently, there is no known-good solution to this issue. The general recommendation is to avoid that problem by using a different template to build your recovery-host than used to build your broken host.