
Thursday, September 24, 2015

Simple Guacamole

If the enterprise you work for is like mine, access through the corporate firewall is tightly controlled. You may find that only web-related protocols are let through (mostly) unfettered. When you're trying to work with a service like AWS, this can make management of Linux- and/or Windows-based resources problematic.

A decent solution to such a situation is the use of HTML-based remote connection gateway services. If all you're looking to do is SSH, the GateOne SSH-over-HTTP gateway is a quick and easy-to-set-up solution. If you need to manage instances via graphical desktops - most typically Windows, but some people like it for Linux as well - a better solution is Guacamole.

Guacamole is an extensible, HTTP-based solution. It runs as a Java servlet under a servlet container like Tomcat. If you're like me, you may also prefer to encapsulate/broker the Tomcat service through a generic HTTP service like Apache or Nginx. My preference has been Apache - but mostly because I've been using Apache since not long after it was formally forked off of the NCSA project. I also tend to favor Apache because it's historically been part of the core repositories of my Linux of choice, Red Hat/CentOS.

Guacamole gives you HTTP-tunneling options for SSH, Telnet, RDP and VNC. This walk-through is designed to get you quickly running Guacamole as a web-based SSH front end. Once you've got the SSH component running, adding other management protocols is easy. This procedure is also designed to be doable even if you don't yet have the ability to SSH to an AWS-hosted instance.
  1. Start the AWS web console's "launch instance" wizard.
  2. Select an appropriate EL6-based AMI.
  3. Select an appropriate instance type (the free tier instances are suitable for a basic SSH proxy)
  4. On the "Configure Instance" page, expand the "Advanced Details" section.
  5. In the now-available text box, paste in the contents of this script (a hedged sketch of what such a script might look like follows this list). Note that the script is flexible enough that, if the version of Guacamole hosted via the EPEL project is updated, it should continue to work. With a slight bit of massaging, the script could also be made to work with EL 7 and the associated Tomcat and EPEL-hosted RPMs.
  6. If the AMI you've picked does not provide the option of password-based logins for the default SSH user, add steps (in the "Advanced Details" text box) for creating an interactive SSH user with a password. Ensure that the user also has the ability to use `sudo` to get root privileges.
  7. Finish up the rest of the process for deploying an instance.
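For reference, below is a rough sketch of the sort of user-data script step 5 refers to. It is not the linked script: the package and service names (epel-release, guacd, libguac-client-ssh, guacamole, tomcat6) and the user-mapping location are assumptions based on what EPEL and EL6 carried at the time, and "PASSWORD" is a placeholder.

#!/bin/sh
# Hedged sketch only - verify the package/service names against your EPEL mirror.
yum -y install epel-release
yum -y install tomcat6 guacd libguac-client-ssh guacamole

# Minimal user-mapping for Guacamole's basic, file-based authentication
mkdir -p /etc/guacamole
{
   printf "<user-mapping>\n"
   printf "\t<authorize username=\"admin\" password=\"PASSWORD\">\n"
   printf "\t\t<protocol>ssh</protocol>\n"
   printf "\t\t<param name=\"hostname\">localhost</param>\n"
   printf "\t\t<param name=\"port\">22</param>\n"
   printf "\t</authorize>\n"
   printf "</user-mapping>\n"
} > /etc/guacamole/user-mapping.xml

chkconfig guacd on   && service guacd start
chkconfig tomcat6 on && service tomcat6 start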
Once the instance finishes deploying, you should be able to point your browser at the public hostname shown for the instance in the AWS console. Add "/guacamole/" after the hostname. Assuming all went well, you will be presented with a Guacamole login prompt. Enter the credentials defined in the script (the default username is "admin").
Note that these credentials can be changed by editing the:
printf "\t<authorize username=\"admin\" password=\"PASSWORD\">\n"
line of the pasted-in script. Once you've authenticated to Guacamole, you'll be able to log in to the hosting instance via SSH using the instance's non-privileged user's credentials. Once logged in, you can escalate privileges and then configure additional authentication mechanisms, connection destinations and protocols.

Note: Guacamole doesn't currently support key-based login mechanisms. If key-based logins are a must, make use of GateOne instead.

Wednesday, March 25, 2015

So You Don't Want to BYOL

The Amazon Web Services MarketPlace is pretty awesome. There are oodles of pre-made machine templates to choose from. Even in the face of all that choice, it's not unusual to find that none quite fits your needs. That's the scenario I found myself in.

Right now, I'm supporting a customer that's a heavy user of Linux for their business support systems. They're in the process of migrating from our legacy hosting environment to hosting things on AWS. During their development phase, use of CentOS was sufficient for their needs. As they move to production, however, they want "real" Red Hat Enterprise Linux.

Go up on the MarketPlace and there are plenty of options to choose from. However, my customer doesn't want to deal with buying a stand-alone entitlement to patch-support for their AWS-hosted systems. This requirement considerably cuts down on the useful choices in the MarketPlace, though there are still "license included" Red Hat options to choose from.

Unfortunately, my customer also has fairly specific partitioning requirements that are not met by the "license included" AMIs. When using CentOS, this wasn't a problem - CentOS's patch repos are open-access. Creating an AMI with suitable partitioning and access to those public repos is about a 20-minute process. While some of that process is leverageable for creating a Red Hat AMI, making the resultant AMI be "license included" is a bit more challenging.

When I tried to simply re-use my CentOS process, supplemented by the Amazon repo RPMs, I ended up with a system whose yum queries returned 401 errors. I was missing something.

Google searches weren't terribly helpful in solving my problem. I found a lot of "how do I do this" posts, but damned few that actually included the answer. Ultimately, what it turns out to be is that if you generate your AMI from an EBS snapshot, instances launched from that AMI don't have an entitlement key to access the Amazon yum repos. You can see this by looking at your launched instance's metadata:
# curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "accountId" : "717243568699",
  "architecture" : "x86_64",
  "availabilityZone" : "us-west-2b",
  "billingProducts" : null,
  "devpayProductCodes" : null,
  "imageId" : "ami-9df0ec7a",
  "instanceId" : "i-51825ba7",
  "instanceType" : "t1.micro",
  "kernelId" : "aki-fc8f11cc",
  "pendingTime" : "2015-03-25T19:04:51Z",
  "privateIp" : "172.31.19.148",
  "ramdiskId" : null,
  "region" : "us-east-1",
  "version" : "2010-08-31"
}

Specifically, what you want to look at is the value for "billingProducts". If it's "null", your yum isn't going to be able to access the Amazon RPM repositories. Where I came up close to empty on my Google searches was "how to make this attribute persist across images".

I found a small note in a community forum post indicating that AMIs generated from an EBS snapshot will always have "billingProducts" set to "null". This is due to a limitation in the tool used to register an image from a snapshot.

To get around this limitation, one has to create an AMI from an instance of an entitled AMI. Basically, after you've created the EBS volume you've readied for your custom AMI, you do a disk-swap with a properly-entitled instance. You then use the "create image" option from that instance. Once you launch the AMI you created via the EBS-swap, your instance's metadata will now look something like:
# curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "accountId" : "717243568699",
  "architecture" : "x86_64",
  "availabilityZone" : "us-west-2b",
  "billingProducts" : [ "bp-6fa54006" ],
  "devpayProductCodes" : null,
  "imageId" : "ami-9df0ec7a",
  "instanceId" : "i-51825ba7",
  "instanceType" : "t1.micro",
  "kernelId" : "aki-fc8f11cc",
  "pendingTime" : "2015-03-25T19:04:51Z",
  "privateIp" : "172.31.19.148",
  "ramdiskId" : null,
  "region" : "us-east-1",
  "version" : "2010-08-31"
}

Once that "billingProducts" is set, the cloud-init related first-boot scripts will take that "billingProducts" and use it to register the system with the Amazon yum repos. VoilĂ : you  now have a fully custom AMI that uses Amazon-billed access to Red Hat updates.

Note on Compatibility: the Red Hat provided PVM AMIs do not yield well to this method. The Red Hat provided PVM AMIs are all designed with their boot/root device set to /dev/sda1. To date, attempts to leverage the above techniques for PVM AMIs that require their boot/root device set to /dev/sda (used when using a single, partitioned EBS to host a bare /boot partition and LVM-managed root partitions) have not met with success.

Thursday, February 5, 2015

Attack of the Clones

One of the clients I do work for has a fairly significant Linux footprint. However, in these times of greater fiscal responsibility/austerity, my client is looking at even cheaper alternatives. This means that, for systems whose applications don't require Red Hat for warranty-support, CentOS is being subbed into their environment. This is particularly true for their testing environments. There've even been arguments for doing it in production, applications' vendor-support be damned, because "CentOS is the same as Red Hat."

I've previously argued, "they're very, very similar, but they're not truly identical". In particular, Red Hat handles CVEs and errata somewhat differently than CentOS does (Red Hat backports many fixes to prior EL releases, CentOS's stance is generally "upgrade it").

Today, I got bit by one place where CentOS hews far too closely to being "the same as Red Hat Enterprise Linux". Specifically, I was using the `oscap` security tool to do a security audit of a test system. Perhaps I should say, "I was struggling to use the `oscap` security tool...". With later versions of EL6, Red Hat - and, as a derivative, CentOS - implements the CPE system for Linux.

This is all fine and good, except where the tools you use rely on the correctness of CPE-related definitions. By the standard of CPE, Red Hat and CentOS are very much not "the same". Because the security-auditing tool I was using (`oscap`) leverages CPEs, and because the CentOS maintainers simply repackage the Red Hat-furnished security profiles without updating the CPE call-outs, the security tool fails horribly: every test comes back as "notapplicable".

To fix this situation, a bit of `sed`-fu is required:
mv /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-oval.xml \
   /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-oval.xml-DIST && \
cp /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-oval.xml-DIST \
   /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-oval.xml && \
sed -i '{
   s#Red Hat Enterprise Linux 6#CentOS 6#g
   s#cpe:/o:redhat:enterprise_linux:6#cpe:/o:centos:centos:6#g
}' /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-oval.xml


mv /usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml \
   /usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml-DIST && \
cp /usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml-DIST \
   /usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml && \
sed -i \
   's#cpe:/o:redhat:enterprise_linux#cpe:/o:centos:centos#g' \
/usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml

Once the above is done, running `oscap` actually produces useful results.

NOTE: Ironically, making the above edits will cause the various SCAP profiles to flag an error when running the tests that verify that RPM-delivered files are unaltered. I've submitted a bug to the CentOS group so these fixes can be included in future versions of the CentOS OpenSCAP RPMs but, until then, just be aware that the `oscap` tool will flag the above two files.
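If you want to see ahead of time exactly what will be flagged, you can ask RPM directly what it thinks of the edited files - a quick sanity-check, nothing more:

rpm -Vf /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-oval.xml
rpm -Vf /usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml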

...And if you found this page because you're trying to figure out how to run `oscap` to get results, here's a sample invocation that should act as a starting-point:

oscap xccdf eval --profile common --report \
   /var/tmp/oscap-report_`date "+%Y%m%d%H%M"`.html \
   --results /var/tmp/oscap-results_`date "+%Y%m%d%H%M"`.xml\
   --cpe /usr/share/xml/scap/ssg/content/ssg-rhel6-cpe-dictionary.xml \
   /usr/share/xml/scap/ssg/content/ssg-rhel6-xccdf.xml

Monday, December 29, 2014

Custom EL6 AMIs With a Root LVM

One of the customers I work for is a security-focused organization. As such, they try to follow the security guidelines laid out within the SCAP guidelines for the operating systems they deploy. This particular customer is also engaged in a couple of "cloud" initiatives - a couple privately-hosted and one publicly-hosted option. For the publicly-hosted cloud initiative, they make use of Amazon Web Services' EC2 services.

The current SCAP guidelines for Red Hat Enterprise Linux (RHEL) 6 draw the bulk of their content straight from the DISA STIGs for RHEL 6. There are a few differences, here and there, but the commonality between the SCAP and STIG guidance - at least as of the SCAP XCCDF 1.1.4 and STIG Version 1, Release 5, respectively - is probably just shy of 100% when measured on the recommended tests and fixes. In turn, automating the guidance in these specifications allows you to quickly crank out predictably-secure Red Hat, CentOS, Scientific Linux or Amazon Linux systems.

For the privately-hosted cloud initiatives, supporting this guidance was a straight-forward matter. The solutions my customer uses all support the capability to network-boot and provision a virtual machine (VM) from which to create a template. Amazon didn't provide similar functionality to my customer, somewhat limiting what could be done to create a fully-customized instance or resulting template (Amazon Machine Image - or "AMI" - in EC2 terminology).

For the most part, this wasn't a problem for my customer. Perhaps the biggest sticking-point was that it meant that, at least initially, the partitioning schemes used on the privately-hosted VMs couldn't be easily replicated on the EC2 instances.

Section 2.1.1 of the SCAP guidance calls for "/tmp", "/var", "/var/log", "/var/log/audit", and "/home" to each be on their own, dedicated partitions, separate from the "/" partition. On the privately-hosted cloud solutions, a common, network-based KickStart was used to carve the boot-disk into a /boot partition and an LVM volume-group (VG). That VG was then carved up to create the SCAP-mandated partitions.

With the lack of network-booting/provisioning support, it meant we couldn't extend our KickStart methodologies to the EC2 environment. Further, at least initially, Amazon didn't provide support for use of LVM on boot disks. The combination of the two limitations meant my customer couldn't easily meet the SCAP partitioning requirements. Lack of LVM meant that the boot disk had to be carved up using bare /dev/sdX devices. Lack of console access defeated the ability to repartition an already-built system to create the requisite partitions on the boot disk. Initially, this meant that the AMIs we could field were limited to "/boot" and "/" partitions. This meant config-drift between the hosting environments and meant we had to get security waivers for the Amazon-hosted environment.

Not being one who well-tolerates these kind of arbitrary-feeling deviances, I got to cracking with my Google searches. Most of what I found were older documents that focussed on how to create LVM-enabled, S3-backed AMIs. These weren't at all what I wanted - they were a pain in the ass to create, were stoopidly time-consuming to transfer into EC2 and the resultant AMIs hamstrung me on the instance-types I could spawn from them. So, I kept scouring around. In the comments section to one of the references for S3-backed AMIs, I saw a comment about doing a chroot() build. So, I used that as my next branch of Googling about.

Didn't find a lot for RHEL-based distros - mostly Ubuntu and some others. That said, it gave me the starting point that I needed to find my ultimate solution. Basically, that solution comes down to:

  1. Pick an EL-based AMI from the Amazon Marketplace (I chose a CentOS one - I figured that using an EL-based starting point would ease creating my EL-based AMI since I'd already have all the tools I needed and in package names/formats I was already familiar with)
  2. Launch the smallest instance-size possible from the Marketplace AMI (8GB when I was researching the problem)
  3. Attach an EBS volume to the running instance - I set mine to the minimum size possible (8GB) figuring I could either grow the resultant volumes or, once I got my methods down/automated, use a larger EBS for my custom AMI.
  4. Carve the attached EBS up into two (primary) partitions. I like using `parted` for this, since I can specify the desired, multi-partition layout (and all the offsets, partition types/labels, etc.) in one long command-string (see the disk-prep sketch after this list).
    • I kept "/boot" in the 200-400MB range. Could probably keep it smaller since the plans weren't so much to patch instantiations as much as periodically use automated build tools to launch instances from updated AMIs and re-deploy the applications onto the new/updated instances.
    • I gave the rest of the disk to the partition that would host my root VG.
  5. I `vgcreate`d my root volume group, then carved it up into the SCAP-mandated partitions (minus "/tmp" - we do that as a tmpfs filesystem since the A/V tools that SCAP wants you to have tend to kill system performance if "/tmp" is on disk - probably not relevant in EC2, but consistency across environments was a goal of the exercise)
  6. Create ext4 filesystems on each of my LVs and my "/boot" partition.
  7. Mount all of the filesystems under "/mnt" to support a chroot-able install (i.e., "/mnt/root", "/mnt/root/var", etc.)
  8. Create base device-nodes within my chroot-able install-tree (you'll want/need "/dev/console", "/dev/null", "/dev/zero", "/dev/random", "/dev/urandom", "/dev/tty" and "/dev/ptmx" - modes, ownerships and major/minor numbers should match what's in your live OS's)
  9. Setup loopback mounts for "/proc", "/sys", "/dev/pts" and "/dev/shm",
  10. Create "/etc/fstab" and "/etc/mtab" files within my chroot-able install-tree (should resemble the mount-scheme you want in your final AMI - dropping the "/mnt/root" from the paths)
  11. Use `yum` to install the same package-sets to the chroot that our normal KickStart processes would install.
  12. The `yum` install should have created all of your "/boot" files with the exception of your "grub.conf" type files. 
    • Create a "/mnt/boot/grub.conf" file with vmlinuz/initramfs references matching the ones installed by `yum`.
    • Create links to your "grub.conf" file:
      • You should have an "/mnt/root/etc/grub.conf" file that's a sym-link to your "/mnt/root/boot/grub.conf" file (be careful how you create this sym-link so you don't create an invalid link)
      • Similarly, you'll want a "/mnt/root/boot/grub/grub.conf" linked up to "/mnt/root/boot/grub.conf" (not always necessary, but it's a belt-and-suspenders solution to some issues related to creating PVM AMIs)
  13. Create a basic eth0 config file at "/mnt/root/etc/sysconfig/network-scripts/ifcfg-eth0". EC2 instances require the use of DHCP for networking to work properly. A minimal network config file should look something like:
    DEVICE=eth0
    BOOTPROTO=dhcp
    ONBOOT=yes
    IPV6INIT=no
    
  14. Create a basic network-config file at "/mnt/root/etc/sysconfig/network". A minimal network config file should look something like:
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=localhost.localdomain
    
  15. Append "UseDNS no" and "PermitRootLogin without-password" to the end of your "/mnt/root/etc/ssh/sshd_config" file. The former fixes connect-speed problems related to EC2's use of private IPs on their hosted instances. The latter allows you to SSH in as root for the initial login - but only with a valid SSH key (don't want to make newly-launched instances instantly ownable!)
  16. Assuming you want instances started from your AMI to use SELinux:
    • Do a `touch /mnt/root/.autorelabel`
    • Make sure that the "SELINUX" value in "/mnt/root/etc/selinux/config" is set to either "permissive" or "enforcing"
  17. Create an unprivileged login user within the chroot-able install-tree. Make sure a password is set and that the user is able to use `sudo` to access root (since I recommend setting root's password to a random value).
  18. Create boot init script that will download your AWS public key into the root and/or maintenance user's ${HOME}/.ssh/authorized_keys file. At its most basic, this should be a run-once script that looks like:
    curl -f http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key > /tmp/pubkey
    install --mode 0700 -d ${KEYDESTDIR}
    install --mode 0600 /tmp/pubkey ${KEYDESTDIR}/authorized_keys
    Because "/tmp" is an ephemeral filesystem, the next time the instance is booted, the "/tmp/pubkey" will self-clean. Note that an appropriate destination-directory will need to exist
  19. Clean up the chroot-able install-tree:
    yum --installroot=/mnt/root/ -y clean packages
    rm -rf /mnt/root/var/cache/yum
    rm -rf /mnt/root/var/lib/yum
    cat /dev/null > /mnt/root/root/.bash_history
    
  20. Unmount all of the chroot-able install-tree's filesystems.
  21. Use `vgchange` to deactivate the root VG
  22. Using the AWS console, create a snapshot of the attached EBS.
  23. Once the snapshot completes, you can then use the AWS console to create an AMI from the EBS-snapshot using the "Create Image" option. It is key that you set the "Root Device Name", "Virtualization Type" and "Kernel ID" parameters to appropriate values.
    • The "Root Device Name" value will auto-populate as "/dev/sda1" - change this to "/dev/sda"
    • The "Virtualization Type" should be set as "Paravirtual".
    • The appropriate value for the "Kernel ID" parameter will vary from AWS availability-region to AWS availability-region (for example, the value for "US-East (N. Virginia)" will be different from the value for "EU (Ireland)"). In the drop-down, look for a description field that contains "pv-grub-hd00". There will be several. Look for the highest-numbered option that matches your AMIs architecture (for example, I would select the kernel with the description "pv-grub-hd00_1.04-x86_64.gz" for my x86_64-based EL 6.x custom AMI).
    The other Parameters can be tweaked, but I usually leave them as is.
  24. Click the "Create" button, then wait for the AMI-creation to finish.
  25. Once the "Create" finishes, the AMI should be listed in your "AMIs" section of the AWS console.
  26. Test the new AMI by launching an instance. If the instance successfully completes its launch checks and you are able to SSH into it, you've successfully created a custom, PVM AMI (HVM AMIs are fairly easily created, as well, but require some slight deviations that I'll cover in another document).
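To make steps 4 through 11 a bit more concrete, here is a compressed, hedged sketch of the disk-prep portion. Device names, sizes and the LV layout are illustrative, it assumes the build host's yum repos are appropriate for the target, and it omits the device nodes, loopback mounts and config files described in the later steps.

DISK=/dev/xvdf            # the attached EBS volume; your device name may differ
CHROOT=/mnt/root

# Steps 4-5: a small /boot partition, with the remainder becoming the root VG's PV
parted -s ${DISK} -- mklabel msdos \
   mkpart primary ext4 2048s 400MiB \
   mkpart primary 400MiB 100% set 2 lvm on
vgcreate RootVG ${DISK}2
lvcreate -n rootVol  -L 2G RootVG
lvcreate -n varVol   -L 2G RootVG
lvcreate -n logVol   -L 1G RootVG
lvcreate -n auditVol -L 1G RootVG
lvcreate -n homeVol  -L 1G RootVG

# Step 6: filesystems
mkfs.ext4 -q ${DISK}1
for LV in rootVol varVol logVol auditVol homeVol; do mkfs.ext4 -q /dev/RootVG/${LV}; done

# Step 7: assemble the chroot tree (parents before children)
mkdir -p ${CHROOT}
mount /dev/RootVG/rootVol ${CHROOT}
mkdir -p ${CHROOT}/boot ${CHROOT}/var ${CHROOT}/home
mount ${DISK}1 ${CHROOT}/boot
mount /dev/RootVG/varVol  ${CHROOT}/var
mount /dev/RootVG/homeVol ${CHROOT}/home
mkdir -p ${CHROOT}/var/log
mount /dev/RootVG/logVol ${CHROOT}/var/log
mkdir -p ${CHROOT}/var/log/audit
mount /dev/RootVG/auditVol ${CHROOT}/var/log/audit

# Step 11: install the same package-set the KickStart process would have
yum --installroot=${CHROOT} -y groupinstall core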
I've automated many of the above tasks using some simple shell scripts and the Amazon EC2 tools. Use of the EC2 tools is well documented by Amazon. Their use allows me to automate everything within the instance launched from the Marketplace AMI (I keep all my scripts in Git, so prepping a Marketplace AMI for building custom AMIs takes maybe two minutes on top of launching the generic Marketplace AMI). When automated as I have, you can go from launching your Marketplace AMI to having a launchable custom AMI in as little as twenty minutes.

Properly automated, generating updated AMIs as security fixes or other patch bundles come out is as simple as kicking off a script, hitting the vending machine for a fresh Mountain Dew, then coming back to launch new, custom AMIs.

Wednesday, November 26, 2014

Converting to EL7: Solving The "Your Favorite Service Isn't Systemd-Enabled" Problem

Having finally gotten off my butt to knock out getting my RHCE (for EL6) before the 2014-12-19 drop-dead date, I'm finally ready to start focusing on migrating my personal systems to EL7-based distros.

My personal VPS is currently running CentOS 6.6. I use my VPS to host a couple of personal websites and email for family and a few friends. Yes, I realize that it would probably be easier to offload all of this to providers like Google. However, while Google is very good at SPAM-stomping, and provides me a very generous amount of space for archiving emails, one area where they do fall short is email aliases: whenever I have to register with a new web-site, I use a custom email address to do so. At my last pruning, I still had 300+ per-site aliases. So, for me, the number of available aliases ("unlimited" is best) and the ease of creating them trumps all other considerations.

Since I don't have Google handling my mail for me, I have to run my own A/V and anti-spam engines. Being a good Internet Citizen, I also like to make use of Sender Policy Framework (via OpenSPF) and DomainKeys (currently via DKIMproxy).

I'm only just into the process of sorting out what I need to do to make the transition as quick and as painless (more for my family and friends than for me) a process as possible. I hate outages. And, with a week off for the Thanksgiving holidays, I've got time to do things in a fairly orderly fashion.

At any rate, one of the things I discovered is that my current DomainKeys solution hasn't been updated to "just work" within the systemd framework used by EL7. This isn't terribly surprising, as it appears that the DKIMproxy SourceForge project may have gone dormant in 2013 (so I'll have to see if there are alternatives that appear to still be a going concern - in the meantime...). Fortunately, the DKIMproxy source code does come with a `chkconfig`-compatible SysV-init script. Even more fortunately, converting from SysV-init to systemd-compatible service control is a bit more straightforward than when I was dealing with moving from Solaris 9's legacy init to Solaris 10's SMF.

If you've already got a `chkconfig`-style init script, moving to systemd-managed services is fairly trivial. Your `chkconfig` script can be copied, pretty much "as is", into "/usr/lib/systemd". My (current) preference is to create a "scripts" subdirectory and put it in there. I haven't read deeply enough into systemd to know whether this is the "Best Practices" method, however. Also, where I work has no established conventions ...because they only started migrating to EL6 in the fall of 2013 - so I can't exactly crib anything EL7-related from how we do it at work.

Once you have your SysV-init style script placed where it's going to live (e.g., "/usr/lib/systemd/scripts"), you need to create the associated service definition files. In my particular case, I had to create two, as the DKIMproxy software actually has an inbound and an outbound function. Launched from normal SysV-init, it all gets handled as one piece. However, one of the nice things about systemd is that it's not only a launcher framework, it's a service-monitoring framework as well. To take full advantage, I wanted one monitor for the inbound service and one for the outbound service. The legacy init script that DKIMproxy ships with makes this easy enough as, in addition to the normal "[start|stop|restart|status]" arguments, it has per-direction subcommands (e.g., "start-in" and "stop-out"). The service-definition for my "dkim-in.service" looks like:
[Unit]
Description=Manage the inbound DKIM service
After=postfix.service

[Service]
Type=forking
PIDFile=/usr/local/dkimproxy/var/run/dkimproxy_in.pid
ExecStart=/usr/lib/systemd/scripts/dkim start-in
ExecStop=/usr/lib/systemd/scripts/dkim stop-in

[Install]
WantedBy=multi-user.target

To break down the above:

  • The "Unit" stanza tells systemd a bit about your new service:
    • The "Description" line is just ASCII text that allows you to provide a short, meaningful of what the service does. You can see your service's description field by typing `systemctl -p Description show <SERVICENAME>`
    • The "After" parameter is a space-separated list of other services that you want to have successfully started before systemd attempts to start your new service. In my case, since DKIMproxy is an extension to Postfix, it doesn't make sense to try to have DKIMproxy running until/unless Postfix is running.
  • The "Service" stanza is where you really define how your service should be managed. This is where you tell systemd how to start, stop, or reload your service and what PID it should look for so it knows that the service is still notionally running. The following parameters are the minimum ones you'd need to get your service working. Other parameters are available to provide additional functionality:
    • The "Type" parameter tells systemd what type of service it's managing. Valid types are: simpleforking, oneshot, dbus, notify or idle. The systemd.service man page more-fully defines what each option is best used for. However, for a traditional daemonized service, you're most likely to want "forking".
    • The "PIDFile" parameter tells systemd where to find a file containing the parent PID for your service. It will then use this to do a basic check to monitor whether your service is still running (note that this only checks for presence, not actual functionality).
    • The "ExecStart" parameter tells systemd how to start your service. In the case of a SysV-init script, you point it to the fully-qualified path you installed your script to and then any arguments necessary to make that script act as a service-starter. If you don't have a single, chkconfig-style script that handles both stop and start functions, you'd simply give the path to whatever starts your service. Notice that there are no quotations surrounding the parameter's value-section. If you put quotes - in the mistaken belief that the starter-command and it's argument need to be grouped, you'll get a path error when you go to start your service the first time.
    • The "ExecStop" parameter tells systemd how to stop your service. As with the "ExecStart" parameter, if you're leveraging a fully-featured SysV-init script, you point it to the fully-qualified path you installed your script to and then any arguments necessary to make that script act as a service-stopper. Also, the same rules about white-space and quotation-marks apply to the "ExecStop" parameter as do the "ExecStart" parameter.
  • The "Install" stanza is where you tell systemd the main part of the service dependency-tree to put your service. You have two main dependency-specifiers to choose: "WantedBy" and "RequiredBy". The former is a soft-dependency while the latter is a hard-dependency. If you use the "RequiredBy" parameter, then the service unit-group (e.g., "mult-user.target") enumerated with the "RequiredBy" parameter will only be considered to have successfully onlined if the defined service has successfully launched and stayed running.  If you use the "WantedBy" parameter, then the service unit-group (e.g., "mult-user.target") enumerated with the "WantedBy" parameter will still be considered to have successfully onlined whether the defined service has successfully launched or stayed running. It's most likely you'll want to use "WantedBy" rather than "RequiredBy" as you typically won't want systemd to back off the entire unit-group just because your service failed to start or stay running (e.g., you don't want to stop all of the multi-user mode related processes just because one network service has failed.)

Tuesday, June 17, 2014

UDEV Friendly-Names to Support ASM Under VMware-hosted Linux Guest

This past month or so, we've been setting up a new vSphere hosting environment for a new customer. Our first guinea-pig tenant is being brought into the virtualized hosting-environment. This first tenant has a mix of Windows and Linux systems running a multi-layer data-processing system based on a back-end Oracle database.

As part of our tenancy process, we'd come up with a standard build-request form. In general, we prefer a model that separates application data from OS data. In addition to the usual "how much RAM and CPU do you need" information, the form includes configuration-capture items for storage for applications hosted on the VMs. The table has inputs for both the requested supplemental storage sizes and where/how to mount those chunks.

This first tenant simply filled in a sum of their total additional storage request with no indication as to how they expected to use it. After several iterations of "if you have specific requirements, we need you to detail them" emails, I sent a final "absent the requested configuration specifications, the storage will be added but left unconfigured". It was finally at this point that the tenant responded back saying "there's a setup guide at this URL - please read that and configure accordingly".

Normally, this is not how we do things. The solution we offer tends to be more of an extended IaaS model: in addition to providing a VM container, we provide a basic, hardened OS configuration (installing an OS and patching it to a target-state), configure basic networking and name-resolution, and perform basic storage-configuration tasks.

This first tenant was coming from a physical Windows/Red Hat environment and were testing the waters of virtualization. As a result, most of their configuration expectations were based on physical servers (SAN based storage with native multipathing-support). The reference documents they pointed us to were designed for implementing Oracle on a physical system using ASM on top of Linux dm-multipath storage objects ...not something normally done within an ESX-hosted Red Hat Linux configuration.

We weren't going to layer-on dm-multipath support, but the tenant still had the expectation of using "friendly" storage object names for ASM. The easy "friendly" storage object name path is to use LVM. However, Oracle generally recommends against using ASM in conjunction with third-party logical volume management systems. So, LVM was off the table. How best to give the desired storage configs?

I opted to let udev do the work for me. Unfortunately, because we weren't anticipating this particular requirements-set, the VM templates we'd created didn't have some of the hooks available that would allow udev to do its thing. Specifically, no UUIDs were being presented into the Linux guests. Further complicating things is the fact that, with the hardened Linux build we furnish, most of the udev tools and the various hardware-information tools are not present. The down side is that it made things more difficult than they probably absolutely needed to be. The up side is that the following procedures should be portable across a fairly wide variety of Linux implementations:
  1. To have VMware provide serial number information - from which UUIDs can be generated by the guest operating system - it's necessary to make a modification to the VM's advanced configuration options. Ensure that the “disk.EnabledUUID” parameter has been created for the VM and its value set to “TRUE”. The specific method for doing so varies depending on whether you use the vSphere web UI or the VPX client (or even the vmcli or direct editing of config files) to do your configuration tasks. Google for the specifics of your preferred management method.
  2. If you had to create/change the value in the prior step, reboot the VM so that the config changes take effect
  3. Present the disks to be used by ASM to the Linux guest – if adding SCSI controllers, this step will need to be done while guest is powered off.
  4. Verify that the VM is able to see the new VMDKs. If supplemental disk presentation was done while the VM was running, initiate a SCSI-bus rescan (e.g., `echo "- - -" > /sys/class/scsi_host/host1/rescan`)
  5. Lay down an aligned, full-disk partition with the `parted` utility for each presented VMDK/disk. For example, if one of the newly-presented VMDKs was seen by the Linux OS as /dev/sdb:

    # parted -s /dev/sdb -- mklabel msdos mkpart primary ext3 1024s 100%

    Specifying an explicit starting-block (at 1024 or 2048 blocks) and using the relative ending-location, as above, will help ensure that your partition is created on an even storage-boundary. Google around for discussions on storage alignment and positive impact on virtualization environments for details on why the immediately-prior is usually a Good Thing™.
  6. Ensure that the “options=” line in the “/etc/scsi_id.config” file contains the “-g” option
  7. For each newly-presented disk, execute the command `/sbin/scsi_id -g -s /block/{sd_device}` and capture the output.
  8. Copy each disk’s serial number (obtained in the prior step) into the “/etc/udev/rules.d/99-oracle-udev.rules” file (a scripted version of steps 7 through 9 appears after this list)
  9. Edit the “/etc/udev/rules.d/99-oracle-udev.rules” file, ensuring that each serial number has an entry similar to:

    KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="{scsi_id}", NAME="ASM/disk1", OWNER="oracle", GROUP="oinstall", MODE="660"

    The "{scsi_id}" shown above is a variable name: substitute with the values previously captured via the `/sbin/scsi_id` command. The "NAME=" field should be similarly edited to suite and should be unique for each SCSI serial number.

    Note: If attempting to make per disk friendly-names (e.g., “/dev/db1p1”, “/dev/db2p1”, “/dev/frap1”, etc.) it will be necessary to match LUNs by size to appropriate ‘NAME=’ entries

  10. Reboot the system so that the udev service can process the new rule entries
  11. Verify that the desired “/dev/ASM/<NAME>” entries exist
  12. Configure storage-consumers (e.g., “ASM”) to reference the aligned udev-defined device-nodes.
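For steps 7 through 9, the serial-number capture and rule generation lends itself to a small loop. A hedged sketch - the device list and the disk-naming are illustrative, and the `scsi_id` invocation is the same EL5-style one used above:

RULES=/etc/udev/rules.d/99-oracle-udev.rules
DISKNUM=1

for DEV in sdb sdc sdd; do     # the newly-presented ASM disks; adjust to suit
   SERIAL=$(/sbin/scsi_id -g -s /block/${DEV})
   printf 'KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="%s", NAME="ASM/disk%s", OWNER="oracle", GROUP="oinstall", MODE="660"\n' \
      "${SERIAL}" "${DISKNUM}" >> ${RULES}
   DISKNUM=$((DISKNUM + 1))
done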
If your Linux system has various hardware information tools, udev management interfaces and the sg3tools installed, some tasks for finding information are made much easier and some of the reboot-steps specified in this document become unnecessary.

Thursday, June 12, 2014

Template-Deployed VMs and the "When Was I Built" Problem

For the past number of years, I have been supporting Linux systems hosted within various virtualization environments. Most of these environments have made use of template-based VM deployment.

In a large, dynamic, enterprise-scale environment, the question often comes up, "when was this host built?" In such environments, there may be a number of methods to derive such information - hypervisor management-server logs, service-automation engine logs, etc. However, such data can also be somewhat ephemeral, due to things as small as log-truncation up through replacement of service-automation and configuration-management tools/frameworks.

Fortunately, the Enterprise Linux family of Linux distributions (Red Hat, CentOS, Scientific Linux, etc.) offers a fairly stable method for determining when a system was first provisioned. Whenever you first build an ELx-based system, one of the files that gets installed - and then never gets updated - is the "basesystem" RPM. So, if you look at the install date for this RPM (and the system time was correctly set at its installation time), you will have an accurate representation of when the system was built.

That said, it had previously occurred to me (a while ago, actually) that the “deploy from template” method of building Linux VMs precludes using the RPM database to determine system build-time. Unlike with a KickStarted system - where you can always run `rpm -q --qf '%{installtime:date}\n' basesystem` and it will give you the install-date for the system - doing so on a template-built system will mislead you. When deployed from a template, that method returns when the template VM was built, not when the running VM was deployed from that template.

This had been bugging me for several years now. I'd even posed the question of how to solve it on a few forums, to no avail (a number of respondents hadn't been aware of the "use basesystem to show my system install-date" trick, so hadn't investigated how to solve a problem they didn't know existed). One day, while I was at our engineering lab waiting for some other automated tasks to run, I had one of those "I wonder if this will work" moments that allowed me to finally figure out how to “massage” the RPM database so that the basesystem RPM can reflect a newer install date:

# rpm -q --qf '%{installtime:date}\n' basesystem
Tue 12 Jul 2011 11:24:06 AM EDT
# rpm -i --force --justdb basesystem-10.0-4.el6.noarch.rpm
# rpm -q --qf '%{installtime:date}\n' basesystem
Wed 11 Jun 2014 09:21:13 PM EDT

Thus, if you drop something similar to the above into your VM's system prep/cloudinit/etc. scripts, your resultant VM will have its instantiation-date captured and not just its template-build date.
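A hedged sketch of what that might look like as a run-once step in a first-boot script - it assumes the `yumdownloader` utility (from yum-utils) is installed and that the instance can already reach its yum repos:

# Re-record basesystem's install-date so it reflects this VM's deployment time
cd /tmp
yumdownloader basesystem && \
   rpm -i --force --justdb basesystem-*.rpm && \
   rm -f basesystem-*.rpm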

Tuesday, September 3, 2013

Password Encryption Methods

As a systems administrator, there are times where you have to find programmatic ways to update passwords for accounts. On some operating systems, your user-account modification tools don't allow you to easily set passwords in a programmatic fashion. Solaris used to be kind of a pain, in this regard, when it came time to do an operations-wide password reset. Fortunately, Linux is a bit nicer about this.

The Linux `usermod` utility allows you to (relatively easily) specify passwords in a programmatic fashion. The one "gotcha" of the utility is the requirement to use hashed password-strings rather than cleartext. The question becomes, "how best to generate those hashes?"

The answer will likely depend on your security requirements. If MD5 hashes are acceptable, then you can use OpenSSL or the `grub-md5-crypt` utilities to generate them. If, however, your security requirements require SHA256- or even SHA512-based hashes neither of those utilities will work for you.

Newer Linux distributions (e.g. RHEL 6+) essentially replace the `grub-md5-crypt` utility with the `grub-crypt` utility. This utility supports not only the older MD5 that its predecessor supported, but also SHA256 and SHA512.

However, what do you do when `grub-crypt` is missing (e.g., you're running RedHat 5.x) or you just want one method that will work across different Linux versions (e.g., your operations environment consists of a mix of RHEL 5 and RHEL 6 systems)? While you can use a tool like `openssl` to do the dirty work, if your security requirements dictate an SHA-based hashing algorithm, it's not currently up to the task. If you want the SHAs in a cross-distribution manner, you have to leverage more generic tools like Perl or Python.

The following examples will show you how to create a hashed password-string from the cleartext password "Sm4<kT*TheFace". Some tools (e.g., OpenSSL's "passwd" command) allow you to choose between a fixed salt and a random salt. From the standpoint of being able to tell "did I generate this hash", using a fixed salt can be useful; however, using a random salt may be marginally more secure. The Perl and Python methods pretty much demand the specification of a salt. In the examples below, the salt I'm using is "Ay4p":
  • Perl (SHA512) Method: `perl -e 'print crypt("Sm4<kT*TheFace", "\$6\$Ay4p\$");'`
  • Python (SHA512) Method: `python -c 'import crypt; print crypt.crypt("Sm4<kT*TheFace", "$6$Ay4p$")'`
Note that you select the encryption-type by embedding a numerical identifier in the salt-string (the "6" in the "$6$Ay4p$" examples above). The standard encryption types for Linux operating systems (from the crypt() manpage) are:
  • 1  = MD5
  • 2a = BlowFish (not present in all Linux distributions)
  • 5  = SHA256 (Linux with GlibC 2.7+)
  • 6  = SHA512 (Linux with GlibC 2.7+) 
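Tying that back to `usermod`, a quick sketch of a programmatic password reset (the account name is a placeholder, and you'd obviously want to source the cleartext and salt from somewhere other than a hard-coded script):

NEWHASH=$(python -c 'import crypt; print crypt.crypt("Sm4<kT*TheFace", "$6$Ay4p$")')
usermod -p "${NEWHASH}" someuser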

    Thursday, August 9, 2012

    Why So Big

     Recently, while working on getting a software suite ready for deployment, I had to find space in our certification testing environment (where our security guys scan hosts/apps and decide what needs to be fixed for them to be safe to deploy). Our CTA environment is unfortunately tight on resources. The particular app I have to get certified wants 16GB of RAM to run in but will accept as little as 12GB (less than that and the installer utility aborts).

    When I went to submit my server (VM, actually) requirements to our CTA team so they could prep me an appropriate install host, they freaked. "Why does it take so much memory" was the cry. So, I dug through the application stack.

    The application includes an embedded Oracle instance that wants to reserve about 9GB for its SGA and other set-asides. It's going on a 64bit RedHat server and RedHat typically wants 1GB of memory to function acceptably (can go down to half that, but you won't normally be terribly happy). That accounted for 10GB of the 12GB minimum the vendor was recommending.

    Unfortunately, the non-Oracle components of the application stack didn't seem to have a single file that described memory set asides. It looked like it was spinning up two Java processes with an aggregate heap size of about 1GB.

    Added to the prior totals, the aggregated heap sizes put me at about 11GB of the vendor-specified 12GB. That still left an unaccounted for 1GB. Now, it could have been the vendor was requesting 12GB because it was a "nice round number" or they could have been adding some slop to their equations to give the app a bit more wiggle-room. 

    I could have left it there, but decided, "well, the stack is running, lets see how much it really uses". So, I fired up top. Noticed that the Oracle DB ran under one userid and that the rest of the app-stack ran under a different one. I set top to look only at the userid used by the rest of the app-stack. The output was too long to fit on one screen and I was too lazy to want to add up the RSS numbers, myself. Figured since top wasn't a good avenue, I might be able to use ps (since the command supports the Berkeley-style output options).

    Time to hit the man pages...

    After digging through the man pages and a bit of cheating (Google is your friend) I found the invocation of ps that I wanted:

    `ps -u <appuser> -U <appuser> -orss=`.

     Horse that to a nice `awk '{ sum += $1 } END { print sum }'` and I had a quick method of divining how much resident memory the application was actually eating up. What I found was that the app-stack had 52 processes (!) that had about 1.7GB of resident memory tied up. Mystery solved.
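     Putting the two pieces together (with a placeholder userid, and a conversion from ps's kilobyte RSS figures into something more readable):

     ps -u appuser -U appuser -orss= | awk '{ sum += $1 } END { printf "%.1f GB\n", sum / 1024 / 1024 }'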

    Tuesday, July 31, 2012

    Finding Patterns

     I know that most of my posts are of the nature "if you're trying to accomplish 'X', here's a way that you can do it". Unfortunately, this time, my post is more of a "I've got yet-to-be-solved problems going on". So, there's no magic-bullet fix currently available for those with similar problems who found this article. That said, if you are suffering similar problems, know that you're not alone. Hopefully that's some small consolation, and what follows may even help you in investigating your own problem.

    Oh: if you have suggestions, I'm all ears. Given the nature of my configuration, there's not much in the way of useful information I've yet found via Google. Any help would be much appreciated...



    The shop I work for uses Symantec's Veritas NetBackup product to perform backups of physical servers. As part of our effort to make more of the infrastructure tools we use more enterprise-friendly, I opted to leverage NetBackup 7.1's NetBackup Access Control (NBAC) subsystem. On its own, it provides fine-grained rights-delegation and role-based access control. Horse it to Active Directory and you're able to roll out a global backup system with centralized authentication and rights-management. That is, you have all that when things work.

     For the past couple months, we've been having issues with one of the dozen NetBackup domains we've deployed into our global enterprise. When I first began troubleshooting the NBAC issues, the authentication and/or authorization failures had always been associated with a corruption of LikeWise's sqlite cachedb files. At the time the issues first cropped up, these corruptions always seemed to coincide with DRS moving the NBU master server from one ESX host to another. It seemed like, when under sufficiently heavy load - the kind of load that would trigger a DRS event - LikeWise didn't respond well to having the system paused and moved. Probably something to do with the sudden apparent time-jump that happens when a VM is paused for the last parts of the DRS action. My "solution" to the problem was to disable automated relocation for my VM.

    This seemed to stabilize things. LikeWise was no longer getting corrupted and seems like I'd been able to stabilize NBAC's authentication and authorization issues. Well, they stabilized for a few weeks.

     Unfortunately, the issues have begun to manifest themselves, again, in recent weeks. We've now had enough errors that some patterns are starting to emerge. Basically, it looks like something is horribly bogging the system down around the time that the nbazd crashes are happening. I'd located all the instances of nbazd crashing from its log files ("ACI" events are logged to the /usr/openv/netbackup/logs/nbazd logfiles), and then began to try correlating them with the system load shown by the host's sadc collections. I found two things: 1) I probably need to increase my sample frequency - it's currently at the default 10-minute interval - if I want to more-thoroughly pin down and/or profile the events; 2) when the crashes have happened within a minute or two of an sadc poll, I've found that the corresponding poll was either delayed by a few seconds to a couple minutes or was completely missing. So, something is causing the server to grind to a standstill, and nbazd is a casualty of it.

    For the sake of thoroughness (and what's likely to have matched on a Google-search and brought you here), what I've found in our logs are messages similar to the following:
    /usr/openv/netbackup/logs/nbazd/vxazd.log
    07/28/2012 05:11:48 AM VxSS-vxazd ERROR V-18-3004 Error encountered during ACI repository operation.
    07/28/2012 05:11:48 AM VxSS-vxazd ERROR V-18-3078 Fatal error encountered. (txn.c:964)
    07/28/2012 05:11:48 AM VxSS-vxazd LOG V-18-4204 Server is stopped.
    07/30/2012 01:13:31 PM VxSS-vxazd LOG V-18-4201 Server is starting.
    /usr/openv/netbackup/logs/nbazd/debug/vxazd_debug.log
    07/28/2012 05:11:48 AM Unable to set transaction mode. error = (-1)
    07/28/2012 05:11:48 AM SQL error S1000 -- [Sybase][ODBC Driver][SQL Anywhere] Connection was terminated
    07/28/2012 05:11:48 AM Database fatal error in transaction, error (-1)
    07/30/2012 01:13:31 PM _authd_config.c(205) Conf file path: /usr/openv/netbackup/sec/az/bin/VRTSaz.conf
    07/30/2012 01:22:40 PM _authd_config.c(205) Conf file path: /usr/openv/netbackup/sec/az/bin/VRTSaz.conf
    Our NBU master servers are hosted on virtual machines. It's a supported configuration and adds a lot of flexibility and resiliency to the overall enterprise-design. It also means that I have some additional metrics available to me to check. Unfortunately, when I checked those metrics, while I saw utilization spikes on the VM, those spikes corresponded to healthy operations of the VM. There weren't any major spikes (or troughs) during the grind-downs. So, to ESX, the VM appeared to be healthy.

    At any rate, I've requested our ESX folks see if there might be anything going on on the physical systems hosting my VM that aren't showing up in my VM's individual statistics. I'd previously had to disable automated DRS actions to keep LikeWise from eating itself - those DRS actions wouldn't have been happening had the hosting ESX system not been experiencing loading issues - perhaps whatever was causing those DRS actions is still afflicting this VM.

    I've also tagged one of our senior NBU operators to start picking through NBU's logs. I've asked him to look to see if there are any jobs (or combinations of jobs) that are always running during the bog-downs. If it's a scheduling issue (i.e., we're to blame for our problems), we can always reschedule jobs to exert less loading or we can scale up the VM's memory and/or CPU reservations to accommodate such problem jobs.

    For now, it's a waiting-game. At least there's an investigation path, now. It's all in finding the patterns.

    Wednesday, January 18, 2012

    Quick-n-Dirty Yum Repo via HTTP

     Recently, we had a bit of a SNAFU in the year-end renewal of our RedHat support. As a result, all of the RHN accounts tied to the previous contract lost access to RHN's software download repositories. This meant that things like being able to yum-down RPMs on rhn_register'ed systems no longer worked, and we couldn't log into RHN and do a manual download, either.

    Fortunately, because we're on moderately decent terms with RedHat and they know that the contract eventually will get sorted out, they were willing to help us get through our current access issues. Moving RHN accounts from one permanent contract to another, after first associating them with some kind of temporary entitlement is a paperwork-hassle for all parties involved and is apt to get your account(s) mis-associated down the road. Since all parties knew that this was a temporary problem but needed an immediate fix, our RedHat representative furnished us with the requisite physical media necessary to get us through till the contracts could get sorted out.

    Knowing that I wasn't the only one that might need the software and that I might need to execute some burndown-rebuilds on a particular project I was working on, I wanted to make it easy to pull packages to my test systems. We're in an ESX environment, so, I spun up a small VM (only 1GB of virtual RAM, 1GHz of virtual CPU, a couple Gigabytes of virtual disk for a root volume and about 20GB of virtual disk to stage software onto and build an RPM repository on) to act as a yum repository server.

    After spinning this basic VM, I had to sort out what to do as far as getting that physical media turned into a repo. I'm not a big fan of copying CDs as a stream of discrete files (been burned, too many times, by over-the-wire corruption, permission issues and the like). So, I took the DVD and made an ISO from it. I then took that ISO and scp'ed it up to the VM.

     Once I had the ISO file copied up to the VM, I did a quick mount of it (`mount -t iso9660 -o loop,ro /tmp/RHEL5.7_x86_64.iso /mnt/DVD` for those of you playing at home). Once I mounted it, I did a quick copy of its contents to the filesystem I'd set aside for it. I'm kind of a fan of cpio for things like this, so I cd'ed into the root of the mounted ISO and did a `find . -print | cpio -pmd /RepoDir` to create a good copy of my ISO data into a "real" filesystem (note, you'll want to make sure you do a `umask 022` first to ensure that the permission structures from the mounted ISO get copied, intact, along with the files, themselves).

     With all of the DVD's files copied to the repo-server and into a writeable filesystem, it's necessary to create all the repo structures and references to support use by yum. Our standard build doesn't include the createrepo tool, so, first I had to locate its RPM in the repo filesystem and then install it onto my repo-server. Doing a quick `find . -name "*createrepo*rpm"` while cd'ed into the repo filesystem turned up the path to the requisite RPM. I then did an `rpm -Uh [PathFromFind]` to install the createrepo tool's RPM files.

    The createrepo tool is a nifty little tool. You just cd into the root of the directory where you copied your media to, do a `createrepo .`, and it scans the directory structures to find all the RPMs and XMLs and other files and creates the requisite data structures and pointers that allow yum to know how to pull the appropriate RPMs from the filesystem.

    Once that's done, if all you care about is local access to the RPMs, you can create a basic .repo file in /etc/yum.repos.d that uses a "baseurl=file:///Path/To/Media" directive in it.

     In my case, I wanted to make my new repo available to other hosts at the lab. The easiest way to make the repo available over the network is to do so via HTTP. Our standard build doesn't include the standard RedHat HTTP server, by default. So, I manually installed the requisite RPMs from the repo's filesystem. I modified the stock /etc/httpd/conf/httpd.conf and added the following stanzas to it:

    Alias /Repo/ "/RepoDir/"
    <Directory "/RepoDir">
       Options Indexes MultiViews
       AllowOverride None
       Order allow,deny
       Allow from all
    </Directory>

    [Note: this is probably a looser configuration than I'd have in place if I was making this a permanent solution, but this was just meant as a quick-n-dirty workaround for a temporary problem.]

     I made sure to do a `chkconfig httpd on` and then did a `service httpd start` to activate the web server. I then took my web browser and made sure that the repo filesystem's contents were visible via web client. They weren't: I forgot that our standard build has port 80 blocked by default. I did the requisite juju to add an exception to iptables for port 80 and all was good to go.

     With my RPMs (etc.) now visible via HTTP, I logged into the VM where I actually needed to install RPMs via yum. I escalated privileges to root and created an /etc/yum.repos.d/LAB.repo file that looked similar to the following:

    [lab-repo]
    name=RHEL 5.7
    baseurl=http://repovm.domain.name/Repo
    enabled=1
    gpgcheck=0

    I did a quick cleanup of the consuming VM's yum repo information with a `yum clean all` and then verified that the consuming VM could properly see the repo's data by doing a `yum list`. All was good to go. Depending on how temporary this actually ends up being, I'll go back and make my consuming VM's .repo file a bit more "complete" and more properly lay out the repo-server's filesystem and HTTP config.

    Wednesday, July 6, 2011

    Show Me the Boot Info!

    For crusty old systems administrators (such as yours truly), the modern Linux boot sequence can be a touch annoying. I mean, the graphical boot system is pretty and all, but, I absolutely hate having to continually click on buttons just to see the boot details. And, while I know that some Linux distributions give you the option of viewing the boot details by either disabling the graphical boot system completely (i.e., nuke out the "rhgb" option from your grub.conf's kernel line) or switching to an alternate virtual console configured to show boot messages, that's just kind of a suck solution. Besides, if your default Linux build is like the one my company uses, you don't even have the alternate VCs as an option.

    Now, this is a RedHat-centric blog, since that's what we use at my place of work (we've a few devices that use embedded SuSE, but, I probably void the service agreement any time I directly access the shell on those!). So, my "solution" is going to be expressed in terms of RedHat (and, by extension, CentOS, Scientific Linux, Fedora and a few others). For many things in RedHat, they give you nifty files in /etc/sysconfig that allow you to customize behaviors. So, I'd made the silly assumption that there'd be an /etc/sysconfig/rhgb type of file. No such luck. So, I dug around in the init scripts (grep -li is great for this, by the way) to see if there were any mentions of rhgb. There was. Well, there was mention of rhgb-client in /etc/init.d/functions.

    Unfortunately, even though our standard build seems to include manual pages for every installed component, I couldn't find a manual page for rhgb-client (or an info doc, for that matter). The best I was able to find was a /usr/share/doc/rhgb-${VERSION}/HOW_IT_WORKS file (I'm assuming that ${VERSION} matches the version of the installed RHGB RPM - it seemed to). While an interesting read, it's not exactly the most exhaustive document I've ever seen. It's about what you'd expect from a typical README file, I guess. Still, it didn't show what arguments, if any, rhgb-client would take.

    Not wanting to do anything too calamitous, I called `rhgb-client --help` as a non-privileged user. I was gladdened to see that it didn't give me one of those annoying "you must be root to run this command" errors. It also gave some usage details:

    rhgb-client --help
    Usage: rhgb-client [OPTION...]
      -u, --update=STRING      Update a service's status
      -d, --details=STRING     Show the details page (yes/no).
      -p, --ping               See if the server is alive
      -q, --quit               Tells the server to quit
      -s, --sysinit            Inform the server that we've finished rc.sysinit
    
    Help options:
      -?, --help               Show this help message
      --usage                  Display brief usage message
    
    

    I'd hoped that, since /etc/init.d/functions had shown an "--update" argument, rhgb-client might take other arguments (and, correctly, assumed one would be "--help"). So, armed with the above, I updated my /etc/init.d/functions script to add "--details=yes" and rebooted. Lo and behold: I get the graphical boot session but get to see all the detailed boot messages, too! Hurrah.
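
    For anyone wanting to replicate it, the edit amounts to tacking the flag onto the rhgb-client call inside /etc/init.d/functions. The exact wrapper varies from release to release, so treat the following as a sketch of where the flag goes rather than a literal diff:

    # Illustrative only: append --details=yes wherever functions invokes rhgb-client --update
    if [ -x /usr/bin/rhgb-client ] && /usr/bin/rhgb-client --ping ; then
       /usr/bin/rhgb-client --update="$1" --details=yes
    fi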

    Still, it seemed odd that, since the RHGB components are (sorta) configurable, there wasn't a file in /etc/sysconfig to set the requisite options. I hate having to hack config files that are likely to get overwritten the next time the associated RPM gets updated. I also figure that I can't be the only person out there that wants the graphical boot system and details. So, why haven't the RHGB maintainers fixed this (and, yes, I realize that Linux is a community thing and I'm free to contribute fixes to it - I'd just hoped that someone like RedHat or SuSE would have had enough complaints from commercial UNIX converts to have already done it for me)? Oh well, one of these days, I suppose.

    Thursday, May 5, 2011

    Linux Active Directory Integration and PAM

    Previously, I've written about using LikeWise to provide Active Directory integration to Linux and Solaris hosts. One of the down sides of LikeWise (and several other similar integration tools) is that it tends to make it such that, if a user has an account in Active Directory, they can log into the UNIX or Linux boxes you've bound to your domain. In fact, while walking someone through setting up LikeWise with the automated configuration scripts I'd written, that person asked, "you mean anyone with an AD account can log in?"

    Now, this had occurred to me when I was testing the package for the engineer who was productizing LikeWise for our enterprise build. But, it hadn't really been a priority, at the time. Unfortunately, when someone who isn't necessarily a "security first" kind of person hits you with that question/observation, you know that the folks for whom security is more of a "Job #1" are eventually going to come for you (even if you weren't the one who was responsible for engineering the solution). Besides, I had other priorities to take care of.

    This week was a semi-slack week at work. There was some kind of organizational get-together going on that had most of the IT folks out of town discussing global information technology strategies. Fortunately, I'd not had to take part in that event. So, I've spent the week revisiting some stuff I'd rolled out (or been part of the rollout of) but wasn't completely happy with. The "AD integration giving everyone access" thing was one of them. So, I began by consulting the almighty Google. When I found stuff that seemed promising, I fired up a test VM and started trying it out.

    Now, SSH (and several other services) weren't really a problem. Many applications allow you to internally regulate who can use the service. For example, with OpenSSH, you can modify the sshd_config file to explicitly define which users and groups can and cannot access your box through that service (for those of you who hit this page looking for tips, do a `man sshd_config` and grep for AllowUsers and AllowGroups for more in-depth information). Unfortunately, it's predictable enough that people who are gonna whine about AD integration giving away the farm are gonna bitch if you tell them they have to modify the configuration of each and every service they want to protect. No, most people want to be able to go to one place and take care of things with one action or one set of consistent actions. I can't blame them: I feel the same way. Everyone wants things done easily. Part of "easily" generally implies "consistently" and/or "in one place".
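
    For the record, the OpenSSH-side restriction looks something like this (the group and user names are just placeholders):

    # /etc/ssh/sshd_config - only members of these groups may log in via SSH
    AllowGroups wheel unix_admins
    # AllowUsers alice bob    # per-user variant, if you'd rather enumerate accounts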

    Fortunately, any good UNIX or Linux implementation leverages the Pluggable Authentication Modules system (aka. PAM). There are about a bazillion different PAM modules out there that allow you to configure any given service's authentication to do or test a similar variety of attributes. My assumption for solving this particular issue was that, while there might be dozens or hundreds of groups (and thousands of users) in an Active Directory forest, one would only want to grant a very few groups access to an AD-bound UNIX/Linux host. So, I wasn't looking for something that made it easy to grant lots of groups access in one swell-foop. In fact, I was kind of looking for things that made that not an easy thing to do (after all, why lock stuff down if you're just going to blast it back open, again?). I was also looking for something that I could fairly reliably find on generic PAM implementations. The pam_succeed_if module is just about tailor-made for those requirements.

    LikeWise (and the other AD integration methods) adds entries to your PAM configuration that allow users permitted by those authentication subsystems to log in, pretty much, unconditionally. Unfortunately, those PAM modules don't often include methods for controlling which users are able to log in once their AD authentication has succeeded. Since PAM uses stackable modules, you can insert access controls earlier in the stack so that a user's access fails before the AD module would otherwise grant it. If you wanted to allow users in AD_Group1 and AD_Group2 to log in, but not members of other groups, you'd modify your PAM stack to insert the controls ahead of the AD allow module:

         account    [default=ignore success=2] pam_succeed_if.so user ingroup AD_Group1 quiet_success
         account    [default=ignore success=1] pam_succeed_if.so user ingroup AD_Group2 quiet_success
         account    [default=bad success=ignore] pam_succeed_if.so user ingroup wheel quiet_success
         account    sufficient    /lib/security/pam_lsass.so
    

    The above is processed such that, if a user is a member of the AD-managed group "AD_Group1" or "AD_Group2", the pam_succeed_if test succeeds and the remaining group checks are skipped. If the user isn't a member of either of those groups, testing falls through to the next group check - is the user a member of the group wheel (if yes, fall through to the next test; if no, there's a failure and the user's access is denied). The downside of using this particular PAM module is that it's only available to you on *N*X systems with a plethora of PAM modules. This is true for many Linux releases - and I know it to be part of RedHat-related releases - but it probably won't be available on less PAM-rich *N*X systems (yet one more reason to cast Solaris on the dung-heap, frankly). If your particular *N*X system doesn't have it, you can probably find the source code for it and build the requisite module for your OS.

    Monday, May 2, 2011

    Vanity Linux Servers and SSH Brute-Forcers

    Let me start by saying that, for years (think basically since OpenSSH became available), I have run my personal, public-facing SSH services relatively locked-down. No matter what the default security posture for the application was - whether compiled from source or using the host operating system's defaults - the first thing I did was ensure that PermitRootLogin was set to "no". I used to allow tunneled clear-text passwords (way back in the day), but even that I've habitually disabled for (probably) a decade, now. In other words, if you want to SSH into one of my systems, you have to do so as a regular user and you have to do it using key-based logins. Even if you did manage to break in as one of those non-privileged users, I used access controls to limit which users could elevate privileges to root.
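
    In sshd_config terms, that posture boils down to:

    # /etc/ssh/sshd_config
    PermitRootLogin no           # no direct root logins
    PasswordAuthentication no    # key-based logins only; no tunneled clear-text passwords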

    Now, I never went as far as changing the ports my SSH servers listened on. This always seemed kind of pointless. I'm sure there's plenty of script kiddies whose cracking-scripts don't look for services running on alternate ports, but I've never found much value relying on "security by obscurity".

    At any rate, I figured this was enough to keep me basically safe. And, to date, it seems to have. That said, I do periodically get annoyed at seeing my system logs filled with the "Too many authentication failures for root" and "POSSIBLE BREAK-IN ATTEMPT" messages. However, most of the solutions to such problems seemed to be log-scrapers that then blacklisted the attack sources. As I've indicated in prior posts, I'm lazy. Going through the effort of setting up log-scrapers and tying them to blacklisting scripts was more effort than seemed warranted to address something that was, primarily, only a nuisance. So, I never bothered.

    I've also been a longtime user of tools like PortSentry (and its equivalents). So, I usually picked up attacks before they got terribly far. Unfortunately, as Linux has become more popular, there seems to be a lot more service-specific attacks and less broad-spectrum attacks (attacks preceded by probing of all possible entry points). Net result: I'm seeing more of the nuisance alerts in my logs.

    Still, there's that laziness thing. Fortunately, I'd recently sat through a RedHat training class. And, while I was absolutely floored when the instructor told me that even RHEL 6 still ships with PermitRootLogin set to "yes", he let me know that recent RHEL patch levels included iptables modules that made things like fail2ban somewhat redundant. Unfortunately, he didn't go into any further detail. So, I had to go and dig around for how to do it.

    Note: previously, I'd never really bothered with iptables. I mean, for services that don't require Internet-at-large access, I'd always used things like TCP Wrappers, or configured the services to listen only on loopback or domain sockets, to avoid exposing them. Thus, with my systems, the only Internet-reachable ports were the ones that had to be. There never really seemed to be a point in enabling a local firewall when the system wasn't acting as a gateway to other systems. However, the possibility of leveraging iptables in a useful way kind of changed all that.

    Point of honesty, here: the other reason I'd never bothered with iptables was that its syntax was a tad arcane. While I'd once bothered to learn the syntax for ipfilter - a firewall solution with similarly arcane syntax - so that I could use a Solaris-based system as a firewall for my house, converting my ipfilter knowledge to iptables didn't seem worth the effort.

    So, I decided to dig into it. I read through manual pages. I looked at websites. I dug through my Linux boxes netfilter directories to see if I could find the relevant iptables modules and see if they were internally documented. Initially, I thought the iptables module my instructor had been referring to was the ipt_limit module. Reading up on it, the ipt_limit module looked kind of nifty. So, I started playing around with it. As I played with it (and dug around online), I found there was an even better iptables module, ipt_recent. I now assume the better module was the one he was referring to. At any rate, dinking with both, I eventually set about getting things to a state I liked.

    First thing I did when setting up iptables was decide to be a nazi about my default security stance. That was accommodated with one simple rule: `iptables -P INPUT DROP`. If you start up iptables with no rules, you get the equivalent of a default INPUT policy of `iptables -P INPUT ACCEPT`. I'd seen some documentation where people like to set theirs to `iptables -P INPUT REJECT`. I like "DROP" better than "REJECT" - probably because it suits the more dickish side of me. I mean, if someone's going to chew up my system's resources by probing me or attempting to break in, why should I do them the favor of telling their TCP stack to end the connection immediately? Screw that: let their TCP stack send out SYNs and be ignored. Depending on how they've tuned their TCP stack, those unanswered SYNs mean they'll end up with a bunch of connection attempts stuck in a wait state. Polite TCP/IP behavior says that, when you send out a SYN, you wait for the SYN/ACK for some predetermined period before you consider the attempt failed and execute your TCP/IP abort-and-cleanup sequence. That can be several tens of seconds to a few hours. During that interval, the attack source has resources tied up. If I sent a REJECT, they could go into immediate cleanup, meaning they could move on to their next attack that much sooner with all their system resources freed up.

    The down side of setting your default policy to either REJECT or DROP is that it applies to all your interfaces. So, not only will your public-facing network connectivity cease, so will your loopback traffic. Depending on how tightly you want to secure your system, you could bother to iterate all of the loopback exceptions. Most people will probably find it sufficient to simply set up the rule `iptables -A INPUT -i lo -j ACCEPT`. Just bear in mind that more wily attackers can spoof things to make traffic appear to come through loopback and take advantage of that blanket exception to your DROP or REJECT rules (though this can be mitigated by setting up rules to block loopback traffic that appears on your "real" interfaces - something like `-A INPUT -i eth0 -d 127.0.0.0/8 -j DROP` will do it).

    The next thing you'll want to bear in mind with the default REJECT or DROP is that, without further fine-tuning, it will apply to each and every packet hitting that filterset. Some TCP/IP connections start on one port but then get moved off to, or involve, other ports. If that happens, your connection's not gonna quite work right. One way to work around that is to use a state table to manage established or related connections. Use a rule like `iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT` to accommodate that.
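
    Pulling those pieces together (using the same interface names as in the examples above), the skeleton of the ruleset looks like:

    iptables -P INPUT DROP                                              # default-deny everything inbound
    iptables -A INPUT -i lo -j ACCEPT                                   # permit loopback traffic
    iptables -A INPUT -i eth0 -d 127.0.0.0/8 -j DROP                    # drop spoofed "loopback" arriving on the real NIC
    iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT    # let established/related traffic through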

    At this point you're ready to start punching the service-specific holes in your default-deny firewall. On a hobbyist or vanity type system, you might be running things like DNS, HTTP(S), SMTP, and IMAP. That will look like:

    -A INPUT -p udp -m udp --dport 53 -j ACCEPT       # DNS via UDP (typically used for individual DNS lookups)
    -A INPUT -p tcp -m tcp --dport 53 -j ACCEPT       # DNS via TCP (typically used for large zone transfers)
    -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT       # HTTP
    -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT       # HTTP over SSL
    -A INPUT -p tcp -m tcp --dport 25 -j ACCEPT       # SMTP
    -A INPUT -p tcp -m tcp --dport 587 -j ACCEPT       # SMTP submission via STARTTLS
    -A INPUT -p tcp -m tcp --dport 993 -j ACCEPT       # IMAPv4 + SSL

    What the ipt_limit module gets you is the ability to rate-limit connection attempts to a service. This can be as simple as ensuring that only so many connections per second are allowed to reach the service, limiting the number of connections per time interval per source, or outright blacklisting a source that connects too frequently.

    Doing the first can be handled within SSH itself and/or TCP Wrappers (or, for services run through xinetd, through your xinetd config). The downside of this is that, since it's not distinguishing sources, if you're being attacked you won't be able to get in, since the overall number of connections will have been exceeded. Generally, potentially allowing others to lock you out of your own system is considered to be "not a Good Thing™ to do". But, if you want to risk it, add a rule that looks something like `-A INPUT -m limit --limit 3/minute -m tcp -p tcp --dport 22 -j ACCEPT` to your iptables configuration (using the ipt_limit module) and be on about your way.

    If you want to be a bit more targeted in your approach, the ipt_recent module can be leveraged. I used a complex of rules like the following:
    -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m recent --set --name sshtrack --rsource
    -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 3 --name sshtrack --rsource -j LOG --log-prefix "ssh rejection: "
    -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 3 --name sshtrack --rsource -j DROP
    -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
    
    What the above four rules do is:
    • For each new connection attempt to port 22, add the remote source address to the "sshtrack" tracking table
    • If this is the third such new connection within 60 seconds, update the remote source address entry in the tracking table and log rejection action
    • If this is the third such new connection within 60 seconds, update the remote source address entry in the tracking table and DROP the connection
    • Otherwise, accept the new connection.
    I could have chosen to simply "rcheck" rather than "update" the "sshtrack" table. However, by using "update", it essentially resets the time to live from the last connect attempt packet to whatever might be the next attempt. This way, you get the full sixty second window rather than (60 - ConnectInterval). If it becomes apparent that attackers start to use slow attacks to get past the rule, one can up the "seconds" from 60 to some other value. I chose 60 as a start. It might be reasonable to up it to 300 or even 900 since it's unlikely that I'm going to want to start more than three SSH sessions to the box within a 15 minute interval.

    As a bit of reference: on RHEL-based systems, you can check what iptables modules are available by listing out '/usr/include/linux/netfilter_ipv4/ipt_*'. You can then (for most) use `iptables -m [MODULE] --help` to show you the options for a given module. For example:
    # iptables -m recent --help | sed -n '/^recent v.*options:/,$p'
         recent v1.3.5 options:
         [!] --set                       Add source address to list, always matches.
         [!] --rcheck                    Match if source address in list.
         [!] --update                    Match if source address in list, also update last-seen time.
         [!] --remove                    Match if source address in list, also removes that address from list.
             --seconds seconds           For check and update commands above.
                                         Specifies that the match will only occur if source address last seen within
                                         the last 'seconds' seconds.
             --hitcount hits             For check and update commands above.
                                         Specifies that the match will only occur if source address seen hits times.
                                         May be used in conjunction with the seconds option.
             --rttl                      For check and update commands above.
                                         Specifies that the match will only occur if the source address and the TTL
                                         match between this packet and the one which was set.
                                         Useful if you have problems with people spoofing their source address in order
                                         to DoS you via this module.
             --name name                 Name of the recent list to be used.  DEFAULT used if none given.
             --rsource                   Match/Save the source address of each packet in the recent list table (default).
             --rdest                     Match/Save the destination address of each packet in the recent list table.
         ipt_recent v0.3.1: Stephen Frost .  http://snowman.net/projects/ipt_recent/
    
    Gives you the options for the "recent" iptables module and a URL for further information lookup.

    Friday, April 22, 2011

    Automating Yum Setup

    Recently, as part of my employer's "employee development" programs, I took Red Hat's RH255 class so that I could prep for getting my RHCSA and RHCE (my opinions on the experience are grist for another post - and I don't know, yet, whether that post should be in this blog or one of my personal blogs). The RH255 class I took was based on RHEL 6. In the US, they moved to training based on RHEL 6.0 about six months ago. One of the interesting things (I thought) my instructor said was that RHEL 6.0 included a function that tried to automatically configure `yum` to take mounted CDROMs and ISOs and treat them as installation repositories.

    I may have misheard or misinterpreted what he said. It may also be a case that, since my instructor is in the RHEL 6.1 beta program, he was referring to a feature in RHEL 6.1 and not RHEL 6.0. Whatever the case may be, popping an RHEL 6.0 DVD (or ISO) into an RHEL 6.0 machine does not (yet) cause that DVD to be automatically included in `yum`'s repository search. Fortunately, the RHEL 6.x media has been set up (prior RHELs may also have been; it's just been so long since I've built an RHEL 5 system from anything other than an automated-build system) so that it's easy enough to automatically set up the DVD or ISO for inclusion in `yum`'s searches. All of the yum repository metadata is already on the media. So, you needn't muck about with doing the createrepo stuff (and the time that running createrepo against a multi-gigabyte DVD can take). Below is the script I wrote to take this already-present data and make it available to `yum`:

    #!/bin/sh
    MNTDIR=${1:-UNDEF}
    # Make sure we passed a location to check
    if [ "${MNTDIR}" = "UNDEF" ]
    then
       echo "Usage: $0 <ISO Mount Location>"
       exit 1
    fi
    # Make sure it's actually a directory
    if [ ! -d "${MNTDIR}" ]
    then
       echo "No such directory"
       exit 1
    fi
    #ID directories containing repository data
    REPODIRS=$(find ${MNTDIR} -type d -name repodata)
    if [ "${REPODIRS}" == "" ]
    then
       echo "No repository data found"
       exit
    fi
    for DIR in ${REPODIRS}
    do
       DIRNAME=$(echo ${DIR} | sed -e 's#'${MNTDIR}'/##' -e 's#/repodata##')
       BASEURL=$(dirname ${DIR})
       echo "[${DIRNAME}]"
       echo "baseurl=${BASEURL}"
       echo "enabled=1"
       echo
    done

    The above script takes a directory location (presumably where you mounted your DVD or ISO) and produces yum repository configuration output. I opted to have it output as a capturable stream rather than as a file. That way, I'd have the option of either capturing to a temp location or directing it straight into /etc/yum.repos.d. Doing so gave more flexibility in what to name the file (so that I didn't clobber any existing files). Granted, I could have parameterized the name and location of the output file, but chose not to. If you want to use the above and would rather have the output go directly to an output file in /etc/yum.repos.d (or elsewhere), it's a trivial modification. Have at it.
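
    Usage ends up looking something like the following (the script name and mount point are stand-ins for whatever you actually use):

    # "mkdvdrepo.sh" is just a stand-in name for the script above
    sh mkdvdrepo.sh /mnt/DVD > /etc/yum.repos.d/rhel6-dvd.repo
    yum clean all
    yum repolist     # the DVD's repositories should now show up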

    At any rate, once you run the above, take the resultant output, and move it into a file in /etc/yum.repos.d, subsequent invocations of the `yum` command should include your DVD or ISO. If it fails to do so (or errors result), it's probably because the new .repo file contains section IDs that collide with existing ones. Just edit your new file to eliminate any collisions.

    While I've done some basic error-checking in the script (specifically: ensuring the user passed a location and that the location is a valid directory), I didn't write it to gracefully handle repository directories that have spaces in their path-names. It's fairly trivial to modify the script to accommodate that. However, it's probably even easier to just make sure you don't mount your DVD or ISO at a path containing spaces. I'm lazy, so I'm choosing the latter option and avoiding the coding exercise. If you find you need spaces in your path-names and want to use my script, feel free to make the necessary modifications.