
Wednesday, May 2, 2018

"How Big Is It," You Ask?

For one of the projects I'm supporting, they want to deploy into an isolated network. However, they want to be able to flexibly support systems in that network being able to fetch feature and patch RPMs (and other stuff) as needed. Being on an isolated network, they don't want to overwhelm the security gateways by having a bunch of systems fetching directly from the internet. Result: a desire to host a controlled mirror of the relevant upstream software repositories.

This begged the question, "what all do we need to mirror?" I stress the word "need" because this customer is used to thinking in terms of manual rather than automated efforts (working on that with them, too). As a result of this manual mindset, they want to keep the data-sets associated with any given task small.

Unfortunately, their idea of small is also a bit out of date. To them, 100GiB — at least for software repositories — falls into the "not small" category. To me, if I can fit something on my phone, it's a small amount of data. We'll ignore the fact that my sense of scale is skewed by being able to jam a 512GiB microSD card into my phone. At any rate, they were wanting to minimize the number of repositories and channels they were going to need to mirror so that they could keep the copy-jobs small. I pointed out, "you want these systems to have a similar degree of fetching-functionality to what they'd have if they weren't being blocked from downloading directly from the Internet: that means you need <insert exhaustive list of repositories and channels>". Naturally, they balked. The questions "how do you know they'll actually need all of that" and "how much space is all of that going to take" were asked (with a certain degree of overwroughtness).

I pointed out that I couldn't meaningfully answer the former question because I wasn't part of the design-groups for the systems that were to be deployed in the isolated network. I also pointed out that they'd probably be better off asking the owners of the prospective systems what they'd anticipate needing (knowing full well that, usually, such questions' answers are somewhere in the "I'm not sure, yet" neighborhood). As such, my exhaustive list was a hedge: better to have and not need than to need and not have. Given the stupid-cheapness of storage and that it can actually be easier to sync all of the channels in a given repository vice syncing a subset, I didn't see a benefit in not throwing storage at the problem.

To the second question, I pointed out, "I'm sitting at this meeting, not in front of my computer. I can get you a full, channel-by-channel breakdown once I get back to my desk."

One of the nice things about yum repositories (where the feature and patch RPMs would come from) is that they're easily queryable for both item-counts and aggregate sizes. The only downside is that the OS-included tools for doing so are geared toward human-centric, ad hoc queries rather than producing something that can be jammed into an Excel spreadsheet and run through =SUM formulae. In other words, sizes are put into "friendly" units: if you have 100KiB of data, the number is reported in KiB; if you have 1055KiB of data, the number is reported in MiB; and so on. So, I needed to "wrap" the native tools' output to put everything into consistent units (which Excel prefers for =SUMing and other mathematical actions). Because it was a "quick" task, I did it in BASH. In retrospect, using another language likely would have been far less ugly. However, what I came up with worked for creating a CSV:

#!/bin/bash
#
# Emit one semicolon-delimited line (id;name;package-count;size-in-KiB) for
# each enabled yum repository -- suitable for dumping straight into Excel.

for YUM in $(
   yum repolist all | awk '/abled/{ print $1 }' | \
      sed -e '{
         /-test/d
         /-debuginfo/d
         /-source/d
         /-media/d
      }' | sed 's/\/.*$//'
   )
do
   # Split the verbose repolist output on newlines so that each
   # "Repo-<attribute>:<value>" line lands in its own array element
   IFS=$'\n'
   REPOSTRUCT=($(
      yum --disablerepo='*' --enablerepo="${YUM}" repolist -v | \
      grep ^Repo- | grep -E "(id|-name|pkgs|size) *:" | \
      sed 's/ *: /:/'
   ))
   unset IFS

   # Repo-size comes back in "friendly" units (e.g., "1.2 G"): split it
   # into value and unit, then normalize everything to KiB
   REPSZ=($(echo "${REPOSTRUCT[3]}" | sed 's/^Repo-size://'))

   if [[ "${REPSZ[1]}" = M ]]
   then
      SIZE=$(echo "${REPSZ[0]} * 1024" | bc)
   elif [[ "${REPSZ[1]}" = G ]]
   then
      SIZE=$(echo "${REPSZ[0]} * 1024 * 1024" | bc)
   else
      SIZE="${REPSZ[0]}"
   fi

   # Print the repo id, name and package-count, then the normalized size
   for CT in 0 1 2
   do
      printf "%s;" "${REPOSTRUCT[${CT}]}"
   done
   echo "${SIZE}"
done | sed 's/Repo-[a-z]*://'
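
Redirect the script's output to a file with a .csv extension and Excel will slurp it straight in (just tell it the delimiter is a semicolon). Invocation is nothing more than the following — the script name is just whatever you happened to save it as:

    ./repo-sizes.sh > repo-sizes.csv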

Yeah... hideous and likely far from optimal ...but not worth my time (even if I had it) to revisit. It gets the job done and there's a certain joy to writing hideous code to solve a problem you didn't want to be asked to solve in the first place.

At any rate, checking against all of the repos that we'd want to mirror for the project, the initial-sync data-set would fit on my phone (without having to upgrade to one of the top-end beasties). I pointed out that only the initial sync would be "large" and that, since only a couple of the channels update with anything resembling regularity (the rest being essentially static), the monthly delta-sync would be vastly smaller and trivial to babysit. So, we'll see whether that assuages their anxieties or not.

Thursday, May 7, 2015

EXTn and the Tyranny of AWS

One of the organizations I provide consulting services to opted to start migrating from an in-house, VMware-based virtualization solution to an Amazon-hosted cloud solution. The transition has been somewhat fraught, here and there - possibly moreso than the prior transition from physical (primarily Solaris-based) servers to virtualized (Linux) servers.

One of the huge "problems" is that the organization's various sub-units have habits formed over decade-or-longer lifecycles. Staff are particularly used to being able to get on console for various things (using GUI-based software installers and graphical IDEs, as well as performing tasks that actually require console access - like recovering a system stuck in its startup sequence).

For all that AWS offers, console access isn't part of the package. Amazon's model for how customers should deploy and manage systems means that they don't figure console access is strictly necessary.

In a wholly self-service model, this is probably an ok assumption. Unfortunately, the IT model that the organization in question is moving to doesn't offer instance-owners true self-service. They're essentially trying to transport an on-premises, managed multi-tenancy model into AWS. The model they're moving from didn't have self-service, so they're not interested in enabling self-service (at least not during phase one of the move). Their tenants not only don't have console access in the new environment, they don't have the ability to execute the AWS-style recovery-methods you'd use in the absence of console access. The tenants are impatient and the group supporting them is small, so it's a tough situation.

The migration has been ongoing long enough that the default `mkfs` behavior for EXT-based filesystems - forcing a periodic fsck at boot, roughly every 180 days - is starting to rear its head. Being well beyond the 180-day mark since the inception of the migration, tenants are finding that, when their Linux instances reboot, they're not coming back as quickly as they did towards the beginning of the migration ...because their builds still leave autofsck enabled.

If you're reading this, you may have run into similar issues.

The solution to this, while still maintaining the spirit of the "fsck every 180 days or so" best practices for EXTn-based filesystems, is fairly straightforward:

  1. Disable the autofsck settings on your instances' EXTn filesystems: use `tune2fs -i 0 -c -1 /dev/<DEVNODE>`
  2. Schedule periodic fsck "simulations". This can be done either by running fsck in "dryrun" mode or by doing an fsck of a filesystem metadata image.
The "dryrun" method is fairly straight forward: just run fsck with the "-N" option. I'm not super much a fan of this as it doesn't feel like it gives me the info I'm looking for to feel good about the state of my filesystems.

The "fsck of a filesystem metadata image" is pretty straight forward, automatable and provides a bit more on the "warm fuzzies" side of thing. To do it:
  1. Create a metadata image file using `e2image -fr /dev/<DEVNODE> /IMAGE/FILE/PATH` (e.g. `e2image -fr /dev/RootVG/auditVol /tmp/auditVol.img`)
  2. Use `losetup` to create an fsck'able block-device from the image file (e.g., `losetup /dev/loop0 /tmp/auditVol.img`)
  3. Execute an fsck against the loopback device (e.g., `fsck /dev/loop0`). Output will look similar to the following:
    # fsck /dev/loop0
    fsck from util-linux-ng 2.17.2
    e2fsck 1.41.12 (17-May-2010)
    /dev/loop0: recovering journal
    /dev/loop0: clean, 13/297184 files, 56066/1187840 blocks
  4. If the output indicates anything other than good health, schedule an outage to do a proper repair of your live filesystem(s)
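Wrapped up into something cron-able, the above looks more or less like the following. Treat it as a minimal sketch: the device and scratch-file paths are placeholders, and it leans on `losetup -f --show` to grab a free loop device (substitute an explicit /dev/loopN if your losetup predates the "--show" option):

#!/bin/bash
# Minimal sketch of the metadata-image check described in the steps above.
# FSDEV and IMG are placeholders -- adjust to suit your own volumes.
FSDEV="${1:-/dev/RootVG/auditVol}"
IMG="/tmp/$(basename "${FSDEV}").img"

e2image -fr "${FSDEV}" "${IMG}"          # capture the filesystem's metadata
LOOPDEV=$(losetup -f --show "${IMG}")    # attach the image to a free loop device
fsck "${LOOPDEV}"                        # check the metadata image
RESULT=$?

losetup -d "${LOOPDEV}"                  # detach the loop device
rm -f "${IMG}"                           # and clean up the scratch image

exit ${RESULT}

A non-zero exit status (or output that looks less happy than the sample above) is your cue to go schedule that outage.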
Granted, if you find you need to do the full check of the real filesystem(s), you're still potentially stuck with the "no console" issue. Even that is potentially surmountable:
  1. Create a "/forcefsck" file
  2. Create a "/fsckoptions" file with the contents "-sy"
  3. Schedule your reboot
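In shell terms, those three steps boil down to something like the following (the marker files are the hooks that the RHEL 5/6-era SysV init scripts look for at boot):

    touch /forcefsck                # ask rc.sysinit for a forced fsck on the next boot
    echo "-sy" > /fsckoptions      # options to pass to that fsck
    shutdown -r +5 "Rebooting for a scheduled filesystem check"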
When the reboot happens, depending on how long the system takes to boot, the EC2 launch monitors may time out: just be patient. If you can't be patient, just monitor the boot logs (either in the AWS console or using the AWS CLI's equivalent option).
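
If you go the CLI route, the console output is retrievable with something along the lines of the following (the instance-id is, obviously, a placeholder):

    aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text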

Thursday, March 26, 2015

Could You Make It Just A Little More Difficult?

Every so often, you need to share out an HTML file via a public web-server. Back in the pre-cloud days, this meant that you'd toss it up on your vanity web server and call it a day. In the cloud era, you have other options. Amazon's S3 was the first one that I used, but other cloud-storage services can be leveraged in a similar manner for static content.

One of those others is Google's "Drive" service. Unfortunately, Google doesn't exactly seem to want to make it a straightforward affair to share static web content straight from Drive. It's easy if you want viewers to see the raw HTML, but not so great if you want them to see rendered HTML.

At any rate, as of the writing of this document (warning: judging by my Google results, the method seems to change over time and even Google's own help pages aren't really kept up to date), this was what I had to do:

  1. Create a new folder in Google Drive (optional: if you're willing to make your top-level gDrive folder publicly-browsable, or already have a folder that's set - or that you're willing to set - as publicly-browsable, you can skip this step)
  2. Edit the sharing on the folder, setting the permission to "Public View"
  3. Navigate into the folder. Take note of its path. It will look something like:
    https://drive.google.com/drive/#folders/0F3SA-qkPpztNflU1bUtyekYYC091a2ttHZJpMElwTm9UcFNqN1pNMlf3iUlTUkJ0UU5PUVk
  4. Take the left part of the URL, up to and including "/#folders/", and nuke it
  5. Replace the deleted part of the original URL with:
    http://www.googledrive.com/host/
    The browsable URL to your publicly-viewable folder will now look like:
    http://www.googledrive.com/host/0F3SA-qkPpztNflU1bUtyekYYC091a2ttHZJpMElwTm9UcFNqN1pNMlf3iUlTUkJ0UU5PUVk
  6. Clicking on that link will take you to the folder holding your static web content. To get the sharable URL for your file(s), click on the link to the file.
  7. When the file opens in your browser, copy the URL for the file, then send it out (for the sake of your recipients' sanity, you might want to pump it through a URL-shortener service first - like maybe goo.gl)
In my case, I was trying to collaborate with someone on a system-hardening toolset. I'd run my system through a security scanner that had flagged a number of false findings (actually, all the "fail" findings turned out to be bogus). I wanted to share the report with him, along with the rules files that the security tool had reported against. So, I sorted out the above so I could post links into our collaboration tool.

Maybe one day Google will make sharing static web content from Drive as (relatively) easy as Amazon has with S3. A "share as web page" button sure would be nice.

Tuesday, June 17, 2014

UDEV Friendly-Names to Support ASM Under VMware-hosted Linux Guest

This past month or so, we've been setting up a new vSphere hosting environment for a new customer. Our first guinea-pig tenant is being brought into the virtualized hosting-environment. This first tenant has a mix of Windows and Linux systems running a multi-layer data-processing system based on a back-end Oracle database.

As part of our tenancy process, we'd come up with a standard build request form. In general, we prefer a model that separates application data from OS data. In addition to the usual "how much RAM and CPU do you need" information, the form includes configuration-capture items for storage for applications hosted on the VMs. The table has inputs for both the requested supplemental storage sizes and where/how to mount those chunks.

This first tenant simply filled in a sum of their total additional storage request with no indication as to how they expected to use it. After several iterations of "if you have specific requirements, we need you to detail them" emails, I sent a final "absent the requested configuration specifications, the storage will be added but left unconfigured". It was finally at this point that the tenant responded back saying "there's a setup guide at this URL - please read that and configure accordingly".

Normally, this is not how we do things. The solution we offer tends to be more of an extended IaaS model: in addition to providing a VM container, we provide a basic, hardened OS configuration (installing an OS and patching it to a target state), configure basic networking and name-resolution, and perform basic storage-configuration tasks.

This first tenant was coming from a physical Windows/Red Hat environment and was testing the waters of virtualization. As a result, most of their configuration expectations were based on physical servers (SAN-based storage with native multipathing support). The reference documents they pointed us to were designed for implementing Oracle on a physical system using ASM on top of Linux dm-multipath storage objects ...not something normally done within an ESX-hosted Red Hat Linux configuration.

We weren't going to layer-on dm-multipath support, but the tenant still had the expectation of using "friendly" storage-object names for ASM. The easy path to "friendly" storage-object names is to use LVM. However, Oracle generally recommends against using ASM in conjunction with third-party logical volume management systems. So, LVM was off the table. How best to give the desired storage configs?

I opted to let udev do the work for me. Unfortunately, because we weren't anticipating this particular requirements-set, the VM templates we'd created didn't have some of the hooks available that would allow udev to do its thing. Specifically, no UUIDs were being presented into the Linux guests. Further complicating things is the fact that, with the hardened Linux build we furnish, most of the udev tools and the various hardware-information tools are not present. The downside is that this made things more difficult than they probably absolutely needed to be. The upside is that the following procedures should be portable across a fairly wide variety of Linux implementations:
  1. To have VMware provide serial-number information - from which UUIDs can be generated by the guest operating system - it's necessary to make a modification to the VM's advanced configuration options. Ensure that the “disk.EnabledUUID” parameter has been created for the VM and its value set to “TRUE”. The specific method for doing so varies depending on whether you use the vSphere web UI or the VPX client (or even the vmcli or direct editing of config files) to do your configuration tasks. Google for the specifics of your preferred management method.
  2. If you had to create/change the value in the prior step, reboot the VM so that the config changes take effect
  3. Present the disks to be used by ASM to the Linux guest – if adding SCSI controllers, this step will need to be done while the guest is powered off.
  4. Verify that the VM is able to see the new VMDKs. If supplemental disk presentation was done while the VM was running, initiate a SCSI-bus rescan (e.g., `echo "- - -" > /sys/class/scsi_host/host1/rescan`)
  5. Lay down an aligned, full-disk partition with the `parted` utility for each presented VMDK/disk. For example, if one of the newly-presented VMDKs was seen by the Linux OS as /dev/sdb:

    # parted -s /dev/sdb -- mklabel msdos mkpart primary ext3 1024s 100%

    Specifying an explicit starting-block (at 1024 or 2048 blocks) and using the relative ending-location, as above, will help ensure that your partition is created on an even storage-boundary. Google around for discussions on storage alignment and positive impact on virtualization environments for details on why the immediately-prior is usually a Good Thing™.
  6. Ensure that the “options=” line in the “/etc/scsi_id.config” file contains the “-g” option
  7. For each newly-presented disk, execute the command `/sbin/scsi_id -g -s /block/{sd_device}` and capture the output.
  8. Copy each disk’s serial number (obtained in the prior step) into the “/etc/udev/rules.d/99-oracle-udev.rules” file (steps 7 through 9 are wrapped into a small scripted sketch following this list)
  9. Edit the “/etc/udev/rules.d/99-oracle-udev.rules” file, ensuring that each serial number has an entry similar to:

    KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="{scsi_id}", NAME="ASM/disk1", OWNER="oracle", GROUP="oinstall", MODE="660"

    The "{scsi_id}" shown above is a variable name: substitute with the values previously captured via the `/sbin/scsi_id` command. The "NAME=" field should be similarly edited to suite and should be unique for each SCSI serial number.

    Note: If attempting to make per disk friendly-names (e.g., “/dev/db1p1”, “/dev/db2p1”, “/dev/frap1”, etc.) it will be necessary to match LUNs by size to appropriate ‘NAME=’ entries

  10. Reboot the system so that the udev service can process the new rule entries
  11. Verify that the desired “/dev/ASM/<NAME>” entries exist
  12. Configure storage-consumers (e.g., “ASM”) to reference the aligned udev-defined device-nodes.
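For what it's worth, steps 7 through 9 lend themselves to a bit of scripting. The sketch below is illustrative only: the device list is a placeholder, and the NAME/OWNER/GROUP values just mirror the example rule shown above - review the output before appending it to the rules file:

#!/bin/bash
# Sketch: emit a udev rule for each newly-presented ASM-candidate disk.
# The device list is a placeholder; review the output before appending it
# to /etc/udev/rules.d/99-oracle-udev.rules.
DISKNUM=1
for DEV in sdb sdc sdd    # placeholder list of ASM-candidate disks
do
   SERIAL=$(/sbin/scsi_id -g -s "/block/${DEV}")
   printf 'KERNEL=="sd*",BUS=="scsi",ENV{ID_SERIAL}=="%s", NAME="ASM/disk%s", OWNER="oracle", GROUP="oinstall", MODE="660"\n' \
      "${SERIAL}" "${DISKNUM}"
   DISKNUM=$((DISKNUM + 1))
done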
If your Linux system has various hardware information tools, udev management interfaces and the sg3tools installed, some tasks for finding information are made much easier and some of the reboot-steps specified in this document become unnecessary.

Friday, July 8, 2011

CLARiiON Report Data Verification

Earlier this year, the organization I work for decided to put into production an enterprise-oriented storage resource management (SRM) system. The tool we bought is actually pretty cool. We install collectors into each of our major data centers and they pull storage utilization data off of all of our storage arrays, SAN switches and storage clients (you know: the Windows and UNIX boxes that use up all that array-based storage). Then, all those collectors pump out the collected data to a reporting server at our main data center. The reporting server is capable of producing all kinds of nifty/pretty reports: configuration snapshots, performance reports, trending reports, utilization profiles, etc.

As cool as all this is, you have the essential problem of "how do I know that the data in all those pretty reports is actually accurate?" Ten or fifteen years ago, when array-based storage was fairly new and storage was still the realm of systems administrators with coding skills, you'd ask your nearest scruffy misanthrope, "could you verify the numbers on this report," and get an answer back within a few hours (and then within minutes each subsequent time you asked). Unfortunately, in the modern, GUI-driven world, asking your storage guys to verify numbers can be like pulling teeth. Many modern storage guys aren't really coders and frequently don't know the quick and easy way to get you hard numbers out of the devices they manage. In some cases, you may watch them cut and paste from the individual arrays' management web UIs into something like Microsoft Calculator. So, you'll have to wait and, oftentimes, you'll have to continually prod them for the data because it's such a pain in the ass for them to produce.

With our SRM rollout, I found myself in just such a situation. Fortunately, I've been doing Unix system administration for the better part of 20 years and, therefore, am rather familiar with scripting. I frequently wish I was able to code in better reporting languages, but I just don't have the time to keep my "real" coding skills up to par. I'm also not terribly patient. So, after waiting a couple weeks for our storage guys to get me the numbers I'd asked for, I said to myself, "screw it: there's gotta be a quicker/better way."

In the case of our CLARiiONs, that better way was to use the NaviCLI (or, these days, the NaviSECCLI). This is a tool set that has been around a looooooong time, in one form or another, and has been available for pretty much any OS that you might attach to a CLARiiON as a storage client. These days, it's a TCP/IP-based commandline tool - prior to NaviCLI, you either had platform-specific tools (IRIX circa 1997 had a CLI-based tool that did queries through the SCSI bus to the array) or you logged directly into the array's RS232 port and used its onboard tools (hopefully, you had a terminal or terminal program that allowed you to capture output) ...but I digress.

If you own EMC equipment, you've hopefully got maintenance contracts that give you rights to download tools and utilities from the EMC support site. NaviCLI is one such tool. Once you install it, you have a nifty little command-line tool that you can wrap inside of scripts. You can create these scripts to handle both provisioning tasks and reporting tasks. My use, in this case, was reporting.

The SRM we bought came with a number of canned-reports - including ones for CLARiiON devices. Unfortunately, the numbers we were getting from our SRM were indicating that we only had about 77TiB on one of our arrays when the EMC order sheets said we should have had about 102TiB. That's a bit of a discrepancy. I was able to wrap some NaviCLI commands into a couple scripts (one that reported on RAID-group capacity and one that reported physical and logical disk capacities [ed.: please note that these scripts are meant to be illustrative of what you can do, but aren't really something you'd want to have as the nexus of your enterprise-reporting strategy. They're slow to run, particularly on larger arrays]) and verify that the 77TiB was sort of right and that the 102TiB was also sorta right. The group capacity script basically just spits out two numbers - total raw capacity and total capacity allocatable to clients (without reporting on how much of either is already allocated to clients). The disk capacity script reports how the disks are organized (e.g., RAID1, RAID5, Spare, etc.) - printing total number of disks in each configuration category and how much raw capacity that represented. Basically, the SRM tool was reporting the maximum number of blocks that were configured into RAID groups, not the total raw physical blocks in the array that we'd thought it was supposed to report.
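
For illustration's sake, the RAID-group script amounted to little more than a wrapper in the following vein. Consider it a rough sketch: the capacity field-names and the 512-byte block size are from memory, so verify them against your own array's `getrg` output, and add whatever credential arguments (or Navisphere security-file setup) your environment requires:

#!/bin/bash
# Rough sketch of the RAID-group capacity roll-up described above. The
# "Raw Capacity (Blocks)"/"Logical Capacity (Blocks)" field names and the
# 512-byte block size are assumptions -- check your own getrg output.
SP_ADDR="$1"    # IP/hostname of one of the array's storage processors

naviseccli -h "${SP_ADDR}" getrg | \
   awk '/Raw Capacity \(Blocks\)/     { raw += $NF }
        /Logical Capacity \(Blocks\)/ { lgc += $NF }
        END {
           printf "Total raw capacity:         %.2f TiB\n", raw * 512 / 1024^4
           printf "Total allocatable capacity: %.2f TiB\n", lgc * 512 / 1024^4
        }'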

Having these numbers in hand allowed us to tear apart the SRM's database queries and tables so that we could see what information it was grabbing, how it was storing/organizing it and how to improve on the vendor-supplied standard reports. Mostly, it consisted of changing the titles of some existing fields and adding some fields to the final report.

Yeah, all of this begs the question "what was the value of buying an SRM when you had to reverse-engineer it to make the presented data meaningful?" To be honest, "I dunno." I guess, at the very least, we bought a framework through which we could put together pretty reports and ones that were more specifically meaningful to us (though, to be honest, I'm a little surprised that we're the only customers of the SRM vendor to have found the canned-reports to be "sadly lacking"). It also gave me an opportunity to give our storage guys a better idea of the powerful tools they had available to them if only they were willing to dabble at the command line (even on Windows).

Still, the vendor did provide a technical resource to help us get things sorted out faster than we might have done without that assistance. So, I guess that's something?

Tuesday, April 19, 2011

Handling Resized LUNs on RHEL 5

Perhaps it's a reflection of it being nearly two years since touching a non-Linux *NIX operating system, but I feel like I recall Solaris, Irix, HP/UX and AIX all handling resized LUNs far more gracefully than Linux currently seems to. I seem to recall that, if you resized a LUN on your storage array and presented it to your *NIX host, it was fairly non-disruptive to make that storage available to the filesystems residing on top of the LUN. Maybe I'm mis-remembering, but I can't help but feel that Linux is still showing its lack of Enterprise-level maturity in this area.

Don't get me wrong: you can get Linux to make use of re-sized LUNs without having to reboot the box. So, that's definitely a good start. But, thus far, I haven't managed to tease loose a method that doesn't require me to unmount a filesystem (and, if I'm using LVM or multipathd, having to disrupt them, as well).

That said, if you've ended up on this page, it's probably because you're trying to figure out "how do I get RedHat to make use of the extra space on the LUN my SAN administrator grew for me" and Google (et al.) sent you here.

In sorting this issue out, it seems like the critically-deficient piece of the puzzle is the way in which Linux updates device geometry. As near as I can tell, it doesn't really notice geometry changes by itself, and, the tools available for making it see geometry changes aren't yet optimized for fully on-the-fly configuration changes. But, at least they do provide a method that saves you the several minutes that a physical host reboot can cost you.

In digging about and playing with my test system, what I've come up with is a workflow something like the following:

  1. Unmount any filesystems residing on the re-configured LUN (`umount`)
  2. Stop any logical volumes that are currently active on the LUN (`vgchange`)
  3. Nuke out any partitions that reference the last block of the LUN's previous geometry (`fdisk` for this)
  4. Tell Linux to re-read the geometry info for the LUN (`blockdev --rereadpt` for this)
  5. Re-create the previously-nuked partition, referencing the new ending-block  (typically `fdisk` - particularly if you have to do block offsets for your SAN solution - for this and add `kpartx` if you're using a multipathing solution)
  6. Grow any logical volume elements containing that re-created/grown partition (`pvresize` for this)
  7. Grow and restart any logical volumes containing that re-created/grown partition (`vgchange` and `lvresize` for this)
  8. Grow and remount any filesystems that were on the re-created/grown partition (`resize2fs` for this - unless you're using an incompatible filesystem-type - and then `mount`)

Omitted from the list above, but something that should be inferred by the clueful reader, is "stop and restart any processes that use the grown LUN" (the `fuser` command is really helpful for this).
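
Strung together for the simple case - a single-path LUN (no dm-multipath, so no `kpartx`) carrying one partition that backs an LVM physical volume with an ext3 logical volume on top - the flow looks roughly like the following. All device, volume-group and mount-point names are placeholders, I've swapped `parted` in for `fdisk` since it's scriptable, and note that the re-created partition must start at exactly the same block as the original (2048s here is only an example) or you'll trash your data:

    umount /app/data                          # 1: get the filesystem out of the way
    vgchange -a n AppVG                       # 2: deactivate the affected volume group
    parted -s /dev/sdc rm 1                   # 3: nuke the partition touching the old end-block
    blockdev --rereadpt /dev/sdc              # 4: have the kernel pick up the LUN's new size
    parted -s /dev/sdc -- mkpart primary 2048s 100%   # 5: re-create it against the new geometry
    pvresize /dev/sdc1                        # 6: grow the PV to fill the bigger partition
    vgchange -a y AppVG                       # 7: reactivate the volume group...
    lvresize -l +100%FREE /dev/AppVG/dataVol  #    ...and hand the new extents to the LV
    e2fsck -f /dev/AppVG/dataVol              # 8: resize2fs insists on a clean, forced fsck first
    resize2fs /dev/AppVG/dataVol              #    grow the (unmounted) ext3 filesystem
    mount /app/data                           #    and put it back into service

If your LVM2 is old enough to not understand "%FREE", just feed `lvresize` an explicit size instead.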

Obviously, if you're using something other than LVM for your logical volume management (e.g., VxVM), the `vgchange`, `pvresize` and `lvresize` commands have to be replaced with the appropriate logical volume management system's equivalent commands.

At any rate, if anyone knows how I can call `blockdev --rereadpt` without needing to stop filesystems (etc.), please comment. I'm really trying to figure out the least disruptive way of accommodating resized LUNs and haven't quite got to where I think I should be able to get.