
Friday, February 8, 2013

The "Which LUN Is That" Game

One of the fun things about SAN storage has to do with its tendency to be multipathed. Different operating systems handle this differently - but most of the ones I've worked with tend to do what I refer to as "ghosting".

"Ghosting" is, by no means, a technical term. It's mostly meant to convey that, when an OS sees a device down multiple paths, it sees the device multiple times. While you may only have one actual chunk of storage presented from your SAN to your host, you might see that chunk 2, 3, 4 or more times in your storage listings. I call these "extra" presences in the output "ghosts".

At any rate, one of the joys associated with ghosting is determining, "am I seeing all of the LUNs I expect to see, and am I seeing each of them an appropriate number of times?" If you're only presenting one chunk of storage down four paths and you see four copies of the LUN show up, it's easy enough to say, "yep: things look right". It's even easy(ish) to verify things when you have multiple chunks of storage of differing sizes: 1) you have PATHS x LUNS storage chunks visible (e.g., two LUNs presented down four paths each should show up as eight entries); 2) each set of chunks is an identifiable size. However, if you're presenting multiple chunks that are the same size, or multiple chunks with differing levels of pathing-redundancy, things get a bit trickier.

One of the things I'd sorta liked about Solaris's STMS drivers was that, once you'd activated the software for a given LUN, the ghosts disappeared into a single meta-device. For better or worse, not everyone used STMS - particularly not on older Solaris releases or in configurations that used things like EMC's PowerPath or VERITAS's DMP software. PowerPath actually kinda made the problem worse: in addition to the normal ghosts, you added a PowerPath meta-device for each group of LUN-paths. This made the output from the `format` command even longer.

All of that aside, how do you easily identify which disks are ghosts of each other and which ones are completely separate LUNs? The most reliable method I've found is looking at the LUNs' serial numbers. If you have eight storage chunks visible, four of which have one serial number and four of which have a different serial number, you know that you've got two LUNs presented to your host and that each is visible down four paths. But how do you get the serial numbers?

Disks' and LUNs' serial numbers are generally found in their SCSI inquiry responses in what's referred to as "code page 83". How you get to that information is highly OS dependent.

  • On Solaris - at least prior to Solaris 10 - you didn't generally have good utilities for pulling serial numbers from LUNs. If you wanted to pull that info, you'd have to fire up the `format` utility in "expert" mode and issue the command-sequence "scsi → inquire" (see the first sketch after this list). By default, this dumps out code page 83 as part of the response, in two parts: a big, multi-line block of hex codes and a smaller, multi-line block of ASCII text. Your disk/LUN serial number is found by ignoring the big, multi-line block of hex values and looking at the third line of the smaller ASCII block.
  • On Linux, they provide you a nice little tool that allows you to directly dump out the target SCSI inquiry code-page. Its default behavior is pretty much to dump just the serial number (actually, the serial number is embedded in a longer string, but if you send that string over to your SAN guys, they'll generally recognize the relevant substring and match it up to what they've presented to your host). The way you dump out that string is to use the command `scsi_id -ugs /block/sdX` (where "sdX" is something like "sda", "sdh", etc.) - see the second sketch after this list.
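
For reference, the Solaris sequence looks roughly like this (a hedged sketch from memory - prompts and menus vary a bit between releases):

format -e                <- start `format` in "expert" mode
(select the disk in question from the menu)
format> scsi
scsi> inquire            <- dumps the inquiry data, code page 83 included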

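On the Linux side, you can wrap `scsi_id` in a quick loop to group the ghosts by serial number. A minimal sketch, assuming the older `scsi_id` syntax above (newer udev releases moved to `scsi_id --whitelisted --device=/dev/sdX`) and that your SAN disks show up as sdX devices:

for dev in /sys/block/sd* ; do
    name=$(basename "$dev")
    # print "serial devicename" pairs; sorting groups the ghosts together
    echo "$(scsi_id -ugs /block/$name) $name"
done | sort

Any serial number that shows up four times in that output is a single LUN you're seeing down four paths.
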
At any rate, multi-pathing software and associated utilities aside, once you've determined which serial numbers correspond to which disk device-nodes, it becomes a trivial exercise to determine "am I seeing all of the LUNs I expect to see" and "are my LUNs presented down the expected number of SAN-fabric paths".

Note: if you're running Solaris 10 or an earlier Solaris release with appropriate storage device management packages installed, you may have access to tools like `prtpicl`, `luxadm` and `fcinfo` with which to pull similarly-useful pathing information.
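
For example (a hedged sketch - which of these tools you actually have depends on your release and installed packages):

fcinfo hba-port          # list your FC HBA ports and their WWNs
luxadm probe             # enumerate the FC devices/paths the host can see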

Wednesday, September 7, 2011

NetBackup with Active Directory Authentication on UNIX Systems

While the specific hosts that I used for this exercise were all RedHat-based, this should work for any UNIX platform onto which both NetBackup 6.0/6.5/7.0/7.1 and Likewise Open are installed.

I'm a big fan of leveraging centralized-authentication services wherever possible. It makes life in a multi-host environment - particularly where hosts can number from the dozens to the thousands - a lot easier when you only have to remember one or two passwords. It's even more valuable in modern security environments where policies require frequent password changes (or, if you've ever been through the whole "we've had a security incident, all the passwords on all of the systems and applications need to be changed, immediately" exercise). Over the years, I've used things like NIS, NIS+, LDAP, Kerberos and Active Directory to do my centralized authentication. If your primary platforms are UNIX-based, NIS, NIS+, LDAP and Kerberos have traditionally been relatively straightforward to set up and use.

I use the caveat of "relatively" because, particularly in the early implementations of each service, things weren't always dead-simple. Right now, we seem to be mid-way through the "easiness" life-cycle of using Active Directory as a centralized authentication source for UNIX operating systems and UNIX-hosted applications. Linux and OS X seem to be leading the charge in the OS space for ease of integration via native tools. There are also a number of third-party vendors out there who provide commercial and free solutions to do it for you. In our enterprise, we chose LikeWise because, at the time, it was the only free option that also worked reasonably well with our large and complex Active Directory implementation. Unfortunately, not all of the makers of software that runs on UNIX hosts seem to have been keeping up on the whole "AD-integration within the UNIX operating environment" front.

My latest pain in the ass, in this arena, is Veritas NetBackup. While Symantec likes to tout the value of NetBackup Access Control (NBAC) in a multi-administrator environment - particularly one where different administrators may have radically different NetBackup skill sets or other differentiating factors - using it in a mixed-platform environment is kind of sucktackular to set up. While modern UNIX systems have the PAM framework to make writing an application's authentication layer relatively trivial, Symantec seems to still be stuck in the pre-PAM era. NBAC's group-lookup components appear to still rely on direct consultation of a server's locally-maintained group files rather than just making a call to the host OS's authentication frameworks.

When I discovered this problem, I opened a support case with Symantec. Unfortunately, their response was "set up a Windows-based authentication broker". My NetBackup environment is almost entirely RedHat-based (actually, unless/until we implement BareMetal Restore (BMR) or other backup modules that require specific OSes be added into the mix, it is entirely RedHat-based). The idea of having to build a Windows server just to act as an authentication broker struck me as a rather stupid way to go about things. It adds yet another server to my environment and, unless I cluster that server, it introduces a single point of failure into an otherwise fairly resilient NetBackup design. I'd designed my NetBackup environment with a virtualized master server (with DRS and SRM supporting it) and multiple media servers for both throughput and redundancy.

We already use LikeWise Open to provide AD-based user and group management services for our Linux and Solaris hosts. When I was first running NetBackup through my engineering process, using the old Java auth.conf method for login management worked like a champ. The Java auth.conf-based system just assumes that any users trying to access the Java UI are users that are managed through /etc/passwd. All you have to do is add the requisite user/rights entries into the auth.conf file, and Java treats AD-provided users the same as it treats locally-managed users. Because of this, I suspected that I could work around Symantec's authorization coding lameness.
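
For illustration, auth.conf entries look something like this (on a standard install the file lives at /usr/openv/java/auth.conf; the "jsmith" user here is hypothetical):

root ADMIN=ALL JBP=ALL
jsmith ADMIN=ALL JBP=ALL

Whether "jsmith" is managed through /etc/passwd or comes from AD via LikeWise makes no difference to the Java UI.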

After a bit of playing around with NBAC, I discovered that, so long as the UNIX group I wanted to map rights to existed in /etc/group, NBAC would see it as a valid, mappable "UNIX PWD" group. I tested by seeing if it would at least let me map the UNIX "wheel" group to one of the NBAC privilege groups. By contrast, even if I could look up a group via getent, if it didn't exist in /etc/group, NBAC would tell me it was an invalid group. Having verified that a group's presence in /etc/group allowed NBAC to use it, I proceeded to use getent to copy my NetBackup-related groups out of Active Directory and into my /etc/group file (all you have to do is a quick `getent group [GROUPNAME] >> /etc/group` and you've populated your /etc/group file).

Unfortunately, I didn't quite have the full groups picture. When I logged in using my AD credentials, I didn't have any of the expected mapped privileges. I remembered that I'd explicitly emptied the userids from the group entries I'd added to my /etc/group file (I'd actually sed'ed the getent output to strip them ...can't remember why, at this point - probably just second-guessing whether the userids belonged in the /etc/group entries). So, I logged out of the Java UI and reran my getents - this time leaving the userids in place. I logged back into the Java UI and this time I had my mapped privileges. Eureka.

Still, I wasn't quite done. I knew that, if I was going to roll this solution into production, I'd have to cron-out a job to keep the /etc/group file up to date with changing AD group memberships. I had noticed, while nuking my group entry out of /etc/group, that only my userid was on the group line - not every member of the group. Ultimately, I tracked it down to LikeWise not doing full group enumeration by default. So, I was going to have to force LikeWise to enumerate each group's membership before running my getents.

I proceeded to dig around in /opt/likewise/bin for likely candidates for forcing the enumeration. After trying several lw*group* commands, I found that doing a `lw-find-group-by-name [ADGROUP]` did the trick. Once that was run, my getents produced fully-populated entries in my /etc/group file. I was then able to map rights to various AD groups, and wrote a cron script to take care of keeping my /etc/group file in sync with Active Directory.
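
The resulting cron script boils down to something like this minimal sketch (the group names are illustrative, and I'm assuming the stock /opt/likewise/bin install path):

#!/bin/sh
# Hypothetical AD groups that get mapped to NBAC privileges
SYNC_GROUPS="nbu_admins nbu_operators"

for grp in $SYNC_GROUPS ; do
    # force LikeWise to fully enumerate the group's membership
    /opt/likewise/bin/lw-find-group-by-name "$grp" > /dev/null
    # drop the stale entry, then append a freshly-populated one
    sed -i "/^${grp}:/d" /etc/group
    getent group "$grp" >> /etc/group
done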

In other words, I was able to get NBAC to work with Active Directory in an all-RedHat environment, with no need to set up a Windows server just to be an authentication broker. Overall, it made for a much lighter-weight, portable solution.

Tuesday, March 22, 2011

udev Abuse

I've probably mentioned before that I am a lazy systems administrator. So, I tend to like things to be "self-documenting" and I like things to be as consistent as possible across platforms. I particularly like it when a command that's basically common between two operating systems gives me all of the same kinds of information - particularly if it's information that helps me avoid running multiple other commands.

I've also probably mentioned that, while I've managed a number of different UNIX and UNIX-like operating systems over the years, the bulk of that time has been on Sun systems (not that I prefer Sun systems - I actually always preferred IRIX, with AIX a close second). So, I'm used to the Sun way of doing things (and, no, I will never accept that as now being the "Oracle way").

As someone coming from a heavy-Solaris background, I got used to NIC devices being assigned names that reflected the vendor/driver/architecture of the NIC in question. The fact that I could have ten NICs from ten different vendors, each with its own set of capabilities, but that they'd all just show up with a NIC device name of ethX under Linux, always drove me kind of nuts. Yes, I know that I can get the information from other tools (`ethtool`, `kudzu`, looking through "/sys/class/net", etc.) - but why should I have to, when a crappy OS like Solaris allows me to get all that kind of stuff just by typing `ifconfig -a`?

Fortunately, Linux does provide a way to "fix" this grievous lack of self-documenting output: you just have to mess with the udev device-naming rules. These rules are stored under "/etc/udev/rules.d". In my particular case, I had a system that was equipped with a pair of dual-ported 10Gbps Mellanox Ethernet cards, a pair of Broadcom NetXtreme 10Gbps Ethernet NICs and a quad-port Broadcom card with 1Gbps Ethernet NICs on it. Now, for what I was using the system for, I didn't particularly care about the 1Gbps NICs, but I did care about the 10Gbps NICs. I had specific plans for laying out my system. Even more importantly, once I turned the system over, I didn't want to be pestered by (less Linux-savvy) people about "which device is which kind of NIC." So, I improvised. I created my own rule file, "61-net_custom.rules", to make udev give more self-documenting (Solaris-esque) names to the 10Gbps NICs. Two simple rules:

DRIVER=="bnx2x", NAME="bnx%n"
DRIVER=="mlx4_en", NAME="mlxiv%n"

And my Broadcom 10Gbps NICs started showing up as bnxX devices and my Mellanox 10Gbps NICs started showing up as mlxivX devices in my `ifconfig -a` output. Well... I did have to tell udev to update itself so it would rename the devices, but you get the general idea. Unfortunately, Linux purists (not sure such purists exist, given how much of a mongrel Linux is) would probably whine about this. Furthering the misfortune: because Linux doesn't have standard driver-specific device naming for NICs (unlike Solaris, where someone sees "ce0" and knows it's the first Cassini 1Gbps Ethernet NIC in the system), the names I've chosen won't necessarily be inherently meaningful to anyone else. Oh well, that's what a run-book is for, I suppose.
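
For completeness, the "tell udev to update itself" part looks roughly like this (a hedged sketch - the exact commands vary by udev version, older releases used `udevcontrol reload_rules`, and the NICs generally need to be down - or the box rebooted - before the renames actually take):

udevadm control --reload-rules           # re-read the rules files
udevadm trigger --subsystem-match=net    # replay events for network devices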