Tuesday, March 22, 2011

udev Abuse

I've probably mentioned before that I am a lazy systems administrator. So, I tend to like things to be "self-documenting" and I like things to be as consistent as possible across platforms. I particularly like it when a command that's basically common between two operating system versions gives me all of the same kinds of information - particularly if it's information that helps me avoid running multiple other commands.

I've also probably mentioned that, while I've managed a number of different UNIX and UNIX-like operating systems, over the years, the bulk of that has been on Sun systems (not that I prefer Sun systems - I actually always preferred IRIX with AIX a close second). So, I'm used to the Sun way of doing things (and, no, I will never accept that as now being the "Oracle way").

As someone coming from a heavy-Solaris background, I got used to NIC devices being assigned names that reflected the vendor/driver/architecture of the NIC in question. The fact that I could have ten NICs from ten different vendors, each with their own set of capabilities, but all just show up with a NIC device name of ethX, under Linux, always drove me kind of nuts. Yes, I know that I can get the information from other tools (`ethtool`, `kudzu`, looking through "/sys/class/net", etc.) but why should I have to when a crappy OS like Solaris allows me to get all that kind of stuff just by typing `ifconfig -a`?

Fortunately, Linux does provide a way to "fix" this grievous lack of self-documenting output. You just have to mess with the udev device-naming rules. These rules are stored under "/etc/udev/rules.d". In my particular case, I had a system that was equipped with a pair of dual-ported 10Gbps Mellanox Ethernet cards, a pair of Broadcom NetXtreme 10Gbps Ethernet NICs and a quad-port Broadcom card with 1Gbps Ethernet NICs on it. Now, for what I was using the system for, I didn't particularly care about the 1Gbps NICs, but I did care about the 10Gbps NICs. I had specific plans for laying out my system. Even more importantly, once I turned the system over, I didn't want to be pestered by (less Linux-savvy) people about "which device is which kind of NIC." So, I improvised. I created my own rule file, "61-net_custom.rules", to make udev give more self-documenting (Solaris-esque) names to the 10Gbps NICs. Two simple rules:

DRIVER=="bnx2x", NAME="bnx%n"
DRIVER=="mlx4_en", NAME="mlxiv%n"

And my Broadcom 10Gbps NICs started showing up as bnxX devices and my Mellanox 10Gbps NICs started showing up as mlxivX devices in my `ifconfig -a` output. Well... I did have to tell udev to update itself so it would rename the devices, but you get the general idea. Unfortunately, Linux purists (not sure there are such, given how much of a mongrel Linux is) would probably whine about this. Furthering the misfortune is that, because Linux doesn't have standard driver-specific device naming for NICs (e.g., unlike Solaris, where someone sees "ce0" and knows it's the first Cassini 1Gbps Ethernet NIC in the system), the names I've chosen won't necessarily be inherently meaningful. Oh well, that's what a run-book is for, I suppose.
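
For anyone who wants to see what those two rules would do before touching udev, here is a rough Python sketch. It is not part of my actual setup: the function names are made up for illustration, and it assumes the usual sysfs layout where "/sys/class/net/<interface>/device/driver" is a symlink to the driver's directory. The `%n` in a udev rule is the device's kernel number (the 3 in eth3), which the sketch imitates by taking the trailing digits of the interface name.

```python
import os
import re

# Driver-to-prefix mapping, copied from the two udev rules above.
PREFIXES = {"bnx2x": "bnx", "mlx4_en": "mlxiv"}


def driver_of(iface, sysfs_net="/sys/class/net"):
    """Return the kernel driver bound to iface, or None (e.g. for lo)."""
    link = os.path.join(sysfs_net, iface, "device", "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at the driver's directory; its last component
    # is the driver name.
    return os.path.basename(os.readlink(link))


def preview_name(iface, driver):
    """Imitate NAME="<prefix>%n", where %n is the kernel number of the device."""
    prefix = PREFIXES.get(driver)
    match = re.search(r"(\d+)$", iface)
    if prefix is None or match is None:
        return iface  # neither rule matches, so the name is left alone
    return prefix + match.group(1)


def preview_all(sysfs_net="/sys/class/net"):
    """Map each interface under sysfs_net to the name the rules would give it."""
    return {
        iface: preview_name(iface, driver_of(iface, sysfs_net))
        for iface in sorted(os.listdir(sysfs_net))
    }


if __name__ == "__main__":
    for old, new in preview_all().items():
        print(f"{old} -> {new}")
```

This only reads sysfs and prints a preview; it renames nothing. The actual renaming is still udev's job.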

Wednesday, March 9, 2011

Issues With LikeWise Open for UNIX/Linux Active Directory Integration

Recently it came up on management's radar that, "we're starting to get more UNIX systems out in our farms and we need some way to manage logins for those hosts in much the same way we do for our Windows hosts." Previous central login management was done through the traditional tools like NIS or Kerberos (but mostly, just relied on ever-expiring local password tables). In a security-conscious environment, NIS really isn't a responsible choice any longer. Standing up a Kerberos infrastructure just to support UNIX systems is kind of an inefficient use of resources - especially when you already have a serviceable Active Directory infrastructure out there.

Fortunately, in the UNIX world, particularly in the Open Systems (Linux, BSD, etc.) realm, there are a number of choices available for joining a system to Active Directory. Many modern UNIX operating systems include this kind of functionality through Winbind. However, the various UNIX vendors or packagers don't necessarily keep fully up to date with the Winbind version they ship in their OS (looking in your direction, here, Sun Oracle). So, if you're running or looking to run Active Directory 2008, the Winbind included with your UNIX may not be a workable solution. Even if your UNIX does include an up-to-date version of Winbind (but not the bleeding-edge ones that are part of the next-generation Samba project), it may not be up to the task if you have a particularly large or complex Active Directory namespace. In this case, you'll probably be stuck using a commercial offering (or a commercial AD-integration product's "free" version).

We were kind of stuck in the latter boat. Our AD namespaces tend to be rather large and rather complex. So, we started experimenting with the free version of the LikeWise product, LikeWise Open. In the test labs, with the smaller and simpler Active Directory deployment that supports them, we ran into no issues in testing. However, when we tried to take it into production, we ran into some issues in environments that had "messy" Active Directory deployments.

Specifically, the error we were getting (and could never really find a workaround for in the LikeWise Open support forums) was the "LW_ERROR_ENUM_DOMAIN_TRUSTS_FAILED" error. The domain that was giving us fits was a large domain (tens of thousands of users and thousands more groups, server objects and other, miscellaneous AD objects) that consisted of many AD servers scattered around the globe. Further complicating matters was the fact that these ADs were members of multiple domains and therefore had cross-realm/domain-trust components, as well. On top of that, not all of these AD servers, particularly the ones that had trust relationships with other realms, fully agreed on what time it was.

On the down side, the problem and the lack of easily-Googleable solutions cast doubt as to whether we'd be able to use this product in our enterprise. On the plus side, it gave me a chance to do some troubleshooting. I'm one of those masochists who likes a good challenge and digging into the guts of things. So, I did a bunch of online research as well as poring through the LikeWise administration guides.

Older versions of LikeWise used to allow configuration-tweaking through an lsassd.conf file (indeed, other vendors, such as VMware, still have this). The latest iteration of LikeWise, unfortunately, does not. Instead, the makers of LikeWise have decided to implement a Windows-esque "registry" for their product. Dunno why plain text files don't work (or even XML files, for that matter) - they probably just wanted to make things more familiar for Windows admins who might get saddled with bringing those evil Unix boxes into their domains. Whatever. It's painful but not insurmountable. LikeWise provides tools for hacking this file: lwregshell and lw-edit-reg. For me, lw-edit-reg was a more comfortable tool to use. All I had to do was make sure my EDITOR environment variable was set to `vi` and I could hack the file to my heart's content.

At any rate, I fired up lw-edit-reg and began to dig around for likely tweakables. Given that my error mentioned "TRUSTS", I did a global search for any parameters mentioning "TRUSTS". I found the parameter, "DomainManagerIgnoreAllTrusts", and saw that it was set to false (well, the registry equivalent, which was "dword:00000000"). So, I tried changing that to "true" (modifying the value to "dword:00000001"). I then bounced all of the LikeWise processes and re-attempted to join my box to the messy domain. Voilà, it worked and all was happy in UNIX-as-AD-Client land!

Of course, it wasn't until after I'd found my fix through the lw-edit-reg manipulations that I thought to check to see if lwconfig could be used. It seems like there are two variants of the lwconfig command in the LikeWise 6 release family. One supports the "--dump" option (which shows all tunable parameters along with their currently-set values) and one does not (so seeing the available tunables and their current values is a bit less straightforward). At any rate, upon further investigation, I found that I could use lwconfig to make my settings changes. That makes the software installation and configuration a lot more scriptable. I can modify my automated installation policies to do an `lwconfig DomainManagerIgnoreAllTrusts true` operation if it has issues with the normal `domainjoin-cli ...` operation.
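
As a rough idea of what that fallback could look like in an installation script, here is a minimal Python sketch. The function name `join_with_fallback` and the injectable `run` parameter are my own inventions for illustration. The only LikeWise command it hard-codes is the `lwconfig` call quoted above; the join command is whatever `domainjoin-cli ...` invocation you normally use, passed in by the caller.

```python
import subprocess

# The workaround command exactly as quoted in the post.
WORKAROUND = ["lwconfig", "DomainManagerIgnoreAllTrusts", "true"]


def join_with_fallback(join_cmd, run=subprocess.call):
    """Try the normal join; if it fails, apply the workaround and retry once.

    join_cmd is the caller's own domainjoin-cli argument list.
    run is any callable that executes a command list and returns its exit
    status (subprocess.call by default; swappable for dry runs).
    Returns the exit status of the last command executed.
    """
    status = run(join_cmd)
    if status == 0:
        return 0  # the normal join worked, nothing else to do
    status = run(WORKAROUND)
    if status != 0:
        return status  # could not change the setting, so give up
    return run(join_cmd)
```

The sketch treats any non-zero exit from the join as a reason to try the workaround; a more careful script would check for the LW_ERROR_ENUM_DOMAIN_TRUSTS_FAILED error specifically. After the registry edit I restarted the LikeWise processes before re-joining; I haven't established whether that is also needed after an `lwconfig` change, so the sketch leaves it out.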