Friday, November 12, 2010

NetApp as Block Storage (fibrechannel) for RedHat Linux

I work for an enterprise that's starting to go down the Linux path. This has mostly come about because they're also heavily pursuing virtualization technologies. It's part of an overall effort to make their IT more efficient by increasing the utilization of server resources and reducing the sheer size of the physical infrastructure.

The enterprise I work for is heavily Windows-oriented. Most of their "UNIX" people are converts from Windows - frequently converts of necessity or managerial dictate rather than choice. As a UNIX user since 1989, I'm one of the few "true" UNIX geeks they have. So, I frequently get tasked with making technologies work with their UNIX platforms.

Most recently, there's been an effort that's required the deployment of a few RedHat-based physical servers. The applications being deployed on these servers require block-based storage. iSCSI isn't really an option in this environment, as they are just now starting to get into 10Gbps Ethernet deployments (and have no interest in deploying dedicated iSCSI networking infrastructure). The primary block mode storage platforms in use by our enterprise are solutions from EMC and, to a much lesser extent, NetApp.

The first Linux project to be rolled out on fibrechannel storage will be using NetApp. So, I had to go through the joys of putting together a "How To" guide. While my background is heavy in UNIX, most of my career has been spent on Solaris, AIX and IRIX. Linux has only started appearing in my professional life within the last 24 months or so. It's changed a lot since I first used it back in the early- to mid-90s. Given my relative thinness in Linux, Google (and the storage systems' vendor docs) have been my boon companions.

I'm writing this page mostly so I have a crib-sheet for later use. Delivered documentation seems to have a habit of becoming lost by our documentation custodians.

To start off with, the storage engineers wanted to make sure that there were several items in place for this solution:

  • Up-to-date QLogic drivers
  • Availability of the QLogic SANsurfer utilities
  • Ability to use the Linux native multipathing solution

QLogic HBA Drivers

Ultimately, I had three choices for the QLogic drivers: use the ones that come with RedHat Enterprise Linux 5; use ones furnished by our hardware vendor in their driver and utilities "support pack"; or, use the ones from QLogic.

Some parties wanted to use the ones supplied by our hardware vendor. This made a certain sense, since they needed other components installed from the "support pack". The idea was that, by going with the "support pack", 100%, for firmware, drivers and utilities, we'd have a "known quantity" of compatibility-tested software. That decision lasted less than two business days. The QLogic (and some other) drivers supplied as part of the "support pack" were distributed in source-code form. Our security requirements meant that deploying such software would require us to maintain externally compiled versions of those drivers. While not inherently problematic, it was unwieldy. Further complicating things, using those binaries would require additional coordination with components already installed as part of the core OS.

As noted previously, Linux, in our enterprise, has primarily been slated for virtualized efforts. As such, the primary build versions for Linux were optimized for virtual hardware. A lot of the supporting components for physical deployments were stripped out at the behest of security. It quickly became apparent that maintaining this software would be an odious task, and none of us wanted to be on the hook for it. So, we opted not to use the source-code-derived drivers from the "support pack".

We also discovered that the driver packages offered by QLogic were in similar source-form. So, they, too, were disqualified from consideration.

Fortunately, RedHat Enterprise Linux 5.x comes with the drivers needed for the QLogic HBAs we use. Thus, we chose the path of least resistance: use the RedHat drivers. All I had to do was document which driver revisions were included with each RHEL release and patch-level in use in our farms. An annoying task, but fairly easy to knock out, given how early we are in our Linux deployment.
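
If you need to capture those versions for your own documentation, it's a quick check on each build. A minimal sketch, run against whichever kernel/patch-level you're certifying:

     # Report the qla2xxx driver version bundled with the running kernel
     modinfo -F version qla2xxx

     # ...and note which kernel release it belongs to
     uname -r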

SANsurfer

We also had a choice of suppliers for the SANsurfer utilities. We could use the ones straight from QLogic or the ones from our hardware vendor. We chose the latter, primarily because our support contracts are with the hardware vendor, not QLogic. It was assumed that, should we run into issues, support would be more easily obtained from a company we have direct support contracts with than from one we don't. Fortunately, both our hardware vendor and QLogic provide their utilities in nice, standalone install bundles. We didn't have to worry about hassling with compiling tools or dependency tracking.

Linux MultiPath and NetApp FC Storage

RedHat Enterprise Linux includes a fairly serviceable native multipathing solution (device-mapper multipath). It's extensible through plugins that provide multipathing policy and management modules.

In order to allow the native multipathing drivers to work with NetApp LUNs, I had to grab the NetApp FCP Host Utilities Kit from NetApp's NOW site. Interestingly, NetApp seems to be one of the few vendors that hasn't "unified" their version numbering across supported platforms. Windows, Solaris, ESX and Linux all seem to be at different 5.x levels (with the latest Linux version being 5.3).
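
Getting the kit onto a box is nothing more than a single RPM transaction. A minimal sketch, assuming you've already pulled the kit down from NOW (the package filename below is illustrative - substitute whatever the current 5.x bundle is actually called):

     # Install the Host Utilities kit downloaded from the NOW site
     # (illustrative filename - use the real 5.x package name)
     rpm -ivh netapp_linux_host_utilities-5-3.noarch.rpm

     # Confirm what landed on the box
     rpm -qa | grep -i netapp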

I'll give NetApp credit: their documentation for the utilities was pretty straightforward and the software bundle was trivial to install. All I had to do was grab the RPM from the NOW site and install it on the test box (as sketched above). If I had any complaint, it's that they didn't include a sample /etc/multipath.conf file in the RPM itself. Fortunately, they did include one in the installation PDFs. So, I did a quick copy-and-paste into a PuTTY vi session, cleaned up the file and saved it off for inclusion in an automated provisioning policy. Basically, the file looks like:

defaults {
        user_friendly_names     yes
        max_fds                 max
        queue_without_daemon    no
        flush_on_last_del       yes
        }

blacklist {
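        # "DevId" below is a placeholder from the NetApp sample - replace it with
        # the WWID of any local (non-SAN) disk you want excluded from multipathing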
        wwid DevId
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
        }

devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_ontap /dev/%n"
                features                "1 queue_if_no_path"
                hardware_handler        "0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_weight               uniform
                rr_min_io               128
                path_checker            directio
                }
        }

So, if you found this page via an internet search, the above is all you should need to get the Linux native multipathing service working with the NetApp FCP Host Utilities against NetApp arrays.
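
If you want to sanity-check the file before turning anything on, multipath has a dry-run mode that parses the config and reports what it would build without actually creating any maps:

     # Make sure the multipath kernel module is loaded
     modprobe dm-multipath

     # Parse /etc/multipath.conf and show the maps that would be created,
     # without actually creating them
     multipath -v2 -d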

Once you have the above /etc/multipath.conf file in place, just start up the multipath daemon and set it up to restart at system boot. For RHEL 5.x, this is just a matter of executing:

     # service multipathd start
     # chkconfig multipathd on

Once this service is started, any LUNs you present to your RedHat box over your fabric will be path-managed.

LUNs "On the Fly"

One of the things you take for granted using OSes like Solaris, AIX and IRIX is the inclusion of tools that facilitate the addition, modification and deletion of hardware "on the fly". Now, there are utilities you can install onto a Linux system, but there doesn't yet seem to be a core component that's the equivalent of Solaris's devfsadm. Instead, the only really universally available method for achieving similar results is to do:

     # echo "- - -" > /sys/class/scsi_host/hostN/scan

Do the above after your friendly neighborhood SAN administrator notifies you that your LUNs have been presented to your system, and you'll see new storage devices show up in your `fdisk -l` output (without having to reboot!). It's worth noting that, even with the NetApp FCP Host Utilities installed, if your LUNs are visible over multiple fibrechannel paths, you'll see a /dev/sdX entry for each and every path to the device. So, if you have a LUN visible to your system through two HBAs and two SAN fabrics, you'll see four /dev/sdX devices for every "real" LUN presented.
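
If the box has more than one HBA port, it's easier to hit every SCSI host in one pass than to remember which hostN maps to which port. A quick loop (run as root) does it:

     # Rescan every SCSI host the kernel knows about (covers all HBA ports)
     for SCAN in /sys/class/scsi_host/host*/scan
     do
        echo "- - -" > "${SCAN}"
     done

     # New LUN paths show up as additional /dev/sdX devices
     fdisk -l 2>/dev/null | grep '^Disk /dev/sd'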

Multipath Magic!

If you want to verify that the multipathing service is seeing things correctly, execute `multipath -ll`. This will yield output similar to the following:

mpath0 (360a9800043346d364a4a2f41592d5849) dm-7 NETAPP,LUN
[size=20G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0   [active][undef]
 \_ 0:0:1:1 sdb 8:16  [active][undef]
 \_ 1:0:0:1 sdc 8:32  [active][undef]
 \_ 1:0:1:1 sdd 8:48  [active][undef]

 

If you want to verify it by using the utilities included in the NetApp FCP Host Utilities, execute `sanlun lun show -p all`. This will yield output similar to the following:

filer:/vol/fcclient/lun1 (LUN 1)          Lun state: GOOD
Lun Size:     20g (21474836480)  Controller_CF_State: Cluster Disabled
Protocol: FCP           Controller Partner:
DM-MP DevName: mpath0   (360a9800043346d364a4a2f41592d5849)     dm-7
Multipath-provider: NATIVE
--------- ---------- ------- ------------ --------------------------------------------- ---------------
   sanlun Controller                                                            Primary         Partner
     path       Path   /dev/         Host                                    Controller      Controller
    state       type    node          HBA                                          port            port
--------- ---------- ------- ------------ --------------------------------------------- ---------------
     GOOD  primary       sda        host0                                            0c              --
     GOOD  primary       sdb        host0                                            0d              --
     GOOD  primary       sdc        host1                                            0b              --
     GOOD  primary       sdd        host1                                            0a              --

 

In both the multipath and sanlun output, there will be two fields worth noting: the device name and the device ID. In the above output, the device name is "mpath0" and the device ID is "360a9800043346d364a4a2f41592d5849". In the Linux device tree, "mpath0" can be found at "/dev/mapper/mpath0"; the device ID can be found at "/dev/disk/by-id/360a9800043346d364a4a2f41592d5849". Use the "/dev/mapper" entries in your "/etc/fstab" and when you run `mkfs`.
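
For example (the filesystem type and mount point below are just illustrative - use whatever your standards call for):

     # Build the filesystem against the multipathed device,
     # never against an individual /dev/sdX path
     mkfs -t ext3 /dev/mapper/mpath0

     # Mount it - "/data" is just an example mount point
     mkdir -p /data
     mount /dev/mapper/mpath0 /data

     # Matching /etc/fstab entry:
     # /dev/mapper/mpath0    /data    ext3    defaults    1 2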

If you're feeling really enterprising, you can verify that multipathing works by downing FC paths in your SAN and verifying that the Linux host detects the failures and reacts appropriately.
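
You can also fail a single path from the host side without bugging the SAN team. A minimal sketch, assuming sdb is one of the paths under mpath0 (substitute your own device and map names):

     # Take one underlying path offline - sdb is just an example path device
     echo offline > /sys/block/sdb/device/state

     # The downed path should drop out of the [active] list while I/O keeps flowing
     multipath -ll mpath0

     # Bring the path back when the test is done
     echo running > /sys/block/sdb/device/state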
