Wednesday, November 17, 2010

Linux Multipath Path-Failure Simulation

Previously, I discussed how to set up the RedHat Linux storage multipathing software to manage fibrechannel-based storage devices. I didn't, however, cover how one tests that such a configuration is working as intended.
When it comes to testing resilient configurations, one typically has a number of testing options. In a fibrechannel fabric situation, one can do any of:
  • Offline a LUN within the array
  • Down an array's storage processors and/or HBAs
  • Pull the fibrechannel connection between the array and the fibrechannel switching infrastructure
  • Shut off a switch (or switches) in a fabric
  • Pull connections between switches in a fabric
  • Pull the fibrechannel connection between the fibrechannel switching infrastructure and the storage-consuming host system
  • Disable paths/ports within a fibrechannel switch
  • Disable HBAs on the storage-consuming host systems
  • Disable particular storage targets within the storage-consuming host systems
Generally, I favor approaches that limit the impact of the tested scenario as much as possible and that limit the likelihood of introducing actual/lasting breakage into the tested configuration.
I also tend to favor approaches where I have as much control of the testing scenario as possible. I'm an impatient person, and having to coordinate outages and administrative events with other infrastructure groups and various "stakeholders" can be a tiresome, protracted chore. Some would say that indicates I'm not a team player; I like to think that I just prefer to get things done efficiently and as quickly as possible. Tomayto/tomahto.
Through most of my IT career, I've worked primarily on the server side of the house (Solaris, AIX, Linux ...and even - *ech* - Windows) - whether as a systems administrator or as an integrator. So, my testing approaches tend to be oriented from the storage-consumer's view of the world. If I don't want to have to coordinate outside of a server's ownership/management team, I'm pretty much limited to the last three items on the above list: yanking cables from the server's HBAs, disabling the server's HBAs and disabling storage targets within the server.
Going back to the avoidance of "introducing actual/lasting breakage", I tend to like to avoid yanking cables. At the end of the day, you never know if the person doing the monkey-work of pulling the cable is going to do it like a surgeon or like a gorilla. I've, unfortunately, been burned by run-ins with more than a few gorillas. So, if I don't have to have cables physically disconnected, I avoid it.
Being able to logically disable an HBA is a nice test scenario: it effects exactly the kind of path failure you're hoping to test. Unfortunately, not all HBA manufacturers include the ability to logically disable the HBA from within their management utilities. Within commercial UNIX variants - like Solaris or AIX - this hasn't often proven to be a problem. Under Linux, however, the ability to logically disable HBAs from within their management utilities seems to be a bit "spotty".
Luckily, where the HBA manufacturers sometimes leave me in the lurch, RedHat Linux leaves me some alternatives. In the spirit of Linux DIYism, those alternatives aren't really all that fun to deal with ...until you write yourself tools that remove some of the pain. I wrote two tools to help myself in this area: one that offlines designated storage paths and one that attempts to restore those downed storage paths.
Linux makes it possible to change the system-perceived state of a given device path by writing the word "offline" to the file /sys/block/${DEV}/device/state. Thus, were one to want to make the OS think that the path to /dev/sdg was down, one would execute the command `echo "offline" > /sys/block/sdg/device/state`. All my path-downing script does is make it so you can down a given /dev/sdX device by executing `pathdown.sh <DEVICE>` (e.g., `pathdown.sh sdg`). There's minimal logic built in to verify that the named /dev/sdX device is a real, downable device, and it provides a post-action status of that device, but, other than that, it's a pretty simple script.
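For illustration, a minimal sketch of such a script might look like the following (the exact error-checking here is a bare-bones assumption on my part, not a copy of the real script):

#!/bin/sh
# pathdown.sh - offline a designated SCSI device path via sysfs
# Usage: pathdown.sh <device> (e.g., pathdown.sh sdg)

DEV=$1
STATEFILE="/sys/block/${DEV}/device/state"

if [ -z "${DEV}" ]; then
   echo "Usage: $(basename $0) <device>" >&2
   exit 1
fi

# Minimal check that the named device is a real, downable device
if [ ! -w "${STATEFILE}" ]; then
   echo "ERROR: ${DEV} is not a known (or downable) SCSI device" >&2
   exit 1
fi

# Offline the path, then report the post-action status
echo "offline" > "${STATEFILE}"
echo "${DEV} is now in state: $(cat ${STATEFILE})"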
To decide which path one wants to down, it's expected that the tester will look at the multipather's view of its managed devices using `multipath -l <DEVICE>` (e.g., `multipath -l mpath0`). This command will produce output similar to the following:
mpath0 (360a9800043346d364a4a2f41592d5849) dm-7 NETAPP,LUN
[size=20G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0   [active][undef]
 \_ 0:0:1:1 sdb 8:16  [active][undef]
 \_ 1:0:0:1 sdc 8:32  [active][undef]
 \_ 1:0:1:1 sdd 8:48  [active][undef]
Thus, if one wanted to deactivate one of the channels in the mpath0 multipathing group, one might issue the command `pathdown.sh sdb`. This would result in the path associated with /dev/sdb being taken offline. After taking this action, the output of `multipath -l mpath0` would change to:
mpath0 (360a9800043346d364a4a2f41592d5849) dm-7 NETAPP,LUN
[size=20G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:1 sda 8:0   [active][undef]
 \_ 0:0:1:1 sdb 8:16  [failed][faulty]
 \_ 1:0:0:1 sdc 8:32  [active][undef]
 \_ 1:0:1:1 sdd 8:48  [active][undef]
Typically, when doing such testing, one would be performing a large file operation against the disk device (preferably a write operation). My usual test sequence (a scripted sketch follows the list) is to:
  1. Start an `iostat` job, grepping for the devices I'm interested in, and capturing the output to a file
  2. Start up a file transfer (or even just a `dd` operation) into the device
  3. Start downing paths as the transfer occurs
  4. Wait for the transfer to complete, then kill the `iostat` job
  5. Review the captured output from the `iostat` job to ensure that the I/O behaviors I expected to see actually occurred
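Scripted, that sequence might look something like the following sketch (the device names, mount point and transfer size are illustrative placeholders, not values from any particular environment):

#!/bin/sh
# Rough sketch of the test sequence described above

# 1. Start an iostat job, grepping for the paths of interest and
#    capturing the output to a file
iostat -x 1 | grep -E '^sd[a-d] ' > /tmp/iostat.out &

# 2. Start up a large write into the multipath-managed device
dd if=/dev/zero of=/mnt/mpath0/testfile bs=1M count=4096 &
DD_PID=$!

# 3. Start downing paths as the transfer occurs
sleep 10
./pathdown.sh sdb

# 4. Wait for the transfer to complete, then kill the iostat job
#    (pkill is a blunt instrument; fine for a disposable test script)
wait ${DD_PID}
pkill iostat

# 5. Review the captured output for the expected I/O behaviors
less /tmp/iostat.out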
In the testing environment I had available when I wrote this page, I was using a NetApp filer presenting block-mode storage via fibrechannel. The NetApp multipathing plugin supports concurrent, multi-channel operations to the target LUN. Thus, the output from my `iostat` job will show uniform I/Os across all paths to the LUN, and then show the traffic drop to zero on each path that I offline. Were I using an array that only supported Active/Passive I/O operations, I would expect to see the traffic move from the downed path to one of the failover paths, instead.
So, great: you've tested that your multipathing system behaves as expected. However, once you've completed that testing, all of the paths that you've offlined have stayed offline. What to do about it?
The simplest method is to reboot the system. However, I abhor knocking my systems' `uptime` if I don't absolutely have to. Fortunately, much as Linux provides the means to offline paths, it provides the means to revive them (or, more accurately, to tell the OS, "hey, go check these paths and see if they're online again"). As with offlining paths, the methods for doing so aren't currently built into any OS-provided utilities. What you have to do is:
  1. Tell the OS to delete the device paths
  2. Tell the OS to rescan the HBAs for devices it doesn't currently know about
  3. Tell the multipath service to look for changes to paths associated with managed devices
The way you tell the OS to (gracefully) delete device paths is to write a value to a file. Specifically, one writes the value "1" to the file /sys/block/${DEV}/device/delete. Thus, if one is trying to get the system to clean up the downed device path /dev/sdb, one would issue the command `echo "1" > /sys/block/sdb/device/delete`.
The way you tell the OS to rescan the HBAs is to issue the command `echo "- - -" > /sys/class/scsi_host/${HBA}/scan`. In Linux, the HBAs are numbered in the order found and named "hostN" ("host0", "host1", etc.). Thus, to rescan HBA 0, one would issue the command `echo "- - -" > /sys/class/scsi_host/host0/scan` (for good measure, rescan all the HBAs).
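A short loop makes the rescan-everything case painless (this assumes the stock sysfs layout described above):

# Rescan every HBA the system knows about
for HBA in /sys/class/scsi_host/host*; do
   echo "- - -" > "${HBA}/scan"
done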
The way to tell the multipath service to look for changes to paths associated with managed devices is to issue the command `multipath` (I actually use `multipath -v2`, because I like the more-verbose output that tells me what did or didn't happen as a result of the command). Granted, the multipath service periodically rescans the devices it manages to find state-change information, but I don't like to wait for systems to "get around to it".
All my path-fixing script does is roll up the above three steps into one easy-to-remember command.
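A bare-bones sketch of such a wrapper might look like this (I'm calling it `pathfix.sh` purely for illustration; it simply strings together the three steps above):

#!/bin/sh
# pathfix.sh - attempt to restore a downed SCSI device path
# Usage: pathfix.sh <device> (e.g., pathfix.sh sdb)

DEV=$1

if [ ! -e "/sys/block/${DEV}/device/delete" ]; then
   echo "ERROR: ${DEV} is not a known SCSI device" >&2
   exit 1
fi

# 1. Tell the OS to (gracefully) delete the downed device path
echo "1" > "/sys/block/${DEV}/device/delete"

# 2. Tell the OS to rescan the HBAs for devices it doesn't know about
for HBA in /sys/class/scsi_host/host*; do
   echo "- - -" > "${HBA}/scan"
done

# 3. Tell the multipath service to look for changes to managed paths
multipath -v2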
Depending on how you have logging configured on your system, the results of all the path offlining and restoration will be logged. Both the SCSI subsystem and the multipathing daemon should log events. Thus, you can verify the results of your activities by looking in your system logs.
That said, if the system you're testing is hooked up to an enterprise monitoring system, you will want to let your monitoring groups know that they need to ignore the red flags you'll be generating on their monitoring dashboards.
