Wednesday, April 25, 2012

Asymmetrical NIC-bonding

Currently, the organization I am working for is making the transition from 1Gbps networking infrastructure to 10Gbps infrastructure. The initial goal was to migrate all of the high network-IO servers from trunked 1Gbps interfaces to 10Gbps Active/Passive configurations.
Given the current high per-port cost of 10Gbps networking, it was requested that a way be found to not waste 10Gbps ports. Having a 10Gbps port sitting idle "in case" the active port became unavailable was seen as financially wasteful. As a result, we opted to pursue the use of asymmetrical A/P bonds that used our new 10Gbps links for the active/primary path and reused our 1Gbps infrastructure for the passive/failover path.

Setting up bonding on Linux can be fairly trivial. However, when you start to do asymmetrical bonding, you want to ensure that your fastest paths are also your active paths. This requires some additional configuration of the bonded pairs beyond just the basic declaration of the bond memberships.
In a basic bonding setup, you'll have three primary files in the /etc/sysconfig/network-scripts directory: ifcfg-ethX, ifcfg-ethY and ifcfg-bondZ. The ifcfg-ethX and ifcfg-ethY files are basically identical but for their DEVICE and HWADDR parameters. At their most basic, they'll each look (roughly) like:

DEVICE=ethN
HWADDR=AA:BB:CC:DD:EE:FF
ONBOOT=yes
BOOTPROTO=none
MASTER=bondZ
SLAVE=yes

And the (basic) ifcfg-bondZ file will look like:

DEVICE=bondZ
ONBOOT=yes
BOOTPROTO=static
NETMASK=XXX.XXX.XXX.XXX
IPADDR=WWW.XXX.YYY.ZZZ
MASTER=yes
BONDING_OPTS="mode=1 miimon=100"

This type of configuration may produce the results you're looking for, but it's not guaranteed to. If you want to absolutely ensure that your faster NIC will be selected as the primary NIC (and that it will fail back to that NIC in the event that the faster NIC goes offline and then back online), you need to be a bit more explicit with your ifcfg-bondZ file. To do this, you'll mostly want to modify your BONDING_OPTS directive. I also tend to add some BONDING_SLAVEn directives, but that might be overkill. Your new ifcfg-bondZ file that forces the fastest path will look like:

DEVICE=bondZ
ONBOOT=yes
BOOTPROTO=static
NETMASK=XXX.XXX.XXX.XXX
IPADDR=WWW.XXX.YYY.ZZZ
MASTER=yes
BONDING_OPTS="mode=1 miimon=100 primary=ethX primary_reselect=1"



The primary= option tells the bonding driver to set the ethX device as primary when the bonding-group first onlines. The primary_reselect= option tells it to use an interface selection policy of "better".

Note: The default policy is "0" ("always"). This policy simply says "return to the interface declared as primary whenever it comes back up". I choose to override with policy "1" ("better") as a hedge against the primary interface coming back in some kind of degraded state (while most of our 10Gbps media is 10Gbps-only, some of the newer ones are 100/1000/10000). I only want to fail back to the 10Gbps interface if it's still running at 10Gbps and hasn't, for some reason, negotiated down to some slower speed.
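
The bonding driver also accepts symbolic names in place of these numeric values, which can make the intent of the line easier to read later. A sketch of the same BONDING_OPTS directive, assuming your driver version accepts the names (active-backup for mode 1, better for primary_reselect 1):

```shell
# Same options as above, spelled with names instead of numbers.
# ethX is still a placeholder for your 10Gbps interface.
BONDING_OPTS="mode=active-backup miimon=100 primary=ethX primary_reselect=better"
```

If the names are rejected on an older driver, the numeric form shown earlier should be used instead.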

When using the more explicit bonding configuration, the bond's status (as reported in /proc/net/bonding/bondZ) will resemble something like:

Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)



Bonding Mode: fault-tolerance (active-backup)
Primary Slave: ethX (primary_reselect better)
Currently Active Slave: ethX
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0



Slave Interface: ethX
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: AA:BB:CC:DD:EE:FF



Slave Interface: ethY
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: FF:EE:DD:CC:BB:AA
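
If you want a script or monitoring check to confirm that the bond is actually riding the fast link, the status text above can be parsed. The following is only a minimal sketch, assuming the output format shown above; active_slave_speed is a hypothetical helper name and bond0 is an assumed device name:

```shell
#!/bin/sh
# Hypothetical helper: reads bonding status text on stdin and prints
# "<active slave> <speed in Mbps>". Returns non-zero if no active
# slave line is found.
active_slave_speed() {
    awk '
        /^Currently Active Slave:/ { active = $4 }          # e.g. ethX
        /^Slave Interface:/        { current = $3 }         # start of a slave section
        /^Speed:/                  { speed[current] = $2 }  # speed of that slave
        END {
            if (active == "") exit 1
            print active, speed[active]
        }
    '
}

# Usage (bond0 is an assumed device name):
#   active_slave_speed < /proc/net/bonding/bond0
```

A check built on this could alert whenever the reported speed is anything other than 10000, which would catch both a failover to the 1Gbps path and a 10Gbps link that has negotiated down.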
