Tuesday, November 30, 2010

Linux (Networking): Oh How I Hate You

Working with most commercial UNIX systems (Solaris, AIX, etc.) and even Windows, you take certain things for granted. Easy networking setup is one of those things. It seems to be particularly so for systems that are designed to work in modern, redundant networks. Setting up things like multi-homed hosts is relatively trivial. I dunno. It may just be that I'm so used to how commercial OSes do things that, when I have to deal with The Linux Way™ it seems hopelessly archaic and picayune.

If I take a Solaris host that's got more than one network address on it, routing's pretty simple. I can declare one default route or I can declare a default route per interface/network. At the end of the day, Solaris's internal routing mechanisms just get it right. The only time there's really any mucking about is if I want/need to set up multiple static routes (or use dynamic routing protocols).

Linux... Well, perhaps it's just the configuration I had to make work. Someone wanted me to get a system with multiple bonded interfaces set up with VLAN tagging to route properly. Having the commercial UNIX mindset, I figured "just declare a GATEWAY in each bonded interface's file under /etc/sysconfig/network-scripts" and that would be the end of it.

Nope. It seems like Linux has a "last default route declared is the default route" design. Ok. I can deal with that. I mean, I used to have to deal with that with commercial UNIXes. So, I figured, "alright, only declare a default route in one interface's script file". And, that sorta worked. I always got that one default route. Unfortunately, Linux's network routing philosophy didn't allow that to fully work the way experience with other OSes might lead one to expect.

On the system I was asked to configure, one of the interfaces happened to be on the same network as the host I was administering it from. It should be noted that this interface is a secondary interface. The host's canonical name points to an IP on a different LAN segment. Prior to configuring the secondary interface on this host, I was able to log into that primary interface with no problems. Unfortunately, adding that secondary interface, which was on the same LAN segment as my administration host, caused problems. The Linux routing saw to it that I could only connect to the secondary interface. I was knocked out of trying to get into the primary interface.

This seemed odd. So, I started to dig around on the Linux host to figure out what the heck was going on. First up, a gander at the routing tables:

# netstat -rnv
Kernel IP routing table
Destination     Gateway          Genmask         Flags   MSS Window  irtt Iface
192.168.2.0     0.0.0.0          255.255.255.0   U         0 0          0 bond1.1002
192.168.33.0    0.0.0.0          255.255.255.0   U         0 0          0 bond0.1033
169.254.0.0     0.0.0.0          255.255.0.0     U         0 0          0 bond1.1002
0.0.0.0         192.168.33.254   0.0.0.0         UG        0 0          0 bond0.1033

Hmm... Not quite what I'm used to seeing. On a Solaris system, I'd expect something more along the lines of:

IRE Table: IPv4
  Destination             Mask           Gateway          Device   Mxfrg Rtt   Ref Flg  Out  In/Fwd
-------------------- ---------------  -------------------- ------  ----- ----- --- --- ----- ------
default              0.0.0.0          192.168.8.254                1500*     0   1 UG    1836      0
192.168.8.0          255.255.255.0    192.168.8.77         ce1     1500*     0   1 U      620      0
192.168.11.0         255.255.255.0    192.168.11.222       ce0     1500*     0   1 U        0      0
224.0.0.0            240.0.0.0        192.168.8.77         ce1     1500*     0   1 U        0      0
127.0.0.1            255.255.255.255  127.0.0.1            lo0     8232*     0   1 UH   13292      0

Yeah yeah, not identically formatted output, but similar enough that things on the Linux host don't look right if what you're used to seeing is the Solaris system's way of setting up routing. On a Solaris host, network destinations (i.e., "192.168.8.0" and "192.168.11.0" in the above example) get routed through an IP address on a NIC. On Linux, however, it seemed like all of the network routes (i.e., "192.168.2.0" and "192.168.33.0") were configured to go through whatever the default route was. (For what it's worth, a Gateway of 0.0.0.0 in the Linux output just means the network is directly attached to the listed interface, with no gateway in between.)

Now, what `netstat -rnv` is showing for interface network routes may not be strictly representative of what Linux is actually doing, but both what Linux is doing and how it's presented are wrong - particularly if there are firewalls between you and the multi-homed Linux host. The above output is kind of a sloppy representation of Linux's symmetrical routing philosophy. Unfortunately, because of the way Linux routes, if I have a configuration where the multi-homed Linux host has two IP addresses - 192.168.2.123 and 192.168.33.123 - and I'm connecting from a host with an address of 192.168.2.76 but am trying to connect to the Linux host's 192.168.33.123 address, my connection attempt times out. While Linux may, in fact, receive my connection request at the 192.168.33.123 address, its default routing behavior seems to be to send the reply back out through its 192.168.2.123 address - ostensibly because the host I'm connecting from is on the same segment as the Linux host's 192.168.2.123 address.

Given my background, my first thought is "make the Linux routing table look like the routing tables you're more used to". Linux is nice enough to let me do what seem to be the right `route add` statements. However, it doesn't allow me to nuke the seemingly bogus network routes pointing at 0.0.0.0.

Ok, apparently I'm in for some kind of fight. I've gotta sort out the differences in routing philosophy between my experience and what the writers of the Linux networking stack had in mind. Fortunately, I have a good friend (named "Google") who's generally pretty good at getting my back. It's with Google's help that I discover that this kind of routing problem is handled through Linux's "advanced routing" functionality. I don't really quibble about what's so "advanced" about sending a response packet back out the same interface that the request packet came in on. I just kinda shrug and chalk it up to differences in implementation philosophy. It does, however, leave me with the question of, "how do I solve this difference of philosophy?"

Apparently, I have to muck about with files that I don't have to on either single-homed Linux systems or multi-homed commercial UNIX systems. I have to configure additional routing tables so that I can set up policy routing. Ok, so, I'm starting to scratch my head here. By itself, this isn't inherently horrible. However, it's not one of those topics that seems to come up a lot. It's neither well-documented in Linux nor do many relevant hits get returned by Google. Thus, I'm left to take what hits I do find and start experimenting. Ultimately, I found that I had to set up five files (in addition to the normal /etc/sysconfig/network-scripts/ifcfg-* files) to get things working as I think they ought:

/etc/iproute2/rt_tables
/etc/sysconfig/network-scripts/route-bond0.1033
/etc/sysconfig/network-scripts/route-bond1.1002
/etc/sysconfig/network-scripts/rule-bond0.1033
/etc/sysconfig/network-scripts/rule-bond1.1002

Using the "/etc/iproute2/rt_tables" file is kind of optional. Mostly, it lets me assign logical/friendly names to the extra routing tables I need to set up. I like friendly names. They're easier to remember and can be more self-documenting than plain-Jane numeric IDs. So, I edit "/etc/iproute2/rt_tables" and add two lines:

2         net1002
33        net1033
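For context, here is a sketch of what the whole file might look like afterwards. It assumes the stock entries that iproute2 normally ships in this file (your distribution's copy may differ slightly); only the last two lines are the additions described above.

```
#
# reserved values
#
255	local
254	main
253	default
0	unspec
#
# local
#
#1	inr.ruhep
2	net1002
33	net1033
```

The entries under "reserved values" are the table IDs that are already spoken for, so new tables have to be numbered around them.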

I should probably note that what I actually wanted to add was:

1002      net1002
1033      net1033

I wanted these aliases as they would be reflective of what my 802.1q VLAN IDs were. Unfortunately, Linux seems to limit the range of numbers you can allocate table IDs out of. Worse, there are reserved and semi-reserved IDs in that limited range, further limiting your pool of table ID candidates. So, creating an enterprise-deployable standard config file might not be practical on a network with lots of VLANs, subnets, etc. Friendly names set up, I then had to set up the per "NIC" routing and rules files.

I set up the "/etc/sysconfig/network-scripts/route-${DEV}.${VLAN}" files with two routing statements each:

table ${TABLENAME} to ${NETBASE}/${CIDR} dev ${DEV}.${VLAN}
table ${TABLENAME} to default via ${GATEWAY} dev ${DEV}.${VLAN}

This might actually be overkill. I might only need one of those lines. However, it was late in the day and I was tired of experimenting. They worked, so, that was good enough. Sometimes, a sledgehammer's a fine substitute for a scalpel.
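Filled in with the values from the Linux routing table shown earlier (network 192.168.33.0/24, gateway 192.168.33.254, device bond0.1033), the template would produce something like the following for "/etc/sysconfig/network-scripts/route-bond0.1033". Treat it as a sketch rather than a copy of the actual file:

```
table net1033 to 192.168.33.0/24 dev bond0.1033
table net1033 to default via 192.168.33.254 dev bond0.1033
```

The first line puts the directly attached network into the net1033 table and the second gives that table its own default gateway. Keeping both is probably the safer choice: with only the default line, traffic sourced from that interface's address and headed for its own subnet would likely be handed to the gateway instead of going straight out onto the wire.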

Lastly, I set up the "/etc/sysconfig/network-scripts/rule-${DEV}.${VLAN}" files with a single rule, each:

from ${NETBASE}/${CIDR} table ${TABLENAME} priority ${NUM}

Again, the priority values I picked may be suboptimal (I set the net1002 priority to "2" and the net1033 priority to "33"). But, since they worked, I left them at those values.
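Since the route and rule files differ only in a handful of values per VLAN, generating them could be scripted. The sketch below is one way to do that; the function names `render_route_file` and `render_rule_file` are made up for illustration, and the argument order is an arbitrary choice, not anything the network scripts require.

```shell
#!/bin/sh
# Hypothetical helpers that print the route-* and rule-* file contents
# described above. Nothing is written to disk; redirect the output yourself.

# render_route_file TABLENAME NETBASE/CIDR GATEWAY DEVICE
render_route_file() {
    # Directly attached network, placed in the per-VLAN table
    printf 'table %s to %s dev %s\n' "$1" "$2" "$4"
    # Default gateway for that same table
    printf 'table %s to default via %s dev %s\n' "$1" "$3" "$4"
}

# render_rule_file TABLENAME NETBASE/CIDR PRIORITY
render_rule_file() {
    # Traffic sourced from this network consults the per-VLAN table
    printf 'from %s table %s priority %s\n' "$2" "$1" "$3"
}
```

For example, `render_rule_file net1033 192.168.33.0/24 33 > /etc/sysconfig/network-scripts/rule-bond0.1033` would write the single rule line for the bond0.1033 interface, and `render_route_file net1033 192.168.33.0/24 192.168.33.254 bond0.1033` prints the matching two-line route file.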

I did a `service network restart` and was able to access my multi-homed Linux host by either IP address from my workstation. Just to be super-safe, I bounced the host (to shake out anything that might have been left in place by my experiments). When the box came back from the reboot, I was still able to access it through either IP address from my workstation.
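For anyone wanting to confirm the policy routing actually took rather than just trusting a successful login, the `ip` tool can show the rules and the extra tables directly. The wrapper function name below is hypothetical; the table names and addresses are the ones used in this post, and the commands need to be run on the multi-homed host itself.

```shell
#!/bin/sh
# Hypothetical helper: dump the policy routing state set up in this post.
show_policy_routing() {
    # The rules list should include "from <network> lookup <table>" entries
    ip rule show
    # Contents of the two per-VLAN tables
    ip route show table net1002
    ip route show table net1033
    # Ask the kernel how a reply from the primary address to the admin
    # workstation would be routed; the hope is that it names bond0.1033
    ip route get 192.168.2.76 from 192.168.33.123
}
```

Calling `show_policy_routing` after the network restart (and again after the reboot) gives a quick before/after comparison against the plain `netstat -rnv` output, which only shows the main table.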
