My July 19, 2009 entry, Linux Ethernet Bonding for Switch Fail-Over, talks about two switches connected via a valid Inter-Switch Link (ISL). What does that mean?
In order to answer that seemingly simple question, we need to answer another question: "How do switches fail-over?"
Well, actually, they don't. Or at least, not exactly.
Switches serve a more physical function than things like databases or firewalls that we often associate with failover. This is because servers and other devices are physically plugged into switches, often as their only connection to the network. Because of this, switches can't fail-over in the way that, for example, a Cisco ASA firewall can fail-over. If a firewall fails, its backup unit can seamlessly take over traffic. But if a switch fails, there is no seamless way to unplug connected devices and plug them into a different switch. (Hmmm, perhaps a business opportunity for a robotics startup? Nah...)
But we can create redundancy at the switching layer of the network. This redundancy is generally accomplished through two mechanisms: host distribution and trunking.
Host distribution means ensuring that hosts are connected to multiple switches. In other words, if you have six web servers, all six of them shouldn't be connected to the same switch. Trunking means allowing switches to pass ethernet traffic to each other, regardless of which VLAN that traffic belongs to.
The best way to achieve host distribution varies. The least expensive way is to simply divide your hosts in half, connecting one half to one switch and the other half to another. This is simple and cheap, but it has a couple of drawbacks. First, if a switch does fail, you lose half of your capacity. Your site might still be up, but unless you were running at 50% of capacity or less, you may be serving packets slower than you would like. And second, if a switch fails you'll need to fail over any monolithic services, like databases, which could have survived the switch failure if they had an alternate connection.
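The 50% figure is just arithmetic, and a few lines of Python make it concrete. This is a minimal sketch, assuming hosts are spread evenly across switches and that the surviving hosts absorb all of the load evenly; the function name and parameters are mine, purely for illustration.

```python
def utilization_after_failure(utilization, switches=2, failed=1):
    """Per-host utilization once `failed` switches (and their hosts) are gone.

    Assumes hosts are split evenly across switches and that the surviving
    hosts absorb all of the load evenly.
    """
    if not 0 < failed < switches:
        raise ValueError("need at least one failed and one surviving switch")
    surviving_fraction = (switches - failed) / switches
    return utilization / surviving_fraction


if __name__ == "__main__":
    for before in (0.40, 0.50, 0.70):
        after = utilization_after_failure(before)
        status = "OK" if after <= 1.0 else "overloaded"
        print(f"{before:.0%} before -> {after:.0%} after ({status})")
```

With two switches, anything above 50% utilization before the failure works out to more than 100% afterwards, which is where the slowdown comes from.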
Another way is to use ethernet bonding to connect every host to two different switches. This provides the greatest flexibility, and avoids having to fail over monolithic services like databases. But it comes at a significant cost: you have to purchase twice as many switchports for a given number of hosts.
Many sites use a hybrid. Horizontally scalable servers like web servers are connected to one switch each, and are carefully divided between switches so that you only lose a percentage of them in any given switch failure. Monolithic hosts like databases or storage heads use ethernet bonding so that the host remains available in a switch failure scenario.
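To compare the cost of the three approaches, you can count host-facing switchports. The sketch below is only an illustration: the function, the strategy names, and the host counts in the usage note are made up, and it ignores the ports used to connect the switches to each other.

```python
def switchports_needed(scalable_hosts, monolithic_hosts, strategy):
    """Count host-facing switchports for one of the three layouts."""
    if strategy == "split":      # every host has a single connection
        return scalable_hosts + monolithic_hosts
    if strategy == "bonded":     # every host is bonded to two switches
        return 2 * (scalable_hosts + monolithic_hosts)
    if strategy == "hybrid":     # only the monolithic hosts are bonded
        return scalable_hosts + 2 * monolithic_hosts
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for strategy in ("split", "bonded", "hybrid"):
        ports = switchports_needed(20, 4, strategy)
        print(f"{strategy}: {ports} ports")
```

For a hypothetical site with 20 horizontally scalable hosts and 4 monolithic ones, that works out to 24 ports for the simple split, 48 for bonding everything, and 28 for the hybrid.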
So that's host distribution. What about trunking? Why can't two switches just be connected with a cable?
Well, if you are not using VLANs on a switch, they can be. But there are two problems with that scenario. One is that the cable becomes a Single Point of Failure (SPOF) for your network. The other is that inter-switch traffic will be limited to the bandwidth of one port.
But even if you were willing to accept those problems, if you are using VLANs you have another problem. A simple cable will only transmit packets for the VLAN it is connected to. Traffic on other VLANs will be orphaned. Trunking solves this problem via "tagging" of packets: the sending switch tags every packet with a VLAN identifier, and the receiving switch removes the tag and delivers the packet to the correct VLAN on that switch. It sounds like a lot of work for the switch, but modern switches handle it effortlessly.
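To show what tagging actually does to a packet, here is a rough Python sketch of the 802.1Q tag format: a 4-byte field inserted after the destination and source MAC addresses, made up of the 0x8100 tag identifier followed by a 3-bit priority, a 1-bit drop-eligible flag, and a 12-bit VLAN ID. The function names are my own, and the sketch ignores the frame checksum, which a real switch recalculates in hardware.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 1-4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0-7")
    # 3-bit priority, 1-bit drop-eligible flag (left at 0), 12-bit VLAN ID
    tci = (priority << 13) | vlan_id
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_vlan_tag(frame: bytes):
    """Return (vlan_id, untagged_frame); vlan_id is None if there is no tag."""
    if len(frame) < 16:
        return None, frame
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None, frame
    return tci & 0x0FFF, frame[:12] + frame[16:]
```

The sending side would call add_vlan_tag() before putting a frame on the trunk, and the receiving side would call strip_vlan_tag() to learn which VLAN the frame belongs to and recover the original frame.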
So a trunk usually has two components. First, multiple ports on a switch are joined together in an "etherchannel", which allows many ports to act as a single port. Thus, a gigabit switch can have a 5Gbps connection to another switch by creating a 5-port etherchannel. Then, the etherchannel port is trunked, using IEEE 802.1Q trunking. Most major switch vendors support multiple trunking protocols; I like to use 802.1Q because it works well across multiple vendors.
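One caveat about that 5Gbps number: it is the aggregate across all traffic. Link aggregation generally keeps each conversation on a single member port so that its packets arrive in order, which means any one flow still tops out at the speed of one port. The sketch below illustrates the idea; the CRC32 hash and the function names are stand-ins I picked for illustration, not any vendor's actual algorithm.

```python
import zlib


def aggregate_gbps(member_links, link_gbps=1):
    """Total bandwidth of the bundle, summed across all flows."""
    return member_links * link_gbps


def pick_member_link(src_mac, dst_mac, member_links):
    """Map a flow to one member link so its packets stay in order.

    CRC32 is only a stand-in here; real switches use their own hash
    inputs and algorithms.
    """
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    return zlib.crc32(key) % member_links


if __name__ == "__main__":
    print(aggregate_gbps(5), "Gbps aggregate")
    link = pick_member_link("00:11:22:33:44:55", "66:77:88:99:aa:bb", 5)
    print("this flow always uses member link", link)
```

The same pair of addresses always lands on the same member link, so the bundle only approaches its full bandwidth when many different flows are spread across it.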