Sunday, September 26, 2010

LAG, Load Rebalancing & QoS Shaping

The LAG feature exposes only one L2 interface to the IP stack for each LAG instance, hiding all the links of that instance underneath it. This is attractive because the IP stack and other applications are completely unaffected as links are added to or removed from the instance.

Though the IP stack and most applications don't care about the LAG, its links or their properties, one application, QoS, does need to worry about link properties, specifically link bandwidth (the shaping bandwidth). In an ideal world, even QoS would not need to worry about links and their properties. As we all know, to ensure that the packets of a given conversation are not mis-ordered, the distributor function of the LAG module distributes conversations across the links, not individual packets. If the number of conversations is large compared to the number of links, the traffic is likely to be spread roughly evenly across the links. But when there are only a few conversations, which is not uncommon, the traffic may be distributed unequally, because some conversations carry much more traffic than others. If the high-traffic conversations land on a few links, those links carry most of the load. Below I cover load rebalancing, and then QoS and the changes required in QoS to work with LAG.
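A minimal sketch of a conversation-based distributor, assuming a conversation is identified by the IP 5-tuple and hashed with CRC32 (both are assumptions; real implementations choose their own fields and hash). `FiveTuple` and `select_link` are hypothetical names. Every packet of a conversation maps to the same link, which preserves per-conversation ordering, but nothing balances the bytes each link carries.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def select_link(flow: FiveTuple, num_links: int) -> int:
    """Map a conversation (not a packet) to a link index in [0, num_links)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links


# Every packet of the same conversation is sent on the same link.
web = FiveTuple("10.0.0.1", "10.0.0.2", 6, 40000, 443)
print(select_link(web, 4))
```

With only a handful of conversations, two heavy flows can easily hash to the same index, which is the unequal distribution described above.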

Load Rebalancing:

The LAG distributor normally implements the concept of 'Load Rebalancing'. Load rebalancing happens in three cases:
  •  When the LAG observes an unequal distribution of traffic across its links.
  •  When a new link is added to the LAG instance.
  •  When an existing link is removed, disabled or broken.
Though adding and removing links to/from the LAG instance is not the focus of this article, let me give a gist of the issues that need to be taken care of. Packet mis-ordering must be handled carefully. When a new link is added and the hash distribution is changed immediately, some existing conversations may be moved to other links. If this is done arbitrarily, the collector may receive packets out of order for a brief period. To make sure that the new link is used effectively, two methods can be used, and they can be combined:
  •  New conversations use the new hash distribution.
  •  Existing conversations are moved to other links only after the conversation has been idle for X milliseconds, long enough that its in-flight packets are known to have been received by the collector.
When a link is no longer active, packet mis-ordering is no longer a big concern. The traffic has to keep flowing, so the new distribution can take effect immediately, redistributing the conversations of the old port across the remaining ports.
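Both methods for a new link, together with the immediate remap when a link goes away, can be sketched as a flow table that pins each conversation to a link. This is a hypothetical `Distributor` class, assuming string conversation keys, CRC32 as the hash and a caller-supplied millisecond clock; `idle_ms` plays the role of X above.

```python
import zlib


class Distributor:
    """Hypothetical LAG distributor that pins each conversation to a link."""

    def __init__(self, links, idle_ms):
        self.links = list(links)  # active links, in hash-bucket order
        self.idle_ms = idle_ms    # X: idle time after which a flow may move
        self.flows = {}           # conversation key -> [link, last_seen_ms]

    def _hashed_link(self, key):
        # Link selected by the *current* hash distribution.
        return self.links[zlib.crc32(key.encode()) % len(self.links)]

    def send(self, key, now_ms):
        """Return the link for one packet of conversation `key`."""
        entry = self.flows.get(key)
        if entry is None:
            # New conversation: use the current (possibly new) distribution.
            entry = [self._hashed_link(key), now_ms]
            self.flows[key] = entry
        elif now_ms - entry[1] >= self.idle_ms:
            # Idle for X ms: the collector has drained its packets, so the
            # conversation can safely follow the current distribution.
            entry[0] = self._hashed_link(key)
        entry[1] = now_ms
        return entry[0]

    def add_link(self, link):
        # Active conversations stay put; they move only after going idle.
        self.links.append(link)

    def remove_link(self, link):
        # The link is gone, so ordering is moot: remap its conversations now.
        self.links.remove(link)
        for key, entry in self.flows.items():
            if entry[0] == link:
                entry[0] = self._hashed_link(key)
```

Keeping per-conversation state costs memory, but it is what lets a busy conversation stay on its old link while new and idle conversations pick up the new one.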

Now on to redistribution due to unequal utilization of links:

Redistribution can be done in two ways. The first is to change the hash algorithm or the fields used in the hash algorithm. The second is to somehow increase the number of conversations. The second method works only when tunnels (such as IPsec tunnels) are the conversations. By increasing the number of tunnels, there is a good chance of improving the distribution, because the actual 5-tuple flows are then spread over multiple tunnels. See this link here on how LAG & IPsec work together.
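To illustrate why more tunnels can help, here is a small model, assuming each 5-tuple flow is assigned to one of `num_tunnels` tunnels by a hash and the LAG distributor hashes each tunnel, as a single conversation, onto one link. `link_loads` and the CRC32-based hashing are illustrative assumptions, not how any particular IPsec implementation selects tunnels.

```python
import zlib


def link_loads(flow_rates, num_tunnels, num_links):
    """Per-link load when flows are carried over num_tunnels tunnels.

    flow_rates maps a flow id (e.g. a 5-tuple string) to its rate. The LAG
    distributor sees only tunnels, so each tunnel lands on exactly one link.
    """
    loads = [0.0] * num_links
    for flow_id, rate in flow_rates.items():
        tunnel = zlib.crc32(flow_id.encode()) % num_tunnels         # flow -> tunnel
        link = zlib.crc32(f"tunnel-{tunnel}".encode()) % num_links  # tunnel -> link
        loads[link] += rate
    return loads


flows = {f"flow-{i}": 100.0 for i in range(16)}
print(link_loads(flows, num_tunnels=1, num_links=4))
print(link_loads(flows, num_tunnels=8, num_links=4))
```

With `num_tunnels=1`, the whole trunk load lands on a single link no matter how many links the trunk has; raising `num_tunnels` gives the distributor more conversations to spread.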

Changing the hash algorithm, or adding/removing fields used by it, causes mis-ordering. In some deployments, occasional mis-ordering is acceptable, and in those cases this method can be used. When it is used, rebalancing should not happen very frequently. The following method is typically used: if a link's utilization is more than X% (typically 5 to 10%, a configurable parameter) away from the average usage of the trunk, the link is a candidate for redistribution. After a redistribution, further redistribution is suppressed for a configurable number of seconds so that redistributions do not happen too frequently.
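The trigger and hold-down described above can be sketched as follows. `RebalancePolicy` is hypothetical; utilizations are assumed to be given in percent, and the deviation is measured in percentage points from the trunk average (a relative deviation would be an equally valid reading of "X% away").

```python
class RebalancePolicy:
    """Hypothetical rebalance trigger with a hold-down timer."""

    def __init__(self, threshold_pct, holddown_s):
        self.threshold_pct = threshold_pct  # X, e.g. 5 to 10
        self.holddown_s = holddown_s        # minimum seconds between rebalances
        self.last_rebalance_s = None

    def should_rebalance(self, link_utils_pct, now_s):
        """link_utils_pct: utilization of each link in the trunk, in percent."""
        if (self.last_rebalance_s is not None
                and now_s - self.last_rebalance_s < self.holddown_s):
            return False  # still in hold-down: avoid frequent redistribution
        avg = sum(link_utils_pct) / len(link_utils_pct)
        if any(abs(u - avg) > self.threshold_pct for u in link_utils_pct):
            self.last_rebalance_s = now_s
            return True
        return False


policy = RebalancePolicy(threshold_pct=10, holddown_s=30)
print(policy.should_rebalance([80, 40], now_s=0))   # True: avg 60, each link 20 points off
print(policy.should_rebalance([80, 40], now_s=10))  # False: still in hold-down
```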

QoS:

Typically, the QoS shaping and scheduling function runs on top of L2 interfaces, and the trunk (LAG) interface is given the shaping bandwidth. Shaping is typically implemented using a token bucket algorithm. Whenever tokens are available, the scheduling function is invoked; it selects the next packet and sends it out.
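A minimal token bucket shaper with the scheduler invoked whenever tokens are available, assuming byte-based tokens, a caller-supplied clock in seconds, and a plain FIFO queue standing in for the real scheduling discipline (strict priority, weighted fair queuing and so on). `TokenBucket` and `shape` are illustrative names.

```python
from collections import deque


class TokenBucket:
    """Byte-based token bucket: refills at rate_bps / 8 bytes per second."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0  # bytes per second
        self.burst = burst_bytes    # bucket depth in bytes
        self.tokens = float(burst_bytes)
        self.last_s = 0.0

    def refill(self, now_s):
        elapsed = now_s - self.last_s
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_s = now_s


def shape(bucket, queue, now_s):
    """Invoke the scheduler while tokens cover the next packet; return sizes sent."""
    bucket.refill(now_s)
    sent = []
    while queue and bucket.tokens >= queue[0]:
        size = queue.popleft()  # FIFO stands in for the real scheduler
        bucket.tokens -= size
        sent.append(size)
    return sent


bucket = TokenBucket(rate_bps=8000, burst_bytes=1500)  # 1000 bytes/s
q = deque([1000, 1000, 1000])
print(shape(bucket, q, now_s=0.0))  # [1000]: 500 tokens left
print(shape(bucket, q, now_s=1.0))  # [1000]: refilled to 1500
```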

The LAG instance, which acts as the L2 interface, has a shaping bandwidth equal to the sum of the bandwidths of all its links. If the scheduling decision is based purely on the LAG trunk bandwidth, a scheduled packet may be dropped when its conversation maps to a link that is already fully utilized. This happens when traffic is uneven across conversations. Rebalancing helps, but it takes some time to rebalance the traffic. Hence the QoS shaping and scheduling function should consider not only the LAG instance bandwidth but also the individual link bandwidth when making scheduling decisions. By considering both, a packet from the high-traffic conversation is simply not scheduled and remains in the queue, thereby avoiding a packet drop.
At the same time, it is not good to under-utilize the other links. In this case, the scheduler can move on to other traffic that maps to under-utilized links.
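The following sketch shows a scheduling pass that checks both the trunk's credit and the credit of the link the distributor chose, assuming each queued packet already carries its link and that credits (in bytes) are refilled elsewhere, for example by per-link token buckets. `Packet` and `schedule` are hypothetical. Once a link is found saturated, the rest of its packets are skipped for the pass, so a later, smaller packet of the same conversation cannot overtake an earlier one.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    conv: str  # conversation id
    link: str  # link chosen by the LAG distributor for this conversation
    size: int  # bytes


def schedule(queue, trunk_credit, link_credit):
    """One scheduling pass that honors trunk AND per-link credit (bytes).

    Returns (sent, remaining, trunk_credit); link_credit is updated in place.
    """
    sent, remaining, blocked = [], [], set()
    for i, pkt in enumerate(queue):
        if pkt.size > trunk_credit:
            remaining.extend(queue[i:])  # trunk shaper is out of tokens
            break
        if pkt.link in blocked or pkt.size > link_credit[pkt.link]:
            # Link saturated: keep the packet queued (no drop) and block the
            # link for this pass so its conversations stay in order.
            blocked.add(pkt.link)
            remaining.append(pkt)
            continue
        link_credit[pkt.link] -= pkt.size
        trunk_credit -= pkt.size
        sent.append(pkt)
    return sent, remaining, trunk_credit


credit = {"A": 1500, "B": 1500}
q = [Packet("c1", "A", 1000), Packet("c1", "A", 1000),
     Packet("c2", "B", 1000), Packet("c1", "A", 200)]
sent, left, trunk = schedule(q, 3000, credit)
print([p.conv for p in sent], [p.size for p in left], trunk)  # ['c1', 'c2'] [1000, 200] 1000
```

Looking only at the trunk's 3000 bytes of credit, a trunk-only scheduler would have sent the second 1000-byte packet onto link A, which had just 500 bytes of room left; here it stays queued while link B's traffic still goes out.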

LAG is an important feature, but it has its own challenges. IPsec and QoS implementations need to work properly with LAG to use it effectively.

Comments?
