Monday, December 31, 2012

L4 Switching & IPSec Requirements & Openflow extensions


L4 Switching - Introduction

L4 switching typically involves connection tracking, NAT and common attack checks. Stateful inspection firewalls, NAT, URL filtering and SLB (Server Load Balancing) are some of the middle-box functions that take advantage of L4 switching. These middle-box functions inspect the first few packets of every connection (TCP/UDP/ICMP etc.) and offload the rest of the connection's processing to a fast-path entity that does the L4 switching. Normally both the fastpath and the normal path reside in the same box/device.

Openflow, promoted by industry big names, is one of the protocols that separate the control plane from the data plane: the control planes of multiple switches are implemented in one logical central controller, leaving only the data plane at the devices, which are programmable to work in a specified way, thereby making the devices simple.

The middle-box functions described above can also be separated into a normal path (service plane) and a fast path (L4 switching). The normal path works on the first packet, or at most the first few packets, of a connection, and the fast path handles the rest of the packets in the connection. By implementing the normal path (service plane) in centralized logical controllers and leaving the fast path (L4 switching) at the physical/logical device level, similar CP/DP separation benefits can be achieved. Benefits include:
  • Programmable devices whose personality can be changed by controller applications. An Openflow switch can now be made an L2 switch, an L3 switch and/or an L4 switch.
  • By moving major software logic (in this case, the service plane) from the device to a central location, the cost of ownership for end customers goes down.
  • By centralizing the service plane, operational efficiency can be improved significantly:
    • Ease of software upgrades.
    • Configure/Manage from a central location.
    • Granular flow control.
    • Visibility across all devices.
    • Comprehensive traffic visualization

L4 Switching - Use cases:

  • Branch office connectivity with corporate headquarters: Instead of having firewall, URL filtering and policy control on every branch office device, these functions can be centralized at the corporate office. The first few packets of every connection go to the main office. The main office controller decides on the connection; if the connection is allowed, it instructs the openflow switch in the branch office to forward the rest of the traffic on that connection. This method requires only simple openflow switches in the branch office, with all intelligence centralized at one location, the main office. It thus reduces the need for a skilled administrator at every branch office.
  • Server Load Balancing in Data Centers: Today's data centers use expensive server load balancers to distribute incoming connections to multiple servers. SLB devices are big boxes today for two reasons: normal-path processing, which selects the best server for every new connection, and fast-path packet processing are combined into one box/device. By offloading packet processing to inexpensive openflow switches, SLB devices only need to handle normal-path processing, which can be done in less expensive boxes, even commodity PC hardware.
  • Managed service providers offering URL filtering to home & SME offices: Many homes today use PC-based software to do URL filtering and provide a safe internet experience for kids. SME administrators require a URL filtering service to increase employee productivity, preventing access to recreational sites and also preventing malware contamination. Again, instead of deploying the URL filtering service at customer premises, service providers would like to host and manage this service centrally across many of their customers for operational efficiency. An openflow controller implementing the URL filtering service can hold on to the packets until the URL is fetched, find the category of the URL, apply policy and decide whether to continue the connection. If the connection is to be continued, it can program the openflow switch at the customer premises to forward the rest of the packets.
  • Though URL filtering is the use case given above, this concept of centralizing intelligence at one place and programming the openflow switches for the fast path applies equally to "Session Border Controller", "Application Detection using DPI" and other services. Any service that needs to inspect the first few packets of a connection is a candidate for Service Plane/L4 Switching separation using Openflow.

L4 Switching - Processing

The L4 switching (fastpath) entity performs connection-level processing. Any connection consists of two flows: the Client-to-Server (C-S) flow and the Server-to-Client (S-C) flow.

Connections are typically identified by the 5-tuple: source IP, destination IP, protocol, source port and destination port.
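The 5-tuple key and the two per-connection flows can be sketched in C; the structure and names below are illustrative, not taken from any particular fastpath implementation:

```c
#include <stdint.h>

/* Illustrative 5-tuple flow key; field names are assumptions. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;   /* e.g. 6 = TCP, 17 = UDP */
};

/* A connection owns two flows: C-S and S-C. */
struct connection {
    struct flow_key c2s;  /* Client-to-Server */
    struct flow_key s2c;  /* Server-to-Client */
};

/* The S-C key is the C-S key with addresses and ports swapped;
 * this is how the fastpath matches return traffic to the same
 * connection entry. */
static void make_reverse_key(const struct flow_key *fwd, struct flow_key *rev)
{
    rev->src_ip   = fwd->dst_ip;
    rev->dst_ip   = fwd->src_ip;
    rev->src_port = fwd->dst_port;
    rev->dst_port = fwd->src_port;
    rev->protocol = fwd->protocol;
}
```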

Functionality of L4 switching:
  • Connection Management:
    • Acts on the connection Add/Remove/Query/Modify requests from the service plane.
    • Inactivity timeout: Activity is observed on the connection. If there is no activity for some time (programmed when the connection entry is created), the connection entry is removed and a notification is sent to the normal path.
    • TCP Termination: The L4 switching entity can remove the connection entry once it observes that both endpoints of the TCP connection have exchanged FINs and these have been ACKed. It can also remove the entry when a TCP packet with RST is seen.
    • Periodic notification of collected state to the service plane.
  • Packet processing logic involves:
    • Pre-flow lookup actions typically performed:
      • Packet integrity checks: Checksum verification of both IP and transport headers, ensuring that length field values are consistent with packet sizes, etc.
      • IP Reassembly: As flows are identified by 5-tuples, packets must be reassembled so that the flow can be determined across fragments.
      • IP Reassembly related attack checks and remediation
        • Ping-of-Death attack checks (IP fragment overrun): Checking that the full IP packet never exceeds 64KB.
        • Normalization of IP fragments to remove overlaps, to protect from teardrop-related vulnerabilities.
        • Too many overlapping fragments: Remove the IP reassembly context when too many overlaps are observed.
        • Too many IP fragments & very small IP fragment size: Drop the reassembly context and associated IP fragments.
        • Incomplete IP fragments resulting in denial-of-service attacks: Limit the number of reassembly contexts per IP address pair, per IP address, etc.
    • TCP SYN flood protection using the TCP SYN cookie mechanism.
    • Flow & Connection match:  Find the flow entry and corresponding connection entry.
    • Post lookup actions
      • Network Address Translation: Translation of SIP, DIP, SP and DP (source/destination IP addresses and ports).
      • Sequence number translation in TCP connections: Normally needed when the service plane updates the TCP payload in a way that increases or decreases its size, or when TCP SYN cookie processing is applied to the connection.
      • Delta sequence number updates in TCP connections: Normally required to ensure that the right sequence number translation is applied, especially for retransmitted TCP packets.
      • Sequence number attack checks: Ensuring that sequence numbers of packets going in one direction of the connection stay within a particular window, so that traffic injected by an attacker is not honored. This check mainly stops TCP RST packets generated and sent by attackers; since an RST terminates the connection, creating a DoS attack, this check is required.
      • Forwarding action: Send the packet out by referring to the routing and ARP tables.
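Of the pre-flow-lookup actions above, the IP header integrity check is the easiest to make concrete. A minimal RFC 1071 Internet checksum verifier (a sketch, not from any particular fastpath):

```c
#include <stdint.h>
#include <stddef.h>

/* Standard Internet checksum (RFC 1071). A received IP header is
 * intact when a checksum taken over the whole header, including the
 * stored checksum field, comes out as zero. */
static uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                       /* sum 16-bit words */
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                                /* odd trailing byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                       /* fold carries */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 when the header verifies, 0 otherwise.
 * ihl_bytes is the header length taken from the IHL field. */
static int ip_header_ok(const uint8_t *hdr, size_t ihl_bytes)
{
    return inet_checksum(hdr, ihl_bytes) == 0;
}
```

The transport-header checksum works the same way, except that it is computed over a pseudo-header plus the payload.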

IPSec - Introduction

IPSec is used to secure IP packets. Packets are encrypted to avoid leakage of data on the wire, and authenticated to ensure that they were indeed sent by the trusted party. It runs in two modes: tunnel mode and transport mode. In tunnel mode, IP packets are encapsulated in another IP packet; in transport mode, no additional IP header is involved. IPSec applies security on a per-packet basis, hence it is a datagram-oriented protocol. Multiple encryption and authentication algorithms can be used to secure the packets. Encryption and authentication keys are established per tunnel by the Internet Key Exchange (IKE) protocol. Essentially, the IPSec protocol suite contains the IKE protocol and IPSec-PP (packet processing).
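To make the per-packet framing concrete, the fixed part of the ESP header (RFC 4303) that IPSec-PP puts in front of each protected datagram can be sketched as below; the struct is an illustration, not taken from any particular stack:

```c
#include <stdint.h>

/* ESP on the wire:
 *   Tunnel mode:    outer IP | ESP | inner IP | payload | trailer | ICV
 *   Transport mode:       IP | ESP |            payload | trailer | ICV
 */
struct esp_header {
    uint32_t spi;   /* Security Parameters Index: selects the SA
                       (keys and algorithms) negotiated by IKE */
    uint32_t seq;   /* per-SA sequence number, used for anti-replay */
    /* followed by IV, encrypted payload, ESP trailer and ICV */
};
```

The SPI is what lets IPSec-PP find the right keys for a packet without any help from IKE at packet time, which is what makes CP-DP separation plausible here.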

Traditionally, implementations use a proprietary mechanism between IKE and IPSec-PP, as both sit in the same box. The new way of networking (SDN) centralizes the control plane with distributed data paths, and IPSec is a good candidate for this. When the control plane (IKE) is separated from the data path (IPSec-PP), some standardization is needed for the communication between IKE and IPSec-PP. Openflow is considered one of the protocols for separating control plane and data plane, but Openflow as defined in the OF 1.3.x specification did not keep IPSec CP-DP separation in mind. This blog post tries to describe the extensions required in the OF specification to enable IPSec CP-DP separation.


ONF (Open Networking Foundation) is standardizing the Openflow protocol. Controllers using the OF protocol program the OF switches with flows and instructions/actions to be performed on packets. The Openflow 1.3+ specification supports multiple tables, which simplifies controller programming, as each table in the switch can be dedicated to a purpose.

Contrary to what everybody says, the Openflow protocol is not really friendly to SP/FP separation, or even to L3 CP/DP separation. The way I look at OF today, it is good for L2 CP/DP separation and for traffic-steering kinds of use cases, but it is not sufficient for L3 and L4 switching. This post tries to describe some of the features required in the OF specification to support L4 switching in particular, and L3 switching to some extent.

Extensions Required to realize L4 switching & IPSec CP-DP separation

Generic extensions:

Table selection on packet-out message:

Problem Description
The current OF 1.3 specification gives controllers no way to start processing a packet-out message from a specific table in the switch. Currently, the controller can send the packet-out message with a set of actions, which are executed in the switch. Typically an 'OUTPUT' action is specified by the controller in the action list; this action sends the packet out on the port given in the 'OUTPUT' action. One facility is provided to run through the OF pipeline, though: if the port in the 'OUTPUT' action is the reserved port OFPP_TABLE, processing of the packet-out message starts from table 0.

Normally, controllers take advantage of multiple tables in the OF switch by dedicating each table to a purpose. For example, when L4 switching is combined with L3 forwarding, several tables are used: a table to classify packets, a table for traffic policing entries, a table for PBR, a few tables for routing, a table for L4 connections, a table for next-hop entries and a table for traffic shaping rules. Some tables are populated pro-actively by the OF controller, so controllers do not expect miss packets from them. Entries in other tables, such as 'L4 connections' and 'next hop', are created reactively by the controllers. Processing a miss packet (packet-in) can result in a flow being created in the table that caused the miss. The controller would then like to send the packet back and have the OF switch process it just as it would have done had the matching entry existed; controller programming stays simple if that is possible. As indicated before, due to limitations of the specification, the controller can't ask the switch to start from a specific table. Lacking this feature in switches, controllers are forced to figure out the actions that would need to be performed and program that action list into the packet-out message. In our view, that is quite a complex task for controller programmers. As I understand it, many controller applications simply use the packet-in message to create the flow and then drop the packet. This is not good at all: think of a SYN packet getting lost. It takes some time for the client to retransmit the TCP SYN, and that delay results in a very bad user experience.

Also, in a few cases, applications (CP protocols such as OSPF and BGP) generate packets that need to be sent out on the data network. Controllers typically sit on the management network, not the data network, so these packets must reach the data network via OF switches, which can only be achieved by sending them as packet-out messages. At times, these packets should be exposed to only part of the table pipeline. In these cases too, controllers need the ability to start packet processing in the switch from a specific table.


Freescale Extension:

struct ofp_action_fsl_experimenter_goto_table {
      uint16_t type;            /* OFPAT_EXPERIMENTER. */
      uint16_t len;             /* Length is a multiple of 8. */
      uint32_t experimenter;    /* FSL Experimenter ID */
      uint16_t fsl_action_type; /* OFPAT_FSL_GOTO_TABLE */
      uint16_t table_id;
      uint8_t  pad[4];          /* Pad the action to a multiple of 8 bytes. */
};

OFP_ASSERT(sizeof(struct ofp_action_fsl_experimenter_goto_table) == 16);

table_id: When the switch encounters this action, it starts processing the packet from the table specified by 'table_id'.

Though this action is normally present in the action list of packet-out messages, it can be used in flow_mod actions too. When both a GOTO_TABLE instruction and a GOTO_TABLE action are present, the GOTO_TABLE action takes precedence and the GOTO_TABLE instruction is ignored.

Notice that 'table_id' is 2 bytes in size. This was intentional: we believe that future OF specifications will need to support more than 256 tables.
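For illustration, a controller-side encoder for this action might look as follows. The struct is repeated here so the snippet is self-contained; the experimenter ID and FSL action-type values are placeholders, since the registered numbers are not reproduced in this post:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htons, htonl */

#define OFPAT_EXPERIMENTER   0xFFFF
#define FSL_EXPERIMENTER_ID  0x0     /* placeholder, not the real ID */
#define OFPAT_FSL_GOTO_TABLE 0x0     /* placeholder action type */

struct ofp_action_fsl_goto_table {
    uint16_t type;
    uint16_t len;
    uint32_t experimenter;
    uint16_t fsl_action_type;
    uint16_t table_id;
    uint8_t  pad[4];                 /* pad to a multiple of 8 */
};

/* Fill the experimenter action in network byte order, as it would
 * appear in the action list of a packet-out or flow_mod message. */
static void fill_goto_table(struct ofp_action_fsl_goto_table *a,
                            uint16_t table_id)
{
    memset(a, 0, sizeof(*a));
    a->type            = htons(OFPAT_EXPERIMENTER);
    a->len             = htons(sizeof(*a));  /* 16, a multiple of 8 */
    a->experimenter    = htonl(FSL_EXPERIMENTER_ID);
    a->fsl_action_type = htons(OFPAT_FSL_GOTO_TABLE);
    a->table_id        = htons(table_id);
}
```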

Saturday, December 29, 2012

L2 Network Virtualization & Is there a role for Openflow controllers?




Current Method of Network Virtualization

IaaS providers (Cloud Service Providers) do provide network isolation among their tenants. Even enterprise private cloud operators are increasingly expected to provide network isolation among tenants: departments, divisions, test networks, lab networks and so on. This allows tenants to have their own IP addressing space, possibly overlapping with other tenants' address space.

Currently VLANs are used by network operators to create tenant specific networks.  Some of the issues related to VLAN are:
  • VLAN IDs are limited to 4K. If tenants require 4 networks each on average, only 1K customers can be supported on one physical network. Network operators are forced to create additional physical networks when more tenants sign up.
  • Performance bottlenecks associated with VLANs: even though many physical switches support 4K VLANs, many don't provide line-rate performance when the number of VLAN IDs goes beyond a certain limit (some switches don't work well beyond 256 VLANs).
  • VLAN-based networks have operational headaches: VLAN-based isolation requires that all L2 switches be configured whenever a new VLAN is created or an existing VLAN is deleted. Though many L2 switch vendors provide a central console for their own brand of L2 switches, it is operationally difficult when switches from multiple vendors are present.
  • Loop convergence time is very high.
  • Extending VLANs across data center sites, or to customer premises, raises operational issues with respect to interoperable protocols; out-of-band understanding among network operators is required to avoid VLAN ID collisions.
To avoid the issues associated with the capabilities of L2 switches, with networks having switches from multiple vendors, and with the limitations of VLANs, overlays are increasingly used to virtualize physical networks into multiple logical networks.

Overlay based Network Virtualization

Any L2 network requires preservation of the L2 packet from source to destination. Any broadcast packet should go to all network nodes attached to the L2 network, and multicast packets should go to the nodes willing to receive multicast packets of the groups of their choice.

Overlay-based network virtualization provides the above functionality by encapsulating the Ethernet packets in an outer IP packet, essentially tunneling Ethernet packets from one place to another.

VxLAN, NVGRE are two of the most popular overlay protocols that are being standardized.  Please see my blog post on VxLAN here.

VxLAN provides a 24-bit VNI (Virtual Network Identifier), so in theory around 16M virtual networks can be created. Assuming each tenant has 4 networks on average, in theory 4M tenants can be supported by a CSP on one physical network. That is, there is no bottleneck in the identifier space.
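The 16M figure is just the 24-bit VNI space (2^24 = 16,777,216 IDs). In the VXLAN header, the VNI sits in the upper 24 bits of a 32-bit field; a small sketch in C (helper names are mine, layout per the VXLAN draft):

```c
#include <stdint.h>

/* VXLAN header is 8 bytes: flags(8) | reserved(24) | VNI(24) | reserved(8).
 * The VNI occupies bits 8..31 of the second 32-bit word. */
#define VXLAN_VNI_MAX ((1u << 24) - 1)   /* 16,777,215: last usable ID */

static uint32_t vxlan_encode_vni(uint32_t vni)
{
    return (vni & VXLAN_VNI_MAX) << 8;   /* low 8 bits are reserved */
}

static uint32_t vxlan_decode_vni(uint32_t word)
{
    return (word >> 8) & VXLAN_VNI_MAX;
}
```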


Openstack is one of the popular open source cloud orchestration tools. It is becoming a formidable alternative to VMware vCenter and vCD. Many operators use Openstack with the KVM hypervisor as a secondary source of cloud virtualization in their networks. The reliability of Openstack has come a long way, and many vendors provide support for a fee. Due to these changes, adoption of Openstack+KVM as a primary source of virtualization is going up. Openstack mainly has four components: 'Nova' for VM management across multiple physical servers, 'Cinder' for storage management, 'Quantum' for network topology management and 'Horizon' for the front-end user experience of operators (administrators and tenants).

Quantum consists of a set of plugins: a core plugin and multiple extension plugins. Quantum defines the API for plugins and lets various vendors create the backend for them. The Quantum core plugin API defines the management API of virtual networks; virtual networks can be created using VLAN or GRE, and support for VxLAN is being added.
Quantum allows operators to create virtual networks. As part of VM provisioning, Openstack Nova lets operators choose the virtual networks on which the VM should be placed.
When the 'Nova scheduler' chooses a physical server to place the VM on, it asks Quantum, via the 'create_port' API, for the MAC address, IP address and other information to assign to the VM, once for each virtual network the VM belongs to. Quantum returns the required information to Nova. Through this call, Quantum learns which physical server is involved and which virtual networks need to be extended to it. It then informs the Quantum agent (which sits in the host Linux of each physical server) of the virtual networks it needs to create. The agent on the physical server gets further details on the virtual networks from Quantum and creates the needed resources, using the OVS (Open vSwitch) package present in each physical server; see the description of OVS below. The Quantum agent in each physical server creates two openflow bridges, the integration bridge (br-int) and the tunnel bridge (br-tun), and connects the south side of br-int to the north side of br-tun using a loopback port pair. Virtual network port creation and association with br-tun is done by the Quantum agent whenever a virtual network is created or deleted. The north side of br-int, towards the VMs, is handled by libvirtd and associated drivers as part of VM management; see below.

Nova talks to the 'nova-compute' package on the physical server to bring VMs up or down. 'Nova-compute' uses the 'libvirtd' package to bring up VMs, create ports and associate them with openflow switches using the OVS package. Briefly, the work libvirtd does with the help of the OVS driver:
  • Creates a Linux bridge for each port that is associated with the VM.
  • North side of this bridge is associated with VM Ethernet port (using tun/tap technology).
  • Configures ebtables to provide isolation among the VMs.
  • South side of this bridge is associated with Openflow integration bridge (br-int).  This is achieved by creating loopback port pair with one port attached to Linux bridge and another port associated with the Openflow switch, br-int.

Openvswitch (OVS)

OVS is an openflow-based switch implementation and is now part of Linux distributions. Traditionally, Linux bridges were used to provide virtual network functionality in KVM-based host Linux. In Linux 3.x kernels, OVS has taken over that responsibility, and the Linux bridge is used only to enable 'ebtables'.
OVS provides a set of utilities: ovs-vsctl and ovs-ofctl. The "ovs-vsctl" utility is used by the OVS Quantum agent on physical servers to create the openflow datapath entities (br-int, br-tun), initialize the openflow tables, and add both north- and south-bound ports to br-int and br-tun. "ovs-ofctl" is a command line tool to create openflow flow entries in the openflow tables of br-int and br-tun; the OVS Quantum agent uses it to create default flow entries that enable typical L2 switching (802.1D) functionality. Since OVS is openflow based, external openflow controllers can manipulate traffic forwarding by creating flows in br-int and br-tun. Note that external controllers are required only to add 'redirect' functionality; virtual switching can be achieved even without an external openflow controller.

Just to outline various components in the physical server:
  • OVS package: Creates openflow switches and ports, associates them with the various switches and, of course, provides the ability for external controllers to control the traffic between VMs and external physical networks.
  • Quantum OVS agent: Communicates with the Quantum plugin in Openstack to learn the virtual networks, and configures OVS to realize those networks on the physical server.
  • OVS driver in libvirtd: Connects VMs to virtual networks and configures 'ebtables' to provide isolation among VMs.

Current VLAN based Network Virtualization solution

Openstack and OVS together can create VLAN-based networks; L2 switching happens with no external openflow controller. The OVS Quantum agent, with the help of the plugin, knows the VMs, their vports and the corresponding network ports. Using this information, the agent associates a VLAN ID with each vport connected to a VM; OVS uses this to know which VLAN to use for packets coming from the VMs. The agent also creates one rule to do MAC learning for packets coming in from the network.

Overlay based Virtual Networks

Companies like Nicira and Big Switch Networks are promoting overlay-based virtual networks. The OVS in each compute node (the edges of the physical network) is used as the starting point of the overlays. The L2 and L3 switches connecting compute nodes are used only for transporting the tunneled packets; they don't need to participate in the virtual networks. Since the OVS in the compute nodes encapsulates and decapsulates the inner Ethernet packets into/from outer IP packets, the in-between switches forward the packets using the outer IP and MAC headers. Essentially, overlay tunnels start and end at the compute nodes. With this, network operators can configure their switches in L3 mode instead of the problematic L2 mode. Maybe, in future, one will not see any L2 switches in data center networks.

Typical packet flow would be something like this:

- A VM sends a packet and it lands on the OVS in the host Linux.
- OVS applies actions based on the matching flows in br-int and packet is sent to br-tun.
- OVS applies actions based on the matching flows in br-tun and packet is sent out on the port (overlay port)
- OVS sends the packet to overlay protocol layer.
- Overlay protocol layer encapsulates the packet and sends out the packet.

In reverse direction,  packet flow would look like this:

- Overlay protocol layer gets hold of the incoming packet.
- The overlay protocol layer decapsulates the packet and delivers it, on the right port, to the OVS br-tun.
- After applying any actions on the packet using matching OF flows in br-tun,  packet is sent to br-int.
- OVS applies the actions on the matching flows and figures out the destination port (one-to-one mapping with VM port)
- OVS sends the inner packet to the VM.

Note that:

- Inner packet is only seen by OVS and VM.
- Physical switches only see encapsulated packet.

VxLAN based Overlay networks using Openstack

OVS and VxLAN:

There are many open source implementations of VxLAN in OVS, and integrations of them with Openstack. Some details about one VxLAN implementation in OVS:

  • Creates as many vports in OVS as there are VxLAN networks in the compute node. Note that even though there could be a large number of VxLAN-based overlay networks, vports are created only for the networks to which local VMs belong. For example, if there are VMs corresponding to two overlay networks, two vports are created.
  • The VxLAN implementation depends on VTEP entries to find the remote tunnel endpoint address for the destination MAC address of a packet received from the VMs. That IP address is used as the DIP of the outer IP header.
  • If there is no matching VTEP entry, multicast-based learning happens as per VxLAN.
  • VTEP entries can also be created manually; a separate command line utility is provided to create VTEP entries on vports.
  • Since Openstack knows the VMs and the physical servers hosting them, Openstack, with the help of the Quantum agent in each compute node, can create VTEP entries pro-actively.

Commercial Products

Openstack and OVS provide fantastic facilities to manage virtual networks using VLANs and overlay protocols. Some commercial products seem to do the following:
  • Provide their own Quantum Plugin in the openstack.
  • This plugin communicates with their central controller (OFCP/OVSDB and Openflow controllers).
  • Central controller is used to communicate with OVS in physical servers to manage virtual networks and flows.
Essentially,  these commercial products are adding one more controller layer between Quantum in openstack and physical servers.

My views:

In my view this is not necessary. Openstack, OVS, the OVS plugin, the OVS agent and the OVS libvirtd driver are maturing, and there is no need for one more layer of abstraction. It is a matter of time before these open source components become feature rich, reliable and supported by vendors such as Red Hat. With OVS part of Linux distributions, and with Ubuntu providing all of the above components, operators are better off sticking with these components instead of going for proprietary software.

Since OVS is openflow based, an openflow controller could add value with respect to traffic steering and traffic flow redirection. It should provide value, but one should make sure the default configuration is good enough to realize virtual networks without needing an openflow controller.

In summary, I believe that openflow controllers are not required to manage virtual networks in physical servers, but they are useful for value-added services such as traffic steering, traffic visualization, etc.