Tuesday, November 29, 2011

Software Defined Networking Trend - Winners

It is not a given that SDN will be successful, though its chances are very high due to the push from network operators. Unlike with earlier efforts, some vendors have resigned themselves to the fact that SDN will succeed, and they either have products on the market or are getting them ready.

Let us see who would be the winners if SDN picks up.

Network Operators, Service Providers and Data Center Operators are certainly clear winners:
  • Since controller software is expected to let operators add their own modules on top of the network operating system and customize it, it allows them to:
    • Get an overall view of the network.
      • In the case of an L2 network, all devices put together can be viewed as one mega switch. Addition and removal of L2 devices can be controlled from a central location without any traffic disruption, as flows can be programmed centrally.
      • It also allows virtualization of the network on a logical basis (for example, per tenant).
    • Partition the network with fine-grained flow/traffic orchestration.
    • Innovate on top of the network operating system on controllers.
    • Simplify the user interface of their central management systems with more and better control.
      • Hierarchical network visualization and control.
  • Protocol interoperability testing overhead is reduced dramatically, as SDN would eliminate many protocols that normally run among devices under one administrative control. This also reduces the protocol message overhead on the network.
  • Network Operators can choose control plane software from a single vendor while using network devices from other vendors.
  • Network Operators can procure only the relevant control plane protocols as per their network requirements.
Software Vendors are also winners here:  At this time, due to the self-contained (purpose-built) nature of network devices, software vendors are not able to enter this market, as it requires building the hardware too. The separation of the data plane (at the device level) from the software (control plane) creates opportunities for software vendors. It is also an opportunity for network management software vendors to expand their offering beyond management to include control plane functionality. Though I see new vendors here such as Nicira and Big Switch Networks, there is a strong possibility that successful companies in this area would be acquired by the big players. VMware and Microsoft are a few of the companies that may enter this market.

It is a logical step for VMware, in the sense that they already have very good management software to control virtual machines and virtual switches within servers. Extending this to cover physical switches is the natural next step.

Merchant L2/L3 Switch Silicon vendors are winners too:  Due to open standards (for example, Openflow), the value of proprietary silicon chips (built by network device vendors today) diminishes. Merchant silicon vendors would provide all the features at lower cost due to their wider customer base. This helps not only merchant L2/L3 switch silicon vendors, but also network operators, through reduced cost of goods.

Some or all of Broadcom, Marvell and Intel (Fulcrum) would be winners if they start adopting SDN standards more aggressively.

Network Device Vendors (some will win and some may lose):
  • With Openflow-based SDN, much of the control plane software is no longer going to be part of the devices, as it moves to controllers. Hence the barrier to entry into the device market becomes lower, and one would see more network device vendors entering this market. One should not be surprised to see SMB device vendors such as Netgear entering the high-end market.
It could be challenging for existing L2/L3 switch vendors in the short term. But I believe these vendors would be in a good position to provide controller software, given their existing and proven software. I would expect these vendors to provide the entire controller software either as packaged software or even as an appliance. Alternatively, some vendors might provide control plane protocols as a virtual appliance.

SaaS Providers are also winners:  Though it is going to take a while, I believe that when SDN percolates to enterprises, enterprise network administration departments may prefer a cloud-based controller solution. Since performance and capacity are important aspects, cloud providers might offload some controller functionality to a special appliance deployed in enterprise networks. Even though it is deployed in the enterprise network, enterprise admins are not expected to control this device directly; they would still go to the cloud to manage their network. Cloud servers would internally communicate with the purpose-built slave controller appliances (deployed in enterprise networks) to do the laborious jobs.

Embedded Processor Vendors (can keep their value, hence neutral):  Currently, embedded processors are used to run the entire control plane software. One might think that not much processing power is required anymore once the control plane software moves to the controller. On the contrary, there may be higher performance requirements on the embedded processor for the following reasons:
  • Need for SSL/TLS over TCP connections to talk to controllers.
  • They might need to maintain shadow copies of the tables in the silicon devices.
  • In the short term, merchant L2/L3 switch silicon vendors might not support Openflow in their chipsets. Rather, they may support cache tables and expect embedded processors to create the cache entries based on the tables populated by controllers. The traffic to embedded processors may not be insignificant: even if only 1% of the traffic from the L2/L3 silicon goes to the embedded processor, that is 1 Gbps on a 100 Gbps switch.
  • Some basic services are expected to be supported on embedded processors, such as:
    • Netflow exporting.
    • BFD (Bidirectional Forwarding Detection).
    • Proxy ARP 
  • As I mentioned in my previous post, some services such as IDS/IPS or intelligent classification (application detection) might be provisioned from the controller by uploading a virtual appliance onto the embedded processor. The processor should have enough power to handle this kind of processing. It may even require a JVM, which means still more processing power.
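The cache-table scenario above (silicon with only an exact-match cache, and the embedded processor filling it from controller-programmed tables) can be sketched in Python. This is a minimal illustration with hypothetical names (`WildcardRule` and `EmbeddedCacheManager` are not from any vendor SDK): the processor keeps a shadow copy of the controller's wildcard rules and, on a cache miss, installs an exact-match entry keyed on the packet's full header tuple. Every miss is work for the embedded processor, which is why even a 1% miss rate matters on a 100 Gbps switch.

```python
from dataclasses import dataclass, field


@dataclass
class WildcardRule:
    """A controller-programmed rule; fields absent from `match` are wildcards."""
    priority: int
    match: dict          # e.g. {"dst_mac": "00:00:00:00:00:02"}
    actions: list        # e.g. ["output:2"]

    def matches(self, pkt: dict) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())


@dataclass
class EmbeddedCacheManager:
    """Hypothetical software running on the device's embedded processor."""
    shadow_table: list = field(default_factory=list)  # copy of the controller's rules
    exact_cache: dict = field(default_factory=dict)   # stands in for the silicon cache
    misses: int = 0                                   # packets punted to the CPU

    def install_rule(self, rule: WildcardRule) -> None:
        # Controller pushes a rule; keep the shadow table ordered by priority
        self.shadow_table.append(rule)
        self.shadow_table.sort(key=lambda r: r.priority, reverse=True)
        # Derived cache entries may now be stale, so drop them all
        self.exact_cache.clear()

    def handle_packet(self, pkt: dict) -> list:
        key = tuple(sorted(pkt.items()))      # exact match on every header field
        if key in self.exact_cache:           # hit: handled by the silicon
            return self.exact_cache[key]
        self.misses += 1                      # miss: the embedded CPU does the work
        for rule in self.shadow_table:
            if rule.matches(pkt):
                self.exact_cache[key] = rule.actions
                return rule.actions
        return ["drop"]                       # table-miss policy is an assumption
```

Clearing the whole cache on every rule change keeps the sketch simple; a real implementation would more likely invalidate only the entries derived from the affected rules.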

Sunday, November 27, 2011

My views on SDN (Software Defined Networking) phases

In my last post, I talked about the need for SDN. Here I want to give my views on how SDN is going to play out in the marketplace.

Phase 1 - Openflow based Data path implementations (Hybrid implementation)

Device Side:  Many L2 and L3 switch vendors are providing Openflow 1.0 based implementations in their switches. Almost all these devices continue to support existing L2/L3 switching functionality with local control plane software, with an additional configuration option to enable Openflow. That is, Openflow is being added by network device vendors as an option to the existing firmware. I also think that Openflow is being extended only to L2 and L3 switch devices for now. It will be a while before the market sees Openflow-based network service devices.

Controller Side:  A few companies provide an Openflow protocol layer in Java, Python and other languages. This is the lowest layer of software required on controllers, and it communicates with Openflow-based network devices. Big Switch Networks (www.bigswitchnetworks.com) and Nicira (www.nicira.com) are two of the companies I am aware of providing this layer at this time. They also plan to implement applications on top of this layer, but I guess that is phase 2 of SDN. Some call this layer the Network Operating System. The Java-based Beacon framework for Openflow is one good library I have found. Please see this link to understand more about Beacon.
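To show what this lowest controller layer deals with, here is a minimal Python sketch of OpenFlow 1.0 message framing. The 8-byte header layout (version, type, length, xid in network byte order) and the message type numbers are from the OpenFlow 1.0 specification; the helper functions and the tiny dispatcher are hypothetical and are not taken from Beacon or any other library.

```python
import struct

# OpenFlow 1.0 wire constants
OFP_VERSION_1_0 = 0x01
OFPT_HELLO = 0
OFPT_ECHO_REQUEST = 2
OFPT_ECHO_REPLY = 3
OFPT_FEATURES_REQUEST = 5

# Common header: version (uint8), type (uint8), length (uint16, includes
# the header itself), xid (uint32), all in network byte order
OFP_HEADER = struct.Struct("!BBHI")


def pack_message(msg_type: int, xid: int, body: bytes = b"") -> bytes:
    """Build an OpenFlow 1.0 message: header followed by an optional body."""
    length = OFP_HEADER.size + len(body)
    return OFP_HEADER.pack(OFP_VERSION_1_0, msg_type, length, xid) + body


def unpack_header(data: bytes) -> tuple:
    """Return (version, type, length, xid) from the first 8 bytes."""
    if len(data) < OFP_HEADER.size:
        raise ValueError("truncated OpenFlow header")
    return OFP_HEADER.unpack_from(data)


def handle_message(data: bytes):
    """Hypothetical controller-side handling of one complete message."""
    version, msg_type, length, xid = unpack_header(data)
    body = data[OFP_HEADER.size:length]
    if msg_type == OFPT_HELLO:
        # Send our HELLO, then ask the switch to describe its capabilities
        return pack_message(OFPT_HELLO, xid) + pack_message(OFPT_FEATURES_REQUEST, xid + 1)
    if msg_type == OFPT_ECHO_REQUEST:
        # An echo reply carries the same xid and payload as the request
        return pack_message(OFPT_ECHO_REPLY, xid, body)
    return None  # other message types are out of scope for this sketch
```

A real protocol layer would also split the TCP byte stream into messages using the length field, negotiate the version from both HELLOs, and handle many more message types.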

Due to the limited set of controller applications, phase 1 of SDN is confined to the research community and to big network operators who are willing to develop their own controller applications.

Phase 2 - Openflow 1.1 based devices and Virtual instances of Control Plane software

Device Side:  I expect that in the second phase of SDN, many network device vendors will implement Openflow 1.1 in their L2 and L3 switch devices. It is also very likely that the market will see Openflow-only network devices. In phase 2, I expect SSL/TLS Openflow security to be a common feature. Some of the features that can be expected from device vendors implementing Openflow 1.1 are:
  • Openflow 1.1 Multi-Table feature.
  • Openflow 1.1 action handling
    • Apply actions of each matching table entry to packets immediately
    • Collect actions across all tables and apply all actions to the packets before packet egress.
  • Openflow 1.1 Group actions
    • To allow multiple actions on a packet.
    • To set up a common set of actions for multiple table entries.
  • Flexibility of Matching Criteria on Match fields
    • Values in ranges
    • IP addresses in subnets
    • 'NOT' Operator 
    • Multiple ranges/Subnets for a given match field.
  • SSL/TLS Support
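The multi-table pipeline and the two action-handling modes listed above can be modelled in a few lines of Python. This is a simplified sketch, not a switch implementation: Apply-Actions run immediately, Write-Actions accumulate into an action set (at most one action per type, modelled as a dict) that executes when the packet leaves the pipeline, and Goto-Table moves to a later table. OpenFlow 1.1 itself matches IP addresses with bitmasks; the range, NOT and multi-subnet predicates here model the wish list above and are not part of the 1.1 specification. All names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


# Match predicates for the flexible criteria on the wish list
def in_subnet(*cidrs):
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return lambda v: any(ipaddress.ip_address(v) in n for n in nets)


def in_range(lo, hi):
    return lambda v: lo <= v <= hi


def negate(pred):
    return lambda v: not pred(v)


@dataclass
class FlowEntry:
    priority: int
    match: dict                                        # field name -> predicate
    apply_actions: list = field(default_factory=list)  # executed immediately
    write_actions: dict = field(default_factory=dict)  # merged into the action set
    goto_table: Optional[int] = None

    def matches(self, pkt):
        return all(f in pkt and pred(pkt[f]) for f, pred in self.match.items())


def run_pipeline(tables, pkt):
    """Walk the tables starting at table 0, mimicking OpenFlow 1.1 instructions."""
    applied, action_set, table_id = [], {}, 0
    while table_id is not None:
        entries = sorted(tables.get(table_id, []), key=lambda e: -e.priority)
        entry = next((e for e in entries if e.matches(pkt)), None)
        if entry is None:
            break  # table miss; drop or send-to-controller policy is left out
        applied.extend(entry.apply_actions)     # Apply-Actions: run now
        action_set.update(entry.write_actions)  # Write-Actions: one per type
        table_id = entry.goto_table
    # The accumulated action set is what executes at packet egress
    return applied, action_set
```

For example, a first table can classify by source subnet and pick a queue, while a second table chooses the output port, excluding well-known TCP ports with a negated range.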

Controller Side: Virtual instances of L2 switch and L3 switch control plane protocol stacks would be the next logical step on top of the Openflow protocol library. In this phase, one expectation is that there would be one virtual instance for every device the controller controls. Some of the features that would be seen from controllers in phase 2 are:
  • Multiple virtual instances of L2 and L3 control plane protocol stacks, and management of flows by these virtual instances on the corresponding devices. Because a large number of virtual instances is required, lightweight virtualization such as Linux containers or Linux processes with network namespaces would be used.
  • SSL/TLS Support.
  • A centralized management application on top of the virtual instances to provide a single portal to the controller system. Even though it is a single portal, every virtual instance would still need to be configured independently.
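The phase 2 controller model (one virtual control plane instance per device behind a single portal) can be sketched as below. In a real deployment each instance would likely live in its own Linux container or network namespace; here plain Python objects stand in for them, and every class and method name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualL2Instance:
    """Hypothetical per-device control plane instance (one per datapath)."""
    dpid: int
    config: dict = field(default_factory=dict)
    mac_table: dict = field(default_factory=dict)  # this device's state only

    def on_packet_in(self, in_port, src_mac):
        # Each instance learns independently, like a device-local control plane
        self.mac_table[src_mac] = in_port
        return ("flow_mod", self.dpid, src_mac, in_port)


class Controller:
    def __init__(self):
        self.instances = {}  # datapath id -> VirtualL2Instance

    def on_switch_connected(self, dpid):
        # Phase 2 model: one virtual instance per controlled device
        self.instances.setdefault(dpid, VirtualL2Instance(dpid))

    def dispatch_packet_in(self, dpid, in_port, src_mac):
        return self.instances[dpid].on_packet_in(in_port, src_mac)

    def portal_configure(self, dpid, **settings):
        # Single portal, but each instance is still configured on its own
        self.instances[dpid].config.update(settings)
```

The `portal_configure` call taking a device id is the point: the portal is centralized, but configuration is still per instance, which phase 3's topology-level configuration is expected to remove.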

Phase 3 - Auto Discovery of Network Devices, True Centralization and Support beyond L2/L3 Switch Devices

Two types of protocols may be required: one for auto-discovery of controllers by devices, and a second for topology discovery of devices by controllers.

Multiple features would be provided by Controllers and Devices. Some of them are listed below.
  • Auto discovery of Controllers:  Devices will discover the controllers and gather information on how to reach them.
  • Topology discovery of devices:  Devices will provide the following information to controllers:
    • Device specific information 
      • Make (vendor and Model) of device.
      • Type of functionality supported (L2, L3 switch etc..)
      • Capabilities of the device, such as number of ports, number of tables supported, size of each table, size of all tables put together, types of actions, capabilities of the group table, number of flows, SSL/TLS support and QoS capabilities (number of queues, schedulers, algorithms for shaping, scheduling and queueing).
    • Connectivity information:  This information is required if controllers are expected to program the device with flows via the in-band network. I believe that in the future, network operators may not want to build a separate controller network for controller/device communication. Controllers and network devices are expected to communicate over the same network as the data traffic, without any special network. To enable this communication, all intermediate devices between the device being programmed and the controller need to allow controller channel flows. The controller therefore needs connectivity information to create channel flows in the appropriate devices.
  • NAT Support:  Network Address Translation as required by ADCs, load balancers etc. Openflow 1.2 or later versions may support a NAT action.
  • IPSec Support:  I believe future versions of Openflow specification may add IPsec action on the tables. 
  • Multi Controller Support:  A given device may be controlled by more than one controller.  Resource division across controllers is one feature that can be expected in future. 
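The in-band connectivity requirement above can be illustrated with a small Python sketch: given the discovered topology, the controller finds a path to the target device and derives the channel flows each intermediate device needs. It assumes a hop-count shortest path (BFS) and an undirected adjacency map; the flow representation and function names are hypothetical.

```python
from collections import deque


def controller_path(links, controller_node, target):
    """Breadth-first search over the discovered topology (hop count only)."""
    parent = {controller_node: None}
    queue = deque([controller_node])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr in sorted(links.get(node, ())):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    if target not in parent:
        return None  # target unreachable in-band
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def channel_flows(links, controller_node, target):
    """Channel flows so controller traffic for `target` can cross intermediate devices."""
    path = controller_path(links, controller_node, target)
    if path is None:
        return []
    flows = []
    # Each triple is (previous hop, intermediate device, next hop); the
    # controller and the target themselves need no transit flow
    for prev_hop, hop, next_hop in zip(path, path[1:], path[2:]):
        flows.append({"device": hop, "channel_for": target,
                      "to_device": next_hop, "to_controller": prev_hop})
    return flows
```

The flows come out nearest-first, which matters in practice: a device can only be programmed in-band once every device between it and the controller already carries its channel flows.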
On the controller side:
  • Multi-Tenant Support:  Virtualization of the network for each tenant would be supported in the future. Data center providers would no longer need to worry about creating several physical networks for multiple tenants once controllers support this feature.
  • Avoiding running control plane protocols across devices that are managed by the same controller.
  • Topology-specific configuration rather than device-specific configuration.
  • I also expect more programmability options in controllers for network operators to customize or add new features on controllers.
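Multi-tenant support boils down to keying controller state by tenant as well as by address, so tenants can reuse MAC addresses and never see each other's broadcasts. The Python sketch below assumes, hypothetically, that tenants are assigned per (device, port); real controllers might instead use VLANs, tunnels or other tags.

```python
from dataclasses import dataclass, field


@dataclass
class TenantAwareL2:
    """Hypothetical controller-side L2 state partitioned by tenant."""
    port_tenant: dict = field(default_factory=dict)  # (device, port) -> tenant
    fdb: dict = field(default_factory=dict)          # (tenant, mac) -> (device, port)

    def attach(self, device, port, tenant):
        self.port_tenant[(device, port)] = tenant

    def learn(self, device, port, mac):
        tenant = self.port_tenant[(device, port)]
        self.fdb[(tenant, mac)] = (device, port)

    def lookup(self, device, in_port, dst_mac):
        """Egress (device, port) within the sender's tenant, else None."""
        tenant = self.port_tenant.get((device, in_port))
        if tenant is None:
            return None  # port not assigned to any tenant: drop
        return self.fdb.get((tenant, dst_mac))

    def flood_ports(self, device, in_port):
        """Unknown-destination flooding stays inside the sender's tenant."""
        tenant = self.port_tenant[(device, in_port)]
        return sorted(p for p, t in self.port_tenant.items()
                      if t == tenant and p != (device, in_port))
```

Because the forwarding database key includes the tenant, two tenants using the same MAC address resolve to different egress ports on the same physical network.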

Phase 4 - Extending Programmability to Network Devices to support Service Planes

As discussed in the previous post, service plane functionality is different from the control plane. Service plane functions normally process a certain amount of data before they hand over the flows to the Data Plane (Fast Path). Though the traffic processed is only a subset of the entire traffic, it could be significant. Because of this, some service plane software may not be implemented in the controller, for reasons of performance and to reduce the amount of traffic between devices and controllers.

In addition, some services such as IPFIX/NetFlow are better implemented alongside the Data Plane.

If these services are expected to be implemented by device vendors, that defeats the purpose of SDN. SDN is expected to give network operators the flexibility to develop service plane functionality, customize service planes, or purchase service planes from a vendor other than the network device vendor.

I believe phase 4 of SDN involves not only controller-side flexibility for network operators, but also extending this programmability to the processors in the devices.

How it may work:
  • Controller Software vendor provides the binary image for several services.
  • Controller, based on services provisioned, pushes the relevant binary images to network devices.
Challenges for the Controller Provider:
  • Network devices have different types of processors: Power cores, MIPS, ARM, x86 etc. Creating binary images for multiple CPU architectures for each service could become cumbersome.
  • Many recent processors are SoCs (System-on-Chip), which contain different types of accelerators. Even though multiple SoCs might use the same cores, programming differs across SoCs due to their different peripherals and acceleration engines. That adds to the complexity and increases the number of binary images required.
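The image-explosion problem described above is easy to see in a small Python sketch of the push model: the controller keeps one image per service per (CPU architecture, SoC) platform and picks the right ones for each device. The catalog layout, file naming and platform labels are hypothetical.

```python
from itertools import product


def build_catalog(services, platforms):
    """Hypothetical catalog: one image per service per (arch, SoC) platform."""
    return {(svc, arch, soc): f"{svc}-{arch}-{soc}.img"
            for svc, (arch, soc) in product(services, platforms)}


def images_to_push(catalog, device, provisioned_services):
    """Controller side: select the images matching this device's processor and SoC."""
    key = (device["arch"], device["soc"])
    missing = [s for s in provisioned_services if (s, *key) not in catalog]
    if missing:
        raise LookupError(f"no image for {missing} on {key[0]}/{key[1]}")
    return [catalog[(s, *key)] for s in provisioned_services]
```

With three services and four platforms, two of which share ARM cores but differ in SoC, the catalog already needs twelve images, and every new SoC adds another image per service.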
It is possible that network operators might choose a particular processor and mandate it for all network device vendors. Although this is practical for big network operators, it is not a practical option for many network operators.

I believe there would be standardization of the software architecture in network devices too. One possible standardization could be a requirement to support a Java Virtual Machine (JVM) on the processors used in network devices. There could be more abstractions on top of the JVM through middleware packages such as the "Spring Framework" or "OSGi". This standardization allows software vendors to develop software in Java using one of these frameworks. Since these applications are Java-based, they need not worry about the CPU architecture.

The Java SDK also supports pluggable providers for different accelerations; cryptographic service providers under the Java Cryptography Architecture (JCA) are one example. As long as the operating system on the device processors supplies such providers for its acceleration devices, one Java-based image would work fine.
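The provider idea is what keeps a single image portable: application code asks for an algorithm by name, and whichever implementation the platform registered is used. The following is a Python analogue of that JCA-style lookup, not the Java API itself; the `ProviderRegistry` class and the "hw-crypto" provider name are hypothetical, and the "hardware" provider in the usage is simulated with `hashlib`.

```python
import hashlib


class ProviderRegistry:
    """JCA-style lookup in Python: callers name an algorithm, not an implementation."""

    def __init__(self):
        self._providers = []  # (preference, name, {algorithm: callable})

    def register(self, name, algorithms, preference):
        self._providers.append((preference, name, algorithms))
        self._providers.sort(key=lambda p: p[0])  # lower number = preferred

    def get(self, algorithm):
        for _, name, algorithms in self._providers:
            if algorithm in algorithms:
                return name, algorithms[algorithm]
        raise KeyError(f"no provider for {algorithm}")


registry = ProviderRegistry()
# Portable software fallback that runs on any CPU architecture
registry.register("software",
                  {"SHA-256": lambda data: hashlib.sha256(data).digest()},
                  preference=10)


def digest(data: bytes) -> bytes:
    # Identical application code on every device; only the registry differs
    _, impl = registry.get("SHA-256")
    return impl(data)
```

A device whose operating system exposes a crypto engine would register its own provider with a better preference at boot, and `digest` would pick it up without any change to the application.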

Software Defined Networking (SDN)

Before going into the details of SDN, it is worth revisiting current networks and the different network devices within them. This post then talks about the problems associated with network devices as seen by service providers and network operators, and describes how SDN is expected to solve the issues faced by network operators.

Current Networks

Current networks consist of multiple types of network elements: Layer 2 switches, Layer 3 switches/routers and network service elements such as network security devices, Application Delivery Controllers (ADCs), WAN optimization devices and many more.

L2 Switch devices:

L2 switch devices connect multiple computers in an L2 network. L2 switches have a large concentration of ports, connected to servers, laptops, access points, printers etc. Each L2 switch device typically has anywhere between 8 and 64 ports. Since there could be more than 64 computers in one L2 domain, multiple interconnected L2 switches are used. The main functionality of an L2 switch is to provide connectivity among the computers. L2 switches are rated by how many Ethernet ports they support, the speed of those ports (1G, 10G etc.), the bandwidth of the switch (the rate at which the device can switch traffic), the latency introduced on packets by the switch, and the features applied to the switched packets.

Though switching traffic is the main functionality of an L2 switch, the embedded processor in the L2 switch device runs some control plane protocols. Spanning Tree Protocol (STP) and its variations, such as Rapid STP and Multiple STP, are used by L2 switch devices to find loops and avoid them by disabling (blocking) some ports. Multiple MAC Registration Protocol (MMRP), on top of MRP (Multiple Registration Protocol), is used to propagate multicast memberships across multiple switch devices, thereby avoiding multicast packet flooding on all ports. Multiple VLAN Registration Protocol (MVRP) is used across switch devices to learn the VLAN-to-port relationships needed to create logical LANs across multiple switches. There are many other control plane protocols used for interoperability among switch devices.
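The loop-avoidance idea behind STP can be sketched in Python. This is a deliberately simplified model, not the IEEE 802.1D state machine: all link costs are equal, there are no port IDs, timers or BPDUs, the lowest bridge ID becomes root, and each bridge keeps only the link toward the root through its closest (then lowest-ID) neighbour. Any link that is not some bridge's root link is blocked.

```python
from collections import deque


def spanning_tree(bridges, links):
    """Return (root, forwarding_links, blocked_links) for point-to-point links."""
    root = min(bridges)  # lowest bridge ID wins the root election
    adj = {b: set() for b in bridges}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    # Hop distance of every bridge from the root, via BFS
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj[node]):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    forwarding = set()
    for bridge in bridges:
        if bridge == root or bridge not in dist:
            continue
        # Root port: link to the neighbour closest to the root, ties by lowest ID
        upstream = min(adj[bridge], key=lambda n: (dist.get(n, float("inf")), n))
        forwarding.add(frozenset((bridge, upstream)))
    blocked = {frozenset(link) for link in links} - forwarding
    return root, forwarding, blocked
```

For a triangle of bridges 1, 2 and 3 with bridge 4 hanging off bridge 3, bridge 1 becomes root and the 2-3 link is blocked, which removes the only loop while keeping every bridge reachable.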

As discussed above, L2 switch devices mainly contain two different functions: Data plane functionality and Control plane functionality. Data plane functionality is mainly used to switch the traffic, and Control plane functionality ensures that the right context is established in the Data plane, based either on the configuration or on the result of protocol exchanges with peer switch devices.

Let us visit some of the critical "Data Plane" functionality of the switches.

  • Packet switching:  A packet received on one port is switched to another port based on its destination MAC address. The switch does this by referring to the "MAC address versus port" table it maintains, called the "Learning table". If there is no matching entry in the table, it forwards the packet to all ports except the one on which the packet came in.
  • Population of Learning Table:  Every time a packet is received on any port, the switch extracts the source MAC address and populates the table with this source MAC address and the ingress port. Basically, the switch assumes that the host whose MAC address is the packet's source MAC address is reachable through this ingress port.
  • Shaping and scheduling on the ports in egress direction:  L2 switch devices support shaping the traffic if the connected computer or network device cannot accept more than a certain rate. When the packet load is higher than the shaping bandwidth, L2 switches provide queuing and scheduling functionality with different algorithms such as priority-based scheduling, Deficit Round Robin, strict priority scheduling and combinations of multiple algorithms.
  • Access Control Functionality:  Switch devices also provide access control functionality to filter packets and to apply different actions based on the type of traffic.
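The first two data plane functions above (learning-table population and switching or flooding) fit in a short Python sketch. It is a minimal model of a learning bridge; entry aging, VLANs and the other functions are left out, and the class name is hypothetical.

```python
class LearningSwitch:
    """Minimal MAC learning bridge: learn source MACs, forward or flood by destination."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, ports):
        self.ports = set(ports)
        self.learning_table = {}  # MAC address -> port (no aging in this sketch)

    def receive(self, in_port, src_mac, dst_mac):
        """Return the sorted list of ports the frame is sent out of."""
        # Populate the learning table from the source MAC of every frame
        self.learning_table[src_mac] = in_port
        out_port = self.learning_table.get(dst_mac)
        if out_port is None or dst_mac == self.BROADCAST:
            # Unknown or broadcast destination: flood to all ports except ingress
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            return []  # destination is on the ingress segment: filter the frame
        return [out_port]
```

The first frame toward an unknown host is flooded; once that host replies, its MAC is learned and later frames go out of a single port.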
Layer 3 (L3) switch devices/Routers:

Routers or L3 switch devices are used to separate different L2 domains. Layer 3 switching is typically based on IPv4/IPv6 addresses. At times, it could also be based on other fields of the IPv4/IPv6 headers, such as the DSCP value.

Like in L2 switch devices, L3 switch devices also have Data Plane and Control Plane functionality.  Data Plane functionality forwards the packets based on the routing database.  Control plane functionality manages the routing database.  Control plane protocols such as RIP,  OSPF, BGP for unicast and PIM-SM, MLDv2/IGMP protocols for Multicast are used to populate the unicast and multicast routing databases respectively.   Control plane protocols in a device work with other L3 switch devices to figure out the best routes to reach the destinations.
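The split between the two planes shows up clearly in a routing lookup: the control plane fills the routing database, and the data plane picks the longest matching prefix for each packet. The Python sketch below uses the standard `ipaddress` module and a linear scan for clarity; real devices use tries or TCAMs, and the class name is hypothetical.

```python
import ipaddress


class RoutingTable:
    """Data plane lookup over a routing database filled by the control plane."""

    def __init__(self):
        self.routes = []  # (network, next_hop)

    def add_route(self, prefix, next_hop):
        # Called with results from the control plane (RIP/OSPF/BGP) or static config
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, destination):
        """Longest-prefix match; returns None when no route covers the address."""
        addr = ipaddress.ip_address(destination)
        candidates = [(net.prefixlen, hop) for net, hop in self.routes
                      if net.version == addr.version and addr in net]
        return max(candidates)[1] if candidates else None
```

With routes for 0.0.0.0/0, 10.0.0.0/8 and 10.1.0.0/16, the address 10.1.2.3 resolves through the /16, 10.2.0.1 through the /8, and anything else falls back to the default route.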

Network Service Devices:

Network Service Devices also have two higher-level functions: Control Plane and Data Plane. In network services, people typically use the term "Service Plane" instead of "Control Plane", and "Fast Path" instead of "Data Plane". Unlike in L2/L3 switch devices, data traffic is processed by both the Service Plane and the Fast Path. The Service Plane, after processing a certain amount of data in a given flow, typically decides to offload the rest of the processing to the "Fast Path". From then on, any traffic on the offloaded flows is handled by the "Fast Path". I am not going into the details of when the Service Plane decides to offload flows to the Fast Path, as that is a subject by itself. Just to give an example, an ADC might process HTTP packets until it has processed the HTTP request headers and then offload the rest of the connection to the Fast Path.
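The ADC example above can be sketched in Python to show where the offload decision happens. This is a hypothetical model: the service plane buffers a flow's bytes until the HTTP request headers end (the blank line `\r\n\r\n`), picks a backend from the request path, and installs a fast path entry so later packets on that flow skip inspection. Routing by path prefix is an assumed policy for illustration.

```python
class AdcFlowProcessor:
    """Hypothetical ADC: service plane until headers are parsed, then fast path."""

    def __init__(self, routes, default):
        self.routes = routes        # [(path prefix, backend pool)]
        self.default = default      # backend when no prefix matches
        self.fast_path = {}         # flow 5-tuple -> backend (offloaded flows)
        self.pending = {}           # flow 5-tuple -> buffered request bytes
        self.service_plane_packets = 0

    def process(self, flow, payload):
        if flow in self.fast_path:
            return ("forward", self.fast_path[flow])  # fast path: no inspection
        self.service_plane_packets += 1
        buf = self.pending.get(flow, b"") + payload
        if b"\r\n\r\n" not in buf:
            self.pending[flow] = buf
            return ("buffer", None)                   # headers not complete yet
        self.pending.pop(flow, None)
        request_line = buf.split(b"\r\n", 1)[0].decode("latin-1")
        parts = request_line.split()
        path = parts[1] if len(parts) > 1 else "/"
        backend = next((b for prefix, b in self.routes if path.startswith(prefix)),
                       self.default)
        self.fast_path[flow] = backend                # offload rest of connection
        return ("forward", backend)
```

Only the packets carrying the request headers count against the service plane; everything after the offload is a single dictionary lookup, which is the property that lets the fast path run at line rate.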

Gist of Network Device functionality:

In essence, almost all network devices have Control Plane and Data Plane functionality. Most of the intelligence resides in the Control Plane. Data Plane functionality does not need much intelligence, but its processing is expected to happen at very high speed. The Control Plane creates flow contexts in the Data Plane as part of its processing, and the Data Plane uses these flow contexts to act on the traffic.

Packaging of Network Devices today:

Cisco, Juniper, HP, Brocade and Dell are some of the L2/L3 switch device and network service device vendors. Network operators in enterprises, service providers and data centers are the customers of these devices.

Vendors provide self-contained devices with easy-to-use configuration mechanisms. These devices come with both Control and Data Plane functionality. The separation between the Control and Data Plane, and the interfaces between them, are proprietary to each vendor's architecture.

In summary, vendors provide equipment to network operators, and operators configure these devices using a Command Line Interface (CLI), an HTTP-based GUI or centralized management systems to suit their deployments.

Network Operator Challenges:

I have heard of the following challenges from network operators (information based on conferences and reports):

  • Addition of new control plane protocols:  One operator indicated that they wanted to introduce a new routing protocol suitable for their deployment. The operator felt that existing routing protocols such as IS-IS, OSPF and RIP are inadequate or overly complex. Because of the way network devices are built today, they are not programmable, so the operator had the following choices: pay network device vendors to implement the new protocol, standardize the protocol through standards bodies and hope that vendors would implement it, or build the network devices themselves. All of these are costly or time consuming. It appears that one vendor asked for millions of dollars to implement the protocol and maintain it for the life of the product, which is prohibitive for the operator. Going through standards bodies takes a few years at minimum, which is not an option given the urgency of the request. Developing its own device is also costly for the operator, as it requires a large number of engineering resources.
  • Cost of interoperability is very high:  Another operator indicated that, because these network devices implement so many protocols, maintaining interoperability is a huge task. This operator's network has thousands of network devices from different vendors. Whenever a new device is purchased or a new image is installed on some devices, the operator spends a few man-years of effort ensuring that the new device or image continues to interoperate with existing devices from other vendors for all supported protocols. With the increasing number of control plane protocols, the cost of interoperability is also going up. This operator indicated that there are hundreds of protocols for which they need to ensure interoperability with every new network device purchase or image upgrade. I heard a figure of a few million dollars being spent every year on this by this operator.
  • Reduction of control plane and discovery protocol traffic on the wire:  A data center operator indicated that a large amount of discovery protocol traffic is observed in their network. In particular, a large number of ARP packets are seen in a network consisting of hundreds of thousands of virtual servers. As I understand it, each virtual server spends around 10-20% of its CPU cycles processing ARP requests. ARP requests are sent to the broadcast MAC address, so every network device and computer in the L2 network receives all of them. Each device or computer responds to the ARP requests for its own IP addresses and ignores the rest, but the processing needed just to decide which requests to act on is what consumes those 10-20% of cycles. This operator's network repository system has all the devices and their associated MAC addresses. The operator wanted to use this information to populate the network devices facing the appropriate computers, letting them respond to ARP requests without propagating them to other network devices and thereby reducing the ARP packets on the network. Though many devices supported Proxy ARP functionality, the required scale was the issue: the operator wanted each network device to support at least 64K proxy ARP records, but did not find many devices supporting more than 2K.
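The proxy ARP behaviour the operator wanted can be sketched in Python: the device answers an ARP request itself from an IP-to-MAC repository instead of letting the broadcast propagate. The packet layouts (14-byte Ethernet header with EtherType 0x0806, 28-byte ARP payload for IPv4 over Ethernet, operation 1 = request, 2 = reply) follow RFC 826; the function names and the plain-dict repository are hypothetical stand-ins for the operator's network repository.

```python
import ipaddress
import struct

ETH_P_ARP = 0x0806
ETH_HDR = struct.Struct("!6s6sH")            # dst MAC, src MAC, EtherType
ARP_FMT = struct.Struct("!HHBBH6s4s6s4s")    # 28-byte IPv4-over-Ethernet ARP


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def proxy_arp_reply(frame: bytes, repository: dict):
    """Answer an ARP request from `repository` (IP string -> MAC string).

    Returns the reply frame, or None when the frame is not a request we can
    answer, in which case the switch would forward or flood it as usual.
    """
    if len(frame) < ETH_HDR.size + ARP_FMT.size:
        return None
    _, _, eth_type = ETH_HDR.unpack_from(frame, 0)
    if eth_type != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = ARP_FMT.unpack_from(frame, ETH_HDR.size)
    if (htype, ptype, hlen, plen, oper) != (1, 0x0800, 6, 4, 1):
        return None  # only IPv4-over-Ethernet ARP requests are handled
    target_ip = str(ipaddress.IPv4Address(tpa))
    if target_ip not in repository:
        return None
    target_mac = mac_bytes(repository[target_ip])
    # Reply: sender is the looked-up host, target is the original requester
    arp = ARP_FMT.pack(1, 0x0800, 6, 4, 2, target_mac, tpa, sha, spa)
    return ETH_HDR.pack(sha, target_mac, ETH_P_ARP) + arp
```

The lookup itself is trivial; the operator's difficulty was the number of records each device could hold (64K wanted versus about 2K available), which is exactly the kind of limit a controller-driven design can work around.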
What do Network Operators want?

Network Operators would like flexibility beyond the configuration options provided by network device vendors. Operators do understand the value of a "Data Plane" that supports multi-gigabit throughput. What network operators believe they need is programmability in the control plane part of the software.

As indicated before, network devices currently have both Control Plane and Data Plane functionality, with a proprietary interface between the planes. To allow customization of the control plane and the addition of new control plane protocols, network operators would like a standardized interface between the Control Plane and the Data Plane.

Network device vendors use different types of processors for running the control plane. To customize control plane software, network operators may need to develop and test the control plane on multiple processors. Operators would like to develop the control plane only once for all network devices, and in higher-level languages such as Java, because of the large pool of Java developers. This allows network operators to use the engineering pool they already have.

Is SDN the answer?

SDN is expected to solve network operator challenges. SDN enables network operators to develop or customize control plane applications themselves. The Openflow protocol is one of the first steps of SDN.

  • Openflow protocol definition:  Openflow protocol version 1.1 is defined to address the following desires of network operators:
    • Separation of Control Plane from Data Plane.
    • Implementation of Control plane at central place serving multiple devices on high end processors.
    • Facility to develop control plane applications in higher-level languages such as Java and Python.
How is SDN expected to address some of the challenges of network operators?
  • With the control plane separated from the Data plane, the control plane software vendor can be different from the network device vendors implementing the Data plane.
  • Since there is one vendor for control plane software in a given deployment,  interoperability issues would get reduced dramatically.  
  • If one control plane computer (Controller) manages multiple network devices, there is no need to run control plane protocols among them, as the control plane instances of those devices are within the same controller and can use their own proprietary mechanisms to compute the results without running a protocol. This reduces the amount of protocol traffic in a network managed by one controller. Control plane protocols need to run only where devices are managed by different controllers.
  • Since there are few controllers in a given deployment, network operators can afford to run the controller software on very high-end systems, such as multicore processors. This enables implementation of the controller, including control plane protocols, in high-level languages such as Python and Java.
  • SDN is also trying to standardize different layers in the controller, similar to server-side programming. This allows network operators themselves to customize and add new control plane software without depending on the controller software vendors.
  • The Openflow protocol, standardized as part of SDN, allows creation of flows in the network devices (Data Plane), which gives granular control of the traffic going through the networks. It is possible to create logical networks without adding physical network devices. For example, research networks or multi-tenant networks can be created on production networks with higher confidence that the network continues to run.
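The point about not running control plane protocols among devices managed by one controller can be illustrated with a short Python sketch. Because the controller already holds the whole topology, it can compute every device's next hops directly (here with Dijkstra's algorithm over link costs) and push them as flows, with no OSPF-style flooding between the devices. The graph format and function names are assumptions for illustration.

```python
import heapq


def next_hops(graph, source):
    """Dijkstra from `source` over the controller's global view.

    graph: {node: {neighbour: link_cost}}. Returns {destination: first hop}.
    """
    dist = {source: 0}
    first_hop = {}
    visited = set()
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in visited and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the first hop is the neighbour itself
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return first_hop


def compute_all_tables(graph):
    """Forwarding state the controller would push to every device it manages."""
    return {device: next_hops(graph, device) for device in graph}
```

For a four-node topology where A-B, B-C and C-D cost 1 and A-C costs 5, the controller sends A's traffic for C and D via B without any routing protocol exchange on the wire.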