Thursday, December 15, 2011

Embrane - Is this an SDN Play?

Recently, I came across a company called Embrane while doing some Google searching on SDN.  Then I saw a press release announcing Embrane's product launch.  I thought I would check this out and see how far it goes toward SDN.  I went through the whitepaper published on Embrane's website.  If you are interested, you can find that paper here.

My understanding of the Embrane solution:

When I first read the white paper, I was not sure what the Embrane product is - whether it is a platform/framework to instantiate any type of network service virtual appliance from any vendor, or whether Embrane itself provides some network services as virtual appliances.  By the end of the whitepaper, and after going through their website, it appears that Embrane's main focus is to deliver a framework for any virtual network service appliances, including third-party virtual appliances.

The Embrane architecture mainly consists of four components.  Each component is installed as a separate VM.
  • Elastic Services Manager (ESM):  Typically, there would be one VM of this type.  The Data Center operator works on this VM to provision Distributed Virtual Appliances. 
  • Distributed Virtual Appliances (DVA):  Each DVA is a logical set of VMs.  There are three kinds of VMs within each DVA.  Even though there are multiple VMs within one logical DVA, it can be treated as one appliance for all practical purposes.  As I understand it, the Data Center operator needs to instantiate one DVA per tenant per network service.  If a tenant requires two types of network services, there would be 2 DVAs for that tenant.  So, if there are X tenants in a Data Center and each tenant requires Y network services (ADC, Firewall, Web Application Firewall, WAN Optimization, etc.), then X * Y DVAs are needed.  Now, coming to the three kinds of VMs within each DVA (a small sketch of this model follows the list):
    • Network Service Virtual Appliances (NSVA):  A DVA can have multiple virtual appliances.  These appliances implement the actual functionality of the network service, such as ADC, Firewall, WOC, etc.  Obviously, there must be at least one NSVA in a DVA.  Multiple NSVAs can be instantiated by the ESM for scaling performance (scale-out). 
    • Data Plane Dispatcher (DPD):  There is one DPD in each DVA.  The DPD is the component that actually distributes the traffic across the multiple NSVAs for linear performance scaling.  
    • Data Plane Manager (DPM):  There is one DPM VM in each DVA.  The DPM is expected to configure the NSVAs and the DPD in the DVA on behalf of the ESM.  Though it is not clear, I am assuming that it also ensures that configuration integrity is maintained across all NSVAs.  It appears that this is the only VM that requires persistent storage, and hence I am guessing that it might be storing the audit and system logs generated by the DPD and NSVAs.
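To make the counting above concrete, here is a minimal sketch of this model in Python.  All class and field names are my own illustration, not Embrane's API.

```python
# Minimal sketch of the DVA model described above (my own naming, not
# Embrane's API). One DVA per (tenant, service) pair; each DVA is one DPM,
# one DPD, and at least one scale-out NSVA.

from dataclasses import dataclass
from itertools import product

@dataclass
class DVA:
    tenant: str
    service: str          # e.g. "ADC", "Firewall", "WOC"
    nsva_count: int = 1   # at least one NSVA; more for scale-out

    def vms(self):
        # 1 DPM (config/persistent state) + 1 DPD (dispatcher) + N NSVAs
        return ["DPM", "DPD"] + ["NSVA-%d" % i for i in range(self.nsva_count)]

tenants = ["tenant-a", "tenant-b"]        # X = 2
services = ["ADC", "Firewall", "WAF"]     # Y = 3
dvas = [DVA(t, s) for t, s in product(tenants, services)]
print(len(dvas))                          # X * Y = 6 DVAs
```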
If the network service is an ADC (Application Delivery Controller), then the DVA can be viewed as providing one more level of load balancing.  That is, the DPD acts as a load balancer in front of multiple ADCs, and as we know, the ADC itself acts as a load balancer to the servers.  This makes sense, as ADCs have become complex in the recent past and their computational power requirements have gone up; hence one more layer of load balancing is indeed required.  In current Data Center deployments, this is achieved using L2 switches, which have the capability to balance load across multiple external devices based on the hash of defined fields in the L2/L3/L4 headers.
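For illustration, the hash-based selection an L2 switch performs can be sketched as below.  Which header fields feed the hash is configurable on real switches, so the field choice here is just an example.

```python
# Hash-based member selection, as an L2 switch would do it: a hash over
# chosen L2/L3/L4 header fields deterministically picks one cluster member,
# so all packets of a flow land on the same device.

import zlib

def select_member(src_ip, dst_ip, proto, src_port, dst_port, members):
    key = ("%s|%s|%d|%d|%d" % (src_ip, dst_ip, proto, src_port, dst_port)).encode()
    return members[zlib.crc32(key) % len(members)]

adcs = ["adc-1", "adc-2", "adc-3"]
print(select_member("10.0.0.5", "192.0.2.10", 6, 33001, 80, adcs))
```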

I have detailed how L2 switches can be used to distribute traffic across multiple devices of a cluster in an earlier post.  Please check that out here.

My views:

It appears that the DPD functionality is similar to what I described in that earlier post.
Since the DPD is a software-based distributor, I expect that it will not have the limitations of L2-switch-based load distribution.  As we all know, many network services work with sessions (typically 5-tuple based: source IP, destination IP, protocol, source port, destination port) to store state across packets.  Any load distributor is expected to take care of this by sending all packets corresponding to a session to only one device in the cluster.  If this is not done, there would be a lot of communication across the devices (virtual appliances) within the cluster, which could eliminate the benefit of having multiple virtual appliances in the first place.  In my view, the DPD should distribute sessions (not packets blindly) across the multiple virtual appliances.  Since it is a software-based solution, it can go one step further and ensure that all sessions corresponding to an application session are sent to the same virtual appliance.  VoIP based on SIP is one example where there can be 3 UDP sessions corresponding to one application session; a DPD-kind of device needs to ensure that the traffic of all three sessions is sent to one device (virtual appliance).  Detection of the 5-tuples of data connections is only possible if the DPD supports ALGs (Application Level Gateways).  Since there could be more ALG requirements in the future, the challenge is for the "Load Distributor" vendor to deliver these ALGs on a continuing basis and/or to open up the DPD architecture for third-party vendors to install their own ALGs, thereby maintaining the SDN spirit. 
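A rough sketch of what I mean by session-aware distribution with ALG pinning, under my own assumptions about the internals (nothing here comes from Embrane's documentation):

```python
# Session-aware distribution with ALG pinning (illustrative only).
# A new 5-tuple is assigned to an NSVA and remembered; an ALG (SIP here)
# can pre-pin the media 5-tuples it learns from signaling, so all sessions
# of one application session land on the same NSVA.

session_table = {}   # 5-tuple -> NSVA name

def assign(five_tuple, nsvas):
    nsva = session_table.get(five_tuple)
    if nsva is None:                                  # first packet of a session
        nsva = nsvas[hash(five_tuple) % len(nsvas)]
        session_table[five_tuple] = nsva
    return nsva

def sip_alg_pin(signaling_tuple, media_tuples):
    # Pin the RTP/RTCP 5-tuples (learned from SDP) to the NSVA that owns
    # the SIP signaling session.
    owner = session_table[signaling_tuple]
    for mt in media_tuples:
        session_table[mt] = owner
```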

As I described in the same earlier post, configuration synchronization among the network service devices (NSVAs) is one important aspect of cluster-based systems.  I guess the DPM is the one taking care of it in the Embrane solution.
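If the DPM does play this role, a simplistic version of the synchronization could look like the sketch below.  The apply() and config_digest() methods are hypothetical stand-ins, since Embrane does not document the mechanism.

```python
# Speculative sketch of the configuration-integrity role attributed to the
# DPM above: push one configuration to every NSVA and verify that all of
# them report the same digest. apply() and config_digest() are hypothetical.

import hashlib

def sync_config(nsvas, config_text):
    expected = hashlib.sha256(config_text.encode()).hexdigest()
    for nsva in nsvas:
        nsva.apply(config_text)                     # hypothetical NSVA call
    drifted = [n for n in nsvas if n.config_digest() != expected]
    if drifted:
        raise RuntimeError("config drift on %d NSVAs" % len(drifted))
```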

Overall, this architecture is good; it replicates the physical appliance solution in the cloud.  It is well suited to environments where Data Center operators don't allow their customers to deploy physical appliances.

It does not appear to be OpenFlow based.  But it can still be considered part of SDN, as it allows third-party network service virtual appliances in its framework.


Challenges I see in the Embrane solution:

Embrane might already have the following features.  Since I did not find any information on them, I thought it worthwhile to mention them here.  I believe the following features are required in DPD-like load distributors.

As I described above, classifying packets into application sessions and ensuring that all packets corresponding to an application session go to the same network service virtual appliance is one big challenge for this kind of equipment.  I know from personal experience that supporting multiple ALGs is quite challenging, mainly in ensuring interoperability with both clients and servers. 

Some network deployments might see traffic on tunnels such as IP-in-IP, GRE, GTP-U, etc.  Traffic corresponding to many sessions is sent on very few tunnels.  To ensure distribution across multiple NSVAs, the "Load Distributor" needs the flexibility to dig into the tunnels and classify packets based on the inner headers.
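As an illustration of what digging into the tunnel involves, here is a minimal parser that locates the inner IPv4 packet for IP-in-IP and GRE and hashes on the inner 5-tuple.  GTP-U, which rides on UDP port 2152, would need a similar extra step and is omitted here; options and edge cases are also skipped.

```python
# Hash on the inner 5-tuple for tunneled traffic: for IP-in-IP (proto 4)
# and GRE (proto 47) the outer 5-tuple is nearly constant across sessions,
# so we locate the inner IPv4 header and hash on that instead.

import struct, zlib

def inner_flow_key(pkt):   # pkt: raw bytes starting at the outer IPv4 header
    ihl = (pkt[0] & 0x0F) * 4
    proto = pkt[9]
    if proto == 4:          # IP-in-IP: inner IPv4 follows the outer header
        inner = pkt[ihl:]
    elif proto == 47:       # GRE: 4-byte base header plus optional fields
        flags = struct.unpack("!H", pkt[ihl:ihl + 2])[0]
        off = 4 + 4 * bool(flags & 0x8000) + 4 * bool(flags & 0x2000) \
                + 4 * bool(flags & 0x1000)   # checksum, key, sequence
        inner = pkt[ihl + off:]
    else:
        inner = pkt         # not tunneled: hash on the packet itself
    iihl = (inner[0] & 0x0F) * 4
    sip, dip = inner[12:16], inner[16:20]
    ports = inner[iihl:iihl + 4] if inner[9] in (6, 17) else b""  # TCP/UDP
    return zlib.crc32(sip + dip + bytes([inner[9]]) + ports)
```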

Performance, I believe, would be the biggest challenge in a virtual-machine-based "Load Distributor".  Classifying packets, distributing sessions, and sending subsequent packets of a session to the selected virtual appliance requires maintaining millions of sessions in the "Load Distributor".  What I hear is that single-CPU virtual machines on VMware- and Xen-type hypervisors typically deliver around 1 Gbps for small packets.  More processors can be added to the "Load Distributor" virtual machine, but achieving performance in the 10s of Gigs may be very challenging.
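Some back-of-envelope numbers behind that concern (my assumptions, not measured figures) suggest that session state itself is not the problem; the per-packet work is.

```python
# Rough arithmetic: millions of session entries fit comfortably in RAM,
# but the packet rate needed for multi-Gbps with small packets is what
# stresses a single VM. Entry size and packet size are assumptions.

sessions = 2_000_000
entry_bytes = 64                        # 5-tuple key + NSVA id + timers, roughly
print(sessions * entry_bytes / 2**20)   # ~122 MB of session state

pkt_bits = 128 * 8                      # small (128-byte) packets
print(1e9 / pkt_bits / 1e6)             # ~0.98 Mpps per Gbps to sustain
```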

My 2 Cents:

I believe that the "Load Distributors" need to go beyond virtual machines.  Taking advantage of OpenFlow-based switches to forward the packets would be the solution of choice, in my view.  The Load Distributor virtual machine can handle the ALG kind of functionality, select the network service device for new sessions, and create the appropriate flows in the OpenFlow switches, leaving the OpenFlow switches to forward subsequent packets to the network service devices/virtual appliances.  That is, the OpenFlow switch forwards a packet to the "Load Distributor" only if there is no matching flow.  Hence the traffic to the Load Distributor is small, and one virtual machine would be able to process the load.  Since Data Center operators normally have switches (eventually OpenFlow switches), this mechanism would work just fine in Cloud environments.
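A sketch of that split, with switch.install_flow and nsva_port_of as hypothetical stand-ins for whatever controller API is used (this is my proposal, not something Embrane describes):

```python
# Reactive flow setup: the Load Distributor VM sees only table-miss packets,
# picks an NSVA for the new session, and installs an exact-match flow so the
# OpenFlow switch forwards the rest of the session itself.

session_table = {}

def handle_table_miss(switch, five_tuple, nsvas):
    nsva = session_table.setdefault(five_tuple,
                                    nsvas[hash(five_tuple) % len(nsvas)])
    switch.install_flow(match=five_tuple,        # hypothetical controller call
                        out_port=nsva_port_of(nsva),
                        idle_timeout=30)         # let idle sessions age out
    return nsva                                  # forward this first packet ourselves
```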


1 comment:

Unknown said...

This article describes a futuristic approach which all VM appliance providers need to adopt over time.
I have not gone through the Embrane product in detail, but I feel that by depending on a VM to distribute the load there is a single point of failure. Once it crashes/goes out of service, all service appliances are of no use. Having said that, VMotion can help in such scenarios to migrate the appliances to appropriate ESX servers.
The article suggests an OpenFlow-based L2 switch that might do the distribution in a better way. I am curious to know if it can provide fail-over in such a situation.

Thanks,
-Ravi.