Sunday, November 15, 2015

DPDK-based Accelerated vSwitch Performance Analysis

DPDK-based OVS acceleration improves performance many-fold, up to 17x over native OVS.  Please see the report here:  https://download.01.org/packet-processing/ONPS1.5/Intel_ONP_Server_Release_1.5_Performance_Test_Report_Rev1.2.pdf

It is interesting to understand the following:
  • Performance impact with an increasing number of flows.
  • Performance impact when additional functions are introduced.
  • Performance impact when VMs are involved.

The performance report on 01.org has measured values for many variations.  To simplify the analysis, I am taking the values measured with one core and hyper-threading disabled.  Also, I am taking the 64-byte-packet numbers, as the PPS rate is what matters for many networking workloads.
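
Throughout this post, "degradation" simply means the relative drop in PPS versus a baseline measurement.  Here is a small helper (mine, not from the report) to reproduce the percentages quoted below:

    def degradation_pct(baseline_pps, measured_pps):
        """Relative drop (%) in throughput versus a baseline PPS measurement."""
        return (baseline_pps - measured_pps) / baseline_pps * 100.0

    # Example from Table 7-12 (64-byte packets): 4 flows vs. 4K flows
    print(round(degradation_pct(16766108, 10781154), 1))  # -> 35.7 (close to the 35.8% quoted below)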

Performance impact with an increasing number of flows

Scenario:  Phy-OVS-Phy (packets received from the Ethernet ports are processed by OVS in the host Linux and then sent back out; there is no VM involvement).  The following snippet is taken from Table 7-12.

One can observe from the above table that the performance numbers go down, rather dramatically, as the number of flows increases.  Performance drops by 35.8% when the flows are increased from 4 to 4K, and by 45% when the flows are increased from 4 to 8K.
Cloud and telco deployments typically have a large number of flows due to SFC and per-port filtering functionality; it is quite possible to have 128K flows.  One thing to note is that the degradation from 4K to 8K flows is not very high.  That is understandable: the cache-thrashing effect levels off after some number of simultaneous flows.  I am guessing that the performance degradation with 128K flows would be around 50%.
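
The report does not publish the exact flow configuration used for the 4/4K/8K-flow runs.  Purely as an illustration, here is a hypothetical sketch of how one might generate a few thousand distinct OpenFlow entries for such a test; the bridge name, match fields and port numbers are made up:

    # Hypothetical sketch: emit N distinct exact-match L3 flows for an OVS bridge.
    def make_flows(n, out_port=2):
        for i in range(n):
            # Vary the destination IP so every entry is a separate flow.
            dst = "10.0.{}.{}".format(i // 256, i % 256)
            yield "priority=100,ip,nw_dst={},actions=output:{}".format(dst, out_port)

    with open("flows.txt", "w") as f:
        for rule in make_flows(4096):
            f.write(rule + "\n")

    # The generated file can then be loaded with:  ovs-ofctl add-flows br0 flows.txt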

Performance impact when additional functions introduced

Many deployments are going to have not only VLAN-based switching, but also other functions such as VxLAN, secured VxLAN (VxLAN-over-IPsec) and packet filtering (implemented using iptables in the case of a KVM/QEMU-based hypervisor).  The performance report has no numbers for VxLAN-over-IPsec or iptables, but it does have performance numbers for VxLAN-based networking.
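
For reference, adding a VxLAN tunnel port to an OVS bridge is typically a one-line configuration change.  The sketch below is only illustrative; the bridge name, remote VTEP address and VNI are my assumptions, not taken from the report:

    # Hypothetical sketch: attach a VxLAN tunnel port to an existing OVS bridge.
    import subprocess

    def add_vxlan_port(bridge="br-int", port="vxlan0", remote_ip="192.168.1.2", vni=100):
        subprocess.check_call([
            "ovs-vsctl", "add-port", bridge, port, "--",
            "set", "interface", port, "type=vxlan",
            "options:remote_ip={}".format(remote_ip),
            "options:key={}".format(vni),
        ])

    add_vxlan_port()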

The following snippet was taken from Table 7-26.  This table shows PPS (packets per second) numbers when OVS-DPDK is used with VxLAN.  Again, these numbers are with one core and hyper-threading disabled.


Without VxLAN, the performance with 4 flows is 16,766,108 PPS.  With VxLAN, the PPS number drops to 7,347,179, a 56% degradation.
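
The 56% figure follows directly from the two table values:

    base_pps  = 16766108  # 64B, 4 flows, plain OVS-DPDK (Table 7-12)
    vxlan_pps = 7347179   # 64B, 4 flows, OVS-DPDK with VxLAN (Table 7-26)
    print(round((base_pps - vxlan_pps) / base_pps * 100))  # -> 56 (% degradation)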

There are no numbers in the report for 4K and 8K flows with VxLAN.  Those numbers would have helped in understanding the performance degradation with both an increasing number of flows and additional functions.

Performance impact when VMs are involved

Table 7-12 of the performance report shows the PPS values with no VM involved.  The following snippet shows the performance measured when a VM is involved; the packet flows through the Phy-OVS-VM-OVS-Phy components.  Note that the VM runs on a separate core, and hence VM-level packet processing should not impact the performance.

Table 7-17 shows the PPS value for 64-byte packets with 4 flows: 4,796,202.
The PPS number with 4 flows and no VM involvement is 16,766,108 (as shown in Table 7-12).
The performance degradation is around 71%.

The PPS value with 4K flows and a VM involved (from Table 7-21) is 3,069,777.
The PPS value with 4K flows and no VM involved (from Table 7-12) is 10,781,154.
The performance degradation is around 71%.

Let us see the performance impact of the combination of VM involvement and an increasing number of flows:

The PPS number with 4 flows and no VM involvement is 16,766,108 (as shown in Table 7-12).
The PPS number with 4K flows and a VM involved is 3,069,777 (from Table 7-21).
The performance degradation is around 81%.
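
All three degradation figures above can be reproduced from the quoted table values:

    phy_4  = 16766108  # Phy-OVS-Phy, 4 flows          (Table 7-12)
    phy_4k = 10781154  # Phy-OVS-Phy, 4K flows         (Table 7-12)
    vm_4   = 4796202   # Phy-OVS-VM-OVS-Phy, 4 flows   (Table 7-17)
    vm_4k  = 3069777   # Phy-OVS-VM-OVS-Phy, 4K flows  (Table 7-21)

    drop = lambda base, x: round((base - x) / base * 100, 1)
    print(drop(phy_4,  vm_4))   # ~71.4: VM involvement, 4 flows
    print(drop(phy_4k, vm_4k))  # ~71.5: VM involvement, 4K flows
    print(drop(phy_4,  vm_4k))  # ~81.7: VM involvement plus 4K flows combined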

Observations:


  • Performance degradation is steep when packets are handed over to the VM. One big reason is that the packet traverses the DPDK-accelerated OVS twice, unlike the Phy-OVS-Phy case where OVS sees the packet only once.  The second most likely culprit is the virtio Ethernet (vhost-user) processing in the host.
  • Performance degradation is also steep when the number of flows is increased. I guess cache thrashing is the culprit here.
  • Performance also degrades when more functions, such as VxLAN, are used.

Having said that, the main thing to note is that DPDK-based OVS acceleration shows a 7x to 17x improvement over native OVS in all cases.  That is very impressive indeed.

Comments?




