Thanks for your response. The issue is not with the physical switch; no broadcast traffic is observed there. The generated unicast traffic does reach the vSwitch on the ESXi host, but the vSwitch can only switch it up to a certain rate. Here are some example results:
(1) 64-byte packets, 4 vCPUs and 4 vNICs in the VM: Tx 3.4 Mpps, Rx 870 Kpps.
(2) 64-byte packets, 8 vCPUs and 4 vNICs in the VM: Tx 4.9 Mpps, Rx 1.15 Mpps.
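To make the scaling easier to compare, here is a small sketch in Python that uses only the two measurements above. It computes the Rx/Tx ratio and the per-vCPU rate for each run, then the throughput gain from 4 to 8 vCPUs. The dictionary layout and the `scaling` helper are purely illustrative.

```python
# Measured results from the two runs above (64-byte packets, 4 vNICs), in Mpps.
results = {
    4: {"tx": 3.4, "rx": 0.87},  # run (1): 4 vCPUs
    8: {"tx": 4.9, "rx": 1.15},  # run (2): 8 vCPUs
}


def scaling(base, larger, key):
    """Throughput gain for `key` ("tx" or "rx") going from `base` to `larger` vCPUs."""
    return results[larger][key] / results[base][key]


for vcpus, r in results.items():
    # Rx as a fraction of Tx, plus the average rate each vCPU contributes.
    print(f"{vcpus} vCPUs: Rx/Tx = {r['rx'] / r['tx']:.2f}, "
          f"Tx per vCPU = {r['tx'] / vcpus:.3f} Mpps, "
          f"Rx per vCPU = {r['rx'] / vcpus:.3f} Mpps")

print(f"Tx gain 4->8 vCPUs: {scaling(4, 8, 'tx'):.2f}x")
print(f"Rx gain 4->8 vCPUs: {scaling(4, 8, 'rx'):.2f}x")
```

Doubling the vCPUs raises Tx by about 1.44x and Rx by only about 1.32x. In both runs, Rx remains at roughly a quarter of Tx or less. This fits a limit outside the guest on the receive path, although it does not prove it.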
From my analysis, the host vSwitch appears to be the bottleneck. Please let me know if anyone has made similar observations.
Also, does anyone have performance results for DPDK-based L2 forwarding running in a VM that uses the vmxnet3 PMD?
Thanks in advance.