Hi Idan,
I'm glad you made some progress. Be sure you are aligning with NUMA nodes if you are using virtual machines with a large number of cores.
Also, are you using hyperthreading? Do you have more VMs allocated to the host than available cores?
Looking at your diagram, the pCPU is only moving between CPU 19 and 31. How many cores do you have per socket? If the answer is twelve, then maybe you are staying within a single processor?
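If it helps, here is a rough Python sketch for checking which socket(s) a set of pCPU numbers lands on. It assumes logical CPUs are numbered contiguously socket by socket, and that hyperthreading doubles the logical CPUs per socket. Your hypervisor may enumerate them differently, so treat the result as a hint only. The function names are ones I made up for illustration.

```python
def socket_of(pcpu, cores_per_socket, hyperthreading=False):
    """Return the socket index for a logical CPU.

    Assumption: logical CPUs are numbered contiguously per socket
    (socket 0 first, then socket 1, ...). Real hosts may differ.
    """
    logical_per_socket = cores_per_socket * (2 if hyperthreading else 1)
    return pcpu // logical_per_socket


def sockets_spanned(pcpus, cores_per_socket, hyperthreading=False):
    """Return the sorted list of distinct sockets the observed pCPUs fall on."""
    return sorted({socket_of(p, cores_per_socket, hyperthreading) for p in pcpus})
```

Call sockets_spanned with the pCPU numbers you actually observed and your own cores-per-socket count. A single-element result suggests the VM stayed on one socket under the numbering assumption above.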
Are you reserving memory? How is the memory placed on the motherboard? Is it distributed evenly per socket? Check the BIOS on the host and make sure power saving mode is off.