I'm working on an environment that is exhibiting a troubling issue. The cluster is made up of the following:
vSAN 6.6 (upgraded from 6.5)
5 hosts, each with:
12 x 4TB spindles
2 x 2TB NVMe PCIe cards
10Gb vSAN network
Standard vSAN storage policy with FTT=1 and very little reserved cache
Datacenter virtualization only
There are three large workloads (30TB+ each) in this cluster, along with various other typically-sized workloads.
The cluster performs well from a day-to-day perspective. However, when the large workloads were initially populated, the disks became very unbalanced and a significant amount of congestion was introduced (though end users did not notice an impact). CLOMD also had issues on one of the hosts, which compounded the balance trouble. Most of the CLOMD issues have since been resolved. At the request of Support, we removed a host and allowed the resync to rebuild redundancy; the removed host was then re-added and allowed to sync, and a proactive rebalance was initiated to start balancing the re-added host.
However, one of the other hosts remains unbalanced to the point of triggering reactive rebalancing. My first theory was that CLOMD was having issues on that host too, but it is definitely running. Reactive rebalancing appears to be extremely slow and unable to keep up with the rate of change on the disks (whereas proactive rebalancing seems to make headway). In fact, the "Data To Be Moved" value appears to be increasing on those disks. This is becoming a real concern: one of the disks is down to 2% free space, and 8 of the 12 disks on that host now show capacity warnings.
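For reference, here is how I have been checking the CLOMD and balance state (a sketch from RVC and the ESXi shell; `~cluster` is a placeholder for the cluster path in your RVC session):

```shell
# On the suspect ESXi host: confirm the CLOM daemon is actually running
/etc/init.d/clomd status

# From RVC: show per-disk utilization and component limits across the cluster
vsan.check_limits ~cluster

# From RVC: show whether a proactive rebalance is running and its progress
vsan.proactive_rebalance_info ~cluster
```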
So, here are my questions: Is there a way to prioritize rebalancing (particularly the reactive variety)? Does anyone have suggestions for freeing up space on that one unbalanced host? It seems that while a disk is undergoing reactive rebalancing, it does not participate in proactive rebalancing.
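For context, the proactive rebalance was started with defaults. My understanding (from the RVC help, so please correct me if the flags or units are off) is that the run can be tuned to be more aggressive, e.g.:

```shell
# Stop the current proactive rebalance run
vsan.proactive_rebalance --stop ~cluster

# Restart it more aggressively: lower variance threshold so it acts on
# smaller imbalances, run for a 24h window, and raise the data-movement
# rate threshold (units per the RVC help, as I understand it)
vsan.proactive_rebalance --start --variance-threshold 10 --time 86400 --rate-threshold 100 ~cluster
```

I have not found an equivalent knob for the reactive (80%-full) rebalance, which is what prompted the first question.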
Any suggestions would be appreciated. Thanks all!