I am currently rolling out new VCSAs (6.5u1) and new clusters running ESXi 6.5u1. Historically we have used Auto Deploy, so netdump has been a must to retain information from a PSOD. Although we are now using SD cards, the centralised source of dump information that netdump provides is too good to lose.
The issue I am hitting is that netdump starts but immediately fails with the following:
(via NMI in the iDRAC)
Starting network coredump from IP-HOST to IP-VCSA
Cannot continue NetDump from DF/NMI/MCE stack
(via vsish -e set /reliability/crashMe/Panic 1)
Starting network coredump from IP-HOST to IP-VCSA
Netdump: FAILED: Couldn't attach to dump server at IP-HOST IP-VCSA.
Stopping Netdump.
It then proceeds to write a local coredump instead.
I have made sure that the netdumper service is started (and set to automatic), that the /storage/netdump partition has been increased from 1GB to 10GB, that the file size limit has been changed in /etc/sysconfig/netdumper, and that the ESXi host is configured to use the vCenter and the correct VMkernel interface (in our case vmk2). The connectivity check run from an SSH session on the ESXi host passes, and I can send UDP on port 6500 to the target (nc -z -w 1 -s VMkernelIPAddress -u DumpCollectorIPAddress DumpCollectorPortNumber). I have even unloaded the firewall on the ESXi host and get the same result. I am baffled and not sure where to go from here. /var/log/vmware/netdumper/netdumper.log also contains no information (except confirmation that the check command has completed).
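For anyone wanting to repeat the host-side checks described above, they can be sketched roughly as follows. The two IP addresses are hypothetical placeholders (documentation-range addresses, not the real ones), and the port is the 6500 mentioned above. The script only runs the checks when esxcli is present, so it is meant to be run from an SSH session on the ESXi host.

```shell
#!/bin/sh
# Hypothetical placeholder values: substitute your own addresses.
VMK_IP="192.0.2.21"        # IP of the VMkernel interface used for netdump (vmk2 here)
COLLECTOR_IP="192.0.2.10"  # IP of the VCSA running the dump collector
COLLECTOR_PORT="6500"      # UDP port used by the dump collector

if command -v esxcli >/dev/null 2>&1; then
  # Show the current network coredump configuration (interface, server IP, port, enabled).
  esxcli system coredump network get
  # Ask the host to verify that the configured dump collector is reachable.
  esxcli system coredump network check
  # UDP probe from the VMkernel IP to the collector (same nc test as in the post).
  nc -z -w 1 -s "$VMK_IP" -u "$COLLECTOR_IP" "$COLLECTOR_PORT"
  ran_checks=yes
else
  echo "esxcli not found: run this from an SSH session on the ESXi host"
  ran_checks=no
fi
```

None of these commands change the configuration; they only read it back and probe the collector.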
EDIT: OK, so looking at netdumper.log I can see the following entry when netdumper is started:
2017-11-23T22:49:12.060Z| netdumper| I125: Configured size limits: 5 GB per file, 10 GB per host, 20 GB for all
A coredump on a host is some 7.4GB. Is the issue that a single file is greater than the 5GB per-file limit? If so, does anyone know where this parameter is configured?
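To make the comparison concrete, here is a minimal sketch that pulls the per-file limit out of the log line above and compares it with the 7.4GB dump size. The variable names are illustrative, and the script only does the arithmetic: it does not confirm that the size limit is what causes the attach failure.

```shell
#!/bin/sh
# Log line copied from netdumper.log above.
log_line='2017-11-23T22:49:12.060Z| netdumper| I125: Configured size limits: 5 GB per file, 10 GB per host, 20 GB for all'

# Observed coredump size in tenths of a GB (7.4 GB), to stay in integer arithmetic.
dump_tenths_gb=74

# Extract the per-file limit (in GB) from the log line.
per_file_gb=$(printf '%s\n' "$log_line" | sed -n 's/.*limits: \([0-9][0-9]*\) GB per file.*/\1/p')
if [ -z "$per_file_gb" ]; then
  echo "could not find a per-file limit in the log line" >&2
  exit 1
fi

# Compare the dump size with the limit, both in tenths of a GB.
limit_tenths_gb=$((per_file_gb * 10))
if [ "$dump_tenths_gb" -gt "$limit_tenths_gb" ]; then
  verdict="exceeds"
else
  verdict="fits within"
fi
echo "7.4 GB dump $verdict the ${per_file_gb} GB per-file limit"
```

With the log line above this reports that the dump exceeds the 5 GB per-file limit.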
Message was edited by: Tim Alexander - new detail