Channel: VMware Communities: Message List

ESXi 5.5 U3 upgrade on HP ProLiant ML350 Gen9 with Windows Server 2012 R2 Standard and SQL Server: unresponsive / freezing server

Hi all,

I'll warn you: this is a little long. I also thought it would be useful to dump some info before asking for help and suggestions:

 

HARDWARE:

HP ProLiant ML350 Gen9 with 2 GB flash cache RAID adapter, 24 GB RAM, 2 x SAS HDDs in RAID 1, 1 x six-core Xeon CPU.

 

VIRTUALIZATION:

VMware ESXi 5.5 U3 (HP custom image)

 

GUESTS VM:

1 x Windows 7 64-bit (IMAP mail server, 40 users);

1 x Windows 2012 R2 Domain Controller and file server (40 users, light Office documents);

1 x Windows 2012 R2 member server running SQL Server with an accounting app (3 users).

 

All was very well as of a week ago.

 

I had ESXi 5.5 installed (HP-supplied ESXi image, no further updates) and the very same guest VMs we have today.

 

Due to the infamous HP AMS 10.0.0 memory leak (see VMware KB: ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already a…), I decided to take the host down after hours and perform a full ESXi upgrade rather than just replacing that HP agent VIB, because, well, it looked like a good idea at the time.

 

The upgrade completes without errors on a Saturday morning, the reboot shows all VMs restarting without so much as a hitch, I call it a day, pat myself on the back and hurry to the steaming plate of spaghetti my sweetheart has put on the table. The sun is shining, there is a hint of spring in the air, life is good.

 

Comes Monday morning.

 

I had already decided to drop by, just to confirm that everything was indeed OK, have a look around, grab a coffee from the vending machine, chat with the good-looking young receptionist - you know, customer care.

 

The phone rings: 'Can you come up here and have a look at this?' 'Sure, I'm on my way!' I glance at my watch: 08:39. I'm totally on time for my next appointment (which I will actually cancel later), so I can afford to spare a few minutes to reconnect some lost link or confirm some random software update that happened to be pushed out last weekend.

 

The accounting app shows the translucent "Not Responding" screen. On all three desktop PCs. Everything else is OK.

 

I'll spare you the quick-and-dirty first checks (cabling, switches, let's restart our workstations together because it's fun waiting a good 10 minutes looking out the windows - instead of inside Windows - pun! - etc.). The problem is server-side, no doubt about it.

 

Just to be sure, I also swap the LAN cable and switch port on the physical host (with everyone connected at the time; talk about quick hands). Everything works, but the accounting app goes bananas on every start, AND drags the virtual guest down with it into the pitch-black pit.

 

By that I mean that I can still reach the Windows 2012 console via the vSphere Client, but everything is soooooooo slooooooooowwwww that many seconds may pass between a click and a reaction from the Windows machine.

 

A fresh reboot of the Win2012 guest temporarily restores normal console speed, but as the accounting clients reconnect to their DBs (via SQL queries, I presume), they slow down, start losing packets and finally show various connection errors. This can take anywhere from a few minutes to several hours: every time I believe I have resolved the issue, I'm called back again because nobody can use the app anymore.

 

Sometimes a guest server reboot allows them to work for half a day; sometimes I actually have to make some modifications to the virtual machine to get it working again.

 

We are now three full days into the issue, and it is still there. Noteworthy: none of the other guest VMs show any sign of malfunction - even the twin Windows 2012 R2 DC and file server is running at full speed without having been rebooted a single time.

 

Here is what I have already tried, after Googling the hell out of the symptoms at hand:

 

- checked that I'm using the VMXNET3 network adapter (as usual), because the Intel E1000E is EVIL, or so I'm told;

- removed and reinstalled VMware Tools, with reboots in between, because an error message about them was showing (it no longer appears);

- swapped the VMXNET3 adapter for the Intel E1000E because I read about some VMXNET3 issues with SQL;

- switched back to VMXNET3 the day after, because the issue persisted and the customer was down again;

- disabled every sort of TCP offload at the "netsh int tcp set global" level, be it RSS, TCO, Chimney, you name it;

- NOT (yet) disabled TCP offloads in the NIC's advanced properties, because I believe the previous step disables all of them globally at the OS level anyway (I could be wrong);

- rebooted several times, cussed a lot, cried a little, punched random furniture, sent heartfelt prayers to <insert divinity of faith here> - the issue is still there!

 

I suspect the main issue lies with the SQL transactions / data exchange, because I can transfer gigs of files back and forth over SMB without any strange behaviour. But since the slowdowns and / or connection problems can take some time to show up, I could also be wrong.
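One way to check that hunch over hours, instead of waiting for the phone to ring, might be to log TCP connect times to the SQL port and to the SMB port of the same guest. The Python below is only a rough sketch under assumptions: it assumes SQL Server listens on the default TCP port 1433 (a default instance; named instances often use dynamic ports) and SMB on 445, and the helper names are made up for illustration. A bare TCP handshake is only a crude proxy for what the accounting clients do, since it runs no queries at all.

```python
import socket
import statistics
import time
from typing import List, Optional


def probe_connect_ms(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Time one TCP handshake to host:port in ms; None on refusal, timeout or DNS failure."""
    start = time.perf_counter()
    try:
        # Open and immediately close a connection; no SQL or SMB payload is sent.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # socket.timeout and socket.gaierror are both OSError subclasses
        return None
    return (time.perf_counter() - start) * 1000.0


def sample(host: str, port: int, count: int, interval: float = 1.0) -> List[Optional[float]]:
    """Run `count` probes, sleeping `interval` seconds between them, printing each result."""
    results: List[Optional[float]] = []
    for i in range(count):
        stamp = time.strftime("%H:%M:%S")
        ms = probe_connect_ms(host, port)
        label = "FAILED" if ms is None else f"{ms:.1f} ms"
        print(f"{stamp} {host}:{port} -> {label}")
        results.append(ms)
        if i < count - 1:
            time.sleep(interval)
    return results


def summarize(results: List[Optional[float]]) -> dict:
    """Count failures and report median / worst connect time of the successful probes."""
    ok = [r for r in results if r is not None]
    return {
        "probes": len(results),
        "failures": len(results) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

For example, `summarize(sample("sql01.example.local", 1433, count=60, interval=60.0))` would probe once a minute for an hour (`sql01.example.local` is a placeholder for the SQL guest's name); running the same against port 445 gives an SMB baseline from the same virtual NIC. If 1433 climbs or starts failing while 445 stays flat, that would lend some support to the SQL theory.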

 

Yesterday evening we rebooted the guest OS, made sure all the TCP "offload" settings listed above were DISABLED, tried the accounting app, and it seemed to work like a charm, so we went home for the night. This morning nobody could connect to that server anymore, the console was unresponsive and the VMware "Restart Guest" command failed, so I had to power it off and restart it. This is never good for open SQL databases.

 

I am now ready to accept suggestions.

 

Even if they are along the lines of "YOU MUTTONHEAD, you just had to click this button / apply this patch / change this setting / read this KB article / pray to Cthulhu instead of those other useless deities to resolve everything for good, forever and ever!"

 

Any ideas? Thanks in advance.

