Purple screen halt on ESXi 5.5 with Windows Server 2012 R2

Believe it or not, but it seems that it is possible to crash a clean ESXi 5.5 host right out of the box by installing a Windows Server 2012 R2 virtual machine with an E1000 virtual network adapter and attempting a file copy to another VM located on the same box.

I was trying recently to copy some data from a Windows Server 2003 VM onto a new 2012 R2 VM on the same host. Expecting that the file copy should be extremely fast (due to proximity of network traffic on the same switch) I was left scratching my head when I noticed only 3-10MB/s transfer rate.

Because I was still running ESXi5.0 I thought it would be better to troubleshoot if I upgraded to the latest version of the hypervisor, only to find that the second I hit ‘paste’ to begin the file transfer the entire hypervisor crashed with a purple screen.

Needless to say, this isn’t a fringe case and others would appear to have noticed this behaviour too. The fix is simple enough, just swap out your E1000 vNIC on the 2012 R2 server with a vmxnet3 adaptor, but how is this simple scenario so dangerous that it is able to take out a whole host?

Thankfully, after swapping the vNIC I was then able to achieve 50-60MB/s throughput continuously, which was more than enough of an improvement given where I started before.

I’m going to link to the original post I found here, but nevertheless I’ll  update this page if I find that there is a known issue somewhere that explains how this behaviour has occurred.

Dell MEM, EqualLogic and VMware ESXi, how many iSCSI connections?

I’ve been working on a fairly large cluster recently which has access to a large number of LUNs. All 16 hosts can see all of the available disks, and so the EqualLogic firmware limits have started to present themselves, causing a few datastore disconnections. As part of the research into the issue I came across several helpful documents, which hopefully should prove essential reading in case you haven’t come across the planning side of this before:

A description of Dell MEM parameters, taken from EqualLogic magazine

Dell EqualLogic PS Arrays – Scalability and Growth in Virtual Environments

EqualLogic iSCSI Volume Connection Count … – Dell Community

Best Practices when implementing VMware vSphere in a Dell …

Configuring and installing Dell MEM for EqualLogic PS series SANs on VMware

If you run into problems with iSCSI connection count then you will need to rethink which hosts are connecting and how many connections they maintain.

These factors are detailed within the documents linked to above, but  in brief, you can attempt to resolve the issue by:

  • Reducing number of LUNs by increasing datastore sizes
  • Reduce the number of parallel connections to a LUN that MEM initiates
  • Use access control lists to create sub-cluster groups of VMs that can see fewer LUNs
  • Break your clusters down further in order to separate different groups of disk from each other, e.g. on a per-storage-pool cluster basis