I’ve recently been working on a fairly large cluster that has access to a large number of LUNs. All 16 hosts can see all of the available disks, so the EqualLogic firmware’s connection limits have started to present themselves, causing a few datastore disconnections. While researching the issue I came across several helpful documents, which should prove essential reading if you haven’t come across the planning side of this before:
If you run into problems with the iSCSI connection count, you will need to rethink which hosts are connecting and how many connections each one maintains.
These factors are detailed in the documents linked above, but in brief, you can attempt to resolve the issue by:
- Reducing the number of LUNs by increasing datastore sizes
- Reducing the number of parallel connections that MEM initiates to each LUN
- Using access control lists to create sub-cluster groups of VMs that can see fewer LUNs
- Breaking your clusters down further to separate different groups of disks from each other, e.g. one cluster per storage pool
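
To see how these options interact, it can help to put rough numbers on the connection count. The sketch below is only a back-of-the-envelope estimate: it assumes every host opens the same number of sessions to every LUN it can see, so the total is simply hosts × LUNs × sessions per LUN. The function names, the LUN count, the sessions-per-LUN figure and the limit of 1024 are all illustrative assumptions; check the real limit for your firmware version and the real session settings for your MEM configuration.

```python
def estimate_iscsi_connections(hosts: int, luns: int, sessions_per_lun: int) -> int:
    """Rough total of iSCSI connections, assuming every host
    connects to every LUN with the same number of sessions."""
    return hosts * luns * sessions_per_lun


def max_luns_within_limit(hosts: int, sessions_per_lun: int, connection_limit: int) -> int:
    """Largest LUN count that keeps the estimate at or below the limit."""
    return connection_limit // (hosts * sessions_per_lun)


if __name__ == "__main__":
    hosts = 16               # from the cluster described above
    luns = 40                # hypothetical LUN count
    sessions_per_lun = 2     # hypothetical sessions per LUN per host
    connection_limit = 1024  # hypothetical limit; check your firmware docs

    total = estimate_iscsi_connections(hosts, luns, sessions_per_lun)
    print(f"Estimated connections: {total} (limit {connection_limit})")

    ceiling = max_luns_within_limit(hosts, sessions_per_lun, connection_limit)
    print(f"LUNs that fit within the limit: {ceiling}")
```

Each of the options above shrinks one of the three factors: larger datastores reduce the LUN count, tuning MEM reduces the sessions per LUN, and access control lists or smaller clusters reduce the number of hosts that see any given LUN.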