I’ve been working recently on a fairly large cluster that has access to a large number of LUNs. All 16 hosts can see all of the available disks, and so the EqualLogic firmware connection limits have started to present themselves, causing a few datastore disconnections.
As part of researching the issue I came across several helpful documents, which should prove essential reading if you haven’t come across the planning side of this before:
- A description of Dell MEM parameters, taken from EqualLogic magazine
- Dell EqualLogic PS Arrays – Scalability and Growth in Virtual Environments
- EqualLogic iSCSI Volume Connection Count … – Dell Community
- Best Practices when implementing VMware vSphere in a Dell …
- Configuring and installing Dell MEM for EqualLogic PS series SANs on VMware
If you run into problems with the iSCSI connection count, you will need to rethink which hosts are connecting and how many connections they maintain. These factors are detailed in the documents linked above, but in brief, you can attempt to resolve the issue by:
- Reducing the number of LUNs by increasing datastore sizes
- Reducing the number of parallel connections to a LUN that MEM initiates
- Using access control lists to create sub-cluster groups of hosts that can see fewer LUNs
- Breaking your clusters down further to separate different groups of disks from each other, e.g. one cluster per storage pool
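To see why each of these options helps, it’s worth doing the arithmetic: every host maintains a number of iSCSI sessions to every volume it can see, so the total grows multiplicatively. A minimal sketch of that calculation follows; the LUN count and sessions-per-LUN figures are illustrative assumptions, not the documented limits of any particular PS-series model or MEM version.

```python
# Rough estimate of total iSCSI connections to an EqualLogic group.
# The figures below are illustrative assumptions -- check your array
# firmware's documented per-pool/per-group connection limits.

def total_connections(hosts: int, luns: int, sessions_per_lun: int) -> int:
    """Each host opens `sessions_per_lun` iSCSI sessions to every LUN it can see."""
    return hosts * luns * sessions_per_lun

# 16 hosts (as in this cluster), a hypothetical 40 LUNs, 2 sessions per LUN
before = total_connections(16, 40, 2)

# Halving the LUN count by doubling datastore sizes halves the total
after = total_connections(16, 20, 2)

print(before, after)  # 1280 640
```

Because the three factors multiply, each mitigation in the list above attacks a different term of the same product: bigger datastores shrink the LUN count, MEM tuning shrinks the sessions-per-LUN figure, and ACLs or smaller clusters shrink the effective host count per volume.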
I’ve spent the last few days setting up a split physical-virtual Windows Server 2003 cluster to run SQL Server 2005. The plan was to move a physical Windows 2003 cluster running SQL Server 2000 onto the new environment so that we could eventually use VMware SRM to replicate the virtual host to the DR environment. In the event of a whole-site failure we could use the replicated SAN storage and the SRM VM copy in the DR site to bring up the database.
The construction of the cluster is actually quite simple. We built the physical host and configured the SAN storage and pathing so that it could see all of the LUNs and then initialised all of the disks and formatted them. We left the VMware virtual node switched off at this point until we had built the Windows cluster using Cluster Administrator and had all of the disks online in the default cluster groups. We then added the storage to the VMware machine’s configuration using virtual SCSI adaptors and RDM disks (raw device mapping) in physical compatibility mode. When we booted the second server (virtual) we could see all of the LUNs in Disk Management but they were all uninitialised/unknown disks.
The next step was to run Cluster Administrator on the second node and add it into the existing cluster (i.e. the one created on the physical node). This almost worked, apart from the fact that the quorum disk had a different path due to the virtual SCSI adaptor and disk names, so Cluster Administrator couldn’t tell whether the new node had access to the same quorum disk. This is easy to fix: use the Advanced (minimum) configuration option during the Add Node wizard to disable the disk heuristics and regain control of the installation.
Get this far and the cluster is probably built successfully, but you still won’t see the disks recognised correctly until you begin to fail them over from the physical node to the virtual one. As you do, the storage will be recognised automatically and will assume the correct drive letters. Nice!
Having struggled to save the configuration in SANSurfer for the QLogic HBA, I was led down the path of thinking that the default password (‘config’) was wrong. In fact, the article below made me realise the problem was driver related: the default Microsoft drivers being used by the QLogic card were dated 2002, and after downloading and installing the later drivers the problem was resolved. But not so fast: the QLogic advisory for the latest STOR drivers requires you to install the STOR miniport drivers and a hotfix before they will work.
- Link to the QLogic forum page that solved the “Failed to save configuration” issue in SANSurfer:
- Link to the drivers for the QLA2340 card
- Link to the Microsoft fix for the Windows Server 2003 STOR miniport drivers:
- Link to the Microsoft hotfix for the Windows Server 2003 STOR storage drivers: