Why Citrix and Microsoft’s new servicing models now make sense

OK, so I wasted a little bit of time. I know.. it’s a shame when that happens, but it’s even worse to make the same mistake twice! So please read on in case you head down the same road without keeping your eyes peeled for the pitfalls. So what’s the take home message of this post? Microsoft and Citrix now need us (no actually, require us) to do as every professional should always do, and plan our release schedules properly!

This post discusses an issue I experienced installing Citrix XenDesktop VDA 7.15 on Windows 10 Fall Creator’s Update – receiving error 1603 when the Citrix Diagnostic Facility component failed to install. If you’re short on time, skip to the end for a series of helpful links – otherwise, bear with me and I’ll take you on a short journey to grudging mindset shift!

I’d wasted a morning patching a Citrix base image from Windows 10 build 1703 to 1709 Creator’s Fall update because we were looking to create a clean desktop for some developers to test their software releases on. But try as I might, the Citrix 7.15 VDA installer wouldn’t complete and always terminated with error 1603 –  the Citrix Diagnostic Facility (CDF) service had failed to install. After investigating the logs though it wasn’t clear why, other than a permissions failure on C:\Windows\assembly\tmp – and even checking those showed little evidence for the cause of the problem.

But here goes, after a little bit more digging I discovered that the latest Citrix VDA does NOT support the latest semi-annual ‘targeted’ release of Windows 10 (1709). See issue #1 on Citrix blog post.

Could I believe it? No, not at first really – how could a desktop OS release made generally available on 17th October 2017 not be compatible with the latest Citrix VDA which has also been chosen recently as the most recent Long Term Service Release version? Surely this new XenDesktop LTSR release would have been coordinated with Microsoft’s own release schedule, with release candidates being shared well in advance so that both vendor’s would have had a chance to test their interaction together?

Apparently not – and therein lies the message. You cannot expect that each vendor is attempting to align their minor and major servicing schedules with each other! ..Assuming.. that the latest Citrix VDA will work with the latest release of Windows is no longer going to float, and that’s why we all need to fully commit to the “test, test and test again” approach.

In fact, the logic was established a long time ago.  The last LTSR release of XenDesktop (7.6) did not support Windows 10 claiming this as a ‘notable exclusion’ despite the fact that early Windows 10 versions had been around for some time.

Notable Exclusions: These are components or features that are just not well suited for the extended lifecycle typically because this is newer technology that we plan on making significant enhancements to over time.  This is where Windows 10 fell when we originally launched 7.6 LTSR.

Citrix then later added retrospective support for Windows 10 by encouraging the use of VDA 7.9 in conjunction with the XenDesktop 7.6 LTSR release when it appeared that this combination worked well. However hope for the future compatibility was even made clear at this time with the following statement being added to the end of that post.

Finally, we want to note that Citrix is targeting to announce a new LTSR version in 2017 adding full LTSR benefits for the Windows 10 platform. However, this current announcement makes it easier for you to jump on Windows 10 desktop virtualization today while still maintaining all the benefits of being LTSR compliant.

And whilst it is indeed true that XenDesktop 7.15 LTSR release fully supports Windows 10 current branch/semi annual channel, it seems that only a simple statement on ‘requiring VDA 7.9 or later’ was made as long as you are happy to stick to the ‘Current release’ path:

Note about Windows 10: Regular support for Windows 10 is available through the Current Release path. Windows 10 does not get the full set of 7.15 LTSR benefits. For deployments that include Windows 10 machines, Citrix recommends that you use the Current Release Version 7.9 or later of the VDA for Desktop OS and of Provisioning Services.

A separate article entitled Windows 10 Compatibility with Citrix XenDesktop makes this clearer,

  • VDA: Although Semi-Annual Channel Targeted releases are intended for pilot trials, Citrix will provide limited support (configuration only) for VDA installations on Windows 10 Semi-Annual Channel Targeted releases, starting from version 1709 forward.

..and goes on further to say that ‘targeted’ releases such as Windows 10 Fall Creator’s Update are not guaranteed to be compatible:

While the Desktop OS VDA is expected to install and work on Windows 10 Semi-Annual Channel Targeted versions, Citrix does not guarantee proper functionality with these builds.

So there – it’s now clear. The LTSR releases, even the most recent, were never intended to deliver the latest compatibility with Microsoft’s own servicing schedule. It just happens in this case that VDA 7.15 is the most recent VDA available currently and for some reason Citrix also chose to adopt this as the version included in the latest LTSR release.

If you’re intending to use LTSR versions and maintain full compatibility with Windows 10 it seems that the only sensible way forward is to fall back on the most recent Semi-Annual Channel release (build 1703) and wait for the next LTSR cumulative release that adds support for the previously circulated Win10 ‘targeted’ version after all of the wrinkles have been ironed out. This is very well explained at the end of the linked article above, which simply states that you can’t be sure of support for specific Windows 10 versions unless you match them with the approved VDA for that Semi-annual channel release. Anything newer just might not work.

  • Windows 10 Creator’s Update (Version 1703) – use VDA 7.9/7.15 for LTSR support
  • Windows 10 Fall Creator’s Update (Version 1709) – Not supported!

So what’s the moral of the story, after all? Citrix and Microsoft have taken the stance to deliver frequent releases for those who are happy to trail-blaze and hotfix, depending upon their current release and semi-annual targeted releases respectively. But if you want to rely upon well-tested and proven operating system and VDA platforms – which are likely to survive the test of time (without high levels of maintenance and unpredictable results) then stick to the aligned Citrix LTSR and Windows Semi-Annual channel versions and plan your releases several months in advance. Anything else, and you could be left scratching your head for a short while until the penny drops!

Update: Since writing this post I’ve become aware of a clear summary of the current situation documented within Carl Stalhood’s excellent VDA 7.15 installation notes under point #7. Citrix have stated that they plan to provide retrospective support for VDA 7.15 on Windows 10 Version 1709 under two scenarios:

  • A new patch (now released) on Nov 14th 2017 (KB4051314) will provide the ability to update an existing Windows installation and existing VDA to Windows 10 version 1709
  • A new patch to be released via the Microsoft Update Catalogue in November Week 4 will allow you to do a fresh new VDA install on a clean Windows 10 version 1709.

NB This is a first draft of this post with minor edits. If you believe that anything included here is erroneous or misleading please get in contact/drop me a line so that I can clean it up. Thanks for reading!

Useful references:
Windows 10 Compatibility with Citrix XenDesktop
Windows 10 Fall Creators Update (v1709) – Citrix Known Issues
Windows 10 Creators Update (v1703) – Citrix Known Issues
XenApp and XenDesktop 7.15 LTSR
Adding Windows 10 Compatibility to XenApp and XenDesktop 7.6 LTSR
FAQ: XenApp, XenDesktop, and XenServer Servicing Options (LTSR)
Windows 10 update history

Oracle licensing on hyper-converged platforms such as Nutanix, VSAN etc.

I recently posted on Michael Webster of Nutanix’ blog about Oracle licensing on VMware clusters and wanted to link back to it here as it’s something I’ve been involved with several times now.

With VMware vSphere 5.5 the vMotion boundary is defined by the individual datacenter object in vCenter, which means that you cannot move an individual VM between datacenters without exporting, removing it from the inventory, and reimporting somewhere else. This currently means that even if you deploy Oracle DB on an ESXi cluster having just two nodes that you could be required by Oracle to license all of the other CPU sockets in the datacenter!

This rule is due to Oracle’s stance that they do not support soft partitioning or any kind of host or CPU affinity rules. Providing that a VM could run on a processor socket, through some kind of administrative operation, then that socket should be licensed. This doesn’t seem fair, and VMware even suggest that this can be counteracted by simply defining host affinity rules – but let’s be clear, the final say so has to be down to Oracle’s licensing agreement and not whether VMware thinks it should be acceptable.


So the only current solution is to build Oracle dedicated clusters with separate shared storage and separate vCenter instances consisting only of Oracle DB servers. This means that you are able to define exactly which CPU sockets should be licensed, in effect all those which make up part of one or more ESXi clusters within the vCenter datacenter object.

Now, with vSphere ESXi 6 there was a new feature introduced called long distance vMotion which facilitates being able to migrate a VM between cities, or even continents – even if they are managed by different vCenter instances. An excellent description of the new features can be found here.

This rather complicates the matter, since Oracle will now need to consider how this effects the ‘reach’ of any particular VM instance, which now would appear to only be limited to the scope of your single sign-on domain, rather than how many hosts or clusters are defined within your datacenter. I will be interested to see how this develops and certainly post back here if anything moves us further towards clarity on this subject.

Permalink to Michael’s original article

Listing Citrix session count by application using PowerShell

You may have found that Citrix Director offers fairly limited set of information regarding the number of users which are connected to each XenApp host, and there was no simple way (until I think XA7.9 update) to view the published app session count for each application.Here’s a useful PowerShell snippet which should help you out if you didn’t upgrade yet. It’s a concatenation of several commands which basically list off all of the sessions and then group and sort them into a convenient list.You’ll need to open PowerShell on a Citrix delivery controller and then type:

Add-PSsnapin Citrix*

Following which, you should enter the following command:

Get-BrokerApplicationInstance | group-Object -Property ApplicationName | sort-object -property Count -Descending | Format-Table -AutoSize -Property Count,Name

The output generated should be as follows:

You can of course tailor the verb Get-BrokerApplicationInstance to select a smaller subset of sessions on which to group and sort using:

Get-BrokerApplicationInstance -MachineName DOM\\HOXENAPP01

Which will simply tell you the distribution of published application sessions for an individual XenApp host. Hope this helps!

How to configure vCloud Connector with vCloud Air Virtual Private Cloud OnDemand

This post starts with a bit of a mouthful, however if you want to configure your private on-prem vSphere environment with vCloud Connector in order to access vCloud Air Virtual Private Cloud OnDemand resources you’ll need the following information.

If like me you have a small lab environment which consists of a single vCenter Standard appliance/server and you have access to credit on vCloud Air Virtual Private Cloud OnDemand then you will need to configure something called the vCloud Connector (referred here as the Server, and then the Node). These are two separate appliances which you’ll deploy via a simple OVF template and then link together with your vCenter instance. VMware’s own documentation is pretty straight forward apart from one specific area which I think needs a little improvement.

First, download and deploy the vCloud Connector Server appliance, followed by the Node. Both of these steps are detailed here in the product documentation and simply require a static IP address, default gateway, DNS and subnet mask during the template deployment. Once the appliances are online, check that the time zone is correct and in agreement between both appliances. Configure the Node first, by entering your Cloud details which in my use-case is simply the vCenter servers URL. Once this is complete, configure the Server component by registering the Node which you just worked with.

This step links the Node to the Server, and completes the following relationshipPrivate Cloud (vCenter) Node <<—>> Server <—>The vCloud Connector server maintains a local content repository which you can then use to synchronise content between the vCloud Air service and your own content catalogue (think templates).

The next step is to configure the Server with a connection to vCloud Airs own Node – were lucky here because its already deployed as a shared resource within the infrastructure layer at VMwares datacentre. Go to the Servers Nodes page and add another connection using the Register Node button.This time, youll need the URL of vCloud Airs On Demand servers, which are documented on the following location:


These URLs are different to the ones which you are redirected to if you select “Want to Migrate Virtual machines?” link in vCloud Air and correspond with the On Demand service.

Configure the appropriate URL for the location of your vCloud Air instance and then select the Public checkbox (this is required if there is a firewall/Internet between you and the datacenter). For some reason I needed to ignore the SSL certificate in order to authenticate correctly, but Im not too worried about these things in a lab environment. The official explanation for this is below

vCloud Connector nodes in vCloud Air have SSL enabled and certificates from DigiCert installed. If you want to use the certificate, you must add a DigiCert High Assurance CA-3 intermediate certificate to your vCloud Connector server trusted keystore. Obtain the certificate, then see Add CA Root Certificate to Trusted Keystore for information on uploading it.

You should select vCloud Director as the cloud type because this is the back-end core of the vCloud Air service, but the rest had me stumped for a little while. The VMware documentation says that you should just go ahead and enter your Organisation ID into the VCD Org Name box. But what is my org ID?Specifically it says:

Specify the name of your vCloud Air virtual data center. (This is also the Organization name in the underlying vCloud Director instance.) You must use a valid name. vCloud Connector validates the name that you provide.

Luckily I noticed that the information was literally staring me in the face! Look in the URL of your vCloud Air management portal and you will find the GUID for it here (highlighted in bold) e.g. https://uk-slough-1-6.vchs.vmware.com/compute/ui/?orgName=63567c98-f839-4632-9df2-b510155fa436&serviceInstanceId=.

It would have been nice had VMware provided a bit of a nudge here in terms of the field description, but I suppose its obvious now after going through the process.

Once this is done, enter your username and password details which you have already used in order to gain access to the vCloud Air portal and you should have a successful connector. Now if youve performed all of the steps as described then you will now have a local vCloud Connector Server coupled with a Node in your private cloud and another in vCloud Hybrid Air, looking something like this

Now that youre done with this well return back to the original end to end connectivity to review the outcome

Private Cloud (vCenter) Node <<—>> Connector Server <—> vCloud Air Node

The two components on the left hand side belong to you and run on your private cloud infrastructure, whilst the right hand side connects you to VMwares cloud platform. Once this is achieved we now have a new icon displayed within the vSphere Client which allows us to access our content library and begin to upload Templates, VMs and vApps to the cloud. Check back for more vCloud fun soon.

Optimising Oracle DB with VMware’s vFlash Read Cache feature

This post is a slightly different one that I’ve usually made simply because it is more notes based than editorial or comment, however I hope that the simple steps and data captured here will be useful. In fact it’s taken me a while to get this data out, but even though it’s about a year old now the performance improvement should be even better with ESXi 6.x. In this test we were interested basically in evaluating whether VMware’s new Flash Read Cache(vFRC)  feature released in ESXi 5.5 would benefit read heavy virtual workloads such as Oracle DB.

Test scenario:

Oracle 11g DB with 4vCPU, 8,192MB RAM and 200GB Oracle ASM disk for database
HP DL380 G7 with 2 x Intel Xeon 5650 6C 2.67GHz CPU and 128GB RAM, locally attached 4 x 7.2K SAS RAID array
VMware ESXi 5.5 Enterprise Plus license with vFlash Read Cache capability.

Creating a baseline (before applying vFRC)

Using esxitop to establish typical baseline values:Disk latency typical across measured virtual machines – 11.97ms latencyCorrelation of baseline latency and command per second values with vCenter Operations Manager:

High and low water disk latency – between 4 and 16ms (using 7.2K RPM drives in 4 disk RAID5 array).

Disk usage was negligible following VM boot and Oracle DB startup:

In order to set the vFlash Read Cache block size correctly we need to find out the typical write block size (so that small writes do not consume too large a cache block if it is set higher than the mean).Using vscsiStats to measure the frequency of different sized I/O commands:

Highlighted frequency values (above) show that 4,096 byte I/Os were the most common across both write and read buckets, and therefore the overall number of operations peaked in the same window.In order to establish the baseline Oracle performance an I/O calibration script was run several times.Oracle DB I/O metrics calculation:

Max IOPs were found to lie between 576 and 608 per second using a 200GB VMDK located on the 4 disk RAID array.The high water mark for disk latency rose to 28ms during the test, versus 12ms when the instance was idle – indicating contention on the spindles during read/write activity.

During the I/O calibration test the high water mark for disk throughput rose to 76,000 KBps, versus 3,450 KBps when the instance was idle. This shows that the array throughput max is around 74MB/s.

Having established that the majority of writes during the above test were in fact using an 8KB block size (not as shown in the screenshot which was taken from a different test (4KB)) the vFRC was enabled only on the 200GB ASM disk using an arbitrary 50GB reservation (25% of total disk size). No reboot was required, VMware inserts the cache in front of the disk storage transparently to the VM.

With Flash Read Cache enabled on 200GB ASM disk

After adding a locally attached 200GB SATA SSD disk to the ESXi server and claiming the storage for Flash Read Cache a 50GB vFRC cache was enabled on the Oracle ASM data disk within the guest OS configuration:

Once the vFRC function was enabled the Oracle I/O calibration script was run again, and surprisingly the first pass was considerably slower than previous runs (max IOPs 268). This is because each read from the SSD cache initially fails, because prior writes have not primed the cache. By writing to SSD before committing to disk (write-through caching), data is continually added to the vFRC cache such that performance should improve over time:

Esxcli was used to view the resulting cache efficiency after running I/O calibration (showing 29% read hit rate via SSD cache vs reads from SAS disk):

In the example above, no blocks have been evicted from the cache yet meaning that the 50GB cache assigned to this VMDK still offers room for growth. When all of the cache blocks are exhausted the ESXi storage stack will begin to remove older blocks in favour of storing more relevant up to date data.The resulting I/O calibration performance is shown below – both before and after enabling the vFRC feature.

In brief conclusion, the vFlash Read Cache feature is an excellent way to add in-line SSD based read caching for specific virtual machines and volumes. You must enable the option on specific VMs only, and then track their usage and cache effectiveness over time in order to make sure that you have allocated not too much, or not too little cache. However, once the cache is primed with data there is a marked and positive improvement to the read throughput, and a much reduced number of IOPS needing to be dealt with by the physical storage array. For Oracle servers which are read biased this should significantly improve performance where non-SSD storage arrays are being utilised.

Updating password field names with multiple NetScaler Gateway virtual servers

Imagine a situation where you want to change your NetScaler Gateway’s logon page to include alternative prompts for the Username, Password 1 and Password 2 fields and need to update the language specific .XML files. This has been documented before, and isn’t too hard to figure out once you’ve found a couple of ‘How to’ guides on the Internet. However I have since come across a limitation in trying to apply the NetScaler’s new ‘Custom’ design template to several different NetScaler Gateway virtual servers at the same time, because essentially whilst you can define your own custom design it is automatically applied to all instances of the virtual server residing on the NetScaler – so if you define custom fields then you’ve defined them for all.

This may not be a problem for some people, but what if the secondary authentication mechanism is an RSA token for one site, and a VASCO token for another? How do you go about configuring alternative sets of custom logon fields? Most of the answers are already out there in one form or another, but I lacked one simple beginning to end description of the solution (I tried several alternate options including rewrite policies which didn’t quite work before I opted for this approach):

Background (NetScaler 10.5.x build)The Citrix NetScaler VPN default logon page has already been modified in order to ask for ‘AD password’ and ‘VASCO token’ values instead of Password 1: and Password 2:, as detailed in http://support.citrix.com/article/CTX126206

This was achieved by editing index.html and login.js files in /var/netscaler/gui/vpn of the NS as per the Citrix article above.

In addition, the resources path which holds the language based .XML files in /var/netscaler/gui/vpn/resources has been backed up into /var/customisations so that the /nsconfig/rc.netscaler file can copy them back into the correct location if they get overwritten or lost following reboot.

Contents of rc.netscaler file

cp /var/customisations/login.js.mod /netscaler/ns_gui/vpn/login.jscp /var/customisations/en.xml.mod /netscaler/ns_gui/vpn/resources/en.xmlcp /var/customisations/de.xml.mod /netscaler/ns_gui/vpn/resources/de.xmlcp /var/customisations/es.xml.mod /netscaler/ns_gui/vpn/resources/es.xmlcp /var/customisations/fr.xml.mod /netscaler/ns_gui/vpn/resources/fr.xml

However, because these values apply globally there is an issue if a second NetScaler virtual server does not use a VASCO token as a secondary authentication mechanism. This causes the normal ‘Password’ entry box to be displayed as ‘VASCO token’. The only suitable workaround for this is to create a parallel set of logon files for each additional NS gateway virtual server and use a responder policy on the NS to redirect incoming requests for the index.html page of the VPN to a different file.

In the following examples, I have created a second configuration for a ‘Training NetScaler’, abbreviated to TrainingNS throughout. In summary,

Create separate login.js and index.html files for the alternate parameters, create a new /resources folder specifically for those and edit references within those before defining a responder action & policy in NS:

  1. Copy existing login.js to loginTrainingNS.js
  2. Copy existing index.html to indexTrainingNS.html
  3. Create a new folder called /netscaler/ns_gui/vpn/resourcesTrainingNS and give it the same owner/group permissions as the /netscaler/ns_gui/vpn/resources folder (use WinSCP to define the permissions, right click Properties on the file)
  4. Copy all of the .XML files from /netscaler/ns_gui/vpn/resources into the new folder
  5. Edit the indexTraining.html file and make the following change to reflect the new location of the resource files

var Resources = new ResourceManager("resourcesTrainingNS/{lang}", "logon");

Edit the indexTrainingNS.html file and make the modifications described in CTX1262067.

Edit the individual .XML files in the new folder as per the explanation in CTX126206

AD Password:
TwoFactorAuth Password:

(this second option will not be used if only a primary authentication mechanism is defined)

When all of the file changes are complete, using https://support.citrix.com/article/CTX123736 as a guide, define the responder action and policy on the NS:

  • Create a responder action using the URL: “https://trainingns.lstraining.ads/vpn/indexTrainingNS.html”
  • Create a responder policy using the expression: HTTP.REQ.HOSTNAME.EQ(“trainingns.lstraining.ads”) && HTTP.REQ.URL.CONTAINS(“index.html”)Bind the policy to the global defaults

Now when you launch the URL for the Training NetScaler it will redirect to the custom index.html file and load a separate logon.js and .xml resource files so that the logon box will be name differently.

In addition, the following article hints at an alternative resolution if the Responder feature cannot be licensed: http://www.carlstalhood.com/netscaler-gateway-virtual-server/#customize

Purple screen halt on ESXi 5.5 with Windows Server 2012 R2

Believe it or not, but it seems that it is possible to crash a clean ESXi 5.5 host right out of the box by installing a Windows Server 2012 R2 virtual machine with an E1000 virtual network adapter and attempting a file copy to another VM located on the same box.

I was trying recently to copy some data from a Windows Server 2003 VM onto a new 2012 R2 VM on the same host. Expecting that the file copy should be extremely fast (due to proximity of network traffic on the same switch) I was left scratching my head when I noticed only 3-10MB/s transfer rate.

Because I was still running ESXi5.0 I thought it would be better to troubleshoot if I upgraded to the latest version of the hypervisor, only to find that the second I hit ‘paste’ to begin the file transfer the entire hypervisor crashed with a purple screen.

Needless to say, this isn’t a fringe case and others would appear to have noticed this behaviour too. The fix is simple enough, just swap out your E1000 vNIC on the 2012 R2 server with a vmxnet3 adaptor, but how is this simple scenario so dangerous that it is able to take out a whole host?

Thankfully, after swapping the vNIC I was then able to achieve 50-60MB/s throughput continuously, which was more than enough of an improvement given where I started before.

I’m going to link to the original post I found here, but nevertheless I’ll  update this page if I find that there is a known issue somewhere that explains how this behaviour has occurred.

Dell MEM, EqualLogic and VMware ESXi, how many iSCSI connections?

I’ve been working on a fairly large cluster recently which has access to a large number of LUNs. All 16 hosts can see all of the available disks, and so the EqualLogic firmware limits have started to present themselves, causing a few datastore disconnections. As part of the research into the issue I came across several helpful documents, which hopefully should prove essential reading in case you haven’t come across the planning side of this before:

A description of Dell MEM parameters, taken from EqualLogic magazine

Dell EqualLogic PS Arrays – Scalability and Growth in Virtual Environments

EqualLogic iSCSI Volume Connection Count … – Dell Community

Best Practices when implementing VMware vSphere in a Dell …

Configuring and installing Dell MEM for EqualLogic PS series SANs on VMware

If you run into problems with iSCSI connection count then you will need to rethink which hosts are connecting and how many connections they maintain.

These factors are detailed within the documents linked to above, but  in brief, you can attempt to resolve the issue by:

  • Reducing number of LUNs by increasing datastore sizes
  • Reduce the number of parallel connections to a LUN that MEM initiates
  • Use access control lists to create sub-cluster groups of VMs that can see fewer LUNs
  • Break your clusters down further in order to separate different groups of disk from each other, e.g. on a per-storage-pool cluster basis