Deploying Citrix Ingress Controller with Kubernetes

Citrix Ingress Controller is a niche but seriously interesting innovation from Citrix – developed in order to bring an enhanced application delivery capability to the Kubernetes container orchestration platform. This article is intended to communicate some basics of Kubernetes and ingress using Citrix ADC, but more-so to highlight some specific gaps in the documentation which are no longer appropriate for Kubernetes 1.16 and above due to API changes .

Many Citrix application and networking users will already be familiar with the hardware based or virtual Citrix NetScaler or ADC platforms, bringing L4 through to L7 load balancing, URL responder and rewrite features (amongst others) to conventional or virtualised networking environments. What you now have with Citrix Ingress Controller with ADC MPX/VPX is the ability to integrate Kubernetes with your existing ADCs, or introduce Citrix ADC CPX containerised NetScaler(s) such that you are able to deploy transient containerised NetScaler ADC instances within your Kubernetes platform enabling per-application networking services.

What is great about this solution is the way that it creates an automated API interface between Kubernetes and Citrix’s Nitro REST API of NetScaler. When a new containerised app is presented to the outside via a specially annotated ingress CIC will instantly create load balancing and content switching vservers along with rewrite rules for you, and even update/remove them when your container is modified or removed. This takes all of the manual work out of updating your ADC configuration on a per-app basis.

There are two basic ways in which to incorporate Citrix ADC into Kubernetes, namely ‘north-south’ and ‘east-west’ options. Familiar ingress solutions such as NGINX are often used within Kubernetes to attach the container networking stack to the outside world, since pod networking is normally completely abstracted from the user network in order to facilitate clean application separation. In a ‘north-south’ implementation you can think of the ingress controller (e.g. NGINX or Citrix ADC) as the front door to your application, with the remaining container based application networking presented through service endpoints within the backend network.

In an ‘east-west’ topology you can implement Citrix ADC CPX as a side-car to your container application in order to provide advanced ADC features within the Kubernetes network to enhance inter-container communication. This is a more advanced topology, but nonetheless directly intended for deployment within the Kubernetes infrastructure as a container. Citrix have a nice series of diagrams which highlight the tier 1 and tier 2 scenarios here.

Prerequisites

I’m going to be talking about bare-metal scenarios here rather than cloud based environments such Azure AKS, however to user these examples you will need to have created a Kubernetes 1.16 cluster first and be able to interact with it using kubectl. I have been using Rancher in order to build my Kubernetes clusters on vSphere, which in itself is a whole other subject which I hope to return to in a different post.. but you could always use something like MiniKube running within a desktop hypervisor (let me know how you get on!).

In order to use the implementation examples below you will need to have deployed a Citrix NetScaler MPX or VPX v12.1 / 13 in your network which is able to communicate with the Kubernetes API and cluster nodes. My lab uses a flat network range of 192.168.0.0/24 for instance, in which case the Kubernetes API is available on the same network as my NetScaler. However the backend pod networks are in the range 10.42.x.0/24 where each node hosts a separate range. Citrix Ingress Controller will take care of adding the network routes to these backend networks so they don’t have to be reachable from your desktop.

For the purposes of a lab type exercise it doesn’t matter if your Citrix ADC is used for other features, e.g. LB, Citrix Gateway because Citrix Ingress Controller will complement your infrastructure without replacing any of the existing configuration. It’s probably not a great idea to launch straight into this using your Production ADC instance though, best stick to the lab environment!

Create a system user on Citrix ADC

Your Citrix Ingress Controller will talk to NetScaler Nitro API directly using a user account which you define within Kubernetes. Perhaps you will use an existing user, or create a new one. For instance the following command will create a new user called cic on the NetScaler and create a new command policy:

add system user cic my-password
add cmdpolicy cic-policy ALLOW “^(?!shell)(?!sftp)(?!scp)(?!batch)(?!source)(?!.*superuser)(?!.*nsroot)(?!install)(?!show\s+system\s+(user|cmdPolicy|file))(?!(set|add|rm|create|export|kill)\s+system)(?!(unbind|bind)\s+system\s+(user|group))(?!diff\s+ns\s+config)(?!(set|unset|add|rm|bind|unbind|switch)\s+ns\s+partition).*|(^install\s*(wi|wf))|(^(add|show)\s+system\s+file)”

NB I’ve seen a problem with the above where the command might error out with an error concerning unexpected quotes character, it doesn’t seem to interfere with the creation of the command policy though.

In case you have any difficulties whilst attempting to recreate the steps in this post you can always try first using the ‘superuser’ command policy and then refine it until it matches the command permissions that you’re comfortable with.

In addition to this you may need to add additional rewrite module permissions if you’re going to use the rewrite CRDs, you can just tack these on to the end of the existing definition before the final quote mark:

(^(?!rm)\S+\s+rewrite\s+\S+)|(^(?!rm)\S+\s+rewrite\s+\S+\s+.*)

Finally, bind the newly created command policy to your new cic user.

bind system user cic cic-policy 0

Deploy Citrix Ingress Controller using YAML

This section is slightly different to that which is outlined in the actual Citrix Ingress Controller instructions. Please take care to understand the differences, they are mainly due to a desire to create better separation between components and configuration settings.

Create a new namespace to hold the secret and other CIC components. The commands below show the namespace entry in bold in case you choose to omit this and just place the components in the default namespace. It’s up to you, but for tidiness I created a namespace.

kubectl create namespace ingress-citrix

Create a new Kubernetes secret to store your Nitro API username and password. Using kubectl connect to your cluster and create a new secret to store the data.

kubectl create secret generic nslogin --from-literal=username=cic --from-literal=password=mypassword -n ingress-citrix

In my testing I ran into what I think is a Citrix documentation error for the above command where they show using single quotes around the name cic and mypassword values. Kubernetes converts these values into base64 encoding before they are stored, and might also include the quotes in the final value if you’re not careful. In fact that messed up my configuration for a while until I converted the secret back into its original content, using:

kubectl get secret nslogin -n ingress-citrix -o=yaml

Take the values for password: and username: from the secret and pass them through a base64 decoder just to check that this hasn’t happened (there are also various web sites which can do this for you) by using the following Linux/MacOS command for either the username or password taken from the YAML form above.

echo bXlwYXNzd29yZA== | base64 --decode

Using this source file as a reference, modify/add the following entries (shown in bold) within the file in order to add the name of your namespace:

kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
   name: cic-k8s-role
 roleRef:
   apiGroup: rbac.authorization.k8s.io
   kind: ClusterRole
   name: cic-k8s-role
 subjects:
 kind: ServiceAccount
 name: cic-k8s-role
 namespace: ingress-citrix 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cic-k8s-role
  namespace: ingress-citrix
apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: cic-k8s-ingress-controller
   namespace: ingress-citrix
 (entry continues)

Be aware – the default CIC configuration creates a cluster role which will see events across the whole system, however this can be deliberately (or mistakenly) restricted to only watching API events in specific namespaces if your role contains:

kind: Role

instead of:

kind: ClusterRole

or if you add a NAMESPACE environment variable when defining the env: section of your CIC deployment manifest.

Finally, add/edit the following entries to define how to contact your Citrix ADC i.e. the NetScaler management IP (NS_IP) and virtual server IP (NS_VIP) to be used for LB/content switching your ingress (the front door)

env:
         # Set NetScaler NSIP/SNIP, SNIP in case of HA (mgmt has to be enabled) 
         - name: "NS_IP"
           value: "192.168.0.99"
         - name: "NS_VIP"
           value: "192.168.0.110"
         - name: "LOGLEVEL"
           value: "INFO"
args:
           - --ingress-classes
             citrix
           - --feature-node-watch
             true

NB – the --feature-node-watch option allows NetScaler to create routes automatically in order to reach the backend pod network addresses

NB – the LOGLEVEL default value is DEBUG, you might want to leave this as an unspecified value until you’re happy with the functionality, and then change it to INFO as above.

The version of Citrix Ingress Controller is specified within this YAML file, hence if you wish to upgrade your CIC version it can be modified and redeployed (as long as no other changes to your deployment are required)

image: "quay.io/citrix/citrix-k8s-ingress-controller:1.6.1"

After updating the above entries as citrix-k8s-ingress-controller.yaml save the modified YAML file and then deploy it using kubectl

kubectl create -f citrix-k8s-ingress-controller.yaml

Check that your Citrix Ingress Controller container has deployed correctly:

kubectl get pods -n ingress-citrix

NB – in the following examples you can ignore the rancherpart of the above command, the kubectl statements are being proxied through Rancher in order to reach the correct cluster

Validate the installation of Citrix Ingress Controller

Once CIC is online you can access the logs generated by the container by switching the name of your container into the following command:

kubectl logs cic-k8s-ingress-controller-9bdf7f885-hbbjb -n ingress-citrix

You’ll want to see the following highlighted section within the log file which shows that CIC was able to connect to the Nitro interface and create a test vserver (which coincidentally validates that it was able to locate and use the secret which was created to store the credentials!):

2020-01-10 10:45:50,144  - INFO - [nitrointerface.py:_test_user_edit_permission:3729] (MainThread) Processing test user permission to edit configuration
 2020-01-10 10:45:50,144  - INFO - [nitrointerface.py:_test_user_edit_permission:3731] (MainThread) In this process, CIC will try to create a dummy LB VS with name k8s-dummy_csvs_to_test_edit.deleteme
 2020-01-10 10:45:50,174  - INFO - [nitrointerface.py:_test_user_edit_permission:3756] (MainThread) Successfully created test LB k8s-dummy_csvs_to_test_edit.deleteme  in NetScaler
 2020-01-10 10:45:50,188  - INFO - [nitrointerface.py:_test_user_edit_permission:3761] (MainThread) Finished processing test user permission to edit configuration
 2020-01-10 10:45:50,251  - INFO - [nitrointerface.py:_perform_post_configure_operation:575] (MainThread) NetScaler UPTime is recorded as 7225

At this point the Citrix Ingress Controller container will sit there listening out for any Kubernetes API calls which it might be interested to assist with, e.g. creation of an ingress or load balancer object. By default Citrix should pick up any ingress creation event, but in many environments you’ll already have NGINX deployed for various reasons (e.g. it’s a functional part of accessing a dashboard for instance).

The way that you can avoid getting things tangled up is by deliberately using ingress class annotations in your specifications. In this way other ingress controllers will ignore your requests to build an ingress but CIC will jump straight in to help. The annotation which is used for this is called:

kubernetes.io/ingress.class:"Citrix"

Deploying an application

Let’s start by deploying a simple application into the default namespace. The reason we’re going to do this is two-fold, firstly it is simple and most likely to work, and secondly it verifies that CIC is able to see services and ingresses outside of its own namespace. I like to use a hello-world image from Tutum because it tells us a little bit about where it’s running when you access the page.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
  namespace: default
spec:
  selector:
    matchLabels:
      run: hello-world
  replicas: 1
  template:
    metadata:
      labels:
        run: hello-world
    spec:
      containers:
      - name: hello-world
        image: tutum/hello-world
        ports:
        - containerPort: 80

Create a new YAML file and save it as deploy-hello-world.yaml, then use kubectl to deploy it to Kubernetes. You’ll see that I’ve prepended rancher in all of my examples but you can omit that if you’re not using Rancher

kubectl apply -f deploy-hello-world.yaml

Creating a service

Now that the application is running in a container you’ll need to create a service using the following YAML. Save it as expose-hello-world.yaml. You could use a type spec of ClusterIP or NodePort – it doesn’t matter when CIC is configured with --feature-node-watch=true although the default is actually ClusterIP.

apiVersion: v1
kind: Service
metadata:
  name: hello-world
  namespace: default
  labels:
    run: hello-world
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: hello-world
kubectl apply -f expose-hello-world.yaml

Defining your ingress

An ingress is a rule which directs incoming traffic to a host address or a given path through to the backend application. It’s quite important to know that an ingress itself is just a rule, there may be load balancers or ingress controllers which receive incoming traffic in your environment but the ingress assists in directing that flow to the backend application.

Again the use of the ingress class kubernetes.io/ingress.class:"Citrix" is an essential component of the below ingress example. It ensures that CIC ‘notices’ the new ingress definition and tells it that it should instruct the Citrix ADC to build load balancing or content switching vservers to make sure your traffic is received when the outside world attempts to talk to your application.

In this ingress example we are going to simulate a scenario where you have a path based entry point into your application, which itself then redirects to the container’s root page. Create a new YAML file with the following content and call it ingress-hello-world.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello-world-ingress
  namespace: default
  annotations:
   kubernetes.io/ingress.class: "citrix"
spec:
  rules:
  - host:  www.helloworld.com
    http:
      paths:
      - path: /hello-world
        backend:
          serviceName: hello-world
          servicePort: 80

NB The author, his company and this post has nothing whatsoever to do with any websites or businesses operating on any real domains such as ‘helloworld.com’. It is chosen simply as a convenient example.

kubectl apply -f ingress-hello-world.yaml

At this point, if everything has worked correctly you should be able to make a host file or DNS entry for www.helloworld.com (of course you could use anything else) which points to the same IP address you used to define the NS_VIP address of your load balancer in the Citrix Ingress Controller configuration (citrix-k8s-ingress-controller.yaml). In the examples above the mapping would be:

www.helloworld.com <---> 192.168.0.110

You’ll see the virtual IP now created for you within the Citrix ADC in two places, firstly a new content switch:

A new content switch with the IP address specified in NS_VIP entry, 192.168.0.110

This new content switch has one or more expressions which match traffic to actions (created through ingress definitions):

Therefore any incoming HTTP request matching the www.helloworld.com host where the request URL includes pages starting with the /hello-world location will be sent to the second newly created object – the vserver defined in the action below:

A new load balancing vserver has been created with address 0.0.0.0

This LB vserver includes a service group whose members are actually represented by the pods where the application is currently running. If you changed the deployment specification to include more replicas then you would see more nodes participating in the service group. Citrix ADC will monitor the health of the exposed node ports in order to ensure that traffic is only directed onto running pods.

And now when we visit the page, via the hostname and URL path defined on the ingress we should now see:

Adding a rewrite policy

Let’s say that you have a single ingress controller which is exposing endpoints on a path basis, e.g. /myapproot but the application available on that service is expecting /myapproot/ instead. Some applications I’ve seen won’t respond properly unless you rewrite your request URL to have the trailing forward slash. Fortunately Citrix Ingress Controller and ADC are able to take care of this through a rewrite rule.

Before you can use this you’ll need to deploy the Custom Resource Definitions for rewrite using the following instructions.

Download the CRD for rewrite and responder YAML from this Citrix URL. Save it as rewrite-responder-policies-deployment.yaml and then deploy it using

kubectl create -f rewrite-responder-policies-deployment.yaml

NB One very interesting ‘gotcha’ here is that if you associate a CRD with a namespace then it will only create rewrite policies and actions for services in that namespace, so I would recommend simply using the simplest form of the command shown above without placing the CRD into the ingress-citrix namespace used in this blog’s example.

Now that is deployed you should adapt the following YAML in order to define how the app rewrite should function and then save it as cic-rewrite-example.yaml:

apiVersion: citrix.com/v1
kind: rewritepolicy
metadata:
 name: httpapprootrequestmodify
 namespace: default
spec:
 rewrite-policies:
   - servicenames:
       - hello-world
     rewrite-policy:
       operation: replace
       target: http.req.url
       modify-expression: '"/hello-world/"'
       comment: 'HTTP app root request modify'
       direction: REQUEST
       rewrite-criteria: http.req.url.eq("/hello-world")
kubectl create -f cic-rewrite-example.yaml

Using a Load Balancer service instead of Ingress

In the example above I outlined how to create a hello-world deployment and service in order to correctly present an application via an ADC using ingress. However ingress will only work for HTTP/HTTPS type traffic and cannot be used for other services. One additional method you can use for other traffic is to define a service of type LoadBalancer rather than any other option, e.g. ClusterIP, NodePort.

Citrix Ingress Controller has a specific annotation for this scenario which can be added to the service definition to add the IP address which ADC should use. This is the equivalent of a cloud-provider based load balancer in your on-prem Kubernetes environment where you might not use ingress at all.

apiVersion: v1
kind: Service
metadata:
  name: hello-world
  namespace: default
  annotations:  
    service.citrix.com/frontend-ip: '192.168.0.115'
  labels:
    run: hello-world
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: hello-world

Save the YAML example above into cic-loadbalancer-example.yaml and apply it.

kubectl create -f cic-loadbalancer-example.yaml

If you now examine the service which is created it should be apparent that the type has now changed from NodePort or ClusterIP to LoadBalancer. The external IP address is now shown, as defined within the service.citrix.com/frontend-ip: '192.168.0.115' annotation.

Citrix ADC will now direct traffic arriving at that IP address through to any pods which match the label selector. This method allows you to quite simply plug the outside world in to your Kubernetes application infrastructure at L4 without using ingress or path matching rules.

Summary

Citrix Ingress Controller is well worth investigating if you are beginning to implement on-prem Kubernetes based applications and already have an investment in Citrix ADC. If you need additional features such as DDoS protection, advanced rewrite, TCP optimisations etc. then CIC offers quite a lot of benefits over a simple NGINX proxy. The next article planned in this series will examine the sidecar Citrix ADC CPX deployment and how this can enhance visibility of inter-container communication.

Addendum – Rancher specific ingress issue with Citrix Ingress Controller

This section has been included here in order to highlight a specific issue which is currently occurring in CIC 1.6.1 and Rancher 2.3.4 releases. It seems to be a purely cosmetic issue however it’s been the subject of a recent call I had with some of the Citrix people responsible for CIC who confirmed the behaviour with me. Basically when an ingress is created it is successfully created by CIC but its status does not move from ‘Initializing’ to ‘Active’ in Rancher. This is because Rancher is awaiting the External-IP value to be updated in the Status, but this does not occur because CIC doesn’t mandate that this be actively reported. I’ll update/remove this section from the post if and when this is resolved.

UPDATE – the above issue is now resolved in releases 1.7.6 and above by appending the --update-ingress-status entry into the CIC deployment YAML under the following section:

args:
  - --ingress-classes citrix
  - --feature-node-watch true
  - --update-ingress-status yes

NSX-T Manager appliance high-CPU whilst idle

I run VMware NSX-T in a small lab environment based on Intel NUCs, but I’ve noticed recently that even when not being challenged e.g. following initial boot and being essentially idle, the Manager appliance suffers continual high CPU usage which leads eventually to an uncomfortably warm office.

Even though running the correct minimum virtual machine hardware for the appliance has been configured, i.e. 4 vCPU and 16GB RAM, it was regularly using ~4.5GHz of physical CPU.

Here’s a good example of an otherwise idle appliance showing 40% CPU usage.

~40% CPU on a 4 vCPU virtual appliance

After connecting over SSH as the ‘admin’ user and entering ‘get process monitor’ it’s quickly apparent from the top output that ‘rngd’ is responsible for the majority of the CPU utilisation:

‘get process monitor’ whilst logged in as NSX-T admin console user

But what is this? A quick search of more general Linux resources informs us that it is a random number generator used in ensuring sufficient ‘entropy’ is available during creation of certificates, SSH keys etc.

In order to discover more about the purpose of this daemon we can inspect the description of the installed version (5-0ubuntu4nn1) under the current Ubuntu 18.04.4 LTS release.

apt show rng-tools/now

Description: Daemon to use a Hardware TRNG
The rngd daemon acts as a bridge between a Hardware TRNG (true random number
generator) such as the ones in some Intel/AMD/VIA chipsets, and the kernel’s
PRNG (pseudo-random number generator).
.
It tests the data received from the TRNG using the FIPS 140-2 (2002-10-10)
tests to verify that it is indeed random, and feeds the random data to the
kernel entropy pool.
.
This increases the bandwidth of the /dev/random device, from a source that
does not depend on outside activity. It may also improve the quality
(entropy) of the randomness of /dev/random.
.
A TRNG kernel module such as hw_random, or some other source of true
entropy that is accessible as a device or fifo, is required to use this
package.
.
This is an unofficial version of rng-tools which has been extensively
modified to add multithreading and a lot of new functionality.

So we know that this is a helper daemon which improves the speed of providing near-truly random numbers when applications ask for them. What version do we currently have installed in the NSX-T 3.1.2 manager appliance?

apt search rng-tools
Sorting... Done
Full Text Search... Done
rng-tools/now 5-0ubuntu4nn1 amd64 [installed,local]
  Daemon to use a Hardware TRNG

This appears to be the latest available version. In order to examine the status of the rngd daemon itself, log in to the appliance console as the root user and use:

systemctl list-units rng-tools.service

The service is shown as running,

Name of rngd random number generator service
root@nsx-manager:~# systemctl status rng-tools.service
rng-tools.service
Loaded: loaded (/etc/init.d/rng-tools; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-10-05 09:31:59 UTC; 17min ago
Docs: man:systemd-sysv-generator(8)
Process: 886 ExecStart=/etc/init.d/rng-tools start (code=exited, status=0/SUCCESS)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/rng-tools.service
`-934 /usr/sbin/rngd -r /dev/hwrng
Oct 05 09:31:59 nsx-manager systemd[1]: Starting rng-tools.service…
Oct 05 09:31:59 nsx-manager rng-tools[886]: Starting Hardware RNG entropy gatherer daemon: /etc/init.d/rng-tools: assigning /dev/hwrng to access rdrand on cpu
Oct 05 09:31:59 nsx-manager rng-tools[886]: crw-rw-rw- 1 root root 1, 8 Oct 5 09:31 /dev/random
Oct 05 09:31:59 nsx-manager rng-tools[886]: rngd.
Oct 05 09:31:59 nsx-manager systemd[1]: Started rng-tools.service.

What else can you find out about what it is doing in the background?

rngd -v

Two instances of ‘read error’ are output, followed by two further entropy sources being the Intel/AMG hardware random number generator and AES digital random number generator (RNDG). The ‘read error’ issue appears to be normal behaviour as the package attempts to read sources which don’t exist. Both of the displayed sources indicate that the CPU instruction set includes the necessary flags to tell the VM that it can access hardware random number generation.

Verbose output from rngd daemon

I must say, at this point it’s not clear whether NSX-T requires this service to be running permanently or whether it’s a component which Linux uses as a background service in order only to optimise the generation of a random number feed. It seems that stopping the service does appear to eventually cause problems in my lab – so please attempt the next section with CAUTION.

systemctl stop rng-tools.service

This leads to a significant reduction in CPU consumption and running temperature of my ESXi nodes.

CPU usage decreases after stopping rngd service

It may also be possible to disable the service permanently, but since I don’t have a full explanation of the purpose of this service from an NSX-T point of view I would stop short currently from doing this.

systemctl disable rng-tools.service

In the meantime I am hoping that I can get someone within the NSX-T development team to investigate these findings and provide some more permanent kind of workaround.

Further investigation

Further reading around the subject led me to find an issue has been reported on certain CPUs leading to activity spikes, https://github.com/nhorman/rng-tools/issues/136 and newer versions promise to fix this problem. The article mentioned suggests adding the -x jitter option to the start command but this is not available in the version installed in NSX-T.

RNGD_OPTS="-x jitter -r /dev/hwrng"

You can locate and edit the startup parameters by altering the service definition:

vi /etc/init.d/rng-tools

and potentially altering the default kernel values which are referenced by:

RNGDOPTIONS=
[ -r /etc/default/rng-tools ] && . /etc/default/rng-tools

Edit using:

vi /etc/default/rng-tools

However until the version of rng-tools used in NSX-T is updated to resolve this apparent issue it remains a personal choice as to whether or not the service can be stopped intermittently when a lab environment is not needed.


https://en.wikipedia.org/wiki/RDRAND
https://www.exoscale.com/syslog/random-numbers-generation-in-virtual-machines/

Citrix Gateway 13.0 Registry value EPA scan examples

If you’re having trouble with getting Citrix Endpoint Analysis scans of client device registry values to work properly (on Citrix Gateway) you may come across the following issue I experienced in the latest versions of firmware.

It appears that the EPA scan functionality in the NS 13.0 GUI (this article relates to 13.0.82.45) has been merged so that the numeric/non-numeric registry scan types now coalesce into one type of scan: REG_PATH; whereas in previous versions string values were interpreted using REG_NON_NUM_PATH.

Here’s a screen shot of the new expression editor drop down for Windows client EPA scans

NS13.0.82.45 drop down for Windows EPA scans

In comparison to the previous version (NS13.0.71.44).

NS13.0.71.44 drop down for Windows EPA scans

Here’s a screenshot of the registry scan entry panel where you can enter registry path and value, plus comparison or presence operators. Note the tooltip box which says that numeric comparisons will be done when using <,>,== etc.

NS13.0 registry scan value/comparison entry GUI

The convergence of these two types of scan into one appears to hide a reduction in comparison functionality, which only emerges once you attempt to use a string based registry value comparison using REG_PATH. You cannot use == anymore with string values such as REG_SZ.

This is a quick summary of the new behaviour following my own testing:

Numeric comparisons

Scans based upon REG_DWORD, REG_QWORD, REG_BINARY values will only work when carrying out boolean comparisons on numeric values with operators such as ==, !=, >=

e.g.

sys.client_expr("sys_0_REG_PATH_==HKEY\\_LOCAL\\_MACHINE\\\\SOFTWARE\\\\Classes\\\\YourRegistryKeyLocation\\\\YourRegistryValueName_VALUE==_12345[COMMENT: Registry]")

will result in a successful scan when YourRegistryValueName == 12345.

String comparisons

However when using the newly merged functionality, scans based upon REG_SZ values will only work when carrying out comparisons on string values using operators such as ‘contains’, ‘notcontains’.

If you try to use == as the operator on a string comparison the EPA scan logs will result in:

2021-09-28 09:25:38.883 Boolean compare failed. Value false operator ==
2021-09-28 09:25:38.883 Scan 'REG_PATH_==_HKEY\_LOCAL\_MACHINE\\SOFTWARE\\Classes\\YourRegistryKeyLocation\\YourRegistryValueName_VALUE_==_12345' failed for method 'VALUE'

Therefore modify your EPA action expression to fit the following example using ‘contains’:

sys.client_expr("sys_0_REG_PATH_==_HKEY\\\\_LOCAL\\\\_MACHINE\\\\\\\\SOFTWARE\\\\\\\\Classes\\\\\\\\YourRegistryKeyLocation\\\\\\\\YourRegistryValueName_VALUE_contains_12345[COMMENT: Registry]")

There are several other comparisons which do not appear to work properly, e.g. a numeric registry comparison of a REG_QWORD value which is longer than that allowed by the Citrix EPA plugin BUT is allowed within the 64 bytes of the Windows Registry value.

So my advice would be to consider whether the version of Citrix ADC you’re currently using actually offers the type of scan which you’re intending to use (REG_NON_NUM_PATH, REG_PATH), and NOT to rely upon documented examples without determining if the operator matches the value type correctly.

Further reading

https://support.citrix.com/article/CTX209148 – How to enable client EPA logging/troubleshooting

https://docs.citrix.com/en-us/citrix-gateway/current-release/vpn-user-config/advanced-endpoint-analysis-policies/advanced-endpoint-analysis-policy-expression-reference.html

PowerShell walkthrough – Citrix FAS certificate renewal

Citrix Federated Authentication Service (FAS) allows SAML based authentication tokens to be used when accessing StoreFront resources via Citrix Gateway.

In many established installations the certificates issued to the FAS server(s) will eventually expire, typically after 2 years. A simple GUI tool can be used to ‘Reauthorize’ an expired domain registration authorization certificate in this event, but an alternative PowerShell route is available to Citrix administrators so that certificates can be renewed in advance.

Citrix’s documentation proposes the following sequence of commands, without referencing the required parameters or source of information:

  • Create a new authorization certificate: New-FasAuthorizationCertificate
  • Note the GUID of the new authorization certificate, as returned by: Get-FasAuthorizationCertificate
  • Place the FAS server into maintenance mode: Set-FasServer –Address <FAS server> -MaintenanceMode $true
  • Swap the new authorization certificate: Set-FasCertificateDefinition –AuthorizationCertificate <GUID>
  • Take the FAS server out of maintenance mode: Set-FasServer –Address <FAS server> -MaintenanceMode $false
  • Delete the old authorization certificate: Remove-FasAuthorizationCertificate

Whilst this might be sufficient if you have a fair degree of confidence with PowerShell it might not be enough if you’re faced with an expired certificate and hundreds of users trying to log in.

I have used the following sequence successfully recently and hope that it will be useful to others.

NB – this example is provided ‘as-is’ and you remain responsible for understanding the effect of each command and detecting when the output doesn’t match your own scenario.

The following colourised convention applies throughout, ensure that you do not copy and paste these values without updating them:

Original FAS certificate ID reference
New FAS certificate ID reference
Certificate authority reference

  1. Open PowerShell on the FAS server for which you want to update the registration certificate.
  2. Add the Citrix commandlets into the PowerShell session:

Add-PSSnapin Citrix.Authentication.FederatedAuthenticationService.V1

  1. Create a variable to hold the local FAS server’s address (if this is the second FAS server in a group of more than one, replace [0] with [1] below:

$CitrixFasAddress=(Get-FasServer)[0].Address

Address : yourfasnode01.yourdomain.com
Index : 0
Version : 1
MaintenanceMode : False
AdministrationACL : O:BAG:DUD:P(A;OICI;SW;;;BA)

  1. Get the existing FAS certificate ID

Get-FasAuthorizationCertificate

Id : 1c67270b-d2f4-4543-919b-519cb5470612
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : bb6b4e47-c5b3-4a6a-9a50-eb6a02a05c3c
CertificateRequest :
Status : MaintenanceDue

  1. Generate a new FAS certificate request against the CA. Both the existing certificate and new certificate request IDs will be shown.

New-FasAuthorizationCertificate -CertificateAuthority yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA -CertificateTemplate Citrix_RegistrationAuthority

Id : 1c67270b-d2f4-4543-919b-519cb5470612
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : bb6b4e47-c5b3-4a6a-9a50-eb6a02a05c3c
CertificateRequest :
Status : MaintenanceDue

Id : 2c113327-1c73-2ca4-44a3-3c12da3963b5
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : 66a8d3fe-7bdb-4003-8220-cd11f7685b92
CertificateRequest :
Status : WaitingForApproval

  1. Log in to the certificate authority and locate the pending certificate request. Select the item, right click and choose and choose ‘Issue’. Wait a minute or two then continue.
  2. Repeat the process to retrieve the FAS authorisation certificates and notice that the status of the newly issued one should have changed from ‘WaitingForApproval’ to ‘Ok’.

Get-FasAuthorizationCertificate

Id : 1c67270b-d2f4-4543-919b-519cb5470612
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : bb6b4e47-c5b3-4a6a-9a50-eb6a02a05c3c
CertificateRequest :
Status : MaintenanceDue

Id : 2c113327-1c73-2ca4-44a3-3c12da3963b5
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : 66a8d3fe-7bdb-4003-8220-cd11f7685b92
CertificateRequest :
Status : Ok

  1. Set the local FAS server into maintenance mode:

Set-FasServer -Address $CitrixFasAddress -MaintenanceMode $true

  1. Get the FAS certificate definition rule, this points at the existing FAS authorisation certificate:

Get-FasCertificateDefinition

Name : default_Definition
CertificateAuthorities : {yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA}
MsTemplate : Citrix_SmartcardLogon
AuthorizationCertificate : 1c67270b-d2f4-4543-919b-519cb5470612
PolicyOids : {}
InSession : False

  1. Create a variable to store the FAS certificate authority address:

$DefaultCA=(Get-FasMsCertificateAuthority -Default).Address

  1. Update the existing FAS certificate definition to use the new FAS certificate ID

Set-FasCertificateDefinition -Name default_Definition -AuthorizationCertificate 2c113327-1c73-2ca4-44a3-3c12da3963b5

  1. Get the FAS certificate definition rule, this should now point at the new FAS authorisation certificate:

Get-FasCertificateDefinition

Name : default_Definition
CertificateAuthorities : {yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA}
MsTemplate : Citrix_SmartcardLogon
AuthorizationCertificate : 2c113327-1c73-2ca4-44a3-3c12da3963b5
PolicyOids : {}
InSession : False

  1. Remove the maintenance mode flag on the local FAS server:

Set-FasServer -Address $CitrixFasAddress -MaintenanceMode $false

  1. Remove the original FAS authorisation certificate (no longer required)

Remove-FasAuthorizationCertificate -Id 1c67270b-d2f4-4543-919b-519cb5470612

Citrix Advanced Session policy equivalents of default Classic expressions

A customer of mine recently asked for some help understanding why Citrix Gateway was not allowing external logons anymore, possibly combined with a recent upgrade to Citrix ADC VPX 13.0 Build 82.42.

He pointed out that there was an entry within the ns.log file which complained about a problem with ‘Ica mode status’, shown below:

Aug 6 11:39:59 192.168.200.191 08/06/2021:09:39:59 GMT citrix-netscaler 0-PPE-0 : default SSLVPN Message 586 0 : "Ica mode status is not okay"

Investigating further we could identify both successful LDAP authentication (basic LDAP auth attached directly to the Citrix Gateway vserver) and STA lookup, but the ADC wasn’t actually requesting any pages from the Storefront server URL defined in the session profile.

Searching for the error itself yielded one result which referred in particular to ‘Ica mode status’ :

https://support.citrix.com/article/CTX291268

Point #2 in the solution referred to switching the Classic expression in the session policy to an Advanced policy, however you cannot modify an existing policy without it switching back to the original setting. In order to bypass this limitation, create new session policies which use the Advanced expression equivalents to those created by the Citrix XenApp and XenDesktop ADC wizard available in the appliance.

See below screenshot for the before (first 2) and after (latter 2) Classic/Advanced equivalents.

Before (classic)

add vpn sessionPolicy PL_OS_192.168.200.190 "REQ.HTTP.HEADER User-Agent CONTAINS CitrixReceiver" AC_OS_192.168.200.190
add vpn sessionPolicy PL_WB_192.168.200.190 "REQ.HTTP.HEADER User-Agent NOTCONTAINS CitrixReceiver && REQ.HTTP.HEADER Referer EXISTS" AC_WB_192.168.200.190

After (advanced)

add vpn sessionPolicy PL_OS_192.168.200.190_Advanced "HTTP.REQ.HEADER(\"User-Agent\").CONTAINS(\"CitrixReceiver\")" AC_OS_192.168.200.190
add vpn sessionPolicy PL_WB_192.168.200.190_Advanced "HTTP.REQ.HEADER(\"User-Agent\").CONTAINS(\"CitrixReceiver\").NOT" AC_WB_192.168.200.190

Once the Advanced expression policies are bound to the vserver and the original Classic expressions have been removed – the initial problem is resolved and StoreFront loads successfully.

Whilst Citrix are advising that Citrix classic expression policies will be deprecated in ADC 13.1 it appears that some issues relating to session policies have crept in at/before 13.0 Build 82.42 which need to be carefully managed.

NB. It is possible to use a Citrix Advanced session policy with the Citrix ADC Gateway VPX license in this way. This isn’t the same as enabling nFactor Advanced Authentication policies as detailed by Carl Stalhood here: https://www.carlstalhood.com/nfactor-authentication-for-netscaler-gateway-12/

Lab problems with Intel NUC 11th Generation hardware with VMware ESXi 7.0.1

This is a placeholder posting for ongoing updates as and when new updates/resolutions are found. It isn’t intended to provide any additional detail to the problems outlined but simply to document the areas where bugs or ‘gotchas’ are located.

I have recently acquired several Intel NUC 11th Generation (NUC11TNHv50L) for my lab/testing environment which are being deployed into an existing vSAN/NSX-T environment as a workload domain. The release of these latest NUCs seemed to have generated a lot of interest with different community members discussing the ideal fit with NSX-T (due to the dual 2.5 Gbit/s Intel I225-LM NICs which come in the Pro version), however there are a couple of limitations that make this not a smooth ride currently.

Community networking driver and workarounds

Out of the box these NUCs are not supported with VMware ESXi and rely upon the Community Networking Driver Fling. Therefore before purchasing these devices for your home lab be aware that this fling:

  • Requires a custom ESXi image to be created which includes the Community Networking Driver
  • Does not support jumbo frames (e.g. >1500 byte MTU) – which in my view prevents any serious use with the NSX-T Geneve protocol which is typically 1600 byte minimum
  • Causes the network interface to become disconnected (link layer communication fails) if configured MTU is greater than 1500, which only recovers after a reboot
  • Seems to cause a purple screen (PSOD) failure when the second NIC is connected (under undefined circumstances currently)

Currently I am overcoming the NSX-T frame size issue by using the Startech USB 3.1 1Gbit/s USB network adapters, but this requires an additional fling to be installed. As a compromise it’s not too bad, since there are two Thunderbolt/USB-C ports on these NUCs allow up to two additional 1Gbit/s interfaces to be attached. So I am configuring my ESXi hosts as:

1 x Onboard Intel I225-LM at 2.5 Gbit/s – dvSwitch 1 (Management, vSAN)

1 x StarTech USB 3.1 adapter at 1Gbit/s – dvSwitch 2 (NSX-T, vMotion)

Power off and shut down

In addition it seems that when ‘Shut down’ of an ESXi host is performed the system ignores the BIOS power setting (e.g. to remain off, or power on etc.) and will immediately restart the operating back to a running condition (almost as if a reboot instead of shut down were chosen). This is strange behaviour which needs further experimentation and makes shutting down your lab a lot more time consuming – however it can be worked around currently by:

  1. Shut down the ESXi instances individually using host UI/vCenter
  2. Watch the power light on the front panel (assuming no screen attached) – when the power light turns off for approximately 0.5s it is initiating the actual power off, prior to becoming turned back on again
  3. At this point pull the power supply out of the back of the NUC and plug it back in a couple of seconds later – it will remain off instead of rebooting (even if the BIOS setting says on loss of power – power on)

It’s getting hot in here

Lack of fan speed and temperature within ESXi hardware sensors. This is not a new issue but despite the integrated 3D graphics which is now on-chip there still seems to be a lack of information exposed to the operating system (presumably by Intel). In my bookcase vSAN/NSX-T environment it’s becoming a ‘hot topic’ to say the least ;-). Both new and older NUCs are doing fine on the Balanced performance/fan speed setting, and do a good job of spinning up and down the fan whenever the CPU turbo feature engages (up to 4.1GHz on my units), but it would be good to be able to view this more empirically than just watching how many windows need to be opened!

Good resources to check out in all things NUC are William Lam and Florian Grehl.

Citrix XenApp/Desktop LTSR 7.15 Azure catalog creation issues

I came across this problem whilst trying to build a lab scenario with an older version of LTSR 7.15 and wasn’t able to find any similar issues documented elsewhere. Essentially Citrix Studio would not allow me to browse for .vhd files when creating a new catalog from an unmanaged disk located in an Azure storage account.

Here’s the troubleshooting process and solution at the end (spoiler – it’s TLS 1.1, 1.2!)

Trying to create a catalog following successful creation of a hosting connection:


Machine creation wizard error

You might find for instance when examining other storage accounts that you are even able to view the name of any named containers e.g. ‘logs’ located within the storage account object, but no obvious difference is possible.

You might try even using PowerShell to examine the hypervisor connection, and by following along will eventually reach a dead end in the communication with Azure:

Add-PSsnapin Ci*
cd XDHyp:\
cd HostingUnits
(dir).PSChildName

Determine the name of your hosting connection, and change directory into it

cd .\YourHostingUnitName\
(dir).PSChildName

Determine the name of your resource, and change directory into it

cd .\image.folder\
(dir).PSChildName

Determine the name of your Azure resource group, and change directory into it

cd .\YourResourceGroupName.resourcegroup\
(dir).PSChildName

Determine the name of your storage account, and change directory into it

cd .\YourStorageAccountName.storageaccount\
(dir).PSChildName

At this point if you attempt to use dir or get-childitem you will receive an error saying:

An exception occurred. The associated message was Error: Could not receive inventory contents from path

In summary you don’t receive very much information from Citrix Studio which might provide further assistance at troubleshooting the issue. Citrix Host Service will generate an Event ID 1007 message including the text:

Citrix.MachineCreationAPI.MachineCreationException: Error: Could not retrieve inventory contents from path /UK South.region/image.folder/YourResourceGroup.resourcegroup/YourStorageAccount.storageaccount ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (400) Bad Request. ---> System.Net.WebException: The remote server returned an error: (400) Bad Request.
   at System.Net.HttpWebRequest.GetResponse()

The solution took quite some comparison between different working environments until I happened on the cause and eventual solution. That is, that the storage accounts affected were configured by default to use TLS 1.2 as a minimum rather than TLS 1.0. Clearly this isn’t ideal but even relatively recent LTSR 7.15 CU5 (and presumably earlier) does not seem to support TLS 1.2 for this type of API communication with Azure.

Simply locate the storage account and modify the following switch under the Configuration page:

Finally (after waiting 30 seconds or so for the storage account change to take affect you’ll be able to open the storage account and view the unmanaged disk VHD blob.

Correctly working master image wizard selection

Switching to TLS 1.1 support does not improve the situation, it will begin failing again – even though the browser in Windows Server 2016 (with recent updates) supports TLS 1.1 and 1.2. So it appears that the code somewhere is out of date in LTSR 7.15 (either Citrix Studio or PowerShell perhaps).

I’ll update this post if I manage to resolve it using another method, but in my experience after testing this problem goes away with LTSR 1912.

Intel NUC 7th Generation with Thunderbolt Ethernet on ESXi 7

My 3 host ESXi 6.5 home lab was built a couple of years ago now in order to develop some vSAN test scenarios that I was assisting a client with. Now that lab is in the process of being repurposed to become an NSX-T / vSAN lab running ESXi 7.0.

My three NUCs are based on the 7th Generation Intel NUC7i5BNH with 32GB RAM, onboard M2 NVMe SSD cache disk (256GB) and a 512GB SSD capacity disk.

– By the way, in case you’re wondering – it flies, even over 1Gbit ethernet with all-flash vSAN and deduplication/compression turned on. However – don’t use something like this for production as you’ll need to be prepared to lose ~14GB RAM per host for the dedupe in-memory object map and all-flash requires 10Gbit ethernet!

However, back to the issue in hand. One of the first issues I ran in to when assessing the requirements for NSX-T in a fully collapsed cluster (running vSAN, vCenter and NSX Manager) was a need to have two physical network interfaces (pNICs) but my Intel NUC 7th Gen hosts only have a single on-board gigabit ethernet adapter. This isn’t front page news of course as William Lam has been documenting the use of USB based ethernet in lab scenarios for quite some time, originally resulting in a VMware Fling he coauthored with another device driver engineer (Songtao Zheng) at VMware.

Now I began to question with recent releases such as VMware ESXi 7.0 whether any of the drivers or settings mentioned would even be required in order to get a second ethernet adapter working. This post is really just a signpost for people who might be doing similar things in their own labs.

I decided to jump in with both feet by purchasing three Startech USB-C 1-Gbit Ethernet adapters from Amazon UK. These devices use the Realtek RTL8153 chipset. My NUC devices have Thunderbolt 3 interfaces (with the lightning bolt marking) but I was pretty sure that the USB-C connector would work as the interface supports USB 3.1 Gen2 devices.

UPDATE YOUR BIOS – mine hadn’t been touched since 2017 so I went online first and downloaded the latest update for my NUCs, then flashed using a USB key with the .bio file and F7 key during boot.

UPDATE YOUR ESXi release – reading one of William Lam’s posts I found that there was a recent patch release of ESXi 7.0.0b which includes an updated USB driver:

VMware-vmkusb_0.1-1vmw.700.1.25.16324942

As far as I can tell this includes the ability to detect Thunderbolt connected devices amongst other improvements, but this awareness certainly negates the need to disable any existing VMware USB driver (which older posts I’d read had discussed prior to installing the USB ethernet fling).

INSTALL the VMware ESXi 7.0.0 release of the USB Fling:

esxcli software vib install -d '/vmfs/volumes/QNAP_VMFS_DS01/tmp/ESXi700-VMKUSB-NIC-FLING-34491022-component-15873236.zip'

USB Native Driver Fling for VMware ESXi | 0.1-4vmw.700.1.0.34491022

3. Shut down the ESXi host, you’ll need to go into the BIOS at next boot

ENABLE THUNDERBOLT BOOT in BIOS -until you do this you won’t be able to see any USB 3.x network devices. William Lam again has the lead again, with this linked post concerning Thunderbolt 10Gbit adapters on Intel Skull Canyon devices. Enter the BIOS and enable THUNDERBOLT BOOT.

Before enabling this feature you’ll find that 'lsusb -tv' will only show a single USB XHCI root HUB:

4. Save your BIOS settings, connect the Startech USB device and boot ESXi. Once reloaded compare the ‘lsusb -tv’ result with the previous version.

NB – If you find that your USB adapters are only connected at 100Mbit/s then it’s likely that the default ESXi 7.0.0 drivers have been loaded instead of the ones provided in the Fling. You’ll also see that the adapter name is detected as ‘cdce’ instead of ‘uether’. In this case make sure that the drivers are installed correctly and try a reboot with the adapter connected.

[OPTIONAL] if you have any 10 Gbit Thunderbolt adapters you could also use the following steps to add the Marvell drivers. I haven’t actually acquired any of these yet, but the instructions should be good as I’ve tested the installation process itself.

INSTALL the VMware release of the Marvell Atlantic USB driver:

  1. Download the .zip file and upload it to a datastore that your hosts can access.. e.g. /vmfs/volumes/58134191-c9bf8fe8-d464-d067e5e666da/tmp/MRVL-Atlantic-Driver-Bundle_1.0.2.0-1OEM.670.0.0.8169922-offline-bundle-16081713.zip
  2. Enter maintenance mode and vacate the ESXi host
  3. Install the offline bundle VIB using:
esxcli software component apply -d /vmfs/volumes/QNAP_VMFS_DS01/tmp/MRVL-Atlantic-Driver-Bundle_1.0.2.0-1OEM.670.0.0.8169922-offline-bundle-16081713.zip

Native atlantic network driver for VMware ESXi | 1.0.2.0-1OEM.670.0.0.8169922

Upgrading vyOS VMware appliance to latest release

In order to troubleshoot a vyOS issue which we’ve been experiencing lately I attempted to upgrade to the latest vyOS release on a .OVA deployed appliance that was running the older 1.2.1 release.

The vyOS upgrade documentation shows the command required to install a new version is simply:

add system image https://downloads.vyos.io/rolling/current/amd64/vyos-1.2.0-rolling%2B201810030440-amd64.iso

However later on in the same article the command response shows the error:

We do not have enough disk space to install this image!
We need 344880 KB, but we only have 17480 KB.
Exiting…

So what is using the space on the appliance and how can we resolve this issue?

We do not have enough disk space to install this image!

Basically, using ‘sudo du -hs /var’ shows us that 968MB of data is consumed within the /var folder and most of this relates to the wtmp and wtmp.1 files. What are those files? They are simply large binary rolling log files which are written to in order to record any login attempts, with wtmp.1 being the rolled up previous versions which are being retained.

We don’t need anything close to that level of logging in our lab environments, so the following commands modify the retention period and log interval to 1 hour maximum.

sudo nano /etc/logrotate.conf

Edit the lines to change from ‘weekly’ and ‘4’ represent a month’s worth of logs to:

#rotate log files weekly
 hourly
#keep 4 weeks worth of backlogs
 rotate 1

Which should retain a rolling log of any login attempts during the last hour. Once this is done you can delete the previous wtmp.1 rollup, apply the vyOS update and then reboot (once only) in order to apply the latest code version now that you have sufficient space:

sudo rm /var/log/wtmp.1
add system image http://172.20.12.142:8080/vyos-1.2-rolling-201911110217-amd64.iso
sudo reboot

NB – in my example I’m hosting the .ISO file which I downloaded on a simple HTTP web server on the internal network

After you’ve finished the upgrade you could always revert the logging configuration back to the defaults, but the main sticking point here is the limited available space once a rollup of logs has become quite large and I didn’t want to have to fix this again in the future.

Upgrading Citrix XenApp 7.x VDA version using PowerShell

With the advent of XenApp 7 and more recently experiencing the higher frequency of VDA cumulative updates I would generally recommend implementing Citrix Machine Creation Services or other imaging mechanism (such as Provisioning Server) when rolling out new versions of the Virtual Desktop Agent to a large number of catalogs.

However, what happens when you only require one XA server per catalog, or when each one of those servers is handled manually when new application code is deployed? This is more common than you might imagine, especially in Citrix deployments which have per-customer or per-app specific catalogs. The work involved in maintaining a master image can be significant and the serviceability of such relies upon someone knowing how to treat image updates in a way that won’t introduce problems that could arise weeks or months later.

One customer of mine has at least 80 catalogs running one or more XenApp VMs and so it simply doesn’t make sense to maintain a single master image for each, especially when application code updates are delivered frequently. So I set about creating a simple PowerShell script which works in a VMware environment to attach the Citrix upgrade ISO and then run the setup installer within the context of a remote PowerShell session.

Using this method you can easily carry out a bulk upgrade of tens (possibly hundreds) of statically assigned VDAs individually by attaching the ISO and installing the update automatically. The advantage of this time saving approach is that it can even be run in a loop so that the upgrade is only attempted when a server is idle and not running any sessions.

NB – as always, please validate the behaviour of the script in a non-production environment and adjust where necessary to meet your own needs.

Here’s a walkthrough of the script, along with the complete example version included at the end.

  1. The script will load the required plugins from both Citrix and VMware PowerShell modules/plugins (I generally run things like this on the Citrix Delivery Controller and install PowerCLI alongside for convenience)
  2. Request credentials and connect to vCenter via a popup
  3. Request credentials for use with WinRM connections to remote Windows servers via a popup
  4. Create a collection of objects (XA servers) which are powered on, do not have any active sessions and don’t already have the target VDA version installed (see $targetvda variable)
  5. For each VM, sequentially:
    1. Attach the specified .iso image file to the resulting VMs
    2. Determine the drive letter where the XA ISO file has been mounted
    3. Create a command line for the setup installer, and save the command into c:\upgrade_vda.cmd on the XA server
    4. Connect via PowerShell remoting session to the remote XA server
    5. Adjust the EUEM registry node permissions (as per https://support.citrix.com/article/CTX215992)
    6. Execute the c:\upgrade_vda.cmd upgrade script on remote machine via PS session
    7. Disconnect the PowerShell remote session
    8. Reboot the VM via vCenter in order to restart the XA services

Review the script and edit the following variables to reflect your use-case:

$vcentersrv = "yourvcentersrv.domain.com"
$targetvda = '7.15.4000.653'
$isopath = "[DATASTORE] ParentFolderName\XenApp_and_XenDesktop_7_15_4000.iso"

Edit the selection criteria on the VMs which will be upgraded:

$targetvms = Get-BrokerMachine -DesktopKind Shared | Where-Object {($_.AgentVersion -ne $targetvda) -and ($_.PowerState -eq 'On') -and ($_.HostedMachineName -like 'SRV*')}

All servers in my example environment begin with virtual machine names SRV* so this line can be adapted according to the number of VMs which you would like to upgrade, or simply replace with the actual named servers if you want to be more selective:

($_.HostedMachineName -in 'SRV1','SRV2','SRV3')

Finally, consider modifying the following variable from $true to $false in order to actually begin the process of upgrading the selected VMs. I suggest running it in the default $true mode initially in order to validate the initial selection criteria.

$skiprun = $true

Additional work:

I would like additionally to incorporate the disconnection of previous VDA .ISO files from the VM before attempting to upgrade. I have noticed that the attached volume label search e.g. Get-Volume -FileSystemLabel ‘XA and XD*’ that determines the drive letter selection is too wide, and will erroneously detect both XA_7_15_4000.iso and XA_7_15_2000.iso versions without differentiating between them.

I would also like to do further parsing of the installation success result codes in order to decide whether to stop, or simply carry on – however I have used the script on tens of servers without hitting too many roadblocks.

This script could also be adapted to upgrade XenDesktop VDA versions where statically assigned VMs are provided to users.

Final note:

This script does not allow the Citrix installer telemetry to run during the installation because it requires internet access and this generates errors in PowerShell for XenApp servers which can’t talk outbound. You can choose to remove this command line parameter according to your circumstances:

/disableexperiencemetrics

Citrix also optionally collects and uploads anonymised product usage statistics, but again this requires internet access. In order to disable Citrix Telemetry the following setting is used:

/EXCLUDE "Citrix Telemetry Service"

Additionally the Personal vDisk feature is now deprecated, so the script excludes this item in order for it to be removed if it is currently present (so be aware if you’re using PvD):

/EXCLUDE "Personal vDisk"

PowerShell code example:

# Upgrade VDA on remote Citrix servers

if ((Get-PSSnapin -Name "Citrix.Broker.Admin.V2" -ErrorAction SilentlyContinue) -eq $Null){Add-PSSnapin Citrix.Broker.Admin.V2}
if ((Get-PSSnapin -Name "VMware.VimAutomation.Core" -ErrorAction SilentlyContinue) -eq $Null){Add-PSSnapin VMware.VimAutomation.Core}

$vcentersrv = "yourvcentersrv.domain.com"

if ($vmwarecreds -eq $null) {$vmwarecreds = Connect-VIServer -Server $vcentersrv}            # Authenticate with vCenter, you should enter using format DOMAIN\username, then password
if ($creds -eq $null) {$creds = Get-Credential -Message 'Enter Windows network credentials'} # Get Windows network credentials

clear

$targetvda = '7.15.4000.653' #Add the target VDA version number - anything which isn't correct will be upgraded
$isopath = "[DATASTORE] ParentFolderName\XenApp_and_XenDesktop_7_15_4000.iso" #Path to ISO image in VMware
$skiprun = $true #Set this variable to false in order to begin processing all listed VMs

$targetvms = Get-BrokerMachine -DesktopKind Shared | Where-Object {($_.AgentVersion -ne $targetvda) -and ($_.PowerState -eq 'On') -and ($_.HostedMachineName -like 'SRV*')}
Write-Host The following XA VMs will be targeted
Write-Host $targetvms.HostedMachineName
if ($skiprun -eq $true) {write-host Skip run is still enabled; exit}

foreach ($i in $targetvms){

if ($i.AgentVersion -ne $targetvda) {
    Write-Host Processing $i.HostedMachineName found VDA version $i.AgentVersion
    
    if ($i.sessioncount -ne $null) {Write-Host Processing $i.HostedMachineName found $i.sessioncount users are logged on}

    if ($i.sessioncount -eq 0) {#Only continue if there are no logged-on users

        Write-Host Processing $i.HostedMachineName verifying attachment of ISO image
        $cdstate = Get-VM $i.HostedMachineName | Get-CDDrive
        if (($cdstate.IsoPath -ne $isopath) -and ($cdstate -notcontains 'Connected')) { $cdstate | Set-CDDrive -ISOPath $isopath -Confirm:$false -Connected:$true;Write-Host ISO has been attached}

        $s = New-PSSession -ComputerName ($i.MachineName.split('\')[1]) -Credential $creds
            #Create the upgrade command script using correct drive letters
            Write-Host Processing $i.HostedMachineName -NoNewline
            invoke-command -Session $s {
                $drive = Get-Volume -FileSystemLabel 'XA and XD*'
                $workingdir = ($drive.driveletter + ":\x64\XenDesktop Setup\")
                $switches = " /COMPONENTS VDA /EXCLUDE `"Citrix Telemetry Service`",`"Personal vDisk`" /disableexperiencemetrics /QUIET"
                $cmdscript = "`"$workingdir" + "XenDesktopVDASetup.exe`"" + $switches
                Out-File -FilePath c:\upgrade_vda.cmd -InputObject $cmdscript -Force -Encoding ASCII
                Write-Host " wrote script using path" $workingdir
            }
            
            #Adjust the registry permissions remotely
            Write-Host Processing $i.HostedMachineName updating registry permissions
            Invoke-Command -Session $s {
                $acl = Get-Acl "HKLM:\SOFTWARE\Wow6432Node\Citrix\EUEM\LoggedEvents"
                $person = [System.Security.Principal.NTAccount]"Creator Owner"
                $access = [System.Security.AccessControl.RegistryRights]"FullControl"
                $inheritance = [System.Security.AccessControl.InheritanceFlags]"ContainerInherit,ObjectInherit"
                $propagation = [System.Security.AccessControl.PropagationFlags]"None"
                $type = [System.Security.AccessControl.AccessControlType]"Allow"}
            Invoke-Command -Session $s {$rule = New-Object System.Security.AccessControl.RegistryAccessRule($person,$access,$inheritance,$propagation,$type)}
            Invoke-Command -Session $s {$acl.AddAccessRule($rule)}
            Invoke-Command -Session $s {$acl |Set-Acl}
                
            #Execute the command script
            Write-Host Processing $i.HostedMachineName, executing VDA install script
            Invoke-Command -Session $s {& c:\upgrade_vda.cmd} # Runs the upgrade script on remote server
            Remove-PSSession $s #Disconnect the remote PS session
            Restart-VMGuest -VM $i.HostedMachineName -Confirm:$false #Restart the server following either a successful or unsuccessful upgrade
            }
        }
    }