siteadmin

April 19, 2024

Maintaining long lived Tanzu clusters

Very few resources provide real guidance on what to do after creating a cluster using Tanzu TKG, particularly in terms of ongoing maintenance beyond the initial handoff to a developer.

What often happens next is you eventually learn of a problem once the system has long since become stable and adopted for general use, and this puts you straight away on the back-foot in terms of overcoming the issue.

This post concentrates on the kinds of problems you might run into during operational management of a cluster, however it doesn’t claim to capture every such problem, just those which I’ve personally been involved in troubleshooting.

Of course – the VMware documentation should be your go-to place on the first occasion, so do check through the known issues of the release notes for your specific version first before continuing with any other activity.

You can find a link to the general product documentation for Tanzu TKG in the final section, which include the most recent versions by default, and even an archive for older versions.

Do cluster credentials even expire?

Well, this is something which we’re unfortunately going to have to learn, most likely the hard way, and to my mind this is not sufficiently sign-posted within VMware TKG documentation.

Here’s a description of this scenario and the task to update your credentials, using the TKG 2.1 documentation as an example: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/2.1/tkg-deploy-mc/mgmt-manage-index.html

This approach assumes that you can still access the cluster using kubectl commands in order to retrieve the current kubeconfig data of your CLUSTER-NAME:

kubectl -n tkg-system get secrets CLUSTER-NAME-kubeconfig -o 'go-template={{ index .data "value"}}' | base64 -d > mc_kubeconfig.yaml

Once you have obtained this data you’ll want to know when the credentials expire, so use the following method to decode just the client certificate element first and then use openssl to extract the date elements:

kubectl -n tkg-system get secrets CLUSTER-NAME-kubeconfig -o 'go-template={{ index .data "value"}}' | base64 -d | grep client-certificate-data | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

notBefore=Aug 21 15:26:11 2023 GMT notAfter=Feb 19 03:31:32 2025 GMT

Compare these dates (obtained from the cluster) to what you have stored locally (held within your current kubeconfig context) using:

kubectl config view --raw --minify | grep client-certificate-data | awk '{print $2}' | base64 -d | openssl x509 -noout -dates

The output should be the same, but if it is not then you can update your local kubeconfig file copy of the cluster’s data using the mc_kubeconfig.yaml file outputted earlier.

Now is a good time to make a date in the diary to either upgrade your Tanzu TKG implementation or manually rotate the certificates before this date arrives. Please refer to the general guidance here kb.vmware.com concerning rotation.

Thankfully this issue has been resolved in TKG 2.1.x via the auto-renew feature which can be retrospectively changed by editing the cluster object:

https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/2.1/using-tkg/workload-clusters-secret.html#auto-renew

Misplacing the keys to the castle

Losing cluster admin credentials

Individual client certificates stored within Kubeconfig files generally expire after 6 months, and the kubeadm generated certs (seen below) automatically expire within 365 days of the cluster being built. Only the three certificate-authority certs created within the Kubernetes cluster last 10 years by default.

Here is the output generated from a control-plane node using the command:

kubeadm alpha certs check-expiration

This shows the output from a cluster created a few minutes ago, hence <365d residual time remaining.

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Mar 22, 2025 10:11 UTC   364d            ca                      no
apiserver                  Mar 22, 2025 10:11 UTC   364d            ca                      no
apiserver-etcd-client      Mar 22, 2025 10:11 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Mar 22, 2025 10:11 UTC   364d            ca                      no
controller-manager.conf    Mar 22, 2025 10:11 UTC   364d            ca                      no
etcd-healthcheck-client    Mar 22, 2025 10:11 UTC   364d            etcd-ca                 no
etcd-peer                  Mar 22, 2025 10:11 UTC   364d            etcd-ca                 no
etcd-server                Mar 22, 2025 10:11 UTC   364d            etcd-ca                 no
front-proxy-client         Mar 22, 2025 10:11 UTC   364d            front-proxy-ca          no
scheduler.conf             Mar 22, 2025 10:11 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Mar 25, 2033 11:44 UTC   9y              no
etcd-ca                 Mar 25, 2033 11:44 UTC   9y              no
front-proxy-ca          Mar 25, 2033 11:44 UTC   9y              no

Upcoming kubeadm certs expiry

If you are now approaching the expiry date of the kubeadm certificates and have not automatically renewed these certs, do so by either completing an upgrade to a newer Tanzu version, or by control plane scaling in which case they can be automatically rotated whilst remaining in the current state.

Manual rotation process is described here: https://kb.vmware.com/s/article/86251 – BUT crucially this requires SSH access to at least one of your control plane nodes. See the section below on losing access for some possible recovery options.

Once you have rotated the certificates on the control-plane don’t forget to also refresh the content of the following two sections,

client-certificate-data: [updated data from admin.conf]
client-key-data: [updated data from admin.conf]

in BOTH the files of your tanzu CLI machine below: (used for kubectl workload contexts and tanzu login management contexts respectively)

~/.kube/config
~/.kube-tkg/config

After the management console side of things are refreshed correctly you can then have confidence of another year’s administrative access. Now move on to rotate the certificates on any worker clusters which that management cluster also oversees using the same process (but you don’t need to update the tanzu CLI version of the kubeconfig file for worker clusters since you don’t log in to them).

Retrieve admin credentials before expiry

Approaching the time of your individual kubeconfig client-certificate expiry you simply need to retrieve a new file using tanzu cluster kubeconfig get clustername --admin --export-file new_file_name (or the management cluster equivalent) command. This will provide a new kubeconfig file which will last typically 6 months from the date of issue.

However, if you don’t refresh your admin credentials periodically then you may eventually find that after 365 days of operating a stable cluster that you no-longer have access to it at all via kubectl or tanzu login commands.

In situations where your computers are kept isolated from the internet, you might run into a problem where, without prior notice, you can’t use your kubeconfig file to talk to your cluster. This is especially true if you’re not updating your system more than once a year.

It is recommended to maintain awareness of the cluster’s certificate expiry dates and complete rotation beforehand, including to refresh your kubeconfig file via tanzu CLI. If you are told about a cluster’s access expiring after the fact but if you still have SSH access then all is not lost. You should then connect to the cluster control-plane and carry out manual rotation then retrieve the updated content from the /etc/kubernetes/admin.conf file and place the data into your local kubeconfig file.

Losing SSH access

What if you lose access via SSH? In the TKG 1.6.x release a security hardening issue https://kb.vmware.com/s/article/90368 can cause attempts to logon via capv@controlplaneIPaddress to fail, requiring additional edits to the cluster’s kcp and kubeadmtemplate before you will be able to log in.

An alternative approach might be to scale your control plane nodes using vertical scaling, e.g. by modifying the size of a node’s attached hard disk, CPU or memory spec for instance. This process is described here: https://kb.vmware.com/s/article/91164. By scaling your control plane new VMs will be spun up, each with the possibility of regaining access via 60 day SSH access (Ubuntu) or 90 days (Photon). This security feature can be disabled (see Method 2 in the referenced document) but only once you have regained connectivity.

More desperate measures might be required in the event that both your kubeconfig and tanzu login access is no longer possible. I do not confirm nor recommend the manual removal (via vCenter) of a control-plane VM – in the event that you lose both SSH and tanzu CLI access, but my suspicion is that if there are more than one control-plane nodes the KubeadmControlPlane will no longer match the running spec and a new node will be provisioned, along with SSH access reinstated. This is something I aim to test further.

Contour package certificate expiry

VMware’s documentation for installing the Contour package into a worker cluster is rather generic, but one of the ways in which you can extend your workloads is by installing the Contour/Envoy ingress controller along with some default values. This scenario is described as a CLI managed package, beyond which the basic process for installation is detailed below (for TKG 1.6):

tanzu package available list contour.tanzu.vmware.com
tanzu package available get contour.tanzu.vmware.com/1.20.2+vmware.1-tkg.1 --generate-default-values-file

Within the generated default values file is a snippet which defines a TLS certificate lifetime which is consumed by Envoy and Contour pods when communicating with each other over gRPC protocol.

certificates:
  duration: 8760h
  renewBefore: 360h

The first value defines the period after which these internal-use Contour certificates will expire, and the renewal period before which they should be updated. However this can cause a strange problem with Envoy pods if you installed Contour a couple of days after spinning up a new cluster – you will see something like:

StreamListeners gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE

https://kb.vmware.com/s/article/90811 details the solution, which is simply to remove the secrets and have the package automatically recreate them. I have also deinstalled the package using the tanzu CLI and reinstalled it without any other problems, however you should be aware that your cluster’s certificates might expire before Contour has recreated the certificates.

List of resources and useful pages

https://docs.vmware.com/en/VMware-Tanzu/index.html

https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/index.html

https://docs.vmware.com/en/VMware-Tanzu-Packages/2024.2.1/tanzu-packages/ref.html

https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/services/tkg-doc-archive-2x.zip – this is a ZIP file archive of the TKG 2.x version documentation

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs

Avi NSX licensing resources

https://avinetworks.com/docs/latest/nsx-alb-license-editions

https://docs.vmware.com/en/VMware-NSX-Advanced-Load-Balancer/22.1/Administration_Guide/GUID-B5EC8F3B-A75E-4809-A653-6EBE08CFED81.html – Avi licensing

August 19, 2021

PowerShell walkthrough – Citrix FAS certificate renewal

Citrix Federated Authentication Service (FAS) allows SAML based authentication tokens to be used when accessing StoreFront resources via Citrix Gateway.

In many established installations the certificates issued to the FAS server(s) will eventually expire, typically after 2 years. A simple GUI tool can be used to ‘Reauthorize’ an expired domain registration authorization certificate in this event, but an alternative PowerShell route is available to Citrix administrators so that certificates can be renewed in advance.

Citrix’s documentation proposes the following sequence of commands, without referencing the required parameters or source of information:

Create a new authorization certificate: New-FasAuthorizationCertificate
Note the GUID of the new authorization certificate, as returned by: Get-FasAuthorizationCertificate
Place the FAS server into maintenance mode: Set-FasServer –Address <FAS server> -MaintenanceMode $true
Swap the new authorization certificate: Set-FasCertificateDefinition –AuthorizationCertificate <GUID>
Take the FAS server out of maintenance mode: Set-FasServer –Address <FAS server> -MaintenanceMode $false
Delete the old authorization certificate: Remove-FasAuthorizationCertificate

Whilst this might be sufficient if you have a fair degree of confidence with PowerShell it might not be enough if you’re faced with an expired certificate and hundreds of users trying to log in.

I have used the following sequence successfully recently and hope that it will be useful to others.

NB – this example is provided ‘as-is’ and you remain responsible for understanding the effect of each command and detecting when the output doesn’t match your own scenario.

The following colourised convention applies throughout, ensure that you do not copy and paste these values without updating them:

Original FAS certificate ID reference
New FAS certificate ID reference
Certificate authority reference

Open PowerShell on the FAS server for which you want to update the registration certificate.
Add the Citrix commandlets into the PowerShell session:

Add-PSSnapin Citrix.Authentication.FederatedAuthenticationService.V1

Create a variable to hold the local FAS server’s address (if this is the second FAS server in a group of more than one, replace [0] with [1] below:

$CitrixFasAddress=(Get-FasServer)[0].Address

Address : yourfasnode01.yourdomain.com
Index : 0
Version : 1
MaintenanceMode : False
AdministrationACL : O:BAG:DUD:P(A;OICI;SW;;;BA)

Get the existing FAS certificate ID

Get-FasAuthorizationCertificate

Id : 1c67270b-d2f4-4543-919b-519cb5470612
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : bb6b4e47-c5b3-4a6a-9a50-eb6a02a05c3c
CertificateRequest :
Status : MaintenanceDue

Generate a new FAS certificate request against the CA. Both the existing certificate and new certificate request IDs will be shown.

New-FasAuthorizationCertificate -CertificateAuthority yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA -CertificateTemplate Citrix_RegistrationAuthority

Id : 1c67270b-d2f4-4543-919b-519cb5470612
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : bb6b4e47-c5b3-4a6a-9a50-eb6a02a05c3c
CertificateRequest :
Status : MaintenanceDue

Id : 2c113327-1c73-2ca4-44a3-3c12da3963b5
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : 66a8d3fe-7bdb-4003-8220-cd11f7685b92
CertificateRequest :
Status : WaitingForApproval

Log in to the certificate authority and locate the pending certificate request. Select the item, right click and choose and choose ‘Issue’. Wait a minute or two then continue.
Repeat the process to retrieve the FAS authorisation certificates and notice that the status of the newly issued one should have changed from ‘WaitingForApproval’ to ‘Ok’.

Get-FasAuthorizationCertificate

Id : 1c67270b-d2f4-4543-919b-519cb5470612
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : bb6b4e47-c5b3-4a6a-9a50-eb6a02a05c3c
CertificateRequest :
Status : MaintenanceDue

Id : 2c113327-1c73-2ca4-44a3-3c12da3963b5
Address : yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA
TrustArea : 66a8d3fe-7bdb-4003-8220-cd11f7685b92
CertificateRequest :
Status : Ok

Set the local FAS server into maintenance mode:

Set-FasServer -Address $CitrixFasAddress -MaintenanceMode $true

Get the FAS certificate definition rule, this points at the existing FAS authorisation certificate:

Get-FasCertificateDefinition

Name : default_Definition
CertificateAuthorities : {yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA}
MsTemplate : Citrix_SmartcardLogon
AuthorizationCertificate : 1c67270b-d2f4-4543-919b-519cb5470612
PolicyOids : {}
InSession : False

Create a variable to store the FAS certificate authority address:

$DefaultCA=(Get-FasMsCertificateAuthority -Default).Address

Update the existing FAS certificate definition to use the new FAS certificate ID

Set-FasCertificateDefinition -Name default_Definition -AuthorizationCertificate 2c113327-1c73-2ca4-44a3-3c12da3963b5

Get the FAS certificate definition rule, this should now point at the new FAS authorisation certificate:

Get-FasCertificateDefinition

Name : default_Definition
CertificateAuthorities : {yourdomainca01.yourdomain.com\yourcompany-yourdomainca01-CA}
MsTemplate : Citrix_SmartcardLogon
AuthorizationCertificate : 2c113327-1c73-2ca4-44a3-3c12da3963b5
PolicyOids : {}
InSession : False

Remove the maintenance mode flag on the local FAS server:

Set-FasServer -Address $CitrixFasAddress -MaintenanceMode $false

Remove the original FAS authorisation certificate (no longer required)

Remove-FasAuthorizationCertificate -Id 1c67270b-d2f4-4543-919b-519cb5470612

August 6, 2021

Citrix Advanced Session policy equivalents of default Classic expressions

A customer of mine recently asked for some help understanding why Citrix Gateway was not allowing external logons anymore, possibly combined with a recent upgrade to Citrix ADC VPX 13.0 Build 82.42.

He pointed out that there was an entry within the ns.log file which complained about a problem with ‘Ica mode status’, shown below:

Aug 6 11:39:59 192.168.200.191 08/06/2021:09:39:59 GMT citrix-netscaler 0-PPE-0 : default SSLVPN Message 586 0 : "Ica mode status is not okay"

Investigating further we could identify both successful LDAP authentication (basic LDAP auth attached directly to the Citrix Gateway vserver) and STA lookup, but the ADC wasn’t actually requesting any pages from the Storefront server URL defined in the session profile.

Searching for the error itself yielded one result which referred in particular to ‘Ica mode status’ :

https://support.citrix.com/article/CTX291268

Point #2 in the solution referred to switching the Classic expression in the session policy to an Advanced policy, however you cannot modify an existing policy without it switching back to the original setting. In order to bypass this limitation, create new session policies which use the Advanced expression equivalents to those created by the Citrix XenApp and XenDesktop ADC wizard available in the appliance.

See below screenshot for the before (first 2) and after (latter 2) Classic/Advanced equivalents.

Before (classic)

add vpn sessionPolicy PL_OS_192.168.200.190 "REQ.HTTP.HEADER User-Agent CONTAINS CitrixReceiver" AC_OS_192.168.200.190 add vpn sessionPolicy PL_WB_192.168.200.190 "REQ.HTTP.HEADER User-Agent NOTCONTAINS CitrixReceiver && REQ.HTTP.HEADER Referer EXISTS" AC_WB_192.168.200.190

After (advanced)

add vpn sessionPolicy PL_OS_192.168.200.190_Advanced "HTTP.REQ.HEADER(\"User-Agent\").CONTAINS(\"CitrixReceiver\")" AC_OS_192.168.200.190 add vpn sessionPolicy PL_WB_192.168.200.190_Advanced "HTTP.REQ.HEADER(\"User-Agent\").CONTAINS(\"CitrixReceiver\").NOT" AC_WB_192.168.200.190

Once the Advanced expression policies are bound to the vserver and the original Classic expressions have been removed – the initial problem is resolved and StoreFront loads successfully.

Whilst Citrix are advising that Citrix classic expression policies will be deprecated in ADC 13.1 it appears that some issues relating to session policies have crept in at/before 13.0 Build 82.42 which need to be carefully managed.

NB. It is possible to use a Citrix Advanced session policy with the Citrix ADC Gateway VPX license in this way. This isn’t the same as enabling nFactor Advanced Authentication policies as detailed by Carl Stalhood here: https://www.carlstalhood.com/nfactor-authentication-for-netscaler-gateway-12/

April 12, 2021

Lab problems with Intel NUC 11th Generation hardware with VMware ESXi 7.0.1

This is a placeholder posting for ongoing updates as and when new updates/resolutions are found. It isn’t intended to provide any additional detail to the problems outlined but simply to document the areas where bugs or ‘gotchas’ are located.

I have recently acquired several Intel NUC 11th Generation (NUC11TNHv50L) for my lab/testing environment which are being deployed into an existing vSAN/NSX-T environment as a workload domain. The release of these latest NUCs seemed to have generated a lot of interest with different community members discussing the ideal fit with NSX-T (due to the dual 2.5 Gbit/s Intel I225-LM NICs which come in the Pro version), however there are a couple of limitations that make this not a smooth ride currently.

Community networking driver and workarounds

Out of the box these NUCs are not supported with VMware ESXi and rely upon the Community Networking Driver Fling. Therefore before purchasing these devices for your home lab be aware that this fling:

Requires a custom ESXi image to be created which includes the Community Networking Driver
Does not support jumbo frames (e.g. >1500 byte MTU) – which in my view prevents any serious use with the NSX-T Geneve protocol which is typically 1600 byte minimum
Causes the network interface to become disconnected (link layer communication fails) if configured MTU is greater than 1500, which only recovers after a reboot
Seems to cause a purple screen (PSOD) failure when the second NIC is connected (under undefined circumstances currently)

Currently I am overcoming the NSX-T frame size issue by using the Startech USB 3.1 1Gbit/s USB network adapters, but this requires an additional fling to be installed. As a compromise it’s not too bad, since there are two Thunderbolt/USB-C ports on these NUCs allow up to two additional 1Gbit/s interfaces to be attached. So I am configuring my ESXi hosts as:

1 x Onboard Intel I225-LM at 2.5 Gbit/s – dvSwitch 1 (Management, vSAN)

1 x StarTech USB 3.1 adapter at 1Gbit/s – dvSwitch 2 (NSX-T, vMotion)

Power off and shut down

In addition it seems that when ‘Shut down’ of an ESXi host is performed the system ignores the BIOS power setting (e.g. to remain off, or power on etc.) and will immediately restart the operating back to a running condition (almost as if a reboot instead of shut down were chosen). This is strange behaviour which needs further experimentation and makes shutting down your lab a lot more time consuming – however it can be worked around currently by:

Shut down the ESXi instances individually using host UI/vCenter
Watch the power light on the front panel (assuming no screen attached) – when the power light turns off for approximately 0.5s it is initiating the actual power off, prior to becoming turned back on again
At this point pull the power supply out of the back of the NUC and plug it back in a couple of seconds later – it will remain off instead of rebooting (even if the BIOS setting says on loss of power – power on)

It’s getting hot in here

Lack of fan speed and temperature within ESXi hardware sensors. This is not a new issue but despite the integrated 3D graphics which is now on-chip there still seems to be a lack of information exposed to the operating system (presumably by Intel). In my bookcase vSAN/NSX-T environment it’s becoming a ‘hot topic’ to say the least ;-). Both new and older NUCs are doing fine on the Balanced performance/fan speed setting, and do a good job of spinning up and down the fan whenever the CPU turbo feature engages (up to 4.1GHz on my units), but it would be good to be able to view this more empirically than just watching how many windows need to be opened!

Good resources to check out in all things NUC are William Lam and Florian Grehl.

November 27, 2020

Citrix XenApp/Desktop LTSR 7.15 Azure catalog creation issues

I came across this problem whilst trying to build a lab scenario with an older version of LTSR 7.15 and wasn’t able to find any similar issues documented elsewhere. Essentially Citrix Studio would not allow me to browse for .vhd files when creating a new catalog from an unmanaged disk located in an Azure storage account.

Here’s the troubleshooting process and solution at the end (spoiler – it’s TLS 1.1, 1.2!)

Trying to create a catalog following successful creation of a hosting connection:

You might find for instance when examining other storage accounts that you are even able to view the name of any named containers e.g. ‘logs’ located within the storage account object, but no obvious difference is possible.

You might try even using PowerShell to examine the hypervisor connection, and by following along will eventually reach a dead end in the communication with Azure:

Add-PSsnapin Ci*
cd XDHyp:\
cd HostingUnits
(dir).PSChildName

Determine the name of your hosting connection, and change directory into it

cd .\YourHostingUnitName\
(dir).PSChildName

Determine the name of your resource, and change directory into it

cd .\image.folder\
(dir).PSChildName

Determine the name of your Azure resource group, and change directory into it

cd .\YourResourceGroupName.resourcegroup\
(dir).PSChildName

Determine the name of your storage account, and change directory into it

cd .\YourStorageAccountName.storageaccount\
(dir).PSChildName

At this point if you attempt to use dir or get-childitem you will receive an error saying:

An exception occurred. The associated message was Error: Could not receive inventory contents from path

In summary you don’t receive very much information from Citrix Studio which might provide further assistance at troubleshooting the issue. Citrix Host Service will generate an Event ID 1007 message including the text:

Citrix.MachineCreationAPI.MachineCreationException: Error: Could not retrieve inventory contents from path /UK South.region/image.folder/YourResourceGroup.resourcegroup/YourStorageAccount.storageaccount ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (400) Bad Request. ---> System.Net.WebException: The remote server returned an error: (400) Bad Request.
   at System.Net.HttpWebRequest.GetResponse()

The solution took quite some comparison between different working environments until I happened on the cause and eventual solution. That is, that the storage accounts affected were configured by default to use TLS 1.2 as a minimum rather than TLS 1.0. Clearly this isn’t ideal but even relatively recent LTSR 7.15 CU5 (and presumably earlier) does not seem to support TLS 1.2 for this type of API communication with Azure.

Simply locate the storage account and modify the following switch under the Configuration page:

Finally (after waiting 30 seconds or so for the storage account change to take affect you’ll be able to open the storage account and view the unmanaged disk VHD blob.

Correctly working master image wizard selection

Switching to TLS 1.1 support does not improve the situation, it will begin failing again – even though the browser in Windows Server 2016 (with recent updates) supports TLS 1.1 and 1.2. So it appears that the code somewhere is out of date in LTSR 7.15 (either Citrix Studio or PowerShell perhaps).

I’ll update this post if I manage to resolve it using another method, but in my experience after testing this problem goes away with LTSR 1912.

July 27, 2020

Intel NUC 7th Generation with Thunderbolt Ethernet on ESXi 7

My 3 host ESXi 6.5 home lab was built a couple of years ago now in order to develop some vSAN test scenarios that I was assisting a client with. Now that lab is in the process of being repurposed to become an NSX-T / vSAN lab running ESXi 7.0.

My three NUCs are based on the 7th Generation Intel NUC7i5BNH with 32GB RAM, onboard M2 NVMe SSD cache disk (256GB) and a 512GB SSD capacity disk.

– By the way, in case you’re wondering – it flies, even over 1Gbit ethernet with all-flash vSAN and deduplication/compression turned on. However – don’t use something like this for production as you’ll need to be prepared to lose ~14GB RAM per host for the dedupe in-memory object map and all-flash requires 10Gbit ethernet!

However, back to the issue in hand. One of the first issues I ran in to when assessing the requirements for NSX-T in a fully collapsed cluster (running vSAN, vCenter and NSX Manager) was a need to have two physical network interfaces (pNICs) but my Intel NUC 7th Gen hosts only have a single on-board gigabit ethernet adapter. This isn’t front page news of course as William Lam has been documenting the use of USB based ethernet in lab scenarios for quite some time, originally resulting in a VMware Fling he coauthored with another device driver engineer (Songtao Zheng) at VMware.

Now I began to question with recent releases such as VMware ESXi 7.0 whether any of the drivers or settings mentioned would even be required in order to get a second ethernet adapter working. This post is really just a signpost for people who might be doing similar things in their own labs.

I decided to jump in with both feet by purchasing three Startech USB-C 1-Gbit Ethernet adapters from Amazon UK. These devices use the Realtek RTL8153 chipset. My NUC devices have Thunderbolt 3 interfaces (with the lightning bolt marking) but I was pretty sure that the USB-C connector would work as the interface supports USB 3.1 Gen2 devices.

UPDATE YOUR BIOS – mine hadn’t been touched since 2017 so I went online first and downloaded the latest update for my NUCs, then flashed using a USB key with the .bio file and F7 key during boot.

UPDATE YOUR ESXi release – reading one of William Lam’s posts I found that there was a recent patch release of ESXi 7.0.0b which includes an updated USB driver:

VMware-vmkusb_0.1-1vmw.700.1.25.16324942

As far as I can tell this includes the ability to detect Thunderbolt connected devices amongst other improvements, but this awareness certainly negates the need to disable any existing VMware USB driver (which older posts I’d read had discussed prior to installing the USB ethernet fling).

INSTALL the VMware ESXi 7.0.0 release of the USB Fling:

esxcli software vib install -d '/vmfs/volumes/QNAP_VMFS_DS01/tmp/ESXi700-VMKUSB-NIC-FLING-34491022-component-15873236.zip'

USB Native Driver Fling for VMware ESXi | 0.1-4vmw.700.1.0.34491022

3. Shut down the ESXi host, you’ll need to go into the BIOS at next boot

ENABLE THUNDERBOLT BOOT in BIOS -until you do this you won’t be able to see any USB 3.x network devices. William Lam again has the lead again, with this linked post concerning Thunderbolt 10Gbit adapters on Intel Skull Canyon devices. Enter the BIOS and enable THUNDERBOLT BOOT.

Before enabling this feature you’ll find that 'lsusb -tv' will only show a single USB XHCI root HUB:

4. Save your BIOS settings, connect the Startech USB device and boot ESXi. Once reloaded compare the ‘lsusb -tv’ result with the previous version.

NB – If you find that your USB adapters are only connected at 100Mbit/s then it’s likely that the default ESXi 7.0.0 drivers have been loaded instead of the ones provided in the Fling. You’ll also see that the adapter name is detected as ‘cdce’ instead of ‘uether’. In this case make sure that the drivers are installed correctly and try a reboot with the adapter connected.

[OPTIONAL] if you have any 10 Gbit Thunderbolt adapters you could also use the following steps to add the Marvell drivers. I haven’t actually acquired any of these yet, but the instructions should be good as I’ve tested the installation process itself.

INSTALL the VMware release of the Marvell Atlantic USB driver:

Download the .zip file and upload it to a datastore that your hosts can access.. e.g. /vmfs/volumes/58134191-c9bf8fe8-d464-d067e5e666da/tmp/MRVL-Atlantic-Driver-Bundle_1.0.2.0-1OEM.670.0.0.8169922-offline-bundle-16081713.zip
Enter maintenance mode and vacate the ESXi host
Install the offline bundle VIB using:

esxcli software component apply -d /vmfs/volumes/QNAP_VMFS_DS01/tmp/MRVL-Atlantic-Driver-Bundle_1.0.2.0-1OEM.670.0.0.8169922-offline-bundle-16081713.zip

Native atlantic network driver for VMware ESXi | 1.0.2.0-1OEM.670.0.0.8169922

January 31, 2020March 24, 2023

Deploying Citrix Ingress Controller with Kubernetes

Citrix Ingress Controller is a niche but seriously interesting innovation from Citrix – developed in order to bring an enhanced application delivery capability to the Kubernetes container orchestration platform. This article is intended to communicate some basics of Kubernetes and ingress using Citrix ADC, but more-so to highlight some specific gaps in the documentation which are no longer appropriate for Kubernetes 1.16 and above due to API changes .

Many Citrix application and networking users will already be familiar with the hardware based or virtual Citrix NetScaler or ADC platforms, bringing L4 through to L7 load balancing, URL responder and rewrite features (amongst others) to conventional or virtualised networking environments. What you now have with Citrix Ingress Controller with ADC MPX/VPX is the ability to integrate Kubernetes with your existing ADCs, or introduce Citrix ADC CPX containerised NetScaler(s) such that you are able to deploy transient containerised NetScaler ADC instances within your Kubernetes platform enabling per-application networking services.

What is great about this solution is the way that it creates an automated API interface between Kubernetes and Citrix’s Nitro REST API of NetScaler. When a new containerised app is presented to the outside via a specially annotated ingress CIC will instantly create load balancing and content switching vservers along with rewrite rules for you, and even update/remove them when your container is modified or removed. This takes all of the manual work out of updating your ADC configuration on a per-app basis.

There are two basic ways in which to incorporate Citrix ADC into Kubernetes, namely ‘north-south’ and ‘east-west’ options. Familiar ingress solutions such as NGINX are often used within Kubernetes to attach the container networking stack to the outside world, since pod networking is normally completely abstracted from the user network in order to facilitate clean application separation. In a ‘north-south’ implementation you can think of the ingress controller (e.g. NGINX or Citrix ADC) as the front door to your application, with the remaining container based application networking presented through service endpoints within the backend network.

In an ‘east-west’ topology you can implement Citrix ADC CPX as a side-car to your container application in order to provide advanced ADC features within the Kubernetes network to enhance inter-container communication. This is a more advanced topology, but nonetheless directly intended for deployment within the Kubernetes infrastructure as a container. Citrix have a nice series of diagrams which highlight the tier 1 and tier 2 scenarios here.

Prerequisites

I’m going to be talking about bare-metal scenarios here rather than cloud based environments such Azure AKS, however to user these examples you will need to have created a Kubernetes 1.16 cluster first and be able to interact with it using kubectl. I have been using Rancher in order to build my Kubernetes clusters on vSphere, which in itself is a whole other subject which I hope to return to in a different post.. but you could always use something like MiniKube running within a desktop hypervisor (let me know how you get on!).

In order to use the implementation examples below you will need to have deployed a Citrix NetScaler MPX or VPX v12.1 / 13 in your network which is able to communicate with the Kubernetes API and cluster nodes. My lab uses a flat network range of 192.168.0.0/24 for instance, in which case the Kubernetes API is available on the same network as my NetScaler. However the backend pod networks are in the range 10.42.x.0/24 where each node hosts a separate range. Citrix Ingress Controller will take care of adding the network routes to these backend networks so they don’t have to be reachable from your desktop.

For the purposes of a lab type exercise it doesn’t matter if your Citrix ADC is used for other features, e.g. LB, Citrix Gateway because Citrix Ingress Controller will complement your infrastructure without replacing any of the existing configuration. It’s probably not a great idea to launch straight into this using your Production ADC instance though, best stick to the lab environment!

Create a system user on Citrix ADC

Your Citrix Ingress Controller will talk to NetScaler Nitro API directly using a user account which you define within Kubernetes. Perhaps you will use an existing user, or create a new one. For instance the following command will create a new user called cic on the NetScaler and create a new command policy:

add system user cic my-password

add cmdpolicy cic-policy ALLOW “^(?!shell)(?!sftp)(?!scp)(?!batch)(?!source)(?!.*superuser)(?!.*nsroot)(?!install)(?!show\s+system\s+(user|cmdPolicy|file))(?!(set|add|rm|create|export|kill)\s+system)(?!(unbind|bind)\s+system\s+(user|group))(?!diff\s+ns\s+config)(?!(set|unset|add|rm|bind|unbind|switch)\s+ns\s+partition).*|(^install\s*(wi|wf))|(^(add|show)\s+system\s+file)”

NB I’ve seen a problem with the above where the command might error out with an error concerning unexpected quotes character, it doesn’t seem to interfere with the creation of the command policy though.

In case you have any difficulties whilst attempting to recreate the steps in this post you can always try first using the ‘superuser’ command policy and then refine it until it matches the command permissions that you’re comfortable with.

In addition to this you may need to add additional rewrite module permissions if you’re going to use the rewrite CRDs, you can just tack these on to the end of the existing definition before the final quote mark:

(^(?!rm)\S+\s+rewrite\s+\S+)|(^(?!rm)\S+\s+rewrite\s+\S+\s+.*)

Finally, bind the newly created command policy to your new cic user.

bind system user cic cic-policy 0

Deploy Citrix Ingress Controller using YAML

This section is slightly different to that which is outlined in the actual Citrix Ingress Controller instructions. Please take care to understand the differences, they are mainly due to a desire to create better separation between components and configuration settings.

Create a new namespace to hold the secret and other CIC components. The commands below show the namespace entry in bold in case you choose to omit this and just place the components in the default namespace. It’s up to you, but for tidiness I created a namespace.

kubectl create namespace ingress-citrix

Create a new Kubernetes secret to store your Nitro API username and password. Using kubectl connect to your cluster and create a new secret to store the data.

kubectl create secret generic nslogin --from-literal=username=cic --from-literal=password=mypassword -n ingress-citrix

In my testing I ran into what I think is a Citrix documentation error for the above command where they show using single quotes around the name cic and mypassword values. Kubernetes converts these values into base64 encoding before they are stored, and might also include the quotes in the final value if you’re not careful. In fact that messed up my configuration for a while until I converted the secret back into its original content, using:

kubectl get secret nslogin -n ingress-citrix -o=yaml

Take the values for password: and username: from the secret and pass them through a base64 decoder just to check that this hasn’t happened (there are also various web sites which can do this for you) by using the following Linux/MacOS command for either the username or password taken from the YAML form above.

echo bXlwYXNzd29yZA== | base64 --decode

Using this source file as a reference, modify/add the following entries (shown in bold) within the file in order to add the name of your namespace:

kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
   name: cic-k8s-role
 roleRef:
   apiGroup: rbac.authorization.k8s.io
   kind: ClusterRole
   name: cic-k8s-role
 subjects:
 kind: ServiceAccount
 name: cic-k8s-role
 namespace: ingress-citrix

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cic-k8s-role
  namespace: ingress-citrix

apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: cic-k8s-ingress-controller
   namespace: ingress-citrix
 (entry continues)

Be aware – the default CIC configuration creates a cluster role which will see events across the whole system, however this can be deliberately (or mistakenly) restricted to only watching API events in specific namespaces if your role contains:

kind: Role

instead of:

kind: ClusterRole

or if you add a NAMESPACE environment variable when defining the env: section of your CIC deployment manifest.

Finally, add/edit the following entries to define how to contact your Citrix ADC i.e. the NetScaler management IP (NS_IP) and virtual server IP (NS_VIP) to be used for LB/content switching your ingress (the front door)

env:
         # Set NetScaler NSIP/SNIP, SNIP in case of HA (mgmt has to be enabled) 
         - name: "NS_IP"
           value: "192.168.0.99"
         - name: "NS_VIP"
           value: "192.168.0.110"
         - name: "LOGLEVEL"
           value: "INFO"
args:
           - --ingress-classes
             citrix
           - --feature-node-watch
             true

NB – the --feature-node-watch option allows NetScaler to create routes automatically in order to reach the backend pod network addresses

NB – the LOGLEVEL default value is DEBUG, you might want to leave this as an unspecified value until you’re happy with the functionality, and then change it to INFO as above.

The version of Citrix Ingress Controller is specified within this YAML file, hence if you wish to upgrade your CIC version it can be modified and redeployed (as long as no other changes to your deployment are required)

image: "quay.io/citrix/citrix-k8s-ingress-controller:1.6.1"

After updating the above entries as citrix-k8s-ingress-controller.yaml save the modified YAML file and then deploy it using kubectl

kubectl create -f citrix-k8s-ingress-controller.yaml

Check that your Citrix Ingress Controller container has deployed correctly:

kubectl get pods -n ingress-citrix

NB – in the following examples you can ignore the rancherpart of the above command, the kubectl statements are being proxied through Rancher in order to reach the correct cluster

Validate the installation of Citrix Ingress Controller

Once CIC is online you can access the logs generated by the container by switching the name of your container into the following command:

kubectl logs cic-k8s-ingress-controller-9bdf7f885-hbbjb -n ingress-citrix

You’ll want to see the following highlighted section within the log file which shows that CIC was able to connect to the Nitro interface and create a test vserver (which coincidentally validates that it was able to locate and use the secret which was created to store the credentials!):

2020-01-10 10:45:50,144  - INFO - [nitrointerface.py:_test_user_edit_permission:3729] (MainThread) Processing test user permission to edit configuration
 2020-01-10 10:45:50,144  - INFO - [nitrointerface.py:_test_user_edit_permission:3731] (MainThread) In this process, CIC will try to create a dummy LB VS with name k8s-dummy_csvs_to_test_edit.deleteme
 2020-01-10 10:45:50,174  - INFO - [nitrointerface.py:_test_user_edit_permission:3756] (MainThread) Successfully created test LB k8s-dummy_csvs_to_test_edit.deleteme  in NetScaler
 2020-01-10 10:45:50,188  - INFO - [nitrointerface.py:_test_user_edit_permission:3761] (MainThread) Finished processing test user permission to edit configuration
 2020-01-10 10:45:50,251  - INFO - [nitrointerface.py:_perform_post_configure_operation:575] (MainThread) NetScaler UPTime is recorded as 7225

At this point the Citrix Ingress Controller container will sit there listening out for any Kubernetes API calls which it might be interested to assist with, e.g. creation of an ingress or load balancer object. By default Citrix should pick up any ingress creation event, but in many environments you’ll already have NGINX deployed for various reasons (e.g. it’s a functional part of accessing a dashboard for instance).

The way that you can avoid getting things tangled up is by deliberately using ingress class annotations in your specifications. In this way other ingress controllers will ignore your requests to build an ingress but CIC will jump straight in to help. The annotation which is used for this is called:

kubernetes.io/ingress.class:"Citrix"

Deploying an application

Let’s start by deploying a simple application into the default namespace. The reason we’re going to do this is two-fold, firstly it is simple and most likely to work, and secondly it verifies that CIC is able to see services and ingresses outside of its own namespace. I like to use a hello-world image from Tutum because it tells us a little bit about where it’s running when you access the page.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
  namespace: default
spec:
  selector:
    matchLabels:
      run: hello-world
  replicas: 1
  template:
    metadata:
      labels:
        run: hello-world
    spec:
      containers:
      - name: hello-world
        image: tutum/hello-world
        ports:
        - containerPort: 80

Create a new YAML file and save it as deploy-hello-world.yaml, then use kubectl to deploy it to Kubernetes. You’ll see that I’ve prepended rancher in all of my examples but you can omit that if you’re not using Rancher

kubectl apply -f deploy-hello-world.yaml

Creating a service

Now that the application is running in a container you’ll need to create a service using the following YAML. Save it as expose-hello-world.yaml. You could use a type spec of ClusterIP or NodePort – it doesn’t matter when CIC is configured with --feature-node-watch=true although the default is actually ClusterIP.

apiVersion: v1
kind: Service
metadata:
  name: hello-world
  namespace: default
  labels:
    run: hello-world
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: hello-world

kubectl apply -f expose-hello-world.yaml

Defining your ingress

An ingress is a rule which directs incoming traffic to a host address or a given path through to the backend application. It’s quite important to know that an ingress itself is just a rule, there may be load balancers or ingress controllers which receive incoming traffic in your environment but the ingress assists in directing that flow to the backend application.

Again the use of the ingress class kubernetes.io/ingress.class:"Citrix" is an essential component of the below ingress example. It ensures that CIC ‘notices’ the new ingress definition and tells it that it should instruct the Citrix ADC to build load balancing or content switching vservers to make sure your traffic is received when the outside world attempts to talk to your application.

In this ingress example we are going to simulate a scenario where you have a path based entry point into your application, which itself then redirects to the container’s root page. Create a new YAML file with the following content and call it ingress-hello-world.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello-world-ingress
  namespace: default
  annotations:
   kubernetes.io/ingress.class: "citrix"
spec:
  rules:
  - host:  www.helloworld.com
    http:
      paths:
      - path: /hello-world
        backend:
          serviceName: hello-world
          servicePort: 80

NB The author, his company and this post has nothing whatsoever to do with any websites or businesses operating on any real domains such as ‘helloworld.com’. It is chosen simply as a convenient example.

kubectl apply -f ingress-hello-world.yaml

At this point, if everything has worked correctly you should be able to make a host file or DNS entry for www.helloworld.com (of course you could use anything else) which points to the same IP address you used to define the NS_VIP address of your load balancer in the Citrix Ingress Controller configuration (citrix-k8s-ingress-controller.yaml). In the examples above the mapping would be:

www.helloworld.com <---> 192.168.0.110

You’ll see the virtual IP now created for you within the Citrix ADC in two places, firstly a new content switch:

A new content switch with the IP address specified in NS_VIP entry, 192.168.0.110

This new content switch has one or more expressions which match traffic to actions (created through ingress definitions):

Therefore any incoming HTTP request matching the www.helloworld.com host where the request URL includes pages starting with the /hello-world location will be sent to the second newly created object – the vserver defined in the action below:

A new load balancing vserver has been created with address 0.0.0.0

This LB vserver includes a service group whose members are actually represented by the pods where the application is currently running. If you changed the deployment specification to include more replicas then you would see more nodes participating in the service group. Citrix ADC will monitor the health of the exposed node ports in order to ensure that traffic is only directed onto running pods.

And now when we visit the page, via the hostname and URL path defined on the ingress we should now see:

Adding a rewrite policy

Let’s say that you have a single ingress controller which is exposing endpoints on a path basis, e.g. /myapproot but the application available on that service is expecting /myapproot/ instead. Some applications I’ve seen won’t respond properly unless you rewrite your request URL to have the trailing forward slash. Fortunately Citrix Ingress Controller and ADC are able to take care of this through a rewrite rule.

Before you can use this you’ll need to deploy the Custom Resource Definitions for rewrite using the following instructions.

Download the CRD for rewrite and responder YAML from this Citrix URL. Save it as rewrite-responder-policies-deployment.yaml and then deploy it using

kubectl create -f rewrite-responder-policies-deployment.yaml

NB One very interesting ‘gotcha’ here is that if you associate a CRD with a namespace then it will only create rewrite policies and actions for services in that namespace, so I would recommend simply using the simplest form of the command shown above without placing the CRD into the ingress-citrix namespace used in this blog’s example.

Now that is deployed you should adapt the following YAML in order to define how the app rewrite should function and then save it as cic-rewrite-example.yaml:

apiVersion: citrix.com/v1
kind: rewritepolicy
metadata:
 name: httpapprootrequestmodify
 namespace: default
spec:
 rewrite-policies:
   - servicenames:
       - hello-world
     rewrite-policy:
       operation: replace
       target: http.req.url
       modify-expression: '"/hello-world/"'
       comment: 'HTTP app root request modify'
       direction: REQUEST
       rewrite-criteria: http.req.url.eq("/hello-world")

kubectl create -f cic-rewrite-example.yaml

Using a Load Balancer service instead of Ingress

In the example above I outlined how to create a hello-world deployment and service in order to correctly present an application via an ADC using ingress. However ingress will only work for HTTP/HTTPS type traffic and cannot be used for other services. One additional method you can use for other traffic is to define a service of type LoadBalancer rather than any other option, e.g. ClusterIP, NodePort.

Citrix Ingress Controller has a specific annotation for this scenario which can be added to the service definition to add the IP address which ADC should use. This is the equivalent of a cloud-provider based load balancer in your on-prem Kubernetes environment where you might not use ingress at all.

apiVersion: v1
kind: Service
metadata:
  name: hello-world
  namespace: default
  annotations:  
    service.citrix.com/frontend-ip: '192.168.0.115'
  labels:
    run: hello-world
spec:
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: hello-world

Save the YAML example above into cic-loadbalancer-example.yaml and apply it.

kubectl create -f cic-loadbalancer-example.yaml

If you now examine the service which is created it should be apparent that the type has now changed from NodePort or ClusterIP to LoadBalancer. The external IP address is now shown, as defined within the service.citrix.com/frontend-ip: '192.168.0.115' annotation.

Citrix ADC will now direct traffic arriving at that IP address through to any pods which match the label selector. This method allows you to quite simply plug the outside world in to your Kubernetes application infrastructure at L4 without using ingress or path matching rules.

Summary

Citrix Ingress Controller is well worth investigating if you are beginning to implement on-prem Kubernetes based applications and already have an investment in Citrix ADC. If you need additional features such as DDoS protection, advanced rewrite, TCP optimisations etc. then CIC offers quite a lot of benefits over a simple NGINX proxy. The next article planned in this series will examine the sidecar Citrix ADC CPX deployment and how this can enhance visibility of inter-container communication.

Addendum – Rancher specific ingress issue with Citrix Ingress Controller

This section has been included here in order to highlight a specific issue which is currently occurring in CIC 1.6.1 and Rancher 2.3.4 releases. It seems to be a purely cosmetic issue however it’s been the subject of a recent call I had with some of the Citrix people responsible for CIC who confirmed the behaviour with me. Basically when an ingress is created it is successfully created by CIC but its status does not move from ‘Initializing’ to ‘Active’ in Rancher. This is because Rancher is awaiting the External-IP value to be updated in the Status, but this does not occur because CIC doesn’t mandate that this be actively reported. I’ll update/remove this section from the post if and when this is resolved.

UPDATE – the above issue is now resolved in releases 1.7.6 and above by appending the --update-ingress-status entry into the CIC deployment YAML under the following section:

args:
  - --ingress-classes citrix
  - --feature-node-watch true
  - --update-ingress-status yes

November 11, 2019

Upgrading vyOS VMware appliance to latest release

In order to troubleshoot a vyOS issue which we’ve been experiencing lately I attempted to upgrade to the latest vyOS release on a .OVA deployed appliance that was running the older 1.2.1 release.

The vyOS upgrade documentation shows the command required to install a new version is simply:

add system image https://downloads.vyos.io/rolling/current/amd64/vyos-1.2.0-rolling%2B201810030440-amd64.iso

However later on in the same article the command response shows the error:

We do not have enough disk space to install this image!
 We need 344880 KB, but we only have 17480 KB.
 Exiting…

So what is using the space on the appliance and how can we resolve this issue?

We do not have enough disk space to install this image!

Basically, using ‘sudo du -hs /var’ shows us that 968MB of data is consumed within the /var folder and most of this relates to the wtmp and wtmp.1 files. What are those files? They are simply large binary rolling log files which are written to in order to record any login attempts, with wtmp.1 being the rolled up previous versions which are being retained.

We don’t need anything close to that level of logging in our lab environments, so the following commands modify the retention period and log interval to 1 hour maximum.

sudo nano /etc/logrotate.conf

Edit the lines to change from ‘weekly’ and ‘4’ represent a month’s worth of logs to:

#rotate log files weekly
 hourly
#keep 4 weeks worth of backlogs
 rotate 1

Which should retain a rolling log of any login attempts during the last hour. Once this is done you can delete the previous wtmp.1 rollup, apply the vyOS update and then reboot (once only) in order to apply the latest code version now that you have sufficient space:

sudo rm /var/log/wtmp.1
add system image http://172.20.12.142:8080/vyos-1.2-rolling-201911110217-amd64.iso
sudo reboot

NB – in my example I’m hosting the .ISO file which I downloaded on a simple HTTP web server on the internal network

After you’ve finished the upgrade you could always revert the logging configuration back to the defaults, but the main sticking point here is the limited available space once a rollup of logs has become quite large and I didn’t want to have to fix this again in the future.

September 17, 2019

Upgrading Citrix XenApp 7.x VDA version using PowerShell

With the advent of XenApp 7 and more recently experiencing the higher frequency of VDA cumulative updates I would generally recommend implementing Citrix Machine Creation Services or other imaging mechanism (such as Provisioning Server) when rolling out new versions of the Virtual Desktop Agent to a large number of catalogs.

However, what happens when you only require one XA server per catalog, or when each one of those servers is handled manually when new application code is deployed? This is more common than you might imagine, especially in Citrix deployments which have per-customer or per-app specific catalogs. The work involved in maintaining a master image can be significant and the serviceability of such relies upon someone knowing how to treat image updates in a way that won’t introduce problems that could arise weeks or months later.

One customer of mine has at least 80 catalogs running one or more XenApp VMs and so it simply doesn’t make sense to maintain a single master image for each, especially when application code updates are delivered frequently. So I set about creating a simple PowerShell script which works in a VMware environment to attach the Citrix upgrade ISO and then run the setup installer within the context of a remote PowerShell session.

Using this method you can easily carry out a bulk upgrade of tens (possibly hundreds) of statically assigned VDAs individually by attaching the ISO and installing the update automatically. The advantage of this time saving approach is that it can even be run in a loop so that the upgrade is only attempted when a server is idle and not running any sessions.

NB – as always, please validate the behaviour of the script in a non-production environment and adjust where necessary to meet your own needs.

Here’s a walkthrough of the script, along with the complete example version included at the end.

The script will load the required plugins from both Citrix and VMware PowerShell modules/plugins (I generally run things like this on the Citrix Delivery Controller and install PowerCLI alongside for convenience)
Request credentials and connect to vCenter via a popup
Request credentials for use with WinRM connections to remote Windows servers via a popup
Create a collection of objects (XA servers) which are powered on, do not have any active sessions and don’t already have the target VDA version installed (see $targetvda variable)
For each VM, sequentially:
1. Attach the specified .iso image file to the resulting VMs
2. Determine the drive letter where the XA ISO file has been mounted
3. Create a command line for the setup installer, and save the command into c:\upgrade_vda.cmd on the XA server
4. Connect via PowerShell remoting session to the remote XA server
5. Adjust the EUEM registry node permissions (as per https://support.citrix.com/article/CTX215992)
6. Execute the c:\upgrade_vda.cmd upgrade script on remote machine via PS session
7. Disconnect the PowerShell remote session
8. Reboot the VM via vCenter in order to restart the XA services

Review the script and edit the following variables to reflect your use-case:

$vcentersrv = "yourvcentersrv.domain.com"
$targetvda = '7.15.4000.653'
$isopath = "[DATASTORE] ParentFolderName\XenApp_and_XenDesktop_7_15_4000.iso"

Edit the selection criteria on the VMs which will be upgraded:

$targetvms = Get-BrokerMachine -DesktopKind Shared | Where-Object {($_.AgentVersion -ne $targetvda) -and ($_.PowerState -eq 'On') -and ($_.HostedMachineName -like 'SRV*')}

All servers in my example environment begin with virtual machine names SRV* so this line can be adapted according to the number of VMs which you would like to upgrade, or simply replace with the actual named servers if you want to be more selective:

($_.HostedMachineName -in 'SRV1','SRV2','SRV3')

Finally, consider modifying the following variable from $true to $false in order to actually begin the process of upgrading the selected VMs. I suggest running it in the default $true mode initially in order to validate the initial selection criteria.

$skiprun = $true

Additional work:

I would like additionally to incorporate the disconnection of previous VDA .ISO files from the VM before attempting to upgrade. I have noticed that the attached volume label search e.g. Get-Volume -FileSystemLabel ‘XA and XD*’ that determines the drive letter selection is too wide, and will erroneously detect both XA_7_15_4000.iso and XA_7_15_2000.iso versions without differentiating between them.

I would also like to do further parsing of the installation success result codes in order to decide whether to stop, or simply carry on – however I have used the script on tens of servers without hitting too many roadblocks.

This script could also be adapted to upgrade XenDesktop VDA versions where statically assigned VMs are provided to users.

Final note:

This script does not allow the Citrix installer telemetry to run during the installation because it requires internet access and this generates errors in PowerShell for XenApp servers which can’t talk outbound. You can choose to remove this command line parameter according to your circumstances:

/disableexperiencemetrics

Citrix also optionally collects and uploads anonymised product usage statistics, but again this requires internet access. In order to disable Citrix Telemetry the following setting is used:

/EXCLUDE "Citrix Telemetry Service"

Additionally the Personal vDisk feature is now deprecated, so the script excludes this item in order for it to be removed if it is currently present (so be aware if you’re using PvD):

/EXCLUDE "Personal vDisk"

PowerShell code example:

# Upgrade VDA on remote Citrix servers

if ((Get-PSSnapin -Name "Citrix.Broker.Admin.V2" -ErrorAction SilentlyContinue) -eq $Null){Add-PSSnapin Citrix.Broker.Admin.V2}
if ((Get-PSSnapin -Name "VMware.VimAutomation.Core" -ErrorAction SilentlyContinue) -eq $Null){Add-PSSnapin VMware.VimAutomation.Core}

$vcentersrv = "yourvcentersrv.domain.com"

if ($vmwarecreds -eq $null) {$vmwarecreds = Connect-VIServer -Server $vcentersrv}            # Authenticate with vCenter, you should enter using format DOMAIN\username, then password
if ($creds -eq $null) {$creds = Get-Credential -Message 'Enter Windows network credentials'} # Get Windows network credentials

clear

$targetvda = '7.15.4000.653' #Add the target VDA version number - anything which isn't correct will be upgraded
$isopath = "[DATASTORE] ParentFolderName\XenApp_and_XenDesktop_7_15_4000.iso" #Path to ISO image in VMware
$skiprun = $true #Set this variable to false in order to begin processing all listed VMs

$targetvms = Get-BrokerMachine -DesktopKind Shared | Where-Object {($_.AgentVersion -ne $targetvda) -and ($_.PowerState -eq 'On') -and ($_.HostedMachineName -like 'SRV*')}
Write-Host The following XA VMs will be targeted
Write-Host $targetvms.HostedMachineName
if ($skiprun -eq $true) {write-host Skip run is still enabled; exit}

foreach ($i in $targetvms){

if ($i.AgentVersion -ne $targetvda) {
    Write-Host Processing $i.HostedMachineName found VDA version $i.AgentVersion
    
    if ($i.sessioncount -ne $null) {Write-Host Processing $i.HostedMachineName found $i.sessioncount users are logged on}

    if ($i.sessioncount -eq 0) {#Only continue if there are no logged-on users

        Write-Host Processing $i.HostedMachineName verifying attachment of ISO image
        $cdstate = Get-VM $i.HostedMachineName | Get-CDDrive
        if (($cdstate.IsoPath -ne $isopath) -and ($cdstate -notcontains 'Connected')) { $cdstate | Set-CDDrive -ISOPath $isopath -Confirm:$false -Connected:$true;Write-Host ISO has been attached}

        $s = New-PSSession -ComputerName ($i.MachineName.split('\')[1]) -Credential $creds
            #Create the upgrade command script using correct drive letters
            Write-Host Processing $i.HostedMachineName -NoNewline
            invoke-command -Session $s {
                $drive = Get-Volume -FileSystemLabel 'XA and XD*'
                $workingdir = ($drive.driveletter + ":\x64\XenDesktop Setup\")
                $switches = " /COMPONENTS VDA /EXCLUDE `"Citrix Telemetry Service`",`"Personal vDisk`" /disableexperiencemetrics /QUIET"
                $cmdscript = "`"$workingdir" + "XenDesktopVDASetup.exe`"" + $switches
                Out-File -FilePath c:\upgrade_vda.cmd -InputObject $cmdscript -Force -Encoding ASCII
                Write-Host " wrote script using path" $workingdir
            }
            
            #Adjust the registry permissions remotely
            Write-Host Processing $i.HostedMachineName updating registry permissions
            Invoke-Command -Session $s {
                $acl = Get-Acl "HKLM:\SOFTWARE\Wow6432Node\Citrix\EUEM\LoggedEvents"
                $person = [System.Security.Principal.NTAccount]"Creator Owner"
                $access = [System.Security.AccessControl.RegistryRights]"FullControl"
                $inheritance = [System.Security.AccessControl.InheritanceFlags]"ContainerInherit,ObjectInherit"
                $propagation = [System.Security.AccessControl.PropagationFlags]"None"
                $type = [System.Security.AccessControl.AccessControlType]"Allow"}
            Invoke-Command -Session $s {$rule = New-Object System.Security.AccessControl.RegistryAccessRule($person,$access,$inheritance,$propagation,$type)}
            Invoke-Command -Session $s {$acl.AddAccessRule($rule)}
            Invoke-Command -Session $s {$acl |Set-Acl}
                
            #Execute the command script
            Write-Host Processing $i.HostedMachineName, executing VDA install script
            Invoke-Command -Session $s {& c:\upgrade_vda.cmd} # Runs the upgrade script on remote server
            Remove-PSSession $s #Disconnect the remote PS session
            Restart-VMGuest -VM $i.HostedMachineName -Confirm:$false #Restart the server following either a successful or unsuccessful upgrade
            }
        }
    }

June 4, 2019

Using MSDeploy to migrate between IIS6 and IIS7/8.5

A recent server replacement task I worked on recently involved migrating content between an IIS6 instance running on Windows Server 2003 R2 and IIS8.5 on Windows Server 2012 R2.

One of the sticking points that arose was that the source and destination switches included the potential options of -source: and -dest: with the webserver60 or webserver switches, according to whether IIS6 or above was being targeted respectively.

However when attempting a cross-version migration it is not possible to sync the content directly when using mismatched webserver and webserver60 switches. Instead, it’s more appropriate to target the IIS metabase index for the web site using –source:metakey=lm/w3svc/1

Here’s a step by step example of the choices which you would use:

Obtain and install the MSDeploy tool on both source and destination servers, including the remote service.

To check the dependencies on the original IIS6 server
msdeploy -verb:getDependencies -source:webServer60

To check the installed modules on the target IIS8.5 server
msdeploy -verb:getDependencies -source:webServer

To check the dependencies on the original IIS6 metabase
msdeploy -verb:getDependencies -source:metakey=lm/w3svc/1

To check the installed modules on the target IIS8.5 metabase
msdeploy -verb:getDependencies -source:metakey=lm/w3svc/1

Once you have determined which options were previously installed on the source IIS instance and manually aligned the settings on the target server you can proceed with the initial synchronisation of the web server content. One way of doing this between servers connected by a slow connection is using a ZIP archive (termed package mode), which should ideally contain credentials encrypted using a password:

To export the files from the source IIS6 server metabase to an encrypted ZIP file, including Application Pools
msdeploy.exe -verb:Sync -source:metakey=lm/w3svc/1 -dest:package=C:\Output\MS_WebDeploy_WebDeploy\servername_6.0_1.zip,encryptPassword=xxx -enableLink:AppPoolExtension

Copy the package file (.zip) to the target server in order to import it.

NB. The IncludeACLs=true option does not do anything when transferring data using the package option (ZIP archive), but doesn’t generate any error

I found that it was necessary to switch the .NET priority order of the migration tool in “C:\Program Files\IIS\Microsoft Web Deploy V3\msdeploy.exe.config” in order to use the earlier .NET version when importing data on the target server.

To import to IIS metabase on target server using the encrypted package
msdeploy -verb:sync -source:package=c:\Output\MS_WebDeploy_WebDeploy\servername_6.0_1.zip,encryptPassword=xxx -enableLink:AppPoolExtension -dest:metakey=lm/w3svc/1

Once you have completed the initial import of the source data on the target server it is often useful to resynchronise the data at a later point, e.g. after applying small changes at the source. This option can also be used directly without using the package based approach outlined previously if you are copying content between servers with a high-speed connection.

To import to IIS metabase on target server by pulling from the source machine, including ACLs

msdeploy -verb:sync -source:metakey=lm/w3svc/1,computername=servername,includeAcls=True -enableLink:AppPoolExtension -dest:metakey=lm/w3svc/1,includeAcls=True

These are fairly simple examples but they should be sufficient to migrate simple HTML or Classic ASP pages between the source and destination servers. You can find links to useful documentation below:

https://www.iis.net/downloads/microsoft/web-deploy

https://docs.microsoft.com/en-us/iis/publish/using-web-deploy/migrate-a-web-site-from-iis-60-to-iis-7-or-above