Should the /psc URL work on both HA Platform Services nodes?

I recently ran into a strange issue following the enablement of two PSC 6.5 nodes in an HA configuration, as part of a larger rolling upgrade from vCenter 5.5.

NB – all URLs shown are internal, in use within my lab environment only.

During the migration of the existing customers vCenter environment we had to rehearse the externalisation of PSC from an initial embedded SSO instance. As part of this process the first PSC node in a new site was migrated from an original Window vCenter 5.5 SSO to PSC 6.5, and subsequently a second new node was joined to the first site in order for replication to be established.

I used a Citrix NetScaler to load balance the configuration and noticed at some point after the successful HA repointing was done that I was unable to access the https://hosso01.sbcpureconsult.internal/psc URL.

The second node, https://hosso2.sbcpureconsult.internal/psc worked correctly and redirects to the load balanced address psc-ha-vip.sbcpureconsult.internal for authentication before displaying the PSC client UI.

Irrespective of whichever node is selected I was able to log in to vCenter, then choose Administration, System Configuration, select a node then Manage, Settings or CA without receiving any errors.

If I deliberately dropped the first node out of the load balancing config on the NetScaler I didn’t have any issues when accessing the /psc URL by either host name or load balancer name, but if I tried to connect to the first node by its own DNS name or IP I received an HTTP 400 error and the following entry in:

/storage/log/vmware/psc-client/psc-client.log

[2018-10-08 12:05:20.347] [ERROR] tomcat-http--3 com.vmware.vsphere.client.security.websso.MetadataGeneratorImpl - Error when creating idp metadata.
java.lang.RuntimeException: java.io.IOException: HTTPS hostname wrong:  should be <psc-ha-vip.sbcpureconsult.internal>

It appeared that the HTTP 400 error is because the psc-client Tomcat application doesn’t start up correctly on the first node anymore, along with an error in..

/storage/log/vmware/rhttpproxy/rhttpproxy.log

2018-10-08T13:27:10.691Z warning rhttpproxy[7FEA4B941700] [Originator@6876 sub=Default] SSL Handshake failed for stream <SSL(<io_obj p:0x00007fea2c098010, h:27, <TCP '192.168.0.117:443'>, <TCP '192.168.0.121:26417'>>)>: N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:140000DB:SSL routines:SSL routines:short read)

I repeated the same series of steps in my lab environment I had experienced on the customer site, and was able to confirm the same behaviour. Let me explain at this point, that all other vCenter functionality was correct and our issue only affected the /psc URL.

Could this be deemed ‘correct’ behaviour?

If I chose https://psc-ha-vip.sbcpureconsult.internal/psc (which is the load balancer address) I was initially only able to connect if the second node is online and happens to be selected.

I wanted to confirm before signing off on the work that it should be possible to access the /psc URL on each node deliberately?

After what seemed like a lot of internal dialogue between myself and my inner tech support dept. (sleepless nights!) I was left wondering what could be going wrong.. especially if this was the documented procedure from VMware?

Good news, I was able to roll back my lab and re-run the updateSSOConfig.py and UpdateLsEndpoint.py scripts – only to find that the /psc URL did indeed load successfully on both nodes with the NetScaler load balancing in place!

So at least I knew that the correct behaviour is that you should be able to open /psc on both appliances.

By examining my snapshots at different stages I was able to identify a difference between the original migration node and the clean appliance:

When you run the updateSSOconfig.py Python script to repoint the SSO URL to the load balanced address it explains that hostname.txt and server.xml were modified:

# python updateSSOConfig.py --lb-fqdn=psc-ha-vip.sbcpureconsult.internal
script version:1.1.0
executing vmafd-cli command
Modifying hostname.txt
modifying server.xml
Executing StopService --all
Executing StartService --all

I was able to locate hostname.txt files (containing the load balancer address) in:

  • /etc/vmware/service-state/vmidentity/hostname.txt
  • /etc/vmware-sso/keys/hostname.txt (missing on node 2, but contained the local name on node 1)
  • /etc/vmware-sso/hostname.txt

but this second hostname file was missing on the second node. Why is this? I guess that it is used transiently during the script execution in order to inject the correct value into the server.xml file.

The server XML file is located in the folder:

/usr/lib/vmware-sso/vmware-sts/conf/server.xml

my faulty node contained the following certificate entries under the connector definition:

..store="STS_INTERNAL_SSL_CERT"
certificateKeystoreFile="STS_INTERNAL_SSL_CERT"..

my working node contained:

..store="MACHINE_SSL_CERT"
certificateKeystoreFile="MACHINE_SSL_CERT"..

So I was able to simply copy the server.xml file from the working node (overwriting the original on the faulty node) and also remove the /etc/vmware-sso/keys/hostname.txt file to match the configuration.

Following a reboot my first SSO node then responded correctly by redirecting https://hosso01.sbcpureconsult.internal/psc to https://psc-ha-vip.sbcpureconsult.internal/websso to obtain its SAML token before ultimately displaying the PSC client UI.

As a follow up, by examining the STS_INTERNAL_SSL_CERT store I could see that the machine certificate being used was issued by the original Windows vCenter Server 5.5 SSO CA to the subject name:

ssoserver,dc=vsphere,dc=local

This store was not present on the other node, and so the correct load balancing certificate replacement must somehow be omitted by one of the upgrade scripts when this scenario occurs (5.5 SSO to 6.5 PSC).

I hope that this bug gets removed by VMware in due course, particularly as more customers are moving to the appliance based model of vCenter 6.x, but this workaround and method should be considered at least if you run into a similar problem.

NB This post is adapted from a longer discussion on VMware Communities page available under https://communities.vmware.com/thread/598140.

Updating password field names with multiple NetScaler Gateway virtual servers

Imagine a situation where you want to change your NetScaler Gateway’s logon page to include alternative prompts for the Username, Password 1 and Password 2 fields and need to update the language specific .XML files. This has been documented before, and isn’t too hard to figure out once you’ve found a couple of ‘How to’ guides on the Internet. However I have since come across a limitation in trying to apply the NetScaler’s new ‘Custom’ design template to several different NetScaler Gateway virtual servers at the same time, because essentially whilst you can define your own custom design it is automatically applied to all instances of the virtual server residing on the NetScaler – so if you define custom fields then you’ve defined them for all.

This may not be a problem for some people, but what if the secondary authentication mechanism is an RSA token for one site, and a VASCO token for another? How do you go about configuring alternative sets of custom logon fields? Most of the answers are already out there in one form or another, but I lacked one simple beginning to end description of the solution (I tried several alternate options including rewrite policies which didn’t quite work before I opted for this approach):

Background (NetScaler 10.5.x build)The Citrix NetScaler VPN default logon page has already been modified in order to ask for ‘AD password’ and ‘VASCO token’ values instead of Password 1: and Password 2:, as detailed in http://support.citrix.com/article/CTX126206

This was achieved by editing index.html and login.js files in /var/netscaler/gui/vpn of the NS as per the Citrix article above.

In addition, the resources path which holds the language based .XML files in /var/netscaler/gui/vpn/resources has been backed up into /var/customisations so that the /nsconfig/rc.netscaler file can copy them back into the correct location if they get overwritten or lost following reboot.

Contents of rc.netscaler file

cp /var/customisations/login.js.mod /netscaler/ns_gui/vpn/login.jscp /var/customisations/en.xml.mod /netscaler/ns_gui/vpn/resources/en.xmlcp /var/customisations/de.xml.mod /netscaler/ns_gui/vpn/resources/de.xmlcp /var/customisations/es.xml.mod /netscaler/ns_gui/vpn/resources/es.xmlcp /var/customisations/fr.xml.mod /netscaler/ns_gui/vpn/resources/fr.xml

However, because these values apply globally there is an issue if a second NetScaler virtual server does not use a VASCO token as a secondary authentication mechanism. This causes the normal ‘Password’ entry box to be displayed as ‘VASCO token’. The only suitable workaround for this is to create a parallel set of logon files for each additional NS gateway virtual server and use a responder policy on the NS to redirect incoming requests for the index.html page of the VPN to a different file.

In the following examples, I have created a second configuration for a ‘Training NetScaler’, abbreviated to TrainingNS throughout. In summary,

Create separate login.js and index.html files for the alternate parameters, create a new /resources folder specifically for those and edit references within those before defining a responder action & policy in NS:

  1. Copy existing login.js to loginTrainingNS.js
  2. Copy existing index.html to indexTrainingNS.html
  3. Create a new folder called /netscaler/ns_gui/vpn/resourcesTrainingNS and give it the same owner/group permissions as the /netscaler/ns_gui/vpn/resources folder (use WinSCP to define the permissions, right click Properties on the file)
  4. Copy all of the .XML files from /netscaler/ns_gui/vpn/resources into the new folder
  5. Edit the indexTraining.html file and make the following change to reflect the new location of the resource files

var Resources = new ResourceManager("resourcesTrainingNS/{lang}", "logon");

Edit the indexTrainingNS.html file and make the modifications described in CTX1262067.

Edit the individual .XML files in the new folder as per the explanation in CTX126206

AD Password:
TwoFactorAuth Password:

(this second option will not be used if only a primary authentication mechanism is defined)

When all of the file changes are complete, using https://support.citrix.com/article/CTX123736 as a guide, define the responder action and policy on the NS:

  • Create a responder action using the URL: “https://trainingns.lstraining.ads/vpn/indexTrainingNS.html”
  • Create a responder policy using the expression: HTTP.REQ.HOSTNAME.EQ(“trainingns.lstraining.ads”) && HTTP.REQ.URL.CONTAINS(“index.html”)Bind the policy to the global defaults

Now when you launch the URL for the Training NetScaler it will redirect to the custom index.html file and load a separate logon.js and .xml resource files so that the logon box will be name differently.

In addition, the following article hints at an alternative resolution if the Responder feature cannot be licensed: http://www.carlstalhood.com/netscaler-gateway-virtual-server/#customize