Here’s a simple scenario I came across today. You would like to work with your vSphere environment using the latest PowerCLI, but discover that v6.5.1 is the latest version available for download on VMware’s website. Having heard that PowerCLI is now distributed through the PowerShell Gallery, you open a PowerShell prompt and enter:
PS:\> Install-Module VMware.PowerCLI
The modules are downloaded and installed successfully, and you are able to connect to your vCenter environment:
But when you attempt to use a simple command such as:
you receive an error similar to:
get-tag : 11/10/2018 21:06:20 Get-Tag Could not load file or assembly 'Newtonsoft.Json, Version=10.0.0.0,
Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed' or one of its dependencies. The system cannot find the file
At line:1 char:1
In my case, I found that other components installed on my VM (e.g. the Citrix Virtual Desktop Agent) were using an older version of Newtonsoft.Json.dll, and that this copy was found in the file search path before the one in the PowerShell module’s folder.
Searching for the file conflict with ProcMon, I noticed that the Connect-VIServer cmdlet does indeed find and load a version of this DLL during the connection process, e.g. the one located in:
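ProcMon shows what is actually loaded at run time. As a quicker first pass, the Python sketch below walks a list of folders and reports every copy of Newtonsoft.Json.dll it finds, so you can see which products ship their own version. It does not model how .NET resolves assemblies; it only lists candidate copies for you to compare. The search roots in the example are assumptions: the default all-users module folder for Windows PowerShell and a hypothetical Citrix install folder.

```python
import os
from pathlib import Path


def find_dll_copies(roots, file_name="Newtonsoft.Json.dll"):
    """Return every file under the given root folders whose name matches
    file_name (case-insensitively, as on Windows), in the order the roots
    are listed."""
    target = file_name.lower()
    hits = []
    for root in roots:
        # os.walk yields nothing for a missing root and skips unreadable folders
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower() == target:
                    hits.append(Path(dirpath) / name)
    return hits


if __name__ == "__main__":
    # Hypothetical search roots: adjust to the module and product folders on your VM.
    candidate_roots = [
        os.path.expandvars(r"%ProgramFiles%\WindowsPowerShell\Modules"),
        os.path.expandvars(r"%ProgramFiles%\Citrix"),
    ]
    for path in find_dll_copies(candidate_roots):
        print(path)
```

If more than one copy turns up, right-clicking each file in Explorer and checking its version on the Details tab is an easy way to spot the older one.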
This simple workaround proved successful for me, but you should of course verify any other functionality that might depend on this file before making a similar change in a production environment.
I recently ran into a strange issue after enabling two PSC 6.5 nodes in an HA configuration, as part of a larger rolling upgrade from vCenter 5.5.
NB – all URLs shown are internal, in use within my lab environment only.
During the migration of the customer’s existing vCenter environment, we had to rehearse externalising the PSC from an initial embedded SSO instance. As part of this process, the first PSC node in a new site was migrated from the original Windows vCenter 5.5 SSO to PSC 6.5, and a second new node was subsequently joined to the first site so that replication could be established.
The second node, https://hosso2.sbcpureconsult.internal/psc, worked correctly, redirecting to the load-balanced address psc-ha-vip.sbcpureconsult.internal for authentication before displaying the PSC client UI.
Whichever node I selected, I was able to log in to vCenter, choose Administration, System Configuration, select a node, then Manage, Settings or CA without receiving any errors.
If I deliberately dropped the first node out of the load-balancing configuration on the NetScaler, I had no issues accessing the /psc URL by either host name or load balancer name, but if I tried to connect to the first node by its own DNS name or IP address, I received an HTTP 400 error and the following entry in:
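When you are changing the NetScaler configuration and re-testing repeatedly, a small probe keeps the checks consistent. The sketch below requests `<base>/psc` for each base URL and records the final HTTP status. urllib follows redirects, so a node that redirects to the load-balanced address reports the status of the page it finally lands on. Certificate verification is disabled by default on the assumption that lab appliances present certificates your workstation does not trust; the function name and parameters are my own.

```python
import ssl
import urllib.error
import urllib.request


def psc_status(base_urls, timeout=10, verify_tls=False):
    """Request <base>/psc for each base URL and return {url: status},
    where status is an HTTP code or an 'unreachable: ...' message."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Lab appliances often use certificates the client does not trust.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    results = {}
    for base in base_urls:
        url = base.rstrip("/") + "/psc"
        try:
            with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as err:
            # HTTP error responses such as 400 arrive here with their status code.
            results[url] = err.code
        except OSError as err:
            # DNS failures, refused connections and timeouts.
            results[url] = f"unreachable: {err}"
    return results
```

For example, `psc_status(["https://hosso2.sbcpureconsult.internal", "https://psc-ha-vip.sbcpureconsult.internal"])`, plus the first node’s own FQDN or IP address, lets you compare all three entries at a glance. On a healthy pair I would expect the same status for every entry, rather than a 400 for the first node only.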
In my lab environment I repeated the same series of steps I had followed on the customer site, and was able to reproduce the same behaviour. I should explain at this point that all other vCenter functionality worked correctly; the issue only affected the /psc URL.
Before signing off on the work, I wanted to confirm: should it be possible to access the /psc URL on each node directly?
After what seemed like a lot of internal dialogue between myself and my inner tech support dept. (sleepless nights!), I was left wondering what could be going wrong, especially as this was the documented procedure from VMware.
Good news: I was able to roll back my lab and re-run the updateSSOConfig.py and UpdateLsEndpoint.py scripts, and this time the /psc URL did indeed load successfully on both nodes with the NetScaler load balancing in place!
So at least I knew that the correct behaviour is that you should be able to open /psc on both appliances.
By examining my snapshots at different stages I was able to identify a difference between the original migration node and the clean appliance:
When you run the updateSSOConfig.py Python script to repoint the SSO URL to the load-balanced address, it reports that hostname.txt and server.xml were modified:
I was able to locate hostname.txt files (containing the load balancer address) in:
/etc/vmware-sso/keys/hostname.txt (missing on node 2, but contained the local name on node 1)
So this second hostname.txt file existed only on the first (migrated) node. Why is this? My guess is that it is used only transiently during script execution, to inject the correct value into the server.xml file.
The server.xml file is located in the folder:
My faulty node contained the following certificate entries under the connector definition:
So I simply copied the server.xml file from the working node (overwriting the original on the faulty node) and removed the /etc/vmware-sso/keys/hostname.txt file to match the working configuration.
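Before overwriting the file, it can be useful to see exactly what differs between the two nodes. The sketch below parses two copies of server.xml with Python’s standard ElementTree module and lists every attribute of a `<Connector>` element whose value differs, keyed by the connector’s port. It makes no assumption about which certificate attributes vCenter actually uses; the attribute names in the test data are made up.

```python
import xml.etree.ElementTree as ET


def connector_attrs(xml_text):
    """Map each <Connector> element's port to a dict of its attributes."""
    root = ET.fromstring(xml_text)
    return {c.get("port", "?"): dict(c.attrib) for c in root.iter("Connector")}


def diff_connectors(working_xml, faulty_xml):
    """Return (port, attribute, working_value, faulty_value) for every
    Connector attribute that differs; None means the attribute is absent."""
    good = connector_attrs(working_xml)
    bad = connector_attrs(faulty_xml)
    diffs = []
    for port in sorted(set(good) | set(bad)):
        g, b = good.get(port, {}), bad.get(port, {})
        for key in sorted(set(g) | set(b)):
            if g.get(key) != b.get(key):
                diffs.append((port, key, g.get(key), b.get(key)))
    return diffs
```

Usage, with hypothetical local file names for the two copies: `diff_connectors(Path("server.xml.node2").read_bytes(), Path("server.xml.node1").read_bytes())` (with `from pathlib import Path`). Passing bytes lets ElementTree honour the file’s own encoding declaration.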
As a follow-up, by examining the STS_INTERNAL_SSL_CERT store I could see that the machine certificate in use had been issued by the original Windows vCenter Server 5.5 SSO CA to the subject name:
This store was not present on the other node, so it appears that one of the upgrade scripts omits the correct load-balancer certificate replacement when this scenario occurs (5.5 SSO to 6.5 PSC).
I hope that VMware fixes this bug in due course, particularly as more customers move to the appliance-based model of vCenter 6.x. Until then, this workaround and troubleshooting method are worth considering if you run into a similar problem.
Following installation of a second Platform Services Controller node in a site how will you know if replication is functioning correctly?
If you have time to wait around 30 seconds for each change to replicate, you could first create a test user on each node within the vsphere.local domain to verify bidirectional replication. But if you prefer a more scientific approach, or want to repeat the check programmatically, you can follow a simple sequence of steps.
The following article from VMware explains the process; however, it omits the period (.) at the beginning of the Linux commands, so the steps can’t be followed verbatim.
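To repeat the check programmatically, one option is to call the replication tool by its full path so that the missing leading period never matters. The Python sketch below assumes the tool is `vdcrepadmin` in `/usr/lib/vmware-vmdir/bin` on the PSC appliance, invoked with `-f <query> -h <host> -u <user> -w <password>`, and that the queries include `showservers`, `showpartners` and `showpartnerstatus`. These are my recollection of the VMware article, so verify them against it before relying on this. The helper names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath

# Assumed location of vdcrepadmin on a PSC 6.x appliance. The article runs it
# from this folder as ./vdcrepadmin, presumably because the folder is not on PATH.
VMDIR_BIN = PurePosixPath("/usr/lib/vmware-vmdir/bin")


def vdcrepadmin_cmd(feature, host="localhost", user="administrator", password=""):
    """Build the argument list for one vdcrepadmin query, using the
    absolute path instead of relying on ./ and the current directory."""
    return [str(VMDIR_BIN / "vdcrepadmin"), "-f", feature,
            "-h", host, "-u", user, "-w", password]


def run_vdcrepadmin(feature, **kwargs):
    """Run the query on the appliance and return its standard output.
    Note: the password is visible in the process list while it runs."""
    result = subprocess.run(vdcrepadmin_cmd(feature, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_vdcrepadmin("showpartnerstatus", password="...")` run on each node in turn returns output like the sample below.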
Host available: Yes
Status available: Yes
My last change number: 8986
Partner has seen my change number: 8986
Partner is 0 changes behind.
In these examples, the change numbers (unique sequence numbers) are specific to the local host, so they will not necessarily match between nodes, particularly if the nodes were introduced to the site at different times. The values that matter are whether the replication partner shows any changes that have not yet been replicated, and whether the partner is available at all.
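If you collect this output on each node, the two checks described above can be automated. The Python sketch below groups text shaped like the sample above (assumed to come from vdcrepadmin’s `showpartnerstatus` query) into one record per partner, starting a new record at each `Host available:` line, and treats replication as healthy only when every partner is available and reports zero changes behind. The field names in the returned dicts are my own.

```python
import re


def parse_partner_status(text):
    """Group partner-status output into one dict per partner block.
    A new block starts at each 'Host available:' line."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Host available:"):
            blocks.append({"host_available": line.endswith("Yes")})
        elif not blocks:
            continue  # ignore anything before the first block
        elif line.startswith("Status available:"):
            blocks[-1]["status_available"] = line.endswith("Yes")
        elif m := re.match(r"My last change number:\s*(\d+)", line):
            blocks[-1]["my_last_change"] = int(m.group(1))
        elif m := re.match(r"Partner has seen my change number:\s*(\d+)", line):
            blocks[-1]["partner_seen"] = int(m.group(1))
        elif m := re.match(r"Partner is (\d+) changes? behind", line):
            blocks[-1]["behind"] = int(m.group(1))
    return blocks


def replication_healthy(blocks):
    """True only if there is at least one partner and every partner is
    available and explicitly reports 0 changes behind."""
    return bool(blocks) and all(
        b.get("host_available") and b.get("status_available")
        and b.get("behind") == 0
        for b in blocks
    )
```

Requiring an explicit "0 changes behind" means a block with a missing or unavailable status is reported as unhealthy rather than silently passing.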