Optimising Oracle DB with VMware’s vFlash Read Cache feature

This post is slightly different from my usual ones because it is more notes-based than editorial or commentary, but I hope that the simple steps and data captured here will be useful. It has taken me a while to get this data out, and although it is about a year old now, the performance improvement should be even better with ESXi 6.x. In this test we wanted to evaluate whether VMware’s Flash Read Cache (vFRC) feature, released in ESXi 5.5, would benefit read-heavy virtual workloads such as Oracle DB.

Test scenario:

Oracle 11g DB with 4vCPU, 8,192MB RAM and 200GB Oracle ASM disk for database
HP DL380 G7 with 2 x Intel Xeon 5650 6C 2.67GHz CPU and 128GB RAM, locally attached 4 x 7.2K SAS RAID array
VMware ESXi 5.5 Enterprise Plus license with vFlash Read Cache capability.

Creating a baseline (before applying vFRC)

Using esxtop to establish typical baseline values:

Disk latency typical across measured virtual machines – 11.97ms

Correlation of baseline latency and commands per second values with vCenter Operations Manager:

High and low water disk latency – between 4 and 16ms (using 7.2K RPM drives in 4 disk RAID5 array).

Disk usage was negligible following VM boot and Oracle DB startup:

In order to set the vFlash Read Cache block size correctly we need to find out the typical write block size (so that small writes do not consume too large a cache block if it is set higher than the mean).

Using vscsiStats to measure the frequency of different sized I/O commands:

Highlighted frequency values (above) show that 4,096 byte I/Os were the most common across both write and read buckets, and therefore the overall number of operations peaked in the same window.

In order to establish the baseline Oracle performance an I/O calibration script was run several times.

Oracle DB I/O metrics calculation:

Max IOPS were found to lie between 576 and 608 per second using a 200GB VMDK located on the 4 disk RAID array.

The high water mark for disk latency rose to 28ms during the test, versus 12ms when the instance was idle, indicating contention on the spindles during read/write activity.

During the I/O calibration test the high water mark for disk throughput rose to 76,000 KBps, versus 3,450 KBps when the instance was idle. This shows that the array throughput max is around 74MB/s.
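The 74MB/s figure is just a unit conversion of the measured high water mark. A minimal sketch of that arithmetic, assuming binary units (1MB = 1,024KB); the function name is only illustrative:

```python
def kbps_to_mbps(kbps):
    """Convert kilobytes per second to megabytes per second (1MB = 1,024KB)."""
    return kbps / 1024

peak = kbps_to_mbps(76_000)  # high water mark during I/O calibration
idle = kbps_to_mbps(3_450)   # instance idle
print(f"peak {peak:.1f} MB/s, idle {idle:.1f} MB/s")
# prints: peak 74.2 MB/s, idle 3.4 MB/s
```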

Having established that the majority of writes during the above test were in fact using an 8KB block size (not the 4KB shown in the screenshot, which was taken from a different test), vFRC was enabled only on the 200GB ASM disk using an arbitrary 50GB reservation (25% of total disk size). No reboot was required; VMware inserts the cache in front of the disk storage transparently to the VM.
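The block size decision above amounts to picking the most frequent I/O size from the vscsiStats histogram. The sketch below shows the idea in Python. The histogram counts are hypothetical placeholders, not the values measured in this test, and the function names are my own:

```python
# Hypothetical I/O size histogram in the style of vscsiStats output:
# I/O size in bytes -> number of I/Os observed. Illustrative counts only.
io_size_histogram = {
    512: 120,
    4096: 2300,
    8192: 6100,
    16384: 900,
    65536: 150,
}

def most_common_io_size(histogram):
    """Return the I/O size (in bytes) with the highest frequency."""
    return max(histogram, key=histogram.get)

def reservation_gb(disk_gb, fraction):
    """Cache reservation expressed as a fraction of the virtual disk size."""
    return disk_gb * fraction

block_size = most_common_io_size(io_size_histogram)
print(f"Suggested cache block size: {block_size // 1024}KB")
print(f"Reservation: {reservation_gb(200, 0.25):.0f}GB")
```

With these placeholder counts the sketch suggests an 8KB block, and the 25% reservation on a 200GB disk works out to the 50GB used here.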

With Flash Read Cache enabled on 200GB ASM disk

After adding a locally attached 200GB SATA SSD disk to the ESXi server and claiming the storage for Flash Read Cache, a 50GB vFRC cache was enabled on the Oracle ASM data disk within the virtual machine’s hard disk settings:

Once vFRC was enabled the Oracle I/O calibration script was run again and, surprisingly, the first pass was considerably slower than previous runs (max IOPS 268). This is because reads initially miss the SSD cache: no prior writes have primed it, so every read still has to come from disk. Because data is written to the SSD as well as being committed to disk (write-through caching), the vFRC cache is continually populated and performance should improve over time:
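The warm-up effect can be illustrated with a toy model. This is only a sketch of the behaviour described above, not how ESXi implements vFRC. It assumes an unlimited cache and that both read misses and writes place a block on the SSD, and the class and function names are hypothetical:

```python
class ReadCacheModel:
    """Toy model of a write-through read cache with unlimited capacity."""

    def __init__(self):
        self.cached = set()  # block numbers currently held on SSD
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cached:
            self.hits += 1          # served from SSD
        else:
            self.misses += 1        # served from the spinning disks...
            self.cached.add(block)  # ...and kept in the cache for next time

    def write(self, block):
        # Assumption taken from the post: writes also land on the SSD
        # (write-through), so they help to prime the cache.
        self.cached.add(block)

def run_pass(cache, blocks):
    """Read every block once and return the hit rate for this pass only."""
    hits_before, misses_before = cache.hits, cache.misses
    for block in blocks:
        cache.read(block)
    hits = cache.hits - hits_before
    misses = cache.misses - misses_before
    return hits / (hits + misses)

cache = ReadCacheModel()
working_set = range(1000)
first_pass = run_pass(cache, working_set)   # cold cache: every read misses
second_pass = run_pass(cache, working_set)  # primed cache: every read hits
print(first_pass, second_pass)
```

The model does not capture the extra cost of populating the SSD during the first pass, which is presumably why the first real run was slower than the baseline rather than merely equal to it.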

Esxcli was used to view the resulting cache efficiency after running I/O calibration (showing 29% read hit rate via SSD cache vs reads from SAS disk):

In the example above, no blocks have been evicted from the cache yet, meaning that the 50GB cache assigned to this VMDK still offers room for growth. When all of the cache blocks are exhausted the ESXi storage stack will begin to remove older blocks in favour of storing more relevant up-to-date data.

The resulting I/O calibration performance is shown below, both before and after enabling the vFRC feature.

In brief conclusion, the vFlash Read Cache feature is an excellent way to add in-line SSD based read caching for specific virtual machines and volumes. You must enable the option on specific VMs only, and then track their usage and cache effectiveness over time to make sure that you have allocated neither too much nor too little cache. However, once the cache is primed with data there is a marked and positive improvement to the read throughput, and far fewer IOPS need to be dealt with by the physical storage array. For Oracle servers which are read biased this should significantly improve performance where non-SSD storage arrays are being utilised.

Updating password field names with multiple NetScaler Gateway virtual servers

Imagine a situation where you want to change your NetScaler Gateway’s logon page to include alternative prompts for the Username, Password 1 and Password 2 fields and need to update the language specific .XML files. This has been documented before, and isn’t too hard to figure out once you’ve found a couple of ‘How to’ guides on the Internet. However, I have since come across a limitation in trying to apply the NetScaler’s new ‘Custom’ design template to several different NetScaler Gateway virtual servers at the same time. Whilst you can define your own custom design, it is automatically applied to all instances of the virtual server residing on the NetScaler, so if you define custom fields then you’ve defined them for all.

This may not be a problem for some people, but what if the secondary authentication mechanism is an RSA token for one site, and a VASCO token for another? How do you go about configuring alternative sets of custom logon fields? Most of the answers are already out there in one form or another, but I lacked one simple beginning to end description of the solution (I tried several alternate options including rewrite policies which didn’t quite work before I opted for this approach):

Background (NetScaler 10.5.x build)

The Citrix NetScaler VPN default logon page has already been modified in order to ask for ‘AD password’ and ‘VASCO token’ values instead of Password 1: and Password 2:, as detailed in http://support.citrix.com/article/CTX126206

This was achieved by editing index.html and login.js files in /var/netscaler/gui/vpn of the NS as per the Citrix article above.

In addition, the resources path which holds the language based .XML files in /var/netscaler/gui/vpn/resources has been backed up into /var/customisations so that the /nsconfig/rc.netscaler file can copy them back into the correct location if they get overwritten or lost following reboot.

Contents of rc.netscaler file

cp /var/customisations/login.js.mod /netscaler/ns_gui/vpn/login.js
cp /var/customisations/en.xml.mod /netscaler/ns_gui/vpn/resources/en.xml
cp /var/customisations/de.xml.mod /netscaler/ns_gui/vpn/resources/de.xml
cp /var/customisations/es.xml.mod /netscaler/ns_gui/vpn/resources/es.xml
cp /var/customisations/fr.xml.mod /netscaler/ns_gui/vpn/resources/fr.xml

However, because these values apply globally there is an issue if a second NetScaler virtual server does not use a VASCO token as a secondary authentication mechanism. This causes the normal ‘Password’ entry box to be displayed as ‘VASCO token’. The only suitable workaround for this is to create a parallel set of logon files for each additional NS gateway virtual server and use a responder policy on the NS to redirect incoming requests for the index.html page of the VPN to a different file.

In the following examples, I have created a second configuration for a ‘Training NetScaler’, abbreviated to TrainingNS throughout.

In summary: create separate login.js and index.html files for the alternate parameters, create a new resources folder specifically for them, and edit the references within those files before defining a responder action and policy on the NS:

  1. Copy existing login.js to loginTrainingNS.js
  2. Copy existing index.html to indexTrainingNS.html
  3. Create a new folder called /netscaler/ns_gui/vpn/resourcesTrainingNS and give it the same owner/group permissions as the /netscaler/ns_gui/vpn/resources folder (use WinSCP to define the permissions, right click Properties on the file)
  4. Copy all of the .XML files from /netscaler/ns_gui/vpn/resources into the new folder
  5. Edit the indexTrainingNS.html file and make the following change to reflect the new location of the resource files:

var Resources = new ResourceManager("resourcesTrainingNS/{lang}", "logon");

  6. Edit the indexTrainingNS.html file and make the modifications described in CTX126206

  7. Edit the individual .XML files in the new folder as per the explanation in CTX126206

AD Password:
TwoFactorAuth Password:

(this second option will not be used if only a primary authentication mechanism is defined)
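Steps 1 to 5 above can be scripted. The following is a rough sketch only. It assumes Python is available wherever the files live (for example against a local copy that is uploaded afterwards with WinSCP), and that the stock index.html contains the string "resources/{lang}". The function name clone_logon_files is hypothetical. Owner/group still has to be matched to the original folder by hand, as in step 3.

```python
import shutil
from pathlib import Path

def clone_logon_files(vpn_dir, suffix="TrainingNS"):
    """Create a parallel set of logon files under vpn_dir (steps 1 to 5)."""
    vpn = Path(vpn_dir)

    # Steps 1 and 2: copy login.js and index.html under new names.
    shutil.copy2(vpn / "login.js", vpn / f"login{suffix}.js")
    new_index = vpn / f"index{suffix}.html"
    shutil.copy2(vpn / "index.html", new_index)

    # Steps 3 and 4: new resources folder holding a copy of the existing
    # folder's contents. File modes are preserved, owner/group are NOT.
    new_resources = vpn / f"resources{suffix}"
    shutil.copytree(vpn / "resources", new_resources)

    # Step 5: point the new index page at the new resources folder.
    html = new_index.read_text()
    new_index.write_text(
        html.replace("resources/{lang}", f"resources{suffix}/{{lang}}")
    )
    return new_index, new_resources
```

Calling clone_logon_files("/netscaler/ns_gui/vpn") would then leave the original files untouched and create the TrainingNS copies alongside them. The edits from CTX126206 (steps 6 and 7) still need to be made by hand.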

When all of the file changes are complete, using https://support.citrix.com/article/CTX123736 as a guide, define the responder action and policy on the NS:

  • Create a responder action using the URL: "https://trainingns.lstraining.ads/vpn/indexTrainingNS.html"
  • Create a responder policy using the expression: HTTP.REQ.HOSTNAME.EQ("trainingns.lstraining.ads") && HTTP.REQ.URL.CONTAINS("index.html")
  • Bind the policy to the global defaults
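To see which requests the policy expression would catch, its logic can be mirrored as plain string tests. This is only an approximation in Python: the function name is hypothetical, and the NetScaler's own handling of case and URL encoding may differ from a simple comparison.

```python
def should_redirect(hostname, url):
    """Mirror HOSTNAME.EQ(...) && URL.CONTAINS(...) as plain string tests."""
    return hostname == "trainingns.lstraining.ads" and "index.html" in url

# The default logon page on the training host is redirected...
print(should_redirect("trainingns.lstraining.ads", "/vpn/index.html"))            # True
# ...but the redirect target itself does not match, so there is no loop...
print(should_redirect("trainingns.lstraining.ads", "/vpn/indexTrainingNS.html"))  # False
# ...and other virtual servers keep the default page.
print(should_redirect("other.lstraining.ads", "/vpn/index.html"))                 # False
```

The second case is the important one: "indexTrainingNS.html" does not contain the substring "index.html", so the redirected request is not redirected again.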

Now when you launch the URL for the Training NetScaler it will redirect to the custom index.html file and load the separate login.js and .XML resource files, so that the logon box fields will be named differently.

In addition, the following article hints at an alternative resolution if the Responder feature cannot be licensed: http://www.carlstalhood.com/netscaler-gateway-virtual-server/#customize