Optimising Oracle DB with VMware’s vFlash Read Cache feature

This post is a little different from my usual ones in that it is more a set of notes than editorial or comment, but I hope that the simple steps and data captured here will be useful. It has taken me a while to get this data out, and although it is about a year old now, the performance improvement should be even better with ESXi 6.x.

In this test we wanted to evaluate whether VMware’s new Flash Read Cache (vFRC) feature, released in ESXi 5.5, would benefit read-heavy virtual workloads such as Oracle DB.

Test scenario: Oracle 11g (11.2.0.1) DB VM with 4 vCPUs, 8,192MB RAM and a 200GB Oracle ASM disk for the database.

HP DL380 G7 with 2 x Intel Xeon 5650 6C 2.67GHz CPU and 128GB RAM, locally attached 4 x 7.2K SAS RAID array

VMware ESXi 5.5 Enterprise Plus license with vFlash Read Cache capability.

Creating a baseline (before applying vFRC)

Using esxtop to establish typical baseline values:

Typical disk latency across the measured virtual machines: 11.97ms

Correlation of baseline latency and commands per second values with vCenter Operations Manager:

vFRC1

Low and high water marks for disk latency: between 4 and 16ms (using 7.2K RPM drives in a 4-disk RAID5 array).

vFRC3

Disk usage was negligible following VM boot and Oracle DB startup:

vFRC2

vFRC4

To set the vFlash Read Cache block size correctly we need to find out the typical write block size, so that small writes do not each occupy an oversized cache block, as they would if the cache block size were set higher than the mean.

Using vscsiStats to measure the frequency of different sized I/O commands:

vFRC5

Highlighted frequency values (above) show that 4,096 byte I/Os were the most common across both write and read buckets, and therefore the overall number of operations peaked in the same window.
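Picking the most common I/O size out of a vscsiStats-style histogram can be sketched in a few lines of Python. This is only an illustration: the bucket counts below are invented (they are not the values from my screenshot), and `modal_io_size` is a hypothetical helper name rather than anything VMware ships.

```python
def modal_io_size(histogram):
    """Return the I/O size bucket (in bytes) with the highest count.

    `histogram` maps each bucket's upper bound in bytes to the number of
    I/O commands that fell into that bucket.
    """
    if not histogram:
        raise ValueError("histogram is empty")
    return max(histogram, key=histogram.get)


# Hypothetical bucket counts, invented purely for illustration.
sample_histogram = {
    512: 120,
    1024: 85,
    2048: 240,
    4096: 9150,
    8192: 3100,
    16384: 410,
    32768: 95,
}

print(modal_io_size(sample_histogram))  # 4096
```

In practice you would paste in the real read and write histograms separately and check that both peak in the same bucket before settling on a cache block size.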

In order to establish the baseline Oracle performance an I/O calibration script was run several times.

Oracle DB I/O metrics calculation:

vFRC6

Max IOPS were found to lie between 576 and 608 using a 200GB VMDK located on the 4-disk RAID array.

The high water mark for disk latency rose to 28ms during the test, versus 12ms when the instance was idle – indicating contention on the spindles during read/write activity.

vFRC7

During the I/O calibration test the high water mark for disk throughput rose to 76,000 KBps, versus 3,450 KBps when the instance was idle. This shows that the array throughput max is around 74MB/s.

vFRC8
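For anyone checking the arithmetic, the conversion from the KBps figures to MB/s is shown below. It assumes that 1 MB is 1,024 KB, which is my assumption about how the counter is scaled; `kbps_to_mbps` is just an illustrative helper.

```python
def kbps_to_mbps(kbps):
    """Convert kilobytes per second to megabytes per second (1 MB = 1024 KB)."""
    return kbps / 1024


peak = kbps_to_mbps(76_000)  # throughput high water mark during calibration
idle = kbps_to_mbps(3_450)   # throughput while the instance was idle

print(f"peak: {peak:.1f} MB/s, idle: {idle:.1f} MB/s")  # peak: 74.2 MB/s, idle: 3.4 MB/s
```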

Having established that the majority of writes during the above test were in fact using an 8KB block size (not the 4KB shown in the screenshot, which was taken from a different test), vFRC was enabled only on the 200GB ASM disk, using an arbitrary 50GB reservation (25% of the total disk size). No reboot was required; VMware inserts the cache in front of the disk storage transparently to the VM.
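To put the reservation into perspective, here is a rough sizing sketch. It works out what fraction of the disk the cache covers and how many cache blocks a reservation holds at a given block size. The helper names are hypothetical, and the calculation ignores any metadata overhead that the cache itself may need.

```python
GIB = 1024 ** 3
KIB = 1024


def cache_fraction(cache_gb, disk_gb):
    """Fraction of the virtual disk that the cache reservation could hold."""
    return cache_gb / disk_gb


def cache_block_count(cache_gb, block_kb):
    """Number of whole cache blocks that fit in the reservation."""
    return (cache_gb * GIB) // (block_kb * KIB)


print(cache_fraction(50, 200))   # 0.25
print(cache_block_count(50, 8))  # 6553600
print(cache_block_count(50, 4))  # 13107200
```

Halving the block size doubles the number of blocks available, which is why matching the block size to the workload's typical I/O size matters.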

With Flash Read Cache enabled on 200GB ASM disk

After adding a locally attached 200GB SATA SSD to the ESXi server and claiming the storage for Flash Read Cache, a 50GB vFRC cache was enabled on the Oracle ASM data disk in the virtual machine’s settings:

vFRC9

vFRC10

Once vFRC was enabled the Oracle I/O calibration script was run again and, surprisingly, the first pass was considerably slower than previous runs (max IOPS 268). This is because the cache starts out empty: each read initially misses the SSD cache, because nothing has yet primed it, and must still be served from disk. Because the cache is write-through (data is written to the SSD as well as being committed to disk), data is continually added to the vFRC cache, so performance should improve over time:

vFRC11

esxcli was used to view the resulting cache efficiency after running I/O calibration (showing a 29% read hit rate from the SSD cache, with the remaining reads served from SAS disk):

vFRC12

In the example above, no blocks have been evicted from the cache yet, meaning that the 50GB cache assigned to this VMDK still offers room for growth. When all of the cache blocks are exhausted the ESXi storage stack will begin to remove older blocks in favour of storing more relevant, up-to-date data.
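The warm-up, hit rate and eviction behaviour described above can be modelled with a toy cache. To be clear, this is not VMware's implementation: `ReadCacheModel` is a hypothetical class, and the least-recently-used eviction policy is my own stand-in for "remove older blocks", used only to show the general shape of the behaviour.

```python
from collections import OrderedDict


class ReadCacheModel:
    """Toy write-through read cache with LRU eviction (an assumed policy)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def _store(self, block):
        # Insert or refresh the block, evicting the oldest one if full.
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
            self.evictions += 1

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
        else:
            # Miss: the read is served from disk, then the block is cached.
            self.misses += 1
            self._store(block)

    def write(self, block):
        # Write-through: the data goes to disk and into the cache as well.
        self._store(block)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = ReadCacheModel(capacity_blocks=4)
for block in range(3):   # first pass: the cache is cold, every read misses
    cache.read(block)
print(cache.hit_rate())  # 0.0
for block in range(3):   # second pass: the same blocks are now cached
    cache.read(block)
print(cache.hit_rate())  # 0.5
print(cache.evictions)   # 0
```

Reading more distinct blocks than the cache can hold is what eventually pushes the eviction counter above zero, which is the figure to watch when deciding whether the reservation is too small.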

The resulting I/O calibration performance is shown below – both before and after enabling the vFRC feature.

vFRC13

 

In conclusion, the vFlash Read Cache feature is an excellent way to add in-line SSD-based read caching for specific virtual machines and volumes. You must enable the option on specific VMs only, and then track their usage and cache effectiveness over time to make sure that you have allocated neither too much nor too little cache. However, once the cache is primed with data there is a marked improvement in read throughput, and a much reduced number of IOPS needing to be dealt with by the physical storage array. For Oracle servers which are read biased this should significantly improve performance where non-SSD storage arrays are being utilised.

Oracle licensing on hyper-converged platforms such as Nutanix, VSAN etc.

I recently commented on the blog of Michael Webster of Nutanix about Oracle licensing on VMware clusters and wanted to link back to it here, as it’s something I’ve been involved with several times now.

With VMware vSphere 5.5 the vMotion boundary is defined by the individual datacenter object in vCenter, which means that you cannot move a VM between datacenters without exporting it, removing it from the inventory, and reimporting it somewhere else. This currently means that even if you deploy Oracle DB on an ESXi cluster of just two nodes, you could be required by Oracle to license all of the other CPU sockets in the datacenter! This rule is due to Oracle’s stance that they do not support soft partitioning or any kind of host or CPU affinity rules. If a VM could run on a processor socket through some kind of administrative operation, then that socket should be licensed. This doesn’t seem fair, and VMware even suggest that this can be counteracted by simply defining host affinity rules, but let’s be clear: the final say has to rest with Oracle’s licensing agreement and not with whether VMware thinks it should be acceptable.

http://www.vmware.com/files/pdf/techpaper/vmw-understanding-oracle-certification-supportlicensing-environments.pdf

So the only current solution is to build Oracle dedicated clusters with separate shared storage and separate vCenter instances consisting only of Oracle DB servers. This means that you are able to define exactly which CPU sockets should be licensed, in effect all those which make up part of one or more ESXi clusters within the vCenter datacenter object.
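The difference a dedicated cluster makes to the socket count can be illustrated with a small sketch. The host inventory below is entirely made up, `sockets_in_scope` is a hypothetical helper, and the real obligation is whatever Oracle's licensing agreement says; this only shows the arithmetic of "every socket the VM could reach".

```python
def sockets_in_scope(hosts, scope_key, scope_value):
    """Sum the CPU sockets of every host inside the given boundary."""
    return sum(h["sockets"] for h in hosts if h[scope_key] == scope_value)


# Hypothetical inventory: one datacenter, a two-node Oracle cluster and a
# three-node general purpose cluster, all dual-socket hosts.
hosts = [
    {"name": "esx01", "datacenter": "DC1", "cluster": "oracle", "sockets": 2},
    {"name": "esx02", "datacenter": "DC1", "cluster": "oracle", "sockets": 2},
    {"name": "esx03", "datacenter": "DC1", "cluster": "general", "sockets": 2},
    {"name": "esx04", "datacenter": "DC1", "cluster": "general", "sockets": 2},
    {"name": "esx05", "datacenter": "DC1", "cluster": "general", "sockets": 2},
]

print(sockets_in_scope(hosts, "cluster", "oracle"))  # 4
print(sockets_in_scope(hosts, "datacenter", "DC1"))  # 10
```

If the boundary is taken to be the datacenter rather than the cluster, the count in this made-up example goes from 4 sockets to 10, which is the gap that a dedicated vCenter and cluster is meant to close.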

Now, with vSphere 6 a new feature called long-distance vMotion was introduced, which makes it possible to migrate a VM between cities, or even continents, even if they are managed by different vCenter instances. An excellent description of the new features can be found here. This rather complicates the matter, since Oracle will now need to consider how this affects the ‘reach’ of any particular VM instance, which would now appear to be limited only by the scope of your single sign-on domain, rather than by how many hosts or clusters are defined within your datacenter. I will be interested to see how this develops and will certainly post back here if anything moves us further towards clarity on this subject.

Permalink to Michael’s original article

 

Oracle 11g Weblogic Forms install fails due to missing vendor variable

During a fresh install of Weblogic 11g Forms last night I ran into this rather infuriating problem and had to go back to first principles to fix it. The issue was that during creation of the domain the installer would fail, complaining that it could not find the JDK installation, even though this was apparently defined correctly in the system environment variables, e.g.

JAVA_HOME=C:\Java\jdk1.6 (within which the \bin folder contains java.exe)

However, every time I ran the ..\Oracle\Middleware\as_1\bin\config.bat to begin the domain creation wizard I noticed that afterwards the JAVA_HOME variable did not exist any longer.

Digging a bit further into the log files I found the following entry confirming that the installer could not locate Java correctly:

The JDK wasn’t found in directory .
Please edit the setWLSEnv.cmd script so that the JAVA_HOME
variable points to the location of your JDK.
Your environment has not been set.

I then took a look at this script and noticed (eventually!) that it calls another script, %WL_HOME%\common\bin\commEnv.cmd, which… wait for it… actually resets your JAVA_HOME environment variable if the JAVA_VENDOR variable is not defined!

So, before starting the domain creation wizard, make sure that you have defined both variables, e.g.

JAVA_HOME=C:\Java\jdk1.6

JAVA_VENDOR=Sun

This seems to have done the trick, but sadly is an example of how all of the boxes need to be ticked before starting your installation.
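A quick pre-flight check for both variables, run before launching the wizard, might look like the sketch below. It is not part of the Oracle installer: `check_java_env` is a hypothetical helper, and all it does is confirm that JAVA_HOME and JAVA_VENDOR are set and that a Java executable exists under JAVA_HOME\bin.

```python
import os


def check_java_env(env, java_exists=os.path.isfile):
    """Return a list of problems with the Java-related environment variables.

    `env` is any mapping of variable names to values (e.g. os.environ).
    `java_exists` is injectable so the check can be exercised without a JDK.
    """
    problems = []
    java_home = env.get("JAVA_HOME", "").strip()
    java_vendor = env.get("JAVA_VENDOR", "").strip()

    if not java_home:
        problems.append("JAVA_HOME is not set")
    else:
        exe = "java.exe" if os.name == "nt" else "java"
        java_bin = os.path.join(java_home, "bin", exe)
        if not java_exists(java_bin):
            problems.append(f"no Java executable at {java_bin}")

    if not java_vendor:
        problems.append("JAVA_VENDOR is not set")

    return problems


if __name__ == "__main__":
    for problem in check_java_env(os.environ):
        print(problem)
```

An empty result means both variables are defined and the JDK path looks sane, so commEnv.cmd should have no reason to reset JAVA_HOME.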