Friday, October 14, 2016

What happens to attached volumes when snapshotting VMs on an OCM

Introduction

With OCM there is the capability to take a "snapshot" of a virtual machine. As described in the documentation, a snapshot takes a copy of the machine's boot disk as a machine image. There are essentially two reasons to do this. Firstly, the machine image can be used as a template to create additional VMs. Secondly, it can serve as a backup from which the VM can be recreated if needed. (This is really the same as the first case, but using the copy to recreate rather than to clone.)

This raises a couple of questions that this blog posting answers:

  1. What happens to storage volumes that have been added to the VM? Are these copied as well?
  2. Is this a good mechanism to increase the root volume size for VMs that need a bit more disk space?

I tested two specific scenarios, each starting from a VM based on the OL6 template with an attached volume:

  1. Use the new volume to extend the root disk logical volume.
  2. Create a new logical volume on the attached volume and mount it as a filesystem, say on /u01.
In both cases, snapshot the VM, create a new VM from the resulting machine image and look at what happened.

Extending the root volume

In the first case I create a VM from the OL6 base template using a simple orchestration. I create an additional volume and attach it to the VM. Having created the VM, I log on to it and use the Linux LVM commands to extend the size of the root disk.

The steps taken are:
  1. Use fdisk to create a partition of type Linux LVM (8e) on the attached volume.
  2. Use lvdisplay and vgdisplay to identify the current root volume group (probably VolGroup00).
  3. Use vgextend to add the storage from the attached volume to the current volume group.
  4. Use lvextend to make the root logical volume larger.
  5. Run resize2fs on the device-mapper device to make the extra space available to the filesystem.
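The five steps can be sketched as a small bash function. This is a minimal sketch, not the exact commands from this post. The following are assumptions to adjust for your VM:

- The partition (/dev/xvdb1) and the volume group and logical volume names.
- pvcreate is run explicitly.
- lvextend uses -l +100%FREE to take every free extent rather than a fixed size.
- The run helper is hypothetical. With DRY_RUN=1 it prints each command instead of executing it, so the sequence can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: grow the root LV onto a newly attached, already partitioned volume.
# Arguments (assumed): partition, volume group, logical volume.
extend_root_lv() {
  local part="$1" vg="$2" lv="$3"
  run pvcreate "$part"                       # initialise the partition as an LVM PV
  run vgextend "$vg" "$part"                 # add the PV to the existing VG
  run lvextend -l +100%FREE "/dev/$vg/$lv"   # grow the LV by every free extent
  run resize2fs "/dev/$vg/$lv"               # grow the mounted ext filesystem online
}
```

For example, DRY_RUN=1 extend_root_lv /dev/xvdb1 VolGroup00 LogVol00 prints the four commands. lvextend also has a -r (--resizefs) option that resizes the filesystem in the same step, which would replace the separate resize2fs call.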

Some of the key commands and their output are shown below. The net result is that the root filesystem grows from 11G to 61G, using all 50GB of the attached volume.

# df -kh
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   11G  3.3G  6.9G  33% /
tmpfs                            873M     0  873M   0% /dev/shm
/dev/xvda1                       239M   55M  168M  25% /boot
/dev/mapper/VolGroup00-LogVol02  2.0G  3.0M  1.9G   1% /opt/emagent_instance


# vgextend VolGroup00 /dev/xvdb1
  Volume group "VolGroup00" successfully extended
# pvdisplay
  --- Physical volume ---
  PV Name               /dev/xvda2
  VG Name               VolGroup00
  PV Size               17.75 GiB / not usable 2.12 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              4543
  Free PE               191
  Allocated PE          4352
  PV UUID               VnZ5PZ-8IrP-nggg-B7Fo-9sog-g4bw-skbPBE
  
  --- Physical volume ---
  PV Name               /dev/xvdb1
  VG Name               VolGroup00
  PV Size               50.00 GiB / not usable 3.31 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              12799
  Free PE               12799
  Allocated PE          0
  PV UUID               c1m21x-2IBX-Qrsy-1AJI-3UQ7-apUe-weETeq


# lvextend -L+55G /dev/VolGroup00/LogVol00
  Extending logical volume LogVol00 to 66.00 GiB
  Insufficient free space: 14080 extents needed, but only 12990 available

# lvextend -l+12990 /dev/VolGroup00/LogVol00
  Extending logical volume LogVol00 to 61.74 GiB
  Logical volume LogVol00 successfully resized
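The failed lvextend is just extent arithmetic. With a 4 MiB PE size, +55G needs 55 * 1024 / 4 = 14080 extents, but the two physical volumes only have 191 + 12799 = 12990 free.

Below is a small sketch that sums the Free PE values from pvdisplay output and converts an extent count to MiB. The function names are hypothetical, and the 4 MiB default PE size is taken from the output above.

```shell
#!/usr/bin/env bash
# Sum the "Free PE" values from pvdisplay output supplied on stdin.
# Matching lines look like: "  Free PE               12799"
sum_free_pe() {
  awk '$1 == "Free" && $2 == "PE" { total += $3 } END { print total + 0 }'
}

# Convert an extent count to MiB, given the PE size in MiB (default 4).
extents_to_mib() {
  local extents="$1" pe_size_mib="${2:-4}"
  echo $(( extents * pe_size_mib ))
}
```

Piping the pvdisplay output above through sum_free_pe gives 12990. extents_to_mib 12990 gives 51960 MiB (about 50.74 GiB), which matches LogVol00 growing from 11 GiB to 61.74 GiB. Alternatively, vgs -o vg_free_count VolGroup00 reports the free extent count directly.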
 

# resize2fs /dev/mapper/VolGroup00-LogVol00
resize2fs 1.43-WIP (20-Jun-2013)
Filesystem at /dev/mapper/VolGroup00-LogVol00 is mounted on /; on-line resizing required
old_desc_blocks = 4, new_desc_blocks = 4
The filesystem on /dev/mapper/VolGroup00-LogVol00 is now 16185344 blocks long.

# df -kh
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   61G  3.3G   55G   6% /
tmpfs                            873M     0  873M   0% /dev/shm
/dev/xvda1                       239M   55M  168M  25% /boot
/dev/mapper/VolGroup00-LogVol02  2.0G  3.0M  1.9G   1% /opt/emagent_instance

I now use the EMCC UI to create a snapshot of the VM. This could also be done from the command line.

Snapshotting a VM
The snapshot takes a few minutes to complete and once done there is a template available that will allow the creation of new VMs.

Snapshot appearing as a template in the OCM library



Creating a VM based on the snapshot


Once the new VM is created, just log on and have a look at the root disk size. It is immediately clear that it has a 61GB root disk and no attached volumes. Extending the root LVM volume before snapshotting therefore effectively increases the size of the disk in the machine image.

Adding the volume as a new partition/logical volume

The same process was repeated with an attached volume. This time, rather than extending VolGroup00, I created a new physical volume, volume group and logical volume, which I mounted on /u01. (fdisk to partition the volume for LVM [type 8e], pvcreate to create the physical volume, vgcreate to create a volume group on the new volume, lvcreate to create the logical volume, and then mount the logical volume on /u01.) Exactly the same process was used to create a snapshot and then create a new VM from the resulting machine image. This time the disk space on the new VM was just 11GB, the disk size of the original template. In other words, the snapshot ignored the additional volume and did what the docs say: it took an image of the machine's boot disk.
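The steps in the parenthesis above can be sketched as a dry-run function. This is illustrative only, and it makes some assumptions:

- It assumes the partition is /dev/xvdb1.
- It uses the hypothetical names DataVG and u01lv.
- It adds an mkfs.ext4 step, which the list does not mention but which is needed before a new logical volume can be mounted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: build a separate VG/LV on the attached volume and mount it.
# Arguments (assumed): partition, new VG name, new LV name, mount point.
create_data_lv() {
  local part="$1" vg="$2" lv="$3" mnt="$4"
  run pvcreate "$part"                      # physical volume
  run vgcreate "$vg" "$part"                # new volume group on that PV only
  run lvcreate -l 100%FREE -n "$lv" "$vg"   # LV using all space in the VG
  run mkfs.ext4 "/dev/$vg/$lv"              # filesystem (assumed ext4)
  run mkdir -p "$mnt"
  run mount "/dev/$vg/$lv" "$mnt"
}
```

Because this volume group lives entirely on the attached volume, nothing in it is part of the boot disk. That is consistent with the snapshot leaving it out.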

Conclusion

If you want to increase the root disk space of a template, use this approach: create a VM with an attached volume, use that volume to extend the root volume group/logical volume, then snapshot the VM. This produces a new template with a larger root disk.

If you are planning to create a VM with an attached volume and think that the snapshot will back up the entire VM, think again. You will also need to snapshot any volumes that are not part of the machine image boot disk to create a full snapshot of your virtual machine.


Friday, September 23, 2016

Using Reporting in Enterprise Manager

Introduction

A common question from customers is how to get visibility of the quantity of compute resource (CPU, memory and storage) being used. This information is fairly easy to extract from the oracle-compute command line, but it is also surfaced in Enterprise Manager. As an exercise I wanted to produce a report from Enterprise Manager that details the compute resource used. This blog posting simply captures my experiences and is not necessarily a best-practice approach to reporting against an OCM using EM12c.

EM12c Reporting Overview

Enterprise Manager, as a monitoring tool, captures a great deal of information about everything it monitors: both configuration details and historical usage information. You need to be logged in as a user with permission to access reports and to use BI Publisher to create custom reports. You will then find the reports under the Enterprise menu, which is typically in the top left corner of the screen.


There are two options under Reports:

- Information Publisher Reports: a list of pre-defined reports that pull out commonly used information.
- BI Publisher Enterprise Reports.

BI Publisher is the preferred route, as it is now the report generator of choice for Enterprise Manager; the others are deprecated and may eventually be dropped. As with Information Publisher, there is a series of out-of-the-box reports you can use, but in many cases a custom report is the way to go.

Creating a BI Publisher Report

BI Publisher is an incredibly powerful reporting tool. It can query any database (or indeed other sources of data) and push that data into an online report. The report can then be run on a regular basis, converted into PDF and e-mailed out, or run ad hoc as needed. With Enterprise Manager the main source of data is the underlying EM12c repository database.

Creating a report is essentially a two-step process.

1. Create a data model. Here you specify which tables to query and how they are associated, and add filters and conditions to extract the specific data of interest. Once the model has been defined, you can optionally add query parameters that tune the report at run time.
2. Build a report based on the data in the model. Simple wizards let you produce tables of data, total up columns, display details on various graph types and so on.

Building the DataModel for OCM

The Oracle Cloud Machine uses an EM12c Virtual Infrastructure plugin for most of its monitoring and management functionality. This plugin stores much of its data in tables prefixed with "MGMT$VI". Using BI Publisher, we can create a new data model that picks out the data we are interested in. When creating a new data model, the default data source is EMREPOS, which is the database used by Enterprise Manager. We can then simply type in the SQL if we know it in advance. Alternatively, the Query Builder lets us build up the query dynamically using a fairly intuitive web-based GUI.




The query builder allows us to drag and drop the tables onto a palette and select the fields we are interested in. We can add conditions to the query, define linkage between tables, and so on.



In our specific use case we are looking to understand the resource used by the virtual machines, specifically allocated CPU, Memory and the storage volumes that have been added to the VMs.  This information is available in two tables, MGMT$VI_NM_OSV_CFG_DETAILS and MGMT$VI_NM_STORAGE_CFG.

BI Publisher allows the report user to specify "parameters" that filter the data returned by the model. It seems sensible to be able to query by tenancy, so I have added a parameter to the data model that lets the user specify one or more tenancies to report on. For these tables the tenancy is effectively defined by the quota. (An alternative breakdown might be per orchestration.) To build a parameter, we have to create a "list of values" for the user to select from. As with the main data set, this is defined via an SQL query against the database. To show all tenancies I used the following SQL query:


select "MGMT$VI_NM_OSV_CFG_DETAILS"."QUOTA" as "QUOTA" from "MGMT_VIEW"."MGMT$VI_NM_OSV_CFG_DETAILS" "MGMT$VI_NM_OSV_CFG_DETAILS" 
 where "MGMT$VI_NM_OSV_CFG_DETAILS"."QUOTA" !='RANDOMTEXT' 
   AND VNC_URL=(SELECT MAX(VNC_URL) from MGMT_VIEW.MGMT$VI_NM_OSV_CFG_DETAILS "B" WHERE b.QUOTA=MGMT$VI_NM_OSV_CFG_DETAILS.QUOTA)

This builds a list of tenancies (quotas). The correlated subquery in the where clause de-duplicates the list by keeping only the row whose VNC_URL is the maximum for each quota. (I could not get SELECT DISTINCT to work.) This list of values is used as the selection for the parameter. The parameter is presented on the report so that the user can narrow it down to specific tenancies.



As shown in the screenshot, I have chosen to let the user select multiple tenancies, or all of them. If All is selected, a comma-separated list of all tenancies on the rack is passed to the report as the parameter.

Building the report

Once done, we can turn our attention to the report. The first thing to do is to click on the Data tab in the data model and press View to see what data your data model actually returns. If the correct information is being returned, click "Save as Sample Data". The returned data is saved and used as the basis for the data shown while the report is developed.

The easiest way to create the report from here is to click the "Create Report" button at the top right of the screen. This opens a wizard that lets you create the report and add charts and data tables to it.




Having completed the report design, we can run the report. In the screenshot below I have picked out three of the tenancies I am interested in. We can see at a glance that the JCS Demo tenancy is using the most memory and CPU, while the DBCS demo account is using the most storage space. That is exactly what we would expect for a relatively small application.


Conclusion

Even though the OCM is managed by Oracle, a tenant user of the OCM rack can fairly easily use Enterprise Manager to gain insight into the usage of the OCM. BI Publisher provides a way to extract that data into useful management reports.

Tuesday, June 7, 2016

Encrypting "disks" on Oracle Cloud Machine

Introduction


The Oracle Cloud Machine, like the public cloud, is administered by Oracle. The Oracle staff who manage the rack are highly skilled professionals and all their actions are audited. Even so, there is an obvious concern about the security of customer data at rest. On the OCM the rack administrators have no direct access to the customer's virtual machines. This article demonstrates how a tenant can use storage volumes as encrypted block storage devices, so that the data is further obscured from system administrators.

(As a side effect of demonstrating the security aspect this is also a useful reference for using cryptsetup to encrypt disks.)

Setup


To demonstrate that a storage volume is encrypted, and hence not visible to cloud administrators, we use a very simple setup:

1. Create two storage volumes, one to be encrypted and the other left in plain text.
2. "Attach" both volumes to a virtual machine.
3. Within the virtual machine, use the Linux utility cryptsetup to encrypt one of them. The other simply has an ext4 filesystem created on it and is mounted.
4. Create plain text files in both volumes.
5. Switch to the cloud administration side and see whether it is possible to read the content of the two volumes.

Virtual Machine Instance Creation



First of all we create two storage volumes.  This can be done from the command line easily.


# oracle-compute add storagevolume /osc/public/encrypt-storage-001 10G /oracle/public/storage/default --description "A test 10Gb storage volume that we will try to have encrypted" 

# oracle-compute add storagevolume /osc/public/plain-storage-001 10G /oracle/public/storage/default --description "A test 10Gb storage volume that we will leave unencrypted"


Then we create a virtual machine via an orchestration defined in a JSON file:

# cat simple_vm_with_storage.json
{
  "name": "/osc/public/encryption-vm",
  "oplans": [
    {
      "obj_type": "launchplan",
      "ha_policy": "active",
      "label": "encryption volume launch plan",
      "objects": [
        {
          "instances": [
            {
              "label": "encryption-vm001",
              "imagelist": "/oracle/public/linux6_16.1.2_64",
              "networking": {
                "net0": { "vnet": "/osc/public/vnet-eoib-1706" }
              },
              "storage_attachments": [
                { "volume": "/osc/public/encrypt-storage-001", "index": 1 },
                { "volume": "/osc/public/plain-storage-001", "index": 2 }
              ],
              "shape": "ot1",
              "sshkeys": ["/osc/public/labkey"],
              "attributes": {
                "userdata": {
                  "key1": "value 1",
                  "key2": "value 2"
                }
              }
            }
          ]
        }
      ]
    }
  ]
}



This JSON file creates a single instance called encryption-vm001 based on the OL6 base template. It connects the instance to the EoIB public network and attaches the two storage volumes created earlier. (In this case the storage volumes were created independently of the orchestration.)

We upload the orchestration and start it. Once it is up and running, the instance is listed as running and we can see the IP address assigned to it.

# oracle-compute add orchestration ./simple_vm_with_storage.json 



# oracle-compute start orchestration /osc/public/encryption-vm

# oracle-compute list instance /osc -Fname,state,ip


Configuring volumes within instance


Having created and started our instance, we can look at the attached volumes and use Oracle Linux to set up one of them as an encrypted volume. To see the volumes on the instance we use the fdisk command:



# fdisk -l

Disk /dev/xvda: 19.3 GB, 19327352832 bytes
255 heads, 63 sectors/track, 2349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c520c


    Device Boot      Start         End      Blocks   Id  System
/dev/xvda1   *           1          32      256000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/xvda2              32        2349    18611318+  8e  Linux LVM



Disk /dev/xvdb: 10.7 GB, 10737418240 bytes

255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000



Disk /dev/xvdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/VolGroup00-LogVol01: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/VolGroup00-LogVol00: 11.8 GB, 11811160064 bytes
255 heads, 63 sectors/track, 1435 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/VolGroup00-LogVol02: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


With the OCM each attached volume is given an index; in the orchestration above we use indexes 1 and 2. These numbers map to the xvd<char> devices that appear in the fdisk output, where 1 maps to b, 2 to c and so on. Thus in the output above the two attached volumes are /dev/xvdb and /dev/xvdc.

The next step is to set up one of the volumes as an encrypted block device. To do this I used the Linux command cryptsetup, specifying the cipher, key size and hash. The example below shows the command run twice. The first time, I answered the confirmation with a lower-case yes, but the command requires an upper-case YES. An easy mistake to make!
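Below is a tiny helper expressing the index-to-device mapping described above. It assumes the observed pattern (1 maps to b, 2 maps to c) continues through the alphabet, although only indexes 1 and 2 were actually seen here. index_to_device is a hypothetical name.

```shell
#!/usr/bin/env bash
# Map an OCM storage attachment index to the guest device name,
# assuming index N corresponds to the (N+1)th letter (1 -> xvdb, 2 -> xvdc).
index_to_device() {
  local idx="$1"
  local letters="abcdefghijklmnopqrstuvwxyz"
  if ! [[ "$idx" =~ ^[0-9]+$ ]] || (( idx < 1 || idx > 25 )); then
    echo "index out of range: $idx" >&2
    return 1
  fi
  echo "/dev/xvd${letters:idx:1}"
}
```

For example, index_to_device 1 prints /dev/xvdb. Running fdisk -l inside the VM remains the authoritative check.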



# cryptsetup --verbose --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-random luksFormat /dev/xvdb



WARNING!
========
This will overwrite data on /dev/xvdb irrevocably.


Are you sure? (Type uppercase yes): yes
Command failed with code 22: Invalid argument

# cryptsetup --verbose --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-random luksFormat /dev/xvdb

WARNING!
========

This will overwrite data on /dev/xvdb irrevocably.
Are you sure? (Type uppercase yes): YES
Enter LUKS passphrase:
Verify passphrase:
Command successful.




Now we can open the encrypted drive so that it appears as a normal block device. This creates the /dev/mapper/<name> device file and allows it to be mounted by the OS. The luksOpen command prompts for the passphrase set earlier.

# cryptsetup luksOpen /dev/xvdb encrypted-drive

# cryptsetup -v status encrypted-drive
/dev/mapper/encrypted-drive is active.
  type:  LUKS1
  cipher:  aes-xts-plain64
  keysize: 512 bits
  device:  /dev/xvdb
  offset:  4096 sectors
  size:    20967424 sectors
  mode:    read/write
Command successful.


This is a new raw volume so we need to put some sort of filesystem onto it.  In this case I use the ext4 filesystem.


# mkfs.ext4 /dev/mapper/encrypted-drive
mke2fs 1.43-WIP (20-Jun-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
655360 inodes, 2620928 blocks
131046 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done                           
Writing inode tables: done                           
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now simply create a directory on which to mount the encrypted drive, and create a simple text file.

# mkdir /u01
# mount /dev/mapper/encrypted-drive /u01
# df -kh
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   11G  3.3G  7.0G  32% /
tmpfs                            3.8G     0  3.8G   0% /dev/shm
/dev/xvda1                       239M   55M  168M  25% /boot
/dev/mapper/VolGroup00-LogVol02  2.0G  3.0M  1.9G   1% /opt/emagent_instance
/dev/mapper/encrypted-drive      9.8G   23M  9.2G   1% /u01
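The sequence so far (luksFormat, luksOpen, mkfs.ext4, mkdir, mount) can be collected into one function. This is a sketch assuming the device, mapper name and mount point used in this post. With DRY_RUN=1 the hypothetical run helper just prints the commands. When run for real, luksFormat still asks for the upper-case YES confirmation and a passphrase, and luksOpen asks for the passphrase again.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: encrypt a raw attached volume, open it, put ext4 on it and mount it.
# WARNING: luksFormat irrevocably overwrites the data on the device.
setup_encrypted_volume() {
  local dev="$1" name="$2" mnt="$3"
  run cryptsetup --verbose --cipher aes-xts-plain64 --key-size 512 \
      --hash sha512 --iter-time 5000 --use-random luksFormat "$dev"
  run cryptsetup luksOpen "$dev" "$name"      # creates /dev/mapper/$name
  run mkfs.ext4 "/dev/mapper/$name"           # filesystem inside the LUKS container
  run mkdir -p "$mnt"
  run mount "/dev/mapper/$name" "$mnt"
}
```

For example, DRY_RUN=1 setup_encrypted_volume /dev/xvdb encrypted-drive /u01 prints the five commands in the order shown above.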


Having done this, we can do a quick check. We unmount and close the encrypted disk, then re-open it by providing the passphrase and mount it again for use.


# umount /u01
# cryptsetup luksClose encrypted-drive
# mount /dev/mapper/encrypted-drive /u01
mount: you must specify the filesystem type



# cryptsetup luksOpen /dev/xvdb encrypted-drive
Enter passphrase for /dev/xvdb:
# mount /dev/mapper/encrypted-drive /u01
# df -kh
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   11G  3.3G  7.0G  32% /
tmpfs                            3.8G     0  3.8G   0% /dev/shm
/dev/xvda1                       239M   55M  168M  25% /boot
/dev/mapper/VolGroup00-LogVol02  2.0G  3.0M  1.9G   1% /opt/emagent_instance
/dev/mapper/encrypted-drive      9.8G   23M  9.2G   1% /u01
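The close and re-open check can be captured as two helpers. They also make the required ordering explicit:

- umount before luksClose.
- luksOpen before mount. As seen above, mounting /dev/mapper/encrypted-drive after luksClose fails.

The helper names are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Unmount first, then close the LUKS mapping (removes /dev/mapper/$name).
close_encrypted() {
  local name="$1" mnt="$2"
  run umount "$mnt"
  run cryptsetup luksClose "$name"
}

# Re-create the mapping (prompts for the passphrase), then mount it.
open_encrypted() {
  local dev="$1" name="$2" mnt="$3"
  run cryptsetup luksOpen "$dev" "$name"
  run mount "/dev/mapper/$name" "$mnt"
}
```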




Using fdisk we can prepare the /dev/xvdc volume, then create a filesystem on it, mount it on another directory and create a plain text file in this volume as well. If the encryption has worked, cloud operations may be able to access the plain text volume and read its content. The content of the encrypted volume, however, stays secret unless the passphrase is known.
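For comparison, here is a sketch of the plain volume setup. It assumes the ext4 filesystem goes directly on /dev/xvdc. That is consistent with the later file output, which reports plain_storage.raw as an ext4 filesystem rather than a partitioned disk. The /u02 mount point is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: plain (unencrypted) ext4 straight on the attached device, then mount.
setup_plain_volume() {
  local dev="$1" mnt="$2"
  run mkfs.ext4 "$dev"
  run mkdir -p "$mnt"
  run mount "$dev" "$mnt"
}
```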

Testing

As a general rule, cloud operations do not have access to the customer's virtual machines unless the customer shares the login credentials or SSH keys with Oracle. However, the OCM stores the volumes as raw disk images on the internal ZFS storage appliance, in the EPC_<rack>/storagepool1 filesystem. Cloud operations can therefore access these files and mount the images directly to read the content.


As a cloud operations user I accessed the ZFS storage appliance and copied the storage volume disk images off the rack. On a Linux server I then attempted to mount these volumes to see the content.

# file plain_storage.raw
plain_storage.raw: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files)
# mount -o loop ./plain_storage.raw /mnt/don
# cat /mnt/don/don-plain

This text is in the unencrypted volume and hence should be readable by anyone.....
# umount /mnt/don




So it is obviously fairly easy to access the unencrypted storage. Now let's see what is involved in accessing the encrypted storage volume.

# file encrypted_storage.raw
encrypted_storage.raw: LUKS encrypted file, ver 1 [aes, xts-plain64, sha512] UUID: edff3d80-3813-4abc-a58c-e2f1862

# mount -o loop ./encrypted_storage.raw /mnt/don
mount: unknown filesystem type 'crypto_LUKS'

# losetup /dev/loop0 ./encrypted_storage.raw
# mount /dev/loop0 /mnt/don
mount: unknown filesystem type 'crypto_LUKS'


# cryptsetup luksOpen /dev/loop0 encrypted-dev
Enter passphrase for /dev/loop0:

# mount /dev/mapper/encrypted-dev /mnt/don

# cat /mnt/don/don

some text

#


Above, I attempt to mount the encrypted filesystem using the same mechanism that previously succeeded, but to no effect. The only way to mount the disk is with the cryptsetup command, which requires the passphrase. The passphrase is obviously not shared with cloud operations, so they would be unable to access the content of the raw file.
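The file command can tell the two images apart because each format has an on-disk signature:

- A LUKS volume begins with the 6-byte magic "LUKS" followed by 0xBA 0xBE.
- An ext2/3/4 filesystem stores the magic 0xEF53 (little-endian, so bytes 53 EF) at byte offset 1080, that is offset 56 within the superblock at 1024.

Below is a minimal sketch of the same check using od. classify_image is a hypothetical name. The ext check only works when the filesystem sits on the whole image with no partition table, as with plain_storage.raw here.

```shell
#!/usr/bin/env bash
# Print COUNT bytes at OFFSET of FILE as one lower-case hex string (empty if past EOF).
hex_at() {
  local file="$1" offset="$2" count="$3"
  od -An -tx1 -j "$offset" -N "$count" "$file" 2>/dev/null | tr -d ' \n'
}

# Classify a raw volume image by its signature: luks, ext or unknown.
classify_image() {
  local img="$1"
  if [[ "$(hex_at "$img" 0 6)" == "4c554b53babe" ]]; then
    echo "luks"      # "LUKS" + 0xBA 0xBE at the start of the device
  elif [[ "$(hex_at "$img" 1080 2)" == "53ef" ]]; then
    echo "ext"       # ext2/3/4 superblock magic 0xEF53, stored little-endian
  else
    echo "unknown"
  fi
}
```

For example, classify_image ./encrypted_storage.raw prints luks. An image reported as ext can be loop-mounted directly; a luks one needs cryptsetup luksOpen and the passphrase first.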

Conclusion

Using the standard Linux cryptsetup command, it is a relatively simple task to encrypt any storage volume attached to a VM. The data is kept private to the end customer/tenant, and cloud operations have no means of seeing the content.

The downside of encryption is that the administrator of the virtual machine (the end customer) has to log on and provide the passphrase to mount the volume. This is not a major problem unless you want the applications that use the encrypted volume to start up automatically. In that case a manual startup procedure becomes necessary.

Monday, June 6, 2016

Introducing Oracle Cloud Machine


As mentioned in my exablurb blog, I am now working beyond the bounds of Oracle Engineered Systems to include the Oracle Cloud Machine, as introduced here. As such, this blog will include postings that apply to both Exalogic and the Cloud Machine.


I'll start by quoting one of the PMs for the Cloud Machine to explain just what the Cloud Machine actually is:

"Oracle Cloud Machine is a cloud offering which gives you new choices for the Oracle Cloud Platform by bringing the Oracle Cloud to your data center. Leveraging our Public Cloud’s PaaS and IaaS capabilities, it enables the innovation that cloud provides, at the same time meeting the business and regulatory requirements behind your firewall. It provides a stepping-stone in the journey to cloud, as it allows you to get the advantages of cloud faster, easier and with less disruption. As an on- premises implementation of Oracle Cloud, Oracle Cloud Machine lets you run your applications seamlessly wherever you want, as workloads are completely portable between the public cloud and your data center. You can now leverage the latest innovations for rapid development that cloud provides, all while meeting any data sovereignty and residence requirements. It also provides subscription based pricing in your data center, managed by Oracle, with single vendor accountability."

Or, to put it simply, the Cloud Machine is a set of compute services that Oracle installs in your data center and then runs as a service. You, as a customer, can consume IaaS and PaaS services without having to build up the management infrastructure or handle the ongoing operational management of the platform.


Much more information can be found in the public documentation.

Wednesday, November 25, 2015

Networks that span multiple Engineered Systems/Exalogic Accounts

This blog post introduces some functionality that became available fairly recently (~Oct 2015) and allows additional Infiniband shared networks to be defined. This enables internal networks to span accounts or to be extended to other Engineered Systems.

Historically an Exalogic rack is set up with two internal (IPoIB) networks whose IP addresses can be handed out to vServers in all accounts: the vServer Shared Storage network and the IPoIB Default network. vServers are limited members of the storage network and full members of the IPoIB Default network. It is possible to override the membership of a virtual machine so that vServers can communicate with each other internally on the Infiniband storage network.

Using the IPoIB Default network for inter-vServer communication alongside access to the database tier raises security concerns. As a result, this network tends not to be used for cross-account conversations. The only other mechanism for network traffic between accounts was a public EoIB network. This has the downside of preventing use of the Infiniband high-performance protocols and mandating smaller MTU sizes, so it is sub-optimal for performance-sensitive applications.

Recent changes in Exadata have introduced support for non-default partitions. Indeed, when Exadata is set up to run the database in a virtual machine, the normal configuration makes no use of the IPoIB default partition (0x7fff). This was a problem for Exalogic, which historically could only access Exadata over the IPoIB default network.

The standard configuration of a virtualised Exadata has two IB partitions:

- One that allows the database servers to talk to the storage servers.
- Another that connects the virtual machines on the Exadata to each other, so that a distributed RAC cluster can be set up using IB for inter-cluster communication.

If Exalogic is to communicate with Exadata using the Infiniband-optimised protocols, the Exalogic must therefore be able to link to the Exadata over a non-default Infiniband partition. This is depicted in Figure 1 below.


Figure 1 - Connecting EL and ED using non-default Infiniband Network

This example shows a two-tier application deployed to Exalogic. The web tier has access to the EoIB client network and might host an application such as Oracle Traffic Director. It forwards requests to an application tier over an internal private network. The application tier is in turn linked to another internal IPoIB network, which might be considered a "public private network": it can be handed out to vServers. It provides linkage to the Exadata virtual machines that have had this specific network (partition) allocated to them. The Exadata also has two other internal IB networks, one to allow the RAC cluster to communicate between the DB servers and another to allow access to the storage cells.

Creating this non-default network spanning both Exalogic and Exadata offers a couple of options:

- Extend a private network from an Exalogic account into the Exadata rack.
- Create a new Exalogic service network that can span multiple Exalogic accounts.

Extending a Private Network

In this scenario we create a private network within an Exalogic account and then expand the Infiniband partition into the Exadata.  This means that access to the Exadata is kept purely within an Exalogic account.  The steps to go through are:
  1. Create a private network in an Exalogic Account
  2. Edit the network to reserve the IP addresses in the subnet that the Exadata will use.
  3. Identify the pkey value that this new network has been assigned.
  4. Using the IB command line/Subnet Manager, extend the new partition to the Exadata switches and database servers.
  5. Recreate the Exadata virtual machines, adding the new partition key to the virtual machine configuration file used.
  6. Configure the Exadata VMs to use the IP addresses reserved in step 2, which Exalogic will not hand out.

Creating a new "Service" Network

This is a slightly more flexible approach than the first scenario. We create a new "public private" network and then allocate IP addresses on this network to each account that needs access to it. This is also useful where Exadata is not involved, because it allows certain virtual machines to be set up as service providers and others as service consumers. A provider is an IB full member of the partition and a consumer is a limited member. Thus all consumers can access and use the service provider's functions, but the consumers cannot "see" each other.
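For reference, Infiniband encodes the membership type in the most significant bit of the 16-bit partition key. The bit is set for a full member and clear for a limited member. This is why the default partition appears as 0x7fff in its limited form and 0xffff in its full form. Below is a small sketch that prints both forms of a pkey; pkey_forms is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the limited-member and full-member forms of an Infiniband P_Key.
# Bit 15 (0x8000) is the membership bit: set = full, clear = limited.
pkey_forms() {
  local base=$(( $1 & 0x7fff ))
  printf 'limited=0x%04x full=0x%04x\n' "$base" $(( base | 0x8000 ))
}
```

For example, pkey_forms 0x7fff prints limited=0x7fff full=0xffff. Service providers would use the full form of the service network's pkey and consumers the limited form.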

This example is for the connected Exadata that we discussed earlier.  In this case the process to follow is:-

  1. Run the command to create the new IPoIB network.  It can be set up so that all vServers are limited or full members by default.
  2. Manage the subnet in Exalogic so it will not hand out the IP addresses you want to reserve for the Exadata
  3. Allocate a number of IP addresses from this new network to each account that will use it.  This is the same process that is used today for the EoIB networks, the storage network or the IPoIB Default network.
  4. Create vServers in the accounts with an IP address on the service network.
  5. Identify the pKey for the service network and extend the partition to the Exadata switches and DB server nodes.  The primary difference here is that, if the Exadata was set up first, the first step in this process would have been to specify the pKey that the Exadata originally used.  (i.e. Either the Exadata or the Exalogic can be the first to specify the pKey.)
    1. Warning - The pKey being used is defined manually.  Make sure it will not overlap with any pKeys that Exalogic Control will assign.
  6. Recreate the database virtual machines, assigning the pKey in their configuration, and within each VM specify the IP address you want it to use.
  7. Test 
Note - The technical details on how to achieve this are fully documented in an Oracle support note.  Get in touch with your local Oracle representative to find out more.
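The warning in step 5 is easy to check mechanically before touching the fabric. This is a minimal sketch under stated assumptions: `check_manual_pkey` is a hypothetical helper, and the range Exalogic Control assigns from and the list of existing pKeys are inputs you would have to gather from your own rack; the values below are made up.

```python
PARTITION_MASK = 0x7FFF  # low 15 bits identify the partition


def check_manual_pkey(candidate, control_range, in_use):
    """Return a list of reasons why a manually chosen pKey is unsafe.

    control_range: inclusive (low, high) range of partition numbers that
        Exalogic Control may assign. This is a hypothetical input; take the
        real range from your own rack rather than from this example.
    in_use: pKeys already defined on the fabric (either membership type).
    """
    partition = candidate & PARTITION_MASK
    low, high = control_range
    problems = []
    if low <= partition <= high:
        problems.append(f"0x{partition:04x} is inside the Exalogic Control range")
    if partition in {p & PARTITION_MASK for p in in_use}:
        problems.append(f"0x{partition:04x} is already defined on the fabric")
    return problems


fabric = [0x8001, 0x0301]  # hypothetical existing pKeys
print(check_manual_pkey(0x0500, (0x0200, 0x04FF), fabric))  # []
print(check_manual_pkey(0x8301, (0x0200, 0x04FF), fabric))
```

The full-member bit is masked off before comparing, so a full-member and a limited-member key for the same partition are treated as the same partition.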

Tuesday, July 7, 2015

Oracle Traffic Director - Deployment options, Virtual Servers vs Configurations

Summary

Oracle Traffic Director (OTD) is a powerful software load balancing solution.  As with most good products, there is a degree of flexibility in how it can be deployed, with different approaches available to build the solution.  This article discusses two options that can be used to provide different routing possibilities.

The scenario that is being considered is a need to perform two separate load balancing activities in the same OTD environment.  For example, load balancing to an older SOA 11g deployment and to SOA 12c for recent integration deployments.  Another possible example would be two routes to the same back end service but one is designed for high priority traffic while the other route will throttle the service at a preset load.   The two options that are discussed are:
  1. Using two separate configurations, one for SOA 11g and one for SOA 12c.
  2. Using one configuration that has two virtual servers.  The virtual servers handling the routing for each environment.
Needless to say, either option can be appropriate; the right answer for a particular customer environment will depend on the details of the overall solution and, to some extent, on personal preference.  Other approaches, such as more complex routing rules within a single configuration or multiple OTD domains, are also worth considering.

OTD Configuration Overview

Simple configuration

An OTD deployment, in its simplest form, consists of an administrative instance which manages the configuration and a deployed instance.  The deployed configuration specifies the HTTP(S)/TCP listening port, routing rules to one or more origin servers, logging setup etc.   In many situations there is a business need to use OTD to manage requests to different business applications or even just to different environments/versions of an application.  It is obviously possible to split these out using independent deployments of OTD; however, to minimise the resources required and keep the number of deployed components down, there are options that use a single administration server.

The base configuration options


At a minimum, a configuration defines settings such as the listening ports, SSL certificates and logging setup and, critically, at least one origin server pool and one virtual server.  The origin server pool is a simple enough concept: it defines the back-end services that actually fulfil the client requests.

Using Virtual Servers

The virtual servers provide a mechanism to isolate traffic sent to the software load balancer.  Each virtual server contains its own set of routing rules which can determine the origin servers to send requests to, caching rules, traffic shaping and overrides for logging and the layer 7 firewall rules.  The virtual server to be used for subsequent processing is identified by either the listening port or the hostname used to send the request.

Virtual Server example - Routing based on otrade-host
Virtual Server example - Routing based on websession-host
So in the above example both hostnames, otrade-host and websession-host, resolve to the same IP address in DNS (or in the client's local /etc/hosts file), and the two virtual servers also share the same listener.  If the client makes a request to otrade-host then the first virtual server is used; if it requests websession-host then the second virtual server's rules are used.

There is always at least one virtual server.  By default it is created with the hosts field left blank so that it is used for any traffic that hits the listening port.
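The selection rule described above (listener first, then hostname, with a blank-hosts virtual server as the catch-all) can be modelled in a few lines. This is a simplified sketch, not OTD's implementation: it ignores wildcard host patterns and everything else OTD evaluates, and `VirtualServer` and `select_virtual_server` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    name: str
    ports: set                                 # listener ports it is bound to
    hosts: set = field(default_factory=set)    # empty set = catch-all


def select_virtual_server(servers, port, host_header):
    """Pick the virtual server for a request: an exact hostname match on the
    listener wins, otherwise the catch-all (blank hosts) virtual server."""
    host = host_header.split(":")[0].lower()  # drop any ":port" suffix
    on_listener = [vs for vs in servers if port in vs.ports]
    for vs in on_listener:
        if host in vs.hosts:
            return vs
    for vs in on_listener:
        if not vs.hosts:
            return vs
    return None  # no virtual server is bound to this port


servers = [
    VirtualServer("otrade", {8080}, {"otrade-host"}),
    VirtualServer("websession", {8080}, {"websession-host"}),
    VirtualServer("default", {8080}),
]
print(select_virtual_server(servers, 8080, "websession-host:8080").name)
```

Both hostnames hit the same listener on port 8080, so only the Host header decides which routing rules apply; a request made by IP address falls through to the default virtual server.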

Solution Variations

Multiple Configurations

Overview

In this setup two configurations can be defined and deployed.  It is quite possible to have both configurations deployed to the same OS instances (admin nodes in OTD terminology).  The result of deploying the configuration to the admin nodes is the creation of another running instance of OTD.
Running multiple configurations
Thus in the example shown above we have three OS instances.  One hosts the admin server, which could be co-located with the actual instances.  The other two host the OTD server instances, one for each configuration.  I have shown two OS instances running the configurations to indicate that they can be set up in a failover group to provide HA, and each configuration can use a different VIP.

Advantages

  • Each configuration is managed independently of the others (within the one administration server).
    • The settings are independent of each other.
    • The running instances for each configuration are independent of each other. i.e. Can be stopped and started without impacting the other configuration instances running.
  • Simple to understand

Disadvantages

  • Care must be taken to ensure that the configurations do not clash with each other (e.g. the same listening ports).
  • Results in more processes running on each OS instance.
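The listener clash risk can be checked before deploying a second configuration to the same OS instances. A minimal sketch, assuming you can export each configuration's listeners as (IP, port) pairs; the helper name and the listener values are hypothetical, and a wildcard listener is conservatively treated as conflicting with any address on the same port.

```python
from collections import defaultdict
from itertools import combinations

WILDCARD = "0.0.0.0"  # a listener bound to every address on the host


def find_listener_clashes(configs):
    """configs maps configuration name -> list of (ip, port) listeners that
    will run on the same OS instance. Returns (port, config_a, config_b)
    for every pair that would try to bind the same socket. A wildcard
    listener is conservatively treated as clashing with any address."""
    by_port = defaultdict(list)
    for name, listeners in configs.items():
        for ip, port in listeners:
            by_port[port].append((name, ip))
    clashes = []
    for port in sorted(by_port):
        for (name_a, ip_a), (name_b, ip_b) in combinations(by_port[port], 2):
            if name_a != name_b and (ip_a == ip_b or WILDCARD in (ip_a, ip_b)):
                clashes.append((port, name_a, name_b))
    return clashes


configs = {  # hypothetical listener settings
    "soa11g": [(WILDCARD, 8080)],
    "soa12c": [("10.0.0.20", 8080), ("10.0.0.20", 8443)],
}
print(find_listener_clashes(configs))  # [(8080, 'soa11g', 'soa12c')]
```

Giving each configuration its own VIP and binding listeners to that address, rather than to every address, is what allows the same port to be reused safely.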

Multiple Virtual Servers

Overview

In this situation there is one configuration with multiple virtual servers, each applying different routing rules to forward requests.   In the diagram below we have deployed a single configuration to two OS instances, with the configuration containing two virtual servers.  As with the multi-config option, I have shown two OS instances to indicate that the failover group can be used for HA.

OTD Using Two Virtual Servers

Advantages

  • One configuration that provides visibility of all the settings in the environment.
  • Minimal running processes
    • Simplifying the monitoring
    • Reducing resources required to run the system

Disadvantages

  • Introduces dependencies between the environments
    • e.g. The environments can share listeners, origin server pools, logging config etc., so one change can impact all instances.
    • e.g. Some changes mandate a restart of an instance, so a change made for one environment may affect load balancing for the other.
  • Complexity of a single configuration
  • Dependencies on external factors.  (DNS resolution of hostnames/firewalls for port access.)

Conclusions

There are no hard and fast rules to figure out which approach is the best one for you.  It will ultimately depend on the requirements for the load balancing.  If a configuration is changing frequently and is functionally independent then I would tend to go for the multiple configuration route.  If on the other hand simplicity of monitoring and minimal resource footprint alongside a fairly static configuration was the situation I would tend to use the multiple virtual server approach.

Essentially the classic IT answer of "it depends" will apply.  Only a good understanding of the requirements will clarify which way to go.  (Although if you are using OTD 11.1.1.6 you might be better off with the virtual server approach, as there are a few limitations with the VIPs when using keepalived for the failover groups.)

Friday, October 31, 2014

Disaster Recovery of WLS Applications on Exalogic

Introduction

For many years Oracle Fusion Middleware based on WebLogic server has been capable of being used to provide high availability, fault tolerance and disaster recovery capabilities.  This has been documented as part of the Maximum Availability Architecture whitepapers. Follow this link for all the MAA documentation or follow this link to go directly to the Fusion Middleware Disaster Recovery architecture documentation.

Exalogic/Exadata provides an ideal platform on which these architectures can be realised, with all the advantages that come with using Oracle Engineered Systems.

This blog posting gives a very high level overview of the principles used in implementing active/passive DR for a fusion middleware application.  Much of the activity involved from an application perspective is identical irrespective of the deployment being on physical or virtual hardware.  In this article we will have a slightly deeper dive on how the Exalogic ZFS storage appliance is used to enable the DR solution.

Basic principles involved in setting up FMW DR

The basic tenet of deploying an application is to follow a set of rules during the deployment/configuration of the application which will make it simple to start the application up on the DR site.  The setup should be:
  1. Deploy all tiers of the application ensuring:-
    1. In the primary environment a set of hostname aliases is used for all configuration.  The aliases are not tied to a specific host, and all product configuration specifies these names rather than actual IP addresses.
    2. The binary files and application configuration (normally the domain homes) are all located on shares on the ZFS appliance and mounted via NFS on the Exalogic vServers.
    3. Critical application data that must be persisted goes into the Database.  Specifically thinking of the WebLogic Transaction logs and the JMS messages.  (We will use the Oracle Data Guard product to ensure critical data is synchronously copied to the remote site)
    4. Keep the configuration in the operating system to the absolute minimum.  Probably no more than /etc/hosts entries and, if needed, specific service startup commands.  Other OS configuration should be built into the templates used to create the environment in the first place.
  2. Create mirror vServers on the DR site.
    1. These vServers will host the production environment when DR has occurred.  The same minimal OS configuration should be present on this site.  To save time in DR the servers can be started in advance, or they can be started on demand at DR startup.  If they are already running then ensure that the application services are all shut down.  The hosts files must contain the same hostname aliases as the primary site, but obviously resolving to different IP addresses.
  3. Create a replication agreement for all the shares that host the application binaries and domains.
  4. When DR is to happen (ignoring the DB):
    1. Break the replication agreement
    2. Export the replicated shares so that they can be mounted.
    3. Mount the replicated shares in exactly the same location on the DR vServers
    4. Start up the application on the DR environment
    5. Test and if OK then redirect traffic at the front end into the DR service.
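The hostname-alias rule is what keeps the OS configuration on each site tiny: both sites carry the same alias names and only the IP addresses differ. As a minimal sketch (the alias names and addresses are hypothetical, as are the helper functions), the hosts entries for each site can be generated from one alias list and checked for parity before a DR test.

```python
def render_hosts(alias_to_ip):
    """Render /etc/hosts lines ("IP<TAB>alias"), sorted by alias."""
    lines = [f"{ip}\t{alias}" for alias, ip in sorted(alias_to_ip.items())]
    return "\n".join(lines) + "\n"


def check_alias_parity(primary, dr):
    """Both sites must define exactly the same aliases; only the IPs differ."""
    problems = []
    only_primary = sorted(set(primary) - set(dr))
    only_dr = sorted(set(dr) - set(primary))
    if only_primary:
        problems.append("missing on DR site: " + ", ".join(only_primary))
    if only_dr:
        problems.append("missing on primary site: " + ", ".join(only_dr))
    return problems


# Hypothetical alias names and addresses.
primary = {"soahost1": "10.1.0.11", "soahost2": "10.1.0.12"}
dr = {"soahost1": "10.2.0.11", "soahost2": "10.2.0.12"}
print(check_alias_parity(primary, dr))  # []
print(render_hosts(dr), end="")
```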
Obviously this is somewhat simplified from most real world situations where you have to cope with managing other external resources, lifecycle management and patching etc.  However the approach is valid and can be worked into the operations run book and change management processes.

All these steps can be automated and put into the control of Enterprise Manager such that the element of human error can be removed from the equation during a disaster recovery activity.
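Whatever tool runs them, the DR steps listed above are strictly ordered and each depends on the previous one succeeding. The sketch below shows only that control flow: run the steps in order and stop at the first failure, reporting where it stopped. The step actions are placeholders (hypothetical); real ones would call the ZFS appliance, the vServers and the front-end load balancer, and nothing here represents Enterprise Manager's own job system.

```python
def run_runbook(steps):
    """Run (name, action) pairs in order, stopping at the first action that
    returns False. Returns (completed step names, failed step name or None)."""
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Placeholder action: real actions would drive the storage, vServers and
# front end (hypothetical; not shown here).
def ok():
    return True


dr_steps = [
    ("break replication agreement", ok),
    ("export replicated shares", ok),
    ("mount shares on DR vServers", ok),
    ("start application", ok),
    ("test and redirect front-end traffic", ok),
]
print(run_runbook(dr_steps))
```

Stopping on the first failure matters here: starting the application on shares that failed to mount, or redirecting traffic before testing, would make the outage worse rather than shorter.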

Using the ZFS Storage Appliance for Replication

From the application perspective a key function lies with the NAS storage which has to be able to copy an application from one site to another.  The ZFS Storage appliance within an Exalogic is a fantastic product that provides exactly this functionality.  It is simple to set it up to copy the shares between sites.

Setup a Replication Network between sites

The first activity required when wishing to perform DR between two sites is to create a replication network between the ZFS appliance in both Exalogic racks.  This can be done using the existing 1GbE management network that already exists, however this is not recommended as this network is not fault tolerant, there being only one 1GbE switch in the rack.  However on the ZFS appliance there are two 1/10GbE network connections available on the back of each storage head (NET2 & NET3).  By default one connection goes into the 1GbE switch and the other is a dangling cable, thus two independent routes into the data centre are available.  If a longer wire is required to connect then it is possible to disconnect the existing ones and put in new cables.  (Recommendation - Get Oracle Field Engineers to do this, it is a tight squeeze getting into the ports and the engineers are experts at doing this!)

Once each head is connected via multiple routes to the data centre, and hence on to the remote Exalogic rack, you can use link aggregation to combine the ports on each head and then assign an IP address which can float from head to head, so that it is always on the active head and hence has access to the data in the disk array.

Replicating the shares

Having set up the network so that the two storage appliances can access each other, we now go through the process of enabling replication.  This is a simple case of setting up the replication service and then configuring replication on each project/share that you want copied over.  Initially set up the remote target where data will be copied to.  This is done via the BUI by selecting Configuration and then Remote Replication.  Click on the + symbol beside "Targets" to add the details (IP address and root password) of the remote ZFS appliance.

Adding a replication target
Once the target has been created we set up the project/share to be replicated.  Generally speaking I would expect a whole project to be replicated, which means that all the shares in the project are replicated; however, it is possible to replicate at the share level for finer granularity.
To setup replication using the BUI simply click on the Shares and either pick a share or click on the Projects and edit the project level.  There is then a replication sub-tab and you can click on the "+" symbol to add a new "Action" to replicate. 

Replication of a project
Simply pick the Target that you set up earlier in the Remote Replication configuration, pick the pool (which will always be exalogic) and define the frequency.  Scheduled replication can run as often as every half hour, while Continuous starts a new replication cycle as soon as the previous one completes.  There are a few other options to consider: "Bandwidth limit" so that you can prevent replication swamping a network, "SSL encryption" if the network between the two sites is considered insecure, and "Include Snapshots", which copies the snapshots over to the remote site.

Obviously the latter two options have an impact: including snapshots increases the quantity of data copied, and encrypting all the data in transit reduces performance.  However, after the initial replication only changed blocks are copied across and, given that the shares are used primarily for binaries and configuration data, there will not be a huge quantity flowing between the sites.
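A quick back-of-the-envelope calculation shows how the bandwidth limit, the schedule and the volume of changed blocks interact. This sketch assumes no protocol overhead, compression or SSL cost, and the data volumes, limit and interval are hypothetical figures rather than measurements.

```python
def replication_time(changed_gib, limit_mbit_s, interval_min):
    """Estimate how long one replication cycle takes at a bandwidth limit and
    whether it finishes within the scheduled interval.

    Simplifying assumptions: no protocol overhead, no compression, no SSL cost.
    """
    bits = changed_gib * 1024**3 * 8             # GiB -> bits
    seconds = bits / (limit_mbit_s * 1_000_000)  # Mbit/s -> bit/s
    return seconds, seconds <= interval_min * 60


# Hypothetical figures: 2 GiB of changed blocks vs. a 500 GiB initial copy,
# both over a 100 Mbit/s limit with a 30-minute schedule.
print(replication_time(2, 100, 30))    # roughly 172 s, fits
print(replication_time(500, 100, 30))  # roughly 11.9 h, does not fit
```

The initial copy can take hours at a modest limit, while the steady-state changed blocks of binaries and configuration fit comfortably inside a half-hour schedule.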

Process to mount the replica copies

Having completed the previous steps we have the binaries and configuration all held at the primary site and a copy on the remote site.  (Bear in mind, though, that the remote copy may be slightly out of date!  It is NOT synchronous replication.)  For DR we now assume that the primary site has been hit by a gas explosion or, slightly less dramatically, that we are shutting down the primary site for maintenance and want to move all services to the DR environment.  The first thing to do is to stop the replication from the primary site.  If the primary environment is still running then this is as simple as disabling the replication agreement.  Obviously if there is no access to the primary then one must assume that replication has stopped.



Then on the DR site we want to make the replicated shares available to the vServers.  This is achieved by "exporting" the project/share.  To navigate to the replica share simply select Shares and then the Projects listing or Shares listing as appropriate.  Under the "Projects" or "Filesystems : LUNs" title you can click to see the Local or Replica filesystems.  By default the local ones are shown, so click on Replica to see the data copied from a remote ZFS appliance.

Replicated Projects
We can then edit this project as you would for a local project.

Under the General tab there is the option to "Export", simply select this check box and hit apply and the share will be available to mount by the clients.  By default the same mount point that was on the primary site will be used on the DR site.

Health Warning : When you export a project/share then all shares with the same directory mount point are re-mounted on the client systems.  Make sure every project has a unique mount point.  If left at the default of /export then the Exalogic Control shares are also re-mounted which has the impact of rebooting compute nodes.  
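Because of this health warning, it is worth auditing project mount points before the day you need to export replicas. A minimal sketch, assuming you have listed each application project and its mount point (the project names and paths below are hypothetical): it flags mount points shared by more than one project and any project left on the /export default. It only checks exact matches, because the warning above speaks of shares with the same mount point; it does not try to decide whether nested paths are affected.

```python
from collections import defaultdict


def mountpoint_problems(projects, default_mountpoint="/export"):
    """projects maps project name -> NFS mount point. Flags mount points
    shared by more than one project and projects left on the default."""
    owners = defaultdict(list)
    for name, mountpoint in projects.items():
        owners[mountpoint.rstrip("/") or "/"].append(name)
    problems = []
    for mountpoint in sorted(owners):
        names = ", ".join(sorted(owners[mountpoint]))
        if mountpoint == default_mountpoint:
            problems.append(f"{names} left on default {mountpoint}")
        if len(owners[mountpoint]) > 1:
            problems.append(f"{mountpoint} shared by {names}")
    return problems


# Hypothetical application projects.
projects = {"soa": "/export/soa", "osb": "/export/soa", "web": "/export"}
for problem in mountpoint_problems(projects):
    print(problem)
```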

Export checkbox to enable share to be mounted

Once the shares have been exported the DR vServers can mount them, start the application services up and be ready to pick up from the primary site.  Finally, create a replication agreement to push data from the DR site back to the primary until failback to the primary site happens.

Once the environment has been correctly set up, all the DR steps take on the order of seconds to complete, so the outage for the technical aspects of the DR switchover can be brought down to seconds.