Every node registers its presence in the cluster by writing a disk heartbeat to the voting disks, and the voting disks also hold network heartbeat information for the cluster. They must therefore reside on shared storage that all nodes can access. There is always one "master" node that coordinates the other nodes.
A node must be able to access more than half of the voting disks at any time. A node that cannot do so is evicted from the cluster by a node that can still see more than half of the voting disks, to maintain the integrity of the cluster.
In the event of an outage, such as a private interconnect failure, the largest fully connected subcluster survives and continues operation.
The number of voting disks must be odd so the cluster can decide which nodes should survive. Suppose we have 2 voting disks and 2 nodes. During an interconnect failure, if node1 can access only voting disk1 and node2 can access only voting disk2, the clusterware cannot decide which node to keep to preserve data integrity after the interconnect failure (cache fusion). In other words, an odd number of disks is required to avoid split brain and to decide which nodes to evict when a problem occurs.
A network heartbeat is sent over the cluster interconnect by each node to confirm that all RAC nodes are available. When a node does not respond to the heartbeat, its instance is assumed to have crashed. When nodes in the cluster cannot talk to each other, they race to lock the voting disks, and whichever node locks more disks survives. If the number of disks is even, a node might lock exactly 50% of them (2 out of 4), and there is then no way to decide which node to evict; this situation is called split brain. With an odd number, one node always holds more disks than the other, so the cluster can evict the node holding fewer.
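The timeouts that drive these evictions are standard CSS settings and can be queried with crsctl; the values below are the usual defaults, not guaranteed for every installation:
$ crsctl get css misscount ( network heartbeat timeout, in seconds )
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
$ crsctl get css disktimeout ( voting disk I/O timeout, in seconds )
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.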
The number of voting disks created depends on the redundancy level of the diskgroup. With NORMAL redundancy, 3 voting disks are created; with HIGH redundancy, 5. So a minimum of 3 disks is required in a NORMAL redundancy diskgroup and 5 disks in a HIGH redundancy diskgroup.
Redundancy of the diskgroup    # of copies of voting disk    Minimum # of disks in the diskgroup
External                       1                             1
Normal                         3                             3
High                           5                             5
NORMAL redundancy : the diskgroup requires at least two failgroups. Files are copied to disks in both failgroups for 2-way mirroring.
To hold voting disks, a NORMAL redundancy diskgroup needs at least 3 failure groups (i.e., at least 3 disks, one voting disk per failure group).
HIGH redundancy : the diskgroup requires at least three failgroups. Files are copied to disks in all three failgroups for 3-way mirroring.
To hold voting disks, a HIGH redundancy diskgroup needs at least 5 failure groups (at least 5 disks).
EXTERNAL redundancy : you cannot specify the FAILGROUP clause with EXTERNAL REDUNDANCY.
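A minimal sketch of creating such a diskgroup for voting files, assuming hypothetical raw devices /dev/raw/raw1-3 and the diskgroup name VOTE; compatible.asm must be 11.2 or higher before voting files can be placed in it:
SQL> CREATE DISKGROUP vote NORMAL REDUNDANCY
       FAILGROUP fg1 DISK '/dev/raw/raw1'
       FAILGROUP fg2 DISK '/dev/raw/raw2'
       FAILGROUP fg3 DISK '/dev/raw/raw3'
       ATTRIBUTE 'compatible.asm' = '11.2';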
NOTES :
$ crsctl query css votedisk
Backup : Prior to 11gR2, take a backup using dd. From 11gR2 onwards the backup is taken automatically.
Back up the voting disk file every time
- you add or remove a node from the cluster or
- immediately after you configure or upgrade a cluster.
You should have at least three voting disks. You can have up to a maximum of 15 voting disks.
On an ASM diskgroup, the maximum number of voting disks that can be stored is 5, even if the diskgroup has, say, 7 disks. The maximum of 15 voting disks is possible only with non-ASM storage.
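Illustrative 11gR2 output of the query above ( the File Universal Ids and paths here are made up ):
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6e5ae6b79c4d4fe3bf2a1dd05e2a42b4 (/dev/raw/raw1) [VOTE]
 2. ONLINE   7a3bd2c18e5f4ab09c8d3ee16f3b53c5 (/dev/raw/raw2) [VOTE]
 3. ONLINE   8b4ce3d29f604bc1ad9e4ff27a4c64d6 (/dev/raw/raw3) [VOTE]
Located 3 voting disk(s).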
One storage array: no mirroring; create 2 LUNs of 500 MB each.
More than one storage array: 1 LUN (500 MB) in each storage array, creating a diskgroup with normal redundancy.
+VOTE ( create 3 LUNs of 500 MB each ) – stores the voting files and an OCR mirror. Voting files can be stored in only one diskgroup.
+CRS – stores the OCR and the ASM spfile.
If the voting disk was previously on a disk with external redundancy and you want to move it to the TEST diskgroup (normal redundancy), the TEST diskgroup must already contain at least 3 disks before you move the voting disk into it.
$ crsctl replace votedisk +VOTE
To move the voting disk, ASM spfile, or OCR onto an ASM diskgroup :
$ crsctl query css votedisk
To move : $ crsctl replace votedisk +VOTE
$ asmcmd spget
To move : $ asmcmd spmove '+CRSTMP/tstcluster/ASMPARAMETERFILE/REGISTRY.253.772133609' '+CRS/tstcluster/spfileASM.ora'
# /u01/app/11.2.0/grid/bin/ocrcheck
To move : # /u01/app/11.2.0/grid/bin/ocrconfig -add +CRS
# /u01/app/11.2.0/grid/bin/ocrconfig -add +VOTE
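ocrconfig -add only adds OCR locations; to finish relocating the OCR off its old device, drop the old location and verify. A sketch assuming the old OCR sat on /dev/raw/raw1 (hypothetical path):
# /u01/app/11.2.0/grid/bin/ocrconfig -delete /dev/raw/raw1
# /u01/app/11.2.0/grid/bin/ocrcheck ( verify the new OCR locations and integrity )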
NOTE : If using only one diskgroup, it is better to have four disks in the diskgroup. Since 3 voting disks are required with NORMAL redundancy, even if one disk fails Oracle will silently recreate the voting disk from the failed disk on the spare disk. http://blog.oracle-ninja.com/2012/01/voting-disk-redundancy-in-asm/
If using failgroups, we can drop ASM disks that contain voting disks as long as enough disks are left in the diskgroup to retain the same number of voting disks (each inside a separate failure group).
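A drop sketch, assuming a hypothetical +VOTE diskgroup whose disks ASM named VOTE_0000 through VOTE_0003 ( check V$ASM_DISK for the real names ):
SQL> ALTER DISKGROUP vote DROP DISK VOTE_0003;
SQL> SELECT * FROM v$asm_operation; ( repeat until the rebalance finishes )
$ crsctl query css votedisk ( clusterware silently relocates any voting disk that sat on the dropped disk )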
Voting disk
1. Backup voting disk
10g : [root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
located 1 votedisk(s).
[root]# dd if=/dev/raw/raw2 of=/backup/ocrvot_bkp/voting.dmp
41419+0 records in
41419+0 records out
11g : from 11gR2 onwards the voting disk contents are backed up automatically into the OCR.
2. Adding voting disk
10g : Shut down Oracle Clusterware on all nodes before any add/delete/move ( -force is required while Clusterware is down; never use -force while Clusterware is running, as it may corrupt the OCR )
[root]# crsctl stop crs
[root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
[root]# ./crsctl add css votedisk /dev/raw/raw4 -force
[root]# ./crsctl add css votedisk /dev/raw/raw5 -force
[root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
1. 0 /dev/raw/raw4 ==> newly added voting disk
2. 0 /dev/raw/raw5 ==> newly added voting disk
11gR1 : Oracle Clusterware can stay online while performing add/delete/move
[root]# $GRID_HOME/bin/crsctl add css votedisk /dev/raw/raw5
or crsctl add css votedisk <cluster_fs/filename>
11gR2 : Oracle Clusterware can stay online while performing add/delete/move, and voting disks can also be stored in ASM.
With an ASM diskgroup, there is no add/delete voting disk option.
The number of voting disks is determined by the diskgroup redundancy; to add more, move them to a diskgroup with higher redundancy.
3. Remove votedisk :
10g : crsctl delete css votedisk /dev/raw/raw1 -force ( Clusterware down )
11gR1 : crsctl delete css votedisk /dev/raw/raw1 or crsctl delete css votedisk <cluster_fs/filename>
11gR2 : crsctl delete css votedisk <cluster_fs/filename>
4. Move votedisk :
10g : crsctl add css votedisk /dev/raw/raw4 -force and then crsctl delete css votedisk /dev/raw/raw1 -force ( Clusterware down )
11gR1 : crsctl add css votedisk /dev/raw/raw4 and then crsctl delete css votedisk /dev/raw/raw1
11gR2 : crsctl add css votedisk <cluster_fs/filename> and then crsctl delete css votedisk <cluster_fs/filename>
11gR2 : crsctl replace votedisk +CRS
$ crsctl replace votedisk +CRS ( from external redundancy to the NORMAL redundancy +CRS diskgroup )
$ crsctl replace votedisk /rac_shared/oradata/vote.test3 ( from the NORMAL redundancy +CRS diskgroup to external redundancy )
5. Restore Votedisk :
10g : restore using the dd command ( see the sketch below )
11.2+ : The voting disk contents are restored from a backup automatically when a new voting disk is added or replaced
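A 10g restore sketch, reusing the backup file and device path from the dd backup example above ( paths are illustrative ):
[root]# crsctl stop crs ( on all nodes )
[root]# dd if=/backup/ocrvot_bkp/voting.dmp of=/dev/raw/raw2
[root]# crsctl start crs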
OCR
1. OCR Backup
10g : [root]# $CRS_HOME/bin/ocrconfig -export /backup/ocrvot_bkp/ocr.dmp -s online
11gR1 : $CRS_HOME/bin/ocrconfig -manualbackup
11gR2 : CRSD automatically creates OCR backups every 4 hours, at the end of each day, and at the end of each week, and retains the last three copies of the OCR.
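These automatic and manual backups can be listed with ocrconfig -showbackup; illustrative output with made-up timestamps and the tstcluster name used elsewhere in these notes:
# ocrconfig -showbackup
node1 2012/01/15 02:54:11 /u01/app/11.2.0/grid/cdata/tstcluster/backup00.ocr
node1 2012/01/14 22:54:10 /u01/app/11.2.0/grid/cdata/tstcluster/backup01.ocr
node1 2012/01/14 18:54:09 /u01/app/11.2.0/grid/cdata/tstcluster/day.ocr
node1 2012/01/08 02:54:01 /u01/app/11.2.0/grid/cdata/tstcluster/week.ocr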
2. Add OCR mirror :
10g : ocrconfig -replace ocrmirror /dev/raw/raw2
11gR1 : ocrconfig -replace ocrmirror /dev/raw/raw2
11gR2 : ocrconfig -add +OCRVOTE2
3. Remove OCR :
10g : ocrconfig -replace ocr
11gR1 : ocrconfig -replace ocr
11gR2 : ocrconfig -delete +OCRVOTE2
4. Move OCR
10g : ocrconfig -replace ocr /dev/sdd1 and then ocrconfig -replace ocrmirror /dev/raw/raw4
11gR1 : ocrconfig -replace ocr <new_location> and then ocrconfig -replace ocrmirror /dev/raw/raw4
11gR2 : ocrconfig -replace /cluster_file/ocr.dat -replacement +OCRVOTE
        ocrconfig -replace +CRS -replacement +OCRVOTE
5. Restore OCR :
ocrconfig -restore <path/filename of OCR backup>
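A fuller 11gR2 restore sketch, assuming the default backup location under $GRID_HOME/cdata/tstcluster ( verify the actual file with ocrconfig -showbackup first ); the -excl -nocrs start is needed when the OCR lives in ASM and is available from 11.2.0.2:
# crsctl stop crs ( as root, on every node )
# crsctl start crs -excl -nocrs ( one node only )
# ocrconfig -restore /u01/app/11.2.0/grid/cdata/tstcluster/backup00.ocr
# crsctl stop crs -f
# crsctl start crs ( on all nodes )
# ocrcheck ( verify OCR integrity )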