Every node registers its presence in the cluster by writing a disk heartbeat to the voting disks, and the voting disks also hold network heartbeat information for the cluster. They must therefore reside on shared storage that all nodes can access. There is always one "master" node that coordinates the other nodes.
A node must be able to access more than half of the voting disks at any time. A node that cannot do so is evicted from the cluster by a node that can still see more than half of the voting disks, to maintain the integrity of the cluster.
In the event of an outage, such as a private interconnect failure, the largest fully connected subcluster survives and continues operation.
The number of voting disks must be odd so the cluster can decide which nodes should survive. Suppose we have 2 voting disks and 2 nodes. During an interconnect failure, if node1 can access only voting disk1 and node2 can access only voting disk2, the clusterware cannot decide which node to keep to preserve data integrity after the interconnect failure (cache fusion). In other words, an odd number of disks is required to avoid split brain and to decide which nodes to evict when a problem occurs.
A network heartbeat is sent over the cluster interconnect by each node to confirm that all RAC nodes are available. When a node does not respond to the heartbeat, its instance is assumed to have crashed. When nodes in the cluster cannot talk to each other, they race to lock the voting disks, and whichever node locks more disks survives. If the number of disks is even, a node might lock exactly 50% of them (2 out of 4), and there is then no way to decide which node to evict; this situation is called split brain. With an odd number, one node always holds more disks than the other, so the cluster can evict the node holding fewer.
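The timeouts that drive these evictions are standard CSS settings and can be queried with crsctl; the values below are the usual defaults, not guaranteed for every installation:
$ crsctl get css misscount ( network heartbeat timeout, in seconds )
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
$ crsctl get css disktimeout ( voting disk I/O timeout, in seconds )
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.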
The number of voting disks created depends on the redundancy level of the diskgroup. With NORMAL redundancy, 3 voting disks are created; with HIGH redundancy, 5. So a minimum of 3 disks is required in a NORMAL redundancy diskgroup and 5 disks in a HIGH redundancy diskgroup.
Redundancy of the diskgroup    # of copies of voting disk    Minimum # of disks in the diskgroup
External                       1                             1
Normal                         3                             3
High                           5                             5
NORMAL redundancy : the diskgroup requires at least two failgroups. Files are copied to disks in both failgroups for 2-way mirroring.
To hold voting disks, a NORMAL redundancy diskgroup needs at least 3 failure groups (i.e., at least 3 disks, one voting disk per failure group).
HIGH redundancy : the diskgroup requires at least three failgroups. Files are copied to disks in all three failgroups for 3-way mirroring.
To hold voting disks, a HIGH redundancy diskgroup needs at least 5 failure groups (at least 5 disks).
EXTERNAL redundancy : you cannot specify the FAILGROUP clause with EXTERNAL REDUNDANCY.
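A minimal sketch of creating such a diskgroup for voting files, assuming hypothetical raw devices /dev/raw/raw1-3 and the diskgroup name VOTE; compatible.asm must be 11.2 or higher before voting files can be placed in it:
SQL> CREATE DISKGROUP vote NORMAL REDUNDANCY
       FAILGROUP fg1 DISK '/dev/raw/raw1'
       FAILGROUP fg2 DISK '/dev/raw/raw2'
       FAILGROUP fg3 DISK '/dev/raw/raw3'
       ATTRIBUTE 'compatible.asm' = '11.2';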
NOTES :
$ crsctl query css votedisk
Backup : Prior to 11gR2, take a backup using dd. From 11gR2 onwards the backup is taken automatically.
Back up the voting disk file every time
- you add or remove a node from the cluster or
- immediately after you configure or upgrade a cluster.
You should have at least three voting disks. You can have up to a maximum of 15 voting disks.
On an ASM diskgroup, the maximum number of voting disks that can be stored is 5, even if the diskgroup has, say, 7 disks. The maximum of 15 voting disks is possible only with non-ASM storage.
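Illustrative 11gR2 output of the query above ( the File Universal Ids and paths here are made up ):
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6e5ae6b79c4d4fe3bf2a1dd05e2a42b4 (/dev/raw/raw1) [VOTE]
 2. ONLINE   7a3bd2c18e5f4ab09c8d3ee16f3b53c5 (/dev/raw/raw2) [VOTE]
 3. ONLINE   8b4ce3d29f604bc1ad9e4ff27a4c64d6 (/dev/raw/raw3) [VOTE]
Located 3 voting disk(s).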
One storage array: no mirroring; create 2 LUNs of 500 MB each.
More than one storage array: 1 LUN (500 MB) in each storage array, creating a diskgroup with normal redundancy.
+VOTE ( create 3 LUNs of 500 MB each ) – stores the voting files and an OCR mirror. Voting files can be stored in only one diskgroup.
+CRS – stores the OCR and the ASM spfile.
If the voting disk was previously on a disk with external redundancy and you want to move it to the TEST diskgroup (normal redundancy), the TEST diskgroup must already contain at least 3 disks before you move the voting disk into it.
$ crsctl replace votedisk +VOTE
To move the voting disk, ASM spfile, or OCR onto an ASM diskgroup :
$ crsctl query css votedisk
To move : $ crsctl replace votedisk +VOTE
$ asmcmd spget
To move : $ asmcmd spmove '+CRSTMP/tstcluster/ASMPARAMETERFILE/REGISTRY.253.772133609' '+CRS/tstcluster/spfileASM.ora'
# /u01/app/11.2.0/grid/bin/ocrcheck
To move : # /u01/app/11.2.0/grid/bin/ocrconfig -add +CRS
# /u01/app/11.2.0/grid/bin/ocrconfig -add +VOTE
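ocrconfig -add only adds OCR locations; to finish relocating the OCR off its old device, drop the old location and verify. A sketch assuming the old OCR sat on /dev/raw/raw1 (hypothetical path):
# /u01/app/11.2.0/grid/bin/ocrconfig -delete /dev/raw/raw1
# /u01/app/11.2.0/grid/bin/ocrcheck ( verify the new OCR locations and integrity )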
NOTE : If using only one diskgroup, it is better to have four disks in the diskgroup. Since 3 voting disks are required with NORMAL redundancy, even if one disk fails Oracle will silently recreate the voting disk from the failed disk on the spare disk. http://blog.oracle-ninja.com/2012/01/voting-disk-redundancy-in-asm/
If using failgroups, we can drop ASM disks that contain voting disks as long as enough disks are left in the diskgroup to retain the same number of voting disks (each inside a separate failure group).
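A drop sketch, assuming a hypothetical +VOTE diskgroup whose disks ASM named VOTE_0000 through VOTE_0003 ( check V$ASM_DISK for the real names ):
SQL> ALTER DISKGROUP vote DROP DISK VOTE_0003;
SQL> SELECT * FROM v$asm_operation; ( repeat until the rebalance finishes )
$ crsctl query css votedisk ( clusterware silently relocates any voting disk that sat on the dropped disk )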
Voting disk
1. Backup voting disk
10g : [root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
located 1 votedisk(s).
[root]# dd if=/dev/raw/raw2 of=/backup/ocrvot_bkp/voting.dmp
41419+0 records in
41419+0 records out
11g : from 11gR2 onwards the voting disk contents are backed up automatically into the OCR.
2. Adding voting disk
10g : Shut down Oracle Clusterware on all nodes before any add/delete/move ( -force is required while Clusterware is down; never use -force while Clusterware is running, as it may corrupt the OCR )
[root]# crsctl stop crs
[root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
[root]# ./crsctl add css votedisk /dev/raw/raw4 -force
[root]# ./crsctl add css votedisk /dev/raw/raw5 -force
[root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
1. 0 /dev/raw/raw4 ==> newly added voting disk
2. 0 /dev/raw/raw5 ==> newly added voting disk
11gR1 : Oracle Clusterware can stay online while performing add/delete/move
[root]# $GRID_HOME/bin/crsctl add css votedisk /dev/raw/raw5
or crsctl add css votedisk <cluster_fs/filename>
11gR2 : Oracle Clusterware can stay online while performing add/delete/move, and voting disks can also be stored in ASM.
With an ASM diskgroup, there is no add/delete voting disk option.
The number of voting disks is determined by the diskgroup redundancy; to add more, move them to a diskgroup with higher redundancy.
3. Remove votedisk :
10g : crsctl delete css votedisk /dev/raw/raw1 -force ( Clusterware down )
11gR1 : crsctl delete css votedisk /dev/raw/raw1 or crsctl delete css votedisk <cluster_fs/filename>
11gR2 : crsctl delete css votedisk <cluster_fs/filename>
4. Move votedisk :
10g : crsctl add css votedisk /dev/raw/raw4 -force and then crsctl delete css votedisk /dev/raw/raw1 -force ( Clusterware down )
11gR1 : crsctl add css votedisk /dev/raw/raw4 and then crsctl delete css votedisk /dev/raw/raw1
11gR2 : crsctl add css votedisk <cluster_fs/filename> and then crsctl delete css votedisk <cluster_fs/filename>
11gR2 : crsctl replace votedisk +CRS
$ crsctl replace votedisk +CRS ( from external redundancy to the NORMAL redundancy +CRS diskgroup )
$ crsctl replace votedisk /rac_shared/oradata/vote.test3 ( from the NORMAL redundancy +CRS diskgroup to external redundancy )
5. Restore Votedisk :
10g : restore using the dd command ( see the sketch below )
11.2+ : The voting disk contents are restored from a backup automatically when a new voting disk is added or replaced
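A 10g restore sketch, reusing the backup file and device path from the dd backup example above ( paths are illustrative ):
[root]# crsctl stop crs ( on all nodes )
[root]# dd if=/backup/ocrvot_bkp/voting.dmp of=/dev/raw/raw2
[root]# crsctl start crs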
OCR
1. OCR Backup
10g : [root]# $CRS_HOME/bin/ocrconfig -export /backup/ocrvot_bkp/ocr.dmp -s online
11gR1 : $CRS_HOME/bin/ocrconfig -manualbackup
11gR2 : CRSD automatically creates OCR backups every 4 hours, at the end of each day, and at the end of each week, and retains the last three copies of the OCR.
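These automatic and manual backups can be listed with ocrconfig -showbackup; illustrative output with made-up timestamps and the tstcluster name used elsewhere in these notes:
# ocrconfig -showbackup
node1 2012/01/15 02:54:11 /u01/app/11.2.0/grid/cdata/tstcluster/backup00.ocr
node1 2012/01/14 22:54:10 /u01/app/11.2.0/grid/cdata/tstcluster/backup01.ocr
node1 2012/01/14 18:54:09 /u01/app/11.2.0/grid/cdata/tstcluster/day.ocr
node1 2012/01/08 02:54:01 /u01/app/11.2.0/grid/cdata/tstcluster/week.ocr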
2. Add OCR mirror :
10g : ocrconfig -replace ocrmirror /dev/raw/raw2
11gR1 : ocrconfig -replace ocrmirror /dev/raw/raw2
11gR2 : ocrconfig -add +OCRVOTE2
3. Remove OCR :
10g : ocrconfig -replace ocr
11gR1 : ocrconfig -replace ocr
11gR2 : ocrconfig -delete +OCRVOTE2
4. Move OCR
10g : ocrconfig -replace ocr /dev/sdd1 and then ocrconfig -replace ocrmirror /dev/raw/raw4
11gR1 : ocrconfig -replace ocr <new_location> and then ocrconfig -replace ocrmirror /dev/raw/raw4
11gR2 : ocrconfig -replace /cluster_file/ocr.dat -replacement +OCRVOTE
        ocrconfig -replace +CRS -replacement +OCRVOTE
5. Restore OCR :
ocrconfig -restore <path/filename of OCR backup>
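A fuller 11gR2 restore sketch, assuming the default backup location under $GRID_HOME/cdata/tstcluster ( verify the actual file with ocrconfig -showbackup first ); the -excl -nocrs start is needed when the OCR lives in ASM and is available from 11.2.0.2:
# crsctl stop crs ( as root, on every node )
# crsctl start crs -excl -nocrs ( one node only )
# ocrconfig -restore /u01/app/11.2.0/grid/cdata/tstcluster/backup00.ocr
# crsctl stop crs -f
# crsctl start crs ( on all nodes )
# ocrcheck ( verify OCR integrity )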