
Voting disk



Each node registers its presence on the voting disks: it records its network heartbeat information there and also maintains a disk heartbeat against each voting disk. The voting disks therefore must be on shared storage that all nodes can access. There is always one "master" node that controls the other nodes.

A node must be able to access more than half of the voting disks at any time. A node that cannot do so is evicted from the cluster by a node that can still see more than half of the voting disks, in order to maintain the integrity of the cluster.

In the event of an outage such as a private interconnect failure, the largest fully connected subcluster is the one kept alive for further operation.

The number of voting disks has to be odd so that the cluster can decide who should survive. Say we have 2 voting disks and 2 nodes. During an interconnect failure, if node1 can access only voting disk1 and node2 can access only voting disk2, the clusterware cannot decide which node to keep in order to protect data integrity (cache fusion). In other words, an odd number of voting disks is required to avoid split brain and to decide which nodes to evict when the problem occurs.

A network heartbeat is sent over the cluster interconnect by each node to confirm that all RAC nodes are available. When a node does not respond to the heartbeat signal, its instance is assumed to have crashed. When the nodes in a cluster cannot talk to each other, they race to lock the voting disks, and whichever node locks more disks survives. If the number of disks is even, each node might lock exactly 50% of them (2 out of 4), and there is then no way to decide which node to evict; this situation is called split brain. With an odd number of disks, one node always holds more than the other, so the cluster evicts the node holding fewer.
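A quick way to see the rule in action is the quorum arithmetic (this is just the "more than half" requirement above written out; the disk counts are examples, not a recommendation):

With N voting disks, a node must see at least floor(N/2) + 1 of them to stay in the cluster.
  N = 3  ->  needs 2 ; the cluster tolerates the loss of 1 voting disk
  N = 5  ->  needs 3 ; the cluster tolerates the loss of 2 voting disks
  N = 4  ->  needs 3 ; after an interconnect split two nodes can each lock exactly 2 disks,
             so neither has a majority and the tie cannot be broken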


The number of voting disks created depends on the redundancy level of the diskgroup: with NORMAL redundancy, 3 voting disks are created; with HIGH redundancy, 5 voting disks are created. So a minimum of 3 disks is required in a NORMAL redundancy diskgroup and 5 disks in a HIGH redundancy diskgroup.

Redundancy of the diskgroup    # of copies of voting disk    Minimum # of disks in the diskgroup
External                       1                             1
Normal                         3                             3
High                           5                             5

NORMAL redundancy : the diskgroup requires at least two failgroups; files are mirrored across disks in both failgroups (two-way mirroring).
                                     A normal redundancy diskgroup that holds voting disks must have at least 3 disks, each in its own failure group.

HIGH redundancy : the diskgroup requires at least three failgroups; files are mirrored across disks in all three failgroups (three-way mirroring).
                                A high redundancy diskgroup that holds voting disks must have at least 5 disks, each in its own failure group.

EXTERNAL REDUNDANCY : You cannot specify the FAILGROUP clause if you specify EXTERNAL REDUNDANCY.
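A minimal sketch of creating such a diskgroup for voting files, run from SQL*Plus as SYSASM on the ASM instance; the diskgroup name, failgroup names and disk paths below are only placeholders:

$ sqlplus / as sysasm

SQL> -- compatible.asm must be 11.2 or higher for the diskgroup to hold voting files
SQL> CREATE DISKGROUP VOTE NORMAL REDUNDANCY
       FAILGROUP fg1 DISK '/dev/asm-vote1'
       FAILGROUP fg2 DISK '/dev/asm-vote2'
       FAILGROUP fg3 DISK '/dev/asm-vote3'
       ATTRIBUTE 'compatible.asm' = '11.2';

$ crsctl replace votedisk +VOTE      ( places the 3 voting files in the new diskgroup )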


NOTES :



$ crsctl query css votedisk

Backup : Prior to 11gR2, take a backup using dd. From 11gR2 onwards the backup is taken automatically.

Back up the voting disk file every time
- you add or remove a node from the cluster, or
- immediately after you configure or upgrade a cluster.


You should have at least three voting disks, and you can have up to a maximum of 15.
On an ASM diskgroup, the maximum number of voting disks that can be stored is 5, even if the diskgroup has, say, 7 disks. The maximum of 15 voting disks is possible only when using non-ASM storage.


One storage array: no mirroring - create 2 LUNs of 500 MB each.


More than one storage array: 1 LUN (500 MB) in each storage array, and create a diskgroup with normal redundancy.

+VOTE (create 3 LUNs of 500 MB each)   – stores the voting files and the OCR mirror. Voting files can be stored in only one diskgroup.

+CRS                                – Storing OCR and ASM Spfile.


If the voting disk was previously on a disk with external redundancy and you want to move it to the test diskgroup (normal redundancy), you need to have 3 disks in the test diskgroup before moving the voting disk there.

$ crsctl replace votedisk +VOTE


To move the voting disk, ASM spfile or OCR (i.e. how to store the OCR, voting disks and ASM spfile on an ASM diskgroup):

$ crsctl query css votedisk
  To move  : $ crsctl replace votedisk +VOTE


$ asmcmd spget
  To move  : $ asmcmd spmove '+CRSTMP/tstcluster/ASMPARAMETERFILE/REGISTRY.253.772133609' '+CRS/tstcluster/spfileASM.ora'


# /u01/app/11.2.0/grid/bin/ocrcheck
  To move  : # /u01/app/11.2.0/grid/bin/ocrconfig -add +CRS
                    # /u01/app/11.2.0/grid/bin/ocrconfig -add +VOTE


NOTE : If using only one diskgroup, it is better to have four disks in the diskgroup. Since 3 voting disks are required with normal redundancy, even if one disk fails Oracle will silently recreate the voting disk from the failed disk on the spare disk. http://blog.oracle-ninja.com/2012/01/voting-disk-redundancy-in-asm/


If using failgroups, we can drop ASM disks that contain voting disks as long as enough disks are left in the diskgroup to retain the same number of voting disks (each inside a separate failure group).
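A minimal sketch of such a drop, again from SQL*Plus as SYSASM; the diskgroup and disk names are placeholders, and the affected voting file is relocated automatically to a surviving disk in another failure group:

SQL> ALTER DISKGROUP VOTE DROP DISK VOTE_0003;
SQL> SELECT * FROM v$asm_operation;             ( wait until the rebalance completes )

$ crsctl query css votedisk                     ( confirm all voting files are still present )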



Voting disk 


1. Backup  voting disk 

10g : [root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
located 1 votedisk(s).

[root]# dd  if=/dev/raw/raw2  of=/backup/ocrvot_bkp/voting.dmp
41419+0 records in
41419+0 records out

11g :  From 11gR2 onwards, the voting disk contents are backed up automatically as part of the OCR.


2. Adding voting disk 

10g : Shut down Oracle Clusterware before making any add/delete/move modifications.
[root]# crsctl stop crs


[root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2

[root]# ./crsctl add css votedisk /dev/raw/raw4   ( do not use -force while the clusterware is running ; it may corrupt the OCR )
[root]# ./crsctl add css votedisk /dev/raw/raw5   ( do not use -force while the clusterware is running ; it may corrupt the OCR )

[root]# ./crsctl query css votedisk
0. 0 /dev/raw/raw2
1. 0 /dev/raw/raw4 ==> newly added voting disk
2. 0 /dev/raw/raw5 ==> newly added voting disk


11gR1 : Oracle Clusterware can remain online while performing add/delete/move operations.

 [root]# $GRID_HOME/bin/crsctl add css votedisk /dev/raw/raw5   
            or crsctl add css votedisk <cluster_fs/filename> 


11gR2 : Oracle Clusterware can remain online while performing add/delete/move operations, and the voting disks can also be stored in ASM.
              With an ASM diskgroup, no add/delete voting disk option is available ; the number of voting disks is determined by the diskgroup redundancy.
              To get more voting disks, move them to a diskgroup with higher redundancy.

3. Remove votedisk :

10g  : crsctl delete css votedisk /dev/raw/raw1 -force
11R1 : crsctl delete css votedisk /dev/raw/raw1 -force   or crsctl delete css votedisk <cluster_fs/filename>
11R2 : crsctl delete css votedisk <cluster_fs/filename>


4. Move votedisk :


10g  :  crsctl add css votedisk /dev/raw/raw4 -force    and then crsctl delete css votedisk /dev/raw/raw1 -force
11R1 :  crsctl add css votedisk /dev/raw/raw4           and then crsctl delete css votedisk /dev/raw/raw1
11R2 :  crsctl add css votedisk <cluster_fs/filename>   and then crsctl delete css votedisk <cluster_fs/filename>
11R2 :  crsctl replace votedisk +CRS

            $ crsctl replace votedisk +CRS      ( From external redundancy to Normal redundancy +CRS disk group )

          $ crsctl replace votedisk /rac_shared/oradata/vote.test3  ( From Normal redundancy +CRS disk group to external redundancy )

5. Restore Votedisk :

10g :      restore using dd command
11.2+ :   The voting disk contents are restored from a backup automatically when a new voting disk is added or replaced



OCR 


1. OCR Backup

10g:      [root]#$CRS_HOME/bin/ocrconfig  -export  /backup/ocrvot_bkp/ocr.dmp -s online
11gR1 :  $CRS_HOME/bin/ocrconfig  -manualbackup
11gR2 : CRSD automatically creates OCR backups every 4 hours and retains the last three copies; it also keeps a backup for each full day and one at the end of each week.
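To list the backups that CRSD has taken, run the following as root from the Grid home; by default the files land under <Grid home>/cdata/<cluster_name> (the "manual" filter is available from 11g onwards):

# $GRID_HOME/bin/ocrconfig -showbackup
# $GRID_HOME/bin/ocrconfig -showbackup manual      ( show only the manual backups )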

2. Add ocr mirror
[root]# ./ocrconfig -add /dev/raw/raw3


Add OCR mirror (by version):

10g  : ocrconfig -replace ocrmirror /dev/raw/raw2
11R1 : ocrconfig -replace ocrmirror /dev/raw/raw2
11R2 : ocrconfig -add +OCRVOTE2
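After adding a mirror, it is worth confirming that all registered OCR locations pass the integrity check (the ocr.loc path shown below is the Linux default; it differs on other platforms):

# ocrcheck
# cat /etc/oracle/ocr.loc                          ( lists the OCR locations registered on this node )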


3. Remove OCR

10g  : ocrconfig -replace ocr
11R1 : ocrconfig -replace ocr
11R2 : ocrconfig -delete +OCRVOTE2

4. Move OCR

10g  : ocrconfig -replace ocr /dev/sdd1
             ocrconfig -replace ocrmirror /dev/raw/raw4
11R1 : ocrconfig -replace ocr <new_location>
             ocrconfig -replace ocrmirror /dev/raw/raw4
11R2 : ocrconfig -replace  /cluster_file/ocr.dat  -replacement +OCRVOTE
             ocrconfig -replace  +CRS -replacement +OCRVOTE

5. Restore OCR :

ocrconfig -restore <path/filename of OCR backup>
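
A minimal sketch of a full restore; the backup path below is only an example of what ocrconfig -showbackup reports, and the clusterware stack must be down on all nodes while the restore runs:

# crsctl stop crs                                  ( as root, on every node )
# ocrconfig -restore /u01/app/11.2.0/grid/cdata/tstcluster/backup00.ocr
# ocrcheck                                         ( verify the restored OCR )
# crsctl start crs                                 ( start the stack again on every node )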