Failure of Voting disk
Move voting disk to OCR_VOTE
=============================
CORRUPTION : Create a new diskgroup because the current diskgroup is corrupted
1. Stop CRS on all nodes, then start it on one node in exclusive mode (root user)
# crsctl stop crs -f
# crsctl start crs -excl -nocrs
2. Start the ASM instance using a pfile
SQL> startup pfile='/u01/app/oracle/init+ASM1.ora';
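A minimal ASM pfile for this recovery could look like the sketch below; the diskstring and memory values are assumptions and must match your environment:
*.instance_type='asm'
# assumption: the diskstring must match your ASM device names (e.g. /dev/asm-*)
*.asm_diskstring='/dev/asm-*'
*.asm_power_limit=1
*.memory_target=1024M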
If all the voting disks are corrupted, create a new diskgroup:
CREATE DISKGROUP OCR_VOTE NORMAL REDUNDANCY
FAILGROUP controller01 DISK '/dev/asm-ocr_vote1'
FAILGROUP controller02 DISK '/dev/asm-ocr_vote2'
FAILGROUP controller03 DISK '/dev/asm-ocr_vote3'
ATTRIBUTE
'au_size'='1M',
'compatible.asm' = '12.1';
SQL> ! srvctl start diskgroup -g ocr_vote -n node2 -- mounts the diskgroup on the other nodes
or, from ASMCMD on the other nodes:
ASMCMD> lsdg
ASMCMD> mount ocr_vote
$GRID_HOME/bin/crsctl query css votedisk -- check current location
$GRID_HOME/bin/crsctl replace votedisk +OCR_VOTE -- moves to OCR_VOTE
$GRID_HOME/bin/crsctl query css votedisk
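After the replace, the query should report all three voting disks in the new diskgroup; the output looks roughly like this (File Universal Ids are placeholders):
##  STATE    File Universal Id      File Name            Disk group
--  -----    -----------------      ---------            ----------
 1. ONLINE   <file universal id>    (/dev/asm-ocr_vote1) [OCR_VOTE]
 2. ONLINE   <file universal id>    (/dev/asm-ocr_vote2) [OCR_VOTE]
 3. ONLINE   <file universal id>    (/dev/asm-ocr_vote3) [OCR_VOTE]
Located 3 voting disk(s).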
Else, if we have a surviving copy (i.e. only 2 out of 3 voting disks are corrupted), create the new diskgroup, replace the voting disk, and drop the old diskgroup
CREATE DISKGROUP OCR_VOTE NORMAL REDUNDANCY
FAILGROUP controller01 DISK '/dev/asm-ocr_vote1'
FAILGROUP controller02 DISK '/dev/asm-ocr_vote2'
FAILGROUP controller03 DISK '/dev/asm-ocr_vote3'
ATTRIBUTE
'au_size'='1M',
'compatible.asm' = '12.1';
$GRID_HOME/bin/crsctl query css votedisk -- check current location
$GRID_HOME/bin/crsctl replace votedisk +OCR_VOTE -- moves to OCR_VOTE
$GRID_HOME/bin/crsctl query css votedisk
SQL> drop diskgroup #old diskgroup name# force including contents; -- drop the old, corrupted diskgroup
3. Stop and start CRS (root user) on all nodes
# crsctl stop crs -f
# crsctl start crs -- run on the other nodes as well
or
# crsctl start cluster -all
# $GRID_HOME/bin/crsctl status resource -t
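To confirm the stack came up, run a quick check on each node; the output should look roughly like this:
# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online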
Failure of OCR
Corruption : Restore OCR
===============================
1. Check if corrupted
# ocrcheck
2. Stop CRS and start it in exclusive mode (root user)
# crsctl stop crs -f
# crsctl start crs -excl -nocrs
3. Check OCR location
$ cat /etc/oracle/ocr.loc
$GRID_HOME/log/<hostname>/client/ocrcheck_<pid>.log
4. Check latest OCR backup
$GRID_HOME/bin/ocrconfig -showbackup
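Each line of the output shows the node that took the backup, the backup time and the backup file, roughly like this (node name and timestamps are placeholders):
node1     2020/01/15 04:00:00     /u01/app/12.1.0/grid/cdata/bhurac/backup00.ocr
node1     2020/01/15 00:00:00     /u01/app/12.1.0/grid/cdata/bhurac/backup01.ocr
node1     2020/01/14 20:00:00     /u01/app/12.1.0/grid/cdata/bhurac/backup02.ocr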
5. Restore as a root user
# ocrconfig -restore $GRID_HOME/cdata/bhurac/backup00.ocr
6. Stop and start (root user)
# crsctl stop crs -f
# crsctl start crs
# $GRID_HOME/bin/crsctl status resource -t
7. Check for corruption
# ocrcheck
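On a healthy OCR the integrity check succeeds; the output should look roughly like this (sizes are illustrative, and +CRS_TMP is the diskgroup shown later in this note):
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1292
         Available space (kbytes) :     408276
         Device/File Name         :   +CRS_TMP
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded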
No Corruption : Create new OCR_VOTE diskgroup and move OCR from +CRS_TMP to OCR_VOTE
====================================================================
# /u01/app/12.1.0/grid/bin/ocrcheck
It shows Device/File Name : +CRS_TMP
# /u01/app/12.1.0/grid/bin/ocrconfig -add +OCR_VOTE
# /u01/app/12.1.0/grid/bin/ocrcheck
It now shows both Device/File Name : +CRS_TMP
and Device/File Name : +OCR_VOTE
# /u01/app/12.1.0/grid/bin/ocrconfig -delete +CRS_TMP
# /u01/app/12.1.0/grid/bin/ocrcheck
It will now show only Device/File Name : +OCR_VOTE
Check on the other nodes as well with the ocrcheck command.
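A quick way to verify every node at once, assuming password-less ssh and hypothetical node names node1/node2:
# for n in node1 node2; do echo "== $n =="; ssh $n /u01/app/12.1.0/grid/bin/ocrcheck; done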
Failure of VIP : VIP Failover
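When a node fails, its VIP fails over to a surviving node automatically. To see where a VIP is currently running (node1 is a placeholder; the -n flag matches the older srvctl style used above):
$ srvctl status vip -n node1
$ srvctl status nodeapps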
Failure of disk
======================== Case I : ASM detects read/write errors ==================================
When ASM detects read or write errors on a disk, it updates the READ_ERRS/WRITE_ERRS columns in v$asm_disk for that disk
1. Check for failed disk
select path,name,mount_status,header_status from v$asm_disk where WRITE_ERRS > 0;
select path,name,mount_status,header_status from v$asm_disk where READ_ERRS > 0;
Note : header_status column may still be shown as "MEMBER"
2. Drop the disk
alter diskgroup #name# drop disk #disk name#;
select state,power,group_number,EST_MINUTES from v$asm_operation; -- monitor the rebalance
Run until no rows are returned
Note : Physically remove the disk only after the header_status for the failed disk becomes "FORMER"
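To confirm this before pulling the disk, query its header status; the #disk path# placeholder follows the same convention used above:
select path,header_status from v$asm_disk where path = '#disk path#';
Proceed only once header_status shows "FORMER".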
3. Add new disk
SELECT NVL(a.name, '[CANDIDATE]') disk_group_name , b.path disk_file_path, b.name disk_file_name , b.failgroup disk_file_fail_group
FROM v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
ORDER BY a.name;
ALTER DISKGROUP testdb_data1 ADD FAILGROUP controller1 DISK '/dev/raw/raw5'
FAILGROUP controller2 DISK '/dev/raw/raw6' REBALANCE POWER 11;
OR
select distinct header_status from v$asm_disk where path = '/dev/sdk1'; -- the new disk must show as CANDIDATE
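Once the disk shows as CANDIDATE, the matching add for this single disk could look like the following (the diskgroup name testdb_data1 is carried over from the example above and is an assumption):
ALTER DISKGROUP testdb_data1 ADD DISK '/dev/sdk1' REBALANCE POWER 11;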
select state,power,group_number,EST_MINUTES from v$asm_operation;
Run until no rows returned
================================ Case II : ASM drops the disk =============================================
When ASM drops a disk on its own, the ASM alert log shows messages like the following:
ORA-27061: waiting for async I/Os failed
WARNING: IO Failed. subsys:System dg:0, diskname:/dev/sds1
ASM will automatically rebalance the data, which can be monitored using:
select state,power,group_number,EST_MINUTES from v$asm_operation;
Failure of Node : Node Eviction
Failure of Instance : Instance Recovery