VIP resource diagnostics



In the $ORA_CRS_HOME/bin/racgvip script, CHECK_TIMES determines the number of seconds within which a response must arrive. If the gateway is slow to respond to a ping request, racgvip assumes that the interface is down and fails over the VIP.
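Before changing anything, it can help to see how CHECK_TIMES is currently set on each node. The following is a minimal sketch, not part of racgvip itself: show_check_times is a hypothetical helper name, and it assumes the script assigns the value on a line of the form CHECK_TIMES=<value>.

```shell
#!/bin/sh
# show_check_times FILE
# Print, with line numbers, every line of FILE that assigns CHECK_TIMES.
# Assumption: the assignment appears as CHECK_TIMES=<value> at the start of a line.
show_check_times() {
    grep -n '^[[:space:]]*CHECK_TIMES=' "$1"
}

# Usage on a cluster node (path taken from the text above):
# show_check_times "$ORA_CRS_HOME/bin/racgvip"
```

Comparing the output across nodes shows quickly whether one node runs with a different setting.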

Action plan:

./runcluvfy.sh stage -post crsinst -n all  -verbose

./runcluvfy.sh stage -pre crsinst -n all  -verbose

or

cluvfy stage -post crsinst -n all -verbose

cluvfy stage -pre  crsinst -n all -verbose

1. Please upload the following logs from both nodes:

$CRS_HOME/log/nodename/*.log
$CRS_HOME/log/nodename/crsd/*.log
$CRS_HOME/log/nodename/cssd/*.log
$CRS_HOME/log/nodename/racg/*.log             -- logfiles for VIP and ONS
$CRS_HOME/log/nodename/client/*.log
$CRS_HOME/log/nodename/evmd/*.log

/etc/oracle/oprocd/*.log.* or /var/opt/oracle/oprocd/*.log.* (if present)

$ crs_stat -t
$ crsctl check crs
$ crsctl check boot
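Gathering the log files listed above by hand on every node is error-prone, so a small helper can bundle them into one archive per node. This is only a sketch under stated assumptions: collect_crs_logs is a hypothetical name, the directory layout is the one listed above, and the tar in use must support the -T option (GNU tar and bsdtar do).

```shell
#!/bin/sh
# collect_crs_logs CRS_HOME NODENAME OUTFILE
# Bundle the *.log files from the node's log directory and its crsd, cssd,
# racg, client and evmd subdirectories into the tar archive OUTFILE.
collect_crs_logs() {
    crs_home=$1
    node=$2
    out=$3
    list=$(mktemp) || return 1
    (
        cd "$crs_home/log" || exit 1
        for dir in "$node" "$node/crsd" "$node/cssd" "$node/racg" \
                   "$node/client" "$node/evmd"; do
            for f in "$dir"/*.log; do
                # An unmatched glob stays literal, so check the file exists.
                if [ -f "$f" ]; then
                    printf '%s\n' "$f"
                fi
            done
        done
    ) > "$list"
    # Paths in the archive are relative to CRS_HOME/log.
    tar -cf "$out" -C "$crs_home/log" -T "$list"
    rc=$?
    rm -f "$list"
    return $rc
}

# Usage on a cluster node:
# collect_crs_logs "$CRS_HOME" "$(hostname)" /tmp/crs_logs_$(hostname).tar
```

The oprocd logs live outside $CRS_HOME and would still need to be added separately.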






VIP relocated to the second server:


Sometimes you may find that the VIP has relocated to the second server.

In this case, ping <vip> may work, but ifconfig <vip> on Server1 will fail.



oracle@Server1> crsctl status resource -t

--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
ora.svrb1hr.vip
      1        ONLINE  INTERMEDIATE svrb2hr                  FAILED OVER
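On a cluster with many resources, the failed-over VIP is easy to miss in the full listing. The sketch below filters the output of crsctl status resource -t for VIP resources whose state details read FAILED OVER. failed_over_vips is a hypothetical helper, and it assumes the resource name appears either on the same line as the state or on the line just before it.

```shell
#!/bin/sh
# failed_over_vips
# Read `crsctl status resource -t` output on stdin and print the name of
# every *.vip resource that is reported as FAILED OVER.
failed_over_vips() {
    awk '
        /^ora\./ { name = $1 }                          # remember the latest resource name
        /FAILED OVER/ && name ~ /\.vip$/ { print name } # report it if it is a VIP
    '
}

# Usage on a cluster node:
# crsctl status resource -t | failed_over_vips
```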


The listener log, or the listener at startup, may report:

 TNS-00515: Connect failed because target host or object does not exist
   Linux Error: 99: Cannot assign requested address


In the OS logs, you may see:

kernel: igb: eth2 NIC Link is Down


In the CRS log, you may see:

Received state change for ora.net1.network exadb01 1 [old state = ONLINE, new state = OFFLINE]

CRS-0215: Could not start resource 'ora.net1.network'.


Set State Details to [FAILED OVER] from [ ] for [ora.exadb01.vip 1 1]

CRS-2676: Start of 'ora.exadb01.vip' on 'server2' succeeded



Solution:

oracle@Server2>$CRS_HOME/bin/crs_relocate  ora.grac1.vip


or

Check current VIP status:

Server1 > $  crsctl status resource ora.grac1.vip
NAME=ora.grac1.vip
TYPE=ora.cluster_vip_net1.type
TARGET=ONLINE
STATE=INTERMEDIATE on grac2

Stop the VIP resource:
Server1 >$ crsctl stop resource ora.grac1.vip
CRS-2673: Attempting to stop 'ora.grac1.vip' on 'grac2'
CRS-2677: Stop of 'ora.grac1.vip' on 'grac2' succeeded

Start the VIP resource:
Server1 >$ crsctl start resource ora.grac1.vip
CRS-2672: Attempting to start 'ora.grac1.vip' on 'grac1'
CRS-2676: Start of 'ora.grac1.vip' on 'grac1' succeeded

Verify VIP resource:
Server1 > $  crsctl status resource ora.grac1.vip
NAME=ora.grac1.vip
TYPE=ora.cluster_vip_net1.type
TARGET=ONLINE
STATE=ONLINE on grac1
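The final verification can also be scripted, for example in a monitoring job. The following sketch checks the STATE line of the crsctl status resource output against the node the VIP is expected to run on. vip_is_home is a hypothetical helper name, and the output format is assumed to match the listing above.

```shell
#!/bin/sh
# vip_is_home EXPECTED_NODE
# Read `crsctl status resource <vip>` output on stdin and succeed only if
# the STATE line reads "ONLINE on EXPECTED_NODE".
vip_is_home() {
    expected=$1
    state=$(sed -n 's/^STATE=//p')
    [ "$state" = "ONLINE on $expected" ]
}

# Usage on a cluster node:
# crsctl status resource ora.grac1.vip | vip_is_home grac1 || echo "VIP is not on its home node"
```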



Data Collection

Please consult your sysadmin and make sure that the gateway is pingable at all times.

1- test the gw on every node

Ask your sysadmin to create a Unix shell script, started from crontab, that pings the
gateway of your public interface at a short interval (every 2 seconds, for example) and
spools the result to /tmp/test_gw_.log

Ping your gateway and upload the ping log.
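Because cron cannot schedule anything more often than once a minute, a 2-second ping is usually done with a loop that cron (or nohup) only starts. The sketch below is one possible shape for such a script, not a tested production tool. The function names and the PING_CMD override are hypothetical, and the default ping options assume Linux, where -W is a timeout in seconds.

```shell
#!/bin/sh
# probe_gateway GATEWAY
# Print OK if one ping to GATEWAY succeeds, FAIL otherwise.
# PING_CMD may be set to replace the default ping command line.
probe_gateway() {
    if ${PING_CMD:-ping -c 1 -W 1} "$1" >/dev/null 2>&1; then
        echo OK
    else
        echo FAIL
    fi
}

# log_gateway_loop GATEWAY LOGFILE INTERVAL [COUNT]
# Append one timestamped result line to LOGFILE every INTERVAL seconds.
# COUNT limits the number of probes; 0 or omitted means run until killed.
log_gateway_loop() {
    gw=$1
    log=$2
    interval=$3
    count=${4:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$gw" \
            "$(probe_gateway "$gw")" >> "$log"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage (192.0.2.1 is a placeholder for your public gateway address):
# log_gateway_loop 192.0.2.1 /tmp/test_gw_.log 2 &
```

Lines ending in FAIL can then be matched by timestamp against the VIP failover times in the CRS log.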

2- increase the tracing level of the vip resource  as root user

  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res "ora.prorarac3.vip:5"

Once the VIP goes down, change it back to the original level:
  # crsctl debug log res "ora.prorarac3.vip"

3- restart the clusterware

4- execute this test on both nodes at the same time

   $ script /tmp/testvip_.log
   $ cd $ORA_CRS_HOME/bin
   $ hostname
   $ date
   $ cat /etc/hosts
   $ ifconfig -a
   $ oifcfg getif
   $ netstat -rn
   $ oifcfg iflist
   $ srvctl config nodeapps -n -a -g -s -l               (repeat for all nodes)
   $ crs_stat -t
   $ exit

5- reset the tracing level of the vip resource  as root user

  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res
  # crsctl debug log res :1

Upon the next occurrence, please upload the following information from all nodes:

  a- /tmp/test_gw_.log

  b- /tmp/testvip_.log

  c- the crsd log

  d- the racg resource logs
     $ORA_CRS_HOME/log//racg/vip*

  e- the racgvip script from
     $ORA_CRS_HOME/bin/racgvip

  f- RDA from all the nodes

  g- the o/s message file (from 11gR2 onward, OS logs are part of diagcollection/TFA on Linux, Solaris, HP-UX)
     IBM:     /bin/errpt -a > messages.out
     Linux:   /var/log/messages
     Solaris: /var/adm/messages


RAC11G - Change VIP IP


Current Status


[root@rac1 bin]# ./srvctl config nodeapps -n rac1 -a
VIP exists.:rac1
VIP exists.: /rac1-vip/192.168.2.111/255.255.255.0/eth0

[root@rac1 bin]# ./srvctl config nodeapps -n rac2 -a
VIP exists.:rac2
VIP exists.: /rac2-vip/192.168.2.112/255.255.255.0/eth0


[oracle@rac1 ~]$ srvctl config nodeapps -a
VIP exists.:rac1
VIP exists.: /rac1-vip/192.168.2.111/255.255.255.0/eth0
VIP exists.:rac2
VIP exists.: /rac2-vip/192.168.2.112/255.255.255.0/eth0
[oracle@rac1 ~]$



Stop only the VIP, or all nodeapps

[oracle@rac1 ~]$ srvctl stop vip -n rac1 -f
PRCC-1017 : rac1-vip was already stopped on rac1


or

srvctl stop database -d TEST
srvctl stop nodeapps -n rac1
srvctl stop nodeapps -n rac2

crs_stat -t



Modification 

[root@rac1 bin]# ./srvctl modify nodeapps -n rac1 -A 192.168.0.73/255.255.255.0/eth0
[root@rac1 bin]# ./srvctl modify nodeapps -n rac2 -A 192.168.0.74/255.255.255.0/eth0
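A typo in the address string is an easy way to break the VIP, so it can be worth generating the command and checking the address fields first. The sketch below only prints the srvctl command shown above after validating the IP and netmask; it does not run it. is_dotted_quad and vip_modify_cmd are hypothetical helper names.

```shell
#!/bin/sh
# is_dotted_quad VALUE
# Succeed if VALUE is four dot-separated decimal fields, each 0-255.
is_dotted_quad() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# vip_modify_cmd NODE IP NETMASK INTERFACE
# Print the srvctl command for review; nothing is executed.
vip_modify_cmd() {
    is_dotted_quad "$2" || { echo "bad IP: $2" >&2; return 1; }
    is_dotted_quad "$3" || { echo "bad netmask: $3" >&2; return 1; }
    printf 'srvctl modify nodeapps -n %s -A %s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Usage:
# vip_modify_cmd rac1 192.168.0.73 255.255.255.0 eth0
```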


Verification 


[root@rac1 bin]# ./srvctl config nodeapps -n rac1 -a
VIP exists.:rac1
VIP exists.: /rac1-vip/192.168.0.73/255.255.255.0/eth0

[root@rac1 bin]# ./srvctl config nodeapps -n rac2 -a
VIP exists.:rac2
VIP exists.: /rac2-vip/192.168.0.74/255.255.255.0/eth0

[root@rac1 bin]# ./srvctl config nodeapps  -a
VIP exists.:rac1
VIP exists.: /rac1-vip/192.168.0.73/255.255.255.0/eth0
VIP exists.:rac2
VIP exists.: /rac2-vip/192.168.0.74/255.255.255.0/eth0
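To confirm the change without reading the output by eye, the configured address can be pulled out of the srvctl config nodeapps -a listing. This is a minimal sketch that assumes the "VIP exists.: /name/ip/netmask/interface" line format shown above; vip_ip is a hypothetical helper name.

```shell
#!/bin/sh
# vip_ip VIP_NAME
# Read `srvctl config nodeapps -a` output on stdin and print the IP address
# configured for the VIP called VIP_NAME (for example rac1-vip).
vip_ip() {
    awk -F/ -v vip="$1" '/^VIP exists/ && $2 == vip { print $3 }'
}

# Usage on a cluster node:
# [ "$(srvctl config nodeapps -a | vip_ip rac1-vip)" = "192.168.0.73" ] && echo "rac1-vip updated"
```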

[root@rac1 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS    
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                      
               ONLINE  ONLINE       rac2                                      
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                      
               ONLINE  ONLINE       rac2                                      
ora.asm
               ONLINE  ONLINE       rac1                     Started          
               ONLINE  ONLINE       rac2                     Started          
ora.eons
               ONLINE  ONLINE       rac1                                      
               ONLINE  ONLINE       rac2                                      
ora.gsd
               OFFLINE OFFLINE      rac1                                      
               OFFLINE OFFLINE      rac2                                      
ora.net1.network
               ONLINE  ONLINE       rac1                                      
               ONLINE  ONLINE       rac2                                      
ora.ons
               ONLINE  ONLINE       rac1                                      
               ONLINE  ONLINE       rac2                                      
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                      
ora.oc4j
      1        OFFLINE OFFLINE                                                
ora.rac.db
      1        OFFLINE OFFLINE                                                
      2        OFFLINE OFFLINE                                                
ora.rac1.vip
      1        ONLINE  INTERMEDIATE rac2                     FAILED OVER      
ora.rac2.vip
      1        OFFLINE OFFLINE                                                
ora.scan1.vip
      1        ONLINE  OFFLINE                                                



If only the VIP was stopped earlier, stop and restart the listener resource:

[root@rac1 bin]# ./crsctl stop resource ora.LISTENER.lsnr
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded


[root@rac1 bin]# ./crsctl stop resource ora.rac1.vip
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac2'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac2' succeeded

[root@rac1 bin]# ./crsctl stop resource ora.rac2.vip
CRS-2500: Cannot stop resource 'ora.rac2.vip' as it is not running
CRS-4000: Command Stop failed, or completed with errors.

[root@rac1 bin]# ./crsctl start resource ora.LISTENER.lsnr
CRS-2672: Attempting to start 'ora.rac2.vip' on 'rac2'
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac1'
CRS-2676: Start of 'ora.rac2.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac2'
CRS-2676: Start of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac1'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac2' succeeded
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded



[root@rac1 bin]# ./srvctl start vip -n rac1
PRKO-2420 : VIP is already started on node(s): rac1

[root@rac1 bin]# ./srvctl start vip -n rac2
PRKO-2420 : VIP is already started on node(s): rac2


If nodeapps were stopped, restart the cluster services:

# crsctl stop crs
# crsctl start crs

[root@rac1 bin]# ./crsctl stat res -t

Note 330358.1 - CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide    
Note 298895.1 - Modifying the default gateway address used by the Oracle 10g VIP
Note 399213.1 - VIP Going Offline Intermittantly - Slow Response from Default Gateway
Note 401783.1 - Changes in Oracle Clusterware after applying 10.2.0.3 Patchset