
RAC basics and components

RAC is a configuration where two or more instances are connected to a single shared physical database, and users can access the database from any of the available instances to retrieve and process data.

Components in RAC

  • Node
  • Interconnect
  • Shared Storage : ASM
  • Clusterware : CRS -- Cluster Ready Services (CRS) is Oracle's own clusterware. It consists of four processes (crsd, ocssd, evmd, and evmlogger)
                                      and two disks, the OCR and the voting disk. It manages ASM, databases, instances, and Oracle services.

     Each instance has its own redo log files and undo tablespace, stored on the shared storage.
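
The state of these components can be inspected from any node. A quick health check, assuming Grid Infrastructure is installed and its bin directory is on the PATH (11g R2 commands; output layout varies by version):

```shell
# Verify the clusterware stack (CRS, CSS, EVM) on the local node
crsctl check crs

# List the lower-stack (OHASD-managed) resources, including ASM and CSSD
crsctl stat res -t -init

# List the cluster nodes with their node numbers
olsnodes -n
```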

https://dbasanthosh.wordpress.com/tag/what-is-ocssd/

Startup Sequence :

Clusterware needs ASM to access its key data structures: the OCR and the voting disk.
ASM needs clusterware up to access the shared storage (where the OCR and voting disk reside).

Oracle resolves this chicken-and-egg problem by starting some CRS components, such as CSSD and CTSS, before ASM, while other components, such as CRSD, EVMD, and ACFS, come up after ASM starts.

1)      CSSD and CTSSD are up before ASM.
2)      The voting disks used by CSSD are discovered by reading the headers of the disks, not through ASM.
3)      Startup of the CRS service has to wait until the ASM instance is up and the diskgroup holding the OCR and voting disk is mounted.
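
This header-based discovery can be seen directly: the voting disks are listed even when the ASM diskgroup that holds them is not yet mounted (commands as of 11g R2, run as root from the Grid home):

```shell
# Voting disks are located from the ASM disk headers directly,
# so this works before the diskgroup is mounted
crsctl query css votedisk

# Once ASM is up, confirm the OCR location and its integrity
ocrcheck
```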



To ensure each Oracle RAC database instance obtains the block it requires to satisfy a query or transaction, Oracle RAC instances use two components:

Global Cache Service (GCS) : Maintains coherency of the buffer caches, i.e. allows only one instance to modify a block at any single point in time.
Global Enqueue Service (GES) : Coordinates global locks between the instances in the cluster.

The GCS and GES maintain the status of each data file and each cached block using the Global Resource Directory (GRD), an internal in-memory database that stores the current status of the data blocks. The GRD contents are distributed across all of the active instances.
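
GCS block shipping activity is visible in the standard statistics. A rough cluster-wide view, run as SYSDBA (statistic names as of 10g/11g):

```shell
sqlplus -s / as sysdba <<'EOF'
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received')
ORDER  BY inst_id, name;
EOF
```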

When a request is made for a block image, three roles are involved:

Requesting instance : Instance that placed the request for the block image with its local LMS process.
Owning instance     : Instance that currently owns the block image.
Mastering instance  : Instance that owns the metadata in the GRD for that particular block number and file number.




1. When a node starts/restarts, OHASD is the first to be started, by the O.S. init (/etc/init.d/init.ohasd run).
   OHASD has access to the OLR (Oracle Local Registry) stored on the local file system.
   The OLR provides the data needed to complete OHASD initialization.

2. OHASD brings up GPNPD and CSSD.
   CSSD then accesses the GPnP profile stored on the local file system, which contains
        a. the ASM diskgroup discovery string
        b. the ASM SPFILE location (diskgroup name)
        c. the name of the ASM diskgroup containing the voting files

3. CSSD accesses the voting file locations by reading the ASM disk headers via well-known pointers, and so is able to complete initialization and start or join an existing cluster.

4. OHASD starts an ASM instance.
   The ASM instance locates the contents of the ASM SPFILE (using special code).
   The diskgroups are mounted, and access to the OCR is now available.

5. OHASD then starts CRSD (with access to the OCR in an ASM diskgroup).

6. Clusterware completes initialization and brings up the other services under its control.
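
The GPnP profile that CSSD reads in step 2 can be dumped from the local node (11g R2 tools from the Grid home):

```shell
# Dump the local GPnP profile; look for the DiscoveryString
# and SPFile attributes referenced in step 2
gpnptool get

# Show where the ASM SPFILE was resolved from
srvctl config asm
```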


init -- ohasd (/etc/init.d/init.ohasd run)
(/etc/inittab : /etc/init.d/init.crsd run, /etc/init.d/init.cssd run, /etc/init.d/init.evmd run)
--- level 1
cssdagent    -- ora.cssd
cssdmonitor  -- ora.cssdmonitor
oraagent     -- (level 2) ora.mdnsd, ora.gipcd, ora.gpnpd, ora.asm, ora.evmd   (dipae)
orarootagent -- (level 2) ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs   (rtd)
                                 |__ orarootagent -- (level 4) ora.<node>.vip, ora.SCAN<id>.vip, ora.gns.vip, ora.gns, ora.net<id>.network, ora.registry.acfs
                                 |__ oraagent     -- (level 4) ora.DB.db, ora.asm, ora.dg, ora.listener, ora.LISTENER_SCAN<id>.lsnr, ora.DB.svc, ora.ons, ora.gsd


Grid Plug and Play (GPNPD)               : ensures each node has the most recent GPnP profile.
Multicast Domain Name Service (mDNS)     : allows DNS requests within the cluster.
Oracle Grid Naming Service (GNS)         : gateway between the cluster mDNS and external DNS servers. The gnsd process performs name resolution within the cluster.
Grid Interprocess Communication (GIPC)   : a helper daemon for inter-process communication.



OLR (/etc/oracle/olr.loc) :
Stores clusterware configuration and version information about the local node only, i.e. the local resources required by OHASD.
It is not shared with any other node in the cluster.
The OLR and the GPnP profile are required to start the High Availability service stack.

# ocrdump -xml -local OLRDUMPFILE.root.xml


OCR (/etc/oracle/ocr.loc ; backup : $ORA_CRS_HOME/cdata/<cluster_name>) :
Resides on shared disk and stores information about the resources: their location, permissions, current value, type, state, etc.
It helps maintain the dependencies in the startup of the various resources, e.g. ASM, then database, then services.
Each node keeps an in-memory OCR cache; updates to the OCR are performed by the master CRSD process.
In 11g R2, you can have up to five OCR copies.
Its backup is taken by the master node automatically every four hours. Backup location : GRID_HOME/cdata/<cluster name>.
The OCR is replicated across all the underlying disks of the diskgroup, so the failure of one disk does not bring down the diskgroup.

# ocrdump OCRDUMPFILE.root
# ocrdump -xml OCRDUMPFILE.root.xml


Commands :

#ocrcheck
ocrconfig -add +DATA
ocrconfig -replace /u01/app/oracle/ocr -replacement +DATA


How to restore OCR :

#ocrconfig -showbackup

#crsctl start crs -excl        -- only on one node
If crsd also started, stop it : # crsctl stop resource ora.crsd -init
(or start without crsd)       : # crsctl start crs -excl -nocrs

#ocrconfig -restore {path_to_backup/backup_file_to_restore}





#crsctl stop crs        -- on the node started in exclusive mode (use #crsctl stop crs -f if the normal stop fails)
#crsctl start crs       -- run on all nodes

#cluvfy comp ocr -n all -verbose

# crsctl status resource -init -t





Wait Events :


CR disk read
CR immediate 2-way transfer
CR immediate 3-way transfer

LMS (Lock Manager Server), also called the GCS (Global Cache Service) process, transports blocks across the nodes in support of Cache Fusion.


Current buffer : Buffer contains the latest changes and no pending transactions. A buffer in this mode can be written to disk.

CR (Consistent Read) buffer :
Before shipping the block to the requestor, the remote instance applies undo records to create a consistent version of the block.



gc current request : Buffer contains the latest changes but no pending transactions. LGWR must complete a log flush before the LMS process can send the block to the remote instance.

gc cr request   : Time to retrieve the data from the remote cache. Before shipping the block to the requestor, the remote instance applies undo records to create a consistent version of the block. The more blocks requested from the buffer cache, the greater the likelihood of a session having to wait for other sessions.
gc cr disk read : Same as gc cr request, but in this case the undo records needed to create a consistent version of the block are not available in the cache and thus have to be read from disk.

gc cr multiblock request : Wait for a consistent-read request covering multiple blocks, e.g. during multiblock reads such as full table scans.


gc buffer busy wait     : A session is trying to access a buffer, but there is already an open global cache lock request for that block, so the session must wait for the GC lock request to complete before proceeding.
gc buffer busy acquire  : The global cache open request originated from the local instance, i.e. waiting for another process in the local instance to release the GC lock.
gc buffer busy release  : The global cache open request originated from a remote instance, i.e. waiting for a session on the remote instance to release the GC lock.
(In 10g, gc buffer busy = gc buffer busy acquire + gc buffer busy release)
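
Sessions currently stuck on these events can be listed cluster-wide; for the gc buffer waits, p1/p2 carry the file and block number (run as SYSDBA):

```shell
sqlplus -s / as sysdba <<'EOF'
SELECT inst_id, sid, event, p1 file#, p2 block#
FROM   gv$session
WHERE  event LIKE 'gc %'
AND    wait_time = 0;
EOF
```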

gc cr grant 2-way : Block not in any cache. Permission granted to read from the disk.
gc current grant 2-way : Block not in any cache. Permission granted to read from the disk.



gc cr block 2-way
gc cr block 3-way
gc cr block busy
gc cr block congested
gc cr block lost
gc cr block unknown
gc cr cancel
gc cr disk read
gc cr disk request
gc cr failure
gc cr grant 2-way
gc cr grant busy
gc cr grant congested
gc cr grant unknown
gc cr multi block request





Load Balance Advisory and FAN:
=========================

Oracle RAC constantly monitors the workload being executed by each instance. This information is published to the Automatic Workload Repository and, using FAN events, to the applications, letting them know what percentage of connections can be directed to each instance, or of any state change.

Using the FAN events from the load balancing advisory, the client connection pool selects the connection currently providing the best service.


For DOWN events, application interruption is minimized by cleaning up connections to the failed instance; in-flight transactions are interrupted, with an error returned to the application. Applications making new connections are directed to active instances only.

For UP events, new connections are created to allow the application to immediately take advantage of the
extra resources available.
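
The advisory is driven by the goal set on the service. A sketch using 11g R2 srvctl syntax; the database name "orcl" and service name "oltp" are placeholders:

```shell
# -B sets the load balancing advisory goal (NONE|SERVICE_TIME|THROUGHPUT),
# -j sets the connection load balancing goal (SHORT|LONG)
srvctl modify service -d orcl -s oltp -B SERVICE_TIME -j SHORT

# Verify the service configuration
srvctl config service -d orcl -s oltp
```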


Connection Load Balancing :
=====================

Client-side balancing is achieved by using the SCAN in the address list of the client connect string. SQL*Net randomly selects one of the SCAN IP addresses. If the server chosen is not available, the next server in the list is tried.

On the server side, each SCAN listener is aware of all the instances in the cluster providing each service. Based on the goal defined for the service, the listener chooses the instance that will best meet the goal, and the connection is routed to that instance through the local listener.
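
A typical client connect descriptor using the SCAN (the SCAN hostname and service name below are placeholders):

```
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = oltp)
    )
  )
```

Because the SCAN resolves to multiple IP addresses in DNS, this single entry gives client-side balancing without listing every node.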




























--------------------------------------------------------- Others ----------------------------------------------------------



Which is the Master Node


  • The master node has the least node-id in the cluster.
  • The master node is responsible for initiating the OCR backup.
  • In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster containing the fewer number of nodes is evicted. But if both sub-clusters have the same number of nodes, the sub-cluster containing the master node survives and the other sub-cluster is evicted.


To find the master node, scan the ocssd logs on each node : cat $ORACLE_HOME/log/host01/cssd/ocssd.log | grep 'master node' | tail -1


Which is the Resource Master / Dynamic remastering

Before reading a block, a user process must ask the master node of that block for access. The master instance keeps track of the state of the block. The GCS can decide to give the mastership of an object to the instance that is heavily requesting blocks from that object. Excessive DRM can lead to 'gcs drm freeze', which can also freeze the instances.

The statistic 'gc remote grants' keeps track of the number of remote requests.

If the local node is the master of the block, then GC affinity locks are acquired, which is more efficient than remote grants.

Typically, a batch process will access many blocks aggressively. Performance can be improved if those blocks are mastered on the local node, avoiding costly remote grants. That is exactly what the 'dynamic remastering' feature is designed to achieve: if an object is detected to be accessed excessively from one instance, its mastership can be moved to that instance.



_gc_policy_limit and _gc_policy_time (10g: _gc_affinity_limit and _gc_affinity_time) are undocumented parameters that control remastering.
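
Remastering activity can be observed through V$GCSPFMASTER_INFO (available since 10g R2), which shows which instance currently masters each remastered object:

```shell
sqlplus -s / as sysdba <<'EOF'
SELECT object_id, current_master, previous_master, remaster_cnt
FROM   v$gcspfmaster_info
ORDER  BY remaster_cnt DESC;
EOF
```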


Cache Fusion reads  : One instance requires a block that has just been read by another instance, so it is already in that instance's memory.

Cache Fusion writes : A block (previously changed by another instance) needs to be written to disk in response to a checkpoint or cache aging.


Clusterware processes -- Process (owner) : Functionality -- Component

ohasd (init, root)                         : starts and monitors the rest of the stack -- Oracle High Availability Service
crsd (root)                                : starts, stops, monitors, and fails over cluster resources using the OCR, and generates events on changes -- Cluster Ready Service (CRS)
ocssd, cssdmonitor, cssdagent (grid owner) : node membership and notifications when a node joins or leaves the cluster -- Cluster Synchronization Service (CSS)
evmd, evmlogger (grid owner)               : publishes Oracle Clusterware events -- Event Manager (EVM)
octssd (root)                              : time synchronization between nodes -- Cluster Time Synchronization Service (CTSS)
ons, eons (grid owner)                     : communicate Fast Application Notification (FAN) events -- Oracle Notification Service (ONS)
oraagent (grid owner)                      : runs server callout scripts when FAN events occur -- Oracle Agent
orarootagent (root)                        : manages resources owned by root (network, VIP) -- Oracle Root Agent
gnsd (root)                                : performs name resolution within the cluster -- Grid Naming Service (GNS)
mdnsd (grid owner)                         : allows DNS requests -- Multicast domain name service (mDNS)
diskmon                                    : monitors and performs input/output fencing for Oracle Exadata Storage Server -- Disk Monitor daemon
gpnpd (grid owner)                         : ensures each node has the most recent profile -- Grid Plug and Play (GPnP)
lmon                                       : enables cache fusion along with others -- Global Enqueue Service Monitor
lmd                                        : enables cache fusion along with others -- Global Enqueue Service Daemon
lms                                        : enables cache fusion along with others -- Global Cache Service Process
RBAL                                       : coordinates the rebalance activity and opens all device files as part of discovery
GMON                                       : manages the disk-level activities
Onnn                                       : forms a pool of connections to the ASM instance for exchanging messages
PZ9n                                       : fetches data from GV$ views





Startup Sequence

Level 0 :  INIT starts OHASD


Level 1: OHASD Spawns:
    cssdagent          Agent responsible for spawning CSSD.
    cssdmonitor    Monitors CSSD and node health 
    OHASD oraagent           Agent responsible for managing all oracle owned ohasd resources
    OHASD orarootagent          Agent responsible for managing all root owned ohasd resources
   
Level 2: OHASD  oraagent spawns:
    MDNSD (ora.mdnsd)    Used for DNS lookup
    GIPCD (ora.gipcd)    Used for inter-process communication
    GPNPD (ora.gpnpd)    Grid Plug & Play Profile Daemon
    EVMD (ora.evmd)      Event Monitor Daemon
    ASM (ora.asm)        Resource for monitoring ASM instances
Level 2: OHASD rootagent spawns:
     CSSD (ora.cssd)     Cluster Synchronization Services
     CRSD (ora.crsd)     Primary daemon responsible for managing cluster resources.
     CTSSD (ora.ctssd)   Cluster Time Synchronization Services Daemon
     Diskmon (ora.diskmon)
     ACFS (ASM Cluster File System) Drivers
Level 3: CRSD spawns:
    CRSD  orarootagent      Agent responsible for managing all root owned crsd resources.
    CRSD  oraagent          Agent responsible for managing all oracle owned crsd resources.
Level 4: CRSD rootagent spawns:
    Network resource    ora.net<id>.network To monitor the public network
    SCAN VIP(s)         ora.SCAN<id>.vip Single Client Access Name Virtual IPs
    Node VIPs           ora.<nodename>.vip One per node
    ACFS Registry       For mounting ASM Cluster File System
    GNS VIP (optional)  VIP for GNS
Level 4: CRSD oraagent spawns:
    ASM Resource        ora.asm ASM Instance(s) resource
    Diskgroup           ora.dg Used for managing/monitoring ASM diskgroups.
    DB Resource         ora.DB.db Used for monitoring and managing the DB and instances
    SCAN Listener       ora.LISTENER_SCAN Listener for the single client access name, listening on the SCAN VIP
    Listener            ora.listener Node listener listening on the Node VIP
    Services            ora.database.svc Used for monitoring and managing services
    ONS                 ora.ons Oracle Notification Service
    eONS                ora.eons Enhanced Oracle Notification Service
    GSD                 ora.gsd For 9i backward compatibility
    GNS (optional)      Grid Naming Service. Performs name resolution