
RAC basics and components

RAC is a configuration where two or more instances are connected to a single shared physical database, and users can access the database from any of the available instances to retrieve and process data.

Components in RAC

  • Node
  • Interconnect
  • Shared Storage : ASM
  • Clusterware : CRS -- Cluster Ready Services (CRS) is Oracle's own clusterware. It consists of four processes (crsd, ocssd, evmd, and evmlogger)
                                      and two disks, the OCR and the voting disk. It manages ASM, databases, instances, and Oracle services.

     Each instance has its own redo log files and undo tablespace, stored on the shared storage.
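
The state of these components can be inspected from any node. A quick health check, assuming Grid Infrastructure is installed and its bin directory is on the PATH (11g R2 commands; output layout varies by version):

```shell
# Verify the clusterware stack (CRS, CSS, EVM) on the local node
crsctl check crs

# List the lower-stack (OHASD-managed) resources, including ASM and CSSD
crsctl stat res -t -init

# List the cluster nodes with their node numbers
olsnodes -n
```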

https://dbasanthosh.wordpress.com/tag/what-is-ocssd/

Startup Sequence :

Clusterware needs ASM to access its key data structures: the OCR and the voting disk.
ASM needs clusterware up to access the shared storage (where the OCR and voting disk reside).

Oracle resolves this chicken-and-egg problem by starting some CRS components, such as CSSD and CTSS, before ASM, while other components, such as CRSD, EVMD, and ACFS, come up after ASM starts.

1)      CSSD and CTSSD are up before ASM.
2)      The voting disks used by CSSD are discovered by reading the headers of the disks, not through ASM.
3)      Startup of the CRS service has to wait until the ASM instance is up and the diskgroup holding the OCR and voting disk is mounted.
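
This header-based discovery can be seen directly: the voting disks are listed even when the ASM diskgroup that holds them is not yet mounted (commands as of 11g R2, run as root from the Grid home):

```shell
# Voting disks are located from the ASM disk headers directly,
# so this works before the diskgroup is mounted
crsctl query css votedisk

# Once ASM is up, confirm the OCR location and its integrity
ocrcheck
```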



To ensure each Oracle RAC database instance obtains the block it requires to satisfy a query or transaction, Oracle RAC instances use two components:

Global Cache Service (GCS) : Maintains coherency of the buffer caches, i.e. allows only one instance to modify a block at any single point in time.
Global Enqueue Service (GES) : Coordinates global locks between the instances in the cluster.

The GCS and GES maintain the status of each data file and each cached block using the Global Resource Directory (GRD), an internal in-memory database that stores the current status of the data blocks. The GRD contents are distributed across all of the active instances.
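
GCS block shipping activity is visible in the standard statistics. A rough cluster-wide view, run as SYSDBA (statistic names as of 10g/11g):

```shell
sqlplus -s / as sysdba <<'EOF'
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received')
ORDER  BY inst_id, name;
EOF
```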

When a request is made for a block image, three roles are involved:

Requesting instance : Instance that placed the request for the block image with its local LMS process.
Owning instance     : Instance that currently owns the block image.
Mastering instance  : Instance that owns the metadata in the GRD for that particular block number and file number.




1. When a node starts/restarts, OHASD is the first to be started, by the O.S. init (/etc/init.d/init.ohasd run).
   OHASD has access to the OLR (Oracle Local Registry) stored on the local file system.
   The OLR provides the data needed to complete OHASD initialization.

2. OHASD brings up GPNPD and CSSD.
   CSSD then accesses the GPnP profile stored on the local file system, which contains
        a. the ASM diskgroup discovery string
        b. the ASM SPFILE location (diskgroup name)
        c. the name of the ASM diskgroup containing the voting files

3. CSSD accesses the voting file locations by reading the ASM disk headers via well-known pointers, and so is able to complete initialization and start or join an existing cluster.

4. OHASD starts an ASM instance.
   The ASM instance locates the contents of the ASM SPFILE (using special code).
   The diskgroups are mounted, and access to the OCR is now available.

5. OHASD then starts CRSD (with access to the OCR in an ASM diskgroup).

6. Clusterware completes initialization and brings up the other services under its control.
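
The GPnP profile that CSSD reads in step 2 can be dumped from the local node (11g R2 tools from the Grid home):

```shell
# Dump the local GPnP profile; look for the DiscoveryString
# and SPFile attributes referenced in step 2
gpnptool get

# Show where the ASM SPFILE was resolved from
srvctl config asm
```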


init -- ohasd (/etc/init.d/init.ohasd run)
(/etc/inittab : /etc/init.d/init.crsd run, /etc/init.d/init.cssd run, /etc/init.d/init.evmd run)
--- level 1
cssdagent    -- ora.cssd
cssdmonitor  -- ora.cssdmonitor
oraagent     -- (level 2) ora.mdnsd, ora.gipcd, ora.gpnpd, ora.asm, ora.evmd   (dipae)
orarootagent -- (level 2) ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs   (rtd)
                                 |__ orarootagent -- (level 4) ora.<node>.vip, ora.SCAN<id>.vip, ora.gns.vip, ora.gns, ora.net<id>.network, ora.registry.acfs
                                 |__ oraagent     -- (level 4) ora.DB.db, ora.asm, ora.dg, ora.listener, ora.LISTENER_SCAN<id>.lsnr, ora.DB.svc, ora.ons, ora.gsd


Grid Plug and Play (GPNPD)               : ensures each node has the most recent GPnP profile.
Multicast Domain Name Service (mDNS)     : allows DNS requests within the cluster.
Oracle Grid Naming Service (GNS)         : gateway between the cluster mDNS and external DNS servers. The gnsd process performs name resolution within the cluster.
Grid Interprocess Communication (GIPC)   : a helper daemon for inter-process communication.



OLR (/etc/oracle/olr.loc) :
Stores clusterware configuration and version information about the local node only, i.e. the local resources required by OHASD.
It is not shared with any other node in the cluster.
The OLR and the GPnP profile are required to start the High Availability service stack.

# ocrdump -xml -local OLRDUMPFILE.root.xml


OCR (/etc/oracle/ocr.loc ; backup : $ORA_CRS_HOME/cdata/<cluster_name>) :
Resides on shared disk and stores information about the resources: their location, permissions, current value, type, state, etc.
It helps maintain the dependencies in the startup of the various resources, e.g. ASM, then database, then services.
Each node keeps an in-memory OCR cache; updates to the OCR are performed by the master CRSD process.
In 11g R2, you can have up to five OCR copies.
Its backup is taken by the master node automatically every four hours. Backup location : GRID_HOME/cdata/<cluster name>.
The OCR is replicated across all the underlying disks of the diskgroup, so the failure of one disk does not bring down the diskgroup.

# ocrdump OCRDUMPFILE.root
# ocrdump -xml OCRDUMPFILE.root.xml


Commands :

#ocrcheck
ocrconfig -add +DATA
ocrconfig -replace /u01/app/oracle/ocr -replacement +DATA


How to restore OCR :

#ocrconfig -showbackup

#crsctl start crs -excl        -- only on one node
If crsd also started, stop it : # crsctl stop resource ora.crsd -init
(or start without crsd)       : # crsctl start crs -excl -nocrs

#ocrconfig -restore {path_to_backup/backup_file_to_restore}





#crsctl stop crs        -- on the node started in exclusive mode (use #crsctl stop crs -f if the normal stop fails)
#crsctl start crs       -- run on all nodes

#cluvfy comp ocr -n all -verbose

# crsctl status resource -init -t





Wait Events :


CR disk read
CR immediate 2-way transfer
CR immediate 3-way transfer

LMS (Lock Manager Server), also called the GCS (Global Cache Service) process, transports blocks across the nodes in support of Cache Fusion.


Current buffer : Buffer contains the latest changes and no pending transactions. A buffer in this mode can be written to disk.

CR (Consistent Read) buffer :
Before shipping the block to the requestor, the remote instance applies undo records to create a consistent version of the block.



gc current request : Buffer contains the latest changes but no pending transactions. LGWR must complete a log flush before the LMS process can send the block to the remote instance.

gc cr request   : Time to retrieve the data from the remote cache. Before shipping the block to the requestor, the remote instance applies undo records to create a consistent version of the block. The more blocks requested from the buffer cache, the greater the likelihood of a session having to wait for other sessions.
gc cr disk read : Same as gc cr request, but in this case the undo records needed to create a consistent version of the block are not available in the cache and thus have to be read from disk.

gc cr multiblock request : Wait for a consistent-read request covering multiple blocks, e.g. during multiblock reads such as full table scans.


gc buffer busy wait     : A session is trying to access a buffer, but there is already an open global cache lock request for that block, so the session must wait for the GC lock request to complete before proceeding.
gc buffer busy acquire  : The global cache open request originated from the local instance, i.e. waiting for another process in the local instance to release the GC lock.
gc buffer busy release  : The global cache open request originated from a remote instance, i.e. waiting for a session on the remote instance to release the GC lock.
(In 10g, gc buffer busy = gc buffer busy acquire + gc buffer busy release)
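
Sessions currently stuck on these events can be listed cluster-wide; for the gc buffer waits, p1/p2 carry the file and block number (run as SYSDBA):

```shell
sqlplus -s / as sysdba <<'EOF'
SELECT inst_id, sid, event, p1 file#, p2 block#
FROM   gv$session
WHERE  event LIKE 'gc %'
AND    wait_time = 0;
EOF
```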

gc cr grant 2-way : Block not in any cache. Permission granted to read from the disk.
gc current grant 2-way : Block not in any cache. Permission granted to read from the disk.



gc cr block 2-way
gc cr block 3-way
gc cr block busy
gc cr block congested
gc cr block lost
gc cr block unknown
gc cr cancel
gc cr disk read
gc cr disk request
gc cr failure
gc cr grant 2-way
gc cr grant busy
gc cr grant congested
gc cr grant unknown
gc cr multi block request





Load Balance Advisory and FAN:
=========================

Oracle RAC constantly monitors the workload being executed by each instance. This information is published to the Automatic Workload Repository and, using FAN events, to the applications, letting them know what percentage of connections can be directed to each instance, or of any state change.

Using the FAN events from the load balancing advisory, the client connection pool selects the connection currently providing the best service.


For DOWN events, application interruption is minimized by cleaning up connections to the failed instance; in-flight transactions are interrupted, with an error returned to the application. Applications making new connections are directed to active instances only.

For UP events, new connections are created to allow the application to immediately take advantage of the
extra resources available.
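
The advisory is driven by the goal set on the service. A sketch using 11g R2 srvctl syntax; the database name "orcl" and service name "oltp" are placeholders:

```shell
# -B sets the load balancing advisory goal (NONE|SERVICE_TIME|THROUGHPUT),
# -j sets the connection load balancing goal (SHORT|LONG)
srvctl modify service -d orcl -s oltp -B SERVICE_TIME -j SHORT

# Verify the service configuration
srvctl config service -d orcl -s oltp
```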


Connection Load Balancing :
=====================

Client-side balancing is achieved by using the SCAN in the address list of the client connect string. SQL*Net randomly selects one of the SCAN IP addresses. If the server chosen is not available, the next server in the list is tried.

On the server side, each SCAN listener is aware of all the instances in the cluster providing each service. Based on the goal defined for the service, the listener chooses the instance that will best meet the goal, and the connection is routed to that instance through the local listener.
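
A typical client connect descriptor using the SCAN (the SCAN hostname and service name below are placeholders):

```
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = oltp)
    )
  )
```

Because the SCAN resolves to multiple IP addresses in DNS, this single entry gives client-side balancing without listing every node.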




























--------------------------------------------------------- Others ----------------------------------------------------------



Which is the Master Node


  • The master node has the least node-id in the cluster.
  • The master node is responsible for initiating the OCR backup.
  • In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster containing the fewer number of nodes is evicted. But if both sub-clusters have the same number of nodes, the sub-cluster containing the master node survives and the other sub-cluster is evicted.


To find the master node, scan the ocssd logs on each node : cat $ORACLE_HOME/log/host01/cssd/ocssd.log | grep 'master node' | tail -1


Which is the Resource Master / Dynamic remastering

Before reading a block, a user process must ask the master node of that block for access. The master instance keeps track of the state of the block. The GCS can decide to give the mastership of an object to the instance that is heavily requesting blocks from that object. Excessive DRM can lead to 'gcs drm freeze', which can also freeze the instances.

The statistic 'gc remote grants' keeps track of the number of remote requests.

If the local node is the master of the block, then GC affinity locks are acquired, which is more efficient than remote grants.

Typically, a batch process will access many blocks aggressively. Performance can be improved if those blocks are mastered on the local node, avoiding costly remote grants. That is exactly what the 'dynamic remastering' feature is designed to achieve: if an object is detected to be accessed excessively from one instance, its mastership can be moved to that instance.



_gc_policy_limit and _gc_policy_time (10g: _gc_affinity_limit and _gc_affinity_time) are undocumented parameters that control remastering.
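
Remastering activity can be observed through V$GCSPFMASTER_INFO (available since 10g R2), which shows which instance currently masters each remastered object:

```shell
sqlplus -s / as sysdba <<'EOF'
SELECT object_id, current_master, previous_master, remaster_cnt
FROM   v$gcspfmaster_info
ORDER  BY remaster_cnt DESC;
EOF
```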


Cache Fusion reads  : One instance requires a block that has just been read by another instance, so it is already in that instance's memory.

Cache Fusion writes : A block (previously changed by another instance) needs to be written to disk in response to a checkpoint or cache aging.


Clusterware processes -- Process (owner) : Functionality -- Component

ohasd (init, root)                         : starts and monitors the rest of the stack -- Oracle High Availability Service
crsd (root)                                : starts, stops, monitors, and fails over cluster resources using the OCR, and generates events on changes -- Cluster Ready Service (CRS)
ocssd, cssdmonitor, cssdagent (grid owner) : node membership and notifications when a node joins or leaves the cluster -- Cluster Synchronization Service (CSS)
evmd, evmlogger (grid owner)               : publishes Oracle Clusterware events -- Event Manager (EVM)
octssd (root)                              : time synchronization between nodes -- Cluster Time Synchronization Service (CTSS)
ons, eons (grid owner)                     : communicate Fast Application Notification (FAN) events -- Oracle Notification Service (ONS)
oraagent (grid owner)                      : runs server callout scripts when FAN events occur -- Oracle Agent
orarootagent (root)                        : manages resources owned by root (network, VIP) -- Oracle Root Agent
gnsd (root)                                : performs name resolution within the cluster -- Grid Naming Service (GNS)
mdnsd (grid owner)                         : allows DNS requests -- Multicast domain name service (mDNS)
diskmon                                    : monitors and performs input/output fencing for Oracle Exadata Storage Server -- Disk Monitor daemon
gpnpd (grid owner)                         : ensures each node has the most recent profile -- Grid Plug and Play (GPnP)
lmon                                       : enables cache fusion along with others -- Global Enqueue Service Monitor
lmd                                        : enables cache fusion along with others -- Global Enqueue Service Daemon
lms                                        : enables cache fusion along with others -- Global Cache Service Process
RBAL                                       : coordinates the rebalance activity and opens all device files as part of discovery
GMON                                       : manages the disk-level activities
Onnn                                       : forms a pool of connections to the ASM instance for exchanging messages
PZ9n                                       : fetches data from GV$ views





Startup Sequence

Level 0 :  INIT starts OHASD


Level 1: OHASD Spawns:
    cssdagent          Agent responsible for spawning CSSD.
    cssdmonitor    Monitors CSSD and node health 
    OHASD oraagent           Agent responsible for managing all oracle owned ohasd resources
    OHASD orarootagent          Agent responsible for managing all root owned ohasd resources
   
Level 2: OHASD  oraagent spawns:
    MDNSD (ora.mdnsd)    Used for DNS lookup
    GIPCD (ora.gipcd)    Used for inter-process communication
    GPNPD (ora.gpnpd)    Grid Plug & Play Profile Daemon
    EVMD (ora.evmd)      Event Monitor Daemon
    ASM (ora.asm)        Resource for monitoring ASM instances
Level 2: OHASD rootagent spawns:
     CSSD (ora.cssd)     Cluster Synchronization Services
     CRSD (ora.crsd)     Primary daemon responsible for managing cluster resources.
     CTSSD (ora.ctssd)   Cluster Time Synchronization Services Daemon
     Diskmon (ora.diskmon)
     ACFS (ASM Cluster File System) Drivers
Level 3: CRSD spawns:
    CRSD  orarootagent      Agent responsible for managing all root owned crsd resources.
    CRSD  oraagent          Agent responsible for managing all oracle owned crsd resources.
Level 4: CRSD rootagent spawns:
    Network resource    ora.net<id>.network To monitor the public network
    SCAN VIP(s)         ora.SCAN<id>.vip Single Client Access Name Virtual IPs
    Node VIPs           ora.<nodename>.vip One per node
    ACFS Registry       For mounting ASM Cluster File System
    GNS VIP (optional)  VIP for GNS
Level 4: CRSD oraagent spawns:
    ASM Resource        ora.asm ASM Instance(s) resource
    Diskgroup           ora.dg Used for managing/monitoring ASM diskgroups.
    DB Resource         ora.DB.db Used for monitoring and managing the DB and instances
    SCAN Listener       ora.LISTENER_SCAN Listener for the single client access name, listening on the SCAN VIP
    Listener            ora.listener Node listener listening on the Node VIP
    Services            ora.database.svc Used for monitoring and managing services
    ONS                 ora.ons Oracle Notification Service
    eONS                ora.eons Enhanced Oracle Notification Service
    GSD                 ora.gsd For 9i backward compatibility
    GNS (optional)      Grid Naming Service. Performs name resolution