The 11.2 RAC Startup Process

Starting with 11.2 GI (Grid Infrastructure), the RAC architecture changed dramatically compared with 10.2. The classic MOS note "11gR2 Clusterware and Grid Home - What You Need to Know (Doc ID 1053147.1)" explains the new structure in detail.
That note contains a classic diagram that shows at a glance the dependencies among the d.bin processes (that is, the startup and shutdown order, and which process starts or restarts which):


[Figure: 11.2 Clusterware startup dependencies (11.2_clusterware)]


The same startup order can also be observed directly while CRS is coming up.
Below is the CRS startup sequence on an 11.2.0.3 environment:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  4 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4338     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      4342     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
root      4348     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
root      4370     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4428  3507  0 20:37 pts/2    00:00:00 grep d.bin
[root@dm01db01 ~]# 
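
The successive ps -ef|grep d.bin snapshots shown throughout this post can be captured with a simple polling loop. A minimal sketch (the two-second interval and the separator line are arbitrary choices, not part of the original test):

while true; do
    date '+%H:%M:%S'
    ps -ef | grep d.bin | grep -v grep   # same filter as above, minus the grep process itself
    echo '----------------------------------------'
    sleep 2
done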

The first process to start is /u01/app/11.2.0.3/grid/bin/ohasd.bin, which carries the reboot argument, meaning it will be restarted automatically after being killed.
The /etc/init.d/init.ohasd script is the daemon that restarts /u01/app/11.2.0.3/grid/bin/ohasd.bin.
Both of them are generated from $GRID_HOME/crs/init/init.ohasd; a later post will simulate losing this file and show how to recover, so I will not go into that here.
After /u01/app/11.2.0.3/grid/bin/ohasd.bin is killed, the system takes a few minutes to restart the service, and /var/log/messages records the restart:

Jan 11 20:36:18 lunarlib clsecho: /etc/init.d/init.ohasd: ohasd is restarting 1/10.
Jan 11 20:36:18 lunarlib logger: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunarlib_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "restart"

On an idle system this restart takes a little under two minutes. $GRID_HOME/log/`hostname -s`/alert`hostname -s`.log records ohasd.bin being killed and restarted, followed by the check and recovery of the various resources (a sketch of reproducing this test follows the excerpt):

2016-01-11 20:36:18.500:
[/u01/app/11.2.0.4/grid/bin/cssdagent(16784)]CRS-5822:Agent '/u01/app/11.2.0.4/grid/bin/cssdagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:5:31} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oracssdagent_grid/oracssdagent_grid.log.
2016-01-11 20:36:18.504:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(16852)]CRS-5822:Agent '/u01/app/11.2.0.4/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:7:7} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:18.789:
[ohasd(17048)]CRS-2112:The OLR service started on node lunarlib.
2016-01-11 20:36:18.796:
[ohasd(17048)]CRS-1301:Oracle High Availability Service started on node lunarlib.
2016-01-11 20:36:49.574:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.CRSDG.dg'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:49.583:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.DATADG1.dg'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:49.594:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.DATADG2.dg'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:51.608:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:37:52.943:
[ohasd(17048)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2016-01-11 20:37:52.943:
[ohasd(17048)]CRS-2769:Unable to failover resource 'ora.diskmon'.
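
If you want to reproduce this on a lab node (never in production), a hedged sketch is below. It assumes GRID_HOME points at the grid home (/u01/app/11.2.0.4/grid in the log above) and that it is run as root; pgrep and the file locations are standard, everything else is illustration only.

kill -9 "$(pgrep -x ohasd.bin)"      # simulate an abnormal death of ohasd.bin
tail -f /var/log/messages            # init.ohasd logs the "ohasd is restarting n/10" attempts here
# in a second terminal, follow the clusterware alert log as well:
# tail -f $GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log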

Now back to the startup sequence we were following. In the next snapshot, orarootagent.bin, cssdagent and cssdmonitor have disappeared, and mdnsd.bin has appeared:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  4 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1 10 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
root      4452  3507  0 20:37 pts/2    00:00:00 grep d.bin            
[root@dm01db01 ~]# 

Then ocssd.bin, gpnpd.bin, orarootagent.bin, gipcd.bin, osysmond.bin, cssdmonitor, cssdagent and diskmon.bin are added:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  5 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  5 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  5 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4571  3507  4 20:37 pts/2    00:00:00 grep d.bin  
[root@dm01db01 ~]# 
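
With gpnpd.bin up, the GPnP profile (which the lower stack uses during this bootstrap phase to find the cluster interconnect and the ASM discovery string) can already be inspected. A small sketch using the gpnptool utility shipped in the grid home; run it as the grid owner, and GRID_HOME here is just shorthand for /u01/app/11.2.0.3/grid:

$GRID_HOME/bin/gpnptool lfind    # check that the local gpnpd daemon is reachable
$GRID_HOME/bin/gpnptool get      # dump the GPnP profile XML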

Next, ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01 is added.
The ologgerd (Cluster Logger Service) process is installed automatically by the 11.2.0.2 installation (a new feature of 11.2.0.2; in earlier releases it had to be downloaded and installed separately) and belongs to the Cluster Health Monitor (CHM) component.
CHM automatically collects operating system resource usage (CPU, memory, swap, processes, I/O, network, and so on), sampling once per second.
CHM is installed automatically with the following software:
11.2.0.2 and later Oracle Grid Infrastructure for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
11.2.0.3 and later Oracle Grid Infrastructure for AIX and Windows (excluding Windows Itanium)

Note that the osysmond.bin process seen above and this ologgerd (Cluster Logger Service) process are the two main working processes of CHM.
osysmond sends each node's resource usage to ologgerd (the Cluster Logger Service), which collects the data from all nodes and stores it in the CHM repository.
In 11.2 this repository lives by default under $GRID_HOME/crf/db/`hostname -s` and needs roughly 1 GB of space.
In 12.1.0.1 the CHM repository was moved into its own GI database, and at installation time you could choose whether to install the GIMR (Grid Infrastructure Management Repository).
In 12.1.0.2 the repository is still kept in that separate GI database, but the GIMR has become mandatory.
In 12.2 you can choose whether MGMTDB, the database used by the GIMR, shares a disk group with CRS or is placed in a disk group of its own.
Once osysmond.bin and ologgerd are running, the CHM data can be queried with oclumon, as sketched below.
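
A hedged sketch of querying CHM on 11.2.0.2+ with the oclumon utility from the grid home (GRID_HOME stands for /u01/app/11.2.0.3/grid here; the five-minute window is an arbitrary example):

$GRID_HOME/bin/oclumon manage -get reppath repsize                  # location and size of the CHM repository
$GRID_HOME/bin/oclumon dumpnodeview -n dm01db01 -last "00:05:00"    # last 5 minutes of OS metrics for this node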

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  3 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4590     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4614  3507  0 20:38 pts/2    00:00:00 grep d.bin    
[root@dm01db01 ~]# 

Continuing with the startup sequence: once ocssd.bin is up, octssd.bin is started (a quick way to check CTSS is sketched after the listing):

[root@dm01db01 ~]# 
[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4590     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4685     1  4 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
root      4694  3507  0 20:38 pts/2    00:00:00 grep d.bin     
[root@dm01db01 ~]# 
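
octssd.bin is the Oracle Cluster Time Synchronization Service daemon. A quick way to see whether CTSS is running in active mode or merely in observer mode (it stays in observer mode when NTP is configured on the nodes); GRID_HOME again stands for the grid home:

$GRID_HOME/bin/crsctl check ctss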

Next, evmd.bin is started (a way to watch the events it publishes is sketched after the listing):

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4590     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4685     1  1 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  2 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      4732  3507  0 20:38 pts/2    00:00:00 grep d.bin 
[root@dm01db01 ~]# 
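
evmd.bin is the Event Manager daemon, which publishes clusterware events. A small sketch of subscribing to those events with the evmwatch client shipped in the grid home (the format string simply prints a timestamp followed by the event body):

$GRID_HOME/bin/evmwatch -A -t "@timestamp @@"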

Then crsd.bin and the tnslsnr listeners appear:

[root@dm01db01 ~]#
[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4685     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      5080     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      5100     1  1 20:39 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid      5189  4710  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid      5229     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      5242     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      5368     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5376     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle    5466     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      5487     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
root      5897  3507  0 20:40 pts/2    00:00:00 grep d.bin
[root@dm01db01 ~]# 

Once crsd.bin is running, crsctl status res -t can be used to view the CRS resource status.
If crsd.bin has not started yet, use crsctl status res -t -init to view the lower-stack (ohasd-managed) resources instead.
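
For reference, the two variants side by side (run as root or the grid owner; GRID_HOME stands for /u01/app/11.2.0.3/grid):

$GRID_HOME/bin/crsctl status res -t          # full resource tree, available once crsd.bin is up
$GRID_HOME/bin/crsctl status res -t -init    # ohasd-managed resources such as ora.cssd, ora.ctssd, ora.evmd, ora.crsd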

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4685     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      5080     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      5100     1  0 20:39 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid      5189  4710  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid      5229     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      5242     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      5368     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5376     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle    5466     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      5487     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
grid      6061  5487  0 20:41 ?        00:00:00 /bin/sh /u01/app/11.2.0.3/grid/bin/oc4jctl check
grid      6072  6061  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/perl/bin/perl /u01/app/11.2.0.3/grid/bin/oc4jctl.pl check 8888
grid      6086  5229  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER_SCAN1
grid      6088  5229  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER
root      6096  3507  0 20:41 pts/2    00:00:00 grep d.bin     
[root@dm01db01 ~]# 

Finally, the lsnrctl and oc4jctl checks are run; at this point the CRS startup is complete.
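
Once everything is up, a quick health check of the whole stack can be run as follows (GRID_HOME is the grid home; run crsctl as root or the grid owner):

$GRID_HOME/bin/crsctl check cluster -all    # CSS/CRS/EVM status on every node
$GRID_HOME/bin/crsctl check crs             # local stack: ohasd, crsd, cssd, evmd
srvctl status nodeapps                      # VIPs, network and ONS resources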
