In 11.2, how do you manually kill all CRS processes without rebooting the host?

Contact: QQ (5163721)


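Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.]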

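As we all know, killing the ocssd.bin process in a RAC environment causes the host to reboot.
But sometimes the system is already in an abnormal state and CRS cannot be shut down cleanly, while the host is an old system that has not been rebooted for years and that nobody dares to reboot. What then?
The only option is to kill the processes by hand and then repair CRS manually (note that 10.2 RAC has only three d.bin processes).
Test environment: the operating system is OEL 6.6.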

[root@lunar1 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.6
[root@lunar1 ~]# 
[root@lunar1 ~]# uname -a
Linux lunar1 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunar1 ~]# 

The CRS version of this RAC is 11.2.0.4:

[root@lunar1 ~]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs softwareversion
Oracle Clusterware version on node [lunar1] is [11.2.0.4.0]
[root@lunar1 ~]# 

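Note that a 12.1 standard RAC (non-Flex Cluster) behaves the same way as described in this article, so the approach and the procedure are the same.
Check the current CRS status: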

[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.DATADG1.dg
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.DATADG2.dg
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.asm
               ONLINE  ONLINE       lunar1                   Started             
               ONLINE  ONLINE       lunar2                   Started             
ora.gsd
               OFFLINE OFFLINE      lunar1                                       
               OFFLINE OFFLINE      lunar2                                       
ora.net1.network
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.ons
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunar2                                       
ora.cvu
      1        ONLINE  ONLINE       lunar2                                       
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open                
      2        ONLINE  OFFLINE                               STARTING            
ora.lunar1.vip
      1        ONLINE  ONLINE       lunar1                                       
ora.lunar2.vip
      1        ONLINE  ONLINE       lunar2                                       
ora.oc4j
      1        ONLINE  ONLINE       lunar1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       lunar2                                       
[root@lunar1 ~]# 

List all current CRS processes:

[root@lunar1 ~]# ps -ef|grep d.bin
root      3860     1  0 19:31 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      3972     1  0 19:31 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      3983     1  0 19:31 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      3994     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root      4004     1  0 19:31 ?        00:00:15 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      4007     1  0 19:31 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      4019     1  0 19:31 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      4032     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      4051     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      4063     1  0 19:31 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      4157     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      4180     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4343  4180  0 19:32 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5385     1  1 19:39 ?        00:00:17 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      5456     1  0 19:39 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      5473     1  0 19:39 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      5475     1  0 19:39 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle    7132     1  0 20:04 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7350  7273  0 20:04 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

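That is a lot of processes; for how they relate to each other, see the article "11.2 RAC 的启动过程" (the 11.2 RAC startup sequence).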
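A side note on the `grep d.bin` filter used throughout: in a regular expression `.` matches any character, so the pattern also matches the `d/bin` in `grid/bin`. That is why `cssdmonitor`, `cssdagent` and `tnslsnr` show up in the listing even though their names do not end in `d.bin` (the `grep` command itself appears too, since its own arguments contain the pattern). A minimal sketch of a stricter filter, assuming the Grid home of this test system (`/u01/app/11.2.0.4/grid`); `list_grid_procs` is a hypothetical helper name, not an Oracle tool:

```shell
#!/bin/bash
# Grid home of the test system in this article; adjust for other installs.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0.4/grid}

# Read `ps -ef` output on stdin and print "PID USER NAME" for every
# process whose executable lives directly under $GRID_HOME/bin.
list_grid_procs() {
    awk -v home="$GRID_HOME/bin/" '
        # Field 8 of ps -ef is the executable; require a literal path prefix.
        index($8, home) == 1 {
            name = substr($8, length(home) + 1)
            print $2, $1, name
        }'
}

# Usage: ps -ef | list_grid_procs
```

Because the match is on a literal path prefix of the executable, neither the `grep` process nor the `/bin/sh /etc/init.d/init.ohasd run` wrapper is listed.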
All right, let's start simulating the kills. First kill /u01/app/11.2.0.4/grid/bin/ohasd.bin (it is restarted automatically; see "11.2 RAC 的启动过程"):

[root@lunar1 ~]# kill -9 3860
[root@lunar1 ~]# ps -ef|grep d.bin
grid      3983     1  0 19:31 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      3994     1  0 19:31 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      4007     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      4019     1  0 19:31 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      4032     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      4157     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      4180     1  0 19:31 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4343  4180  0 19:32 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5385     1  1 19:39 ?        00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      5456     1  0 19:39 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      5473     1  0 19:39 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      5475     1  0 19:39 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle    7132     1  0 20:04 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7490     1  0 20:06 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root      7534  2487 14 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  6 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  8 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  2 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  3 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      7676  7273  0 20:07 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

Next, we kill cssdmonitor:

[root@lunar1 ~]# kill -9 4032
-bash: kill: (4032) - No such process
[root@lunar1 ~]# 

There is no such process, which means cssdmonitor has already been restarted under a new PID (see "11.2 RAC 的启动过程"):

[root@lunar1 ~]# ps -ef|grep d.bin
grid      3983     1  0 19:31 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      3994     1  0 19:31 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      4007     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      4019     1  0 19:31 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      4157     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      4180     1  0 19:31 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4343  4180  0 19:32 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5385     1  1 19:39 ?        00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      5456     1  0 19:39 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      5473     1  0 19:39 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      5475     1  0 19:39 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle    7132     1  0 20:04 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7490     1  0 20:06 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root      7534  2487  3 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      7740  7273  0 20:07 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

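In the listing above, the processes with a start time between 20:04 and 20:07 were all restarted automatically in the background after /u01/app/11.2.0.4/grid/bin/ohasd.bin itself was restarted.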
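Comparing start times by eye works, but the same check can be done on PIDs. A minimal sketch, assuming two saved `ps -ef` snapshots; `new_procs` and the snapshot file names in the usage line are hypothetical:

```shell
#!/bin/bash
# Print "PID COMMAND" for every process in the second `ps -ef` snapshot
# whose PID/executable pair was not present in the first snapshot.
new_procs() {
    awk '
        NR == FNR { seen[$2 " " $8] = 1; next }   # first file: remember pairs
        $2 == "PID" { next }                      # skip the ps header line
        !(($2 " " $8) in seen) { print $2, $8 }   # second file: report new ones
    ' "$1" "$2"
}

# Usage:
#   ps -ef > before.txt; (kill something); ps -ef > after.txt
#   new_procs before.txt after.txt | grep grid/bin
```

A daemon that was restarted shows up with its new PID, while one that survived untouched is not printed. The first snapshot must not be empty, because the `NR == FNR` test would then treat the second file as the first.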
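Now we kill mdnsd, gpnpd, gipcd and osysmond.
Of these four, the first three are the earliest processes started during CRS startup, apart from ohasd itself.
If any of them is killed, ohasd restarts it: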

[root@lunar1 ~]# kill -9 3983 3994 4007 4019
[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      7490     1  0 20:06 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root      7534  2487  2 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      7756     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      7758     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
root      7776  7273  0 20:07 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

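Here we see that gipcd and osysmond have not come back yet (only gpnpd and mdnsd already have new PIDs). What is going on?
No rush, it just isn't time yet: ohasd restarts them only after its next check. O(∩_∩)O haha~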
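Rather than re-running `ps` by hand until the daemons reappear, the wait can be scripted. A minimal sketch; `wait_until` is a hypothetical helper, and the 60-second timeout in the usage line is an arbitrary choice, not a documented ohasd check interval:

```shell
#!/bin/bash
# Retry a command once per second until it succeeds or the timeout
# (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout=$1
    shift
    local waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage: wait_until 60 pgrep -f 'grid/bin/gipcd.bin' && echo "gipcd is back"
```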
Next, we kill the listeners:

[root@lunar1 ~]# kill -9 6535 7490 
[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      7534  2487  2 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      7756     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      7758     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      7783     1  2 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      7785     1  2 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      7844     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      7853     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid      7873     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root      7874     1 14 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      7944  7873  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid      7979     1  9 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7982     1  3 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle    7986     1  4 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      8001     1  3 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      8025  7979  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER
grid      8028  7979  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER_SCAN1
root      8083  7273  0 20:08 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

Well, look: the processes we just killed have all been restarted. 11.2 RAC really is tough.
Now we kill the /etc/init.d/init.ohasd process (together with ohasd.bin):

[root@lunar1 ~]# ps -ef|grep ohasd
root      2487     1  0 19:20 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      7534  2487  1 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
root      8191  7273  0 20:08 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# kill -9 2487 7534
[root@lunar1 ~]# ps -ef|grep ohasd
root      8239     1  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8257  8239  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8258  8257  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8267  7273  0 20:08 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# ps -ef|grep ohasd
root      8239     1  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8299  7273  0 20:08 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# 

What we see here is /etc/init.d/init.ohasd being restarted automatically by the system. These events are recorded in /var/log/messages:

[root@lunar1 ~]# tail -f /var/log/messages
Jan 24 19:45:31 lunar1 kernel: e1000 0000:00:03.0 eth0: Reset adapter
Jan 24 20:03:50 lunar1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:03:52 lunar1 kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:07:01 lunar1 clsecho: /etc/init.d/init.ohasd: ohasd is restarting 1/10.
Jan 24 20:07:01 lunar1 logger: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "restart"
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process (2487) killed by KILL signal
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process (8239) killed by KILL signal
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:14:12 lunar1 root: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "reboot"
^C
[root@lunar1 ~]# 
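The `init: oracle-ohasd ... respawning` lines come from Upstart, the init system on OEL 6, which starts the `init.ohasd` wrapper again whenever it exits. To pull just these events out of the log, something like the following could be used; `ohasd_kills` is a hypothetical helper name, and the pattern is taken from the log lines above rather than from any Oracle documentation:

```shell
#!/bin/bash
# Read syslog lines on stdin and print "TIME PID" for every time init
# reported that the main process of the oracle-ohasd job was killed.
ohasd_kills() {
    awk '/init: oracle-ohasd main process \([0-9]+\) killed by/ {
        pid = $9                      # field looks like "(2487)"
        gsub(/[()]/, "", pid)         # strip the parentheses
        print $3, pid
    }'
}

# Usage: ohasd_kills < /var/log/messages
```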

The other processes have also been restarted automatically (note that crsd has not been restarted at this point):

[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      7756     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      7758     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      7783     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      7785     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      7844     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      7853     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid      7873     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root      7874     1  3 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      7944  7873  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid      7979     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7982     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle    7986     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      8001     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      8119     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      8120     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
root      8321  8319  1 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root      8325  7273  0 20:08 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

Now we kill the following: evmlogger.bin, gpnpd.bin, mdnsd.bin, gipcd.bin, evmd.bin, oraagent.bin, scriptagent.bin, oraagent.bin, orarootagent.bin and the two listeners:

[root@lunar1 ~]# kill -9 7944 7756 7758 7783 7873 7979 7982 7986 8001 8119 8120
[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      7785     1  1 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      7844     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      8593  8591  0 20:09 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root      8597  7273  0 20:09 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

Then kill osysmond.bin, ologgerd, cssdmonitor and cssdagent:

[root@lunar1 ~]# kill -9 7785 7844 7588 7578  
[root@lunar1 ~]# 

OK, now only ocssd.bin is left:

[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      8629  7273  0 20:10 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

Now we kill ocssd.bin, the process that supposedly reboots the host the moment it is killed:

[root@lunar1 ~]# kill -9 4063
[root@lunar1 ~]# 

Good: the system is still fine, it did not reboot, and the resources have all been released cleanly:

[root@lunar1 ~]# ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0x00000000 0          root       600        1         
0x00000000 65537      root       600        1         

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

[root@lunar1 ~]# 
[root@lunar1 ~]# 
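The `ipcs` check can also be done mechanically. A minimal sketch that counts shared memory segments still owned by given users; `count_shm` is a hypothetical helper, and `grid` and `oracle` are the software owners seen in the `ps` listings of this system:

```shell
#!/bin/bash
# Read `ipcs -m` output on stdin and print how many shared memory
# segments are owned by any of the users given as arguments.
count_shm() {
    awk -v users="$*" '
        BEGIN {
            n = split(users, u, " ")
            for (i = 1; i <= n; i++) want[u[i]] = 1
        }
        # Data rows start with a hex key such as 0x00000000; owner is field 3.
        $1 ~ /^0x[0-9a-fA-F]+$/ && ($3 in want) { count++ }
        END { print count + 0 }'
}

# Usage: ipcs -m | count_shm grid oracle
```

A result of 0 matches the empty "Shared Memory Segments" section shown above. The usage line passes only `-m`, because with `-ma` the semaphore rows would be counted as well.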

Recovery is simple: just start CRS again:

[root@lunar1 ~]# ps -ef | grep -v grep|grep -E 'init|d.bin|ocls|evmlogger|UID'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 19:20 ?        00:00:01 /sbin/init
root      2486     1  0 19:20 ?        00:00:00 /bin/sh /etc/init.d/init.tfa run
root      8924     1  0 20:13 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunar1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunar1 ~]# ps -ef|grep ohasd
root      8924     1  0 20:13 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8968     1  4 20:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      9187  7273  0 20:14 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# 
[root@lunar1 ~]# ps -ef|grep d.bin
root      8968     1  0 20:14 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      9090     1  0 20:14 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      9101     1  0 20:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      9112     1  0 20:14 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root      9122     1  0 20:14 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      9126     1  0 20:14 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      9139     1  0 20:14 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      9150     1  0 20:14 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      9169     1  0 20:14 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      9180     1  0 20:14 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      9212     1  1 20:14 ?        00:00:28 /u01/app/11.2.0.4/grid/bin/ologgerd -M -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      9340     1  0 20:18 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      9363     1  0 20:18 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/evmd.bin
root      9455     1  0 20:18 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      9532  9363  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid      9569     1  0 20:18 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      9572     1  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
root      9591     1  0 20:18 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      9682     1  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      9684     1  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
oracle    9774     1  0 20:19 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root     10642  7273  0 20:38 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]#
[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunar1                                       
ora.DATADG1.dg
               ONLINE  ONLINE       lunar1                                       
ora.DATADG2.dg
               ONLINE  ONLINE       lunar1                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunar1                                       
ora.asm
               ONLINE  ONLINE       lunar1                   Started             
ora.gsd
               OFFLINE OFFLINE      lunar1                                       
ora.net1.network
               ONLINE  ONLINE       lunar1                                       
ora.ons
               ONLINE  ONLINE       lunar1                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunar1                                       
ora.cvu
      1        ONLINE  ONLINE       lunar1                                       
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open                
      2        ONLINE  OFFLINE                                                   
ora.lunar1.vip
      1        ONLINE  ONLINE       lunar1                                       
ora.lunar2.vip
      1        ONLINE  INTERMEDIATE lunar1                   FAILED OVER         
ora.oc4j
      1        ONLINE  ONLINE       lunar1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       lunar1                                       
[root@lunar1 ~]# 

Only node 1 is shown here because I had shut down node 2.
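To spot what is not fully back after the restart without reading the whole table, the output can be filtered. A minimal sketch, assuming the two-line `crsctl status res -t` layout shown above; `not_online` is a hypothetical helper name:

```shell
#!/bin/bash
# Read `crsctl status res -t` output on stdin and print "RESOURCE STATE"
# for every instance whose TARGET is ONLINE but whose STATE is not.
not_online() {
    awk '
        /^ora\./ { res = $1; next }            # resource name line
        {
            # Cluster resources have an instance number in front of TARGET.
            t = ($1 ~ /^[0-9]+$/) ? 2 : 1
            if ($t == "ONLINE" && NF > t && $(t + 1) != "ONLINE")
                print res, $(t + 1)
        }'
}

# Usage: crsctl status res -t | not_online
```

Resources whose target is OFFLINE, such as ora.gsd, are deliberately ignored.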
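The test shows that as long as the cssdmonitor and cssdagent processes are killed first (more precisely cssagent, a relationship that is also visible in the classic large diagram of the CRS startup sequence), killing ocssd.bin afterwards does not reboot the system.
In addition, a 12.1 standard RAC (non-Flex Cluster) behaves the same way as in this article, and the approach and the procedure are the same.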
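The order that mattered in this test can be summarized as a dry run. The sketch below only prints the `kill` commands and does not execute them; `plan_css_kill` is a hypothetical helper, and it reflects what worked on this 11.2.0.4 test system, not an Oracle-documented procedure. In the test, ohasd.bin was already down at this point, so nothing restarted the agents between the two steps:

```shell
#!/bin/bash
# Read `ps -ef` output on stdin and print (not run) the kill commands:
# cssdagent and cssdmonitor first, ocssd.bin last.
plan_css_kill() {
    awk '
        $8 ~ /\/grid\/bin\/(cssdagent|cssdmonitor)$/ { first = first " " $2 }
        $8 ~ /\/grid\/bin\/ocssd\.bin$/              { last  = last  " " $2 }
        END {
            if (first != "") print "kill -9" first
            if (last  != "") print "kill -9" last
        }'
}

# Usage: ps -ef | plan_css_kill
```

Printing the plan first leaves a chance to confirm the PIDs against a fresh `ps` listing before anything is run as root.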
