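Contact: QQ (5163721)
Title: What happens when the temporary files under /var/tmp/.oracle/ are deleted in a RAC environment, and how to recover
Author: Lunar © All rights reserved. [Reposting is allowed, but the source must be credited with a link; otherwise legal action will be taken.]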
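Test goal: simulate a mistaken operation in a RAC environment in which someone deletes Oracle's temporary files under /var/tmp/.oracle/* (the network socket files).
Test procedure: observe the consequences and work out how to recover.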
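Test environment: OEL 6.6, Oracle 11.2.0.4 Standalone (a single instance using ASM).
On RAC the conclusions should be broadly the same, because the mechanism is similar. RAC itself was not tested here.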
[root@lunarlib rootwork]# cat /etc/oracle-release
Oracle Linux Server release 6.6
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# uname -a
Linux lunarlib 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunarlib rootwork]#
On Linux, RAC and HAS keep their network socket files under /var/tmp/.oracle/. Here HAS means a single instance using ASM, that is, a Standalone or Oracle Restart installation.
(On other platforms, such as AIX or HP-UX, the network socket files may instead be under /tmp/.oracle or /usr/tmp/.oracle.)
[root@lunarlib etc]# ls -lrt /var/tmp/.oracle/*
prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sCevm
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib etc]#
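The directory can contain three kinds of entries:
- socket files (type s);
- one named pipe (type p, npohasd);
- in later listings, plain *_lock files (type -).

The hypothetical shell function below is a minimal sketch for telling them apart in a given directory. The default path is the Linux location above. On other platforms you would pass /tmp/.oracle or /usr/tmp/.oracle instead.

```shell
#!/bin/sh
# Hypothetical helper: print "<type> <name>" for every entry in an Oracle
# socket directory, where type is socket, fifo, file or other.
# Default directory: the Linux location. Dotfiles are not listed.
classify_oracle_socket_dir() {
  local dir f name
  dir="${1:-/var/tmp/.oracle}"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue        # empty or missing dir: glob stays literal
    name=${f##*/}
    if [ -S "$f" ]; then
      echo "socket $name"
    elif [ -p "$f" ]; then
      echo "fifo $name"
    elif [ -f "$f" ]; then
      echo "file $name"
    else
      echo "other $name"
    fi
  done
}
```

For example, `classify_oracle_socket_dir | grep -c '^socket'` counts the sockets currently present.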
Once HAS has been stopped with crsctl stop has -f, the network socket files under /var/tmp/.oracle/ can simply be deleted, and HAS then starts normally:
[root@lunarlib rootwork]# crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lunarlib'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.CRSDG.dg' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.lunardb.db' on 'lunarlib'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.lunardb.db' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.DATADG1.dg' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.DATADG2.dg' on 'lunarlib'
CRS-2677: Stop of 'ora.DATADG1.dg' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.DATADG2.dg' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.CRSDG.dg' on 'lunarlib' succeeded
CRS-2679: Attempting to clean 'ora.CRSDG.dg' on 'lunarlib'
CRS-2681: Clean of 'ora.CRSDG.dg' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'lunarlib'
CRS-2677: Stop of 'ora.asm' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'lunarlib'
CRS-2677: Stop of 'ora.cssd' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'lunarlib'
CRS-2677: Stop of 'ora.evmd' on 'lunarlib' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lunarlib' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/*
prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.1
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sCevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]#
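What matters in this step is the order: the files are deleted only after HAS has been stopped. The sketch below enforces that guard with a hypothetical function. It assumes pgrep is available. It also assumes that a running ohasd.bin process is a good enough signal that HAS is still up.

```shell
#!/bin/sh
# Hypothetical helper: empty the socket directory, but only if no process
# with the given exact name (default: ohasd.bin) is running.
# Returns 1 and deletes nothing while the daemon is still up.
clean_socket_dir_if_stopped() {
  local dir daemon
  dir="${1:-/var/tmp/.oracle}"
  daemon="${2:-ohasd.bin}"
  if pgrep -x "$daemon" >/dev/null; then
    echo "refusing: $daemon is still running; stop HAS first" >&2
    return 1
  fi
  # Remove everything inside the directory but keep the directory itself.
  find "$dir" -mindepth 1 -delete
}
```

For example, run `clean_socket_dir_if_stopped /var/tmp/.oracle` right after `crsctl stop has -f`.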
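If the /var/tmp/.oracle directory itself does not exist, it can be recreated by hand. HAS recreates the files inside it on startup: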
[root@lunarlib rootwork]# mkdir /var/tmp/.oracle
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid 5177 1 1 18:12 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 5306 1 1 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 5311 1 1 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 5339 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 5341 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 5356 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 5387 5339 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5400 5264 0 18:14 pts/1 00:00:00 grep d.bin
[root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/*
prw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/npohasd
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/slunarlibDBG_OHASD
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_CSSD
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sCevm
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib rootwork]#
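In the test above, a plain mkdir as root was enough. The sketch below is a slightly more defensive variant: it creates the directory only when it is missing, then sets an explicit mode.

The default mode 1777 (world-writable with the sticky bit, like /tmp) is an assumption; this test does not show it. Compare with `ls -ld /var/tmp/.oracle` on a healthy node before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: recreate the socket directory if it is missing.
# ASSUMPTION: default mode 1777 (world-writable + sticky bit); verify it
# against a healthy node. An existing directory is left untouched.
ensure_socket_dir() {
  local dir mode
  dir="${1:-/var/tmp/.oracle}"
  mode="${2:-1777}"
  if [ -d "$dir" ]; then
    return 0                      # already present: keep contents and mode
  fi
  mkdir -p "$dir" && chmod "$mode" "$dir"
}
```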
If these files are deleted while HAS is running normally, the database keeps working but can no longer be shut down normally:
[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle/*
ls: cannot access /var/tmp/.oracle/*: No such file or directory
[root@lunarlib rootwork]# ll /var/tmp/.oracle/
total 0
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 2877 1 0 17:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 5177 1 0 18:12 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 5653 5264 0 18:21 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 2877 1 0 17:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 5177 1 0 18:12 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 5660 5264 0 18:23 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid 5177 1 0 18:12 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 5306 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 5311 1 0 18:14 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 5339 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 5341 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 5356 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 5387 5339 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5662 5264 0 18:23 pts/1 00:00:00 grep d.bin
[root@lunarlib rootwork]# crsctl status res -t
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[root@lunarlib rootwork]#
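As shown, CRS communication is now broken. The HAS daemons are all still running, but crsctl can no longer contact them (CRS-4639).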
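The symptom is a specific combination: the daemons are alive, but the socket files they listen on are gone. The hypothetical shell function below is a minimal detector for that state. The caller chooses which socket names to check. The names in the usage line come from the listings in this article, on the assumption that they are enough to spot the problem.

```shell
#!/bin/sh
# Hypothetical helper: detect "daemon running but socket files missing".
# Usage: check_has_sockets DIR DAEMON NAME...
# Prints "stopped", "ok" or "broken: <missing names>" (returns 1 if broken).
check_has_sockets() {
  local dir daemon missing name
  dir="$1"
  daemon="$2"
  shift 2
  if ! pgrep -x "$daemon" >/dev/null; then
    echo "stopped"
    return 0
  fi
  missing=""
  for name in "$@"; do
    [ -e "$dir/$name" ] || missing="$missing $name"
  done
  if [ -z "$missing" ]; then
    echo "ok"
  else
    echo "broken:$missing"
    return 1
  fi
}
```

Example: `check_has_sockets /var/tmp/.oracle ohasd.bin sOHASD_UI_SOCKET sOHASD_IPC_SOCKET_11 sprocr_local_conn_0_PROL`.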
Now let's look at the database:
[oracle@lunarlib work]$ ss
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:17 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
SYS@lunardb>alter system switch logfile;
System altered.
Elapsed: 00:00:00.14
SYS@lunardb>alter system checkpoint;
System altered.
Elapsed: 00:00:00.06
SYS@lunardb>shutdown immediate
ORA-29701: unable to connect to Cluster Synchronization Service
SYS@lunardb>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
[oracle@lunarlib work]$
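The database is still usable, but it cannot be shut down. shutdown immediate fails because the instance cannot communicate with the CSS process (ORA-29701). Reconnecting still works: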
[oracle@lunarlib work]$ ss
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:46 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
SYS@lunardb>
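The database alert log shows the following. The trace file of the server process that ran the shutdown records the CSS failure: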
Mon Jan 11 18:26:37 2016
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
[oracle@lunarlib trace]$ cat lunardb_ora_22027.trc
Trace file /u01/app/oracle/diag/rdbms/lunardb/lunardb/trace/lunardb_ora_22027.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name: Linux
Node name: lunarlib
Release: 3.8.13-44.1.1.el6uek.x86_64
Version: #2 SMP Wed Sep 10 06:10:25 PDT 2014
Machine: x86_64
Instance name: lunardb
Redo thread mounted by this instance: 1
Oracle process number: 23
Unix process pid: 22027, image: oracle@lunarlib (TNS V1-V3)
*** 2016-01-11 18:26:37.174
*** SESSION ID:(135.10871) 2016-01-11 18:26:37.174
*** CLIENT ID:() 2016-01-11 18:26:37.174
*** SERVICE NAME:(SYS$USERS) 2016-01-11 18:26:37.174
*** MODULE NAME:(sqlplus@lunarlib (TNS V1-V3)) 2016-01-11 18:26:37.174
*** ACTION NAME:() 2016-01-11 18:26:37.174
Stopping background process SMCO
*** 2016-01-11 18:26:38.176
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
NOTE: kfmsInit: ASM failed to initialize group services
[oracle@lunarlib trace]$
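This failure leaves recognizable strings in the instance's trace and alert files. The sketch below is a hypothetical grep wrapper that flags the three signatures seen in this test:
- ORA-29701;
- the kgxgncin CLSS init failure;
- the kfmsInit group services note.

Other failure modes will log different text, so treat this as a narrow check.

```shell
#!/bin/sh
# Hypothetical helper: print file:line:text for lines matching the
# CSS-related failure signatures observed in this test.
# Exit status follows grep: 0 if anything matched, 1 if nothing did.
find_css_failures() {
  grep -nH -E 'ORA-29701|kgxgncin: CLSS init failed|ASM failed to initialize group services' "$@"
}
```

Example: `find_css_failures /u01/app/oracle/diag/rdbms/lunardb/lunardb/trace/*.trc`.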
Check the Oracle processes. The instance's background processes are all still running:
[oracle@lunarlib trace]$ ps -ef|grep ora_
oracle 5495 1 0 18:14 ? 00:00:00 ora_pmon_lunardb
oracle 5497 1 0 18:14 ? 00:00:00 ora_psp0_lunardb
oracle 5504 1 4 18:14 ? 00:00:36 ora_vktm_lunardb
oracle 5508 1 0 18:14 ? 00:00:00 ora_gen0_lunardb
oracle 5510 1 0 18:14 ? 00:00:00 ora_diag_lunardb
oracle 5512 1 0 18:14 ? 00:00:00 ora_dbrm_lunardb
oracle 5514 1 0 18:14 ? 00:00:00 ora_dia0_lunardb
oracle 5516 1 0 18:14 ? 00:00:00 ora_mman_lunardb
oracle 5518 1 0 18:14 ? 00:00:00 ora_dbw0_lunardb
oracle 5520 1 0 18:14 ? 00:00:00 ora_lgwr_lunardb
oracle 5522 1 0 18:14 ? 00:00:00 ora_ckpt_lunardb
oracle 5524 1 0 18:14 ? 00:00:00 ora_smon_lunardb
oracle 5526 1 0 18:14 ? 00:00:00 ora_reco_lunardb
oracle 5528 1 0 18:14 ? 00:00:00 ora_rbal_lunardb
oracle 5530 1 0 18:14 ? 00:00:00 ora_asmb_lunardb
oracle 5532 1 0 18:14 ? 00:00:00 ora_mmon_lunardb
oracle 5536 1 0 18:14 ? 00:00:00 ora_mmnl_lunardb
oracle 5540 1 0 18:14 ? 00:00:00 ora_mark_lunardb
oracle 5568 1 0 18:14 ? 00:00:00 ora_arc0_lunardb
oracle 5570 1 0 18:14 ? 00:00:00 ora_arc1_lunardb
oracle 5572 1 0 18:14 ? 00:00:00 ora_arc2_lunardb
oracle 5574 1 0 18:14 ? 00:00:00 ora_arc3_lunardb
oracle 5583 1 0 18:14 ? 00:00:00 ora_qmnc_lunardb
oracle 5611 1 0 18:14 ? 00:00:00 ora_q000_lunardb
oracle 5613 1 0 18:14 ? 00:00:00 ora_q001_lunardb
oracle 6691 6657 0 18:29 pts/4 00:00:00 grep ora_
oracle 22988 1 0 18:26 ? 00:00:00 ora_o000_lunardb
oracle 23012 1 0 18:26 ? 00:00:00 ora_o001_lunardb
[oracle@lunarlib trace]$
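To check quickly whether an instance's background processes are still around, filter ps -ef output by the ora_<name>_<SID> naming pattern seen above. The hypothetical awk-based function below is a minimal sketch. It reads ps -ef text on stdin, so it also works on saved output. It assumes CMD is the eighth column of ps -ef, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print
# "<pid> <process>" for background processes named ora_<name>_<SID>.
list_instance_bg() {
  awk -v sid="$1" '
    $1 != "UID" {
      cmd = $8                             # CMD column of ps -ef
      suffix = "_" sid
      start = length(cmd) - length(suffix) + 1
      if (cmd ~ /^ora_/ && start > 4 && substr(cmd, start) == suffix)
        print $2, cmd
    }'
}
```

Example: `ps -ef | list_instance_bg lunardb`. The grep process itself is skipped because its CMD column is `grep`.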
Shut the database down with shutdown abort:
SYS@lunardb>shutdown abort
ORACLE instance shut down.
SYS@lunardb>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
[oracle@lunarlib work]$
[oracle@lunarlib trace]$ ps -ef|grep ora_
oracle 6709 6657 0 18:31 pts/4 00:00:00 grep ora_
[oracle@lunarlib trace]$
The alert log shows:
Mon Jan 11 18:30:38 2016
Shutting down instance (abort)
License high water mark = 5
USER (ospid: 26332): terminating the instance
Instance terminated by USER, pid = 26332
Mon Jan 11 18:30:38 2016
Instance shutdown complete
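If you now try to start the database again, it fails. The spfile is stored in ASM (+DATADG1), and opening it requires CSS: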
[oracle@lunarlib work]$ ss
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:31:50 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SYS@lunardb>startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATADG1/lunardb/spfilelunardb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATADG1/lunardb/spfilelunardb.ora
ORA-29701: unable to connect to Cluster Synchronization Service
SYS@lunardb>
Meanwhile the other HAS processes still exist; only the network socket files under /var/tmp/.oracle/ are gone:
[root@lunarlib rootwork]# ll /var/tmp/.oracle/*
ls: cannot access /var/tmp/.oracle/*: No such file or directory
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 2877 1 0 17:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 5177 1 0 18:12 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 6723 4677 0 18:33 pts/0 00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid 5177 1 0 18:12 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 5306 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 5339 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 5341 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 5356 1 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 5387 5339 0 18:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 6725 4677 0 18:33 pts/0 00:00:00 grep d.bin
[root@lunarlib rootwork]#

The ohasd.log under /u01/app/11.2.0.4/grid/log/lunarlib/ohasd contains the following. The agent oraagent_grid (pid 5311, which no longer appears in the ps output above) disconnected at 18:28. Since then ohasd keeps trying to restart it, failing every 3.5 minutes (CRSAGF00130):
2016-01-11 18:28:09.091: [ CRSCOMM][406906624] IpcL: connection to member 9 has been removed
2016-01-11 18:28:09.091: [CLSFRAME][406906624] Removing IPC Member:{Relative|Node:0|Process:9|Type:3}
2016-01-11 18:28:09.091: [CLSFRAME][406906624] Disconnected from AGENT process: {Relative|Node:0|Process:9|Type:3}
2016-01-11 18:28:09.092: [ AGFW][333440768]{0:0:132} Agfw Proxy Server received process disconnected notification, count=1
2016-01-11 18:28:09.092: [ AGFW][333440768]{0:0:132} /u01/app/11.2.0.4/grid/bin/oraagent_grid disconnected.
2016-01-11 18:28:09.092: [ AGFW][333440768]{0:0:132} Agent /u01/app/11.2.0.4/grid/bin/oraagent_grid[5311] stopped!
2016-01-11 18:28:09.092: [ CRSCOMM][333440768]{0:0:132} IpcL: removeConnection: Member 9 does not exist in pending connections.
2016-01-11 18:28:09.093: [ AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:28:09.093: [ AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:3
2016-01-11 18:28:09.095: [ CRSPE][322934528]{0:0:133} Disconnected from server:
2016-01-11 18:28:09.098: [ AGFW][333440768]{0:0:132} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [ INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) : Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [ AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
2016-01-11 18:31:39.112: [ AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [ AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:5
2016-01-11 18:31:39.119: [ AGFW][333440768]{0:0:132} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [ INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) : Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [ AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
2016-01-11 18:35:09.131: [ AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [ AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:7
2016-01-11 18:35:09.137: [ AGFW][333440768]{0:0:132} Starting the HB [Interval = 30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
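The restart loop in this log is easy to quantify, because each failed attempt raises alert CRSAGF00130 with the agent path. The sketch below counts those alerts per agent; the function name is hypothetical. It assumes the message text matches the 11.2.0.4 lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: count CRSAGF00130 ("Failed to start the agent")
# alerts in an ohasd.log, printing "<agent path> <count>" per agent.
count_agent_start_failures() {
  grep -F 'CRSAGF00130' "$1" |
    sed -n 's/.*Failed to start the agent \([^ ]*\).*/\1/p' |
    sort | uniq -c |
    awk '{ print $2, $1 }'
}
```

Example: `count_agent_start_failures /u01/app/11.2.0.4/grid/log/lunarlib/ohasd/ohasd.log`.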
At this point, even crsctl stop has -f cannot stop HAS:
[root@lunarlib rootwork]# crsctl stop has -f
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
[root@lunarlib rootwork]#
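Rebooting the host is the better choice. But what if the host cannot easily be rebooted?
In that case the problem can be fixed by hand. This is a fresh run of the scenario, so the PIDs differ from those above. First, manually clear all of HAS's network socket files: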
[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle/
total 0
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid 4332 1 0 18:40 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 4560 1 0 18:42 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 4566 1 0 18:42 ? 00:00:11 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 4591 1 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4594 1 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 4603 1 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 4639 4591 0 18:42 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 4994 4305 0 19:02 pts/1 00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 2882 1 0 18:40 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 4332 1 0 18:40 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 4996 4305 0 19:02 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]# crsctl status res -t
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# crsctl stop has -f
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
[root@lunarlib rootwork]#
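None of the normal commands for stopping HAS work any more, because the socket files used for inter-process communication have been deleted.
The processes can still be killed, though. /bin/sh /etc/init.d/init.ohasd run is respawned immediately each time it is killed, and /etc/init.d/init.ohasd stop does not remove it either. This does not prevent HAS from being restarted afterwards: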
[root@lunarlib rootwork]# kill -9 4332 4560 4566 4591 4594 4603 4639 2882 4332
[root@lunarlib rootwork]# ps -ef|grep d.bin
root 15575 4305 0 19:04 pts/1 00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 15548 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 15580 4305 0 19:04 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]# kill -9 15548
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 15608 4305 0 19:04 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# ps -ef|grep d.bin
root 15623 4305 0 19:04 pts/1 00:00:00 grep d.bin
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# /etc/init.d/init.ohasd stop -f
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 15650 4305 0 19:05 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]# /etc/init.d/init.ohasd stop
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 15672 4305 0 19:05 pts/1 00:00:00 grep ohasd
[root@lunarlib rootwork]#
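The PIDs above were collected by hand from ps. The same step can be sketched as a hypothetical function:
- it looks up the daemons seen in this test by exact process name with pgrep -x;
- it prints their PIDs by default;
- it sends SIGKILL only when given --kill.

Two caveats. tnslsnr also matches listeners running from other Oracle homes. init.ohasd is deliberately left out, since it is respawned anyway.

```shell
#!/bin/sh
# Hypothetical helper: find Grid Infrastructure daemons by exact process
# name and print their PIDs (default) or SIGKILL them (--kill).
# With no names given, uses the daemons seen in this article.
# WARNING: on RAC, killing ocssd.bin may reboot the node.
gi_daemon_pids() {
  local do_kill pids name
  do_kill=0
  if [ "${1:-}" = "--kill" ]; then
    do_kill=1
    shift
  fi
  if [ "$#" -eq 0 ]; then
    set -- ohasd.bin cssdagent oraagent.bin evmd.bin evmlogger.bin tnslsnr ocssd.bin
  fi
  pids=""
  for name in "$@"; do
    pids="$pids $(pgrep -x "$name" | tr '\n' ' ')"
  done
  set -- $pids                  # split the collected PIDs into words
  [ "$#" -gt 0 ] || return 0    # nothing running: nothing to print or kill
  if [ "$do_kill" -eq 1 ]; then
    kill -9 "$@"
  else
    printf '%s\n' "$@"
  fi
}
```

Run it once without --kill and compare the output with ps -ef before actually killing anything.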
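In my test (the HAS environment), killing all of these processes at once did not reboot the host. In a RAC environment, killing ocssd.bin may cause the host to reboot. Afterwards, no shared memory segments were left over: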
[root@lunarlib rootwork]# ipcs -ma
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 0          root       600        1
0x00000000 65537      root       600        1
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
[root@lunarlib rootwork]#
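In this test no segments were left. If ipcs -m does show segments owned by grid or oracle after the processes are gone, those are the ones to clean up.

The hypothetical function below is a minimal sketch. It reads ipcs -m output, using the column order shown in the header above. It prints the shmid of every segment owned by a given user that has no processes attached (nattch 0). After checking that a segment really belongs to the dead stack, you could remove it with `ipcrm -m <shmid>`.

```shell
#!/bin/sh
# Hypothetical helper: read `ipcs -m` output on stdin and print the shmid
# of each segment owned by the given user with nattch == 0.
# Columns assumed: key shmid owner perms bytes nattch status
shm_ids_for_owner() {
  awk -v owner="$1" '
    $1 ~ /^0x[0-9a-fA-F]+$/ && $3 == owner && $6 == 0 { print $2 }'
}
```

Example: `ipcs -m | shm_ids_for_owner grid`.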
Then restart HAS manually:
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# ps -ef|grep ohasd
root 15581 1 0 19:04 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 15817 4520 0 19:09 pts/0 00:00:00 tail -f ohasd.log
root 15935 15908 0 19:10 pts/2 00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
root 15806 4305 0 19:09 pts/1 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 15851 1 0 19:09 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 15937 15908 0 19:10 pts/2 00:00:00 grep d.bin
[root@lunarlib rootwork]#
As HAS starts, it creates new network socket files on its own:
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
[root@lunarlib rootwork]# ps -ef|grep d.bin
root 15806 4305 0 19:09 pts/1 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 15851 1 0 19:09 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 15940 15908 0 19:11 pts/2 00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid 15811 1 1 19:09 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 15947 1 0 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 15952 1 1 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 15977 1 0 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 15980 1 1 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 15994 1 1 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 16026 15980 0 19:11 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 16040 15908 0 19:11 pts/2 00:00:00 grep d.bin
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sCevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib__lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sSYSTEM.evm.acceptor.auth
[root@lunarlib rootwork]#
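The sockets appear in stages. The OHASD ones come first (19:09 above) and the CSS/EVM ones a couple of minutes later (19:11). The hypothetical polling helper below waits until a given set of names exists, with a timeout in seconds.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until every named entry
# exists in DIR, or until TIMEOUT seconds have passed.
# Usage: wait_for_sockets DIR TIMEOUT NAME...   (returns 0 ok, 1 timeout)
wait_for_sockets() {
  local dir timeout waited name missing
  dir="$1"
  timeout="$2"
  shift 2
  waited=0
  while :; do
    missing=0
    for name in "$@"; do
      [ -e "$dir/$name" ] || { missing=1; break; }
    done
    [ "$missing" -eq 0 ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}
```

Example: `wait_for_sockets /var/tmp/.oracle 300 sOHASD_UI_SOCKET sOCSSD_LL_lunarlib_ sAevm`. The 300-second timeout is an arbitrary choice.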
All of HAS is now back up and running normally:
[root@lunarlib rootwork]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunarlib
ora.DATADG1.dg
               ONLINE  ONLINE       lunarlib
ora.DATADG2.dg
               ONLINE  ONLINE       lunarlib
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunarlib
ora.asm
               ONLINE  ONLINE       lunarlib                 Started
ora.ons
               OFFLINE OFFLINE      lunarlib
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       lunarlib
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       lunarlib
ora.lunardb.db
      1        ONLINE  ONLINE       lunarlib                 Open
[root@lunarlib rootwork]#
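Instead of reading the table by eye, you can look for resources whose TARGET is ONLINE but whose STATE is not. The sketch below parses the key=value output of crsctl stat res (without -t), which prints NAME=, TARGET= and STATE= lines per resource.

The comma-separated layout assumed for multi-instance resources on RAC is an assumption worth checking against your version. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl stat res` (key=value form) on stdin and
# print "<resource> <instance#> <state>" wherever TARGET is ONLINE but STATE
# is not. ASSUMPTION: multi-instance values are comma-separated.
offline_but_wanted() {
  awk -F= '
    $1 == "NAME"   { name = $2 }
    $1 == "TARGET" { nt = split($2, t, / *, */) }
    $1 == "STATE"  {
      split($2, s, / *, */)
      for (i = 1; i <= nt; i++)
        if (t[i] == "ONLINE" && s[i] !~ /^ONLINE/)
          print name, i, s[i]
    }'
}
```

Example: `crsctl stat res | offline_but_wanted`. Empty output means every resource that should be online is online.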
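Summary, for RAC or HAS:
1. On Linux, the network socket files are in /var/tmp/.oracle/. On other platforms they may be in /tmp/.oracle or /usr/tmp/.oracle.
2. If CRS or HAS is not running, deleting these Oracle temporary files (the network socket files) is harmless. They are recreated automatically when CRS starts.
3. If CRS or HAS is up and running, deleting them does not affect the running database, but the database can no longer be shut down normally. shutdown abort works, but the database then cannot be started again.
4. In case 3, CRS cannot be stopped, not even with -f. Short of a reboot, the only way out is to kill the processes by hand and clean up any leftover shared memory segments. In HAS, killing ocssd.bin did not reboot the host. In a RAC environment, killing ocssd.bin may cause the host to reboot.
5. Once step 4 is done, simply restart CRS or HAS.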
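The summary comes down to two facts: is the stack running, and are its socket files present? The hypothetical helper below maps those two facts to the recommended action from the points above.

```shell
#!/bin/sh
# Hypothetical helper mapping the summary's two facts to an action.
#   $1: is the stack (ohasd.bin) running?  yes|no
#   $2: are its socket files present?      yes|no
socket_recovery_advice() {
  case "$1/$2" in
    no/yes|no/no)
      echo "stack down: deleting the socket files is harmless; they are recreated when the stack starts" ;;
    yes/yes)
      echo "healthy: nothing to do" ;;
    yes/no)
      echo "broken: reboot if possible; otherwise kill the GI processes (on RAC, killing ocssd.bin may reboot the node), clean up leftover shared memory, then start the stack again" ;;
    *)
      echo "usage: socket_recovery_advice yes|no yes|no" >&2
      return 2 ;;
  esac
}
```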