Converting a Single-Instance Database to a RAC Database (Manual Method)

Test objective: convert a single-instance database to a RAC database
Test environment: Oracle 11.2.0.4
Test method: manual conversion

First, install a RAC environment and restore the single-instance database into it with RMAN. (In production, we would usually set up an ADG standby from the single-instance database to the RAC environment to minimize downtime.)
Then create a pfile from the source (single-instance) database:

startup pfile=/home/oracle/lunar/spfile.lunardb.tmp
08:07:25 sys@lunardb>show parameter spfile
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string
08:08:38 sys@lunardb>

Check that the two entries in tnsnames.ora used by the local_listener parameter are correct:

LISTENER_RAC1=(ADDRESS = (PROTOCOL = TCP)(HOST = dm0101-vip.lunar.com)(PORT = 1521))        
LISTENER_RAC2=(ADDRESS = (PROTOCOL = TCP)(HOST = dm0102-vip.lunar.com)(PORT = 1521))
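As a quick sanity check, you can grep for both aliases before relying on them. A minimal sketch; /tmp/tns_demo is a scratch stand-in for the real $TNS_ADMIN directory, populated here with a sample copy so the check can run anywhere:

```shell
# Sanity check: both listener aliases used by local_listener must resolve.
# TNS_ADMIN normally points at $ORACLE_HOME/network/admin; here it is a
# scratch directory holding a sample tnsnames.ora.
TNS_ADMIN=/tmp/tns_demo
mkdir -p "$TNS_ADMIN"
cat > "$TNS_ADMIN/tnsnames.ora" <<'EOF'
LISTENER_RAC1=(ADDRESS = (PROTOCOL = TCP)(HOST = dm0101-vip.lunar.com)(PORT = 1521))
LISTENER_RAC2=(ADDRESS = (PROTOCOL = TCP)(HOST = dm0102-vip.lunar.com)(PORT = 1521))
EOF
for alias in LISTENER_RAC1 LISTENER_RAC2; do
  grep -q "^${alias}=" "$TNS_ADMIN/tnsnames.ora" \
    && echo "$alias: OK" || echo "$alias: MISSING"
done
```

In a real environment you would also run tnsping against each alias from both nodes.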

Edit the pfile backed up earlier (/home/oracle/lunar/spfile.lunardb.tmp) and add the RAC-related parameters:

*.audit_file_dest='/u01/app/oracle/admin/lunardb/adump'
*.audit_trail='NONE'
*.compatible='11.2.0.3.0'
*.control_files='+DATADG/lunardb/control01.ctl','+DATADG/lunardb/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_files=2000
*.db_name='lunardb'
*.db_recovery_file_dest='+RECODG'
*.db_recovery_file_dest_size=336870912000
*.db_unique_name='lunardb'
*.deferred_segment_creation=FALSE
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=lunardbXDB)'
*.log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=lunardb'
*.log_archive_dest_state_1='ENABLE'
*.open_cursors=500
*.pga_aggregate_target=1059552256
*.processes=500
*.remote_login_passwordfile='EXCLUSIVE'
*.resource_manager_plan=''
*.session_cached_cursors=2000
*.sga_target=0
*.standby_file_management='AUTO'
*.undo_tablespace='UNDOTBS1'
#*.use_large_pages='ONLY'
*.db_cache_size=3g 
*.shared_pool_size=3g
*.streams_pool_size=28m
*.java_pool_size=200m
*.log_buffer=37108864
*.job_queue_processes=20
*.cluster_database=true
*.cluster_database_instances=2
*.undo_management=AUTO
*.db_create_file_dest='+DATADG'
*.db_create_online_log_dest_1='+RECODG'
lunardb1.undo_tablespace='UNDOTBS1'
lunardb2.undo_tablespace='UNDOTBS2'
lunardb1.instance_number=1
lunardb2.instance_number=2
lunardb1.instance_name=lunardb1
lunardb2.instance_name=lunardb2
lunardb1.thread=1
lunardb2.thread=2
lunardb1.local_listener=LISTENER_RAC1
lunardb2.local_listener=LISTENER_RAC2

Start the database with this pfile:

08:26:59 @>startup pfile=/home/oracle/lunar/spfile.lunardb.tmp
ORACLE instance started.

Total System Global Area       6881869824 bytes
Fixed Size                        2266064 bytes
Variable Size                  3573550128 bytes
Database Buffers               3221225472 bytes
Redo Buffers                     84828160 bytes
Database mounted.
Database opened.
08:27:30 @>

Add online redo logs for thread 2:

08:27:30 @>alter database add logfile thread 2
  group 17 ('+RECODG') size 1024m,
  group 18 ('+RECODG') size 1024m,
  group 19 ('+RECODG') size 1024m,
  group 20 ('+RECODG') size 1024m,
  group 21 ('+RECODG') size 1024m,
  group 22 ('+RECODG') size 1024m,
  group 23 ('+RECODG') size 1024m,
  group 24 ('+RECODG') size 1024m,
  group 25 ('+RECODG') size 1024m,
  group 26 ('+RECODG') size 1024m,
  group 27 ('+RECODG') size 1024m,
  group 28 ('+RECODG') size 1024m,
  group 29 ('+RECODG') size 1024m,
  group 30 ('+RECODG') size 1024m,
  group 31 ('+RECODG') size 1024m,
  group 32 ('+RECODG') size 1024m;

Database altered.

Elapsed: 00:00:28.51
08:28:46 @>
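Typing sixteen nearly identical group clauses invites typos; the statement can be generated instead. A sketch under the same names used above (/tmp/add_thread2.sql is a hypothetical scratch path):

```shell
# Generate the ADD LOGFILE statement for thread 2 (groups 17-32)
# instead of typing each group clause by hand.
{
  echo "alter database add logfile thread 2"
  for g in $(seq 17 32); do
    sep=","; [ "$g" -eq 32 ] && sep=";"
    echo "  group $g ('+RECODG') size 1024m$sep"
  done
} > /tmp/add_thread2.sql
cat /tmp/add_thread2.sql
```

The resulting script can then be run from SQL*Plus with @/tmp/add_thread2.sql.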

Create the undo tablespace for instance 2:

08:28:46 @>CREATE UNDO TABLESPACE UNDOTBS2 DATAFILE '+DATADG' SIZE 10480m ;        

Tablespace created.

Elapsed: 00:00:09.87
08:29:11 @>

Enable thread 2 (for instance 2):

08:29:11 @>alter database enable public thread 2; 

Database altered.

Elapsed: 00:00:00.59
08:29:29 @>
Create the spfile:

08:33:58 @>create spfile='+DATADG' from pfile='/home/oracle/lunar/spfile.lunardb.tmp';

File created.

Elapsed: 00:00:00.21
08:35:30 @>

Check it as the grid user:

ASMCMD> ls
spfile.3296.878718931
ASMCMD> pwd
+datadg/lunardb/PARAMETERFILE
ASMCMD> 

Edit the initlunardb1.ora file:

[oracle@dm01db01 dbs]$ cat initlunardb1.ora 
SPFILE='+datadg/lunardb/PARAMETERFILE/spfile.3296.878718931'
[oracle@dm01db01 dbs]$ 
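Each instance's $ORACLE_HOME/dbs/init&lt;SID&gt;.ora is just a one-line pointer to the shared spfile in ASM, and the same file is needed on both nodes. A sketch using /tmp/dbs_demo as a scratch stand-in for $ORACLE_HOME/dbs:

```shell
# Write the one-line pointer files for both instances.
# DBS_DIR is a scratch stand-in for $ORACLE_HOME/dbs on each node.
DBS_DIR=/tmp/dbs_demo
mkdir -p "$DBS_DIR"
SPFILE='+datadg/lunardb/PARAMETERFILE/spfile.3296.878718931'
for sid in lunardb1 lunardb2; do
  echo "SPFILE='$SPFILE'" > "$DBS_DIR/init$sid.ora"
done
cat "$DBS_DIR/initlunardb1.ora"
```

On a real cluster, initlunardb2.ora would of course live in node 2's $ORACLE_HOME/dbs, not alongside node 1's file.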

Verify from the database:

08:42:07 sys@lunardb>show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATADG/lunardb/parameterfile/spf
                                                 ile.3296.878718931
08:42:11 sys@lunardb>show parameter cluster

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string
08:42:14 sys@lunardb>

Run @?/rdbms/admin/catclust.sql. This step simply follows the documentation; once it finishes, check the log for errors.
Then start the database and verify that both instances come up normally:

SYS@lunardb2>startup
ORACLE instance started.

Total System Global Area       6881869824 bytes
Fixed Size                        2266064 bytes
Variable Size                  3573550128 bytes
Database Buffers               3221225472 bytes
Redo Buffers                     84828160 bytes
Database mounted.
Database opened.
SYS@lunardb2>select * from v$active_instances;

     INST_NUMBER INST_NAME
---------------- ------------------------------------------------------------------------------------------------------------------------
               1 dm01db01.lunar.com:lunardb1
               2 dm01db02.lunar.com:lunardb2

Elapsed: 00:00:00.00
SYS@lunardb2>
SYS@lunardb2>show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +DATADG/lunardb/parameterfile/spf
                                                 ile.3296.878718931
SYS@lunardb2>

Register the database and its instances with CRS:

[oracle@dm01db01 ~]$ srvctl config database -d lunardb
PRCD-1120 : The resource for database lunardb could not be found.
PRCR-1001 : Resource ora.lunardb.db does not exist
[oracle@dm01db01 ~]$ 
[oracle@dm01db01 ~]$ srvctl add database -d lunardb -o /u01/app/oracle/product/11.2.0.4/dbhome_1
[oracle@dm01db01 ~]$ srvctl add instance -d lunardb -n dm01db01 -i lunardb1
[oracle@dm01db01 ~]$ srvctl add instance -d lunardb -n dm01db02 -i lunardb2
[oracle@dm01db01 ~]$ srvctl config database -d lunardb
Database unique name: lunardb
Database name: 
Oracle home: /u01/app/oracle/product/11.2.0.4/dbhome_1
Oracle user: oracle
Spfile: 
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: lunardb
Database instances: lunardb1,lunardb2
Disk Groups: 
Mount point paths: 
Services: 
Type: RAC
Database is administrator managed
[oracle@dm01db01 ~]$ 
[oracle@dm01db01 ~]$ srvctl modify database -d lunardb -a DATADG,RECODG
[oracle@dm01db01 ~]$ srvctl modify database -d lunardb -p '+DATADG/lunardb/parameterfile/spfile.3296.878718931'
[oracle@dm01db01 ~]$ srvctl config database -d lunardb
Database unique name: lunardb
Database name: 
Oracle home: /u01/app/oracle/product/11.2.0.4/dbhome_1
Oracle user: oracle
Spfile: +DATADG/lunardb/parameterfile/spfile.3296.878718931
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: lunardb
Database instances: lunardb1,lunardb2
Disk Groups: DATADG,RECODG
Mount point paths: 
Services: 
Type: RAC
Database is administrator managed
[oracle@dm01db01 ~]$ 
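The registration steps above can be parameterized so the same recipe works for other databases. A sketch that only emits the commands (it does not run srvctl); /tmp/register_lunardb.sh and the variable names are illustrative:

```shell
# Emit the srvctl registration commands used above from a few variables,
# so they can be reviewed before being run on a real cluster.
DB=lunardb
OH=/u01/app/oracle/product/11.2.0.4/dbhome_1
SP='+DATADG/lunardb/parameterfile/spfile.3296.878718931'
{
  echo "srvctl add database -d $DB -o $OH"
  echo "srvctl add instance -d $DB -n dm01db01 -i ${DB}1"
  echo "srvctl add instance -d $DB -n dm01db02 -i ${DB}2"
  echo "srvctl modify database -d $DB -a DATADG,RECODG"
  echo "srvctl modify database -d $DB -p '$SP'"
} > /tmp/register_lunardb.sh
cat /tmp/register_lunardb.sh
```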

Using the ass.awk Script to Analyze systemstate Trace Files (LTOM 431 Ships with ass109.awk)

In the past, many customers and friends went to great lengths to obtain the ass109.awk script for analyzing trace files produced by a systemstate dump.
Originally, ass109.awk was a personal script written by a veteran Oracle insider; it was never productized by Oracle, so no formal support was available for it.
The LTOM 431 release now bundles the ass109.awk script:

sftp> lpwd
E:/CRT-temp/tools/ltom431/ltom/tom_base/tom/src
sftp> pwd
/rootwork
sftp> lls 
ass109.awk            dumptracebuffer.sh    errorstack.sh
flush.sh              goiostat.sh           govmstat.sh
ha.sh                 haLevel.sh            hfile.sh
initiostat.sh         initvmstat.sh         ksdwrf.sh
lc2.sh                lc4.sh                liststatspack.sh
ltom.jar              ltomHeader.txt        ltommail.sh
monlatch.lst          pha.sh                prelim.sh
prelimss.sh           prha.sh               processstate.sh
prss.sh               pss.sh                rhaLevel.sh
rss.sh                runfile.sh            session_statspack.sh
spcpkg.lis            spctab.lis            spcusr.lis
sprepins.sql          ss.sh                 statspack.sh
testee.sh             tom92spreport.sql     tomsprepins.sql
tomspreport.sql       topaix.sh             tracebufferoff.sh
tracebufferon.sh
sftp> 

How to run ass.awk:

D:\>awk -f ass109.awk edw_ora_8371.trc
..........................
Ass.Awk Version 1.0.9 - Processing edw_ora_8371.trc
System State 1
~~~~~~~~~~~~~~~~
1:                                     
2:  waiting for 'rdbms ipc message'     wait
3:  waiting for 'pmon timer'            wait
4:  waiting for 'rdbms ipc message'     wait
5:  waiting for 'rdbms ipc message'     wait
6:  waiting for 'rdbms ipc message'     wait
7:  waiting for 'rdbms ipc message'     wait
8:                                     
9:  waiting for 'rdbms ipc message'     wait
10:                                    
11: waiting for 'rdbms ipc message'     wait
12: for 'Streams AQ: waiting for time management or cleanup tasks' wait
13: waiting for 'rdbms ipc message'     wait
14: waiting for 'Streams AQ: qmn coordinator idle wait' wait
15: waiting for 'rdbms ipc message'     wait
16: waiting for 'Wait for shrink lock'  wait		
17: waiting for 'smon timer'            wait
18: waiting for 'SQL*Net message from client' wait
19: waiting for 'rdbms ipc message'     wait
21: waiting for 'rdbms ipc message'     wait
23: waiting for 'rdbms ipc message'     wait
25: waiting for 'SQL*Net message from client' wait
27: waiting for 'SQL*Net message from client' wait
29: last wait for 'ksdxexeotherwait'   [Rcache object=4f4e57138,]	
     Cmd: Insert
30: waiting for 'Streams AQ: qmn slave idle wait' wait
33: for 'Streams AQ: waiting for messages in the queue' wait
Blockers
~~~~~~~~
       Above is a list of all the processes. If they are waiting for a resource
       then it will be given in square brackets. Below is a summary of the
       waited upon resources, together with the holder of that resource.
       Notes:
       ~~~~~
        o A process id of '???' implies that the holder was not found in the
          systemstate.
                    Resource Holder State
    Rcache object=4f4e57138,    16: waiting for 'Wait for shrink lock'	
Object Names
~~~~~~~~~~~~
Rcache object=4f4e57138,                                  
 
77807 Lines Processed.

The analysis above shows:

Rcache object=4f4e57138,    16: waiting for 'Wait for shrink lock'		

This means the current blocker in the database is process 16 (pid 16 in the systemstate): it holds the row cache lock on object 4f4e57138, while pid 16 itself is waiting on 'Wait for shrink lock'.

29: last wait for 'ksdxexeotherwait'   [Rcache object=4f4e57138,]	
     Cmd: Insert

And this clearly shows that process 29's last wait was on object 4f4e57138; it is currently hung and unable to complete its INSERT, and the process blocking it is precisely pid 16.
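A common follow-up to ass109.awk output is tallying the wait events in its process list to see what most sessions are stuck on. A sketch; the here-document reproduces a few lines of the output shown above as sample input (/tmp/ass_demo.txt is a scratch path):

```shell
# Tally wait events from ass109.awk-style output: the event name sits
# between single quotes, so split on ' and count field 2.
cat > /tmp/ass_demo.txt <<'EOF'
2:  waiting for 'rdbms ipc message'     wait
3:  waiting for 'pmon timer'            wait
4:  waiting for 'rdbms ipc message'     wait
16: waiting for 'Wait for shrink lock'  wait
18: waiting for 'SQL*Net message from client' wait
EOF
awk -F"'" '/waiting for/ {n[$2]++} END {for (e in n) print n[e], e}' \
    /tmp/ass_demo.txt | sort -rn
```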

Note, however, that LTOM is no longer being developed; much of its functionality has been folded into Active Session History (ASH):

IMPORTANT: LTOM is a legacy tool and is no longer being developed. 
Much of the functionality is now available within the Active Session History (ASH). 
For details see: Document 243132.1 Analysis of Active Session History (Ash) Online and Offline

Evolution of the 11.2 and 12c RAC ohasd Daemon Across Linux Versions

We covered the 11.2 RAC startup sequence earlier, and noted that RAC's root daemon is /etc/init.d/init.ohasd. So how does /etc/init.d/init.ohasd get started on different Linux versions?
Note: the startup sequence of a 12.1 non-Flex Cluster is identical to 11.2 RAC. Judging from tests on the 12.2 beta, from 12.2 onward the OUI will most likely offer only Flex Cluster installs, with no 11.2-style classic RAC.

On Linux 4 and Linux 5, after the in-kernel boot phase completes (the kernel is loaded into memory, runs, and initializes all device drivers and data structures), the system starts a user-space program, /sbin/init, which in turn starts all other user-level processes and services.
init is therefore always the first process, its PID is always 1 (ps -aux | less), and it is the parent of every process on the system.

[root@lunarnew1 grid]# ps -ef|grep init|grep -v grep
root         1     0  0 Jan14 ?        00:00:00 init [5]                                                                 
root      2987     1  0 Jan14 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunarnew1 grid]# 
[root@lunarnew1 grid]# which init
/sbin/init
[root@lunarnew1 grid]# 
Next, /sbin/init reads /etc/inittab, which contains directives for starting other user processes and services.
Its format is:
 identifier :  run_level  :  action  :  process
The entry that auto-starts /etc/init.d/init.ohasd at boot is therefore also configured in /etc/inittab.
My environment is Linux 5.10; let's look at the ohasd startup entry in detail:
[root@lunarnew1 grid]# cat /etc/issue
Oracle Linux Server release 5.10
Kernel \r on an \m

[root@lunarnew1 grid]# 
[root@lunarnew1 grid]# ll /etc/inittab.orig
ls: /etc/inittab.orig: No such file or directory
[root@lunarnew1 grid]# ll /etc/inittab*
-rw-r--r-- 1 root root 1671 Mar 21  2015 /etc/inittab.no_crs
-rw-r--r-- 1 root root 1740 Mar 21  2015 /etc/inittab.crs
-rw-r--r-- 1 root root 1740 Mar 21  2015 /etc/inittab
[root@lunarnew1 grid]# 

Let's see how these three files differ:

[root@lunarnew1 grid]# diff /etc/inittab.crs /etc/inittab
[root@lunarnew1 grid]# diff /etc/inittab.no_crs /etc/inittab
58a59,60
> 
> h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
[root@lunarnew1 grid]# 

So /etc/inittab.no_crs is a backup of /etc/inittab from before GI was installed, and /etc/inittab.crs is the backup from after GI was installed.
In other words, on Linux 5, installing RAC (10.2 or 11.2) appends the single line that starts the ohasd daemon. For CRS to start at boot, /etc/inittab must contain this line:
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
To keep CRS from starting at boot, remove that line. Note that this method is not supported by Oracle; Oracle recommends crsctl disable has to disable HAS autostart.
(Note: in a RAC environment, use crsctl disable crs instead.)

The full content of /etc/inittab follows (note: only Linux 5 and earlier versions look like this):
[root@lunarnew1 grid]# cat  /etc/inittab
#
# inittab       This file describes how the INIT process should set up
#               the system in a certain run-level.
#
# Author:       Miquel van Smoorenburg, <miquels@drinkel.nl.mugnet.org>
#               Modified for RHS Linux by Marc Ewing and Donnie Barnes
#

# Default runlevel. The runlevels used by RHS are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
# 
id:5:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left.  Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.  
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"


# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:3:respawn:/etc/X11/prefdm -nodaemon


h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
[root@lunarnew1 grid]# 
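The ohasd entry at the bottom of the file can be decomposed into the four inittab fields described earlier (identifier:run_level:action:process). A small sketch using POSIX parameter expansion:

```shell
# Split the ohasd inittab entry into identifier, run levels, action,
# and process (the process field may itself contain shell redirects).
entry='h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null'
id=${entry%%:*};     rest=${entry#*:}
levels=${rest%%:*};  rest=${rest#*:}
action=${rest%%:*};  process=${rest#*:}
echo "id=$id levels=$levels action=$action"
echo "process=$process"
```

The respawn action is what makes init restart init.ohasd automatically if it dies while the system is in runlevel 3 or 5.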

In Linux 6, init only reads configuration files, resolves the dependencies among services and applications, starts them in response to events, and manages them dynamically.
On Linux 6 those events are handled by the Upstart event manager and controlled with the initctl utility, which plays a role similar to chkconfig on Linux 5.
On my 11.2.0.4 single-node ASM (Standalone) environment installed on Linux 6.6, there is no /etc/inittab.orig file such as we looked for on Linux 5:

[root@lunarlib rootwork]# ll /etc/inittab.orig
ls: cannot access /etc/inittab.orig: No such file or directory
[root@lunarlib rootwork]# 

That is, starting with Linux 6, Oracle no longer writes the startup of /etc/init.d/init.ohasd into /etc/inittab (ohasd is no longer launched by /sbin/init via inittab):

[root@lunarlib rootwork]# ll /etc/inittab*
-rw-r-----. 1 root root 884 Apr  6  2015 /etc/inittab
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# tail /etc/inittab
# Default runlevel. The runlevels used are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
# 
id:3:initdefault:
[root@lunarlib rootwork]# 

So how does this Oracle RAC daemon get started on Linux 6?
Starting with Linux 6, the system boots via the Upstart event mechanism (apparently specific to Linux 6) and decides which services to run at startup from the job configuration files under /etc/init/:

[root@lunarlib init.d]# cat /etc/init/oracle-ohasd.conf
# Copyright (c) 2001, 2011, Oracle and/or its affiliates. All rights reserved. 
#
# Oracle OHASD startup

start on runlevel [35]
stop  on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
[root@lunarlib init.d]# 

Oracle adopted this mechanism as well: when GI is installed, it creates a job configuration file for the /etc/init.d/init.ohasd bootstrap process, namely /etc/init/oracle-ohasd.conf.
In other words, on Linux 6 the oracle-ohasd.conf file under /etc/init replaces the /etc/inittab-based startup used on Linux 5.
This is simply Oracle tracking how each Linux release boots and launches services, and adapting as that mechanism changes.
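When troubleshooting, it helps to pull the actual command out of an Upstart job file. A sketch; /tmp/oracle-ohasd.conf.demo is a sample copy mirroring the /etc/init/oracle-ohasd.conf shown above:

```shell
# Extract the exec command from an Upstart job configuration file.
CONF=/tmp/oracle-ohasd.conf.demo
cat > "$CONF" <<'EOF'
start on runlevel [35]
stop  on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
EOF
sed -n 's/^exec //p' "$CONF"
```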

[root@lunarlib rootwork]# ll /etc/init
total 72
-rw-r--r--. 1 root root  412 Jul 22  2014 control-alt-delete.conf
-rw-r--r--. 1 root root  130 Mar 12  2014 init-system-dbus.conf
-rw-r--r--. 1 root root  463 Jul 22  2014 kexec-disable.conf
-rw-r--r--  1 root root  220 Oct 11 01:30 oracle-ohasd.conf
-rw-r--r--. 1 root root  560 Jul 22  2014 plymouth-shutdown.conf
-rw-r--r--. 1 root root  357 Jul 22  2014 prefdm.conf
-rw-r--r--. 1 root root  505 Jul 22  2014 quit-plymouth.conf
-rw-r--r--. 1 root root  417 Jul 22  2014 rc.conf
-rw-r--r--. 1 root root 1046 Jul 22  2014 rcS.conf
-rw-r--r--. 1 root root  430 Jul 22  2014 rcS-emergency.conf
-rw-r--r--. 1 root root  725 Jul 22  2014 rcS-sulogin.conf
-rw-r--r--. 1 root root 2915 Nov 21  2013 readahead-collector.conf
-rw-r--r--. 1 root root 1559 Nov 21  2013 readahead.conf
-rw-r--r--. 1 root root  726 Nov 21  2013 readahead-disable-services.conf
-rw-r--r--. 1 root root 1302 Jul 22  2014 serial.conf
-rw-r--r--. 1 root root  791 Jul 22  2014 splash-manager.conf
-rw-r--r--. 1 root root  473 Jul 22  2014 start-ttys.conf
-rw-r--r--. 1 root root  335 Jul 22  2014 tty.conf
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ls -l /etc/init/oracle-ohasd.conf
-rw-r--r-- 1 root root 220 Oct 11 01:30 /etc/init/oracle-ohasd.conf
[root@lunarlib rootwork]# 

On Linux 6, the /sbin/initctl utility (analogous to chkconfig on Linux 5) is used to manage the startup of /etc/init.d/init.ohasd.
(Note: on Linux 5, chkconfig cannot be used to manage the /etc/init.d/init.ohasd process.)

[root@lunarlib rootwork]# initctl list
rc stop/waiting
tty (/dev/tty3) start/running, process 4353
tty (/dev/tty2) start/running, process 4349
tty (/dev/tty1) start/running, process 4347
tty (/dev/tty6) start/running, process 4365
tty (/dev/tty5) start/running, process 4361
tty (/dev/tty4) start/running, process 4359
plymouth-shutdown stop/waiting
control-alt-delete stop/waiting
rcS-emergency stop/waiting
readahead-collector stop/waiting
kexec-disable stop/waiting
quit-plymouth stop/waiting
rcS stop/waiting
prefdm stop/waiting
init-system-dbus stop/waiting
readahead stop/waiting
splash-manager stop/waiting
start-ttys stop/waiting
readahead-disable-services stop/waiting
rcS-sulogin stop/waiting
serial stop/waiting
oracle-ohasd start/running, process 12661
[root@lunarlib rootwork]# 

Check the oracle-ohasd job:

[root@lunarlib rootwork]# /sbin/initctl list |grep oracle-ohasd
oracle-ohasd start/running, process 2897
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2897     1  0 06:05 ?        00:00:01 /bin/sh /etc/init.d/init.ohasd run
root      6651  6336  0 07:52 pts/0    00:00:00 grep ohasd
[root@lunarlib rootwork]# 

In Linux 7 the boot mechanism changed yet again: the system uses systemd, which starts services in parallel over sockets and D-Bus, provides on-demand startup of daemons, and drops the concept of runlevels (while remaining fully compatible with sysvinit).
Services to be started at boot have unit files under /lib/systemd/system/, named *.service, along the following lines.
On Linux 7, loading the ohasd unit for Oracle RAC at boot requires the configuration below (running root.sh during installation configures the ohasd service unit; if it was not set up correctly, you can configure it by hand):

[root@lunar system]# ll /etc/systemd/system/|grep ohasd
-rw-r--r--  1 root root  361 Oct  9 12:29 oracle-ohasd.service
[root@lunar system]# 
[root@lunar system]# systemctl list-units -t service|grep ohasd
ohasd.service                      loaded active exited  ohasd.service
oracle-ohasd.service               loaded active running Oracle High Availability Services
[root@lunar system]# 

In fact, when installing Oracle 11.2.0.4 (the only 11.2 release that supports Linux 7; the other 11.2 releases support only pre-7 versions) or 12c (currently 12.1 and the 12.2 beta) RAC on Linux 7 (currently only 7.1 and 7.2 exist), a bug prevents root.sh from creating this file, and it must be created manually.
Here is the service unit file in detail:

[root@lunar system]# cat /etc/systemd/system/oracle-ohasd.service
# Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.
#
# Oracle OHASD startup

[Unit]
Description=Oracle High Availability Services
After=syslog.target network.target remote-fs.target

[Service]
ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Type=simple
Restart=always

[Install]
WantedBy=multi-user.target graphical.target
[root@lunar system]# 

In a systemd unit definition, specifying "After=" declares a dependency on the listed units; without After, the unit can start in parallel with the others.
Here, oracle-ohasd.service is declared to start after the syslog.target, network.target, and remote-fs.target units.
The command executed is /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null, and the startup type is simple.
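Since root.sh may fail to create this file on Linux 7 (per the bug noted above), here is a hedged sketch of recreating it by hand; UNIT is a scratch path standing in for /etc/systemd/system/oracle-ohasd.service, after which the unit would be enabled with systemctl enable oracle-ohasd:

```shell
# Recreate the oracle-ohasd unit file manually (sketch: writes to a
# scratch path, not /etc/systemd/system).
UNIT=/tmp/oracle-ohasd.service.demo
cat > "$UNIT" <<'EOF'
[Unit]
Description=Oracle High Availability Services
After=syslog.target network.target remote-fs.target

[Service]
ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Type=simple
Restart=always

[Install]
WantedBy=multi-user.target graphical.target
EOF
grep -c '^\[' "$UNIT"   # three sections: Unit, Service, Install
```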
Finally, let's look at all the units currently loaded on the system:

[root@lunar system]# ll /etc/systemd/system/
total 16
drwxr-xr-x. 2 root root   30 Oct  8 07:17 basic.target.wants
drwxr-xr-x. 2 root root   30 Oct  7 23:08 bluetooth.target.wants
lrwxrwxrwx. 1 root root   41 Oct  7 23:08 dbus-org.bluez.service -> /usr/lib/systemd/system/bluetooth.service
lrwxrwxrwx. 1 root root   44 Oct  7 23:10 dbus-org.freedesktop.ModemManager1.service -> /usr/lib/systemd/system/ModemManager.service
lrwxrwxrwx. 1 root root   46 Oct  7 23:04 dbus-org.freedesktop.NetworkManager.service -> /usr/lib/systemd/system/NetworkManager.service
lrwxrwxrwx. 1 root root   57 Oct  7 23:04 dbus-org.freedesktop.nm-dispatcher.service -> /usr/lib/systemd/system/NetworkManager-dispatcher.service
lrwxrwxrwx  1 root root   41 Oct  8 16:51 default.target -> /usr/lib/systemd/system/multi-user.target
drwxr-xr-x. 2 root root   85 Oct  7 23:02 default.target.wants
lrwxrwxrwx. 1 root root   35 Oct  7 23:08 display-manager.service -> /usr/lib/systemd/system/gdm.service
drwxr-xr-x. 2 root root   31 Oct  7 23:02 getty.target.wants
drwxr-xr-x. 2 root root   90 Oct  9 12:29 graphical.target.wants
drwxr-xr-x. 2 root root 4096 Oct  9 12:29 multi-user.target.wants
lrwxrwxrwx. 1 root root   38 Oct  7 23:04 mysql.service -> /usr/lib/systemd/system/mysqld.service
-rw-r--r--  1 root root  361 Oct  9 12:29 oracle-ohasd.service
drwxr-xr-x. 2 root root   25 Oct  7 23:05 printer.target.wants
drwxr-xr-x. 2 root root 4096 Oct  8 07:32 sockets.target.wants
drwxr-xr-x. 2 root root   35 Oct  7 23:10 spice-vdagentd.target.wants
drwxr-xr-x. 2 root root 4096 Oct  8 22:19 sysinit.target.wants
drwxr-xr-x. 2 root root   83 Oct  7 23:05 system-update.target.wants
[root@lunar system]# 
[root@lunar system]# ps -ef|grep ohasd
root      1132     1  0 04:26 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
grid      4017     1  0 04:27 ?        00:00:59 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
root     11362  8874  0 09:50 pts/1    00:00:00 grep --color=auto ohasd
[root@lunar system]# 

How to Start GI When the init.ohasd and ohasd Files Under /etc Are Lost

The previous post showed that /etc/init.d/init.ohasd is the daemon that starts every other RAC process in 11.2 and 12c.
So what if someone accidentally deletes or corrupts this file?
The fix is straightforward: in a Standalone environment, /etc/init.d/init.ohasd comes from $GRID_HOME/crs/init/init.ohasd, and /etc/init.d/ohasd comes from $GRID_HOME/crs/init/ohasd.

[root@lunarlib etc]# env|grep ORA
ORACLE_SID=+ASM
GRID_HOME=/u01/app/11.2.0.4/grid
[root@lunarlib rootwork]# ls -lrt $GRID_HOME/crs/init
total 28
-rw-r--r-- 1 root root 6721 Oct 11 01:30 ohasd
-rw-r--r-- 1 root root 8794 Oct 11 01:30 init.ohasd
-rw-r--r-- 1 root root 7038 Oct 11 01:30 ohasd.sles
[root@lunarlib rootwork]# 

Let's compare ohasd and init.ohasd under $GRID_HOME/crs/init/ and /etc/init.d/ to confirm the contents match:

[root@lunarlib rootwork]# ls -lrt $GRID_HOME/crs/init/ohasd
-rw-r--r-- 1 root root 6721 Oct 11 01:30 /u01/app/11.2.0.4/grid/crs/init/ohasd
[root@lunarlib rootwork]# ls -lrt /etc/init.d/ohasd
-rwxr-x--- 1 root root 6721 Oct 11 01:30 /etc/init.d/ohasd
[root@lunarlib rootwork]# ls -lrt $GRID_HOME/crs/init/init.ohasd
-rw-r--r-- 1 root root 8794 Oct 11 01:30 /u01/app/11.2.0.4/grid/crs/init/init.ohasd
[root@lunarlib rootwork]# ls -lrt /etc/init.d/init.ohasd
-rwxr-x--- 1 root root 8794 Jan 11 11:56 /etc/init.d/init.ohasd
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# diff $GRID_HOME/crs/init/ohasd /etc/init.d/ohasd
[root@lunarlib rootwork]# diff $GRID_HOME/crs/init/init.ohasd /etc/init.d/init.ohasd
[root@lunarlib rootwork]# 

As you can see, the files under $GRID_HOME/crs/init/ and /etc/init.d/ have identical contents and differ only in permissions: 750 under /etc/init.d/ versus 644 under $GRID_HOME/crs/init.
So the fix is clear: if /etc/init.d/init.ohasd or /etc/init.d/ohasd is lost, just recreate it by hand:

cp $GRID_HOME/crs/init/init.ohasd /etc/init.d/init.ohasd 
chmod 750 /etc/init.d/init.ohasd
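The restore can be verified end to end. A sketch that simulates it in scratch directories: GRID_INIT stands in for $GRID_HOME/crs/init, ETC_INITD for /etc/init.d, and the placeholder file is hypothetical; after the copy, the content must match and the mode must be 750:

```shell
# Simulate restoring init.ohasd and verify content and permissions.
GRID_INIT=/tmp/grid_init_demo
ETC_INITD=/tmp/etc_initd_demo
mkdir -p "$GRID_INIT" "$ETC_INITD"
echo '# init.ohasd placeholder' > "$GRID_INIT/init.ohasd"   # stand-in file
cp "$GRID_INIT/init.ohasd" "$ETC_INITD/init.ohasd"
chmod 750 "$ETC_INITD/init.ohasd"
cmp -s "$GRID_INIT/init.ohasd" "$ETC_INITD/init.ohasd" && echo "content OK"
stat -c '%a' "$ETC_INITD/init.ohasd"
```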

Looking more closely, $GRID_HOME/crs/init also contains a third file, named ohasd.sles.
As those familiar with SLES Linux may have guessed, this is the ohasd variant used on SLES Linux.
Check whether the current system is SLES:

[root@lunarlib rootwork]# rpm -q sles-release
package sles-release is not installed
[root@lunarlib rootwork]#

Now delete /etc/init.d/init.ohasd to simulate the file being lost or corrupted:

[root@lunarlib rootwork]# rm -f /etc/init.d/init.ohasd
[root@lunarlib rootwork]# 

Then copy the file back from $GRID_HOME/crs/init/ and try starting it manually.
The output below shows the result of rebooting after deleting /etc/init.d/init.ohasd (you could also kill the process instead of rebooting the host):

[root@lunarlib rootwork]# ps -ef | grep -v grep|grep ohasd
[root@lunarlib rootwork]# ps -ef | grep -v grep|grep -E 'init|d.bin|ocls|evmlogger|UID'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:37 ?        00:00:00 /sbin/init
[root@lunarlib rootwork]# 

As you can see, no RAC processes are running at this point.
Let's try restoring the lost ohasd daemon configuration file:

[root@lunarlib rootwork]# cp $GRID_HOME/crs/init/init.ohasd /etc/init.d/init.ohasd
cp: overwrite `/etc/init.d/init.ohasd'? y
[root@lunarlib rootwork]# 

After rebooting the system, the process is running again:

[root@lunarlib rootwork]# ps -ef | grep -v grep|grep ohasd
root      2882     1  0 10:33 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunarlib rootwork]# ps -ef | grep -v grep|grep -E 'init|d.bin|ocls|evmlogger|UID'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:32 ?        00:00:00 /sbin/init
root      2882     1  0 10:33 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# /sbin/initctl list |grep oracle-ohasd
oracle-ohasd start/running, process 2882
[root@lunarlib rootwork]# 


In 11.2, How to Manually Kill All CRS Processes Without Rebooting the Host

As we all know, killing the ocssd.bin process in a RAC environment causes the host to reboot.
But sometimes the system is already misbehaving, CRS cannot be shut down normally, and the host is an old machine that has not been rebooted in years, which nobody dares to reboot. What now?
The only option is to kill the processes by hand and then repair CRS manually (note: in 10.2 RAC there are only three d.bin processes).
Test environment: OEL 6.6

[root@lunar1 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.6
[root@lunar1 ~]# 
[root@lunar1 ~]# uname -a
Linux lunar1 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunar1 ~]# 

The CRS version of this RAC is 11.2.0.4:

[root@lunar1 ~]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs softwareversion
Oracle Clusterware version on node [lunar1] is [11.2.0.4.0]
[root@lunar1 ~]# 

Note: a 12.1 classic RAC (non-Flex Cluster) behaves the same as described here, and the troubleshooting approach and procedure are identical.
Check the current CRS status:

[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.DATADG1.dg
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.DATADG2.dg
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.asm
               ONLINE  ONLINE       lunar1                   Started             
               ONLINE  ONLINE       lunar2                   Started             
ora.gsd
               OFFLINE OFFLINE      lunar1                                       
               OFFLINE OFFLINE      lunar2                                       
ora.net1.network
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
ora.ons
               ONLINE  ONLINE       lunar1                                       
               ONLINE  ONLINE       lunar2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunar2                                       
ora.cvu
      1        ONLINE  ONLINE       lunar2                                       
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open                
      2        ONLINE  OFFLINE                               STARTING            
ora.lunar1.vip
      1        ONLINE  ONLINE       lunar1                                       
ora.lunar2.vip
      1        ONLINE  ONLINE       lunar2                                       
ora.oc4j
      1        ONLINE  ONLINE       lunar1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       lunar2                                       
[root@lunar1 ~]# 
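Before killing anything, it is worth triaging which resources are unexpectedly down, i.e. STATE is OFFLINE while TARGET is ONLINE. A sketch that parses saved `crsctl status res -t` output; /tmp/crs_status_demo.txt holds a trimmed sample of the listing above:

```shell
# List resources whose TARGET is ONLINE but STATE is OFFLINE from
# saved "crsctl status res -t" output. Resource names start with ora.
cat > /tmp/crs_status_demo.txt <<'EOF'
ora.gsd
               OFFLINE OFFLINE      lunar1
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open
      2        ONLINE  OFFLINE                               STARTING
EOF
awk '/^ora\./ {res=$1; next} /ONLINE  OFFLINE/ {print res}' \
    /tmp/crs_status_demo.txt
```

ora.gsd is not reported because it is intentionally offline (TARGET is also OFFLINE).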

List all current CRS processes:

[root@lunar1 ~]# ps -ef|grep d.bin
root      3860     1  0 19:31 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      3972     1  0 19:31 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      3983     1  0 19:31 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      3994     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root      4004     1  0 19:31 ?        00:00:15 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      4007     1  0 19:31 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      4019     1  0 19:31 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      4032     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      4051     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      4063     1  0 19:31 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      4157     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      4180     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4343  4180  0 19:32 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5385     1  1 19:39 ?        00:00:17 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      5456     1  0 19:39 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      5473     1  0 19:39 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      5475     1  0 19:39 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle    7132     1  0 20:04 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7350  7273  0 20:04 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

这么多进程,它们之间的关系参见:《11.2 RAC 的启动过程》。
好吧,我们开始模拟kill进程。首先kill掉/u01/app/11.2.0.4/grid/bin/ohasd.bin(它会被自动重启,参见《11.2 RAC 的启动过程》):

[root@lunar1 ~]# kill -9 3860
[root@lunar1 ~]# ps -ef|grep d.bin
grid      3983     1  0 19:31 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      3994     1  0 19:31 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      4007     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      4019     1  0 19:31 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      4032     1  0 19:31 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      4157     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      4180     1  0 19:31 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4343  4180  0 19:32 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5385     1  1 19:39 ?        00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      5456     1  0 19:39 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      5473     1  0 19:39 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      5475     1  0 19:39 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle    7132     1  0 20:04 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7490     1  0 20:06 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root      7534  2487 14 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  6 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  8 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  2 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  3 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      7676  7273  0 20:07 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

然后,我们kill cssdmonitor:

[root@lunar1 ~]# kill -9 4032
-bash: kill: (4032) - No such process
[root@lunar1 ~]# 

这里提示没有这个进程,表示cssdmonitor进程已经被重启过了(参见《11.2 RAC 的启动过程》):

[root@lunar1 ~]# ps -ef|grep d.bin
grid      3983     1  0 19:31 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      3994     1  0 19:31 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      4007     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      4019     1  0 19:31 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      4157     1  0 19:31 ?        00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      4180     1  0 19:31 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4343  4180  0 19:32 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5385     1  1 19:39 ?        00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      5456     1  0 19:39 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      5473     1  0 19:39 ?        00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      5475     1  0 19:39 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle    7132     1  0 20:04 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7490     1  0 20:06 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root      7534  2487  3 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      7740  7273  0 20:07 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

上面启动时间在20:04~20:07之间的进程,都是/u01/app/11.2.0.4/grid/bin/ohasd.bin被重启后,自动在后台拉起来的。
现在,我们kill mdnsd、gpnpd、gipcd和osysmond。
这4个进程中,前面3个是CRS启动过程里除了ohasd以外最早启动的几个进程。
kill掉这些进程后,ohasd都会把它们重新启动:

[root@lunar1 ~]# kill -9 3983 3994 4007 4019
[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      6535     1  0 19:50 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      7490     1  0 20:06 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root      7534  2487  2 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      7756     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      7758     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
root      7776  7273  0 20:07 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

这里我们看到,刚才kill的4个进程还没有全部起来,怎么回事?
别急,还没到时间,ohasd需要先做check,然后才会逐个把它们拉起来,O(∩_∩)O哈哈~
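ohasd这种“先check再拉起”的节奏,可以用下面这个简化的观察脚本反复确认(假设性示例,watch_dbin是演示用的辅助函数,不是Oracle自带工具):

```shell
#!/bin/sh
# 假设性观察脚本:按固定间隔列出所有d.bin进程,
# 便于确认ohasd何时把被kill的进程重新拉起来
watch_dbin() {
  interval=${1:-5}    # 每次检查的间隔(秒)
  count=${2:-6}       # 检查次数
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "--- check $((i+1)) at $(date '+%H:%M:%S') ---"
    ps -ef | grep 'd\.bin' | grep -v grep
    sleep "$interval"
    i=$((i+1))
  done
}
# 用法:watch_dbin 5 6   (每5秒检查一次,共观察30秒)
```

在kill掉mdnsd等进程之后运行它,就能看到gipcd.bin、osysmond.bin等被ohasd陆续拉起的时间点。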
然后,我们kill 监听:

[root@lunar1 ~]# kill -9 6535 7490 
[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      7534  2487  2 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid      7571     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      7575     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      7756     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      7758     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      7783     1  2 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      7785     1  2 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      7844     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      7853     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid      7873     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root      7874     1 14 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      7944  7873  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid      7979     1  9 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7982     1  3 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle    7986     1  4 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      8001     1  3 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      8025  7979  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER
grid      8028  7979  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER_SCAN1
root      8083  7273  0 20:08 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

好吧,看看,刚才kill的进程都被重启了,11.2的RAC真强悍啊。
现在我们kill /etc/init.d/init.ohasd进程:

[root@lunar1 ~]# ps -ef|grep ohasd
root      2487     1  0 19:20 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      7534  2487  1 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
root      8191  7273  0 20:08 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# kill -9 2487 7534
[root@lunar1 ~]# ps -ef|grep ohasd
root      8239     1  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8257  8239  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8258  8257  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8267  7273  0 20:08 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# ps -ef|grep ohasd
root      8239     1  0 20:08 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8299  7273  0 20:08 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# 

这里我们看到的就是/etc/init.d/init.ohasd被系统自动重启的过程。这些信息会记录在/var/log/messages中:

[root@lunar1 ~]# tail -f /var/log/messages
Jan 24 19:45:31 lunar1 kernel: e1000 0000:00:03.0 eth0: Reset adapter
Jan 24 20:03:50 lunar1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:03:52 lunar1 kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:07:01 lunar1 clsecho: /etc/init.d/init.ohasd: ohasd is restarting 1/10.
Jan 24 20:07:01 lunar1 logger: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "restart"
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process (2487) killed by KILL signal
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process (8239) killed by KILL signal
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:14:12 lunar1 root: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "reboot"
^C
[root@lunar1 ~]# 

而且其他进程也都被自动重启了(注意此时crsd管理的资源还没有全部恢复完成):

[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid      7756     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid      7758     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      7783     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      7785     1  1 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      7844     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      7853     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid      7873     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root      7874     1  3 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      7944  7873  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid      7979     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      7982     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle    7986     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root      8001     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      8119     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      8120     1  0 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
root      8321  8319  1 20:08 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root      8325  7273  0 20:08 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

现在我们依次kill evmlogger.bin、gpnpd.bin、mdnsd.bin、gipcd.bin、evmd.bin、oraagent.bin、scriptagent.bin、oraagent.bin、orarootagent.bin和两个listener:

[root@lunar1 ~]# kill -9 7944 7756 7758 7783 7873 7979 7982 7986 8001 8119 8120
[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      7578     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root      7588     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      7785     1  1 20:07 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      7844     1  0 20:07 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      8593  8591  0 20:09 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root      8597  7273  0 20:09 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

然后,kill osysmond.bin、ologgerd、cssdmonitor和cssdagent:

[root@lunar1 ~]# kill -9 7785 7844 7588 7578  
[root@lunar1 ~]# 

好吧,现在就剩下一个ocssd.bin了:

[root@lunar1 ~]# ps -ef|grep d.bin
grid      4063     1  0 19:31 ?        00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      8629  7273  0 20:10 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]# 

现在我们kill 传说中一旦被kill就会引起主机重启的进程 ocssd.bin :

[root@lunar1 ~]# kill -9 4063
[root@lunar1 ~]# 

好了,我们的系统都还好好的,没有重启,资源也都释放干净了:

[root@lunar1 ~]# ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0x00000000 0          root       600        1         
0x00000000 65537      root       600        1         

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

[root@lunar1 ~]# 
[root@lunar1 ~]# 
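如果想更程序化地确认“释放干净”,可以用类似下面的小脚本(假设性示例)同时检查残留进程和共享内存段:

```shell
#!/bin/sh
# 假设性验证脚本:kill掉所有GI进程后,确认没有残留的d.bin进程,
# 并列出共享内存段(ASM/DB实例的SGA应已随之消失)
check_clean() {
  left=$(ps -ef | grep 'd\.bin' | grep -v grep | wc -l)
  echo "remaining d.bin processes: $left"
  ipcs -m 2>/dev/null || true
}
# 用法:check_clean
```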

如果要恢复,很简单,只要直接重启crs就ok了:

[root@lunar1 ~]# ps -ef | grep -v grep|grep -E 'init|d.bin|ocls|evmlogger|UID'
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 19:20 ?        00:00:01 /sbin/init
root      2486     1  0 19:20 ?        00:00:00 /bin/sh /etc/init.d/init.tfa run
root      8924     1  0 20:13 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunar1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunar1 ~]# ps -ef|grep ohasd
root      8924     1  0 20:13 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8968     1  4 20:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      9187  7273  0 20:14 pts/2    00:00:00 grep ohasd
[root@lunar1 ~]# 
[root@lunar1 ~]# ps -ef|grep d.bin
root      8968     1  0 20:14 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      9090     1  0 20:14 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      9101     1  0 20:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid      9112     1  0 20:14 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root      9122     1  0 20:14 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      9126     1  0 20:14 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root      9139     1  0 20:14 ?        00:00:12 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root      9150     1  0 20:14 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root      9169     1  0 20:14 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      9180     1  0 20:14 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
root      9212     1  1 20:14 ?        00:00:28 /u01/app/11.2.0.4/grid/bin/ologgerd -M -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root      9340     1  0 20:18 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid      9363     1  0 20:18 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/evmd.bin
root      9455     1  0 20:18 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid      9532  9363  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid      9569     1  0 20:18 ?        00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      9572     1  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
root      9591     1  0 20:18 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid      9682     1  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      9684     1  0 20:18 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
oracle    9774     1  0 20:19 ?        00:00:03 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root     10642  7273  0 20:38 pts/2    00:00:00 grep d.bin
[root@lunar1 ~]#
[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunar1                                       
ora.DATADG1.dg
               ONLINE  ONLINE       lunar1                                       
ora.DATADG2.dg
               ONLINE  ONLINE       lunar1                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunar1                                       
ora.asm
               ONLINE  ONLINE       lunar1                   Started             
ora.gsd
               OFFLINE OFFLINE      lunar1                                       
ora.net1.network
               ONLINE  ONLINE       lunar1                                       
ora.ons
               ONLINE  ONLINE       lunar1                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunar1                                       
ora.cvu
      1        ONLINE  ONLINE       lunar1                                       
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open                
      2        ONLINE  OFFLINE                                                   
ora.lunar1.vip
      1        ONLINE  ONLINE       lunar1                                       
ora.lunar2.vip
      1        ONLINE  INTERMEDIATE lunar1                   FAILED OVER         
ora.oc4j
      1        ONLINE  ONLINE       lunar1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       lunar1                                       
[root@lunar1 ~]# 

这里只显示了节点1,因为节点2被我关闭了。
测试证明,只要先kill掉cssdmonitor和cssdagent进程(准确地说是cssdagent,从那张CRS启动的经典大图上也可以看到这个依赖关系),再kill ocssd.bin进程,系统是不会重启的。
另外,12.1普通RAC(非Flex Cluster)的情况跟本文一样,处理思路和过程也相同。
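上面的结论可以概括为下面这个按顺序kill的小脚本(假设性示例,kill_by_pattern是演示用的辅助函数,切勿在生产环境执行):

```shell
#!/bin/sh
# 假设性示例:按本文验证过的“安全顺序”kill进程——
# 先kill守护ocssd的agent,最后才kill ocssd.bin,这样不会触发节点重启。
# 仅限测试环境!
kill_by_pattern() {
  pat=$1
  # 排除grep自身和当前shell,取出匹配进程的PID
  pids=$(ps -ef | grep "$pat" | grep -v grep | awk -v me=$$ '$2 != me {print $2}')
  if [ -n "$pids" ]; then
    kill -9 $pids 2>/dev/null
  fi
  echo "killed [$pat]: ${pids:-none}"
}
kill_by_pattern cssdagent     # 1. 先kill cssdagent,否则ocssd消失时它会触发重启
kill_by_pattern cssdmonitor   # 2. 同理
kill_by_pattern ocssd.bin     # 3. 最后才是ocssd.bin本身
```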


11.2 RAC 的启动过程

从11.2 GI(Grid Infrastructure)开始,RAC的结构跟10.2相比有了翻天覆地的变化,在MOS的经典文档“11gR2 Clusterware and Grid Home – What You Need to Know (Doc ID 1053147.1)”中有详细的解释。
其中有一幅经典大图,可以一目了然地告诉我们这些d.bin进程之间的依赖关系(也就是启动和关闭顺序,以及谁负责重启谁等等):


(图:11.2 Clusterware进程依赖关系大图,来自Doc ID 1053147.1)


从CRS的启动过程,我们也可以清晰的看到进程的启动顺序。
下面是一个11.2.0.3环境的CRS启动过程:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  4 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4338     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      4342     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
root      4348     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
root      4370     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4428  3507  0 20:37 pts/2    00:00:00 grep d.bin
[root@dm01db01 ~]# 

最先启动的是/u01/app/11.2.0.3/grid/bin/ohasd.bin,它后面带着reboot参数,表示它被kill后会被自动重启。
/etc/init.d/init.ohasd进程就是负责重启/u01/app/11.2.0.3/grid/bin/ohasd.bin的守护进程。
它们的内容都来源于$GRID_HOME/crs/init/init.ohasd,后续blog会模拟丢失这个文件后的处理,这里不赘述了。
/u01/app/11.2.0.3/grid/bin/ohasd.bin被kill后,系统会花几分钟时间重启该服务,并在/var/log/messages中记录下这个启动过程:

Jan 11 20:36:18 lunarlib clsecho: /etc/init.d/init.ohasd: ohasd is restarting 1/10.
Jan 11 20:36:18 lunarlib logger: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunarlib_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "restart"
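init.ohasd之所以在自己被kill后也能复活,是因为respawn它的是操作系统本身:OL5/RHEL5靠/etc/inittab里的respawn条目,OL6/RHEL6靠Upstart。可以用下面的小脚本(假设性示例)确认当前系统用的是哪种方式:

```shell
#!/bin/sh
# 假设性检查脚本:确认哪个OS机制负责respawn init.ohasd
# OL5/RHEL5: /etc/inittab里的respawn条目;OL6/RHEL6: Upstart配置文件
check_ohasd_respawn() {
  if grep -s ohasd /etc/inittab; then
    echo "respawn via: /etc/inittab"
  elif [ -f /etc/init/oracle-ohasd.conf ]; then
    echo "respawn via: Upstart (/etc/init/oracle-ohasd.conf)"
  else
    echo "respawn via: not found (非GI节点或其他init机制)"
  fi
}
check_ohasd_respawn
```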

这个重启的过程在空闲系统上大概需要不到2分钟,$GRID_HOME/log/`hostname -s`/alert`hostname -s`.log中会记录ohasd.bin被kill和重启后执行检查(check)和恢复(recovery)各种资源的日志,如下:

2016-01-11 20:36:18.500:
[/u01/app/11.2.0.4/grid/bin/cssdagent(16784)]CRS-5822:Agent '/u01/app/11.2.0.4/grid/bin/cssdagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:5:31} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oracssdagent_grid/oracssdagent_grid.log.
2016-01-11 20:36:18.504:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(16852)]CRS-5822:Agent '/u01/app/11.2.0.4/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:7:7} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:18.789:
[ohasd(17048)]CRS-2112:The OLR service started on node lunarlib.
2016-01-11 20:36:18.796:
[ohasd(17048)]CRS-1301:Oracle High Availability Service started on node lunarlib.
2016-01-11 20:36:49.574:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.CRSDG.dg'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:49.583:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.DATADG1.dg'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:49.594:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.DATADG2.dg'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:36:51.608:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(17083)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/log/lunarlib/agent/ohasd/oraagent_grid/oraagent_grid.log.
2016-01-11 20:37:52.943:
[ohasd(17048)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2016-01-11 20:37:52.943:
[ohasd(17048)]CRS-2769:Unable to failover resource 'ora.diskmon'.

好了,继续回到刚才启动过程的讨论。接下来,我们看到orarootagent.bin、cssdagent和cssdmonitor暂时不见了,新增了mdnsd.bin:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  4 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1 10 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
root      4452  3507  0 20:37 pts/2    00:00:00 grep d.bin            
[root@dm01db01 ~]# 

然后增加了ocssd.bin、gpnpd.bin、orarootagent.bin、gipcd.bin、osysmond.bin、cssdmonitor、cssdagent和diskmon.bin:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  5 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  5 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  5 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  3 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4571  3507  4 20:37 pts/2    00:00:00 grep d.bin  
[root@dm01db01 ~]# 

然后增加了ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01。
ologgerd(Cluster Logger Service)进程是随着11.2.0.2的安装过程自动安装的(这是11.2.0.2的新特性,以前的版本需要单独下载和安装),属于Cluster Health Monitor(以下简称CHM)组件。
CHM主要用来自动收集操作系统资源(CPU、内存、SWAP、进程、I/O以及网络等)的使用情况,每秒收集一次数据。
CHM会随下面的软件自动安装:
11.2.0.2 及更高版本的 Oracle Grid Infrastructure for Linux (不包括Linux Itanium) 、Solaris (Sparc 64 和 x86-64)
11.2.0.3 及更高版本 Oracle Grid Infrastructure for AIX 、 Windows (不包括Windows Itanium)。
.
注意,上面的osysmond.bin进程跟这里的ologgerd(Cluster Logger Service)进程是CHM的两个主要工作进程。
osysmond会将每个节点的资源使用情况发送给ologgerd(Cluster Logger Service),然后ologgerd会把所有节点的信息接收并保存到CHM的资料库中。
而这个CHM的资料库在11.2是缺省保存在$GRID_HOME/crf/db/`hostname -s`目录下,大概需要1G的空间。
在12.1.0.1,CHM的资料库是单独保存在GI的数据库中,在安装时可以选择是否安装GIMR(Grid Infrastructure Management Repository )。
在12.1.0.2,CHM的资料库还是单独保存在GI的数据库中,但是GIMR(Grid Infrastructure Management Repository )已经是必选项了。
在12.2,GIMR(Grid Infrastructure Management Repository)使用的数据库MGMTDB可以选择跟CRS放在同一个磁盘组,还是单独放在一个磁盘组中。
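CHM资料库的位置和占用可以这样检查(假设性示例:oclumon是GI自带的CHM管理工具,GRID_HOME路径请按实际环境调整):

```shell
#!/bin/sh
# 假设性示例:检查11.2中CHM资料库的目录和磁盘占用
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0.4/grid}
REPO_DIR="$GRID_HOME/crf/db/$(hostname -s)"
echo "CHM repository dir: $REPO_DIR"
# 以下命令只在GI环境下可用,这里先判断是否存在:
if command -v oclumon >/dev/null 2>&1; then
  oclumon manage -get repsize      # 查看资料库当前的保留时长
fi
if [ -d "$REPO_DIR" ]; then
  du -sh "$REPO_DIR"               # 查看实际磁盘占用
fi
```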
.

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  3 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  2 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4590     1  1 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4614  3507  0 20:38 pts/2    00:00:00 grep d.bin    
[root@dm01db01 ~]# 

继续看下面的启动过程。
在启动ocssd.bin以后,就会启动 octssd.bin :

[root@dm01db01 ~]# 
[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4590     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4685     1  4 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
root      4694  3507  0 20:38 pts/2    00:00:00 grep d.bin     
[root@dm01db01 ~]# 

接下来,启动evmd.bin:

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  2 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4590     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      4685     1  1 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  2 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      4732  3507  0 20:38 pts/2    00:00:00 grep d.bin 
[root@dm01db01 ~]# 

然后是crsd.bin 和 tnslsnr:

[root@dm01db01 ~]#
[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4685     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      5080     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      5100     1  1 20:39 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid      5189  4710  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid      5229     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      5242     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      5368     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5376     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle    5466     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      5487     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
root      5897  3507  0 20:40 pts/2    00:00:00 grep d.bin
[root@dm01db01 ~]# 

当crsd.bin启动后,就可以使用crsctl status res -t来查看CRS状态了。
如果crsd.bin没启动,那么需要使用crsctl status res -t -init查看。

[root@dm01db01 ~]# ps -ef|grep d.bin
root      4296     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid      4430     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      4444     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid      4458     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root      4472     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      4476     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root      4494     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root      4509     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root      4530     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
grid      4534     1  0 20:37 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid      4557     1  0 20:37 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/ocssd.bin 
root      4685     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid      4710     1  0 20:38 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmd.bin
root      5080     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root      5100     1  0 20:39 ?        00:00:01 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid      5189  4710  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid      5229     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root      5242     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid      5368     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5376     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
oracle    5466     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid      5487     1  0 20:39 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
grid      6061  5487  0 20:41 ?        00:00:00 /bin/sh /u01/app/11.2.0.3/grid/bin/oc4jctl check
grid      6072  6061  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/perl/bin/perl /u01/app/11.2.0.3/grid/bin/oc4jctl.pl check 8888
grid      6086  5229  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER_SCAN1
grid      6088  5229  1 20:41 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/lsnrctl status LISTENER
root      6096  3507  0 20:41 pts/2    00:00:00 grep d.bin     
[root@dm01db01 ~]# 

Finally lsnrctl and oc4jctl are started; at this point, CRS startup is complete.


What happens when the temporary files under /var/tmp/.oracle/ are deleted in a RAC environment, and how to recover

Test goal: simulate an accidental deletion of the Oracle temporary files (the network socket files) under /var/tmp/.oracle/* in a RAC environment.
Test procedure: observe the consequences and how to recover.

Test environment: OEL 6.6, Oracle 11.2.0.4 Standalone (a single instance using ASM).
For RAC the conclusions should be broadly the same, since the mechanism is similar.

[root@lunarlib rootwork]# cat /etc/oracle-release 
Oracle Linux Server release 6.6
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# uname -a
Linux lunarlib 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunarlib rootwork]# 

On the Linux platform, RAC and HAS (a single instance using ASM, i.e. standalone, also known as Oracle Restart) keep their network socket files under /var/tmp/.oracle/*:
(On other platforms, such as AIX and HP-UX, the network socket files may instead be under /tmp/.oracle or /usr/tmp/.oracle; check with ls -lrt.)
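The platform-dependent lookup can be expressed as a small probe. A minimal sketch, assuming only the three candidate paths named above; the function name find_oracle_socket_dir is ours:

```shell
# Hypothetical helper: print the first existing candidate directory for the
# Grid/HAS network socket files, Linux default first.
find_oracle_socket_dir() {
  [ $# -eq 0 ] && set -- /var/tmp/.oracle /tmp/.oracle /usr/tmp/.oracle
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}
```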

[root@lunarlib etc]# ls -lrt /var/tmp/.oracle/* 
prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sCevm
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib etc]# 

Stop HAS with crsctl stop has -f; after that the network socket files under /var/tmp/.oracle/* can simply be deleted:

[root@lunarlib rootwork]# crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lunarlib'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.CRSDG.dg' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.lunardb.db' on 'lunarlib'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.lunardb.db' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.DATADG1.dg' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.DATADG2.dg' on 'lunarlib'
CRS-2677: Stop of 'ora.DATADG1.dg' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.DATADG2.dg' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.CRSDG.dg' on 'lunarlib' succeeded
CRS-2679: Attempting to clean 'ora.CRSDG.dg' on 'lunarlib'
CRS-2681: Clean of 'ora.CRSDG.dg' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'lunarlib'
CRS-2677: Stop of 'ora.asm' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'lunarlib'
CRS-2677: Stop of 'ora.cssd' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'lunarlib'
CRS-2677: Stop of 'ora.evmd' on 'lunarlib' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lunarlib' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/* 
prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.1
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sCevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]# 

If the /var/tmp/.oracle directory itself no longer exists, it can be recreated manually:

[root@lunarlib rootwork]# mkdir /var/tmp/.oracle
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      5177     1  1 18:12 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      5306     1  1 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      5311     1  1 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      5339     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      5341     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      5356     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      5387  5339  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5400  5264  0 18:14 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/* 
prw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/npohasd
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/slunarlibDBG_OHASD
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_CSSD
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sCevm
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib rootwork]# 

If these temporary files are deleted while HAS is up and running, the database stays usable but can no longer be shut down cleanly:

[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/* 
[root@lunarlib rootwork]# ll /var/tmp/.oracle/* 
ls: cannot access /var/tmp/.oracle/*: No such file or directory
[root@lunarlib rootwork]# ll /var/tmp/.oracle/
total 0
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2877     1  0 17:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      5177     1  0 18:12 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      5653  5264  0 18:21 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2877     1  0 17:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      5177     1  0 18:12 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      5660  5264  0 18:23 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      5177     1  0 18:12 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      5306     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      5311     1  0 18:14 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      5339     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      5341     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      5356     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      5387  5339  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5662  5264  0 18:23 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# crsctl status res -t
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[root@lunarlib rootwork]# 

As you can see, CRS communication is now broken.
Let's check the database:

[oracle@lunarlib work]$ ss

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:17 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options

SYS@lunardb>alter system switch logfile;

System altered.

Elapsed: 00:00:00.14
SYS@lunardb>alter system checkpoint;

System altered.

Elapsed: 00:00:00.06
SYS@lunardb>shutdown immediate
ORA-29701: unable to connect to Cluster Synchronization Service
SYS@lunardb>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
[oracle@lunarlib work]$ 

As shown above, the database is still usable, but it cannot be shut down: shutdown fails with an error that the instance can no longer talk to the CSS process.

[oracle@lunarlib work]$ ss

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:46 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options

SYS@lunardb>

The database alert log shows:

Mon Jan 11 18:26:37 2016
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
[oracle@lunarlib trace]$ cat lunardb_ora_22027.trc
Trace file /u01/app/oracle/diag/rdbms/lunardb/lunardb/trace/lunardb_ora_22027.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name:    Linux
Node name:      lunarlib
Release:        3.8.13-44.1.1.el6uek.x86_64
Version:        #2 SMP Wed Sep 10 06:10:25 PDT 2014
Machine:        x86_64
Instance name: lunardb
Redo thread mounted by this instance: 1
Oracle process number: 23
Unix process pid: 22027, image: oracle@lunarlib (TNS V1-V3)


*** 2016-01-11 18:26:37.174
*** SESSION ID:(135.10871) 2016-01-11 18:26:37.174
*** CLIENT ID:() 2016-01-11 18:26:37.174
*** SERVICE NAME:(SYS$USERS) 2016-01-11 18:26:37.174
*** MODULE NAME:(sqlplus@lunarlib (TNS V1-V3)) 2016-01-11 18:26:37.174
*** ACTION NAME:() 2016-01-11 18:26:37.174
 
Stopping background process SMCO

*** 2016-01-11 18:26:38.176
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
NOTE: kfmsInit: ASM failed to initialize group services
[oracle@lunarlib trace]$ 

Check the Oracle background processes:

[oracle@lunarlib trace]$ ps -ef|grep ora_
oracle    5495     1  0 18:14 ?        00:00:00 ora_pmon_lunardb
oracle    5497     1  0 18:14 ?        00:00:00 ora_psp0_lunardb
oracle    5504     1  4 18:14 ?        00:00:36 ora_vktm_lunardb
oracle    5508     1  0 18:14 ?        00:00:00 ora_gen0_lunardb
oracle    5510     1  0 18:14 ?        00:00:00 ora_diag_lunardb
oracle    5512     1  0 18:14 ?        00:00:00 ora_dbrm_lunardb
oracle    5514     1  0 18:14 ?        00:00:00 ora_dia0_lunardb
oracle    5516     1  0 18:14 ?        00:00:00 ora_mman_lunardb
oracle    5518     1  0 18:14 ?        00:00:00 ora_dbw0_lunardb
oracle    5520     1  0 18:14 ?        00:00:00 ora_lgwr_lunardb
oracle    5522     1  0 18:14 ?        00:00:00 ora_ckpt_lunardb
oracle    5524     1  0 18:14 ?        00:00:00 ora_smon_lunardb
oracle    5526     1  0 18:14 ?        00:00:00 ora_reco_lunardb
oracle    5528     1  0 18:14 ?        00:00:00 ora_rbal_lunardb
oracle    5530     1  0 18:14 ?        00:00:00 ora_asmb_lunardb
oracle    5532     1  0 18:14 ?        00:00:00 ora_mmon_lunardb
oracle    5536     1  0 18:14 ?        00:00:00 ora_mmnl_lunardb
oracle    5540     1  0 18:14 ?        00:00:00 ora_mark_lunardb
oracle    5568     1  0 18:14 ?        00:00:00 ora_arc0_lunardb
oracle    5570     1  0 18:14 ?        00:00:00 ora_arc1_lunardb
oracle    5572     1  0 18:14 ?        00:00:00 ora_arc2_lunardb
oracle    5574     1  0 18:14 ?        00:00:00 ora_arc3_lunardb
oracle    5583     1  0 18:14 ?        00:00:00 ora_qmnc_lunardb
oracle    5611     1  0 18:14 ?        00:00:00 ora_q000_lunardb
oracle    5613     1  0 18:14 ?        00:00:00 ora_q001_lunardb
oracle    6691  6657  0 18:29 pts/4    00:00:00 grep ora_
oracle   22988     1  0 18:26 ?        00:00:00 ora_o000_lunardb
oracle   23012     1  0 18:26 ?        00:00:00 ora_o001_lunardb
[oracle@lunarlib trace]$ 

Shut the database down with shutdown abort:

SYS@lunardb>shutdown abort
ORACLE instance shut down.
SYS@lunardb>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
[oracle@lunarlib work]$ 
[oracle@lunarlib trace]$ ps -ef|grep ora_
oracle    6709  6657  0 18:31 pts/4    00:00:00 grep ora_
[oracle@lunarlib trace]$ 

The alert log shows:

Mon Jan 11 18:30:38 2016
Shutting down instance (abort)
License high water mark = 5
USER (ospid: 26332): terminating the instance
Instance terminated by USER, pid = 26332
Mon Jan 11 18:30:38 2016
Instance shutdown complete

Now any attempt to start the database again fails:

[oracle@lunarlib work]$ ss

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:31:50 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SYS@lunardb>startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATADG1/lunardb/spfilelunardb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATADG1/lunardb/spfilelunardb.ora
ORA-29701: unable to connect to Cluster Synchronization Service
SYS@lunardb>

Meanwhile the other HAS processes are still there; only the network socket files under /var/tmp/.oracle/* are gone:

[root@lunarlib rootwork]# ll /var/tmp/.oracle/* 
ls: cannot access /var/tmp/.oracle/*: No such file or directory
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2877     1  0 17:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      5177     1  0 18:12 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      6723  4677  0 18:33 pts/0    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      5177     1  0 18:12 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      5306     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      5339     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      5341     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      5356     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      5387  5339  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      6725  4677  0 18:33 pts/0    00:00:00 grep d.bin
[root@lunarlib rootwork]# 
ohasd.log under /u01/app/11.2.0.4/grid/log/lunarlib/ohasd contains messages like the following:
2016-01-11 18:28:09.091: [ CRSCOMM][406906624] IpcL: connection to member 9 has been removed
2016-01-11 18:28:09.091: [CLSFRAME][406906624] Removing IPC Member:{Relative|Node:0|Process:9|Type:3}
2016-01-11 18:28:09.091: [CLSFRAME][406906624] Disconnected from AGENT process: {Relative|Node:0|Process:9|Type:3}
2016-01-11 18:28:09.092: [    AGFW][333440768]{0:0:132} Agfw Proxy Server received process disconnected notification, count=1
2016-01-11 18:28:09.092: [    AGFW][333440768]{0:0:132} /u01/app/11.2.0.4/grid/bin/oraagent_grid disconnected.
2016-01-11 18:28:09.092: [    AGFW][333440768]{0:0:132} Agent /u01/app/11.2.0.4/grid/bin/oraagent_grid[5311] stopped!
2016-01-11 18:28:09.092: [ CRSCOMM][333440768]{0:0:132} IpcL: removeConnection: Member 9 does not exist in pending connections.
2016-01-11 18:28:09.093: [    AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:28:09.093: [    AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:3
2016-01-11 18:28:09.095: [   CRSPE][322934528]{0:0:133} Disconnected from server:
2016-01-11 18:28:09.098: [    AGFW][333440768]{0:0:132} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [    INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) :  Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [    AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
2016-01-11 18:31:39.112: [    AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [    AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:5
2016-01-11 18:31:39.119: [    AGFW][333440768]{0:0:132} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [    INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) :  Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [    AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
2016-01-11 18:35:09.131: [    AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [    AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:7
2016-01-11 18:35:09.137: [    AGFW][333440768]{0:0:132} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid

At this point HAS can no longer be stopped with crsctl stop has -f:

[root@lunarlib rootwork]# crsctl stop has -f
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
[root@lunarlib rootwork]# 

Rebooting the host would be the easy fix. But what if the host cannot conveniently be rebooted?
Without a reboot, we can clean things up by hand. First, remove the network socket files used by all the HAS processes:

[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle/
total 0
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      4332     1  0 18:40 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      4560     1  0 18:42 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      4566     1  0 18:42 ?        00:00:11 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      4591     1  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4594     1  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      4603     1  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      4639  4591  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      4994  4305  0 19:02 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2882     1  0 18:40 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      4332     1  0 18:40 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      4996  4305  0 19:02 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# crsctl status res -t
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# crsctl stop has -f
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
[root@lunarlib rootwork]# 

Now none of the normal commands to stop HAS work, because we deleted the socket files used for inter-process communication.
But we can still kill the processes:

[root@lunarlib rootwork]# kill -9 4332 4560 4566 4591 4594 4603 4639 2882 4332
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15575  4305  0 19:04 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15548     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15580  4305  0 19:04 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# kill -9 15548
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15608  4305  0 19:04 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15623  4305  0 19:04 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# /etc/init.d/init.ohasd stop -f
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15650  4305  0 19:05 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# /etc/init.d/init.ohasd stop
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15672  4305  0 19:05 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# 

In my test, on a HAS setup, killing all the processes at once did not reboot the host (in a RAC environment, killing ocssd.bin may cause a node reboot):

[root@lunarlib rootwork]# ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0x00000000 0          root       600        1         
0x00000000 65537      root       600        1         

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

[root@lunarlib rootwork]#
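Here the ipcs output is already clean, but on a real host some shared memory segments may survive the kill and need removing with ipcrm before restarting the stack. A minimal sketch, assuming the Grid Infrastructure owner is "grid" (the helper name grid_shm_ids is ours); always inspect the ids before removing anything:

```shell
# Hypothetical helper: list shared memory segment ids owned by a given user.
grid_shm_ids() {
  ipcs -m | awk -v owner="$1" '$3 == owner {print $2}'
}

grid_shm_ids grid                              # inspect first
# grid_shm_ids grid | xargs -r -n1 ipcrm -m    # then remove, once verified
```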

Then restart HAS manually:

[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root     15817  4520  0 19:09 pts/0    00:00:00 tail -f ohasd.log
root     15935 15908  0 19:10 pts/2    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15806  4305  0 19:09 pts/1    00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid     15851     1  0 19:09 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root     15937 15908  0 19:10 pts/2    00:00:00 grep d.bin
[root@lunarlib rootwork]# 

As HAS comes up, it recreates the network socket files on its own:

[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15806  4305  0 19:09 pts/1    00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid     15851     1  0 19:09 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root     15940 15908  0 19:11 pts/2    00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid     15947     1  0 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid     15952     1  1 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid     15977     1  0 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid     15980     1  1 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid     15994     1  1 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid     16026 15980  0 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root     16040 15908  0 19:11 pts/2    00:00:00 grep d.bin
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sCevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib__lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sSYSTEM.evm.acceptor.auth
[root@lunarlib rootwork]# 

Now HAS is fully up and running again:

[root@lunarlib rootwork]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunarlib                                     
ora.DATADG1.dg
               ONLINE  ONLINE       lunarlib                                     
ora.DATADG2.dg
               ONLINE  ONLINE       lunarlib                                     
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunarlib                                     
ora.asm
               ONLINE  ONLINE       lunarlib                 Started             
ora.ons
               OFFLINE OFFLINE      lunarlib                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       lunarlib                                     
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       lunarlib                                     
ora.lunardb.db
      1        ONLINE  ONLINE       lunarlib                 Open                
[root@lunarlib rootwork]# 

To summarize, for RAC or HAS:
1. On Linux, the network socket files are under /var/tmp/.oracle/. On other platforms the possible locations are /tmp/.oracle/*, /tmp/.oracle, or /usr/tmp/.oracle.
2. If CRS or HAS is not running, deleting the Oracle temporary files (the network socket files) has no ill effect; they are recreated automatically when CRS restarts.
3. If CRS or HAS is already up and running, deleting the Oracle temporary files does not affect the running database, but the stack can no longer be shut down cleanly (it can be aborted, but then cannot be started again).
4. When case 3 happens, CRS cannot be shut down (even with the -f option); the only way out is to clean up the shared memory segments and kill the processes manually. In a HAS environment, killing the ocssd.bin process does not reboot the host, but in a RAC environment killing ocssd.bin does reboot the host.
5. After finishing step 4, simply restart CRS or HAS.
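The safety rule in points 2-4 can be captured in a small guard script. This is only a sketch: the daemon names checked (ocssd.bin, ohasd.bin) and the default Linux socket directory are assumptions based on a typical 11.2 install, and the actual removal is left commented out.

```shell
# Guard: only consider cleaning the Oracle socket files when the
# CRS/HAS stack is down. Daemon names and the default Linux socket
# directory are assumptions for a typical 11.2 environment.
SOCK_DIR=${SOCK_DIR:-/var/tmp/.oracle}
if pgrep -x ocssd.bin >/dev/null 2>&1 || pgrep -x ohasd.bin >/dev/null 2>&1; then
    STATUS=up
    echo "stack still running - do NOT touch $SOCK_DIR"
else
    STATUS=down
    echo "stack is down - safe to clean $SOCK_DIR"
    # rm -rf "$SOCK_DIR"    # uncomment only after the check above passes
fi
```

Running this before any manual cleanup avoids the "cannot shut down, cannot restart" trap described in point 3.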

Posted in ASM, Oracle 11.1 & Oracle 11.2, RAC

Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 11 - Configuring VNC and handling common problems

On Linux 7, configuring VNC differs from earlier releases (another consequence of the change in how Linux 7 manages services).
Installing VNC requires a yum repository; I usually set up a local one, in one of two ways:
1. a VirtualBox shared folder, or
2. a virtualized CD-ROM drive.
The setup is the same as on releases before Linux 7 (Linux 5 and Linux 6), so it is not repeated here.
.
Check which VNC packages the local yum repository can install:

[root@lunar1 yum.repos.d]# yum list|grep vnc
gtk-vnc2.x86_64                        0.5.2-7.el7                 @anaconda    
gvnc.x86_64                            0.5.2-7.el7                 @anaconda    
libvncserver.x86_64                    0.9.9-9.el7_0.1             @anaconda    
gtk-vnc2.i686                          0.5.2-7.el7                 OEL72_stage  
gvnc.i686                              0.5.2-7.el7                 OEL72_stage  
libvncserver.i686                      0.9.9-9.el7_0.1             OEL72_stage  
tigervnc.x86_64                        1.3.1-3.el7                 CentOS7_stage
tigervnc-icons.noarch                  1.3.1-3.el7                 CentOS7_stage
tigervnc-license.noarch                1.3.1-3.el7                 CentOS7_stage
tigervnc-server.x86_64                 1.3.1-3.el7                 CentOS7_stage
tigervnc-server-minimal.x86_64         1.3.1-3.el7                 CentOS7_stage
[root@lunar1 yum.repos.d]# 

I usually pick tigervnc.

[root@lunar1 yum.repos.d]# yum install tigervnc-server.x86_64
已加载插件:fastestmirror, langpacks
Loading mirror speeds from cached hostfile
正在解决依赖关系
--> 正在检查事务
---> 软件包 tigervnc-server.x86_64.0.1.3.1-3.el7 将被 安装
--> 正在处理依赖关系 tigervnc-server-minimal,它被软件包 tigervnc-server-1.3.1-3.el7.x86_64 需要
--> 正在检查事务
---> 软件包 tigervnc-server-minimal.x86_64.0.1.3.1-3.el7 将被 安装
--> 正在处理依赖关系 tigervnc-license,它被软件包 tigervnc-server-minimal-1.3.1-3.el7.x86_64 需要
--> 正在检查事务
---> 软件包 tigervnc-license.noarch.0.1.3.1-3.el7 将被 安装
--> 解决依赖关系完成

依赖关系解决

================================================================================================================================================================================================
 Package                                                 架构                                   版本                                        源                                             大小
================================================================================================================================================================================================
正在安装:
 tigervnc-server                                         x86_64                                 1.3.1-3.el7                                 CentOS7_stage                                 202 k
为依赖而安装:
 tigervnc-license                                        noarch                                 1.3.1-3.el7                                 CentOS7_stage                                  25 k
 tigervnc-server-minimal                                 x86_64                                 1.3.1-3.el7                                 CentOS7_stage                                 1.0 M

事务概要
================================================================================================================================================================================================
安装  1 软件包 (+2 依赖软件包)

总下载量:1.2 M
安装大小:3.0 M
Is this ok [y/d/N]: y
Downloading packages:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
总计                                                                                                                                                            7.9 MB/s | 1.2 MB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  正在安装    : tigervnc-license-1.3.1-3.el7.noarch                                                                                                                                         1/3 
  正在安装    : tigervnc-server-minimal-1.3.1-3.el7.x86_64                                                                                                                                  2/3 
  正在安装    : tigervnc-server-1.3.1-3.el7.x86_64                                                                                                                                          3/3 
  验证中      : tigervnc-license-1.3.1-3.el7.noarch                                                                                                                                         1/3 
  验证中      : tigervnc-server-minimal-1.3.1-3.el7.x86_64                                                                                                                                  2/3 
  验证中      : tigervnc-server-1.3.1-3.el7.x86_64                                                                                                                                          3/3 

已安装:
  tigervnc-server.x86_64 0:1.3.1-3.el7                                                                                                                                                          

作为依赖被安装:
  tigervnc-license.noarch 0:1.3.1-3.el7                                                       tigervnc-server-minimal.x86_64 0:1.3.1-3.el7                                                      

完毕!
[root@lunar1 yum.repos.d]# 

On systems before Linux 7, installing VNC generally meant running the vncserver command to set a password and then editing the /etc/sysconfig/vncservers file.
On Linux 7 that file still exists, but its content is down to a single line:
[root@lunar1 ~]# cat /etc/sysconfig/vncservers 
# THIS FILE HAS BEEN REPLACED BY /lib/systemd/system/vncserver@.service
[root@lunar1 ~]# 
The content of /etc/sysconfig/vncservers is really telling us that VNC is now a service managed by systemd.
Now let's look at that unit file:
[root@lunar1 ~]# cat /lib/systemd/system/vncserver@.service
# The vncserver service unit file
#
# Quick HowTo:
# 1. Copy this file to /etc/systemd/system/vncserver@.service
# 2. Edit <USER> and vncserver parameters appropriately
#   ("runuser -l <USER> -c /usr/bin/vncserver %i -arg1 -arg2")
# 3. Run `systemctl daemon-reload`
# 4. Run `systemctl enable vncserver@:<display>.service`
#
# DO NOT RUN THIS SERVICE if your local area network is
# untrusted!  For a secure way of using VNC, you should
# limit connections to the local host and then tunnel from
# the machine you want to view VNC on (host A) to the machine
# whose VNC output you want to view (host B)
#
# [user@hostA ~]$ ssh -v -C -L 590N:localhost:590M hostB
#
# this will open a connection on port 590N of your hostA to hostB's port 590M
# (in fact, it ssh-connects to hostB and then connects to localhost (on hostB).
# See the ssh man page for details on port forwarding)
#
# You can then point a VNC client on hostA at vncdisplay N of localhost and with
# the help of ssh, you end up seeing what hostB makes available on port 590M
#
# Use "-nolisten tcp" to prevent X connections to your VNC server via TCP.
#
# Use "-localhost" to prevent remote VNC clients connecting except when
# doing so through a secure tunnel.  See the "-via" option in the
# `man vncviewer' manual page.


[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=forking
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
ExecStart=/usr/sbin/runuser -l <USER> -c "/usr/bin/vncserver %i"
PIDFile=/home/<USER>/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target
[root@lunar1 ~]# 

Note the important comment block in the file above; it spells out exactly how to configure VNC:

# Quick HowTo:
# 1. Copy this file to /etc/systemd/system/vncserver@.service
# 2. Edit <USER> and vncserver parameters appropriately
#   ("runuser -l <USER> -c /usr/bin/vncserver %i -arg1 -arg2")
# 3. Run `systemctl daemon-reload`
# 4. Run `systemctl enable vncserver@:<display>.service`

Following those instructions, we first create the VNC service unit file for the root user:

[root@lunar ~]# cp /lib/systemd/system/vncserver@.service /lib/systemd/system/vncserver@:1.service
[root@lunar ~]# ll /lib/systemd/system/vncserver@*
-rw-r--r-- 1 root root 1744 Oct  8 20:05 /lib/systemd/system/vncserver@:1.service
-rw-r--r-- 1 root root 1744 May  7  2014 /lib/systemd/system/vncserver@.service
[root@lunar ~]# 

Then edit the copy as described in the comments of /lib/systemd/system/vncserver@.service.
Before:

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=forking
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l <USER> -c "/usr/bin/vncserver %i"
PIDFile=/home/<USER>/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target

After:

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=simple
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill :1 > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l root -c "/usr/bin/vncserver :1"
PIDFile=/root/.vnc/%H:1.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill :1 > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target

Next, reload the systemd configuration:
[root@lunar1 ~]# systemctl daemon-reload
[root@lunar1 ~]#
Then enable the service to start at boot:

[root@lunar1 ~]# systemctl enable vncserver@:1.service
Created symlink from /etc/systemd/system/multi-user.target.wants/vncserver@:1.service to /usr/lib/systemd/system/vncserver@:1.service.
[root@lunar1 ~]# ll /etc/systemd/system/multi-user.target.wants/vncserver@:1.service
lrwxrwxrwx 1 root root 44 1月  17 00:29 /etc/systemd/system/multi-user.target.wants/vncserver@:1.service -> /usr/lib/systemd/system/vncserver@:1.service
[root@lunar1 ~]# 

Start the VNC service:

[root@lunar ~]# systemctl enable vncserver@:1.service
[root@lunar ~]# systemctl start vncserver@:1.service
[root@lunar ~]# systemctl status vncserver@:1.service
vncserver@:1.service - Remote desktop service (VNC)
   Loaded: loaded (/usr/lib/systemd/system/vncserver@:1.service; enabled)
   Active: active (running) since Thu 2015-10-08 20:39:00 CST; 5s ago
  Process: 21624 ExecStop=/bin/sh -c /usr/bin/vncserver -kill :1 > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)
  Process: 21634 ExecStartPre=/bin/sh -c /usr/bin/vncserver -kill :1 > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)
 Main PID: 21668 (Xvnc)
   CGroup: /system.slice/system-vncserver.slice/vncserver@:1.service
           ‣ 21668 /usr/bin/Xvnc :1 -desktop lunar:1 (root) -auth /root/.Xauthority -geometry 1024x768 -rfbwait 30000 -rfbauth /root/.vnc/passwd -rfbport 5901 -fp catalogue:/etc/X11/fontpat...

Oct 08 20:39:00 lunar systemd[1]: Starting Remote desktop service (VNC)...
Oct 08 20:39:00 lunar systemd[1]: Started Remote desktop service (VNC).
[root@lunar ~]# 

On Linux 7, VNC listens on ports starting at 590x (5900 plus the display number):
With only IPv4 configured:

[root@lunar ~]# lsof -i:5901
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Xvnc    21668 root    7u  IPv4  91713      0t0  TCP *:5901 (LISTEN)
[root@lunar ~]# 
[root@lunar ~]# netstat -lnt | grep 590*
tcp        0      0 0.0.0.0:5901            0.0.0.0:*               LISTEN     
[root@lunar ~]# 
[root@lunar ~]# ss -lntp|grep 590
LISTEN     0      5                         *:5901                     *:*      users:(("Xvnc",21668,7))
[root@lunar ~]# 
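The port numbering follows a simple rule: display :N listens on TCP port 5900 + N, so :1 maps to 5901 seen above, and :3 would map to 5903. A one-line sanity check:

```shell
# RFB port for a given VNC display number: 5900 + display
display=1
port=$((5900 + display))
echo "display :$display -> port $port"
```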

With both IPv4 and IPv6 configured:

[root@lunar1 ~]# lsof -i:5901
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Xvnc    7224 root    8u  IPv4 157461      0t0  TCP *:5901 (LISTEN)
Xvnc    7224 root    9u  IPv6 157462      0t0  TCP *:5901 (LISTEN)
[root@lunar1 ~]# 
[root@lunar1 ~]# 
[root@lunar1 ~]# lsof -i:5901
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Xvnc    7224 root    8u  IPv4 157461      0t0  TCP *:5901 (LISTEN)
Xvnc    7224 root    9u  IPv6 157462      0t0  TCP *:5901 (LISTEN)
[root@lunar1 ~]# netstat -lnt | grep 590*
tcp        0      0 0.0.0.0:5901            0.0.0.0:*               LISTEN     
tcp6       0      0 :::5901                 :::*                    LISTEN     
[root@lunar1 ~]# 
[root@lunar1 ~]# ss -lntp|grep 590
LISTEN     0      5            *:5901                     *:*                   users:(("Xvnc",pid=7224,fd=8))
LISTEN     0      5           :::5901                    :::*                   users:(("Xvnc",pid=7224,fd=9))
[root@lunar1 ~]# 
[root@lunar1 ~]# lsof -i:5901
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Xvnc    6232 root    8u  IPv4 154443      0t0  TCP *:5901 (LISTEN)
Xvnc    6232 root    9u  IPv6 154444      0t0  TCP *:5901 (LISTEN)
[root@lunar1 ~]# systemctl stop vncserver@:1.service
[root@lunar1 ~]# lsof -i:5901
[root@lunar1 ~]# systemctl status vncserver@:1.service
● vncserver@:1.service - Remote desktop service (VNC)
   Loaded: loaded (/usr/lib/systemd/system/vncserver@:1.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since 日 2016-01-17 00:49:12 CST; 25s ago
  Process: 7177 ExecStop=/bin/sh -c /usr/bin/vncserver -kill :1 > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)
  Process: 6232 ExecStart=/usr/sbin/runuser -l root -c /usr/bin/vncserver :1 (code=exited, status=0/SUCCESS)
  Process: 6197 ExecStartPre=/bin/sh -c /usr/bin/vncserver -kill :1 > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)
 Main PID: 6232 (code=exited, status=0/SUCCESS)

1月 17 00:47:54 lunar1 systemd[1]: Starting Remote desktop service (VNC)...
1月 17 00:47:54 lunar1 systemd[1]: Started Remote desktop service (VNC).
1月 17 00:49:12 lunar1 systemd[1]: Stopping Remote desktop service (VNC)...
1月 17 00:49:12 lunar1 systemd[1]: Stopped Remote desktop service (VNC).
[root@lunar1 ~]# systemctl start vncserver@:1.service
[root@lunar1 ~]# systemctl status vncserver@:1.service
● vncserver@:1.service - Remote desktop service (VNC)
   Loaded: loaded (/usr/lib/systemd/system/vncserver@:1.service; enabled; vendor preset: disabled)
   Active: active (running) since 日 2016-01-17 00:49:43 CST; 1s ago
  Process: 7177 ExecStop=/bin/sh -c /usr/bin/vncserver -kill :1 > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)
  Process: 7190 ExecStartPre=/bin/sh -c /usr/bin/vncserver -kill :1 > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)
 Main PID: 7193 (runuser)
   CGroup: /system.slice/system-vncserver.slice/vncserver@:1.service
           ‣ 7193 /usr/sbin/runuser -l root -c /usr/bin/vncserver :1

1月 17 00:49:43 lunar1 systemd[1]: Starting Remote desktop service (VNC)...
1月 17 00:49:43 lunar1 systemd[1]: Started Remote desktop service (VNC).
[root@lunar1 ~]# lsof -i:5901
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Xvnc    7224 root    8u  IPv4 157461      0t0  TCP *:5901 (LISTEN)
Xvnc    7224 root    9u  IPv6 157462      0t0  TCP *:5901 (LISTEN)
[root@lunar1 ~]# 

Set the VNC password by running vncpasswd as that user (it prompts interactively).
The final VNC unit file for the root user:

[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
#Type=forking
Type=simple
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill :1 > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l root -c "/usr/bin/vncserver :1"
#PIDFile=/home/root/.vnc/%H:1.pid
PIDFile=/root/.vnc/%H:1.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill :1 > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target

Then configure the grid and oracle users the same way.
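The per-user copies only differ in the user name and display number, so the edits can be scripted instead of done by hand. A hedged sketch: the grid:2 / oracle:3 pairing is just this environment's convention, and only the ExecStart line of the template is shown here.

```shell
# Stamp out the per-user ExecStart line from the stock template by
# substituting <USER> and %i for each user:display pair.
template='ExecStart=/usr/sbin/runuser -l <USER> -c "/usr/bin/vncserver %i"'
for spec in grid:2 oracle:3; do
    user=${spec%%:*}
    disp=${spec#*:}
    line=$(printf '%s\n' "$template" | sed -e "s/<USER>/$user/" -e "s/%i/:$disp/")
    printf '%s\n' "$line"
done
```

The same substitution applied to the whole template file produces the vncserver@:2.service and vncserver@:3.service units.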
.
Sometimes starting VNC fails, for example:

[root@lunar ~]# systemctl restart vncserver@:1.service
Job for vncserver@:1.service failed. See 'systemctl status vncserver@:1.service' and 'journalctl -xn' for details.
[root@lunar ~]# systemctl status vncserver@:1.service
vncserver@:1.service - Remote desktop service (VNC)
   Loaded: loaded (/usr/lib/systemd/system/vncserver@:1.service; enabled)
   Active: failed (Result: exit-code) since Thu 2015-10-08 20:18:22 CST; 17s ago
  Process: 19175 ExecStart=/sbin/runuser -l <USER> -c /usr/bin/vncserver %i (code=exited, status=1/FAILURE)
  Process: 19171 ExecStartPre=/bin/sh -c /usr/bin/vncserver -kill %i > /dev/null 2>&1 || : (code=exited, status=0/SUCCESS)

Oct 08 20:18:22 lunar systemd[1]: Starting Remote desktop service (VNC)...
Oct 08 20:18:22 lunar runuser[19175]: runuser: user <USER> does not exist
Oct 08 20:18:22 lunar systemd[1]: vncserver@:1.service: control process exited, code=exited status=1
Oct 08 20:18:22 lunar systemd[1]: Failed to start Remote desktop service (VNC).
Oct 08 20:18:22 lunar systemd[1]: Unit vncserver@:1.service entered failed state.
[root@lunar ~]# 

Troubleshooting approach:
1. Check the logs (/var/log/messages), or use journalctl, newly introduced in Linux 7 (see the earlier blog posts).
2. The problems usually fall into a few categories:
(1) a mistake in the configuration file;
(2) stale temporary files under /tmp/ that the system failed to clean up for some reason;
(3) GNOME-related problems, which tend to appear after switching the display language between Chinese and English.
.
The fixes are all simple:
1. make sure the configuration is correct;
2. delete the stale temporary files;
3. run the commands from the configuration file by hand and see where they fail.
I have found the third one, manual execution, to be the killer move; it resolves almost every case.
If none of that works (I have yet to run into such a case), simply reinstall VNC. O(∩_∩)O
Below is an example of manual handling, based on the content of the unit file; here I use the oracle user's unit file:

[root@lunar2 ~]# cat /usr/lib/systemd/system/vncserver@:3.service
# The vncserver service unit file
#
# Quick HowTo:
# 1. Copy this file to /etc/systemd/system/vncserver@.service
# 2. Edit oracle and vncserver parameters appropriately
#   ("runuser -l oracle -c /usr/bin/vncserver :3 -arg1 -arg2")
# 3. Run `systemctl daemon-reload`
# 4. Run `systemctl enable vncserver@:<display>.service`
#
# DO NOT RUN THIS SERVICE if your local area network is
# untrusted!  For a secure way of using VNC, you should
# limit connections to the local host and then tunnel from
# the machine you want to view VNC on (host A) to the machine
# whose VNC output you want to view (host B)
#
# [user@hostA ~]$ ssh -v -C -L 590N:localhost:590M hostB
#
# this will open a connection on port 590N of your hostA to hostB's port 590M
# (in fact, it ssh-connects to hostB and then connects to localhost (on hostB).
# See the ssh man page for details on port forwarding)
#
# You can then point a VNC client on hostA at vncdisplay N of localhost and with
# the help of ssh, you end up seeing what hostB makes available on port 590M
#
# Use "-nolisten tcp" to prevent X connections to your VNC server via TCP.
#
# Use "-localhost" to prevent remote VNC clients connecting except when
# doing so through a secure tunnel.  See the "-via" option in the
# `man vncviewer' manual page.


[Unit]
Description=Remote desktop service (VNC)
After=syslog.target network.target

[Service]
Type=simple
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill :3 > /dev/null 2>&1 || :'
ExecStart=/usr/sbin/runuser -l oracle -c "/usr/bin/vncserver :3"
PIDFile=/home/oracle/.vnc/%H:3.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill :3 > /dev/null 2>&1 || :'

[Install]
WantedBy=multi-user.target
[root@lunar2 ~]# 

Kill the server by hand, as the unit file does:

[root@lunar2 ~]# /bin/sh -c '/usr/bin/vncserver -kill :3'

Can't find file /root/.vnc/lunar2.oracle.com:3.pid
You'll have to kill the Xvnc process manually
[root@lunar2 ~]#

Check the temporary files:

[root@lunar2 ~]# ll /tmp/.X11-unix 
/tmp/.X11-unix:
total 0
srwxrwxrwx 1 root   root     0 Jan 23 21:33 X1
srwxrwxrwx 1 grid   oinstall 0 Jan 23 21:33 X2
srwxrwxrwx 1 oracle oinstall 0 Jan 23 21:33 X3
[root@lunar2 ~]#

X3 is the oracle user's temporary file; delete it:

[root@lunar2 ~]# rm -rf /tmp/.X11-unix/X3 
[root@lunar2 ~]#

Try starting again:

[root@lunar2 ~]# /usr/sbin/runuser -l oracle -c "/usr/bin/vncserver :3"

Warning: lunar2.oracle.com:3 is taken because of /tmp/.X3-lock
Remove this file if there is no X server lunar2.oracle.com:3
A VNC server is already running as :3
[root@lunar2 ~]# 

It complains that another file was not cleaned up, so delete that one by hand too:

[root@lunar2 ~]# rm -rf  /tmp/.X3-lock
[root@lunar2 ~]# 

Start manually once more:

[root@lunar2 ~]# /usr/sbin/runuser -l oracle -c "/usr/bin/vncserver :3"

New 'lunar2.oracle.com:3 (oracle)' desktop is lunar2.oracle.com:3

Starting applications specified in /home/oracle/.vnc/xstartup
Log file is /home/oracle/.vnc/lunar2.oracle.com:3.log

[root@lunar2 ~]# 

We can see it has started. At this point systemctl may still sometimes report Active: failed,
but don't worry: just try logging in with a VNC client. I suspect this is a bug, or the state is simply not being synced back to systemctl.
Or perhaps it is just GNOME itself being problem-prone.
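The manual recovery above boils down to three steps: kill the stale server, remove the leftover X socket and lock files, then start it again by hand. A condensed sketch for display :3 (the display number, user, and file names match this host and are otherwise assumptions; the restart is left commented out):

```shell
# Condensed manual recovery for a stale VNC display (here :3).
DISP=3
/usr/bin/vncserver -kill ":$DISP" >/dev/null 2>&1 || true    # stop any stale Xvnc
rm -f "/tmp/.X11-unix/X$DISP" "/tmp/.X$DISP-lock"            # remove leftover files
echo "display :$DISP cleaned"
# /usr/sbin/runuser -l oracle -c "/usr/bin/vncserver :$DISP" # then start by hand
```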
.
Summary: common VNC management commands on Linux 7

systemctl daemon-reload
systemctl enable vncserver@:1.service
systemctl status vncserver@:1.service
systemctl start vncserver@:1.service
systemctl status vncserver@:1.service
systemctl stop  vncserver@:1.service
systemctl status vncserver@:1.service
systemctl start vncserver@:1.service
systemctl status vncserver@:1.service

Set the VNC password for each user. Note that vncpasswd takes an optional password-file argument rather than a user name, so run it as (or via su to) each user in turn:

vncpasswd                 # as root
su - grid -c vncpasswd
su - oracle -c vncpasswd

Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 1 - Introduction
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 2 - Changing the hostname and using hostnamectl
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 3 - systemd (the d.bin and ohasd daemons)
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 4 - targets (graphical and text console)
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 5 - The firewall
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 6 - Enabling and disabling services at boot
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 7 - Network management: adding a network interface
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 7 - Network management: changing an IP address
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 7 - Network management: renaming a network interface
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 8 - Which services to disable when installing 11.2 RAC and 12.1 RAC on Linux 7
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 9 - The virbr0 device on Linux 7.2
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 10 - ABRT: automatic error checking and reporting after boot
Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 11 - Configuring VNC and handling common problems

Posted in Installation and Deinstall, Linux

Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 10 - ABRT: automatic error checking and reporting after boot

When the VM boots, Linux 7 automatically checks for and reports problems:

Last login: Sat Jan 16 23:09:25 2016 from 192.168.56.1
ABRT has detected 5 problem(s). For more info run: abrt-cli list --since 1452956965
[root@lunar1 ~]# 

ABRT is an automatic bug reporting tool first introduced in Linux 6.
Starting with Linux 6, support for hot-plugging new hardware was strengthened considerably (presumably driven by mobile internet and the constant stream of new hardware).
For example, Linux 6 added advanced data-integrity features (DIF/DIX) and enhanced error checking for PCIe devices via AER, and ABRT (the Automatic Bug Reporting Tool) was introduced alongside them.
ABRT's full name is Automatic Bug Reporting Tool; it detects and reports abnormal system conditions, including kernel faults and user-space application crashes.
.
abrt-cli is the command-line interface to this tool; there is also a graphical ABRT tool, e.g. gnome-abrt &.
For details, see the official Red Hat manual:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/sect-abrt-handling-problems.html

.
The help output of the abrt-cli command:

[root@lunar1 ~]# abrt-cli command --help 
用法:abrt-cli [--authenticate] [--version] COMMAND [DIR]...

    list, ls      列出 [in DIRs] 的问题
    remove, rm    删除问题目录 DIR
    report, e     分析并报告 DIR 中的问题数据
    info, i       输出有关 DIR 的信息
    status, st    显示最近的崩溃计数
    process, p    处理多个问题

详情请查看 'abrt-cli COMMAND --help'
[root@lunar1 ~]# 

Now, as prompted, list the recorded problems:

[root@lunar1 ~]# abrt-cli list --since 1452956965
id bcb60782560b94bcb05b57145242778992fcacd0
reason:         ibus-daemon killed by SIGABRT
time:           2016年01月17日 星期日 00时39分44秒
cmdline:        /usr/bin/ibus-daemon -r --xim
package:        ibus-1.5.3-13.el7
uid:            54322 (grid)
count:          1
Directory:      /var/spool/abrt/ccpp-2016-01-17-00:39:44-30522

id dc1e51ec4c3bae28ac787be1e32dcfc2ee627af7
reason:         gnome-shell killed by SIGTRAP
time:           2016年01月17日 星期日 00时39分44秒
cmdline:        /usr/bin/gnome-shell
package:        gnome-shell-3.14.4-37.el7
uid:            54322 (grid)
count:          1
Directory:      /var/spool/abrt/ccpp-2016-01-17-00:39:44-4330

id 8002c377b00f1031a59d4eeaa9491a48dcc9f95a
reason:         gnome-shell killed by SIGTRAP
time:           2016年01月17日 星期日 00时23分48秒
cmdline:        /usr/bin/gnome-shell
package:        gnome-shell-3.14.4-37.el7
uid:            54321 (oracle)
count:          2
Directory:      /var/spool/abrt/ccpp-2016-01-17-00:23:48-28148

id 4fd232514a4753e38c05ce5ca1d0ada8b4ef8ba0
reason:         gnome-shell killed by SIGTRAP
time:           2016年01月17日 星期日 00时38分55秒
cmdline:        /usr/bin/gnome-shell
package:        gnome-shell-3.14.4-37.el7
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/ccpp-2016-01-17-00:38:55-3844

id c7100bb912a43bebe3bdfa13859d473416faba59
reason:         ibus-daemon killed by SIGABRT
time:           2016年01月17日 星期日 00时38分51秒
cmdline:        /usr/bin/ibus-daemon -r --xim
package:        ibus-1.5.3-13.el7
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/ccpp-2016-01-17-00:38:51-26249

已禁用自动报告功能。请考虑启用该功能,方法是
作为有 root 特权的用户使用命令 ‘abrt-auto-reporting enabled’
[root@lunar1 ~]#
As you can see, these problems are mainly caused by GNOME, so when a graphical desktop is needed on Linux I still recommend KDE, which in my experience is more stable and reliable.



Posted in Installation and Deinstall, Linux

Linux 7 (CentOS, RHEL, OEL) and Oracle RAC series - 9 - The virbr0 device on Linux 7.2

After installing Linux 7.2 (both CentOS 7.2 and OEL 7.2 behave this way), a virtual network interface named virbr0 is present by default.
Check the current IP configuration:

[root@lunar1 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.56.11  netmask 255.255.255.0  broadcast 192.168.56.255
        inet6 fe80::a00:27ff:feee:cd8d  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:ee:cd:8d  txqueuelen 1000  (Ethernet)
        RX packets 291  bytes 23591 (23.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 212  bytes 77588 (75.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.60.11  netmask 255.255.255.0  broadcast 192.168.60.255
        inet6 fe80::a00:27ff:fe42:fb8  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:42:0f:b8  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27  bytes 3989 (3.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:a8:83:cf  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 107  bytes 19698 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 460  bytes 37308 (36.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 460  bytes 37308 (36.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:79:84:80  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@lunar1 ~]# 

Here you can see that Linux 7 names the first default network interface enp0s3 (analogous to the old eth0); this default name can be changed later.
enp0s8 is the second interface, and enp0s9 is the third one, which we just added.
enp0s3 and enp0s8 were already configured during installation (network connection, IP address, connect on boot), so they are fine.
enp0s9 still needs to be configured by hand.
.
You can also see that Linux 7 has an extra virbr0 network interface by default.
.
Some background on virbr0, excerpted from the web:
virbr0 is a virtual network interface that appears once the libvirt service is installed and enabled.
libvirt creates a virtual network switch (virbr0) on the host, and all guests on that host are connected to each other through it.
By default virbr0 operates in NAT mode (using IP masquerade), so guests can reach the outside world only through the host.
.
So virbr0 is a virtual NIC that exists because Linux 7.2 (both CentOS 7.2 and OEL 7.2) installs and enables the libvirt service by default.
In other words, virbr0 is virtualization plumbing; we do not need it and can remove it. First check the current libvirt service:

[root@lunar1 ~]# ps -ef|grep libvirt
root      1380     1  0 06:39 ?        00:00:00 /usr/sbin/libvirtd
nobody    2593     1  0 06:39 ?        00:00:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      2594  2593  0 06:39 ?        00:00:00 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root      3585  2629  0 06:57 pts/0    00:00:00 grep --color=auto libvirt
[root@lunar1 ~]# 

We do not need any of this here, so remove it manually:

[root@lunar1 ~]# virsh net-list 
 名称               状态     自动开始  持久
----------------------------------------------------------
 default              活动     是           是

-- Delete the virbr0 device:

[root@lunar1 ~]# virsh net-destroy default 
网络 default 被删除
[root@lunar1 ~]# 

-- Undefine the virbr0 network:

[root@lunar1 ~]# virsh net-undefine default 
网络 default 已经被取消定义
[root@lunar1 ~]# 

-- Restart the libvirtd service so the changes take effect:

[root@lunar1 ~]#  service libvirtd restart 
Redirecting to /bin/systemctl restart  libvirtd.service
[root@lunar1 ~]# 
[root@lunar1 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.56.11  netmask 255.255.255.0  broadcast 192.168.56.255
        inet6 fe80::a00:27ff:feee:cd8d  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:ee:cd:8d  txqueuelen 1000  (Ethernet)
        RX packets 371  bytes 29927 (29.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 260  bytes 85704 (83.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.60.11  netmask 255.255.255.0  broadcast 192.168.60.255
        inet6 fe80::a00:27ff:fe42:fb8  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:42:0f:b8  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 27  bytes 3989 (3.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:a8:83:cf  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 159  bytes 29034 (28.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 524  bytes 42492 (41.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 524  bytes 42492 (41.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@lunar1 ~]# 

As you can see, the virbr0 device is now gone.



Posted in Installation and Deinstall, Linux, ORACLE 12C, RAC