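Contact: QQ (5163721)
Title: 11.2 RAC: resolving CRS that fails to start after directory permissions (u01) were changed, using rootcrs.pl -init for the repair
Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal liability will be pursued.]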
Reproducing the scenario in which the node was damaged:
[root@lunardb01 grid]# chown -R oracle:oinstall /u01
[root@lunardb01 grid]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunardb01 grid]# ps -ef|grep d.bin
root 27170 1 6 19:27 ? 00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 27400 1 3 19:27 ? 00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root 27609 19818 0 19:27 pts/1 00:00:00 grep d.bin
[root@lunardb01 grid]# ps -ef|grep d.bin
root 27170 1 5 19:27 ? 00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 27400 1 2 19:27 ? 00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root 27621 19818 0 19:27 pts/1 00:00:00 grep d.bin
[root@lunardb01 grid]# ps -ef|grep d.bin
root 27170 1 1 19:27 ? 00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 27400 1 0 19:27 ? 00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root 28150 19818 0 19:28 pts/1 00:00:00 grep d.bin
[root@lunardb01 grid]#
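As shown, CRS can no longer start: only ohasd.bin and oraagent.bin are running. The logs report the following errors:
-----ohasd errors: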
2014-10-04 19:27:27.643: [ CRSPE][1148361024] {0:0:2} RI [ora.mdnsd 1 1] new internal state: [STARTING] old value: [STABLE]
2014-10-04 19:27:27.643: [ CRSPE][1148361024] {0:0:2} Sending message to agfw: id = 223
2014-10-04 19:27:27.644: [ CRSPE][1148361024] {0:0:2} CRS-2672: Attempting to start 'ora.mdnsd' on 'lunardb01'
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} Agfw Proxy Server received the message: RESOURCE_START[ora.mdnsd 1 1] ID 4098:223
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} Creating the resource: ora.mdnsd 1 1
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} Initializing the resource ora.mdnsd 1 1 for type ora.mdns.type
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} SR: acl = owner:grid:rw-,pgrp:oinstall:rw-,other::r--,user:grid:rwx
2014-10-04 19:27:27.645: [ CRSPE][1148361024] {0:0:2} ICE has queued an operation. Details: Operation [START of [ora.gpnpd 1 1] on [lunardb01] : local=0, unplanned=00x2aaab00c68f0] cannot run cause it needs W lock for: WO for Placement Path RI:[ora.mdnsd 1 1] server [lunardb01] target states [ONLINE ], locked by op [START of [ora.mdnsd 1 1] on [lunardb01] : local=0, unplanned=00x2aaab00b72e0]. Owner: CRS-2683: It is locked by 'SYSTEM' for command 'Resource Autostart : lunardb01'
-----crsd errors:
2014-10-04 19:26:23.937: [ CRSCOMM][1158867264][FFAIL] Ipc: Couldnt clscreceive message, no message: 11
2014-10-04 19:26:23.938: [ CRSCOMM][1158867264] Ipc: Client disconnected.
2014-10-04 19:26:23.938: [ CRSCOMM][1158867264][FFAIL] IpcL: Listener got clsc error 11 for memNum. 1
2014-10-04 19:26:23.938: [ CRSCOMM][1158867264] IpcL: connection to member 1 has been removed
2014-10-04 19:26:23.938: [CLSFRAME][1158867264] Removing IPC Member:{Relative|Node:0|Process:1|Type:3}
2014-10-04 19:26:23.938: [CLSFRAME][1158867264] Disconnected from AGENT process: {Relative|Node:0|Process:1|Type:3}
2014-10-04 19:26:23.938: [ AGFW][1165171008] {1:33686:190} Agfw Proxy Server received process disconnected notification, count=1
2014-10-04 19:26:23.939: [ AGFW][1165171008] {1:33686:190} /u01/app/11.2.0/grid/bin/oraagent_grid disconnected.
2014-10-04 19:26:23.939: [ AGFW][1165171008] {1:33686:190} Agent /u01/app/11.2.0/grid/bin/oraagent_grid[5646] stopped!
2014-10-04 19:26:23.939: [ CRSCOMM][1165171008] {1:33686:190} IpcL: removeConnection: Member 1 does not exist.
-----alert log errors:
2014-10-04 19:27:23.293
[ohasd(27170)]CRS-2112:The OLR service started on node lunardb01.
2014-10-04 19:27:23.314
[ohasd(27170)]CRS-1301:Oracle High Availability Service started on node lunardb01.
2014-10-04 19:27:23.314
[ohasd(27170)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2014-10-04 19:27:24.351
[/u01/app/11.2.0/grid/bin/orarootagent.bin(27307)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsload" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/orarootagent_root/orarootagent_root.log"
2014-10-04 19:27:27.171
[/u01/app/11.2.0/grid/bin/orarootagent.bin(27307)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-10-04 19:29:27.802
[/u01/app/11.2.0/grid/bin/oraagent.bin(27400)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-10-04 19:29:31.812
[ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/ohasd/ohasd.log.
2014-10-04 19:31:34.907
[/u01/app/11.2.0/grid/bin/oraagent.bin(29240)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-10-04 19:31:38.918
[ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/ohasd/ohasd.log.
2014-10-04 19:33:41.993
[/u01/app/11.2.0/grid/bin/oraagent.bin(30882)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-10-04 19:33:46.004
[ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.gpnpd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/ohasd/ohasd.log.
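When the alert log is long, it can help to tally which resources the CRS messages complain about. The following is a minimal sketch, not part of the original procedure: it assumes the two-line layout seen above (a timestamp line followed by a `[process(pid)]CRS-nnnn:` message line), and the function name `count_resource_errors` and the abbreviated `SAMPLE` text (message lines cut after the first sentence) are illustrative choices.

```python
import re
from collections import Counter

# Matches the "[process(pid)]CRS-nnnn:" prefix of an alert-log message line.
ENTRY_RE = re.compile(r"^\[(?P<proc>[^\]]+)\]\s*(?P<code>CRS-\d+):(?P<msg>.*)$")
# Matches the resource name quoted in messages such as CRS-5818 and CRS-2757.
RESOURCE_RE = re.compile(r"resource '(?P<res>[^']+)'")


def count_resource_errors(alert_text):
    """Count (CRS code, resource name) pairs found in alert-log text."""
    counts = Counter()
    for line in alert_text.splitlines():
        entry = ENTRY_RE.match(line.strip())
        if not entry:
            continue  # timestamp lines, blank lines, anything unrecognized
        res = RESOURCE_RE.search(entry.group("msg"))
        if res:  # messages that name no resource are ignored
            counts[(entry.group("code"), res.group("res"))] += 1
    return counts


# Abbreviated from the alert-log excerpt above (hypothetical trimming).
SAMPLE = """\
2014-10-04 19:27:23.293
[ohasd(27170)]CRS-2112:The OLR service started on node lunardb01.
2014-10-04 19:29:27.802
[/u01/app/11.2.0/grid/bin/oraagent.bin(27400)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'.
2014-10-04 19:29:31.812
[ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'.
2014-10-04 19:33:41.993
[/u01/app/11.2.0/grid/bin/oraagent.bin(30882)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'.
"""

if __name__ == "__main__":
    for (code, resource), n in sorted(count_resource_errors(SAMPLE).items()):
        print(code, resource, n)
```

In the excerpt above, such a tally would point at ora.mdnsd first and ora.gpnpd afterwards, which matches the order in which ohasd tries to start them.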
As shown, startup is stuck because the ora.mdnsd resource cannot start:
[root@lunardb01 grid]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  OFFLINE
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  OFFLINE
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  OFFLINE
ora.gpnpd
      1        ONLINE  OFFLINE
ora.mdnsd
      1        ONLINE  OFFLINE                               STARTING
[root@lunardb01 grid]#
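The `crsctl status res -t -init` listing above can also be checked mechanically. The sketch below is only an illustration under stated assumptions: it expects each resource name on its own line starting with `ora.`, followed by an instance line beginning with a number, TARGET, and STATE; everything after STATE is kept as one string because SERVER and STATE_DETAILS cannot be told apart reliably once whitespace is collapsed. The function names are hypothetical.

```python
def parse_init_resources(output):
    """Parse `crsctl status res -t -init` text into (name, target, state, rest) tuples."""
    resources = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("-"):
            continue  # blank lines and dashed separators
        fields = line.split()
        if fields[0].startswith("ora."):
            current = fields[0]  # resource name line
        elif current and fields[0].isdigit() and len(fields) >= 3:
            # instance line: number, TARGET, STATE, then optional server/details
            resources.append((current, fields[1], fields[2], " ".join(fields[3:])))
    return resources


def not_online(resources):
    """Return resources whose TARGET is ONLINE but whose STATE is not."""
    return [r for r in resources if r[1] == "ONLINE" and r[2] != "ONLINE"]


# Three entries taken from the listing above.
SAMPLE = """\
NAME TARGET STATE SERVER STATE_DETAILS
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.gpnpd
1 ONLINE OFFLINE
ora.mdnsd
1 ONLINE OFFLINE STARTING
"""

if __name__ == "__main__":
    for name, target, state, rest in not_online(parse_init_resources(SAMPLE)):
        print(name, target, state, rest)
```

A resource whose trailing text is STARTING while the others stay OFFLINE, as ora.mdnsd does here, is the one the stack is currently waiting on.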
An attempt to repair this with the -init option of rootcrs.pl does not work:
[root@lunardb01 lunardb01]# $GRID_HOME/crs/install/rootcrs.pl -init
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
[root@lunardb01 lunardb01]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunardb01 lunardb01]#
[root@lunardb01 ohasd]# ps -ef|grep d.bin
root 12642 1 0 19:48 ? 00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 14804 1 0 19:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root 15481 19818 0 19:52 pts/1 00:00:00 grep d.bin
[root@lunardb01 ohasd]# ps -ef|grep d.bin
root 12642 1 0 19:48 ? 00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 14804 1 0 19:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root 15663 19818 0 19:52 pts/1 00:00:00 grep d.bin
[root@lunardb01 ohasd]#
The errors in the logs are essentially the same as those shown above.
Evidently, rootcrs.pl -init does little to repair directory permissions after a chown -R on /u01.
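To see how far a recursive chown has reached, one option is to compare the owner, group, and mode of every path against a snapshot taken from a healthy tree. This is an assumption for illustration, not the method used in this article, and the names `snapshot`, `diff_snapshots`, and `demo` are hypothetical. The mode is recorded as well because on Linux a chown can also clear setuid/setgid bits on executables. The demo runs on a temporary directory only and touches no Oracle files.

```python
import os
import stat
import tempfile


def snapshot(root):
    """Map each path under root (relative to root) to (uid, gid, permission bits)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks themselves, do not follow them
            result[os.path.relpath(full, root)] = (
                st.st_uid,
                st.st_gid,
                stat.S_IMODE(st.st_mode),  # includes setuid/setgid/sticky bits
            )
    return result


def diff_snapshots(reference, current):
    """Return {path: (reference_entry, current_entry)} for paths whose metadata differs."""
    return {
        path: (reference[path], current[path])
        for path in reference
        if path in current and reference[path] != current[path]
    }


def demo():
    """Change the mode of one file in a temporary directory and report the difference."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "example.bin")
        open(path, "w").close()
        os.chmod(path, 0o755)
        before = snapshot(root)
        os.chmod(path, 0o700)  # stands in for an unwanted metadata change
        after = snapshot(root)
        return diff_snapshots(before, after)


if __name__ == "__main__":
    for path, (ref, cur) in demo().items():
        print(path, ref, "->", cur)
```

In practice the reference snapshot would have to come from a node or backup whose Grid home is known to be intact; which paths should belong to root rather than grid is not stated in this article and is left to that reference.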
