Rebuilding CRS in a 10.2.0.1 RAC (for example, when an incorrect IP change leaves CRS unable to start)

Author: Lunar © All rights reserved. [This article may be reposted, but only with a link to the original source; otherwise legal action may be taken.]

The VIP is a static IP introduced in Oracle 10.1 that is bound on top of the public IP; its information is stored in the OCR (Oracle Cluster Registry) together with the information about the PUBLIC network interface.
Customers often have a requirement like this: the test environment uses one set of IP addresses, and after migration to production a different set is used.
If the customer does not want to reinstall, the RAC IP addresses have to be changed.
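A quick way to see what the clusterware has actually recorded is shown below. This is a minimal sketch using 10.2 syntax; rh1 is the node name of the environment used later in this post, and the exact options can be checked with srvctl -h.

oifcfg getif                        # interfaces and subnets registered in the OCR
srvctl config nodeapps -n rh1 -a    # -a shows the VIP configuration stored for node rh1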
.
An operation like this is normally a small one and does not call for heroic measures, provided it is done strictly according to the official procedure; for example, should CRS be up or down while the IP is being changed?
Another common mistake is that when changing the network configuration, what has to be supplied is the subnet information rather than a specific IP address; this is another frequent way an IP change leaves CRS unable to start.
Overlooking small details like these is enough to keep CRS from starting.
Whether it is 10.2, 11.1, or 11.2, the core of all this has changed very little; the difference is that 11.2 is more polished and offers more optional repair methods (including some that are not officially supported).
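To make the subnet-versus-IP point concrete, here is a small hedged sketch based on the 192.168.10.0/24 public network of the test environment shown later; oifcfg expects a subnet, and a host address in its place is a classic way to break things:

oifcfg setif -global eth0/192.168.10.11:public   # wrong: this is a host IP, not a subnet
oifcfg setif -global eth0/192.168.10.0:public    # right: register the subnet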
.
To summarize the main points:
1. The hostname associated with the PUBLIC network cannot be changed; if it must change, the official position is that CRS has to be reinstalled.
My own feeling is this: if you rename the directories on each host that are named after the hostname to the new hostname and then reconfigure CRS, it should work in theory.
For example, in 10g that means rootdelete.sh, rootdeinstall.sh and root.sh;
from 11.2 onward it means deconfig and reconfig.
Still, that is fine for a test environment or for playing around on your own; in production I would stick to the official approach, otherwise all sorts of strange problems later on would be a pit you dug for yourself...
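For reference, the 11.2-style deconfig/reconfig mentioned above looks roughly like this (a hedged sketch; $GRID_HOME stands for the Grid Infrastructure home, and the exact options should be verified against the documentation for your patch level):

# As root, on every node except the last one:
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force
# On the last node, also clean up the OCR and voting files:
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force -lastnode
# Then reconfigure by rerunning root.sh on each node:
$GRID_HOME/root.sh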
.
2. For the PUBLIC network, keeping the hostname and changing only the PUBLIC IP requires no change to the CRS configuration, because in 10g, 11.1 and 11.2 alike CRS does not record the specific IP address.
3. Changing the PUBLIC network's interface name, subnet or netmask does require a change, made with oifcfg.
A customer once asked how long this takes.
Answer: in theory, under 10 minutes. If verification and rollback checks are also needed, ask for a maintenance window of 30 minutes to an hour.
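In practice item 3 boils down to something like the following hedged sketch, assuming the public network moves to subnet 10.10.10.0 on eth0 (a made-up target; substitute your own values, and follow the MOS note for your version for the full procedure, including when CRS should be stopped):

oifcfg getif                                   # record the current configuration first
oifcfg delif -global eth0                      # drop the old public network definition
oifcfg setif -global eth0/10.10.10.0:public    # register the new subnet (a subnet, not a host IP)
oifcfg getif                                   # verify the change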
.
4. For changes to the private network: in 10.2 and 11.1 this information is stored in CRS, so the CRS information has to be updated accordingly (with oifcfg).
Starting with 11.2, CRS (Cluster Ready Services) was upgraded to GI (Grid Infrastructure), and the private hostname associated with the private IP is no longer stored in the OCR.
It is managed instead through the GPnP architecture (Grid Plug and Play, covered later), and the OCR no longer has any dependency on the hostname behind the private IP.
It can therefore be changed freely without touching anything at the CRS level.
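For 10.2 and 11.1, the corresponding change to the private interconnect stored in the OCR is, again as a hedged sketch (eth1 and 192.168.30.0 are example values):

oifcfg delif -global eth1                                     # drop the old interconnect definition
oifcfg setif -global eth1/192.168.30.0:cluster_interconnect   # register the new private subnet
oifcfg getif                                                  # verify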
.
5. If the change was made incorrectly, it is not the end of the world: the trump card in 10.2 and 11.1 is rootdelete.sh, rootdeinstall.sh and root.sh,
while 11.2, besides the deconfig and reconfig used to reconfigure the cluster, adds the ability to repair via the GPnP profile.
Moreover, gpnptool can fix not only this kind of problem but also problems such as bad information about where the ASM spfile and the CRS profile are kept.
(For example, if the location ASM uses to look for its disks is wrong, GI cannot start normally.)
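In 11.2, a quick way to see what the GPnP profile currently records (network subnets, ASM discovery string, ASM spfile location) is gpnptool. A minimal sketch, run as the Grid owner:

gpnptool get    # dump the current GPnP profile XML; look at the network entries and the ASM-Profile DiscoveryString/SPFile attributes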
.
Now for the specifics, starting with the procedure for rebuilding CRS in 10.2.
This procedure can be used to fix two kinds of problems:
1. CRS cannot start because of an IP change (PUBLIC IP, VIP, PRIVATE IP).
2. CRS cannot start because of some other problem related to the CRS configuration.
.
The steps are as follows:
1. Stop CRS on node 1

[root@rh1 ~]# crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[root@rh1 ~]#

2. Run rootdelete.sh on node 1 to remove the CRS configuration:

[root@rh1 ~]# cd $ORA_CRS_HOME/install
[root@rh1 install]# ls
cluster.ini         install.excl  paramfile.crs  rootaddnode.sbs   rootdeletenode.sh  rootlocaladd
cmdllroot.sh        install.incl  preupdate.sh   rootconfig        rootdelete.sh      rootupgrade
envVars.properties  make.log      readme.txt     rootdeinstall.sh  rootinstall        templocal
[root@rh1 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rh1 install]# 

Check the current network interface information:

[root@rh1 install]#ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8A:1A:12
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8a:1a12/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1504 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1295 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:129195 (126.1 KiB)  TX bytes:118741 (115.9 KiB)
          Interrupt:10 Base address:0x1400

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8A:1A:1C
          inet addr:192.168.20.11  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8a:1a1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:204 errors:0 dropped:0 overruns:0 frame:0
          TX packets:210 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19882 (19.4 KiB)  TX bytes:19404 (18.9 KiB)
          Interrupt:9 Base address:0x1480

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:19827 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19827 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6788710 (6.4 MiB)  TX bytes:6788710 (6.4 MiB)

[root@rh1 install]#

3. Stop CRS on node 2

[root@rh2 ~]# crsctl stop crs
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
[root@rh2 ~]# 

4. Run rootdelete.sh on node 2 to remove the CRS configuration:

[root@rh2 ~]# cd $ORA_CRS_HOME/install
[root@rh2 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rh2 install]#

5. Run rootdeinstall.sh on node 1 to wipe the OCR device

[root@rh1 install]# ./rootdeinstall.sh

Removing contents from OCR device
2560+0 records in
2560+0 records out
[root@rh1 install]#

[root@rh1 install]# ps -e | grep -i 'ocs[s]d'
[root@rh1 install]# ps -e | grep -i 'cr[s]d.bin'
[root@rh1 install]# ps -e | grep -i 'ev[m]d.bin'
[root@rh1 install]# ps -ef|grep crs
root      2309 32489  0 13:32 pts/1    00:00:00 grep crs
[root@rh1 install]# ps -ef|grep d.bin
root      2311 32489  0 13:32 pts/1    00:00:00 grep d.bin
[root@rh1 install]#

6. Run root.sh on node 1 to complete the CRS reconfiguration on node 1

[root@rh1 crs]# ./root.sh
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
assigning default hostname rh1 for node 1.
assigning default hostname rh2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rh1 int1 rh1
node 2: rh2 int2 rh2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw1
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        rh1
CSS is inactive on these nodes.
        rh2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
[root@rh1 crs]#

7. Run root.sh on node 2 to complete the CRS reconfiguration on node 2

[root@rh2 crs]# ./root.sh
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rh1 for node 1.
assigning default hostname rh2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rh1 int1 rh1
node 2: rh2 int2 rh2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        rh1
        rh2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "eth0" is not public. Public interfaces should be used to configure virtual IPs.
[root@rh2 crs]# 

Confirm that the CRS processes on both nodes have started normally:

[root@rh1 crs]# ps -ef|grep d.bin
root     29156     1  0 12:17 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/crsd.bin reboot
oracle   29418 29154  0 12:18 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/evmd.bin
oracle   29585 29555  0 12:18 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/ocssd.bin

[root@rh1 crs]#

[root@rh2 crs]# ps -ef|grep d.bin
root     19689     1  0 11:57 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/crsd.bin reboot
oracle   19961 19687  0 11:58 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/evmd.bin
oracle   20096 20070  0 11:58 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/ocssd.bin
root     21283  8784  0 11:59 pts/1    00:00:00 grep d.bin
[root@rh2 crs]# 

Check the network interface information again; the VIP is now bound to the public interface of each node:

[root@rh1 crs]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:3F:E6:E7
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3f:e6e7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1305 errors:0 dropped:0 overruns:0 frame:0
          TX packets:731 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:202565 (197.8 KiB)  TX bytes:184325 (180.0 KiB)
          Interrupt:9 Base address:0x1400

eth1      Link encap:Ethernet  HWaddr 00:0C:29:3F:E6:F1
          inet addr:192.168.20.11  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3f:e6f1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5727 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8359 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4252719 (4.0 MiB)  TX bytes:7524822 (7.1 MiB)
          Interrupt:10 Base address:0x1480

eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:3F:E6:F1
          inet addr:192.168.10.21  Bcast:192.168.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0x1480

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:85980 errors:0 dropped:0 overruns:0 frame:0
          TX packets:85980 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6918640 (6.5 MiB)  TX bytes:6918640 (6.5 MiB)

[root@rh1 crs]# 

Then run VIPCA on node 1 (rh1) as root to configure the nodeapps; afterwards crs_stat shows the GSD, ONS and VIP resources online on both nodes:
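The silent vipca launched by root.sh failed above with "The given interface(s), "eth0" is not public", typically because 10.2 treats 192.168.x addresses as non-routable, so vipca has to be invoked manually as root. If no GUI is available, the nodeapps can also be registered from the command line; a hedged sketch for this environment, using the rh1 VIP 192.168.10.21 seen in the ifconfig output (rh2 would be added the same way):

# As root; 10.2 syntax, check "srvctl add nodeapps -h" on your system:
srvctl add nodeapps -n rh1 -o $ORA_CRS_HOME -A 192.168.10.21/255.255.255.0/eth0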

[root@rh1 crs]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[root@rh1 crs]#

[root@rh2 crs]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[root@rh2 crs]#

Register the database and ASM resources from node 1:

[oracle@rh1 ~]$ srvctl add database -d rac -o /u01/app/oracle/product/ora10g/db
[oracle@rh1 ~]$ srvctl add instance -d rac -i rac1 -n rh1
[oracle@rh1 ~]$ srvctl add instance -d rac -i rac2 -n rh2
[oracle@rh1 ~]$ srvctl add asm -n rh1 -i +ASM1 -o $ORACLE_HOME
[oracle@rh1 ~]$ srvctl add asm -n rh2 -i +ASM2 -o $ORACLE_HOME
[oracle@rh1 ~]$

Start ASM and the database:
[oracle@rh1 ~]$ srvctl start asm -n rh1
[oracle@rh1 ~]$ srvctl start asm -n rh2
[oracle@rh1 ~]$ srvctl start database -d rac
[oracle@rh1 ~]$

Everything is OK now:

[oracle@rh1 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac.db     application    ONLINE    ONLINE    rh1
ora....c1.inst application    ONLINE    ONLINE    rh1
ora....c2.inst application    ONLINE    ONLINE    rh2
ora....SM1.asm application    ONLINE    ONLINE    rh1
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora....SM2.asm application    ONLINE    ONLINE    rh2
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[oracle@rh1 ~]$

Recreate the LISTENERs with netca; the listener resources then appear in crs_stat:

[oracle@rh1 admin]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.rac.db     application    ONLINE    ONLINE    rh1         
ora....c1.inst application    ONLINE    ONLINE    rh1         
ora....c2.inst application    ONLINE    ONLINE    rh2         
ora....SM1.asm application    ONLINE    ONLINE    rh1         
ora....H1.lsnr application    ONLINE    ONLINE    rh1         
ora.rh1.gsd    application    ONLINE    ONLINE    rh1         
ora.rh1.ons    application    ONLINE    ONLINE    rh1         
ora.rh1.vip    application    ONLINE    ONLINE    rh1         
ora....SM2.asm application    ONLINE    ONLINE    rh2         
ora....H2.lsnr application    ONLINE    ONLINE    rh2         
ora.rh2.gsd    application    ONLINE    ONLINE    rh2         
ora.rh2.ons    application    ONLINE    ONLINE    rh2         
ora.rh2.vip    application    ONLINE    ONLINE    rh2         
[oracle@rh1 admin]$ 