ORA-00443 background process MMNL did not start

Contact: QQ (5163721)


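Author: Lunar © All rights reserved. [Reposting is permitted, but the source address must be credited with a link; otherwise legal action may be taken.]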

Starting ASM fails with ORA-00443:

[root@lunar ~]# srvctl start asm
PRCR-1079 : Failed to start resource ora.asm
CRS-5017: The resource action "ora.asm start" encountered the following error: 
ORA-00443: background process "MMNL" did not start
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log".

CRS-2674: Start of 'ora.asm' on 'lunar' failed

First, let's look at what ORA-00443 means:

[grid@lunar ~]$ oerr ora 00443
00443, 00000, "background process \"%s\" did not start"
// *Cause:  The specified process did not start.
// *Action: Ensure that the executable image is in the correct place with
//          the correct protections, and that there is enough memory.
[grid@lunar ~]$

My first guess is that the host is short of memory.

Next, look at /u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log:

2013-08-24 15:54:54.330: [ora.asm][1092012352] {0:0:2} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log".

2013-08-24 15:54:54.330: [    AGFW][1092012352] {0:0:2} sending status msg [CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log".
] for start for resource: ora.asm lunar 1


2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} InstConnection::connectInt (2) Exception OCIException
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} InstConnection:connect:excp OCIException OCI error 1034
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} AsmCommonAgent DedicatedThread Exception OCIException
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0

We can see that ohasd tried to start ASM several times and hit the errors above every time; something is preventing ASM from coming up.

Next, the ASM alert log shows the following:

Starting up:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option.
ORACLE_HOME = /u01/app/12.1/grid
System name:    Linux
Node name:      lunar
Release:        2.6.32-300.10.1.el5uek
Version:        #1 SMP Wed Feb 22 17:37:40 EST 2012
Machine:        x86_64
Using parameter settings in server-side spfile +DATA/ASM/ASMPARAMETERFILE/registry.253.818242245
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
IMODE=BR
ILAT =0
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NOTE: remote asm mode is local (mode 0x301; from cluster type)
Starting up:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option.
ORACLE_HOME = /u01/app/12.1/grid
System name:    Linux
Node name:      lunar
Release:        2.6.32-300.10.1.el5uek
Version:        #1 SMP Wed Feb 22 17:37:40 EST 2012
Machine:        x86_64
Using parameter settings in server-side spfile +DATA/ASM/ASMPARAMETERFILE/registry.253.818242245
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups           = "RECO"
  asm_power_limit          = 1
NOTE: remote asm mode is local (mode 0x301; from cluster type)
Sat Aug 24 15:54:18 2013
NOTE: PatchLevel of this instance 0
Starting background process PMON
Sat Aug 24 15:54:19 2013
PMON started with pid=2, OS id=3521
Starting background process PSP0
Sat Aug 24 15:54:19 2013
PSP0 started with pid=3, OS id=3525
Starting background process VKTM
Sat Aug 24 15:54:20 2013
VKTM started with pid=4, OS id=3529 at elevated priority
Starting background process GEN0
Sat Aug 24 15:54:20 2013
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Sat Aug 24 15:54:20 2013
GEN0 started with pid=5, OS id=3535
Starting background process MMAN
Sat Aug 24 15:54:20 2013
MMAN started with pid=6, OS id=3539
Starting background process DIAG
Sat Aug 24 15:54:21 2013
DIAG started with pid=8, OS id=3547
Starting background process DIA0
Sat Aug 24 15:54:21 2013
DIA0 started with pid=9, OS id=3551
Starting background process DBW0
Sat Aug 24 15:54:21 2013
DBW0 started with pid=10, OS id=3555
Starting background process LGWR
Sat Aug 24 15:54:21 2013
LGWR started with pid=11, OS id=3559
Starting background process CKPT
Sat Aug 24 15:54:21 2013
CKPT started with pid=12, OS id=3563
Starting background process SMON
Sat Aug 24 15:54:21 2013
SMON started with pid=13, OS id=3567
Starting background process LREG
Sat Aug 24 15:54:21 2013
LREG started with pid=14, OS id=3571
Starting background process RBAL
Sat Aug 24 15:54:21 2013
RBAL started with pid=15, OS id=3575
Starting background process GMON
Sat Aug 24 15:54:22 2013
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F6AC008] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Sat Aug 24 15:54:22 2013
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F67A010] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_psp0_3525.trc  (incident=27225):
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F67A010] [PC:0xA6B0D9C] [Non-existent physical address] []
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_3579.trc  (incident=28801):
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F6AC008] [PC:0xA6B0D9C] [Non-existent physical address] []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM/incident/incdir_27225/+ASM_psp0_3525_i27225.trc
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM/incident/incdir_28801/+ASM_gmon_3579_i28801.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Aug 24 15:54:24 2013
Using default pga_aggregate_limit of 2048 MB
Sat Aug 24 15:54:26 2013
Dumping diagnostic data in directory=[cdmp_20130824155426], requested by (instance=1, osid=3525 (PSP0)), summary=[incident=27225].
Process GMON died, see its trace file
Sat Aug 24 15:54:27 2013
USER (ospid: 3474): terminating the instance due to error 443
Sat Aug 24 15:54:28 2013
Instance terminated by USER, pid = 3474

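Here we can see that ASM had in fact already started key background processes such as PMON, SMON, CKPT and DBW0, but the instance was then terminated (with error 443) after the GMON process died. Both GMON and PSP0 hit ORA-07445 (SIGBUS, "Non-existent physical address") at the same moment, and MMNL, the process named in the ORA-00443 message, never appears in the log at all; since PSP0 is the process that spawns further background processes, it seems likely that MMNL simply never got started.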
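When the alert log is long, it helps to check mechanically which background processes were requested, which reported a successful start, and which died. The following is a minimal Python sketch, assuming the alert log text is available as a string; the function name `summarize_startup` is hypothetical, and the excerpt is trimmed from the log above.

```python
import re

# Trimmed excerpt of the ASM alert log quoted above.
ALERT_EXCERPT = """\
Starting background process PMON
Sat Aug 24 15:54:19 2013
PMON started with pid=2, OS id=3521
Starting background process PSP0
PSP0 started with pid=3, OS id=3525
Starting background process VKTM
VKTM started with pid=4, OS id=3529 at elevated priority
Starting background process RBAL
RBAL started with pid=15, OS id=3575
Starting background process GMON
Process GMON died, see its trace file
USER (ospid: 3474): terminating the instance due to error 443
"""

STARTING = re.compile(r"^Starting background process (\w+)")
STARTED = re.compile(r"^(\w+) started with pid=(\d+), OS id=(\d+)")
DIED = re.compile(r"^Process (\w+) died")


def summarize_startup(text):
    """Return (started, requested_but_never_started, died) process names."""
    requested, started, died = [], [], []
    for line in text.splitlines():
        if m := STARTING.match(line):
            requested.append(m.group(1))
        elif m := STARTED.match(line):
            started.append(m.group(1))
        elif m := DIED.match(line):
            died.append(m.group(1))
        # Timestamp lines and other messages are ignored.
    never_started = [p for p in requested if p not in started]
    return started, never_started, died


started, never_started, died = summarize_startup(ALERT_EXCERPT)
print("started:", started)
print("requested but never reported started:", never_started)
print("died:", died)
```

Run against the full alert log, the same approach makes it obvious that GMON was the last process requested and that MMNL was never requested at all.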
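A quick note on these processes: both GMON and PSP0 already exist in Oracle 10.2 ASM.
GMON (ASM Disk Group Monitor Process) was introduced with 10.2 ASM. Once the ASM instance has started, it monitors disk group metadata and communicates with the ocssd process.
GMON sends the ASM instance's disk group information to ocssd; other database instances then obtain the ASM disk group information by talking to ocssd, after which they can open the disk groups and read from and write to them.
The documentation describes it as follows: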

GMON
	ASM Disk Group Monitor Process
	Monitors all mounted ASM disk groups
	GMON monitors all the disk groups mounted in an ASM instance and is responsible for maintaining consistent disk membership and status information. 
Membership changes result from adding and dropping disks, whereas disk status changes result from taking disks offline or bringing them online.

PSP0's main job is to spawn new processes. The documentation describes it as follows:

PSP0
	Process Spawner Process
	Spawns Oracle background processes after initial instance startup

Let's analyze the trace files of these two processes.

The GMON trace confirms that the system had very little free memory at the time:

========= Dump for incident 27225 (ORA 7445 [dbgtTrcData_int]) ========
Dump continued from file: /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_3579.trc
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F6AC008] [PC:0xA6B0D9C] [Non-existent physical address] []

========= Dump for incident 28801 (ORA 7445 [dbgtTrcData_int]) ========
----- Beginning of Customized Incident Dump(s) -----
Dumping swap information
Memory (Avail / Total) = 75.04M / 1164.46M
Swap (Avail / Total) = 3999.99M /  3999.99M
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F6AC008] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Registers:
%rax: 0x0000000000000001 %rbx: 0x000000009f6abfd0 %rcx: 0xffffffff0000ffff
%rdx: 0x0000000000000000 %rdi: 0x000000009f6aff80 %rsi: 0x0000000000000000
%rsp: 0x00007fff8433f580 %rbp: 0x00007fff8433f790  %r8: 0x000000009f6abfd0
 %r9: 0x000000009f6abfd8 %r10: 0x0000000000010000 %r11: 0x000000000000000b

The PSP0 trace shows the same picture: very little free memory at the time.

Dump continued from file: /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_psp0_3525.trc
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F67A010] [PC:0xA6B0D9C] [Non-existent physical address] []

========= Dump for incident 27225 (ORA 7445 [dbgtTrcData_int]) ========
----- Beginning of Customized Incident Dump(s) -----
Dumping swap information
Memory (Avail / Total) = 75.04M / 1164.46M
Swap (Avail / Total) = 3999.99M /  3999.99M
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F67A010] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Registers:
%rax: 0x0000000000000001 %rbx: 0x000000009f679fd8 %rcx: 0xffffffff0000ffff
%rdx: 0x000000000000004b %rdi: 0x000000009f67bf80 %rsi: 0x0000000000000000
%rsp: 0x00007fff83b440e0 %rbp: 0x00007fff83b442f0  %r8: 0x000000009f679fd8
 %r9: 0x000000009f679fe0 %r10: 0x0000000000010000 %r11: 0x0000000000000008
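The `Memory (Avail / Total)` and `Swap (Avail / Total)` lines in these incident dumps are easy to turn into percentages. Below is a small sketch; the helper name `parse_memory_dump` is hypothetical, and the regex assumes exactly the line format shown above. With these numbers only about 6% of physical memory was free, while swap was completely unused.

```python
import re

# Lines copied from the incident dumps above.
TRACE_EXCERPT = """\
Dumping swap information
Memory (Avail / Total) = 75.04M / 1164.46M
Swap (Avail / Total) = 3999.99M /  3999.99M
"""

# Matches e.g. "Memory (Avail / Total) = 75.04M / 1164.46M"
LINE = re.compile(r"^(Memory|Swap) \(Avail / Total\) = ([\d.]+)M /\s+([\d.]+)M")


def parse_memory_dump(text):
    """Map 'Memory'/'Swap' to (avail_mb, total_mb, percent_free)."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            avail, total = float(m.group(2)), float(m.group(3))
            result[m.group(1)] = (avail, total, 100.0 * avail / total)
    return result


info = parse_memory_dump(TRACE_EXCERPT)
for name, (avail, total, pct) in info.items():
    print(f"{name}: {avail:.2f}M free of {total:.2f}M ({pct:.1f}% free)")
```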

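That more or less pins it down: the system was short of memory. So I shut down the VM, gave it more memory, restarted it, and everything came up fine.
Now let's look at the SGA-related parameters in the ASM instance: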

SQL> show parameter target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_max_target                    big integer 1076M					<-- about 1 GB!
memory_target                        big integer 1076M
pga_aggregate_target                 big integer 0
sga_target                           big integer 0
SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
sga_max_size                         big integer 1088M
sga_target                           big integer 0
unified_audit_sga_queue_size         integer     1048576
SQL> 
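The arithmetic behind that surprise is simple: with MEMORY_TARGET at 1076M on a VM whose traces reported 1164.46M of total memory, ASM alone was allowed to claim over 90% of RAM, leaving under 90 MB for the OS, the Grid Infrastructure daemons and everything else. A sketch of the calculation (the function name is hypothetical; treating the trace's "Total" as the VM's RAM is an assumption):

```python
def memory_target_share(memory_target_mb, host_total_mb):
    """Return (percent of host RAM, MB left over) for a given MEMORY_TARGET."""
    if host_total_mb <= 0:
        raise ValueError("host_total_mb must be positive")
    share_pct = 100.0 * memory_target_mb / host_total_mb
    left_mb = host_total_mb - memory_target_mb
    return share_pct, left_mb


# Values from this post: MEMORY_TARGET = 1076M, trace-reported total = 1164.46M.
share_pct, left_mb = memory_target_share(1076, 1164.46)
print(f"MEMORY_TARGET uses {share_pct:.1f}% of RAM, leaving {left_mb:.2f} MB")
```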

For Oracle 10.2, the “Using Automatic Storage Management” chapter of the Oracle Database Administrator’s Guide gives this brief guidance on ASM instances:

ASM Instance Memory Requirements
ASM instances are smaller than database instances. 
A 64 MB SGA should be sufficient for all but the largest ASM installations.
Total memory footprint for a typical ASM instance is approximately 100 MB.

In other words, in a 10.2 environment about 100 MB is enough for the ASM instance.

From 11.2 onward, ASM has its own manual, the Oracle Automatic Storage Management Administrator’s Guide, which says:
Automatic memory management automatically manages the memory-related parameters for both Oracle ASM and database instances with the MEMORY_TARGET parameter.
Automatic memory management is enabled by default on an Oracle ASM instance, even when the MEMORY_TARGET parameter is not explicitly set.
The default value used for MEMORY_TARGET is acceptable for most environments.
This is the only parameter that you must set for complete Oracle ASM memory management.
Oracle strongly recommends that you use automatic memory management for Oracle ASM.    <== note: AMM is the strongly recommended mode for the ASM instance

If you do not set a value for MEMORY_TARGET, but you do set values for other memory related parameters, Oracle internally calculates the optimum value for MEMORY_TARGET based on those memory parameter values.
You can also increase MEMORY_TARGET dynamically, up to the value of the MEMORY_MAX_TARGET parameter, just as you can do for the database instance.

Although it is not recommended, you can disable automatic memory management by either setting the value for MEMORY_TARGET to 0 in the Oracle ASM parameter file or by running an ALTER SYSTEM SET MEMORY_TARGET=0 statement. When you disable automatic memory management, Oracle reverts to auto shared memory management and automatic PGA memory management. To revert to Oracle Database 10g release 2 (10.2) functionality to manually manage Oracle ASM SGA memory, also run the ALTER SYSTEM SET SGA_TARGET=0 statement. You can then manually manage Oracle ASM memory using the information in “Oracle ASM Parameter Setting Recommendations”, that discusses Oracle ASM memory-based parameter settings. Unless specified, the behaviors of the automatic memory management parameters in Oracle ASM instances behave the same as in Oracle Database instances.

Notes:
For a Linux environment, automatic memory management cannot work if /dev/shm is not available or is undersized.
For more information, see Oracle Database Administrator’s Reference for Linux and UNIX-Based Operating Systems.
For information about platforms that support automatic memory management, see Oracle Database Administrator’s Guide.
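On Linux, AMM backs its memory with /dev/shm, so a quick sanity check is to compare the size of /dev/shm with the MEMORY_TARGET you plan to use; an undersized /dev/shm typically shows up as ORA-00845 at startup. This is a minimal sketch using Python's `os.statvfs`; the helper name `shm_check` is hypothetical, and comparing against the total size of the mount (rather than its free space) is an assumption about what matters in your setup.

```python
import os


def shm_check(memory_target_mb, path="/dev/shm"):
    """Compare a mount's total and free size (in MB) with MEMORY_TARGET."""
    st = os.statvfs(path)
    total_mb = st.f_blocks * st.f_frsize / (1024 * 1024)
    free_mb = st.f_bavail * st.f_frsize / (1024 * 1024)
    return {
        "total_mb": total_mb,
        "free_mb": free_mb,
        "large_enough": total_mb >= memory_target_mb,
    }


if os.path.isdir("/dev/shm"):
    # 1076 MB is the MEMORY_TARGET seen in this post.
    print(shm_check(1076))
```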

The minimum MEMORY_TARGET for Oracle ASM is 256 MB. If you set MEMORY_TARGET to 100 MB, then Oracle increases the value for MEMORY_TARGET to 256 MB automatically.

In other words, starting with 11.2, Oracle strongly recommends managing the ASM instance with AMM, and the minimum value is 256 MB.
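The documented minimum can be modeled as a simple clamp. This is only a model of the documented behavior, not Oracle's code; treating 0 as "AMM disabled" follows the passage quoted above, and the function name is hypothetical.

```python
ASM_MIN_MEMORY_TARGET_MB = 256  # documented minimum MEMORY_TARGET for Oracle ASM


def effective_asm_memory_target(requested_mb):
    """Model the documented rule: nonzero values below 256 MB are raised to 256 MB."""
    if requested_mb < 0:
        raise ValueError("MEMORY_TARGET cannot be negative")
    if requested_mb == 0:
        return 0  # 0 disables automatic memory management
    return max(requested_mb, ASM_MIN_MEMORY_TARGET_MB)


print(effective_asm_memory_target(100))   # the documentation's 100 MB example
print(effective_asm_memory_target(1076))  # the value seen in this post
```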

The 12c (12.1) documentation repeats the 11.2 text above and adds just one item:
In an Oracle Exadata environment, the recommended settings for managing memory are SGA_TARGET = 1250MB, PGA_AGGREGATE_TARGET = 400MB, MEMORY_TARGET = 0, and MEMORY_MAX_TARGET = 0.

On Exadata, the default ASM settings are as follows:

		Exadata version 11.2.2.2.0:
		---------------------------------------
		memory_max_target = 1073741824 
		memory_target = 1073741824
		pga_aggregate_target = 104857600 
		sga_max_size = 943718400 
		sga_target = 943718400
		shared_pool_size = 0 
		sort_area_size = 65536
		large_pool_size = 12582912

		Exadata versions 11.2.3.1.1 through 11.2.3.2.1:
		---------------------------------------
		memory_max_target = 0 
		memory_target = 0 
		pga_aggregate_target = 419430400 
		sga_max_size = 1325400064 
		sga_target = 1325400064
		sort_area_size = 65536 
		large_pool_size = 16777216

Interestingly, sort_area_size is still set here; I'm not sure what the reasoning behind that is.
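The Exadata values are given in bytes; converting them to MB makes them easier to compare with the 12.1 recommendation (SGA_TARGET = 1250MB, PGA_AGGREGATE_TARGET = 400MB). The sketch below does that conversion. The 11.2.3.x sga_target works out to 1264 MB rather than 1250 MB; 1264 is 1250 rounded up to a multiple of 16 MB, and granule rounding is one plausible explanation, but that is an assumption, not something the documentation states (`round_up` and the 16 MB granule are hypothetical). Also note that large_pool_size = 12582912 bytes (12 MB) in 11.2.2.2.0 matches the large_pool_size = 12M seen in the ASM alert log above.

```python
MB = 1024 * 1024

# Default ASM memory parameters on Exadata, in bytes, as listed above.
EXADATA_11_2_2_2_0 = {
    "memory_max_target": 1073741824,
    "memory_target": 1073741824,
    "pga_aggregate_target": 104857600,
    "sga_max_size": 943718400,
    "sga_target": 943718400,
    "large_pool_size": 12582912,
}
EXADATA_11_2_3_X = {  # 11.2.3.1.1 through 11.2.3.2.1
    "memory_max_target": 0,
    "memory_target": 0,
    "pga_aggregate_target": 419430400,
    "sga_max_size": 1325400064,
    "sga_target": 1325400064,
    "large_pool_size": 16777216,
}


def to_mb(params):
    """Convert a {name: bytes} mapping to {name: megabytes}."""
    return {name: value / MB for name, value in params.items()}


def round_up(value_mb, granule_mb):
    """Round value_mb up to the next multiple of granule_mb (hypothetical granule)."""
    return -(-value_mb // granule_mb) * granule_mb


for label, params in (("11.2.2.2.0", EXADATA_11_2_2_2_0),
                      ("11.2.3.1.1-11.2.3.2.1", EXADATA_11_2_3_X)):
    print(label)
    for name, mb in to_mb(params).items():
        print(f"  {name:<22} {mb:g} MB")

print("1250 MB rounded up to 16 MB:", round_up(1250, 16))
```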
