Exadata Disk Auto Management - 1 - Understanding the Disk-Related Messages in the Cell Alert Log

Contact: QQ (5163721)


Author: Lunar © All rights reserved. [This article may be reposted, but the source URL must be credited with a link; otherwise legal action may be taken.]

Exadata Disk Auto Management - 1 - Understanding the Disk-Related Messages in the Cell Alert Log
Exadata Disk Auto Management - 2 - Disk-Related Concepts on the Cell
Exadata Disk Auto Management - 3 - Rules for Disk Auto Management
Starting with release 11.2.3.2.x, Exadata can automatically detect a disk whose performance has degraded and remove it from the active configuration.
A poorly performing disk degrades the performance of the whole system, because the workload is distributed evenly across all disks.
For example, if one disk performs 30% worse than the others, overall system I/O capability can drop by 30%.
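The arithmetic behind that claim can be sketched with a toy model: because each disk receives an equal share of the work, a batch of I/O completes only as fast as the slowest disk allows. This is a simplified illustration, not Exadata's actual I/O scheduler:

```python
def relative_capacity(disk_speeds):
    """Relative system I/O capacity when every disk gets an equal share
    of the work: the batch finishes only when the slowest disk does."""
    healthy = max(disk_speeds)          # speed of a normal disk
    return min(disk_speeds) / healthy

# Eleven healthy disks plus one disk running 30% slower:
speeds = [1.0] * 11 + [0.7]
print(relative_capacity(speeds))  # -> 0.7, i.e. a 30% drop overall
```

In other words, one degraded disk gates the whole striped workload, which is exactly why Exadata confines such a disk rather than leaving it in place.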


When degraded disk performance is detected, the system automatically removes that disk from the active configuration. Exadata then runs a series of performance tests: if the problem turns out to be transient, the disk is automatically returned to its original configuration; if the disk fails the tests, it is marked as poor performance, and an ASR (Automatic Service Request) is opened automatically to request a disk replacement.
This feature applies to hard disks and flash disks alike.


Any discussion of Exadata I/O capability and management has to mention the well-known CELLSRV.
CELLSRV is the main component of the Exadata storage server software. It is a multithreaded server that services both simple block requests and Smart Scan requests (such as table scans with projection and filtering). CELLSRV also works with DBRM to meter the I/O bandwidth used by the various databases and consumer groups as they issue I/O.
CELLSRV collects extensive statistics about its operations. Oracle Database and ASM processes communicate with CELLSRV through LIBCELL, which uses the iDB protocol to translate I/O requests into messages sent to CELLSRV.


The other main I/O-related processes on a cell are MS and RS.
MS (Management Server) provides cell management and configuration functions. It works together with the command-line interface CellCLI, and each cell is managed individually through CellCLI.
CellCLI is a local management tool: it must be run on a given cell in order to manage that cell.
On Exadata, however, you can use dcli to run the same CellCLI command remotely against multiple cells, achieving centralized management. dcli is a utility set that performs this centralized management over SSH, and it can be configured independently (for example, if you have many non-Exadata Linux environments, you can set up dcli for them as needed).
Also note that on the cell, besides CELLSRV collecting the main statistics, MS is responsible for sending alerts and collecting the remaining statistics.


RS (Restart Server) starts and stops the CELLSRV and MS services, monitors the state of those processes, and restarts cellsrv and ms when necessary.


As mentioned above, CELLSRV automatically monitors disk performance and adjusts the configuration accordingly. When CELLSRV detects degraded disk performance, the cell disk status changes to 'normal - confinedOnline' and the physical disk status changes to 'warning - confinedOnline'.
This means the disk has entered the first stage of poor-performance handling. It is a transitional stage, and the disk does not stay in it for long.


A cell disk commonly has four health states:
HEALTH_BAD_ONLINE
HEALTH_BAD_OFFLINE
HEALTH_GOOD
HEALTH_FAIL
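Judging from the CDHS lines in the alert excerpts quoted later in this post, the transitions between these states can be sketched roughly as follows. This is an informal reconstruction from those logs, not an official Oracle state machine:

```python
# Health-state transitions as they appear in the CDHS log lines quoted
# in this post (informal reconstruction, not Oracle documentation).
OBSERVED_TRANSITIONS = {
    "HEALTH_GOOD":        {"HEALTH_BAD_ONLINE"},           # confinement starts
    "HEALTH_BAD_ONLINE":  {"HEALTH_BAD_OFFLINE"},          # grid disks offlined for testing
    "HEALTH_BAD_OFFLINE": {"HEALTH_GOOD", "HEALTH_FAIL"},  # test passed / test failed
}

def is_observed(old, new):
    """True if this transition occurs in the logs quoted in this post."""
    return new in OBSERVED_TRANSITIONS.get(old, set())

print(is_observed("HEALTH_BAD_OFFLINE", "HEALTH_GOOD"))  # True: disk passed the confinement test
print(is_observed("HEALTH_GOOD", "HEALTH_FAIL"))         # False: never seen as a direct jump
```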


These checks and state changes are recorded by CELLSRV in the cell alert log, for example:

Thu Dec 19 01:27:16 2013
CDHS: Mark cd health state change CD_08_dm01cel05  with newState HEALTH_BAD_ONLINE  pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0 inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
Thu Dec 19 01:27:16 2013
global conf related state: numHDsConf: 3 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_GOOD to newState HEALTH_BAD_ONLINE
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_GOOD to newState HEALTH_BAD_ONLINE
ABSOLUTE SERVICE TIME VIOLATION DETECTED ON DISK /dev/sdi: CD name - CD_08_dm01cel05 AVERAGE SERVICETIME: 163.000000 ms. AVERAGE WAITTIME: 1.666667 ms. AVERAGE REQUESTSIZE: 1137 sectors. NUMBER OF IOs COMPLETED IN LAST CYCLE ON DISK: 9 THRESHOLD VIOLATION COUNT: 6 NON_ZERO SERVICETIME COUNT: 6 SET CONFINE SUCCESS: 1
NOTE: Initiating ASM Instance operation: Query ASM Deactivation Outcome on 3 disks
Published 1 grid disk events Query ASM Deactivation Outcome on DG DATA_DM01 to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
Published 1 grid disk events Query ASM Deactivation Outcome on DG DBFS_DG to: 
ClientHostName = dm01db08.lunar,  ClientPID = 10891
Published 1 grid disk events Query ASM Deactivation Outcome on DG RECO_DM01 to: 
ClientHostName = dm01db07.lunar,  ClientPID = 10523
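When scanning a large cell alert log, the CDHS state-change lines can be pulled out with a small script. This is just a sketch; the regular expression is derived from the line format in the excerpt above:

```python
import re

# Matches the "CDHS: Do cd health state change <celldisk> from <old>
# to newState <new>" lines shown in the excerpt above.
CDHS_RE = re.compile(
    r"CDHS: Do cd health state change (\S+) "
    r"from (HEALTH_\w+) to newState (HEALTH_\w+)"
)

def parse_cdhs(line):
    """Return the cell disk and state transition, or None for other lines."""
    m = CDHS_RE.search(line)
    if m is None:
        return None
    celldisk, old, new = m.groups()
    return {"celldisk": celldisk, "from": old, "to": new}

line = ("CDHS: Do cd health state change CD_08_dm01cel05 "
        "from HEALTH_GOOD to newState HEALTH_BAD_ONLINE")
print(parse_cdhs(line))
# -> {'celldisk': 'CD_08_dm01cel05', 'from': 'HEALTH_GOOD', 'to': 'HEALTH_BAD_ONLINE'}
```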

The same information can also be observed in the alerthistory:

70_1  2013-12-19T01:27:21+08:00       warning         "Hard disk entered confinement offline status. The LUN 0_8 changed status to warning - confinedOffline. CellDisk changed status to normal - confinedOffline. All subsequent I/Os on this disk are failed immediately. Confinement tests will be run on the disk to determine if the disk should be dropped. Status                      : WARNING - CONFINEDOFFLINE  Manufacturer                : HITACHI  Model Number                : HUS1560SCSUN600G  Size                        : 600G  Serial Number               : 1216KLN0HN  Firmware                    : A700  Slot Number                 : 8  Cell Disk                   : CD_08_dm01cel05  Grid Disk                   : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, DATA_DM01_CD_08_dm01cel05  Reason for confinement      : threshold for service time exceeded"

I suspect the information above is obtained with a command similar to the following:
list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
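If you want to check such output programmatically, for instance to confirm that every grid disk reports asmdeactivationoutcome = Yes before a disk is taken offline, a small helper could look like this. This is a sketch; the sample output below is hypothetical, modeled on the usual whitespace-separated CellCLI listing:

```python
def safe_to_offline(cellcli_output):
    """Return grid disks whose asmdeactivationoutcome is not 'Yes'.
    An empty result means all grid disks can be deactivated safely."""
    blockers = []
    for line in cellcli_output.strip().splitlines():
        name, asmmodestatus, outcome = line.split()
        if outcome != "Yes":
            blockers.append(name)
    return blockers

# Hypothetical output of: list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
sample = """\
DATA_DM01_CD_08_dm01cel05   ONLINE   Yes
DBFS_DG_CD_08_dm01cel05     ONLINE   Yes
RECO_DM01_CD_08_dm01cel05   ONLINE   No
"""
print(safe_to_offline(sample))  # -> ['RECO_DM01_CD_08_dm01cel05']
```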

That is the first phase of disk detection. Next, Exadata moves to the test phase, "Prepare for test - confined offline".
The first step of this phase is to take all grid disks on the affected cell disk offline, and then run performance tests against it.
Here, CELLSRV asks ASM to offline the grid disks, and the alert log records the following:

NOTE: Initiating ASM Instance operation: ASM OFFLINE disk on 3 disks
Published 1 grid disk events ASM OFFLINE disk on DG DATA_DM01 to: 
ClientHostName = dm01db07.lunar,  ClientPID = 10523
Published 1 grid disk events ASM OFFLINE disk on DG DBFS_DG to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
Published 1 grid disk events ASM OFFLINE disk on DG RECO_DM01 to: 
ClientHostName = dm01db04.lunar,  ClientPID = 10378
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_BAD_ONLINE to newState HEALTH_BAD_OFFLINE
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_BAD_ONLINE to newState HEALTH_BAD_OFFLINE

If possible, ASM takes all of the related grid disks offline.
Note the phrase "if possible": it depends on other conditions as well, such as whether the 'disk_repair_time' requirement is met (3.6 hours by default).
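The effect of disk_repair_time can be illustrated with a toy timer check. This is only a sketch of the concept (the function name and logic are my own, not ASM's implementation): ASM tolerates an offlined disk for up to disk_repair_time; once that window expires, the disk is dropped permanently instead of being allowed back:

```python
from datetime import datetime, timedelta

# ASM's default for the diskgroup attribute disk_repair_time is 3.6 hours.
DISK_REPAIR_TIME = timedelta(hours=3.6)

def repair_window_expired(offline_since, now):
    """Toy check: once an offlined disk exceeds disk_repair_time,
    ASM drops it for good rather than waiting for it to come back."""
    return now - offline_since > DISK_REPAIR_TIME

offlined = datetime(2013, 12, 19, 1, 27)
print(repair_window_expired(offlined, datetime(2013, 12, 19, 3, 0)))  # False: still within 3.6h
print(repair_window_expired(offlined, datetime(2013, 12, 19, 6, 0)))  # True: window has expired
```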

Next, the MS process runs a stress test against the offlined grid disks. If the results are good, MS tells CELLSRV that the disk is fine and safe to use, and CELLSRV in turn tells ASM to bring those grid disks back online, since they are OK.
For example:

Fri Nov 08 02:48:03 2013
CDHS: Do cd health state change  after confinement CD_08_dm01cel05 testFailed 0
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_BAD_OFFLINE to newState HEALTH_GOOD
Set CD perf state normal.
No need to clear proactive drop state for RECO_DM01_CD_08_dm01cel05 [5401ad6b-29b3-4f56-a958-29580980abf9] :
No need to clear proactive drop state for DATA_DM01_CD_08_dm01cel05 [37f5e40a-5e03-48be-b5ab-34f70e629f20] :
No need to clear proactive drop state for DBFS_DG_CD_08_dm01cel05 [45bf182b-5124-4e5c-94ef-9b6fe4fd4390] :
NOTE: Initiating ASM instance operation:
 Operation: ONLINE ASM disks for 3 Grid disks guids...
NOTE: Initiating ASM Instance operation: ASM ONLINE disk on 3 disks
Published 1 grid disk events ASM ONLINE disk on DG DATA_DM01 to: 
ClientHostName = dm01db07.lunar,  ClientPID = 10523
Published 1 grid disk events ASM ONLINE disk on DG DBFS_DG to: 
ClientHostName = dm01db08.lunar,  ClientPID = 10675
Published 1 grid disk events ASM ONLINE disk on DG RECO_DM01 to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_BAD_OFFLINE to newState HEALTH_GOOD

Similar information appears in the alerthistory:

dm01cel05: 70_2  2013-12-19T01:31:06+08:00       clear           "Hard disk status changed to normal.  Status        : NORMAL  Manufacturer  : HITACHI  Model Number  : HUS1560SCSUN600G  Size          : 600GB  Serial Number : 1216KLN0HN  Firmware      : A700  Slot Number   : 8  Cell Disk     : CD_08_dm01cel05  Grid Disk     : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, DATA_DM01_CD_08_dm01cel05"

Of course, if MS tests the offlined grid disks and finds that performance genuinely no longer meets requirements, the cell disk status changes to 'proactive failure' and the physical disk status changes to 'warning - poor performance'.
That means these disks must be removed from the active configuration. MS notifies CELLSRV of this, and CELLSRV then asks ASM to drop the grid disks.

The related entries in the alerthistory look like this:

dm01cel05: 87_2  2014-01-31T02:06:15+08:00       critical        "Hard disk entered poor performance status.  Status                      : WARNING - POOR PERFORMANCE  Manufacturer                : HITACHI  Model Number                : HUS1560SCSUN600G  Size                        : 600G  Serial Number               : 1216KLN0HN  Firmware                    : A700  Slot Number                 : 8  Cell Disk                   : CD_08_dm01cel05  Grid Disk                   : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, DATA_DM01_CD_08_dm01cel05  Reason for poor performance : threshold for service time exceeded"

And the cell alert log contains the following:

Fri Jan 31 02:06:12 2014
CDHS: Do cd health state change  after confinement CD_08_dm01cel05 testFailed 1
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_BAD_OFFLINE to newState HEALTH_FAIL
NOTE: Initiating ASM Instance operation: ASM DROP dead disk on 3 disks
Published 1 grid disk events ASM DROP dead disk on DG DATA_DM01 to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
Published 1 grid disk events ASM DROP dead disk on DG DBFS_DG to: 
ClientHostName = dm01db04.lunar,  ClientPID = 10378
Published 1 grid disk events ASM DROP dead disk on DG RECO_DM01 to: 
ClientHostName = dm01db08.lunar,  ClientPID = 10891
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_BAD_OFFLINE to newState HEALTH_FAIL

At the same time, the corresponding disk drop messages (drop force) appear in the ASM alert log.
Once the rebalance completes, the disk can be replaced. Details of the ASM rebalance can be found in the trace files of +ASM5_rbal_xxxx and +ASM5_arb0_xxxx.

