Common Metrics and Events for Flashcache WriteBack

Contact: QQ (5163721)

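Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.]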

The User Guide lists all of the metrics; here I only cover the ones that usually get the most attention:

CellCLI> list metriccurrent FC_BY_USED, FC_BY_DIRTY
         FC_BY_USED      FLASHCACHE      307,012 MB
         FC_BY_DIRTY     FLASHCACHE      250,320 MB

FC_BY_USED – number of MB cached (total)
FC_BY_DIRTY – number of dirty MB cached (data written only to FlashCache but not to disks)
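
As a rough illustration of how these two metrics relate, the sketch below parses the list metriccurrent output shown above and computes what share of the cached data is dirty. The function name parse_metric_lines is hypothetical, and the parsing assumes the three-column layout (metric name, object name, value in MB) seen in this output.

```python
import re

# Output copied from the CellCLI session above.
SAMPLE = """\
         FC_BY_USED      FLASHCACHE      307,012 MB
         FC_BY_DIRTY     FLASHCACHE      250,320 MB
"""

def parse_metric_lines(text):
    """Return {(metric, object): value_in_mb} from 'list metriccurrent' output."""
    result = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S+)\s+(\S+)\s+([\d,.]+)\s+MB\s*$", line)
        if m:
            metric, obj, value = m.groups()
            # Values are printed with thousands separators, e.g. "307,012".
            result[(metric, obj)] = float(value.replace(",", ""))
    return result

metrics = parse_metric_lines(SAMPLE)
used = metrics[("FC_BY_USED", "FLASHCACHE")]
dirty = metrics[("FC_BY_DIRTY", "FLASHCACHE")]
dirty_ratio = dirty / used
print(f"dirty: {dirty:.0f} MB of {used:.0f} MB cached ({dirty_ratio:.1%})")
```

With the numbers above, roughly four fifths of what is in the cache has not yet been written to disk.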

CellCLI> list metriccurrent GD_BY_FC_DIRTY
         GD_BY_FC_DIRTY  DATA_CD_00_cel14    7,214 MB
         GD_BY_FC_DIRTY  DATA_CD_01_cel14    6,698 MB
              . . . 

CellCLI> list metriccurrent CD_BY_FC_DIRTY where metricObjectName=FD_05_cel14
         CD_BY_FC_DIRTY  FD_05_cel14       19,430 MB

GD_BY_FC_DIRTY – number of dirty MB cached for the griddisk
CD_BY_FC_DIRTY – number of dirty MB cached on the flash celldisk
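
To see which disks hold the most dirty data, the per-object rows can be sorted. This is only a sketch: the helper name dirty_by_object is made up for illustration, and the sample contains just the two griddisk rows that are visible above (the rest of that listing is elided).

```python
# Rows copied from the GD_BY_FC_DIRTY listing above (only the visible ones).
SAMPLE = """\
         GD_BY_FC_DIRTY  DATA_CD_00_cel14    7,214 MB
         GD_BY_FC_DIRTY  DATA_CD_01_cel14    6,698 MB
"""

def dirty_by_object(text, metric_name):
    """Return [(object name, dirty MB)] for one metric, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Expect: metric name, object name, value, unit.
        if len(parts) == 4 and parts[0] == metric_name and parts[3] == "MB":
            rows.append((parts[1], int(parts[2].replace(",", ""))))
    return sorted(rows, key=lambda row: row[1], reverse=True)

ranked = dirty_by_object(SAMPLE, "GD_BY_FC_DIRTY")
total_dirty_mb = sum(mb for _, mb in ranked)
for name, mb in ranked:
    print(f"{name:20s} {mb:>8d} MB")
print("total (listed disks only):", total_dirty_mb, "MB")
```

The same helper works for the CD_BY_FC_DIRTY rows if the metric name is changed.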

To test the flashcache workload of an application, we can usually follow this procedure:
1. Reset the flashcache statistics:

CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(resetStats,0,0,0)"

2. Run the application test to generate load

3. Dump the statistics:
Look at the trace files whose names start with svtrc under the cellsrv ADR…/trace/ directory; for example, in my environment:


export CELL_ADR=/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell

CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(dumpStats,0,0,L)"
  L – is any non-negative number to be used as a result Label
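
The three steps can be scripted loosely along these lines. This is a minimal sketch, not a tested tool: flashcache_event and newest_svtrc are hypothetical helper names, the label 7 is arbitrary, and the trace directory has to be supplied by you. The script only builds the two command strings from steps 1 and 3 and picks the most recently modified svtrc trace file; it does not run CellCLI itself.

```python
import glob
import os

def flashcache_event(action, label=0):
    """Build the 'alter cell events' command for a cellsrv_flashcache action."""
    return ('alter cell events="immediate '
            f'cellsrv.cellsrv_flashcache({action},0,0,{label})"')

def newest_svtrc(trace_dir):
    """Return the most recently modified svtrc*.trc file in trace_dir, or None."""
    candidates = glob.glob(os.path.join(trace_dir, "svtrc*.trc"))
    return max(candidates, key=os.path.getmtime) if candidates else None

reset_cmd = flashcache_event("resetStats")           # step 1
dump_cmd = flashcache_event("dumpStats", label=7)    # step 3, label chosen freely
print(reset_cmd)
print(dump_cmd)
```

After running the dump command in CellCLI, calling newest_svtrc on the trace directory is one way to find the file that was just written.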

Dynamically adjust the frequency of re-reads from the griddisk:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam ('_cell_flashcache_diag_reads_frequency','20')"

The following shows how to view the structure of the flashcache control block.
First, find messages like the following in the cellsrv alert log:

...
Sun Jun 30 10:24:28 2013
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=67d1e918-7455-4afa-9b6f-7ecb2da11b82 (504684860), size=816MB, cdisk=FD_02_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=d6400f21-abcc-4ea3-a09c-79425619ef36 (1922598212), size=816MB, cdisk=FD_01_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=decfc31a-e1ba-4d63-9161-50f6db206572 (1710125700), size=816MB, cdisk=FD_03_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=4dec16e9-1559-468d-9cfe-39443ac18ddc (1241609148), size=816MB, cdisk=FD_00_dm01cel01
FlashCache: allowing client IOs (mode=Writeback)
Sun Jun 30 10:24:52 2013
Smart Flash Logging enabled on FlashLog dm01cel01_FLASHLOG (2124647276), size=128MB, cdisk=FD_00_dm01cel01
Sun Jun 30 10:24:52 2013
Smart Flash Logging enabled on FlashLog dm01cel01_FLASHLOG (885498860), size=128MB, cdisk=FD_03_dm01cel01
Sun Jun 30 10:24:53 2013
Smart Flash Logging enabled on FlashLog dm01cel01_FLASHLOG (1996959724), size=128MB, cdisk=FD_02_dm01cel01


...
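
The number in parentheses after each guid is the Flash ID, which is the value the dumpctrlblock event takes further down. As a sketch, the IDs can be pulled out of the alert text with a regular expression. The pattern below is my own guess at the line format based only on the lines shown here, and flashcache_parts is a hypothetical helper name.

```python
import re

# Two of the alert lines shown above.
ALERT_SAMPLE = """\
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=67d1e918-7455-4afa-9b6f-7ecb2da11b82 (504684860), size=816MB, cdisk=FD_02_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=d6400f21-abcc-4ea3-a09c-79425619ef36 (1922598212), size=816MB, cdisk=FD_01_dm01cel01
"""

# Assumed line layout; adjust if your alert log differs.
PART_RE = re.compile(
    r"Caching enabled on FlashCache Part (?P<name>\S+) "
    r"guid=(?P<guid>[0-9a-f-]+) \((?P<flash_id>\d+)\), "
    r"size=(?P<size_mb>\d+)MB, cdisk=(?P<cdisk>\w+)"
)

def flashcache_parts(alert_text):
    """Map flash celldisk name -> numeric Flash ID found in the alert text."""
    return {m["cdisk"]: int(m["flash_id"]) for m in PART_RE.finditer(alert_text)}

parts = flashcache_parts(ALERT_SAMPLE)
for cdisk, flash_id in sorted(parts.items()):
    print(cdisk, flash_id)
```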


Dump the control block structure of this flashcache:

CellCLI> list cell detail
         name:                   dm01cel01
         bbuTempThreshold:       60
         bbuChargeThreshold:     800
         bmcType:                absent
         cellVersion:            OSS_11.2.3.2.1_LINUX.X64_130109
         cpuCount:               2
         diagHistoryDays:        7
         fanCount:               1/1
         fanStatus:              normal
         flashCacheMode:         WriteBack
         id:                     ef56a78d-a9cc-4c04-baac-648a18370eb7
         interconnectCount:      1
         interconnect1:          eth0
         iormBoost:              0.0
         ipaddress1:             192.168.56.11/24
         kernelVersion:          2.6.18-274.el5xen
         makeModel:              Fake hardware
         metricHistoryDays:      7
         offloadEfficiency:      1,000.0
         powerCount:             1/1
         powerStatus:            normal
         releaseVersion:         11.2.3.2.1
         releaseTrackingBug:     14522699
         status:                 online
         temperatureReading:     0.0
         temperatureStatus:      normal
         upTime:                 0 days, 0:09
         cellsrvStatus:          running
         msStatus:               running
         rsStatus:               running

CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(dumpctrlblock, 504684860,0,L)"
Dump sequence #1 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/dm01cel01/trace/svtrc_2312_58.trc
Cell dm01cel01 successfully altered

CellCLI> 

The trace contents are as follows:

Trace file /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/dm01cel01/trace/svtrc_2312_58.trc
ORACLE_HOME = /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109
System name:    Linux
Node name:      dm01cel01
Release:        2.6.18-274.el5xen
Version:        #1 SMP Mon Jul 25 14:24:57 EDT 2011
Machine:        x86_64
CELL SW Version:        OSS_11.2.3.2.1_LINUX.X64_130109

*** 2013-06-30 10:24:25.763
UserThread: LWPID: 2648 userId: 58 kernelId: 58 pthreadID: 0x6641a940
FCC: Control block memory dump for Flash ID 504684860:
2013-06-30 10:33:35.042335*: Dump sequence #1:
2AAB6BD00000 A624DCC2 00000000 54434346 4B4C424C  [..$.....FCCTLBLK]
2AAB6BD00010 64373600 31396531 34372D38 342D3535  [.67d1e918-7455-4]
2AAB6BD00020 2D616661 66366239 6365372D 61643262  [afa-9b6f-7ecb2da]
2AAB6BD00030 38623131 03050032 00000002 00000000  [11b82...........]
2AAB6BD00040 00280000 00000000 33000000 00000000  [..(........3....]
2AAB6BD00050 5B0B8A32 0000013F 1E14E13C 00000002  [2..[?...<.......]
2AAB6BD00060 00000010 00000000 54434346 4B4C424C  [........FCCTLBLK]
2AAB6BD00070 00000000 00000000                    [........]
FCC: Control block formatted dump for Flash ID 504684860:
  fid=504684860, head/tail FCCTLBLK/FCCTLBLK, mdSize=2621440, chksum=2787433666,
  flashSz=855638016, flags=5, cacheline size shift 16,   creation_timestamp=1371622050354,
  version=2, guid=67d1e918-7455-4afa-9b6f-7ecb2da11b82, pers_mode=Writeback,
  verif_level=crc.
FCC: Control block for flashID=504684860 is located on cdisk='FD_02_dm01cel01' at offset 184549376
FCC: Primary FC metadata for flashID=504684860 is located on cdisk='FD_02_dm01cel01', offset/size: 184614912 1261568
FCC: Shadow FC metadata for flashID=504684860 is located on cdisk='FD_02_dm01cel01', offset/size: 185876480 1261568
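
The formatted dump can be cross-checked against the raw memory dump by hand. The sketch below does that arithmetic: it takes the 32-bit words as printed in the hex dump and compares them with the decimal fields. Which word belongs to which field is my own reading of this one dump, not documented layout, and treating creation_timestamp as milliseconds since the Unix epoch is likewise an assumption.

```python
from datetime import datetime, timezone

# 32-bit words copied from the memory dump above, as printed.
chksum_word = 0xA624DCC2                     # row ...00, first word
md_size_word = 0x00280000                    # row ...40, first word
flash_sz_word = 0x33000000                   # row ...40, third word
ts_low, ts_high = 0x5B0B8A32, 0x0000013F     # row ...50, first two words
fid_word = 0x1E14E13C                        # row ...50, third word

# Assumed: a 64-bit value whose low word is printed first.
creation_ts = (ts_high << 32) | ts_low
cacheline_bytes = 1 << 16                    # "cacheline size shift 16"
flash_mb = flash_sz_word / (1024 * 1024)

# Offsets and size from the three "is located on" lines.
ctrl_block_off = 184549376
primary_off, md_len = 184614912, 1261568
shadow_off = 185876480

# Assumed: the timestamp is in milliseconds since the Unix epoch.
created = datetime.fromtimestamp(creation_ts / 1000, tz=timezone.utc)

print("chksum      ", chksum_word)
print("mdSize      ", md_size_word)
print("flashSz     ", flash_sz_word, f"({flash_mb:.0f} MB)")
print("fid         ", fid_word)
print("created     ", creation_ts, created.isoformat())
print("cache line  ", cacheline_bytes, "bytes")
print("ctrl->primary gap:", primary_off - ctrl_block_off)
print("primary end == shadow start:", primary_off + md_len == shadow_off)
```

In this dump flashSz works out to the same 816 MB that the alert log reports for the part, and the shadow metadata starts exactly where the primary metadata ends.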


Check whether a given griddisk is cached, and dump its cache-line header information

First, look at the alert log:
...
Sun Jun 30 10:24:26 2013
CellDisk v0.7 name=CD_disk01_dm01cel01 status=NORMAL confine status=NONE confine reason=CD_GOOD guid=55031178-14e1-49f8-b080-ce4293b5c683 found on dev=/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01
  GridDisk name=data_CD_disk01_dm01cel01 guid=6065c05e-8eae-461c-9b43-02b5c46fd6bb (542754140), cached by these FlashCache parts:  504684860
  GridDisk name=reco_CD_disk01_dm01cel01 guid=6d40fc07-34e3-4e71-9fd8-a51a98e68769 (709161532), cached by these FlashCache parts:  504684860
Initialization of celldisk CD_disk01_dm01cel01 on /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01 completed.
  GridDisk name=data_CD_disk12_dm01cel01 guid=0fb93c36-a5b5-431a-aa7b-025b52f7cbe4 (2820417692), cached by these FlashCache parts:  1241609148
  GridDisk name=reco_CD_disk12_dm01cel01 guid=bba5ae9b-dc0f-4515-94f1-2104e8d0bc44 (1611296652), cached by these FlashCache parts:  1241609148
...
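
These lines tie each griddisk id to the FlashCache part that caches it. A small sketch for grouping griddisks by Flash ID follows. The regular expression is inferred only from the lines shown here, and griddisks_by_flash_id is a hypothetical name.

```python
import re
from collections import defaultdict

# Three of the GridDisk lines from the alert log above.
ALERT_SAMPLE = """\
  GridDisk name=data_CD_disk01_dm01cel01 guid=6065c05e-8eae-461c-9b43-02b5c46fd6bb (542754140), cached by these FlashCache parts:  504684860
  GridDisk name=reco_CD_disk01_dm01cel01 guid=6d40fc07-34e3-4e71-9fd8-a51a98e68769 (709161532), cached by these FlashCache parts:  504684860
  GridDisk name=data_CD_disk12_dm01cel01 guid=0fb93c36-a5b5-431a-aa7b-025b52f7cbe4 (2820417692), cached by these FlashCache parts:  1241609148
"""

# Assumed line layout; the parts list may hold several space-separated IDs.
GD_RE = re.compile(
    r"GridDisk name=(?P<name>\S+) guid=\S+ \((?P<gd_id>\d+)\), "
    r"cached by these FlashCache parts:\s+(?P<parts>[\d ]+)"
)

def griddisks_by_flash_id(alert_text):
    """Return {flash id: [(griddisk name, griddisk id), ...]}."""
    groups = defaultdict(list)
    for m in GD_RE.finditer(alert_text):
        for flash_id in m["parts"].split():
            groups[int(flash_id)].append((m["name"], int(m["gd_id"])))
    return dict(groups)

by_flash = griddisks_by_flash_id(ALERT_SAMPLE)
for flash_id, disks in by_flash.items():
    print(flash_id, [name for name, _ in disks])
```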

Use the griddisk id and the griddisk offset to dump the related cache metadata from the FC:

SQL> conn lunar/lunar
Connected.
SQL> create table lunartest as select * from dba_objects;

Table created.

SQL> alter table lunartest STORAGE (CELL_FLASH_CACHE keep);

Table altered.

SQL> select object_id from user_objects;

 OBJECT_ID
----------
     17852

SQL> select count(*) from lunartest;

  COUNT(*)
----------
     17580

SQL> 

CellCLI> LIST FLASHCACHECONTENT WHERE objectNumber=17852 DETAIL
         cachedKeepSize:         0
         cachedSize:             966656
         dbID:                   3118431096
         dbUniqueName:           BBFF
         hitCount:               0
         missCount:              0
         objectNumber:           17852
         tableSpaceNumber:       4

CellCLI>

SQL> select count(*) from lunartest;

  COUNT(*)
----------
     17580

SQL> 
CellCLI> LIST FLASHCACHECONTENT WHERE objectNumber=17852 DETAIL
         cachedKeepSize:         1925120
         cachedSize:             1925120
         dbID:                   3118431096
         dbUniqueName:           BBFF
         hitCount:               25
         hoursToExpiration:      24
         missCount:              2
         objectNumber:           17852
         tableSpaceNumber:       4
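
For a quick reading of this output, the DETAIL listing can be turned into a dictionary and a hit ratio computed from hitCount and missCount. This is only a sketch: parse_detail is a hypothetical helper, and I am assuming that cachedSize is reported in bytes.

```python
# The second LIST FLASHCACHECONTENT ... DETAIL output from above.
DETAIL_SAMPLE = """\
         cachedKeepSize:         1925120
         cachedSize:             1925120
         dbID:                   3118431096
         dbUniqueName:           BBFF
         hitCount:               25
         hoursToExpiration:      24
         missCount:              2
         objectNumber:           17852
         tableSpaceNumber:       4
"""

def parse_detail(text):
    """Turn 'name: value' DETAIL lines into a dict, converting integers."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            record[key.strip()] = int(value) if value.isdigit() else value
    return record

rec = parse_detail(DETAIL_SAMPLE)
lookups = rec["hitCount"] + rec["missCount"]
hit_ratio = rec["hitCount"] / lookups if lookups else 0.0
cached_kib = rec["cachedSize"] / 1024    # assumes cachedSize is in bytes
print(f"object {rec['objectNumber']}: {cached_kib:.0f} KiB cached, "
      f"hit ratio {hit_ratio:.1%}")
```

Comparing this with the first listing, cachedKeepSize went from 0 to the full cachedSize after the table was set to CELL_FLASH_CACHE keep and scanned again.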

CellCLI> 
CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(dumpmdchunk,1,542754140,1)"

Cell dm01cel01 successfully altered

CellCLI> 

[root@dm01cel01 trace]# cat svtrc_2312_80.trc
Trace file /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/dm01cel01/trace/svtrc_2312_80.trc
ORACLE_HOME = /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109
System name: Linux
Node name: dm01cel01
Release: 2.6.18-274.el5xen
Version: #1 SMP Mon Jul 25 14:24:57 EDT 2011
Machine: x86_64
CELL SW Version: OSS_11.2.3.2.1_LINUX.X64_130109

*** 2013-06-30 10:24:25.608
UserThread: LWPID: 2670 userId: 80 kernelId: 80 pthreadID: 0x74030940
2013-06-30 10:24:26.061175*: For GridDisk data_CD_disk05_dm01cel01 set these caching FlashIDs: 1922598212
2013-06-30 10:24:26.061175*: For GridDisk reco_CD_disk05_dm01cel01 set these caching FlashIDs: 1922598212
2013-06-30 10:24:26.139905*: [CDP] initCDPers – found persdata for guid: fb9bb908-044e-44a1-afa1-2428c065b9bc
dmgType: DMG_UNKNOWN dmgSlot: 255 predFailStat: 0 ioTimeIndex: 0 lastIOCompTime: 1372559066070289 lastIOSubmitTime: 0 histIOLatIndex: 0
CellDisk UUID: fb9bb908-044e-44a1-afa1-2428c065b9bc CellDiskPersObj File offset: 7360
confTransIdx: 0 Current confine state: NONE Health incarnation number: 0 ConfineTransIndex cstate ccause activeForced activeAlertSent inactiveForced inactiveAlertSent asmRespond testsFailed testOutcomeForced noneTime activeTime inactiveTime finalTime
0 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
1 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
2 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
3 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
4 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
5 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
6 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
7 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
8 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
9 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0

2013-06-30 10:25:15.733434*: New info from MS for CD CD_disk09_dm01cel01: diskMediaGroup: DMG_UNKNOWN, disk slot number: 1, predictive failure on disk: 0
No Cache header ID=1, loc=542703616
[root@dm01cel01 trace]#

Perhaps because my environment is a VM, this command did not actually give me a satisfactory result.
