Exadata上cell如何判断何时进行智能扫描?————1.在Exadata环境的测试

联系:QQ(5163721)

标题:Exadata上cell如何判断何时进行智能扫描?————1.在Exadata环境的测试

作者:Lunar©版权所有[文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.]

我们知道,SmartScan的先决条件是下面3个:
1,必须是对象上的全表扫描或者全索引快速扫描(TABLE ACCESS FULL或INDEX FAST FULL SCAN)
2,必须使用直接路径读取(Direct Path Read)。SmartScan的数据流是无法缓存在SGA缓冲池中的。直接路径读取可以串行,也可以并行,缓冲在进程的PGA(heap)中。
3,对象必须存储在Exadata的Cell节点上
.
在Exadata环境的数据库节点上:
1,ASM实例使用libcell11.a这个动态库跟cell节点通信,
2,数据库实例使用kcfis( Kernel File Intelligent Storage )执行SmartScan。
3,Diskmon进行心跳监控,I/O fencing和IORM 。
.
而cell节点上,主要是cellsrv(多线程的服务)完成相关的IO请求的操作(MS主要是监控,RS是重启其他进程)
.
并且,对于Linux环境,Exadata和非Exadata上面安装的Oracle介质并无却别,那么有两个问题:
1,这两个环境是上执行计划为什么会不同(一个是全表扫描,一个是智能扫描)?
2,cell在收到数据库节点的请求后,怎么判断是否执行只能扫描呢?
.
首先,在Exadata上大体有下面3种读取数据块的方式:
1,普通的block读:从磁盘读取数据块,读入到SGA
2,直接路径读:从磁盘读取数据块,读入PGA
3,智能扫描:由cell直接从底盘读取,读入PGA
.
而在非Exadata环境,通常只有前两种。

我们分别用gdb跟踪一下。
首先在我的Exadata VM上测试:

创建测试表:

------------ session 1 ----------------
LUNAR@lunar>create table lunar as select * from dba_objects where rownum<=100;

Table created.

Elapsed: 00:00:00.35
LUNAR@lunar>select pid, spid from v$process p, v$session s where p.addr = s.paddr and s.sid=(select sid from v$mystat where rownum=1);

       PID SPID
---------- ------------------------------------------------
        24 12868

Elapsed: 00:00:00.05
LUNAR@lunar>

使用gdb跟踪这个会话:
———— 回到 session 2 —————-
(gdb) c
Continuing.
继续跟踪

———— 回到 session 1 —————-
执行: LUNAR@bbff1>select count(1) from lunar;

hang住,等待gdb继续跟踪……

———— 回到 session 2 继续跟踪 —————-
(gdb) c

Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()
(gdb) c
Continuing.
c

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()
(gdb) c
Continuing.

Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()
(gdb) c
Continuing.

Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()
(gdb) c
Continuing.

———— 回到 session 1 —————-
这时,我们看到session已经返回了结果:

LUNAR@bbff1>select count(1) from lunar;

  COUNT(1)
----------
       100

Elapsed: 00:01:58.50
LUNAR@bbff1>

我们发现,对于这个SQL的执行,基本上是反复重复下面的动作,这里例子中是重复做了8次:(后面注释为个人猜测)

Breakpoint 2, 0x00000000021c9bee in kcfis_capability_tab_get ()      ---猜测是读表的大小或者各种存储属性等等(storage参数的相关内容,是否压缩,是否加密等等)
Breakpoint 6, 0x00000000021c9cd6 in kcfis_tablespace_is_on_sage ()   ---判断表空间是否在cell上,也就是SAGE(Storage Appliance for Grid Environments)是Exadata最早的雏形,也是Oracle内部开发时该项目的名称
Breakpoint 170, 0x000000000681b1b6 in kcfis_tbs_storage_attr ()      ---判断表空间的属性 1  
Breakpoint 14, 0x00000000021cb0ae in kcfis_get_ts_attrs ()           ---判断表空间的属性 2  猜测这两判断的来源应该不同

在使用event 10046跟踪一下:

LUNAR@bbff1>select pid, spid from v$process p, v$session s where p.addr = s.paddr and s.sid=(select sid from v$mystat where rownum=1);

       PID SPID
---------- ------------------------
        32 20361

Elapsed: 00:00:06.14
LUNAR@bbff1>

从跟踪文件上看,执行计划是“TABLE ACCESS STORAGE FULL LUNAR”,确实是Smart Scan:

PARSING IN CURSOR #47327115779400 len=26 dep=0 uid=37 oct=3 lid=37 tim=1394896509806091 hv=3764746106 ad='7e7afa50' sqlid='dprpmj3h6axvu'
select count(1) from lunar
END OF STMT
PARSE #47327115779400:c=32002,e=4391771,p=0,cr=5,cu=0,mis=1,r=0,dep=0,og=1,plh=291688323,tim=1394896509806089
EXEC #47327115779400:c=0,e=628,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=291688323,tim=1394896509806815
WAIT #47327115779400: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1394896509806894
FETCH #47327115779400:c=0,e=74,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=291688323,tim=1394896509806997
STAT #47327115779400 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=4 pr=0 pw=0 time=91 us)'
STAT #47327115779400 id=2 cnt=100 pid=1 pos=1 obj=17867 op='TABLE ACCESS STORAGE FULL LUNAR (cr=4 pr=0 pw=0 time=138 us cost=3 size=0 card=100)'
WAIT #47327115779400: nam='SQL*Net message from client' ela= 323 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1394896509807410
FETCH #47327115779400:c=0,e=4,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=291688323,tim=1394896509807575
WAIT #47327115779400: nam='SQL*Net message to client' ela= 6 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1394896509807623

格式化一下:

SQL ID: dprpmj3h6axvu Plan Hash: 291688323

select count(1) 
from
 lunar


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       4.39          0          5          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       4.39          0          9          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 37  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=0 pw=0 time=91 us)
       100        100        100   TABLE ACCESS STORAGE FULL LUNAR (cr=4 pr=0 pw=0 time=138 us cost=3 size=0 card=100)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message from client                     2      126.89        126.89
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00

顺便解释一下,之所以之前重复了8次执行kcfis那些函数,还包括了动态采样,O(∩_∩)O哈哈~:

********************************************************************************

SQL ID: 08vx76v2xp6jp Plan Hash: 291688323

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE 
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') 
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), 
  NVL(SUM(C2),:"SYS_B_1") 
FROM
 (SELECT /*+ NO_PARALLEL("LUNAR") FULL("LUNAR") NO_PARALLEL_INDEX("LUNAR") */ 
  :"SYS_B_2" AS C1, :"SYS_B_3" AS C2 FROM "LUNAR"."LUNAR" "LUNAR") SAMPLESUB


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.30          0          0          0           0
Execute      1      0.01       0.95          0          0          0           0
Fetch        1      0.00       0.21          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.01       1.46          0          4          0           1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 37     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=0 pw=0 time=211740 us)
       100        100        100   TABLE ACCESS STORAGE FULL LUNAR (cr=4 pr=0 pw=0 time=9494 us cost=3 size=0 card=409)

********************************************************************************

未完,待续……

此条目发表在 内部机制 分类目录。将固定链接加入收藏夹。

发表评论

电子邮件地址不会被公开。 必填项已用 * 标注

您可以使用这些 HTML 标签和属性: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>