诊断因为inefficent SQL导致严重Cache buffers chain / Cache buffer lru chain Latch,引起的CPU紧张和用户Batch缓慢
数据库817,有人叫batch太慢了,开始检查。
CPU使用率一直很高,且有进程IO等待,但iostat没有显示有wt
vmstat 1 10
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id
3 7 0 39824280 41916960 0 0 0 0 0 0 0 0 6 6 6 4294967196 0 0 -810 -1696 -1165
21 28 0 39444944 40480648 3093 12 16512 0 0 0 0 0 1 5 3 65720 205520 53041 70 29 0
28 6 0 39444944 40480784 3307 4 14392 0 0 0 0 0 3 3 3 58991 161877 43637 74 26 0
25 115 0 39445008 40480696 1776 13 16032 0 0 0 0 0 0 1 2 53622 170834 40450 71 29 0
46 10 0 39446648 40482024 1054 2 12976 0 0 0 0 0 2 3 2 49978 163920 36885 72 28 0
17 10 0 39446648 40482000 1366 7 14664 0 0 0 0 0 0 0 2 44678 187017 36723 71 28 0
14 5 0 39445952 40481536 1142 696 9904 0 0 0 0 0 5 2 0 54537 109587 30931 71 29 0
46 121 0 39445376 40480744 875 26 11464 0 0 0 0 0 3 5 3 75939 197783 44311 69 31 0
40 43 0 39451008 40485880 1364 1384 14368 0 0 0 0 0 2 5 4 75546 180180 47494 70 30 0
16 15 0 39450232 40485128 1782 994 18408 0 0 0 0 0 0 3 1 61110 138910 38253 72 28 0
iostat 1 5
tty sd0 sd1 sd2 sd3 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
23 1227 0 0 0 105 6 7 105 6 7 103 6 7 45 95 21 44
0 234 0 0 0 0 0 0 8 1 1 8 1 11 71 29 0 0
0 80 0 0 0 32 4 8 32 4 10 8 1 3 68 32 0 0
0 80 0 0 0 0 0 0 24 3 12 8 1 9 70 30 0 0
0 80 0 0 0 8 1 7 16 2 1 8 1 6 69 31 0 0
使用top/prstat,没有发现使用非凡多cpu资源的进程
察看数据库是否有并行进程,v$px_session返回空行。
检查v$session_wait,发现很多latch free,怀疑latch free导致cpu 紧张,db file sequential/scattered read导致有进程被IO阻塞。
EVENT COUNT(*)
---------------------------------------------------------------- ---------
PL/SQL lock timer 21
SQL*Net message from client 362
SQL*Net message to client 2
SQL*Net more data to client 2
db file scattered read 6
db file sequential read 25
latch free 10
log file sync 3
pmon timer 1
rdbms ipc message 6
smon timer 1
察看是那些latch,发现很多cache buffers chains,cache buffers lru chain怀疑不高效率的SQL导致hot block
SQL> select v$session.sid,name latch_name from
v$session,v$latch,v$session_wait s where v$session.sid=s.sid and s.event='latch free' and s.p2=v$latch.latch#;
SID LATCH_NAME
---------- ---------------------------------------------------------------
181 cache buffers chains
287 cache buffers chains
416 cache buffers lru chain
485 cache buffers lru chain
502 cache buffers lru chain
586 cache buffers lru chain
想察看哪个segment引起hot block, 但如下sql 几分钟内无返回,故cancel。
Select distinct owner,segment_name,partition_name,segment_type from dba_extents a ,
(Select obj, dbarfil, dbablk from x$bh where Hladdr in (Select addr from
v$latch_children where name='cache buffers chains' and addr in (select p1raw from v$session_wait
where event='latch free') )) b where
b.dbarfil=a. RELATIVE_FNO and b.dbablk >= a.block_id and b.dbablk<( a.block_id+a.blocks);
跟踪进程,察看进程执行什么SQL
********************************************************************************
SELECT SEVT.SCHD_EV_ID,ATEV.ROW_SEQ
FROM
FT_T_SEVT SEVT,FT_T_ATEV ATEV WHERE SEVT.SCHD_POST_TMS<=:b1 AND SEVT.ACTG_TRN_EV_ID=ATEV.ACTG_TRN_EV_ID AND SEVT.ACCT_GRP_OID=:b2 AND SEVT.ACCT_ID BETWEEN:b3 AND:b4 ORDER BY ATEV.ROW_SEQ
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 20 0.00 0.00 0 0 0 0
Fetch 19 57.42 90.34 461156 517860 3344 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 39 57.42 90.34 461156 517860 3344 0
********************************************************************************
SQL> @$Oracle_HOME/rdbms/admin/utlXPls
Plan Table
--------------------------------------------------------------------------------
Operation Name Rows Bytes Cost Pstart Pstop
--------------------------------------------------------------------------------
SELECT STATEMENT 20 1K 1724
SORT ORDER BY 20 1K 1724
NESTED LOOPS 20 1K 1722
TABLE Access FULL FT_T_SEVT 20 1K 1662
TABLE ACCESS BY INDEX RFT_T_ATEV 415M 6G 3
INDEX UNIQUE SCAN FT_T_ATEV 415M 2
--------------------------------------------------------------------------------
9 rows selected.
明显表FT_T_SEVT 缺少索引,导致full table scan. 还trace了其他几个session,发现也是执行如上同样的SQL语句。检查v$sql
SQL>selectusers_opening,EXECUTIONS,BUFFER_GETS,DISK_READS from v$sql where address='B83AC5F0';
USERS_OPENING
USERS_OPENING 表示有60个session在执行同样的SQL语句,导致表FT_T_SEVT 非常热点,严重的cache buffer chain/cache buffer lru chain 使CPU使用率较高;Full table scan也给IO带来等待。
EXECUTIONS BUFFER_GETS DISK_READS 60 49784 -1.604E+09 1963650435