昨晚业务系统导入资料并重建索引时一个会话突然停滞不前,用TOAD一看,一直在等待Library Cache Lock。TOAD、OEM中都看不到此锁,会话每三秒启动一次,但每次都是等待这个锁。显然,这和数据字典有关,应该是一个索引的数据字典中的记录被锁住了,导致无法重建。可是杀光了其他ACTIVE的会话,问题仍然没有得到解决,看来是某一个被杀死的会话持有该锁,而会话尚未回滚完全,进程仍然吊死着。现在的问题就是找这个会话了。
首先想到的文档就是Oracle9i Database Reference了,找到附录A,说明如下:
This event controls the concurrency between clients of the library cache. It acquires a lock on the object handle so that either:
One client can prevent other clients from accessing the same object
The client can maintain a dependency for a long time (for example, no other client can change the object)
This lock is also obtained to locate an object in the library cache.
Wait Time: 3 seconds (1 second for PMON)
Parameters:
handle address
Address of the object being loaded
lock address
Address of the load lock being used. This is not the same thing as a latch or an enqueue, it is a State Object.
mode
Indicates the data pieces of the object which need to be loaded
namespace
几乎等于什么都没说,不过lock address应该会有点用处。
转而上网搜索解决方案,终于找到一篇metalink上的文档:
Doc ID:
Note:122793.1
Subject: HOW TO FIND THE SESSION HOLDING A LIBRARY CACHE LOCK
Type: BULLETIN
Status: PUBLISHED
Content Type: TEXT/PLAIN
Creation Date: 23-OCT-2000
Last Revision Date: 17-JUL-2002
PURPOSE
-------
In some situations it may happen your session is 'hanging' and is awaiting for
a 'Library cache lock'. This document describes how to find the session that
in fact has the lock you are waiting for.
SCOPE & APPLICATION
-------------------
Support analysts, dba's, ..
HOW TO FIND THE SESSION HOLDING A A LIBRARY CACHE LOCK
------------------------------------------------------
Common situations:
* a DML operation that is hanging because the table which is accessed is currently
undergoing changes (ALTER TABLE). This may take quite a long time depending on
the size of the table and the type of the modification
(e.g. ALTER TABLE x MODIFY (col1 CHAR(200) on thousands of records).
* The compilation of package will hang on Library Cache Lock and Library Cache Pin
if some users are executing any Procedure/Function defined in the same package.
In the first situation the V$LOCK view will show that the session doing the
'ALTER TABLE' has an exclusive DML enqueue lock on the table object (LMODE=6,
TYPE=TM and ID1 is the OBJECT_ID of the table). The waiting session however does
not show up in V$LOCK yet so in an environment with a lot of concurrent sessions
the V$LOCK information is insufficient to track down the culprit blocking your
operation.
METHOD 1: SYSTEMSTATE ANALYSIS
------------------------------
One way of finding the session blocking you is to analyze the system state dump.
Using the systemstate event one can create a tracefile containing detailed
information on every Oracle process. This information includes all the resources
held & requested by a specific process.
Whilst an operation is hanging, open a new session and launch the following
statement:
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 8';
Oracle will now create a systemstate tracefile in your USER_DUMP_DEST directory.
Get the PID (ProcessID) of the 'hanging' session from the V$PROCESS by matching
PADDR from V$SESSION with ADDR from V$PROCESS:
SELECT PID FROM V$PROCESS WHERE ADDR=
(SELECT PADDR FROM V$SESSION WHERE SID=sid_of_hanging_session);
The systemstate dump contains a separate section with information for each
process. Open the tracefile and do a search for 'PROCESS pid_from_select_stmt'.
In the process section look up the wait event by doing a search on 'waiting for'.
Example output:
PROCESS 8:
----------------------------------------
SO: 50050b08, type: 1, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=8, calls cur/top: 5007bf6c/5007bf6c, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 82 0 4
last post received-location: kslpsr
last process to post me: 5004ff08 1 2
last post sent: 0 0 13
last post sent-location: ksasnd
last process posted by me: 5004ff08 1 2
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 50058ac4
O/S info: user: daemon, term: pts/1, ospid: 15161
OSD pid info: 15161
----------------------------------------
SO: 5005f294, type: 3, owner: 50050b08, flag: INIT/-/-/0x00
(session) trans: 0, creator: 50050b08, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-0008-00000002, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 6, prv: 0, user: 41/LC
O/S info: user: daemon, term: pts/1, ospid: 15160, machine: goblin.forgotten.realms
program: [url=mailto:sqlplus@goblin.forgotten.realms]sqlplus@goblin.forgotten.realms (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
waiting for 'library cache lock' blocking sess=0x0 seq=253 wait_time=0
!>> handle address=5023ef9c, lock address=5019cad4, 10*mode+namespace=15
Using the 'handle address' you can look up the process that is keeping a lock
on your resource by doing a search on the address within the same tracefile.
Example output:
PROCESS 9:
----------------------------------------
SO: 50050e08, type: 1, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=9, calls cur/top: 5007bbac/5007bbfc, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
<cut> ....
----------------------------------------
SO: 5019d5e4, type: 34, owner: 5015f65c, flag: INIT/-/-/0x00
!>> LIBRARY OBJECT PIN: pin=5019d5e4 handle=5023ef9c mode=X lock=0
user=5005fad4 session=5005fad4 count=1 mask=0511 savepoint=118218 flags=[00]
From the output we can see that the Oracle process with PID 9 has an exclusive
lock on the object we are trying to access. Using V$PROCESS and V$SESSION we can
retrieve the sid,user,terminal,program,... for this process. The actual statement
that was launched by this session is also listed in the tracefile (statements and
other library cache objects are preceded by 'name=').
METHOD 2: EXAMINE THE X$KGLLK TABLE
-----------------------------------
The X$KGLLK table (accessible only as SYS/INTERNAL) contains all the
library object locks (both held & requested) for all sessions and
is more complete than the V$LOCK view although the column names don't
always reveal their meaning.
You can examine the locks requested (and held) by the waiting session
by looking up the session address (SADDR) in V$SESSION and doing the
following select:
select * from x$kgllk where KGLLKSES = 'saddr_from_v$session'
This will show you all the library locks held by this session where
KGLNAOBJ contains the first 80 characters of the name of the object.
The value in KGLLKHDL corresponds with the 'handle address' of the
object in METHOD 1.
You will see that at least one lock for the session has KGLLKREQ > 0
which means this is a REQUEST for a lock (thus, the session is waiting).
If we now match the KGLLKHDL with the handles of other sessions in
X$KGLLK that should give us the address of the blocking session since
KGLLKREQ=0 for this session, meaning it HAS the lock.
SELECT * FROM X$KGLLK LOCK_A
WHERE KGLLKREQ = 0
AND EXISTS (SELECT LOCK_B.KGLLKHDL FROM X$KGLLK LOCK_B
WHERE KGLLKSES = 'saddr_from_v$session' /* BLOCKED SESSION */
AND LOCK_A.KGLLKHDL = LOCK_B.KGLLKHDL
AND KGLLKREQ > 0);
If we look a bit further we can then again match KGLLKSES with SADDR
in v$session to find further information on the blocking session:
SELECT SID,USERNAME,TERMINAL,PROGRAM FROM V$SESSION
WHERE SADDR in
(SELECT KGLLKSES FROM X$KGLLK LOCK_A
WHERE KGLLKREQ = 0
AND EXISTS (SELECT LOCK_B.KGLLKHDL FROM X$KGLLK LOCK_B
WHERE KGLLKSES = 'saddr_from_v$session' /* BLOCKED SESSION */
AND LOCK_A.KGLLKHDL = LOCK_B.KGLLKHDL
AND KGLLKREQ > 0)
);
In the same way we can also find all the blocked sessions:
SELECT SID,USERNAME,TERMINAL,PROGRAM FROM V$SESSION
WHERE SADDR in
(SELECT KGLLKSES FROM X$KGLLK LOCK_A
WHERE KGLLKREQ > 0
AND EXISTS (SELECT LOCK_B.KGLLKHDL FROM X$KGLLK LOCK_B
WHERE KGLLKSES = 'saddr_from_v$session' /* BLOCKING SESSION */
AND LOCK_A.KGLLKHDL = LOCK_B.KGLLKHDL
AND KGLLKREQ = 0)
);
RELATED DOCUMENTS
-----------------
[NOTE:1020008.6] SCRIPT FULLY DECODED LOCKING SCRIPT
[NOTE:1054939.6] COMPILATION OF PACKAGE IS HANGING ON LIBRARY CACHE LOCK
.
开头说了半天废话,后面给出了两个解决方案,进行事件跟踪或者查询X$KGLLK表,还是后者容易一点。搜寻半天,终于问到sysdna登录的方法(权限低,没密码
)。第一次用X$表,用sysdba手还有点发抖(那可是正式的业务系统),敲入命令发现运行了很久才反应,找出一个SID了,仔细一看,居然是一个后台维护终端,程序是TOAD,状态INACTIVE。询问相关人员,从未ALTER该表,看来是TOAD看表结构或者数据的时候出了问题,kill掉终于正常,索引很快建完。只是INACTIVE的会话居然也持有锁,真是奇怪。工具太好了,也是双刃剑啊。