分享
 
 
 

彻底搞清楚library cache lock的成因和解决方法(一)

王朝mssql·作者佚名  2006-12-17
窄屏简体版  字體: |||超大  

彻底搞清楚library cache lock的成因和解决方法(一)

彻底搞清楚library cache lock的成因和解决方法(一) 问题描述:

接到应用人员的报告,说是在任何对表CSNOZ629926699966的操作都会hang,包括desc CSNOZ629926699966,

例如:

ora9i@cs_dc02:/ora9i > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:11:06 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.4.0 - Production

SQL> conn pubuser/pubuser

Connected.

SQL> desc CSNOZ629926699966

。。。

这个进程 hang 了

。。。

询问了一下之前有无特别的操作,业务人员说很久以前执行了脚本,但是该教本运行很久都没有结果,然后他就退出了会话,再之后,就出现了上面的情况。脚本内容如下:

$ cat CSNOZ629926699966.sh

#!/bin/sh

sqlplus pubuser/pubuser@csmisc << EOF #use your username/password

create table CSNOZ629926699966 as select * from CSNOZ62992266cs

where mid not in ( select mid from pubuser.SUBSCRIPTION_BAK_200412@newdb where servid='020999011964' and status in ('A','B','S'));

exit;

$

$

$

$

解决过程:

ora9i@cs_dc02:/ora9i > sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:19:13 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.4.0 - Production

SQL> select * from v$lock where block=1;

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL>

我们看到目前没有锁的信息

SQL> select xidusn, object_id, session_id, locked_mode from v$locked_object;

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE

---------- ---------- ---------- -----------

14 18 37 3

。。。

SQL> /

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE

---------- ---------- ---------- -----------

14 18 37 3

。。。

SQL> /

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE

---------- ---------- ---------- -----------

14 18 37 3

。。。

SQL>

查找 v$locked_object,我们发现了一个可疑的会话,SID 37:

SQL> select object_name,owner,object_type from dba_objects where object_id=18;

。。。 。。。

OBJECT_NAME OWNER OBJECT_TYPE

------------------------------ ------------------------------ ------------------

OBJ$ SYS TABLE

。。。 。。。

SQL>

奇怪怎么一直有这个锁??

初步猜测是由于SID为37的会话执行了上面的DDL语句,并在语句未完成前异常退出,

造成了所有访问那个(DDL语句中涉及到的)对象的进程都hang了。

接下来我们看看等待事件:

SQL> select event,sid,p1,p2,p3 from v$session_wait where event not like 'SQL*%' and event not like 'rdbms%';

EVENT P1 P2 SID

---------------------------------------------------------------- ---------- ---------- ----------

pmon timer 300 0 1

ges remote message 32 0 4

gcs remote message 64 0 5

gcs remote message 64 0 7

smon timer 300 0 19

library cache lock 1.3835E+19 1.3835E+19 30

wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID

---------------------------------------------------------------- ---------- ---------- ----------

pmon timer 300 0 1

ges remote message 32 0 4

gcs remote message 64 0 5

gcs remote message 64 0 7

smon timer 300 0 19

library cache lock 1.3835E+19 1.3835E+19 30

wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID

---------------------------------------------------------------- ---------- ---------- ----------

pmon timer 300 0 1

ges remote message 32 0 4

gcs remote message 64 0 5

gcs remote message 64 0 7

smon timer 300 0 19

library cache lock 1.3835E+19 1.3835E+19 30

wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID

---------------------------------------------------------------- ---------- ---------- ----------

pmon timer 300 0 1

ges remote message 32 0 4

gcs remote message 64 0 5

gcs remote message 64 0 7

smon timer 300 0 19

library cache lock 1.3835E+19 1.3835E+19 30

wakeup time manager 0 0 22

7 rows selected.

SQL>

我们注意到下面的事件:

EVENT P1 P2 SID

---------------------------------------------------------------- ---------- ---------- ----------

。。。

library cache lock 1.3835E+19 1.3835E+19 30

。。。

P1 是句柄地址(handle address),也就是'library cache lock'发生的地址。

P2 是一个状态对象,在这里,它表示在对象上加载的锁的地址(lock address)。

P1 和 P2都是科学计数发表示的10进制数。

这些信息再次证实了上面的猜测,SID 37阻塞了SID 30。

找出这两个可疑进程的sid和serial,然后对他们设置10046事件:

SQL> select sid,serial# from v$session where sid in (30,37);

SID SERIAL#

---------- ----------

30 24167

37 2707

SQL> exec dbms_system.set_ev(30,24167,10046,12,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,10046,12,'');

PL/SQL procedure successfully completed.

SQL>

跟踪期间咱们再次测试一下,看看有没有其他线索。

新开一个进程,找出其sid, serial和spid等信息:

ora9i@cs_dc01:/ora9i > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 11:36:25 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.4.0 - Production

SQL> select distinct sid from v$mystat;

SID

----------

33

SQL> select sid,serial# from v$session where sid=33;

SID SERIAL#

---------- ----------

33 6639

SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=37);

SPID PID

------------ ----------

20552 26

SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=30);

SPID PID

------------ ----------

22580 28

SQL> show parameter dump

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

background_core_dump string partial

background_dump_dest string /ora9i/app/oracle/admin/csmisc

/bdump

core_dump_dest string /ora9i/app/oracle/admin/csmisc

/cdump

max_dump_file_size string UNLIMITED

shadow_core_dump string partial

user_dump_dest string /ora9i/app/oracle/admin/csmisc

/udump

SQL>

然后,再尝试对 CSNOZ629926699966 表进行操作

SQL> desc CSNOZ629926699966

。。。

还是hang住了。

于是中断这个操作(CTRL + C):

SQL> desc CSNOZ629926699966

ERROR:

ORA-01013: user requested cancel of current operation

SQL> select tname from tab where tname='CSNOZ629926699966';

no rows selected

SQL>

查看PUBUSER用户下的这个表,居然不存在!!

进一步证实了前面的猜测,也就是说会话37阻塞了其他所有操作表CSNOZ629926699966的会话,造成了进程的hang,当然,包括上面的SID 30和SID 33的DDL语句

现在,我们结束10046的事件跟踪:

SQL> exec dbms_system.set_ev(30,24167,0,0,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,0,0,'');

PL/SQL procedure successfully completed.

SQL>

根据上面记录的信息,我们知道这两个会话产生的跟踪信息分别为:

SID为30的会话,产生的跟踪文件为:/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc

SID为37的会话,产生的跟踪文件为:/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_20552.trc

看看trace文件:

ora9i@cs_dc02:/ora9i > cd /ora9i/app/oracle/admin/csmisc/udump

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > ll -tlc

total 4432

-rw-r----- 1 ora9i dba 332995 Jan 10 12:00 csmisc2_ora_22580.trc

-rw-r----- 1 ora9i dba 3168 Jan 10 11:59 csmisc2_ora_20552.trc

-rw-r----- 1 ora9i dba 407133 Jan 7 15:10 csmisc2_ora_2708.trc

-rw-r----- 1 ora9i dba 640 Jan 7 14:48 csmisc2_ora_835.trc

-rw-r----- 1 ora9i dba 1590 Dec 30 22:50 csmisc2_ora_16244.trc

-rw-r----- 1 ora9i dba 1308403 Dec 30 22:44 csmisc2_ora_16033.trc

-rw-r----- 1 ora9i dba 616 Dec 28 14:16 csmisc2_ora_2176.trc

-rw-r----- 1 ora9i dba 644 Dec 8 18:22 csmisc2_ora_21083.trc

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > mailx -s "csmisc2_ora_22580.trc" zhangdp@aspire-tech.com < csmisc2_ora_22580.trc

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > mailx -s "csmisc2_ora_20552.trc" zhangdp@aspire-tech.com < csmisc2_ora_20552.trc

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > exit

SQL>

我们看到SID为30的会话,产生的跟踪文件(csmisc2_ora_22580.trc)为的主要内容是:

/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.4.0 - Production

ORACLE_HOME = /ora9i/app/oracle/product/920

System name: HP-UX

Node name: cs_dc02

Release: B.11.11

Version: U

Machine: 9000/800

Instance name: csmisc2

Redo thread mounted by this instance: 2

Oracle process number: 28

Unix process pid: 22580, image: oracle@cs_dc02 (TNS V1-V3)

*** 2005-01-10 11:31:49.416

*** SESSION ID:(30.24167) 2005-01-10 11:31:49.354

WAIT #0: nam='library cache lock' ela= 507258 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 505686 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507678 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507595 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507880 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507317 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507703 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507683 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 508265 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507100 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507684 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 505889 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507731 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507650 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507604 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

WAIT #0: nam='library cache lock' ela= 507698 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

。。。 。。。

我们看到SID 30的跟踪文件中的等待事件就是在V$SESSION_WAIT中看到的'library cache lock' .

再看看SID为37的会话,产生的跟踪文件(csmisc2_ora_20552.trc)为的主要内容是:

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.4.0 - Production

ORACLE_HOME = /ora9i/app/oracle/product/920

System name: HP-UX

Node name: cs_dc02

Release: B.11.11

Version: U

Machine: 9000/800

Instance name: csmisc2

Redo thread mounted by this instance: 2

Oracle process number: 26

Unix process pid: 20552, image: oracle@cs_dc02 (TNS V1-V3)

*** 2005-01-10 11:33:22.702

*** SESSION ID:(37.2707) 2005-01-10 11:33:22.690

WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

*** 2005-01-10 11:35:07.452

WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0

WAIT #1: nam='SQL*Net message to dblink' ela= 3 p1=675562835 p2=1 p3=0

*** 2005-01-10 11:36:55.980

WAIT #1: nam='SQL*Net message from dblink' ela= 105969709 p1=675562835 p2=1 p3=0

WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

*** 2005-01-10 11:39:05.416

WAIT #1: nam='SQL*Net message from dblink' ela= 126390826 p1=675562835 p2=1 p3=0

WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

*** 2005-01-10 11:41:12.878

WAIT #1: nam='SQL*Net message from dblink' ela= 124461520 p1=675562835 p2=1 p3=0

WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

*** 2005-01-10 11:43:01.285

WAIT #1: nam='SQL*Net message from dblink' ela= 105859385 p1=675562835 p2=1 p3=0

WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

*** 2005-01-10 11:44:48.200

WAIT #1: nam='SQL*Net message from dblink' ela= 104397696 p1=675562835 p2=1 p3=0

WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

。。。 。。。

现在我们来dump 系统状态(systemstate),看看更详细的信息。

首先简单的介绍一下 event systemstate。

很多人把 systemstate 事件理解为dump发生的那一刻的系统内所有进程的信息,这是个错误的概念,事实上,

转储 system state 产生的跟踪文件是从dump那一刻开始到dump任务完成之间一段事件内的系统内所有进程的信息。

dump systemstate产生的跟踪文件包含了系统中所有进程的进程状态等信息。每个进程对应跟踪文件中的一段内容,反映该进程的状态信息,包括进程信息,会话信息,enqueues信息(主要是lock的信息),缓冲区的信息和该进程在SGA区中持有的(held)对象的状态等信息。

那么通常在什么情况下使用systemstate比较合适呢?

Oracle推荐的使用systemstate事件的几种情况是:

数据库 hang 住了 数据库很慢 进程正在hang 数据库出现某些错误 资源争用dump systemstate的语法为:

ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';

也可以使用ORADEBUG实现这个功能

ORADEBUG DUMP SYSTEMSTATE level 10

如果希望在数据库发生某种错误时除非systemstate事件,可以在参数文件(spfile或者pfile)中设置event参数,

例如,当系统发生死锁(出现ORA-00060错误)时dump systemstate:

event = "60 trace name systemstate level 10"

言归正传,我们dump系统状态:

SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 8';

Session altered.

SQL> host

ora9i@cs_dc02:/ora9i >cd /ora9i/app/oracle/admin/csmisc/udump

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > ll -ctl

-rw-r----- 1 ora9i dba 1070863 Jan 10 13:02 csmisc2_ora_22580.trc

-rw-r----- 1 ora9i dba 1345368 Jan 10 13:01 csmisc2_ora_22568.trc

-rwxrwxrwx 1 ora9i dba 44114 Jan 10 12:52 ass1015.awk

-rw-r----- 1 ora9i dba 407133 Jan 7 15:10 csmisc2_ora_2708.trc

-rw-r----- 1 ora9i dba 640 Jan 7 14:48 csmisc2_ora_835.trc

-rw-r----- 1 ora9i dba 1590 Dec 30 22:50 csmisc2_ora_16244.trc

-rw-r----- 1 ora9i dba 1308403 Dec 30 22:44 csmisc2_ora_16033.trc

-rw-r----- 1 ora9i dba 616 Dec 28 14:16 csmisc2_ora_2176.trc

-rw-r----- 1 ora9i dba 644 Dec 8 18:22 csmisc2_ora_21083.trc

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump >

ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > mailx -s "22568" zhangdp@aspire-tech.com < csmisc2_ora_22568.trc

这个跟踪文件很大(因为它包含了所有进程的信息),那么我们从哪里开始看起呢?

首先,通过在跟踪文件中查找字符串"waiting for 'library cache lock'",我们找到了被阻塞进程的信息:

PROCESS 28: ----------------被阻塞的Oracle进程,这里PROCESS 28对应了V$PROCESS中的PID的值,

也就是说我们可以根据这一信息在V$PROCESS和V$SESSION找到被阻塞的会话的信息

----------------------------------------

SO: c000000109c83bf0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00

(process) Oracle pid=28, calls cur/top: c00000010b277890/c00000010b277890, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 17 24 6

last post received-location: ksusig

last process to post me: c000000109c840f8 25 0

last post sent: 0 0 15

last post sent-location: ksasnd

last process posted by me: c000000109c7ff90 1 6

(latch info) wait_event=0 bits=0

Process Group: DEFAULT, pseudo proc: c000000109eefda0

O/S info: user: ora9i, term: pts/th, ospid: 22580 ----------------该进程的操作系统进程号,对应于V$PROCESS中的SPID

OSD pid info: Unix process pid: 22580, image: oracle@cs_dc02 (TNS V1-V3)

----------------------------------------

SO: c000000109f02c68, type: 4, owner: c000000109c83bf0, flag: INIT/-/-/0x00

(session) trans: 0000000000000000, creator: c000000109c83bf0, flag: (100041) USR/- BSY/-/-/-/-/-

DID: 0002-001C-00000192, short-term DID: 0000-0000-00000000

txn branch: 0000000000000000

oct: 0, prv: 0, sql: c00000011f8ea068, psql: c00000011f8ea068, user: 50/PUBUSER

O/S info: user: ora9i, term: , ospid: 22536, machine: cs_dc02

program: sqlplus@cs_dc02 (TNS V1-V3)

application name: SQL*Plus, hash value=3669949024

waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0

handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515

。。。 。。。

SO: c00000010b277890, type: 3, owner: c000000109c83bf0, flag: INIT/-/-/0x00

(call) sess: cur c000000109f02c68, rec 0, usr c000000109f02c68; depth: 0

----------------------------------------

SO: c00000011a449e20, type: 51, owner: c00000010b277890, flag: INIT/-/-/0x00

LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S

call pin=0000000000000000 session pin=0000000000000000

htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350

user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463

the rest of the object was already dumped

。。。 。。。

请注意下面的信息:

waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0

handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515

这段信息告诉我们ORACLE PID为 28的进程(PROCESS 28),正在等待'library cache lock' ,通过‘handle address=c000000122e2a6d8’我们可以找到阻塞它的会话的ORACLE PID信息。

还要注意这段信息:

LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S

call pin=0000000000000000 session pin=0000000000000000

htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350

user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463

这里就是阻塞PROCESS 28进程的会话的信息。

简单的记住这个依据的要点是:

waiting session的'handle address'的值对应于blocking session的'handle'的值。

回过头来,看看这个值,它应于上面我们在V$SESSION_WAIT中看到的P1和P2的值:

SQL> select to_number('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') from dual;

TO_NUMBER('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')

----------------------------------------------------------------

1.3835E+19

SQL>

问题的成因已经基本上明确了,这里推荐两种解决问题的方法:

方法1,根据 c000000122e2a6d8 地址,我们可以得到当前在library cache中相应的锁信息:

SQL> l

1 select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ,KGLLKPNS,KGLLKHDL

2* from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ

SQL> /

INST_ID USER_NAME KGLNAOBJ KGLLKSNM KGLLKUSE KGLLKSES KGLLKMOD KGLLKREQ KGLLKPNS KGLLKHDL

---------- ------------- ---------------------- ---------- ---------------- ---------------- ---------- ---------- ---------------- ----------------

2 PUBUSER CSNOZ629926699966 30 C000000109F02C68 C000000109F02C68 0 2 00 C000000122E2A6D8

2 PUBUSER CSNOZ629926699966 37 C000000108C99E28 C000000108C99E28 3 0 00 C000000122E2A6D8

SQL>

按照Oracle推荐的做法,我们现在应该使用'alter system kill session'命令kill掉SID 37,

结果得到了ORA-00031错误:

SQL> alter system kill session '37,2707';

alter system kill session '37,2707'

*

ERROR at line 1:

ORA-00031: session marked for kill

SQL>

检查SID 37的状态:

SQL> set linesize 150

SQL> col program for a50

SQL> select sid,serial#,status,username,program from v$session where sid=37;

SID SERIAL# STATUS USERNAME PROGRAM

---------- ---------- -------- ------------------------------ --------------------------------------------------

37 2707 KILLED PUBUSER sqlplus@cs_dc02 (TNS V1-V3)

SQL>

再次证实了我们最初的想法—— 有人在执行了某个需要运行很久的DDL(多数是语句效率低,当然不排除遭遇bug的可能),

然后没等语句结束就异常退出了会话。

这个例子中我们在上面的跟踪文件已经找到了该会话对应的操作系统进程(SPID),如果在其他情况下,我们如何找到这种状态为'KILLED'

的操作系统进程号(SPID)呢?

下面给出了一个方法,可以借鉴:

SQL> l

1 SELECT s.username,s.status,

2 x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,

3 decode(bitand (x.ksuprflg,2),0,null,1)

4 FROM x$ksupr x,v$session s

5 WHERE s.paddr(+)=x.addr

6 and bitand(ksspaflg,1)!=0

7* and s.sid=37

SQL> /

USERNAME STATUS ADDR KSLLAPSC KSLLAPSN KSLLASPO KSLLID1R KS D

------------------------------ -------- ---------------- ---------- ---------- ------------ ---------- -- -

PUBUSER KILLED C000000109C831E0 41 15 16243 17

SQL>

x$ksupr.ADDR列的值对应了V$PROCESS 中的ADDR的值,知道了这个SPID的地址,找到这个操作系统进程(SPID)就简单了,例如:

SQL> select spid,pid from v$process where addr='C000000109C831E0';

SPID PID

------------ ----------

20552 26

SQL>

现在,我们只需要在操作系统上 kill 操作系统进程20552就可以了:

ora9i@cs_dc02:/ora9i > ps -ef | grep 20552

ora9i 20552 1 0 Jan 8 ? 0:01 oraclecsmisc2 (LOCAL=NO)

ora9i 14742 14740 0 17:19:02 pts/ti 0:00 grep 20552

ora9i@cs_dc02:/ora9i > kill -9 20552

ora9i@cs_dc02:/ora9i > ps -ef | grep 20552

ora9i 14966 14964 0 17:40:01 pts/ti 0:00 grep 20552

ora9i@cs_dc02:/ora9i >

再来检查一下SID 37的信息,我们看到这个会话是真的被kill掉了,

ora9i@cs_dc02:/ora9i > exit

SQL> select sid,serial#,status,username,program from v$session where sid=37;

no rows selected

SQL> l

1 SELECT s.username,s.status,

2 x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,

3 decode(bitand (x.ksuprflg,2),0,null,1)

4 FROM x$ksupr x,v$session s

5 WHERE s.paddr(+)=x.addr

6 and bitand(ksspaflg,1)!=0

7* and s.sid=37

SQL> /

no rows selected

SQL>

回到刚才hang住的会话,它已经恢复了正常操作,

并且我们已经得到了'ORA-04043: object CSNOZ629926699966 does not exist'这个正常的信息:

SQL> desc CSNOZ629926699966

ERROR:

ORA-04043: object CSNOZ629926699966 does not exist

SQL>

在开一个会话,测试一把:

ora9i@cs_dc02:/ora9i > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 17:42:16 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:

Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.4.0 - Production

SQL> set timing on

SQL> desc CSNOZ629926699966

ERROR:

ORA-04043: object CSNOZ629926699966 does not exist

SQL>

当发出命令'desc CSNOZ629926699966'的时候,我们看到系统立刻返回了ORA-04043: object CSNOZ629926699966 does not exist'信息,问题就此解决了。

这里,简单的介绍一下X$KGLLK,这个基表保存了库缓存中对象的锁的信息,它对于解决这类问题特别有用,其名称的含义如下:

[K]ernel Layer

[G]eneric Layer

[L]ibrary Cache Manager ( defined and mapped from kqlf )

Object Locks

X$KGLLK - Object [L]oc[K]s

KGLNAOBJ 列包含了在librarky cache中的对象上执行命令的语句的前80个字符(其实从这里我们也可以大大缩小范围了)

X$KGLLK.KGLLKUSE 和 x$kgllk.KGLLKSES 对应于跟踪文件中的owner的值

X$KGLLK.KGLLKADR

X$KGLLK.KGLLKHDL 对应于跟踪文件中的handle的值(handle=C000000122E2A6D8),也就是'library cache lock'的地址

X$KGLLK.KGLLKPNS 对应于跟踪文件中的session pin的值

X$KGLLK.KGLLKSPN对应于跟踪文件中的savepoint的值

我们再来看一下更全面的信息:

SQL> set linesize 2000

SQL> select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ

2 /

ADDR INDX INST_ID KGLLKADR KGLLKUSE KGLLKSES KGLLKSNM KGLLKHDL KGLLKPNC KGLLKPNS KGLLKCNT KGLLKMOD KGLLKREQ KGLLKFLG KGLLKSPN KGLLKHTB KGLNAHSH KGLHDPAR KGLHDNSP USER_NAME KGLNAOBJ

---------------- ---------- ---------- ---------------- ---------------- ---------------- ---------- ---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------------- ---------- ------------------------------ ------------------------------------------------------------

800003FB0007E4D0 33 2 C00000011A449E20 C000000109F02C68 C000000109F02C68 30 C000000122E2A6D8 00 00 0 0 2 0 463 C00000011A4BC350 3990848181 C000000122E2A6D8 1 PUBUSER CSNOZ629926699966

800003FB0007E5B0 34 2 C00000011A44A150 C000000108C99E28 C000000108C99E28 37 C000000122E2A6D8 00 00 1 3 0 0 179 C00000011A4BB328 3990848181 C000000122E2A6D8 1 PUBUSER CSNOZ629926699966

SQL> set linesize 100

SQL> l

1* select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ

SQL> /

ADDR INDX INST_ID KGLLKADR KGLLKUSE KGLLKSES KGLLKSNM

---------------- ---------- ---------- ---------------- ---------------- ---------------- ----------

KGLLKHDL KGLLKPNC KGLLKPNS KGLLKCNT KGLLKMOD KGLLKREQ KGLLKFLG

---------------- ---------------- ---------------- ---------- ---------- ---------- ----------

KGLLKSPN KGLLKHTB KGLNAHSH KGLHDPAR KGLHDNSP USER_NAME

---------- ---------------- ---------- ---------------- ---------- ------------------------------

KGLNAOBJ

------------------------------------------------------------

800003FB0007E4D0 33 2 C00000011A449E20 C000000109F02C68 C000000109F02C68 30

C000000122E2A6D8 00 00 0 0 2 0

463 C00000011A4BC350 3990848181 C000000122E2A6D8 1 PUBUSER

CSNOZ629926699966

800003FB0007E5B0 34 2 C00000011A44A150 C000000108C99E28 C000000108C99E28 37

C000000122E2A6D8 00 00 1 3 0 0

179 C00000011A4BB328 3990848181 C000000122E2A6D8 1 PUBUSER

CSNOZ629926699966

SQL>

————待续————

 
 
 
免责声明:本文为网络用户发布,其观点仅代表作者个人观点,与本站无关,本站仅提供信息存储服务。文中陈述内容未经本站证实,其真实性、完整性、及时性本站不作任何保证或承诺,请读者仅作参考,并请自行核实相关内容。
2023年上半年GDP全球前十五强
 百态   2023-10-24
美众议院议长启动对拜登的弹劾调查
 百态   2023-09-13
上海、济南、武汉等多地出现不明坠落物
 探索   2023-09-06
印度或要将国名改为“巴拉特”
 百态   2023-09-06
男子为女友送行,买票不登机被捕
 百态   2023-08-20
手机地震预警功能怎么开?
 干货   2023-08-06
女子4年卖2套房花700多万做美容:不但没变美脸,面部还出现变形
 百态   2023-08-04
住户一楼被水淹 还冲来8头猪
 百态   2023-07-31
女子体内爬出大量瓜子状活虫
 百态   2023-07-25
地球连续35年收到神秘规律性信号,网友:不要回答!
 探索   2023-07-21
全球镓价格本周大涨27%
 探索   2023-07-09
钱都流向了那些不缺钱的人,苦都留给了能吃苦的人
 探索   2023-07-02
倩女手游刀客魅者强控制(强混乱强眩晕强睡眠)和对应控制抗性的关系
 百态   2020-08-20
美国5月9日最新疫情:美国确诊人数突破131万
 百态   2020-05-09
荷兰政府宣布将集体辞职
 干货   2020-04-30
倩女幽魂手游师徒任务情义春秋猜成语答案逍遥观:鹏程万里
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案神机营:射石饮羽
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案昆仑山:拔刀相助
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案天工阁:鬼斧神工
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案丝路古道:单枪匹马
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案镇郊荒野:与虎谋皮
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案镇郊荒野:李代桃僵
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案镇郊荒野:指鹿为马
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案金陵:小鸟依人
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案金陵:千金买邻
 干货   2019-11-12
 
推荐阅读
 
 
 
>>返回首頁<<
 
靜靜地坐在廢墟上,四周的荒凉一望無際,忽然覺得,淒涼也很美
© 2005- 王朝網路 版權所有