昨天一备份数据库,在open read only的过程中出现故障,从而导致在standby上备份失败。alert log 中信息如下:
Errors in file /opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 13914
ORA-1092 signalled during: alter database open read only...
Tue Aug 10 09:31:51 2004
再查看trace file信息
[root@crmsb bdump]# more?? /opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc
/opt/oracle/admin/crmhz/udump/crmhz_ora_13914.trc
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0
System name:??? Linux
Node name:????? crmsb
Release:??????? 2.4.9-e.3smp
Version:??????? #1 SMP Fri May 3 16:48:54 EDT 2002
Machine:??????? i686
Instance name: crmhz
Redo thread mounted by this instance: 1
Oracle process number: 11
Unix process pid: 13914, image: oracle@crmsb (TNS V1-V3)
*** SESSION ID:(10.3) 2004-08-10 09:30:37.408
*** 2004-08-10 09:30:37.408
Managed Recovery: Cancel posted.
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
都没有提供什么有价值的信息,很奇怪会出现604错误
于是决定trace? open过程
startup nomount
alter database mount standby? database;
select sid from v$mystat where rownum = 1;
select sid,serial# from v$session where sid = ?;
exec dbms_system.set_ev(sid,serial#,10046,12,'');
alter database open read only;
这是数据库crash ,找到trace file,发现末尾部分为
select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0),2))from objauth$ where obj#=:1 group by grantee#,privilege#,nvl(col#,0) order by grantee#
END OF STMT
PARSE #13:c=0,e=485,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=0,tim=1066522815818430
BINDS #13:
?bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
?? bfp=40647620 bln=22 avl=02 flg=05
?? value=72
EXEC #13:c=0,e=518,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=1066522815819173
FETCH #13:c=0,e=58,p=0,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=1066522815819296
=====================
PARSING IN CURSOR #12 len=34 dep=1 uid=0 oct=7 lid=0 tim=1066522815819480 hv=3008185193 ad='56fcbaac'
delete from idl_ub1$ where obj#=:1
END OF STMT
PARSE #12:c=10000,e=14804,p=6,cr=79,cu=0,mis=1,r=0,dep=1,og=0,tim=1066522815819472
BINDS #12:
?bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
?? bfp=40645a48 bln=22 avl=06 flg=05
?? value=4294951150
WAIT #12: nam='db file sequential read' ela= 41 p1=1 p2=22839 p3=1
WAIT #12: nam='db file sequential read' ela= 27 p1=1 p2=9 p3=1
EXEC #12:c=0,e=884,p=2,cr=4,cu=2,mis=0,r=0,dep=1,og=4,tim=1066522815820884
ERROR #12:err=16000 tim=1837753940
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
EXEC #1:c=370000,e=375493,p=304,cr=5505,cu=4,mis=0,r=0,dep=0,og=4,tim=1066522815826303
ERROR #1:err=1092 tim=1837753940
很奇怪,正常standby数据库open read only的时候是没有delete的,在这里居然出现dml。于是一开始我尝试去研究idl_ub1$是个什么表,结果发现是存储 pl/sql代码的,很奇怪,这个数据库为什么会出现个状况,继续在 $ORACLE_HOME/rdbms/admin 下 grep -ir idl_ub1$ * ,结果找到和一个x$ 表挂上关系做连接了,未果,又通过 delete from idl_ub1$ where obj#=:1 中绑定变量value=4294951150 去结合搜索出来的view查找,也没什么进展。实际上,根据 WAIT #12: nam='db file sequential read' ela= 27 p1=1 p2=9 p3=1 可以知道这个delete是访问了file# =1 block#=9 的数据块,很显然,这是系统回滚段的块头,根据这里我们可以知道,这是一个回滚操作,也就是说,在主数据库中由于DDL 的失败,使得系统回滚段中记录了该操作的回退信息,但是数据库又还没有回滚完成,到了standby中就出现这个状况了。
为了尝试解决这个问题,首先继续应用几个日志,再open read only,结果失败,也就是说该失败的ddl在主数据库一直没有回滚,而数据库中已经查找不到任何异常进程或者事务、锁,只能认为是系统出现异常。于是,今天在征得相关负责人的同意的情况下,我们决定在下班后将主数据库重新启动一次看看,这样强迫系统回滚失败的ddl。下班重新启动数据库,切换日志并归档,在standby上应用日志到重起数据库之后产生的归档日志,再open read only 成功,问题得到解决。