早晨到办公室听同事说表被锁了,一试,发现表中某字段为1111111的行都被锁了,SELECT都不行。报错误ORA-01591,打开TOAD的Knowledge eXPert,描述很少,只是说由于分布式事务错误而造成锁定。
询问同事,昨天通过一个存储过程调用另一个存储过程出了错误,而后者通过透明网关insert一些数据到SQl Server数据库。
立即想到打开OEM,谁知道大失所望,进入锁,根本没发现相关的对象被锁定,开始有点郁闷。转而检查会话,该用户有5个会话,都是INACTIVE,不管三七二十一,全部杀掉。结果依旧,并且锁也没有出现。远程登陆上主机,发现CPU和进程都正常,也没有发现透明网关进程挂死(之前曾发现TG4SQL在无业务量时也会出现25%左右的CPU,挂死)。
忽然想到看看alert.log,经过仔细搜索,终于发现:
Wed Nov 17 00:00:04 2004
Errors in file d:\Oracle\admin\xdcj\udump\xdcj_j006_3020.trc:
ORA-12012: 自动执行作业 82 出错
ORA-01591: 锁定已被有问题的分配事务处理6.5.887985挂起
ORA-06512: 在line 6
这正是出错的地方,往前追溯:
Tue Nov 16 17:35:04 2004
Error 28500 trapped in 2PC on transaction 6.5.887985. Cleaning up.
Error stack returned to user:
ORA-02054: 事务处理6.5.887985有问题
ORA-28500: 连接 ORACLE 到非 Oracle 系统时返回此信息:
[Transparent gateway for MSSQL]
ORA-02063: 紧接着2 lines(源于ZSMOS_CRM)
Tue Nov 16 17:35:04 2004
DISTRIB TRAN QDCJ.US.ORACLE.COM.5ae32328.6.5.887985
is local tran 6.5.887985 (hex=06.05.d8cb1)
insert pending prepared tran, scn=6606197672830 (hex=602.2010cb7e)
Tue Nov 16 17:35:07 2004
Errors in file d:\oracle\admin\xdcj\bdump\xdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]用户 'RECOVER' 登录失败。 (SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from ZSMOS_CRM
Tue Nov 16 17:35:12 2004
Errors in file d:\oracle\admin\xdcj\bdump\xdcj_reco_3024.trc:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Transparent gateway for MSSQL][Microsoft][ODBC SQL Server Driver][SQL Server]用户 'RECOVER' 登录失败。 (SQL State: 28000; SQL Code: 18456)
ORA-02063: preceding 2 lines from ZSMOS_CRM
这就是事发地点了。看来是昨天下午远程事务失败,但是又没有返回造成分布式事务挂死,从而锁定了行。终于找到了具体的错误ORA-02054,进入TOAD一查,说是要等待或者提交该事务,可是怎么操作呢。还是打开官方文档搜索相关内容,在Adminstrator Guide中发现如下内容:
Discovering Problems with a Two-Phase Commit
The user application that commits a distributed transaction is informed of a problem by one of the following error messages:
ORA-02050: transaction ID rolled back,
some remote dbs may be in-douBT
ORA-02051: transaction ID committed,
some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt
A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.
No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree (that is, all commit or all roll back) after the network or system failure is resolved.
In extended outages, however, you can force the commit or rollback of a transaction to release any locked data. Applications must account for sUCh possibilities.
Determining Whether to Perform a Manual Override
Override a specific in-doubt transaction manually only when one of the following situations exists:
The in-doubt transaction locks data that is required by other transactions. This situation occurs when the ORA-01591 error message interferes with user transactions.
An in-doubt transaction prevents the extents of a rollback segment from being used by other transactions. The first portion of an in-doubt distributed transaction's local transaction ID corresponds to the ID of the rollback segment, as listed by the data dictionary views DBA_2PC_PENDING and DBA_ROLLBACK_SEGS.
The failure preventing the two-phase commit phases to complete cannot be corrected in an acceptable time period. Examples of such cases include a telecommunication network that has been damaged or a damaged database that requires a long recovery time.
Normally, you should make a decision to locally force an in-doubt distributed transaction in consultation with administrators at other locations. A wrong decision can lead to database inconsistencies that can be difficult to trace and that you must manually correct.
If the conditions above do not apply, always allow the automatic recovery features of Oracle to complete the transaction. If any of the above criteria are met, however, consider a local override of the in-doubt transaction.
看来是建议差不多,后面Oracle总是试图登录SQl Server就是要自动恢复,可是总不成功。
察看视图DBA_2PC_PENDING确实发现了该事务的痕迹。要怎样操作呢?
Manually Committing an In-Doubt Transaction
Before attempting to commit the transaction, ensure that you have the proper privileges. Note the following requirements:
If the transaction was committed by... Then you must have this privilege...
You
FORCE TRANSACTION
Another user
FORCE ANY TRANSACTION
Committing Using Only the Transaction ID
The following SQL statement commits an in-doubt transaction:
COMMIT FORCE 'transaction_id';
The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.
For example, assume that you query DBA_2PC_PENDING and determine that LOCAL_TRAN_ID for a distributed transaction is 1:45.13.
You then issue the following SQL statement to force the commit of this in-doubt transaction:
COMMIT FORCE '1.45.13';
Committing Using an SCN
Optionally, you can specify the SCN for the transaction when forcing a transaction to commit. This feature allows you to commit an in-doubt transaction with the SCN assigned when it was committed at other nodes.
Consequently, you maintain the synchronized commit time of the distributed transaction even if there is a failure. Specify an SCN only when you can determine the SCN of the same transaction already committed at another node.
For example, assume you want to manually commit a transaction with the following global transaction ID:
SALES.ACME.COM.55d1c563.1.93.29
First, query the DBA_2PC_PENDING view of a remote database also involved with the transaction in question. Note the SCN used for the commit of the transaction at that node. Specify the SCN when committing the transaction at the local node. For example, if the SCN is 829381993, issue:
COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29', 829381993;
See Also:
Oracle9i SQL Reference for more information about using the COMMIT statement
Manually Rolling Back an In-Doubt Transaction
Before attempting to roll back the in-doubt distributed transaction, ensure that you have the proper privileges. Note the following requirements:
If the transaction was committed by... Then you must have this privilege...
You
FORCE TRANSACTION
Another user
FORCE ANY TRANSACTION
The following SQL statement rolls back an in-doubt transaction:
ROLLBACK FORCE 'transaction_id';
The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.
For example, to roll back the in-doubt transaction with the local transaction ID of 2.9.4, use the following statement:
ROLLBACK FORCE '2.9.4';
于是登陆数据库
COMMIT FORCE '6.5.887985';
然后查看DBA_2PC_PENDING发现状态已经改为'COMMIT FORCE',SELECT该表相关行,一切正常。
至此,故障解决。
总体来看,直接INSERT ... TABLENAME@SQLDBLK还是很危险的,遇上不能正常返回就出问题了。Oracle的文档是推荐使用包或者存储过程来解决,此后建议同事改用此方法,目前已经测试通过。