Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:19:03 2005
CKPT: terminating instance due to error 1110
Instance terminated by CKPT, pid = 19896
Database version: 8.1.7.3
The alert.log shows many I/O errors, all on the first block of a datafile (block # 1), that is, the datafile header.
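To see whether the header read also fails outside Oracle (the trace reports errno 5, EIO, and "Additional information: 8192", presumably the requested read size), block 1 can be read directly at byte offset 1 × block size. A minimal Python sketch, assuming an 8 KB block size and a Unix system with `os.pread`; `check_block` and `classify_read` are hypothetical helper names:

```python
import errno
import os
import tempfile

# Assumption: 8 KB db_block_size, matching "Additional information: 8192"
BLOCK_SIZE = 8192


def classify_read(data, block_size=BLOCK_SIZE):
    """Describe the result of one block-sized read."""
    if len(data) == block_size:
        return "ok"
    # Fewer bytes than requested, the situation ORA-27063 complains about
    return f"short read ({len(data)} of {block_size} bytes)"


def check_block(path, block_no=1, block_size=BLOCK_SIZE):
    """Read one block straight from the OS; block 1 is the datafile header."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, block_size, block_no * block_size)
    except OSError as exc:
        # errno 5 (EIO) corresponds to "SVR4 Error: 5: I/O error"
        name = errno.errorcode.get(exc.errno, "?")
        return f"I/O error: errno {exc.errno} ({name})"
    finally:
        os.close(fd)
    return classify_read(data, block_size)


if __name__ == "__main__":
    # Demo on a scratch file of three zero-filled blocks (not a real datafile)
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * (3 * BLOCK_SIZE))
        tmp.flush()
        print(check_block(tmp.name))  # ok
```

A pass here is not proof that the storage is healthy: the read goes through the file system cache, not through Oracle's own (possibly asynchronous) I/O path.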
The specific errors are as follows:
SQL> select * from v$recover_file;
     FILE# ONLINE  ERROR                 CHANGE# TIME
---------- ------- ------------------ ---------- -----------------
         6 ONLINE                     7.8537E+12 20051120 01:35:02
       615 OFFLINE                    7.8537E+12 20051120 01:35:02
       696 ONLINE                     7.8537E+12 20051120 01:35:02
       894 ONLINE  CANNOT READ HEADER          0
       983 ONLINE  CANNOT READ HEADER          0
      1031 ONLINE                     7.8537E+12 20051120 01:35:02
      1035 ONLINE                     7.8537E+12 20051120 01:35:02
      1480 ONLINE  CANNOT READ HEADER          0
8 rows selected.
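The eight rows fall into two groups: rows with an empty ERROR column, which only report the SCN (CHANGE#) that recovery must start from, and rows where Oracle could not read the header at all (CANNOT READ HEADER, CHANGE# 0). A small Python sketch that parses SQL*Plus output like the table above and separates the two; it assumes the column order FILE#, ONLINE, ERROR, CHANGE#, TIME shown here, and the function names are hypothetical:

```python
import re

# FILE#, ONLINE/OFFLINE, then whatever follows (ERROR, CHANGE#, TIME)
ROW = re.compile(r"^\s*(\d+)\s+(ONLINE|OFFLINE)\s+(.*?)\s*$")


def parse_recover_file(text):
    """Map FILE# -> ERROR text, or None when the ERROR column is empty."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator, "N rows selected." etc.
        rest = m.group(3)
        if rest[:1].isalpha():
            # ERROR is populated; drop the trailing CHANGE# (0 for unreadable headers)
            rows[int(m.group(1))] = re.sub(r"\s+\d+$", "", rest)
        else:
            # ERROR is empty, so the text starts with CHANGE# (e.g. 7.8537E+12)
            rows[int(m.group(1))] = None
    return rows


def split_by_error(rows):
    """Return (files with no ERROR, files with an ERROR), each sorted by FILE#."""
    no_error = sorted(f for f, err in rows.items() if err is None)
    with_error = sorted(f for f, err in rows.items() if err is not None)
    return no_error, with_error
```

On the table above this puts files 6, 615, 696, 1031 and 1035 in the first group and 894, 983 and 1480 in the second.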
ALERT LOG
Sun Nov 20 20:15:48 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:16:27 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:17:05 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01171: datafile 615 going offline due to error advancing checkpoint
ORA-01122: database file 615 failed verification check
ORA-01110: data file 615: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_14.dbf'
ORA-01208: data file is an old version - not Accessing current version
Sun Nov 20 20:17:06 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:17:46 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:18:25 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:19:03 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:19:03 2005
Errors in file /data/oracle/spftprd1/admin/bdump/spftprd1_ckpt_19896.trc:
ORA-01110: data file 696: '/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
ORA-01115: IO error reading block from file 696 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Sun Nov 20 20:19:03 2005
CKPT: terminating instance due to error 1110
Instance terminated by CKPT, pid = 19896
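To back up the observation that every read error hits block # 1, the alert log can be tallied by file and block. A Python sketch, assuming the ORA-01115 message text exactly as it appears in this 8.1.7 alert log; the helper names are hypothetical:

```python
import re
from collections import Counter

# Message text as it appears in the alert log above (assumed stable for 8.1.7)
ORA_01115 = re.compile(r"ORA-01115: IO error reading block from file (\d+) \(block # (\d+)\)")


def tally_block_read_errors(alert_text):
    """Count ORA-01115 occurrences per (file#, block#)."""
    return Counter((int(f), int(b)) for f, b in ORA_01115.findall(alert_text))


def only_header_blocks(counts, header_block=1):
    """True if every failed read was on the datafile header block."""
    return all(block == header_block for _file, block in counts)


if __name__ == "__main__":
    sample = (
        "ORA-01115: IO error reading block from file 696 (block # 1)\n"
        "SVR4 Error: 5: I/O error\n"
        "ORA-01115: IO error reading block from file 696 (block # 1)\n"
    )
    counts = tally_block_read_errors(sample)
    print(counts)                      # Counter({(696, 1): 2})
    print(only_header_blocks(counts))  # True
```

On the excerpt above it should report (696, 1) seven times and nothing else.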
Eight datafiles needed recovery, but dbv (DBVERIFY) found no corrupted blocks in them. Typical output:
DBVERIFY: Release 8.1.7.3.0 - Production on Mon Nov 21 01:22:00 2005
(c) Copyright 2000 Oracle Corporation. All rights reserved.
DBVERIFY - Verification starting : FILE = /data/oracle/spftprd1/index2/spftprd1_build_index_01.dbf
DBVERIFY - Verification complete
Total Pages Examined : 256000
Total Pages Processed (Data) : 2121
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 247426
Total Pages Failing (Index): 0
Total Pages Processed (Other): 53
Total Pages Empty : 6400
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
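With eight files to check, the dbv summaries can be screened automatically for any non-zero Failing or Corrupt counter. A Python sketch, assuming the 8.1.7 summary format shown above; `parse_dbv_summary` and `dbv_problems` are hypothetical names:

```python
import re

# "Total Pages <name> : <count>", with or without a space before the colon
SUMMARY = re.compile(r"^[ \t]*Total Pages ([A-Za-z ()]+?)[ \t]*:[ \t]*(\d+)[ \t]*$", re.M)


def parse_dbv_summary(text):
    """Map counter name (e.g. 'Failing (Index)') -> page count."""
    return {name: int(count) for name, count in SUMMARY.findall(text)}


def dbv_problems(summary):
    """Keep only Failing / Marked Corrupt counters that are non-zero."""
    return {k: v for k, v in summary.items() if ("Failing" in k or "Corrupt" in k) and v > 0}
```

For this file the counts also add up: 2121 + 247426 + 53 + 6400 = 256000 pages examined. Note that a clean report only means the blocks passed dbv's structural checks; dbv does not compare the file header against the control file, which is why a file can pass dbv and still need media recovery.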
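There was no suitable backup: the database runs in ARCHIVELOG mode, but it had been opened with RESETLOGS only a day earlier and had not been backed up since. The DBA on the previous shift did not dig very deep: he concluded that the datafiles had bad blocks and did not even try RECOVER DATABASE.
At first I went down the wrong path too: I was planning to take these datafiles offline (I had confirmed they all belong to index tablespaces and hold no data segments) and then rebuild roughly 1 TB of indexes.
Later, wanting to find out which other datafiles had problems, I recovered them one tablespace at a time, and in the end got the database open. Quite laughable in hindsight; I was out of practice.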
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 6 needs media recovery
ORA-01110: data file 6:
'/data/oracle/spftprd1/index2/spftprd1_build_index_01.dbf'
SQL> recover tablespace build_index;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 696 needs media recovery
ORA-01110: data file 696:
'/data/oracle/spftprd1/index2/spftprd1_CALC_MEDIUM1_INDEX_16.dbf'
SQL> recover tablespace CALC_MEDIUM1_INDEX;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 983 needs media recovery
ORA-01110: data file 983:
'/data/oracle/spftprd1/index13/spftprd1_CAVJ_LARGE1_INDEX_70.dbf'
SQL> recover tablespace CAVJ_LARGE1_INDEX;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1031 needs media recovery
ORA-01110: data file 1031:
'/data/oracle/spftprd1/index2/spftprd1_CALC_LARGE1_INDEX_98.dbf'
SQL> recover tablespace CALC_LARGE1_INDEX;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1035 needs media recovery
ORA-01110: data file 1035:
'/data/oracle/spftprd1/index6/spftprd1_BOXV2_MEDIUM2_INDEX_162.dbf'
SQL> recover tablespace BOXV2_MEDIUM2_INDEX;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1480 needs media recovery
ORA-01110: data file 1480:
'/data/oracle/spftprd1/index13/spftprd1_atds_large1_index_30.dbf'
SQL> recover tablespace atds_large1_index;
Media recovery complete.
SQL> alter database open;
Database altered.
SQL> alter system checkpoint;
System altered.
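The open / ORA-01113 / recover cycle in the session above discovers one tablespace per round trip. Given the full v$recover_file list and a FILE# to tablespace mapping (in the mounted state that mapping could come from v$datafile joined to v$tablespace on TS#), the whole list can be produced in one pass. A Python sketch; `recover_plan` and `KNOWN` are hypothetical, and `KNOWN` holds only the six file/tablespace pairs visible in the session (files 615 and 894 are left out because their tablespace is not shown):

```python
def recover_plan(recover_files, file_to_tablespace):
    """One RECOVER TABLESPACE statement per distinct tablespace, in first-seen order."""
    plan = []
    for file_no in recover_files:
        stmt = f"recover tablespace {file_to_tablespace[file_no]};"
        if stmt not in plan:  # several files can share one tablespace
            plan.append(stmt)
    return plan


# FILE# -> tablespace, as paired in the SQL*Plus session above
KNOWN = {
    6: "build_index",
    696: "CALC_MEDIUM1_INDEX",
    983: "CAVJ_LARGE1_INDEX",
    1031: "CALC_LARGE1_INDEX",
    1035: "BOXV2_MEDIUM2_INDEX",
    1480: "atds_large1_index",
}

if __name__ == "__main__":
    for stmt in recover_plan([6, 696, 983, 1031, 1035, 1480], KNOWN):
        print(stmt)
```

A single RECOVER DATABASE would cover all of these in one command anyway; the sketch only shows that the list was knowable up front.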
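Had I known, I would simply have run RECOVER DATABASE.
My guess is that the disk subsystem's asynchronous I/O hit some internal problem; otherwise, why would the failures land on the datafile headers?
Meanwhile, Oracle Support pointed us to a bug. In the end our release is just too old, and CKPT in it is not robust enough.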
This problem is due to the following bug:
==================================================================
BugTag: Support notes on Bug 2271499 - DDR info BugDesc 2271499
Affects: RDBMS (8-A0)
NB: FIXED
Abstract: CKPT may crash the instance if datafile cannot be accessed
Fixed-Releases: 8175 9014 9202 A000
Tags: CRASH
Details:
If a datafile from a non-system tablespace is inaccessible
the CKPT process may bring down the instance rather than taking
the datafile offline.
==================================================================
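The Fixed-Releases line can be read as one character per version component, with "A" standing for 10 (so A000 = 10.0.0.0); that reading is an assumption based on the usual Oracle bug-note convention. A Python sketch that checks a release against that list; the function names and the rule for release lines not listed are hypothetical:

```python
def parse_fix(code):
    """'8175' -> (8, 1, 7, 5); 'A000' -> (10, 0, 0, 0). One hex digit per component (assumed)."""
    return tuple(int(ch, 16) for ch in code)


def parse_version(text):
    """'8.1.7.3' -> (8, 1, 7, 3)"""
    return tuple(int(part) for part in text.split("."))


FIXED_RELEASES = [parse_fix(code) for code in "8175 9014 9202 A000".split()]


def is_fixed(version, fixes=FIXED_RELEASES):
    v = parse_version(version)
    # Fix listed for the same release line, e.g. 8.1.7.x -> 8.1.7.5
    same_line = [f for f in fixes if f[:3] == v[:3]]
    if same_line:
        return v >= min(same_line)
    # Line not listed: treat as fixed only if newer than the newest fix (assumption)
    return v >= max(fixes)


if __name__ == "__main__":
    print(is_fixed("8.1.7.3"))  # False
```

By this reading both 8.1.7.3 and 8.1.7.4 fall short of 8.1.7.5, which is consistent with Support asking for a one-off patch on top of the 8.1.7.4 patchset rather than the patchset alone.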
SOLUTION / ACTION PLAN
======================
1) Please apply the Oracle Server Patchset 8.1.7.4.0 (32-bit) for Sun Sparc Solaris.
2) Then please apply the following one-off patch:
==================================================================
Patch: 4603673
Description: DIAG MERGE LABEL REQUEST ON TOP OF 8.1.7.4 FOR BUG#4576254 AND MORE
Product: Oracle Database Family
Release: Oracle 8.1.7.4
Platform or Language: Solaris Operating System (SPARC 32-bit)
Last Updated: 17-OCT-2005
Size: 18M (19480540 bytes)
==================================================================