Questions and Answers
Data Blocks and Freelists
Transaction and process freelists
26 October 1998
You mentioned that there are different types of free lists. Could you please explain a bit more about all this?
Each segment has at least a master free list. This free list is implemented as a linked list. The segment header block contains a pointer to the first block on the free list. In the block header for that block is a pointer to the next free list block and so on. The free list pointer in the block header of the last block on a free list is null. The free list header record in the segment header block also contains a pointer to the last block in the free list.
If a segment is created with multiple free lists, then the segment header block also contains a free list header record for each of these process free lists in addition to the master free list.
When a process performs DML that causes a block to fall below the PCTUSED value for the table/cluster, or entirely frees an index leaf block (however, the index case is more complex, because blocks on a free list are not unlinked from the B*-tree structure until an attempt is made to reuse them), that block is put onto a transaction free list. Transaction free lists are dynamically created as necessary, and also have a header record in the segment header block. The transaction free list header also records the transaction id.
When a transaction needs to insert/migrate a row, it uses the first block on its transaction free list, if it has previously freed some blocks (except, for an index, because block splits always occur in a recursive transaction). Otherwise a process free list is used, based on the process number (PID) mod the number of process free lists. If the process free list is empty, or if there are no process free lists, then the master free list is used.
If the master free list is empty, and if there are some transaction free lists for transactions that have committed, then the transaction free list is marked as unused and the blocks are merged into the master free list. Otherwise, the high water mark is raised, initially by 1 block at a time for the first 5 data blocks in the segment, and thereafter by the greater of 5 blocks or _bump_highwater_mark_count blocks (which defaults to 0) times the number of process free lists plus 1 (for the master free list), up to the number of blocks remaining in the extent. These blocks are newed, which means that a free buffer is allocated in the cache and the block header is formatted to make the blocks part of the segment. Raising the high water mark may involve dynamic extension. Once the master free list is not empty, up to 5 blocks are moved to the target process free list if any, from where they can be used.
The number of free list headers that can fit into the segment header block is limited by the database block size. At least half of the free list slots must be available for transaction free lists. You can see the exact number with the query
select kviival from x$kvii where kviitag = 'ktsmtf';
connected as SYS (or internal). On busy segments, more transaction free lists than that may be dynamically created, slots permitting.
If a segment is created with multiple free list groups, one block after the segment header is used for the free list header records for each free list group. There is still a master free list in the segment header, as well as a master free list in each group. Free list group selection is based on the instance number mod the number of free list groups.
As mentioned previously, free list contention occurs when multiple processes using the same free list attempt to modify the data block on the head of the free list concurrently. It is shown in V$WAITSTAT against the data block class. V$WAITSTAT can also show contention for the segment header and free list blocks. This occurs where multiple transaction in the same free list group need to update their free list header records simultaneously. There are various ways of addressing these problems such as rebuilding the table with more free list groups, or increasing _bump_highwater_mark_count, or the novel idea of fixing the application.
To drill down on which segments are causing data block contention, I suggested using event 10046, level 8. This creates a trace file much like to one produced by the sql_trace facility, except that for each event wait a line is printed to the trace file. In particular, each buffer busy wait is recorded together with the P1 and P2 values which are the data file and block number of the wait. So to find which blocks a process has been waiting on, you just grep the trace file for buffer busy waits lines and produce a histogram of the file and block numbers most commonly waited for. Once you have suspect file and block numbers, you can relate them to a segment by querying DBA_EXTENTS. In the case of free list contention on a table it is common to have several hot blocks just below the high water mark for the segment.
There is no CPU cost to having multiple process free lists, and a trivial cost to having multiple free list groups. If you have too many process free lists the worst you get is more free space below the HWM that is not accessible to any particular process. You also get a lower limit on the number of possible transaction free lists you can have.
Data block and free list waits
1 April 1999
In querying the V$WAITSTAT, the results show that 'data block' has a count of 797344 and 'free list' has a count of 0. I have read that if the 'data block' count is high, then I should increase the freelists on the tables that are subject to concurrent inserts. However, if there were freelist contention, would it not show up with a count for the class of freelist? What then does a count for 'data block' mean?
Unless you create multiple freelist groups, you will never see waits for free list blocks in V$WAITSTAT. The freelist headers only go into separate free list block if there are multiple free list groups - otherwise, they go into the segment header block. V$WAITSTAT shows buffer busy waits by the data block class and if there are no free list blocks, then you cannot wait for them!
However, waits for 'data block' class blocks are commonly due to insufficient process freelists. If multiple insert processes use the same process free list concurrently, then they will all attempt to modify the data block on the head of that free list. The associated buffer busy waits are recorded as 'data block' class waits because that is the class of block on which they are trying to establish a buffer lock.
Discrete transactions hold buffer locks for the duration of the transaction and can also cause data block buffer busy waits.
V$WAITSTAT
5 April 1999
I'm looking for help on how to resolve "data block" waits that are found in the V$WAITSTAT table. What causes them and what's the best way to resolve this?
The main cause of this is two processes trying to modify the block at the same time. However, it also occurs when one process is busy reading the block into the buffer cache, but it is not yet there, and another process needs to access the same block. You can tell the difference by looking at P3 in V$SESSION_WAIT if you are lucky enough to catch some of the buffer busy waits there. A 0 or 1014 value is the read case; all other values are the modification contention case.
Your next step should be to try to identify which database segments are involved. Once again, if you can catch some waits in V$SESSION_WAIT then you can use the P1 & P2 (file & block) values to lookup the segment name in DBA_EXTENTS. If it is a table, then 9 times out of 10 all you need to do is to recreate the table with more process freelists. One way of working out how many freelists to create is to dump some blocks from the segment not very far below the high-water mark and take a look at the interested transaction list size. The peak number of interested transactions plus one is the minimum number of process freelists that you need.
Transaction and process freelists
18 April 1999
How can I determine how many transaction freelists are available and how much space they consume? Does it depend on whether the FREELISTS parameter is specified in the STORAGE clause?
Other than the freelist headers in the segment header block (or freelist block if multiple freelist groups have been specified), freelists do not consume any space at all. Blocks are linked into a freelist using a fixed pointer in the block header.
The number of freelists available depends on your database block size and Oracle version. For a 2K block size under 8.0.5 there can be up to 47 process and transaction freelists in total. At least 25 must be reserved for transaction freelists. You can see this number with the following query:
select kviival from x$kvii where kviitag = 'ktsmtf';
Row overhead
19 April 1999
I would like to know if the AVG_ROW_LEN column in USER_TABLES that is populated by the ANALYZE command includes the row directory overhead of 3 * UB1 for each row stored or not.
The row directory overhead is 2 bytes per row, not 3. There is also a 3 byte overhead per row in the data section of the block: 1 for the total row length, 1 for flags, and 1 for the column count. AVG_ROW_LEN does not include the row directory overhead, but does include the other three bytes of row data overhead.
Blocks on freelists
25 May 1999
How can you calculate the number of blocks that are on a freelist, and how can you tell whether or not a block is on a freelist and which one it is on if there are more than one?
You can use DBMS_SPACE.FREE_BLOCKS to count the blocks on the freelists for a particular freelist group, but not down to the level of distinct freelists. It is not feasible to determine which freelist a particular block is on, because the freelists are singly linked. The code would have to start at the freelist header block (segment header) and follow all the freelists one block at a time to determine this, which makes it infeasible on performance grounds.
Freelist groups
24 June 1999
You suggested 2 freelist groups. Now every manual I know of relates this setting to multi-instance Oracle and I believe that the group is just on the instance number. So why 2?
You have buffer busy waits on the segment header blocks for your key tables. By having two freelist groups, you effectively move the freelists out of the segment header block into a separate freelist block. There will be two such freelist blocks, one of which will never be used. But the load that you presently have on the segment header block will be divided between the new segment header block and the first freelist block.
INITRANS and MAXTRANS
29 June 1999
I found an Oracle book that said that the value of INITRANS and MAXTRANS can be low if the index entries are large. Could you tell me why?
When a transaction modifies a block, it needs exclusive use of a transaction slot, from that moment, until it commits (or rolls back). This is used to implement row level locking. If a table block only contains one row, then only one transaction slot can be needed. Similarly, if an index block only has 3 keys, then at most 4 transaction slots can be needed (the extra one is for a recursive transaction for an index block split).
Although the number of row/keys in a block sets the maximum for the number of transaction slots that might be needed, it is most unusual to require that many slots. Indeed, given that Oracle can dynamically allocate extra transaction slots from internal block free space as required, it is questionable whether you should use a non-default INITRANS setting at all, except in unusual circumstances (such as parallel DML). Another exception is where rows/keys are updated in sequence by distinct transactions.
So your author is right about INITRANS in general. However it has more to do with Oracle's ability to dynamically allocate transaction slots than the questionable assumption of random data access. I would also caution against tampering with MAXTRANS.
First INSERT slow
30 June 1999
The main account table is processed during batch by an array fetch and update. The first INSERT to the account table does 55000 disk reads and takes 15 minutes. Subsequent INSERTs work normally.
This sounds like a freelist issue. If your overnight process first drops block space usage below PCTUSED, and then raises it close to PCTFREE again, you will thereby create a very long transaction freelist. The first insert has to migrate that transaction free list to the master freelist, and unlink the leading blocks on the freelist that do not have enough free space for the row to be inserted. You can control the maximum number of blocks that will be unlinked from the freelist by the first insert with the _release_insert_threshold parameter. I think this is supposed to default to 5 blocks, but if my hunch is correct, it seems to be much higher in your case. Alternately, you can reduce PCTUSED to avoid the problem entirely.
Strange enqueue wait
1 October 1999
Why would Oracle incur an enqueue wait on update statements issued by two database sessions that touch two different records if both records are on the same block, but otherwise succeed if the statements update two records from different blocks? SESSION_ID TYPE ID1 ID2 MODE_HELD MODE_REQUESTED
---------- ---- ---------- ---------- -------------- --------------
19 TX 262182 71967 None Share
18 TX 262182 71967 Exclusive None
You can tell from the fact that the waiting session wants a SHARE mode lock that it is waiting for a free transaction slot in the block. The problem is that there is not enough free space in the block to dynamically allocate another transaction slot.
Impact of multiple freelists
25 October 1999
I have found out that freelists on one table was set up to 10. Generally you set freelists to greater than 1 (maybe 2) for heavy inserting tables. My question is, What kind negative impact do high freelists have on a table aside from more storage requirements?
Multiple process freelist have a minimal impact on table storage requirements. On average each process freelist will have either 2 or 3 blocks. That is, the difference between 2 freelists and 10 freelists in terms of storage space is likely to be just 20 more blocks below the highwater mark.
The freelist headers themselves have no storage requirements, as they go into the freelist group block or segment header block. Having a high number of process freelists does limit the space in that block for the dynamic creation of transaction freelists to which blocks that fall below PCTUSED may be returned. However, there is a database block size based minimum to prevent the creation of too many process freelists. That minimum can be seen with
select kviidsc, kviival from x$kvii where kviitag = 'ktsmtf';
Block header size
25 October 1999
I'd like to know the space actually reserved for row data in a block. I suppose it's (total block size - block header size). In this case the problem becomes knowing the block header size.
The common block header takes 20 bytes, and the tail takes 4 bytes. The transaction control header takes 24 bytes, and there are 24 bytes for each transaction slot. Then there is a 14-byte common header for the data layer. Cluster blocks have a table directory, followed by a row directory. Each slot in the table directory takes 4 bytes, and each slot in the row directory takes 2 bytes. Table blocks are the same as cluster block, except that there can only be one table in the table directory. Index blocks have a 16-byte common index block header following the common data block header, and then either a 24-byte branch block header, or a 32-byte leaf block header, depending on the block type. These index headers are followed by a row directory, as for clusters and tables. There is also a 3-byte row header on each row in the data area. So there you have it ... it depends.
Minimum row length
17 November 1999
In a recent article in Oracle Professional magazine, it says
PCTFREE will always default to at least 10 percent, even if its set lower than 10 percent
and to confirm this the author suggests
Create a new tablespace; create a single table with PCTFREE set to 1 or so, and load short rows of just a few characters. It should be easy to find the block and see. Use od on Unix. Anyway, the data appears from the back and moves forward in the block. I would say if there is a significant blank spot ('00') at the beginning of the data, then it is indeed leaving behind some space".
Can you throw any light on this?
The author is mistaken. This phenomenon that he has observed is due to the minimum row length requirement. If he were to use rows longer than 6 bytes of real data, he would be able to fill up a block completely. It is not necessary to resort to od to see the available space, it is much preferable to use an Oracle blockdump because Oracle only logically deletes rows in blocks by changing a flag until such time as the space is needed.
Freelists
8 December 1999
Will an insert statement lock the table being inserted into? If so, is there any way to avoid this? I have one table into which all transactions are inserting. I feel this table is acting as a bottleneck and slowing down all the users. What is the best way to approach this?
Such a table should be recreated with multiple process FREELISTS. To get the appropriate number of freelists, take a blockdump of some of the blocks in the table, and use the maximum number of interested transaction list entries that you see in any one block.
Minimum row length
9 December 1999
I am unaware of a minimum row length. Can you enlighten me some more here?
The minimum row length requirement is that at least 11 bytes of data area are required for each row, to allow for the possibility of all the rows in the block needing to be migrated. The 11 bytes are 2 for a row directory entry, 3 for the row header and 6 bytes for the rowid of the migrated row piece. Therefore no more than (data area size / 11) rows can be placed in any one block.
Freelists
11 December 1999
Is there any benefit in increasing the number of freelists on a single CPU machine? Also, is there any point in doing so if there are no deletes, because all inserts will go to the high-water mark, rather than using freelists?
Yes, I have seen benefit in multiple freelists on a single CPU machine. While buffer lock duration is brief, it is still long enough relative to a time-slice to make a significant impact. The frequency of deletes is not a factor. All inserts use freelists, even when inserting at the high-water mark.
Freelists
27 December 1999
I am trying to understand transaction and process freelists. What is the procedure when Oracle needs to get a block from a freelist?
A process will first check its own transaction freelist, for free space released earlier in the same transaction. Failing that, it goes to the process freelist. If the process freelist is empty, it is populated with up to 5 blocks from the master free list. If the master freelist is empty, then if possible the freelist for a committed transaction will be merged with it - that is all the blocks freed by that transaction are moved to the master free list. Otherwise the HWM is raised and the blocks are formatted and placed on the master freelist. The number of blocks formatted is normally 5, but if _bump_highwater_mark_count has been set, it is the number of process freelists plus 1 times the _bump_highwater_mark_count setting, or the number of unused blocks in the extent, whichever is less. Once the master freelist is populated, a group of up to 5 blocks is moved to the process freelist, from where a block can be taken for use.
Freelist contention
5 January 2000
You suggested that my buffer busy waits were probably due to freelist contention. However, I have an article that explains how to check for freelist contention with the query select * from v$waitstat where class = 'free list';
and it appears that I don't have any freelist contention at all. What else could the problem be?
That article is wrong. That query shows contention on freelist group blocks. By default tables don't have any freelist group blocks, so you are not likely to see contention for them. The freelist headers are stored in the segment header block. Freelist group blocks only exist if you specify multiple freelist groups when creating the table. Even then, contention on the freelist group block (or segment header) is only contention for modifying the freelist header, not the freelist data blocks themselves.
Real freelist contention is when two or more processes look at the freelist header, without modifying it, and then attempt to insert into to the data block on the head of the freelist. The contention occurs on the data block itself and is shown as such in V$WAITSTAT.
If that query were to show a significant number of waits, then that would probably indicate too small a gap between PCTFREE and PCTUSED on some tables.
INITRANS
28 January 2000
In a database with a large concurrent user population and default INITRANS settings, can the cost of dynamically creating transaction slots be significant?
The cost of dynamically creating a transaction slot is trivial. There is an extra change vector in the undo record and two more in the redo record, but no additional locking or latching. It is better to keep data density high, than to attempt to avoid this cost.
Number datatype storage
1 February 2000
I have been doing some testing on the internal representation/storage of the number datatype. What I have found so far puzzled me and would like an expert's analysis of the output shown below. DUMP_NO COL1 VSIZE(COL1) LENGTH(COL1)
======================================== ========== =========== ============
Typ=2 Len=3: 194,10,21 920 3 3
Typ=2 Len=2: 194,93 9200 2 4
The question is, Why would 920 be stored using 3 bytes while 9200 used only 2 bytes?
The base for the exponent is 100 (stored in excess 64 notation with a high-order sign bit). Thus each pair of significant digits is represented in one byte of the mantissa. 9200 only requires one byte in the mantissa, representing 92. Whereas 920 requires two -- one representing 09, and one representing 20.
buffer busy waits
9 February 2000
We have some big tables that all users insert to and update. I noticed some waits on data block and segment header blocks. I could recreate the tables with more FREELISTS, but I can't find a block with an ITL count higher than 1 to get the highest number of concurrent transactions on the table. If I increase the number of FREELISTS, does it help even if it is not as high as the number of concurrent transactions? Is there a drawback in specifying many FREELISTS?
I assume you've been doing some block dumps to look at the ITL count. Well done. That means that you have understood a lot of the issues here already. Increasing the number of FREELISTS will not hurt, but in this case it will probably not help either. There is a second reason for data block waits that you should consider, namely that of reference data that is not being cached in a KEEP buffer pool being simultaneously required by two sessions. You can identify these from the p3 parameter value of the waits, which you can get with our trace_waits.sql script. The segment header contention is possibly caused by having PCTFREE and PCTUSED too close together on some important tables.
Free space not reused
18 February 2000
We have a strange problem on Oracle 7.3.4. For a certain table we replace (delete and insert) 20% of the rows each day. But the free space from the deletes is not re-used by the inserts, so the table acquires a new extent each day. This table contains 1 LONG column, but there are no chained rows (analyze shows: chained_rows = 0 and avg_row_len = 650). The database block size is 2K - a little small, I know! Also PCTFREE is 10 and PCTUSED is 90 so PCTFREE + PCTUSED = 100. What are the consequences of having PCTFREE + PCTUSED = 100, and is this the cause of our problem?
Having PCTFREE + PCTUSED = 100 does not cause free space to go unused, as in your case, but it does increase the risk of buffer busy waits against the segment header block if the table is modified by multiple concurrent sessions. However, that is a different problem.
Assuming default INITRANS you have 1958 bytes available in the data area per block. Each row takes an extra 2 bytes in the row directory, so 3 average rows would require 1956 bytes. It is easy to imagine that some rows being deleted are smaller than average. So let us say that 1% of the rows being returned to the freelist only have 600 bytes free. When searching for a block from the freelist, Oracle will look at no more than the first 5 blocks (by default). Any block that cannot fit the new row, but is still below PCTUSED, is skipped over stays on the head of the freelist. In this case, the delete operation will return all the blocks from which a row has been deleted to the head of the freelist, and the inserts will begin to use those blocks. However, you will soon have 5 of the blocks with only 600 bytes of free space on the head of the freelist. This means that most inserts will scan these 5 blocks, skip them, and get a fresh block at the HWM.
This is part of why it is recommended that there be greater than 1 average row between PCTUSED and PCTFREE. In your case, I would recommend firstly that you drop PCTUSED to 50%. That will enable the unusable blocks on the head of the freelist to be unlinked, and you will soon be using all the presently unusable free space. Long term, this will mean that two rows need to be deleted from any block before it goes on the freelist, but overall this will be a more space efficient solution. Then, as soon as you have opportunity, this database should be rebuilt with an 8K or 16K block size.
I don't understand very well the impact of PCTUSED in this case. How does a smaller PCTUSED resolve the problem of these 5 blocks?
The impact of the PCTUSED setting is that when the block on the head of the freelist cannot accommodate the new row, Oracle can unlink it from the freelist if and only if it is fuller than PCTUSED. If Oracle were to unlink blocks with less free space than PCTUSED, then that block would be most unlikely to ever end up on a freelist again, and once all the rows had been deleted the block would be "lost".
Setting PCTUSED enables blocks with relatively little free space to be unlinked from the freelist, so that blocks with more space further down the freelist can be reached. This is because Oracle will only look at the first 5 blocks by default before formatting a new block above the high-water mark. The number of blocks is configurable via the _walk_insert_threshold parameter. There is also a _release_insert_threshold parameter that also defaults to 5. This limits the number blocks that will be unlinked, rather than just skipped (walked). So in the worst case, an insert may unlink 4 blocks and walk 5 or vice versa before going to the high-water mark.
Thanks. I'll change PCTUSED. When I do, will it take effect immediately, or do I have to reorganize the table?
If you just change PCTUSED the situation will not get any worse. However, to fix the situation properly, you should rebuild the table as soon as possible (if not the whole database).
Tuning table storage
19 February 2000
I'm looking for a tuning utility to help me detecting block chaining and fragmentation, to suggest values for PCTFREE and PCTUSED, and so on.
There are two scripts on this web site that might help you. The first, called row_migration.sql, reports the degree of row migration as a percentage of the number of rows, for each table with migrated rows. It ignores tables with LONGs or possible row lengths greater than the block size, because it is not easy to distinguish row migration from chaining for such tables. A suggested PCTFREE is calculated as the free space left by one less than the number of average rows that can fit in a block. A new PCTUSED is suggested to allow for a little more than one row between PCTFREE and PCTUSED.
The second, called sparse_tables.sql, reports the data density for sparse tables as a percentage of the number of rows that could fit below the high-water mark. A new PCTFREE of 1 is recommended, on the assumption that there is no risk of row migration for such tables. A new PCTUSED is suggested as before.
Warnings: Both these scripts use the statistics recorded in the data dictionary, so you need to have analyzed everything recently if the results are to be trusted. Also, they are based on the assumption of a pseudo-random pattern of insert/update/delete. There are some tables that need more generous values for PCTFREE and PCTUSED, because they have an unusual DML pattern. Nevertheless, you can certainly take this as a better starting point than the defaults of 10 and 40!
Gap between PCTFREE and PCTUSED
23 February 2000
Why does having too small or no gap between PCTFREE and PCTUSED increase the risk of buffer busy waits?
If you have a PCTFREE and PCTUSED too close together, most inserts and most deletes will require the target block to be either taken off or added to a freelist. That involves a change to the segment header or freelist group block that stores the freelist headers. If there is high concurrency of inserts and deletes on the table, they will be single-threaded on the buffer lock for the buffer containing the current mode copy of that freelist header block. This will be shown as buffer busy waits in V$SYSTEM_EVENT. V$WAITSTAT will classify these waits as against segment header class blocks, or free list class blocks, depending on whether the table has multiple freelist groups.
Row migration
15 March 2000
Is there a way to find out which rows are chained? I have a good number of chained rows in a table, and I need to find out which rows and why. The maximum size of the rows cannot exceed 150 bytes, and the database block size is 16K. Updates occur only to NUMBER columns (the values increase) on a daily basis, and to CHAR & DATE columns on a monthly basis. Am I correct that updates on NUMBER columns do not lead to an increase in size?
Assuming that a block has been filled up entirely and Oracle needs to expand a row in that block, will the row migrate to a new block (as it would definitely find a block in which to fit the entire row) or will it chain across to another block? I assume that Oracle would rather migrate than chain. If it migrates then why does it show that it has so many chained rows? Does it mean that migration is reflected as row-chaining?
Updating NUMBER columns to larger values can increase the storage needed. When Oracle reports "chained rows" it includes migrated rows in the count. In your case, they are all migrated rows, because the row length is less than the block size, and as you say, Oracle will migrate in preference to chaining.
You can get the migrated rows using ANALYZE LIST CHAINED ROWS, but I would just rebuild the whole table with a suitable PCTFREE setting. You can use our row_migration.sql script to get a better value for PCTFREE.
MAXTRANS
16 March 2000
What are the pro's and con's of setting the MAXTRANS to the maximum of 255?
That's the default, but you actually cannot get that many, because there is a lower limit imposed by the block size, unless you have a 16K block size and very short rows.
Where did you find this info regarding MAXTRANS? I have checked several books and had no luck with an exact definition. Did you find this info in a particular book, or did it come from a wealth of knowledge?
The bit about the block size limit is in my book. I did not bother to explain how many ITL entries can fit, because it is academic. However, if your curious ...
The minimum row length is 11 bytes. To need 255 ITL entries, you would need 255 rows in the block. That makes 2805 bytes of data. Each ITL is 24 bytes. So 255 would need 6120 bytes. That will not fit in an 8192 byte block, without regard for headers even.
Or put another way, a 16K block with headers and 255 ITLs would have 10198 bytes for data. To get 255 rows in the block, the row length would have to be less than 40 bytes.
Chained fetch ratio
29 March 2000
In my statistics I am getting chained fetch ratio 0.3099 PCTFREE too low for a table
Is there any way I can find out which table has PCTFREE too low?
This could be chaining, rather than row migration. If it is row migration, and if your schema is analyzed, then our row_migration.sql script might help.
Block headers
10 April 2000
I'm starting to take some blockdumps, but it is hard to know what I'm looking at. Can you help me to understand what all those header fields are used for? They seem to take up quite a lot of space.
There are three sets of header information, corresponding to the three layers of the Oracle kernel that handle data blocks. I'll only explain the first two for now, because the third is too big a question to answer. Firstly, all datafile blocks are written and read by the cache layer (KCB) generally through the database block buffer cache. There is a 20-byte header and 4-byte tail that are read and maintained by the cache layer. The cache layer header is also called the common block header.
Blocks that contain user data are further interpreted and maintained by the transaction layer (KTB). These blocks have a transaction layer header immediately following the common block header. The transaction header is comprised of a 48-byte fixed header, including one 24-byte interested transaction slot, and a variable number of additional interested transaction list slots.
The remainder of each block, the data area, is used by the data layer (KD) to store user data. The data area contains a header, free space and row data. The internal structure of the data layer header and row data area is dependent on the block type. Cluster blocks, table blocks, index branch blocks and index leaf blocks each have different stuctures. As I said, I'll not attempt to explain the structure of this area in detail for now, but I will tell you how big the headers are. Here's how it looks in a diagram.
The Cache Header and Tail This is what the cache header and tail look like in a block dump.
buffer tsn: 0 rdba: 0x004000f8 (1/248)
scn: 0x0000.0001ab63 seq: 0x01 flg: 0x00 tail: 0xab630601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
The cache layer header is comprised of the following fields.
database
block address
4 bytes
The tablespace relative database block address (RDBA). This is constructed from the tablespace relative file number, and the block number of the data block within that file.
SCN
6 bytes
The SCN at which the block was last changed. The low-order 4 bytes are called the SCN base, and the high-order 2 bytes are called the SCN wrap.
sequence
1 byte
A sequence number incremented for each change to a block at the same SCN. A new SCN is allocated if the sequence number wraps.
flag
1 byte
(I don't know.)
format
1 byte
The format of the cache header was changed for Oracle8. Under Oracle8, the value is always 2. Previously, it was 1.
checksum
2 bytes
An optional checksum of the block contents. When a block is written, the checksum is either cleared or set depending on the setting of the db_block_checksum parameter. When a block is read, the checksum is verified if present.
block type
1 byte
The most common block types are:
1 = undo segment header block
2 = undo data block
5 = data segment header block
6 = data block
unused
4 bytes
Unused space, possibly for backward or forward compatibility.
The tail is comprised of the low-order two bytes of the SCN base followed by the block type and the sequence number. The consistency of the header and tail is checked whenever a block is read. This detects most block corruptions, in particular split blocks from hot backups.
The Transaction Header
The transaction header consists of 24 bytes of control information followed by an array of one or more transaction slots. This array is called the interested transaction list (ITL), and each slot may contain an interested transaction entry (ITE). The initial number of ITL slots is set by the INITRANS parameter for the segment, except that index branch blocks only have one slot. Cluster and index segments have a minimum INITRANS value of 2. If there is sufficient free space in the data area of the block, additional transaction slots can be dynamically allocated as required, up to a limit dictated by the database block size or the MAXTRANS parameter for the segment.
This is what the transaction header looks like in a block dump.
Block header dump: 0x00400035
Object id on Block? Y
seg/obj: 0x2 csc: 0x00.21ab6 itc: 2 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 xid: 0x0000.00f.00000005 uba: 0x0040000b.0000.12 C--- 0 scn 0x0000.000000ac
0x02 xid: 0x0002.003.00000027 uba: 0x00c00004.0019.01 --U- 1 fsc 0x0000.00021b7c
The control information in the transaction header contains the following fields.
object number
4 bytes
The object number of the segment in OBJ$.
cleanout SCN
6 bytes
The SCN at which the last full cleanout was performed on the block.
ITL count
1 byte
Number of entries in the ITL.
flag
2 bytes
A 0 indicates that the block is on a freelist. Otherwise the flag is -.
block type
1 byte
1 = data
2 = index
ITL
freelist slot
1 byte
Index to the first slot on the ITL freelist.
next freelist
block
4 bytes
The RDBA of the next block on the segment freelist.
Null if this block is not on a freelist.
version
1 byte
(I don't know.)
unused
4 bytes
Unused space, possibly for backward or forward compatibility.
Each interested transaction list entry contains the following fields.
transaction id
8 bytes
This is comprised of the rollback segment number (2 bytes), the slot number in the transaction table of that rollback segment (2 bytes), and the number of times use of that transaction table has wrapped (4 bytes).
undo block
address
8 bytes
The location of the undo for the most recent change to this block by this transaction. This is comprised of the DBA of the rollback segment block (4 bytes), the sequence number (2 bytes), and the record number for the change in that undo block (1 byte), plus 1 unused byte.
flags
1 nibble
---- = transaction is active, or committed pending cleanout
C--- = transaction has been committed and locks cleaned out
-B-- = this undo record contains the undo for this ITL entry
--U- = transaction committed (maybe long ago); SCN is an upper bound
---T = transaction was still active at block cleanout SCN
locks
3 nibbles
The number of row-level locks held in the block by this transaction.
SCN or free
space credit
6 bytes
If the transaction has been cleaned out, this is the commit SCN or an upper bound thereof. Otherwise the leading two bytes contain the free space credit for the transaction - that is, the number of bytes freed in the block by the transaction.
The Data Area Immediately following the transaction layer header is the 14-byte common header for the data layer (KDB). The remainder of the data layer header differs for cluster, table and index blocks. Cluster blocks have a table directory, followed by a row directory. Each slot in the table directory takes 4 bytes, and each slot in the row directory takes 2 bytes. Table blocks are the same as cluster block, except that there can only be one table in the table directory. Index blocks have a 16-byte common index block header following the common data block header, and then either a 24-byte branch block header, or a 32-byte leaf block header, depending on the block type. These index headers are followed by a row directory, as for clusters and tables.
The body of each block contain the row data and free space. In general rows are inserted from the bottom of the block. However, block free space is not coalesced when rows are deleted or updated such that the overall row length is reduced. Block free space is only coalesced when more contiguous space is needed for an insert or update than that which is available.