ORA FAQ 性能调整系列之——当索引第一列由序列产生，一个逆序索引有什么用？

ORA FAQ 性能调整系列之——

The Oracle (tm) Users' Co-Operative FAQ

Why would a reverse index be useful when the leading column of the index is generated from a sequence ?

当索引第一列由序列产生，一个逆序索引有什么用？

--------------------------------------------------------------------------------

Author's name: Norman Dunbar

Author's Email: Oracle@bountifulsolutions.co.uk

Date written: 18/04/2004

Oracle version(s): 9.2.0.3

--------------------------------------------------------------------------------

When you store data in an indexed table, certain columns of data are copied into the index alongside the rowid of the data row in the table. The data in the table is stored 'randomly', or at least, not necessarily in the order you put them there.

当你在一个索引表中存储数据时，数据的某些列同表中数据行的ROWID被拷贝进索引。表中的数据是“随机地”存放的，或者至少不需要按照你放入它们的顺序。

The index entries, on the other hand, must be stored in order, otherwise the usability of the index is removed. If you could walk through the entries in an index, you would see that they are in order, usually ascending, but since 8i, descending also works.

而索引项应当按顺序存储，否则其毫无用途。假如你深入一个索引的各项，会看到他们是有序的，通常升序，但从8i开始，也可以降序。

Entries are stored in order of their internal representation, not necessarily the same as what you see on screen when you SELECT the columns from a table.

项目是根据内部表示的顺序存储的，并不需要与你从表中SELECT列时在屏幕上看到的一致。

If the indexed column(s) contain character data (CHAR, NCHAR, VARCHAR2 or NVARCHR2) then the data will appear on screen exactly as it does in the index. For example, if the column contains 'ORACLE' the index entry will also be 'ORACLE'.

若索引列含有字符数据（CHAR, NCHAR, VARCHAR2或NVARCHR2），那么数据在屏幕上的显示将和在索引中一致。例如，若列中含有“ORACLE”，那么索引项中也是“ORACLE”。

We can use the DUMP command to show us the internal representation of any data type. This command takes four parameters. The first is the data you wish to dump, the second is the base you wish to dump it in. The default is 10 which means that DUMP will display the characters in decimal, or base 10. The other allowed values are 8 (Octal), 16 (Hexadecimal) or 17 (Characters).

我们可以使用DUMP指令来显示任何数据类型的内部表示。这一命令有四个参数，第一个是要dump的数据，第二个是要dump出的基数，缺省是10，即DUMP将显示十进制字符。其他的可选参数是8（八进制），16（十六进制），17（字符）。

The third parameter is the start position in the data you wish to dump from and the final parameter is the amount of data you wish to dump. All but the first parameter have sensible defaults.

第三个参数是数据中要dump的开始位置，最后一个参数是要dump的数据量。除了第一个参数外，其他都有合理的缺省值。

Using DUMP, we can see the individual character codes for our 'ORACLE' data :

使用DUMP，我们可以看到“ORACLE”数据的各个字符码：

SQL> select dump('ORACLE',10) from dual;

DUMP('ORACLE',10)

-------------------------------

Typ=96 Len=6: 79,82,65,67,76,69

We can prove that this is correct by converting back from decimal character codes to actual characters :

我们可以证实，可以正确的将十进制字符码转换回真实字符：

SQL> select chr(79),chr(82),chr(65),chr(67),chr(76),chr(69) from dual;

C C C C C C

- - - - - -

O R A C L E

We could have used base 17 to do the same thing :

我们可以使用17作为基数实现同样的事情：

SQL> select dump('ORACLE',17) from dual;

DUMP('ORACLE',17)

-------------------------

Typ=96 Len=6: O,R,A,C,L,E

Numeric columns are very mUCh different. The internal format of a number is different from that which appears on screen after a SELECT because the internal format is converted to ASCII format so that it can be displayed. We can see this in the following, first in character format :

数字列是不同的。

数的内部格式与SELECT后在屏幕上的显示是不同的，内部格式转换为ASCII格式才能正常显示出来。我们可以如下察看，首先是字符格式：

SQL> select '1234' as "1234",

2 dump('1234', 17)

3 from dual;

1234 DUMP('1234',17)

---- ---------------------

1234 Typ=96 Len=4: 1,2,3,4

Then in internal format :

然后内部格式：

SQL> select 1234 as "a number",

2 dump(1234, 17)

3 from dual;

a number DUMP(1234,17)

---------- --------------------

1234 Typ=2 Len=3: c2,^M,#

The first columns in both examples look identical, but this is only because SQLPlus has converted the internal format of the number 1,234 into the character format so that the display device (the monitor screen) is able to show it. Binary characters have a nasty tendency to disrupt character devices like computer monitors when running in text mode.

两个例子的第一列看起来一样，但这只是因为SQLPlus将数1,234的内部格式转换为了字符格式，从而显示设备（显示器屏幕）可以显示它。二进制字符总是会讨厌的扰乱运行于文本模式的类似计算机显示器这样的字符设备。

Take a look at the second column in the above examples and notice the difference. In the first example we see the individual characters '1', '2', '3' and '4' while the second example shows only three bytes in the internal format of the number 1,234. Lets change the DUMP calls slightly, and do the whole lot in one command :

看看上面例子的第二列并注重不同点。第一个例子中我们看到一个一个字符'1', '2', '3'和'4'，而第二个例子只显示数1,234内部格式中的三个字节。让我们稍微修改一下DUMP的调用，用一个指令完成全部比较：

SQL> select '1234' as "1234",

2 dump('1234', 10),

3 1234 as "a number",

4 dump(1234, 10)

5 from dual;

1234 DUMP('1234',10) a number DUMP(1234,10)

---- ------------------------- ---------- ----------------------

1234 Typ=96 Len=4: 49,50,51,52 1234 Typ=2 Len=3: 194,13,35

This time, we see the actual character codes used internally. Once again columns 2 and 4 differ. Column 4 is showing three bytes and these three bytes are the internal binary representation of the number 1,234.

这次，我们看到内部真正的字符码。列2和4还是不同。列4显示了三个字节，这三个字节就是数1,234的内部二进制表示。

It is this binary representation that is used in the index entry when a number column is indexed.

当数字列建立索引时，正是这个二进制表示用于索引项中。

Take a few minutes and eXPeriment with dumping a few other numbers - stick to integers for now as those are what sequences generate.

用几分钟来实验一下dump一些其他数字——现在关注整数，正是序列生成的类型。

SQL> create table test (a number);

Table created.

SQL> begin

2 for x in 1 .. 1e6

3 loop

4 insert into test values (x, substr(dump(x,10), 14));

5 end loop;

6 end;

7 /

PL/SQL procedure successfully completed.

If we have a look at the 'b' column of the table, we can see that each entry is ascending in a similar manner to the 'a' column. Here are the first 20 rows :

假如我们看看表的b列，可以看到每个条目是递增的，正和a列类似。

这里是前20行：

SQL> col b format a20 wrap

SQL> select a,b from test where a < 21;

A B

---------- ----------

1 193,2

2 193,3

3 193,4

4 193,5

5 193,6

6 193,7

7 193,8

8 193,9

9 193,10

10 193,11

11 193,12

12 193,13

13 193,14

14 193,15

15 193,16

16 193,17

17 193,18

18 193,19

19 193,20

20 193,21

The entries are very similar and all have the same leading byte.

各条目非常类似且均有一个前导字节。

How sequences affect indexes.

As mentioned above, index entries have to be stored in order, however, the table data need not be. If your indexed column is fed by a sequence, the data will be similar to the 20 rows shown above.

序列如何影响索引

如前所述，索引项必须按序存储，但是表中的数据不需要。若你的索引列来自一个序列，数据将类似于上面显示的20行。

Similar entries will group together in the index, so the index blocks will split as necessary and new entries will end up all hitting the same block until it too fills up and splits.

相似的项将在索引中同一组中，所以索引块将按需分割，且新的项将填入同一个块，直至该块已满并分割。

If you have one person running the application, this isn't too much of a problem. If the application is multi-user then it means that every user will tend to write into the same index block and buffer busy waits will be the outcome as transactions 'queue' to write data to the hottest index block around.

若单用户执行应用程序，这不是什么问题。若应用是多用户的，则意味着每个用户将写入同一个索引块。事务“排队”来写数据到最“热”的索引块中将造成缓冲忙等待。

Back in our small test, if you select more data from the test table, you will find that in the 1 million rows, there are only 4 different values for the leading byte on the internal numeric format and even worse, most of the entries in the index have the same leading byte value :

回到我们的小实验，若从测试表选择更多的数据，会发现在一百万行中，内部数据格式的前导（/起始）字节仅有4个不同的值，甚至更糟，索引中大多数项有相同的前导（/起始）字节：

SQL> select substr(b,1,3),count(*)

2 from test

3 group by substr(b,1,3);

SUB COUNT(*)

--- ----------

193 99

194 9900

195 990000

196 1

I cheated and discovered that there was a comma in position 4 of every row in the table that's how I knew to use a three character length in my SUBSTR.

我讨了个巧并发现表中每行的第4位是一个逗号，这就是为什么我知道在SUBSTR中使用一个三位字符长度。

What the above shows is that in an index of 1 million sequential entries, the vast majority have the same leading byte and so will all be trying to get into the same block in the index.

上面所示的是在一个一百万序列项的索引中，绝大多数有一个相同的前导（/起始）字节，从而所有（操作）将使用索引中的同一块。

How reverse indexes cure the problem.

A reverse key index stores the bytes of the indexed column(s) in reverse order, so the data 'ORACLE' is actually stored in the index as 'ELCARO'. Using a reverse index on a column fed by a sequence spreads the location of sequential numbers across a wider range of leaf blocks and the problem of a single hot block is removed because the index entries are stored in reverse order.

逆序索引如何解决这个问题

一个逆序键索引按照逆序存储索引列的字节，所以数据“ORACLE”在索引中保存为“ELCARO”。

在序列填充的列上使用逆序索引将序列值分摊到一个更广的叶块上，由于索引项存储为逆序，一个单独的“热”块的问题就不存在了。

SQL> alter table test add (c varchar2(30));

Table altered.

SQL> update test set c = substr(dump(reverse(a),10),14);

1000000 rows updated.

SQL> select substr(c,1,instr(c,',')-1),count(*)

2 from test

3 group by substr(c,1,instr(c,',')-1)

4 order by to_number(substr(c,1,instr(c,',')-1))

SUB COUNT(*)

--- ----------

2 10102

3 10101

4 10101

5 10101

All other numbers between 6 and 95 inclusive, have 10,101 entries each.

96 10101

97 10101

98 10101

99 10101

100 10101

99 rows selected.

This time, our 1 million row index entry has it's leading byte value spread across 99 (100 if you include a value for zero) different values, rather than just 4. In addition, the actual reversed bytes are fairly randomly scattered across each of the different values too.

这次，我们一百万行的索引项将前导（/起始）字节分摊到99（若包括了0则有100个）个不同的值，而不是仅为4。进一步的，事实上逆序字节还相当随机的分散在每一个不同的值上。

As more entries are added to the index, blocks will be split to accomodate the new entries in their proper location. As the data is arriving almost 'randomly' by means of the reversing of the actual data bytes for the index, the index itself will be extended to accomodate these new values. However, rather than always being stored in the same single 'hot' index block, new entries will be spread across a number of existing blocks (assuming the index has been around for a while) thus reducing contention. Of course, block splits will still occur on these blocks as new values fill up the existing block but it's happening all over the index not just in one place.

随着更多的项目添加到索引中，各块将分割以在合适的位置容纳新的项目。由于从索引中数据字节逆序处理来说，数据是几乎“随机”到来的，索引自身扩展以容纳新值。但是，与总是存储到同一个“热”块中不同，新的项目将分摊到多个已有的块（假定索引已运作了一段时间）中。当然，当新值填满这些块时仍会发生块分割，但这将在索引的各部分发生，而不是一个地方。

This is the reason why reversing the index when its leading column is fed by a sequence reduces buffer contention, removes the hot block problem and by doing so, reduces the potential for buffer busy waits on a multi-user system.

这正是为何当前导（/起始）列由序列生成时索引的逆序处理可以降低缓存争用，解决热块问题，并且这样做可以在一个多用户系统中减少缓存忙等待的机会。

Drawbacks to Reverse Key Indexes

Of course, there are drawbacks as well. By setting up a reverse key index you are increasing the clustering factor of the index. The clustering factor (from USER_INDEXES) is used by the optimiser (CBO) to determine how best to Access data in an INDEX RANGE SCAN. If the clustering factor is roughly equal to BLOCKS minus FREE_BLOCKS from USER_TABLES then the chances are that a range scan will read one index block, and locate all (or nearly all) of the data rows in needs in one or more adjacent blocks in the table.

逆序键索引的不足点

当然，也有不足之处。

设置一个逆序索引，增加了索引的簇因子（Clustering Factor）。优化器（CBO）利用簇因子（来自USER_INDEXES）来决定在一次INDEX RANGE SCAN中如何最好的访问数据。若簇因子几乎与BLOCKS - FREE_BLOCKS（来自USER_TABLES）相近，一个区间扫描将可能读一个索引块，然后根据需要在一个或更多的临近块中定位所有（或接近所有）数据行。

On the other hand, if the clustering factor is close to NUM_ROWS in USER_TABLES then the chances are that the entries stored together in one index block are likely to be scattered throughout a wide range of table blocks - so the index range scan may not be chosen as a good method of access.

另一方面，若簇因子接近USER_TABLES中的NUM_ROWS，一个索引块中存储接近的项目就很可能分散到很多的表块中——所以索引区间扫描就可能不是一个访问的好方法了。

Obviously the above applies to an analysed table and index.

显然上面应用于一个分析了的表和索引。

--------------------------------------------------------------------------------

In a quick test on a table with 100,000 rows loaded using a sequence, a normal index was used for most queries returning up to 30 rows, as was the reverse key index, however, when the number of rows went up to 1,000,000 the reverse key index was never used and a full table scan was used every time.

在一个用一个序列载入并含100,000行的表上做一个快速的测试，分别用普通索引和逆序索引查询并返回30行数据，当行数增长到1,000,000，将不再使用逆序索引而是每次使用全表扫描。