使用索引的误区之五:空值的妙用
并不是因为完全为空的条目不被记录到索引中,就坚决不能使用空值,相反,有时候合理使用oracle的空值会为我们的查询带来几倍甚至几十倍的效率提高。
举个例子,加入有一个表,里面有个字段是“处理时间”,如果没有处理的事务,该列就为空,并且在大部分情况下,处理的事务总是在总记录数的10%或者更少,而等待处理的记录(“处理时间”这列为空)总是绝大多数的记录,那么在“等待时间”这列上建立索引,索引中就总是会保存很少的记录,我们希望的访问方式是,当访问表中所有代处理的记录(即10%或者更多的记录数目)时,我们希望通过全表扫描的方式来检索;然而,当我们希望访问已经处理的事务(即5%或者更少的记录数目)时,我们希望通过索引来访问,因为索引中的记录数目很少,请看下面的例子:
SQL> create table tt as select * from sys.dba_objects;
Table created
Executed in 0.601 seconds
SQL> alter table tt add (t int);
Table altered
Executed in 0.061 seconds
SQL> select count(*) from tt;
COUNT(*)
----------
6131c
Executed in 0.01 seconds
SQL> UPDATE tt set t=1 where owner='DEMO';
10 rows updated
Executed in 0.03 seconds
SQL> COMMIT;
Commit complete
Executed in 0 seconds
SQL> select count(*) from tt where OWNER='DEMO';
COUNT(*)
----------
10 ――――――――――――――已经处理的数目
Executed in 0.08 seconds
s
SQL> select count(*) from tt;
COUNT(*)
----------
6131 ――――――――――――――总记录数目
Executed in 0.01 seconds
下面的查询因为访问表中的大多数记录(代处理的记录,即10%以上的记录数目),可以看见,它如我们所希望的那样使用了全表扫描:
select object_name from tt where t is null;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
|* 1 | TABLE ACCESS FULL | TT | | | |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TT"."T" IS NULL)
Note: rule based optimization
14 rows selected
Executed in 0.05 seconds
下面的查询因为要访问表中的少数记录,我们希望通过索引来访问:
select object_name from tt where t=1;
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
|* 1 | TABLE ACCESS FULL | TT | | | |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TT"."T"=1)
Note: rule based optimization
14 rows selected
Executed in 0.06 seconds
请注意,这里并没有如我们所希望的那样使用索引,而是使用了全表扫描,这里有一个结论:
建立了索引后,要想在CBO下合理的使用索引,一定要定期的更新统计信息
下面我们分析一下索引,看看有什么效果:
SQL> analyze index tt_idx validate structure;
Index analyzed
Executed in 0 seconds
SQL> select lf_rows from index_stats;
LF_ROWS
----------
10 ――――――――――索引中总共有10行
Executed in 0.05 seconds
SQL> exec dbms_stats.gather_index_stats('DEMO','TT_IDX');
PL/SQL procedure successfully completed
Executed in 0.03 seconds
SQL> SELECT DISTINCT_KEYS FROM USER_INDEXES;
DISTINCT_KEYS
-------------
1 ――――――――――只有一个键值
Executed in 0.05 seconds
SQL> select * from tt where t is null;
已选择6121行。
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'TT'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
485 consistent gets
0 physical reads
0 redo size
355012 bytes sent via SQL*Net to client
4991 bytes received via SQL*Net from client
410 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6121 rows processed
SQL> select * from tt where t=5;
未选定行
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TT'
2 1 INDEX (RANGE SCAN) OF 'TT_IDX' (NON-UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
964 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL> select * from tt where t=1;
已选择10行。
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TT'
2 1 INDEX (RANGE SCAN) OF 'TT_IDX' (NON-UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1639 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL> update tt set t=2 where t=1;
已更新10行。
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'TT'
2 1 INDEX (RANGE SCAN) OF 'TT_IDX' (NON-UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
14 db block gets
1 consistent gets
0 physical reads
3216 redo size
616 bytes sent via SQL*Net to client
527 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
10 rows processed
SQL> set autotrace traceonly
SQL> update tt set t=3 where t is null;
6121 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'TT'
2 1 TABLE ACCESS (FULL) OF 'TT'
Statistics
----------------------------------------------------------
0 recursive calls
18683 db block gets
80 consistent gets
0 physical reads
2583556 redo size
618 bytes sent via SQL*Net to client
533 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6121 rows processed
SQL>