Oracle的in和notin――性能实践

在很多软件系统中，系统的性能很打程度上有数据库的性能决定。以前也曾经做过很多次关于性能方面的各种测试，特别是关于oracle的，我想到也应该记录下来一部分，为大家共享。

事情发生在我们的系统从sqlserver移植到oracle，用户在一个查询的操作上等待的时间无法忍受了，我们关于这个查询的处理与原来的方式一下，难道sqlserver 同oracle有什么地方不一样么，让我们来看看oracle有什么地方有问题，或者是我们使用的有问题？

业务问题大概可以这样描述，一个父表，一个子表，查询的结果是找到子表中没有使用父表id的记录，这种情况估计很多系统都会牵涉得到。让我们来举一个例子：

表一：父表 parent

表二：子表 childen

父表存储父亲，子表存储孩子，然后通过pid和父表关联，查询需要的结果是找到尚未有孩子的父亲。

我们来看一下查询语句的写法：

select * from parent where id not in (select pid

from childen)

这种标准的写法在子表存在50万条的记录的时候，查询时间超过了10秒，远远大于原来的sql server服务器的一秒。我在解决的时候想到了一个方法：

select * from parent where id in

( select id from parent minus select pid

from childen )

正常理解下，这个语句应该更加费时，但是事实完全出乎意料，这条语句不仅仅在子表存在大量记录的情况下速度良好，在子表少量数据的情况下速度也非常的好，基本在1秒内完成。

这个结果可以很明显的证明oracle 在子查询的内部处理的时候，使用 in 和 not in 的巨大区别，希望用到这种方式的用户注意，也期待有人解释其中的问题。

附录：测试数据的语句

-- create parent table

drop table parent;

create table parent(id varchar(10),name varchar(100), primary key (id) );

-- create childen table

drop table childen;

create table childen(id varchar(10),pid varchar(10), name varchar(100), primary key (id) );

-- Create/Recreate primary, unique and foreign key constraints

alter table CHILDEN

add constraint fk_123 foreign key (PID)

references parent (ID);

-- add test date for parent

-- Created on 2004-11-29 by GUIP

declare

-- Local variables here

i integer;

begin

-- Test statements here

i := 0;

delete from parent;

loop

i := i + 1;

dbms_output.put_line(i);

insert into parent(id, name) values(i, 'name ' || i);

if (i mod 100=0) then

commit;

end if;

exit when i 1000;

end loop;

commit;

end;

-- add test date for childen

-- Created on 2004-11-29 by GUIP

declare

-- Local variables here

i integer;

j integer;

begin

-- Test statements here

i := 0;

delete from childen ;

loop

j := 0;

loop

i := i + 1;

j := j + 1;

insert into childen(id, pid, name) values(i, j, 'name ' || j);

if (i mod 100=0) then

commit;

end if;

exit when j= 50;

end loop;

exit when i = 10000 * 50;

end loop;

commit;

end;