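A roundup of several string-comparison algorithms: the ordinary algorithm and KMP

I did not invent the KMP algorithm, and a quick search turns up plenty of descriptions of it online. I quote one such description at the end; what I give here is just the algorithm itself. You may be surprised to find that KMP is practically an "overload" of the ordinary algorithm after some minor surgery. I am writing the two down side by side to make them easier to remember. Here we go: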

//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

// Produced By fishstudio @ Sep. 2005

//@description: this program (or rather, these programs) packs several algorithms into a single

// function in the C programming language. To use it, pick the one variant you need:

// copying all of it as-is will cause a large number of compile errors. I could have

// split it up in a more advanced IDE (e.g. MS Visual Studio .NET), which would be more

// convenient for users, but I like to keep things pure, and that is the reason.

//@author:Vincent

//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

//................................the code that precedes the function has been omitted

// we have a main string called S and a pattern string called P, plus the nextval array that records

// the nextval of each element of the pattern string.

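typedef int Position; // position of the first match found

// Assumption: SString follows the textbook convention, i.e. element 0 holds the length
// and the characters occupy indices 1..length. nextval is assumed to be an int array
// declared in the omitted code, with at least p[0] + 1 elements.

Position Compare( SString s, SString p){ // search for the pattern P inside the main string S;
// on success return the 1-based position of the first match, on failure return -1

    int len_s = s[ 0 ], len_p = p[ 0 ], i = 1, j = 1;

    //=========================================================== variant 1: the ordinary loop
    while( i <= len_s && j <= len_p){
        if( s[ i ] == p[ j ]) { ++i; ++j; }
        else {
            i = i - j + 2; // back up to one position after the previous starting point
            j = 1;
        }
    }
    //=========================================================== the ordinary loop is over

    // variant 2: the KMP loop (use it INSTEAD of the loop above; nextval[] must already be filled in)
    while( i <= len_s && j <= len_p){
        if( j == 0 || s[ i ] == p[ j ]) { ++i; ++j; }
        else j = nextval[ j ]; // i never moves backwards
    }
    //=========================================================== the KMP algorithm is over

    // here comes the algorithm that calculates nextval[ i ]; run it before the KMP loop,
    // with its own i and j.
    // Notice: index 0 of each array contains the length of the array
    i = 1; nextval[ 1 ] = 0; j = 0;
    while( i < p[ 0 ]){
        if( j == 0 || p[ i ] == p[ j ]){ ++i; ++j;
            if( p[ i ] != p[ j ]) // the new element differs from the one it would fall back to
                nextval[ i ] = j;
            else nextval[ i ] = nextval[ j ];
        }
        else j = nextval[ j ];
    }
    //============================================================ here comes the end

    // i and j here are the values left by whichever search loop was used
    if( j > len_p) return ( i - len_p);
    else return -1;

}// the end of the string comparison algorithm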
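Since the listing above deliberately stacks alternatives inside one function, it cannot be compiled as it stands. Here is a minimal sketch of how the KMP variant might be pulled out into something standalone. It assumes plain NUL-terminated C strings instead of SString and reads them as 1-based by indexing with `[i - 1]`, so the loops keep the same shape as above. The names `get_nextval`, `index_kmp` and `MAX_PAT` are mine, not part of the original code.

```c
#include <string.h>

#define MAX_PAT 256  /* hypothetical upper bound on the pattern length */

/* Fill nextval[1..m] for pattern p (plain C string, read as 1-based via p[k - 1]). */
static void get_nextval(const char *p, int m, int nextval[])
{
    int i = 1, j = 0;
    nextval[1] = 0;
    while (i < m) {
        if (j == 0 || p[i - 1] == p[j - 1]) {
            ++i; ++j;
            if (p[i - 1] != p[j - 1]) nextval[i] = j;
            else nextval[i] = nextval[j];
        } else {
            j = nextval[j];
        }
    }
}

/* Return the 1-based position of the first occurrence of p in s, or -1. */
static int index_kmp(const char *s, const char *p)
{
    int n = (int)strlen(s), m = (int)strlen(p);
    int nextval[MAX_PAT + 1];
    int i = 1, j = 1;

    if (m == 0 || m > MAX_PAT) return -1;  /* sketch only: reject empty or oversized patterns */
    get_nextval(p, m, nextval);

    while (i <= n && j <= m) {
        if (j == 0 || s[i - 1] == p[j - 1]) { ++i; ++j; }
        else j = nextval[j];               /* i never moves backwards */
    }
    return (j > m) ? i - m : -1;
}
```

With the textbook-style example, `index_kmp("acabaabaabcacaabc", "abaabcac")` finds the pattern starting at position 6 of the main string.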

-------------------------------------------------------------------------------------------------------------------------------------------

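Some description follows. This is a version commonly seen online, and I do not know who wrote it. If any copyright issue arises, please contact me; csdn is not involved.

I. Theoretical discussion: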

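(1) When looking for the match position in the main string, we can think in terms of a vernier caliper, that is, in terms of relative sliding. The vernier stops moving when a match is found or when it reaches the end, and the faster it moves, the less time is spent. Here i is the main scale and j is the vernier;

(2) Why next[j] is needed

As we know, when the current substring of the main string mismatches the pattern (at S[i] versus T[j]), everything compared before that point must have matched, i.e. "S[i-j+1],S[i-j+2],...,S[i-1]" = "T[1],T[2],...,T[j-1]". At this point we have to find the largest number of elements by which the pattern can slide relative to the main string without skipping any possible match, which is equivalent to finding next[j];

(3) The purpose of next[j] is to find the pattern index next[j] (the vernier reading) whose element T[next[j]] should next be compared with S[i], where i is the index of the current mismatch (the main-scale reading). To do this, all we need is next[j];

(4) Obtaining k=next[j] is equivalent to moving the pattern forward by (j-k) elements relative to the main string;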
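To make point (4) concrete, here is a small sketch. While S[i] is compared with T[j], T[1] sits under S[i-j+1]; after j falls back to k=next[j], T[1] sits under S[i-k+1], so the pattern has slid forward by j-k. The table below is the next[] of the example pattern "abaabcac"; the names `next_tab`, `align_start` and `shift_after_mismatch` are made up for this illustration.

```c
/* 1-based next[] table for the pattern "abaabcac" (index 0 is unused). */
static const int next_tab[9] = {0, 0, 1, 1, 2, 2, 3, 1, 2};

/* Position in S that lines up with T[1] while S[i] is compared with T[j]. */
static int align_start(int i, int j)
{
    return i - j + 1;
}

/* Number of elements the pattern slides forward after a mismatch at T[j]. */
static int shift_after_mismatch(const int next[], int j)
{
    int k = next[j];
    return j - k;
}
```

For a mismatch at j=6 the table gives k=3, so the pattern moves forward by 3 elements while i stays where it is.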

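(5) How to compute next[j]: the ordinary algorithm

The core idea is to make full use of the information already known. When we humans (i.e. the classical algorithm) work out the maximum distance the pattern can move, we always start by comparing T[1] against the elements from S[i-j+1] onward, and whenever we find an equal one we go on to compare the next. The information known so far does not include S[i] (in fact the mismatch itself does yield information, and the improved algorithm exploits exactly this) or anything after it, so those elements cannot be used. In other words, if a maximal shift exists, we must have "T[1],T[2],...,T[k-1]" = "S[i-k+1],S[i-k+2],...,S[i-1]", that is: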

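“It is easier to understand Next[j] as the length of the longest identical prefix substring and suffix substring (proper substrings) within p1 p2 ... pj.

Here is a way to compute Next[j]:

When j=1, Next[j]=0;

When j>1, Next[j] is the maximum length of the identical head and tail substrings found in the string formed by pattern positions 1 through j-1, plus 1; when there are no identical head and tail substrings, Next[j] is 1.”

(Quoted from "KMP" by 青玉案.)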
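The quoted rule can be turned into code almost word for word. The following is only a brute-force sketch of the definition, not the efficient method from the textbook: for each j it tries every proper head/tail length in p1..p(j-1) and keeps the longest one that matches. It assumes a plain C string read as 1-based (so pk is `p[k - 1]`); the name `next_by_definition` is made up here.

```c
#include <string.h>

/* Next[j] computed directly from the quoted rule (1-based j, 1 <= j <= strlen(p)). */
static int next_by_definition(const char *p, int j)
{
    int len, best = 0;

    if (j == 1) return 0;

    /* Look at p1..p(j-1) and try every proper head/tail length len < j-1. */
    for (len = 1; len < j - 1; ++len) {
        /* head p1..p(len) versus tail p(j-len)..p(j-1) */
        if (memcmp(p, p + (j - 1 - len), (size_t)len) == 0)
            best = len;
    }
    return best + 1;  /* gives 1 when no identical head and tail exist */
}
```

For the pattern "abaabcac" this yields 0, 1, 1, 2, 2, 3, 1, 2 for j = 1 through 8.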

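The textbook explains the concrete computation, so I will add just one point: I find the textbook's formulation, which looks for the positions where the pattern matches itself, too abstract, even though that is indeed the essence of it. A more concrete approach follows below.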

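(6) The improved algorithm for next[j]

In fact, the only thing the improved algorithm adds is a single conditional statement. Its core, as mentioned above, is to exploit the fact that whenever nextval[j] is looked up, T[j] is known to have just mismatched S[i], so falling back to a position that holds the same character as T[j] would only mismatch again. The work of exploiting this is moved into the preparation step, namely computing nextval[].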
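One way to see that single conditional in isolation is to compute the plain next[] first and then derive nextval[] from it in a second pass. This is a sketch under my own assumptions (plain C strings read as 1-based, hypothetical names `get_next` and `nextval_from_next`); the listing earlier fuses both steps into one loop instead.

```c
/* Plain next[1..m] for pattern p (C string read as 1-based via p[k - 1]). */
static void get_next(const char *p, int m, int next[])
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < m) {
        if (j == 0 || p[i - 1] == p[j - 1]) {
            ++i; ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Derive nextval[1..m] from next[1..m]; the if below is the added test. */
static void nextval_from_next(const char *p, int m, const int next[], int nextval[])
{
    int j;
    nextval[1] = 0;
    for (j = 2; j <= m; ++j) {
        int k = next[j];            /* k >= 1 whenever j >= 2, and k < j */
        if (p[j - 1] == p[k - 1])   /* T[k] equals T[j], so it would mismatch S[i] as well */
            nextval[j] = nextval[k];
        else
            nextval[j] = k;
    }
}
```

For "abaabcac" the plain table 0, 1, 1, 2, 2, 3, 1, 2 becomes 0, 1, 0, 2, 1, 3, 0, 2. For a pattern such as "aaaab" the effect is more visible: next[] is 0, 1, 2, 3, 4 while nextval[] is 0, 0, 0, 0, 4, so a mismatch on any of the leading a's skips all the pointless retries at once.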