KMP算法不是我想出来的,搜索一下网上描述很多,这里会在最后引用一段描述,给出的仅仅是算法。而且你会惊奇的发现,这个算法简直就是动过小手术的一个算法的重载,这里写下就是方便你的记忆,下面开始:
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// Produced By fishstudio @ Sep. 2005
//@description:this(or maybe these)program contains several algorthims in only one function in the C
// programming language.But, if you want to use it,please select the one you want to use
// for copying all of it will cause a mount of errors while compiling.Maybe I should make it
// in a more advanced IDE(e.g: MS Visual Studio.net),I could,which will give the users more
// benefits.I like pure things so that's reason.
//@author:Vincent
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
//................................many codes before the function were discarded
//we have a main string called S,and a module string called P,also the nextval array used to record the
//nextval of each element of the module string.
typedef int Postion;//the postion of the firstly found string
Postion Compare( SString s, SString p){//compare whether there is a string P contains in S,
//if has return the firstly found postion,if not return -1 as the symbol of the failure
int len_s = strlen( s),len_p = strlen( p),i=0,j=0;
while( i < len_s && j < len_p){
if( s[ i ] == p[ j ]) { ++i; ++j; }
else {
i = i - j + 1;
j = 1;
}
}
//===========================================================the normal function is over
//here comes the KMP function
while( i < len_s && j < len_p){
if( s[ i ] == p[ j ]) { ++i; ++j;}
else j = nextval[ j ];
}
//===========================================================the KMP algorthim is over
//here come the algorthim to calculate the nextval[ i ].
//Notice:the index 0 of each array contains the length of the array
i =1; nextval[ 1 ] = 0; j = 0;
while( i < s[ 0 ]){
if( j == 0 || p[ i ] == p[ j ]){ ++i; ++j;
if( p[ i ] != p[ j ])//if the next element doesn't equal to the last one
nextval[ i ] = j;
else nextval[ i ] = nextval[ j ];
}
else j = nextval[ j ];
}
//============================================================here comes the end
if( j > len_p) return ( i - len_p);
else return -1;
}//the end of the string's compare algorthim
-------------------------------------------------------------------------------------------------------------------------------------------
若干描述,这是网上多见的版本,不知道作者,如果出现版权问题,请与本人联系,与csdn无关
一 理论论述:
(1)我们在求主串的匹配位置的时候,可以用游标卡尺的模型,即相对移动来思维,当找到匹配时或者是游标到底时游标停止移动,而移动越快,则耗时越短,而i就是主标,j就是游标;
(2)关于为什么要求next[j]
大家知道子串与模式串失配时,一定前边所比较过的一定匹配,即有"S[i-j+1],S[i-j+2],...S[i-1]"="T[1],T[2],...T[j-1]",在这个时候,我们一定要找出在不遗漏可能的匹配串条件下,子串相对于主串所能移动的最大元素个数,也等效于找出next[j];
(3)求next[j]本意是找出应当和当前失配标度i(主尺标度)所指S[i]所比较的模式串的标度next[j](游尺标度)所指的T[next[j]],而要做到这一点,只要找出next[j]即可;
(4)求出k=next[j],相当与模式串相对于主串前进(j-k)个元素;
(5)关于如何求next[j] 普通算法
核心是充分利用已经知道的信息,我们知道我们人(即古典算法)在看最多能移动多少长度时,一定是先拿T[1]和从S[i-j+1]开始起的元素进行比较,找到相同的再比较下个,由于目前所知道信息不包括S[i](其实失配时,还是获得信息,在改进算法中就利用到了这一点)以及其以后的元素,一定无法利用,也就是说如果存在移动最大长度时,一定有"T[1],T[2],...,T[k-1]"="S[i-k+1],S[i-k+2],...,S[i-1]}"也就是
“将Next[j]解释为p1 p2 …… pj 中最大相同前缀子串和后缀子串(真子串)的长度较容易理解。
提供一个Next[j]的计算方法:
当j=1时,Next[j]=0;
当j>1时,Next[j]的值为模式串的位置从1到j-1构成的串中所出现的首尾相同的子串的最大长度加1,无首尾相同的子串时 Next[j]的值为1。”
(引用自青玉案的《KMP》) 。
具体的求法书上有介绍,这个我只补充一点:我觉得书上所说求模式串与其自身匹配位置太过抽象,尽管本质确实如此,下边我会有更具体的方式.
(6)关于求next[j]的改进算法
其实本改进算法中唯一添加了的就是一个判断语句,它的核心正如上边所提到的,在于利用了求nextval[j]时,nextval[j]一定失配;并且将利用的工作引入准备工作---求nextval[].