A Worked Example: Developing a PHP .so Extension Module in C on Linux/FreeBSD

王朝php · Author: anonymous · 2008-05-19

If you quote this article, please credit the source: Just Do IT (http://www.toplee.com) < Michael Lee @ toplee.com >

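I started doing web development in 1997, nine years ago now. I began by building HTML pages in FrontPage, then learned ASP + Access + IIS, and have been at it ever since. Along the way I used ASP + SQL Server + IIS, JSP + Oracle + JRun (Resin/Tomcat) and PHP + Sybase (MySQL) + Apache, and finally settled on PHP + MySQL + Apache + Linux (BSD), the stack usually called LAMP. There are many reasons for that choice, and plenty of discussion online about the trade-offs between stacks and languages, so I won't repeat it all. Briefly, these are the main reasons I like LAMP: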

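1. It is a completely open, free platform.

2. It is easy to get started with, and resources are plentiful.

3. PHP, MySQL and Apache integrate tightly with the underlying Linux (BSD) system and with each other, which makes the stack very efficient.

4. All of them are written in C/C++, so performance is dependable.

5. PHP's style is close to C, and it also borrows many design ideas from Java and C++.

6. Most important of all, PHP can easily be extended with modules written in C/C++, which makes it almost unlimitedly extensible.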

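For these reasons I am very fond of PHP-based architectures, and the last point matters most. I promoted this platform at both Yahoo and mop, and I have some experience extending PHP in C that I would like to share here, in the hope that it prompts better contributions from others.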

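There are several ways to write a PHP extension module in C. By final form there are two: the extension is either compiled directly into PHP, or compiled as a .so module that PHP loads. By build method there are also two: the phpize tool (available once PHP has been built and installed) and the ext_skel tool (shipped in the PHP source tree). The most common and most convenient approach is to use ext_skel to write a .so extension module, and that is the approach described here.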

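The PHP source tree contains an ext directory (everything here refers to PHP on Linux, not on Windows). Inside ext is a tool called ext_skel, which provides a generic set of steps and a template that make it easy to develop a PHP extension module. As an example, we will develop an extension module that converts between character encodings in PHP (UTF-8, GBK/GB2312 and BIG5). The module will provide the following functions: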

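(1) string toplee_big52gbk(string s)

Converts the input string from BIG5 to GBK.

(2) string toplee_gbk2big5(string s)

Converts the input string from GBK to BIG5.

(3) string toplee_normalize_name(string s)

Normalizes the input string: full-width characters to half-width, trim, upper case to lower case.

(4) string toplee_fan2jian(int code, string s)

Converts the input GBK Traditional Chinese string to Simplified Chinese.

(5) string toplee_decode_utf(string s)

Converts a UTF-encoded string to Unicode.

(6) string toplee_decode_utf_gb(string s)

Converts a UTF-encoded string to GB.

(7) string toplee_decode_utf_big5(string s)

Converts a UTF-encoded string to BIG5.

(8) string toplee_encode_utf_gb(string s)

Converts the input GBK-encoded string to UTF.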

First, go into the ext directory and run:

# ./ext_skel --extname=toplee

ext_skel then creates a toplee directory under ext containing the following files:

.cvsignore

CREDITS

EXPERIMENTAL

config.m4

php_toplee.h

tests

toplee.c

toplee.php

The most important of these are config.m4 and toplee.c.

Next, edit config.m4:

# vi ./config.m4

Find the lines that look like this:

dnl PHP_ARG_WITH(toplee, for toplee support,

dnl Make sure that the comment is aligned:

dnl [ --with-toplee Include toplee support])

dnl Otherwise use enable:

dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,

dnl Make sure that the comment is aligned:

dnl [ --enable-toplee Enable toplee support])

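The lines above tell the PHP build how the toplee extension is enabled. We will use the --with-toplee form, so remove the leading dnl from the PHP_ARG_WITH line and from its help-string line. The reminder line between them must stay commented out with dnl, because it is not valid m4 input:

PHP_ARG_WITH(toplee, for toplee support,

dnl Make sure that the comment is aligned:

[ --with-toplee Include toplee support])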

dnl Otherwise use enable:

dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,

dnl Make sure that the comment is aligned:

dnl [ --enable-toplee Enable toplee support])

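The key step is then writing toplee.c, the main file of the module. Even if you change nothing, you already have a complete PHP extension module; the generated file contains code like the following: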

PHP_FUNCTION(confirm_toplee_compiled)

{

char *arg = NULL;

int arg_len, len;

char string[256];

if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &arg, &arg_len) == FAILURE) {

return;

}

len = sprintf(string, "Congratulations! You have successfully modified ext/%.78s/config.m4. Module %.78s is now compiled into PHP.", "toplee", arg);

RETURN_STRINGL(string, len, 1);

}

If we build this unmodified module into PHP later on, a PHP script can call confirm_toplee_compiled("toplee"), which returns the string “Congratulations! You have successfully modified ext/toplee/config.m4. Module toplee is now compiled into PHP.”
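The `%.78s` precision in the generated function is what keeps the message inside the 256-byte buffer: each string argument contributes at most 78 bytes. The following standalone sketch shows the same idea with `snprintf` as an additional guard. The helper name `format_confirmation` is made up for this illustration and is not part of the ext_skel output.

```c
#include <stdio.h>

/* Hypothetical helper (not generated by ext_skel): builds the same message
 * as confirm_toplee_compiled() into a caller-supplied buffer.
 * Returns the number of bytes written, or -1 if the buffer is too small. */
int format_confirmation(char *buf, size_t bufsize,
                        const char *extname, const char *arg)
{
    int len = snprintf(buf, bufsize,
        "Congratulations! You have successfully modified "
        "ext/%.78s/config.m4. Module %.78s is now compiled into PHP.",
        extname, arg);

    if (len < 0 || (size_t)len >= bufsize)
        return -1;      /* output error or truncated output */
    return len;
}
```

With a 256-byte buffer the call succeeds for arguments of any length, because the fixed text plus two 78-byte fields stays below 256 bytes.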

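Below is our modified toplee.c, which supports the functions planned above. Note that the listing registers the module under the internal name gbk (gbk_functions, gbk_module_entry, php_gbk.h, COMPILE_DL_GBK), so those names have to agree with the ones your own ext_skel output and config.m4 use: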

/*

+----------------------------------------------------------------------+

| PHP Version 4 |

+----------------------------------------------------------------------+

| Copyright (c) 1997-2002 The PHP Group |

+----------------------------------------------------------------------+

| This source file is subject to version 2.02 of the PHP license, |

| that is bundled with this package in the file LICENSE, and is |

| available at through the world-wide-web at |

| http://www.php.net/license/2_02.txt. |

| If you did not receive a copy of the PHP license and are unable to |

| obtain it through the world-wide-web, please send a note to |

| license@php.net so we can mail you a copy immediately. |

+----------------------------------------------------------------------+

| Author: |

+----------------------------------------------------------------------+

$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $

*/

#ifdef HAVE_CONFIG_H

#include "config.h"

#endif

#include "php.h"

#include "php_ini.h"

#include "ext/standard/info.h"

#include "php_gbk.h"

#include "toplee_util.h"

/* If you declare any globals in php_gbk.h uncomment this:

ZEND_DECLARE_MODULE_GLOBALS(gbk)

*/

/* True global resources - no need for thread safety here */

static int le_gbk;

/* {{{ gbk_functions[]

*

* Every user visible function must have an entry in gbk_functions[].

*/

function_entry gbk_functions[] = {

PHP_FE(toplee_decode_utf, NULL)

PHP_FE(toplee_decode_utf_gb, NULL)

PHP_FE(toplee_decode_utf_big5, NULL)

PHP_FE(toplee_encode_utf_gb, NULL)

PHP_FE(toplee_big52gbk, NULL)

PHP_FE(toplee_gbk2big5, NULL)

PHP_FE(toplee_fan2jian, NULL)

PHP_FE(toplee_normalize_name, NULL)

{NULL, NULL, NULL} /* Must be the last line in gbk_functions[] */

};

/* }}} */

/* {{{ gbk_module_entry

*/

zend_module_entry gbk_module_entry = {

#if ZEND_MODULE_API_NO >= 20010901

STANDARD_MODULE_HEADER,

#endif

"gbk",

gbk_functions,

PHP_MINIT(gbk),

PHP_MSHUTDOWN(gbk),

PHP_RINIT(gbk), /* Replace with NULL if there's nothing to do at request start */

PHP_RSHUTDOWN(gbk), /* Replace with NULL if there's nothing to do at request end */

PHP_MINFO(gbk),

#if ZEND_MODULE_API_NO >= 20010901

"0.1", /* Replace with version number for your extension */

#endif

STANDARD_MODULE_PROPERTIES

};

/* }}} */

#ifdef COMPILE_DL_GBK

ZEND_GET_MODULE(gbk)

#endif

/* {{{ PHP_INI

*/

/* Remove comments and fill if you need to have entries in php.ini*/

PHP_INI_BEGIN()

PHP_INI_ENTRY("gbk2uni", "", PHP_INI_SYSTEM, NULL)

PHP_INI_ENTRY("uni2gbk", "", PHP_INI_SYSTEM, NULL)

PHP_INI_ENTRY("uni2big5", "", PHP_INI_SYSTEM, NULL)

PHP_INI_ENTRY("big52uni", "", PHP_INI_SYSTEM, NULL)

PHP_INI_ENTRY("big52gbk", "", PHP_INI_SYSTEM, NULL)

PHP_INI_ENTRY("gbk2big5", "", PHP_INI_SYSTEM, NULL)

// STD_PHP_INI_ENTRY("gbk.global_value", "42", PHP_INI_ALL, OnUpdateInt, global_value, zend_gbk_globals, gbk_globals)

// STD_PHP_INI_ENTRY("gbk.global_string", "foobar", PHP_INI_ALL, OnUpdateString, global_string, zend_gbk_globals, gbk_globals)

PHP_INI_END()

/* }}} */

/* {{{ php_gbk_init_globals

*/

/* Uncomment this function if you have INI entries

static void php_gbk_init_globals(zend_gbk_globals *gbk_globals)

{

gbk_globals->global_value = 0;

gbk_globals->global_string = NULL;

}

*/

/* }}} */

char gbk2uni_file[256];

char uni2gbk_file[256];

char big52uni_file[256];

char uni2big5_file[256];

char gbk2big5_file[256];

char big52gbk_file[256];

//utf file init flag

static int initutf=0;

/* {{{ PHP_MINIT_FUNCTION

*/

PHP_MINIT_FUNCTION(gbk)

{

/* If you have INI entries, uncomment these lines

ZEND_INIT_MODULE_GLOBALS(gbk, php_gbk_init_globals, NULL);*/

REGISTER_INI_ENTRIES();

memset(gbk2uni_file, 0, sizeof(gbk2uni_file));

memset(uni2gbk_file, 0, sizeof(uni2gbk_file));

memset(big52uni_file, 0, sizeof(big52uni_file));

memset(uni2big5_file, 0, sizeof(uni2big5_file));

memset(gbk2big5_file, 0, sizeof(gbk2big5_file));

memset(big52gbk_file, 0, sizeof(big52gbk_file));

strncpy(gbk2uni_file, INI_STR("gbk2uni"), sizeof(gbk2uni_file)-1);

strncpy(uni2gbk_file, INI_STR("uni2gbk"), sizeof(uni2gbk_file)-1);

strncpy(big52uni_file, INI_STR("big52uni"), sizeof(big52uni_file)-1);

strncpy(uni2big5_file, INI_STR("uni2big5"), sizeof(uni2big5_file)-1);

strncpy(gbk2big5_file, INI_STR("gbk2big5"), sizeof(gbk2big5_file)-1);

strncpy(big52gbk_file, INI_STR("big52gbk"), sizeof(big52gbk_file)-1);

//InitMMResource();

InitResource();

if ((uni2gbk_file[0] == '\0') || (uni2big5_file[0] == '\0')

|| (gbk2big5_file[0] == '\0') || (big52gbk_file[0] == '\0')

|| (gbk2uni_file[0] == '\0') || (big52uni_file[0] == '\0'))

{

return FAILURE;

}

if (gbk2uni_file[0] != '\0')

{

if (LoadOneCodeTable(CODE_GBK2UNI, gbk2uni_file) != NULL)

{

toplee_cleanup_mmap(NULL);

return FAILURE;

}

}

if (uni2gbk_file[0] != '\0')

{

if (LoadOneCodeTable(CODE_UNI2GBK, uni2gbk_file) != NULL)

{

toplee_cleanup_mmap(NULL);

return FAILURE;

}

}

if (big52uni_file[0] != '\0')

{

if (LoadOneCodeTable(CODE_BIG52UNI, big52uni_file) != NULL)

{

toplee_cleanup_mmap(NULL);

return FAILURE;

}

}

if (uni2big5_file[0] != '\0')

{

if (LoadOneCodeTable(CODE_UNI2BIG5, uni2big5_file) != NULL)

{

toplee_cleanup_mmap(NULL);

return FAILURE;

}

}

if (gbk2big5_file[0] != '\0')

{

if (LoadOneCodeTable(CODE_GBK2BIG5, gbk2big5_file) != NULL)

{

toplee_cleanup_mmap(NULL);

return FAILURE;

}

}

if (big52gbk_file[0] != '\0')

{

if (LoadOneCodeTable(CODE_BIG52GBK, big52gbk_file) != NULL)

{

toplee_cleanup_mmap(NULL);

return FAILURE;

}

}

initutf = 1;

return SUCCESS;

}

/* }}} */

/* {{{ PHP_MSHUTDOWN_FUNCTION

*/

PHP_MSHUTDOWN_FUNCTION(gbk)

{

/* uncomment this line if you have INI entries*/

UNREGISTER_INI_ENTRIES();

toplee_cleanup_mmap(NULL);

return SUCCESS;

}

/* }}} */

/* Remove if there's nothing to do at request start */

/* {{{ PHP_RINIT_FUNCTION

*/

PHP_RINIT_FUNCTION(gbk)

{

return SUCCESS;

}

/* }}} */

/* Remove if there's nothing to do at request end */

/* {{{ PHP_RSHUTDOWN_FUNCTION

*/

PHP_RSHUTDOWN_FUNCTION(gbk)

{

return SUCCESS;

}

/* }}} */

/* {{{ PHP_MINFO_FUNCTION

*/

PHP_MINFO_FUNCTION(gbk)

{

php_info_print_table_start();

php_info_print_table_header(2, "gbk support", "enabled");

php_info_print_table_end();

/* Remove comments if you have entries in php.ini*/

DISPLAY_INI_ENTRIES();

}

/* }}} */

/* Remove the following function when you have succesfully modified config.m4

so that your module can be compiled into PHP, it exists only for testing

purposes. */

/* {{{ proto toplee_decode_utf(string s)

*/

PHP_FUNCTION(toplee_decode_utf)

{

char *s = NULL, *t=NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

t = strdup(s);

if (t==NULL)

RETURN_FALSE

DecodePureUTF(t, KEEP_UNICODE);

RETVAL_STRING(t,1);

free(t);

return;

}

/* }}} */

/* {{{ proto toplee_decode_utf_gb(string s)

*/

PHP_FUNCTION(toplee_decode_utf_gb)

{

char *s = NULL, *t=NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

t = strdup(s);

if (t==NULL)

RETURN_FALSE

DecodePureUTF(t, DECODE_UNICODE);

RETVAL_STRING(t,1);

free(t);

return;

}

/* }}} */

/* {{{ proto toplee_decode_utf_big5(string s)

*/

PHP_FUNCTION(toplee_decode_utf_big5)

{

char *s = NULL, *t=NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

t = strdup(s);

if (t==NULL)

RETURN_FALSE

DecodePureUTF(t, DECODE_UNICODE | DECODE_BIG5);

RETVAL_STRING(t,1);

free(t);

return;

}

/* }}} */

int EncodePureUTF(unsigned char* strSrc,

unsigned char* strDst, int nDstLen, int nFlag)

{

int nRet;

int pos;

unsigned short c;

unsigned short* uBuf;

int nSize;

int nLen;

int nReturn;

nLen=strlen((const char*)strSrc);

if(nDstLen < nLen*2+1)

return 0;

nSize=nLen+1;

uBuf=(unsigned short*)emalloc(sizeof(unsigned short)*nSize);

nRet=MultiByteToWideChar(936, 0, (const char*)strSrc, strlen((const char*)strSrc),

uBuf, nSize);

nReturn=0;

pos=nRet;

while(pos>0)

{

c = *uBuf;

if (c < 0x80) {

strDst[nReturn++] = (char) c;

} else if (c < 0x800) {

strDst[nReturn++] = (0xc0 | (c >> 6));

strDst[nReturn++] = (0x80 | (c & 0x3f));

} else if (c < 0x10000) {

strDst[nReturn++] = (0xe0 | (c >> 12));

strDst[nReturn++] = (0x80 | ((c >> 6) & 0x3f));

strDst[nReturn++] = (0x80 | (c & 0x3f));

} else if (c < 0x200000) {

strDst[nReturn++] = (0xf0 | (c >> 18));

strDst[nReturn++] = (0x80 | ((c >> 12) & 0x3f));

strDst[nReturn++] = (0x80 | ((c >> 6) & 0x3f));

strDst[nReturn++] = (0x80 | (c & 0x3f));

}

pos--;

uBuf++;

}

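strDst[nReturn]='\0';

/* uBuf was advanced once per converted character; step back to the start of the allocation and free it */

if (nRet > 0) uBuf -= nRet;

efree(uBuf);

return nReturn;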

}

/* {{{ proto toplee_encode_utf_gb(string s)

*/

PHP_FUNCTION(toplee_encode_utf_gb)

{

char *s = NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

char* sRet;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

sRet=emalloc(strlen(s)*2+1);

EncodePureUTF(s, sRet, strlen(s)*2+1, 0);

RETVAL_STRING(sRet,0); /* sRet was emalloc'ed: hand it over instead of duplicating and leaking it */

return;

}

/* }}} */

/* {{{ proto toplee_big52gbk(string s)

*/

PHP_FUNCTION(toplee_big52gbk)

{

char *s = NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

char* sRet = NULL;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

sRet=estrdup(s);

if (NULL == sRet)

RETURN_FALSE

BIG52GBK(sRet, strlen(sRet));

RETVAL_STRING(sRet,0); /* sRet comes from estrdup(): hand it over instead of duplicating and leaking it */

return;

}

/* }}} */

/* {{{ proto toplee_gbk2big5(string s)

*/

PHP_FUNCTION(toplee_gbk2big5)

{

char *s = NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

char* sRet = NULL;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

sRet=estrdup(s);

if (NULL == sRet)

RETURN_FALSE

GBK2BIG5(sRet, strlen(sRet));

RETVAL_STRING(sRet,0); /* sRet comes from estrdup(): hand it over instead of duplicating and leaking it */

return;

}

/* }}} */

/* {{{ proto toplee_normalize_name(string s)

*/

PHP_FUNCTION(toplee_normalize_name)

{

char *s = NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

char* sRet = NULL;

if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

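/* work on a copy: s points into the caller's argument and must not be modified in place */

sRet = estrndup(s, s_len);

NormalizeName( sRet );

RETURN_STRING(sRet, 0);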

}

/* }}} */

/* {{{ proto toplee_fan2jian(int code, string s)

*/

PHP_FUNCTION(toplee_fan2jian)

{

char *s = NULL;

int argc = ZEND_NUM_ARGS();

int s_len;

long code; /* the "l" specifier of zend_parse_parameters() stores a long */

char* sRet = NULL;

char *pSource;

char *pDest1=NULL, *pDest2=NULL;

int nSourceLen, nDestLen;

if (zend_parse_parameters(argc TSRMLS_CC, "ls", &code, &s, &s_len) == FAILURE)

return;

if (!initutf)

RETURN_FALSE

pSource = s;

nSourceLen = s_len;

pDest1 = malloc(nSourceLen * 2);

pDest2 = malloc(nSourceLen+1);

if (NULL == pDest1 || NULL == pDest2)

goto _f2j_err;

memset(pDest1, 0, nSourceLen * 2);

memset(pDest2, 0, nSourceLen + 1);

nDestLen = MultiByteToWideChar(code, 0, pSource, nSourceLen, (unsigned short *)pDest1, nSourceLen); /* pDest1 is nSourceLen * 2 bytes, i.e. room for nSourceLen 16-bit units */

if (0 >= nDestLen)

goto _f2j_err;

nDestLen = WideCharToMultiByte(code, 0, (short *)pDest1, nDestLen, pDest2, nSourceLen, NULL, NULL);

if (0 >= nDestLen)

goto _f2j_err;

RETVAL_STRING(pDest2, 1);

if (pDest1 != NULL)

free(pDest1);

if (pDest2 != NULL)

free(pDest2);

return;

_f2j_err:

if (pDest1 != NULL)

free(pDest1);

if (pDest2 != NULL)

free(pDest2);

RETURN_FALSE;

}

/* }}} */

/*

* Local variables:

* tab-width: 4

* c-basic-offset: 4

* End:

* vim600: noet sw=4 ts=4 fdm=marker

* vim<600: noet sw=4 ts=4

*/

.
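The branches inside `EncodePureUTF` follow the standard UTF-8 encoding of a code point. Because the input there is an `unsigned short`, only the one-, two- and three-byte branches can actually be reached. A minimal standalone sketch of that single step is shown below; `utf8_encode_bmp` is a name chosen for this illustration, and surrogate pairs are deliberately not handled.

```c
#include <stddef.h>

/* Encode one 16-bit code unit (a BMP code point) as UTF-8.
 * Writes 1 to 3 bytes into out and returns how many were written.
 * Illustrative helper only; it does not combine surrogate pairs. */
size_t utf8_encode_bmp(unsigned short c, unsigned char *out)
{
    if (c < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

For example, U+4E2D (中) is encoded as the three bytes E4 B8 AD, which is why a two-byte GBK character never needs more than three output bytes.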

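In fact, this file defines every interface we set out to implement. What remains is to write the C code that actually implements them, and I won't go into those details here. The key point is that you can put all the C source and header files that implement the interfaces defined in toplee.c into the ext/toplee directory, so that toplee.c can use them at build time (with a config.m4 that calls PHP_NEW_EXTENSION, the extra .c files also have to be listed in its source-file argument). All of this is standard C, so I won't say more. Below are several of the files from our implementation: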

chn_util.h

#ifndef __CHN_UTIL_H__

#define __CHN_UTIL_H__

#include "common.h"

#define LANG_GB 1

#define LANG_B5 2

#define GB_FULL_COUNT (20+26*2+5+4+26)

#define B5_FULL_COUNT (20+26*2+5+4+24)

BOOL FullToHalf(char *str, int nLang);

void LowerString(char* str);

void TrimString(char* str);

#endif // __CHN_UTIL_H__

.

chn_util.c

#include <stdio.h>

#include <assert.h>

#include <string.h>

#include "common.h"

#include "chn_util.h"

// 0123456789!@()-_+'<>

static char *GBFull[GB_FULL_COUNT] =

{"０", "１", "２", "３", "４", "５", "６", "７", "８", "９",

"", "＠", "（", "）", "－", "＿", "＋", "＇", "＜", "＞",

"ａ", "ｂ", "ｃ", "ｄ", "ｅ", "ｆ", "ｇ", "ｈ", "ｉ", "ｊ", "ｋ",

"ｌ", "ｍ", "ｎ", "ｏ", "ｐ", "ｑ", "ｒ", "ｓ", "ｔ", "ｕ", "ｖ",

"ｗ", "ｘ", "ｙ", "ｚ", "Ａ", "Ｂ", "Ｃ", "Ｄ", "Ｅ", "Ｆ", "Ｇ",

"Ｈ", "Ｉ", "Ｊ", "Ｋ", "Ｌ", "Ｍ", "Ｎ", "Ｏ", "Ｐ", "Ｑ", "Ｒ",

"Ｓ", "Ｔ", "Ｕ", "Ｖ", "Ｗ", "Ｘ", "Ｙ", "Ｚ",

"。", "·", "．", "﹒", "＆",

"《", "〈", "〉", "》",

"﹐", "，", "﹔", "；", "﹕", "：", "﹖", "？", "﹗", "！", "—",

"‘", "’", "“", "”", "～", "∶", "｀", "｜", "［", "］", "｛",

"｝", "＃", "＄", "％"

};

static char GBEnHalf[GB_FULL_COUNT+1] =

"0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

"....&<<>>,,;;::\?\?!!-\'\'\"\"~:`|[]{}#$%";

// ⒈⒉⒊⒋⒌⒍⒎⒏⌒∨∠ˇ≌≈

static char *B5Full[B5_FULL_COUNT] =

{"", "", "⒈", "⒉", "⒊", "⒋", "⒌", "⒍", "⒎", "⒏",

"", "", "", "", "⌒", "∨", "∠", "ˇ", "≌", "≈",

"㈤", "㈥", "㈦", "㈧", "㈨", "㈩", "", "", "Ⅰ", "Ⅱ", "Ⅲ",

"Ⅳ", "Ⅴ", "Ⅵ", "Ⅶ", "Ⅷ", "Ⅸ", "Ⅹ", "Ⅺ", "Ⅻ", "", "",

"", "", "", "", "⑾", "⑿", "⒀", "⒁", "⒂", "⒃", "⒄",

"⒅", "⒆", "⒇", "①", "②", "③", "④", "⑤", "⑥", "⑦", "⑧",

"⑨", "⑩", "", "", "㈠", "㈡", "㈢", "㈣",

"", "", "", "", "‘",

"", "", "", "",

"", "", "", "", "", "", "", "", "", "", "",

"ˉ", "ˇ", "¨", "〃", "°", "", "", "", "", "", "…",

"", ""

};

static char B5EnHalf[B5_FULL_COUNT+1] =

"0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

"....&<<>>,,;;::\?\?!!-\'\'\"\"~|[]{}#$%";

static int _bFHSortFlag=0;

static void _sorttable(char* tableFull[], char* tableHalf, int nSize)

{

int i,j;

char* p;

char cTemp;

for(i=0;i<nSize;i++)

{

for(j=i+1;j<nSize;j++)

{

if(strcmp(tableFull[i],tableFull[j])<0)

{

p=tableFull[i];

tableFull[i]=tableFull[j];

tableFull[j]=p;

cTemp=tableHalf[i];

tableHalf[i]=tableHalf[j];

tableHalf[j]=cTemp;

}

}

}

}

BOOL FullToHalf(char *str, int nCodePage)

{

char *pSrc = str;

char *pDest = str;

char **pFull;

char *pEnHalf;

int nCount;

BOOL bContinue = FALSE;

int nHigh,nLow,nMid,nResult;

if(!_bFHSortFlag)

{

_sorttable(GBFull,GBEnHalf, GB_FULL_COUNT);

_sorttable(B5Full,B5EnHalf, B5_FULL_COUNT);

_bFHSortFlag=1;

}

assert(NULL != str);

if ((LANG_GB == nCodePage) || (936==nCodePage))

{

pFull = GBFull;

pEnHalf = GBEnHalf;

nCount = GB_FULL_COUNT;

}

else if ((LANG_B5 == nCodePage) || (950==nCodePage))

{

pFull = B5Full;

pEnHalf = B5EnHalf;

nCount = B5_FULL_COUNT;

}

else

{

assert( FALSE );

return FALSE;

}

while ('\0' != *pSrc)

{

if (0x81 <= (BYTE)*pSrc)

{

// use binary search instead of a linear scan; this is much faster

nLow=0;

nHigh=nCount-1;

while(nLow <= nHigh)

{

nMid = (nLow+nHigh) / 2;

nResult = strncmp(pSrc, pFull[nMid], 2);

if( 0 == nResult)

{

*pDest++ = pEnHalf[nMid];

pSrc+=2;

bContinue=TRUE;

break;

}

if( nResult > 0)

nHigh=nMid-1;

else

nLow=nMid+1;

}

if( !bContinue )

{

// check for other symbols

if( ( 0xA1 <= (BYTE)*pSrc ) &&

( 0xA9 >= (BYTE)*pSrc ) )

{

*pDest++ = ' ';

pSrc+=2;

bContinue=TRUE;

}

}

/* for (nIndex = 0; nIndex < nCount; nIndex++)

{

assert(NULL != pFull[nIndex]);

if (NULL != pFull[nIndex])

{

if (0 == strncmp(pSrc, pFull[nIndex], 2))

{

*pDest++ = pEnHalf[nIndex]; // convert full to half

pSrc += 2;

bContinue = TRUE;

break;

}

}

}*/

if (bContinue)

{

bContinue = FALSE;

continue;

}

*pDest++ = *pSrc++; // copy head char, and the next statement copy tail char

if(*pSrc == '\0')

break;

}

*pDest++ = *pSrc++; // ascii code

}

*pDest = '\0';

return TRUE;

}

BOOL MyIsDBCSLeadByte(BYTE TestChar)

{

if((TestChar>0X80) && (TestChar<0xFF))

return TRUE;

else

return FALSE;

}

void LowerString(char* str)

{

while(*str)

{

if(!MyIsDBCSLeadByte(*str))

{

if( (*str>='A') && (*str<='Z') )

*str = (char)(*str+('a'-'A'));

}

else

{

str++;

if(!*str)

break;

}

str++;

}

return ;

}

BOOL myisspace(char c)

{

return ((c==' ') || (c=='\t') || (c=='\r') || (c=='\n'));

}

void TrimString(char* str)

{

char* pDst;

char* pSrc;

char* pLast;

char cCurrent;

int nState;

pLast=pDst=pSrc=str;

nState=0;

while(*pSrc)

{

cCurrent=*pSrc;

switch(nState)

{

case 0:

if(!myisspace(cCurrent))

{

nState=1;

continue;

}

break;

case 1:

if(myisspace(cCurrent))

{

nState=2;

*pDst=cCurrent;

}

else

{

*pDst=cCurrent;

pLast=pDst+1;

}

pDst++;

break;

case 2:

if(myisspace(cCurrent))

{

*pDst=cCurrent;

}

else

{

*pDst=cCurrent;

pLast=pDst+1;

}

pDst++;

break;

}

pSrc++;

}

*pLast='\0';

return;

}

.
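The lookup table in `FullToHalf` is needed for punctuation, which has no regular mapping. For full-width digits and Latin letters, however, GBK is regular: each one is the lead byte 0xA3 followed by the ASCII code plus 0x80. The sketch below handles only that regular subset, to illustrate the in-place two-pointer rewrite used above. The function name is hypothetical, and this is not a replacement for the table-driven version.

```c
#include <ctype.h>

/* Convert GBK full-width digits and Latin letters to ASCII, in place.
 * All other double-byte characters are copied through unchanged. */
void gbk_fullwidth_alnum_to_ascii(char *str)
{
    unsigned char *src = (unsigned char *)str;
    unsigned char *dst = (unsigned char *)str;

    while (*src != '\0') {
        if (*src >= 0x81 && src[1] != '\0') {
            /* double-byte character: lead byte followed by trail byte */
            unsigned char ascii = (unsigned char)(src[1] - 0x80);

            if (src[0] == 0xA3 && src[1] > 0x80 && isalnum(ascii)) {
                *dst++ = ascii;      /* full-width -> half-width */
            } else {
                *dst++ = src[0];     /* keep both bytes as they are */
                *dst++ = src[1];
            }
            src += 2;
        } else {
            *dst++ = *src++;         /* ASCII, or a dangling lead byte */
        }
    }
    *dst = '\0';
}
```

Because the output is never longer than the input, the write pointer can safely trail the read pointer in the same buffer, which is the same property `FullToHalf` relies on.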

toplee_util.c

......

int ToBase64(void* pSrc,int nSrcLen, char* strBase64, int* nBase64Len)

{

static char *v = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

.......... more than 3,000 lines of code in between are omitted from this article ........

void NormalizeName( char *p )

{

FullToHalf( p, CODE_PAGE_GBK );

TrimString( p );

LowerString( p );

}

.
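The body of `ToBase64` is part of the omitted code, so the following is only a sketch of how an encoder over that alphabet is commonly written, not the author's implementation. The function name, parameter list and return convention are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the omitted ToBase64(): encodes n bytes from
 * src into dst as NUL-terminated Base64 with '=' padding.
 * dst must hold at least 4 * ((n + 2) / 3) + 1 bytes.
 * Returns the number of characters written (not counting the NUL). */
size_t base64_encode_sketch(const unsigned char *src, size_t n, char *dst)
{
    static const char v[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;

    while (i + 3 <= n) {                  /* full 3-byte groups */
        unsigned long w = ((unsigned long)src[i] << 16)
                        | ((unsigned long)src[i + 1] << 8)
                        | src[i + 2];
        dst[o++] = v[(w >> 18) & 0x3F];
        dst[o++] = v[(w >> 12) & 0x3F];
        dst[o++] = v[(w >> 6) & 0x3F];
        dst[o++] = v[w & 0x3F];
        i += 3;
    }
    if (n - i == 1) {                     /* one byte left: two '=' */
        unsigned long w = (unsigned long)src[i] << 16;
        dst[o++] = v[(w >> 18) & 0x3F];
        dst[o++] = v[(w >> 12) & 0x3F];
        dst[o++] = '=';
        dst[o++] = '=';
    } else if (n - i == 2) {              /* two bytes left: one '=' */
        unsigned long w = ((unsigned long)src[i] << 16)
                        | ((unsigned long)src[i + 1] << 8);
        dst[o++] = v[(w >> 18) & 0x3F];
        dst[o++] = v[(w >> 12) & 0x3F];
        dst[o++] = v[(w >> 6) & 0x3F];
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```

Every three input bytes become four output characters, so the output buffer size can be computed before encoding.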

toplee_util.h

#ifndef __TOPLEE_UTIL_INCLUDE__

#define __TOPLEE_UTIL_INCLUDE__ 1

#include <sys/stat.h>

#include <sys/types.h>

#include <sys/mman.h>

#include <string.h>

#include <stdlib.h>

#ifdef LINUX

#include <time.h>

#endif

#include "common.h"

//#include "euc2uni.h"

/*

typedef int BOOL;

*/

#ifndef TRUE

#define TRUE 1

#define FALSE 0

#endif

#define ASCII 0

#define HZ_HEAD 1

#define HZ_TAIL 2

#ifdef BIG_ENDDING

#define DEFAULT_UNICODE 0x3000

#define DEFAULT_GBK_CODE 0xA1A1

#define DEFAULT_BIG5_CODE 0xA140

#else

#define DEFAULT_UNICODE 0x0030

#define DEFAULT_GBK_CODE 0xA1A1

#define DEFAULT_BIG5_CODE 0x40A1

#endif

#define CODE_PAGE_GBK 936

#define CODE_PAGE_BIG5 950

#define CODE_PAGE_EUC 932

#define CHARSET_DEFAULT 0

#define CHARSET_UNICODE 1

#define CHARSET_UTF8 2

// 24066 = ( 0xFE - 0x81 + 1 ) * ( 0xFE - 0x40 + 1)

#define GBK_COUNT 24066

// 16999 = ( 0xF9 - 0xA1 + 1 ) * ( 0xFE - 0x40 + 1)

#define BIG5_COUNT 16999

typedef struct tagMMapFile2

{

BOOL bUsed;

struct stat finfo;

void *mm;

} MMapFile;

//int LoadEuc2UniTable(char *strFileName);

//void FreeEuc2UniTable(void);

int ToBase64(void* pSrc,int nSrcLen, char* strBase64, int* nBase64Len);

int FromBase64(char* strSrc, int nSrcLen, void* pDest, int* nDestLen);

int htmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);

int MultiByteToWideChar(unsigned int uCodePage, unsigned long lFlags,

char *pMultiByteStr, int nMultiByte,

unsigned short *pWideChar, int nWideChar);

int WideCharToMultiByte(unsigned int uCodePage, unsigned long dwFlags,

unsigned short *pWideCharStr, int nWideChar,

char *pMultiByteStr, int nMultiByte,

const char* lpDefaultChar, int* lpUseDefaultChar);

#define ASCII 0

#define HZ_HEAD 1

#define HZ_TAIL 2

void GBK2BIG5(char *lpString, int cbString);

void BIG52GBK(char *lpString, int cbString);

void LowerString(char *str);

void TrimString(char *str);

void DecodeFormString(char *str);

void DecodeUTF(char *str);

#define DECODE_UNICODE 0

#define KEEP_UNICODE 1

#define DECODE_GBK 0

#define DECODE_BIG5 2

int DecodePureUTF(unsigned char *str, int nFlag);

#define LANG_GB 1 // used by httpstrtoint and FullToHalf

#define LANG_B5 2

#define LANG_ENG 3

#define LANG_UNKNOWN 4

int httpstrtoint(char* strHttp);

void lowerhttpprefix(char* strUrl);

#define FULL_COUNT (21+26*2+5)

BOOL FullToHalf(char *str, int nLang);

#define URLDESCSEPCHAR '|'

char* DescriptFromUrl(char* strUrl);

#define CODE_GBK2UNI 1

#define CODE_UNI2GBK 2

#define CODE_BIG52UNI 3

#define CODE_UNI2BIG5 4

#define CODE_GBK2BIG5 5

#define CODE_BIG52GBK 6

const char *mmapOneFile(char *pFileName, MMapFile *mmapfile);

void toplee_cleanup_mmap(void *dummy);

void InitMMResource(void);

const char* LoadOneCodeTable(int nType, char* strFileName);

int getcuryear();

char* mstrncpy(char* strDest, char* strSrc, size_t nCount);

int formurlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);

int wmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);

int htmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);

#define MAX_INTERNAL_BUFF 16384

int gb2uni_encode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);

int unicodeencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);

char *stristr(const char *big, const char *little);

typedef struct auto_string

{

int len, inc_len;

char *strval;

}struAutoString;

#define DEF_INC_LEN (1024)

#define DEF_INT_LEN 12

void init_auto_string(struAutoString *astr, int inc_len);

int add_auto_string(struAutoString *astr, char *new_str);

void free_auto_string(struAutoString *astr);

int unistrcmp(const char *str1, int str1len, const char *str2, int str2len);

void NormalizeName( char *p );

#endif // __TOPLEE_UTIL_INCLUDE__

.
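The header declares a small growable string (`struAutoString`), but its implementation is in the omitted code. One plausible way to implement the three declared functions is sketched below. The growth policy and the return convention of `add_auto_string` (0 on success, -1 on allocation failure) are assumptions, not taken from the original source.

```c
#include <stdlib.h>
#include <string.h>

typedef struct auto_string {
    int len, inc_len;   /* len: bytes used; inc_len: growth step */
    char *strval;       /* NUL-terminated buffer, or NULL while empty */
} struAutoString;

#define DEF_INC_LEN (1024)

/* Round n up to the next multiple of step. */
static size_t round_up(size_t n, size_t step)
{
    return ((n + step - 1) / step) * step;
}

void init_auto_string(struAutoString *astr, int inc_len)
{
    astr->len = 0;
    astr->inc_len = (inc_len > 0) ? inc_len : DEF_INC_LEN;
    astr->strval = NULL;
}

/* Append new_str. The buffer capacity is always len + 1 rounded up to a
 * multiple of inc_len, so it can be recomputed without storing it. */
int add_auto_string(struAutoString *astr, char *new_str)
{
    size_t step = (size_t)astr->inc_len;
    size_t add = strlen(new_str);
    size_t cur_cap = astr->strval ? round_up((size_t)astr->len + 1, step) : 0;
    size_t new_cap = round_up((size_t)astr->len + add + 1, step);

    if (new_cap != cur_cap) {
        char *p = realloc(astr->strval, new_cap);
        if (p == NULL)
            return -1;               /* the old buffer is left untouched */
        astr->strval = p;
    }
    memcpy(astr->strval + astr->len, new_str, add + 1);   /* copies the NUL */
    astr->len += (int)add;
    return 0;
}

void free_auto_string(struAutoString *astr)
{
    free(astr->strval);
    astr->strval = NULL;
    astr->len = 0;
}
```

Growing in fixed steps keeps the number of reallocations low when many short fragments are appended, which is presumably what the `inc_len` field is for.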

php_toplee.h

/*

+----------------------------------------------------------------------+

| PHP Version 4 |

+----------------------------------------------------------------------+

| Copyright (c) 1997-2002 The PHP Group |

+----------------------------------------------------------------------+

| This source file is subject to version 2.02 of the PHP license, |

| that is bundled with this package in the file LICENSE, and is |

| available at through the world-wide-web at |

| http://www.php.net/license/2_02.txt. |

| If you did not receive a copy of the PHP license and are unable to |

| obtain it through the world-wide-web, please send a note to |

| license@php.net so we can mail you a copy immediately. |

+----------------------------------------------------------------------+

| Author: |

+----------------------------------------------------------------------+

$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $

*/

#ifndef PHP_GBK_H

#define PHP_GBK_H

extern zend_module_entry gbk_module_entry;

#define phpext_gbk_ptr &gbk_module_entry

#ifdef PHP_WIN32

#define PHP_GBK_API __declspec(dllexport)

#else

#define PHP_GBK_API

#endif

#ifdef ZTS

#include "TSRM.h"

#endif

PHP_MINIT_FUNCTION(gbk);

PHP_MSHUTDOWN_FUNCTION(gbk);

PHP_RINIT_FUNCTION(gbk);

PHP_RSHUTDOWN_FUNCTION(gbk);

PHP_MINFO_FUNCTION(gbk);

PHP_FUNCTION(confirm_gbk_compiled); /* For testing, remove later. */

PHP_FUNCTION(toplee_decode_utf);

PHP_FUNCTION(toplee_decode_utf_gb);

PHP_FUNCTION(toplee_decode_utf_big5);

PHP_FUNCTION(toplee_encode_utf_gb);

PHP_FUNCTION(toplee_big52gbk);

PHP_FUNCTION(toplee_gbk2big5);

PHP_FUNCTION(toplee_fan2jian);

PHP_FUNCTION(toplee_normalize_name);

/*

Declare any global variables you may need between the BEGIN

and END macros here:

ZEND_BEGIN_MODULE_GLOBALS(gbk)

int global_value;

char *global_string;

ZEND_END_MODULE_GLOBALS(gbk)

*/

/* In every utility function you add that needs to use variables

in php_gbk_globals, call TSRM_FETCH(); after declaring other

variables used by that function, or better yet, pass in TSRMLS_CC

after the last function argument and declare your utility function

with TSRMLS_DC after the last declared argument. Always refer to

the globals in your function as GBK_G(variable). You are

encouraged to rename these macros something shorter, see

examples in any other php module directory.

*/

#ifdef ZTS

#define GBK_G(v) TSRMG(gbk_globals_id, zend_gbk_globals *, v)

#else

#define GBK_G(v) (gbk_globals.v)

#endif

#endif /* PHP_GBK_H */

/*

* Local variables:

* tab-width: 4

* c-basic-offset: 4

* indent-tabs-mode: t

* End:

*/

.

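That completes all of the C code. The module also needs several code table files, such as gb2b5.tab and uni2gb.tab. As the MINIT function above shows, the path of each table is read from the php.ini entries gbk2uni, uni2gbk, big52uni, uni2big5, gbk2big5 and big52gbk, and module startup fails if any of them is empty. I am not providing the table files here; you can look up how to generate them, and many such .tab files can be downloaded online.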
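The loader for these tables (`LoadOneCodeTable`, `mmapOneFile`) is in the omitted code, and the layout of the .tab files is not described in this article. Purely as an illustration, the sketch below assumes a table stored as a flat array with one entry per GBK code, ordered the way the `GBK_COUNT` comment in toplee_util.h counts them (lead byte 0x81 to 0xFE, trail byte 0x40 to 0xFE). It maps such a file read-only with POSIX `mmap` and computes the index of a double-byte code. The function names and the file layout are assumptions.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define GBK_COUNT 24066   /* (0xFE - 0x81 + 1) * (0xFE - 0x40 + 1) */

/* Map a whole file read-only. Returns the mapping and stores its size in
 * *size, or returns NULL on any error (including an empty file). */
const void *map_table_readonly(const char *path, size_t *size)
{
    struct stat st;
    void *mm;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    mm = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (mm == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return mm;
}

/* Index of a GBK double-byte code in the assumed flat table,
 * or -1 if the bytes are outside the lead/trail ranges above. */
int gbk_table_index(unsigned char lead, unsigned char trail)
{
    if (lead < 0x81 || lead > 0xFE || trail < 0x40 || trail > 0xFE)
        return -1;
    return (lead - 0x81) * (0xFE - 0x40 + 1) + (trail - 0x40);
}
```

A caller would release the mapping with `munmap` when the module shuts down, which is presumably the job of `toplee_cleanup_mmap` in the original code.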

Now we can build and test.

Go back to the root of the PHP source tree and run:

# ./buildconf

# ./configure --with-toplee=shared ……

# make

# make install

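This builds the module together with PHP. Because of the shared argument, the toplee module is compiled into toplee.so, which can be loaded with extension=toplee.so in php.ini or extensions.ini, or loaded at run time from PHP code with the dl() function. After that, the functions defined earlier can be called from PHP.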

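My own technical ability is limited, so I welcome corrections from experts wherever this article is wrong. I also hope it encourages more PHP enthusiasts to share their own valuable experience.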

 
 
 