If you quote this article, please credit the source: Just Do IT (http://www.toplee.com) < Michael Lee @ toplee.com >
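I first got into web development in 1997, nine years ago now. I started out building HTML pages in FrontPage, then learned ASP + Access + IIS, and have been hooked on web development ever since. After that I worked in turn with ASP + SQL Server + IIS, JSP + Oracle + JRun (Resin/Tomcat) and PHP + Sybase (MySQL) + Apache, and finally settled on PHP + MySQL + Apache + Linux (BSD), the stack usually called LAMP. There are many reasons for that choice, and plenty of people online already debate the pros and cons of the various stacks and languages, so I will not repeat that discussion. Briefly, these are the main reasons I like LAMP:
1. It is a completely open and free platform;
2. It is simple and easy to pick up, with plenty of resources available;
3. PHP, MySQL and Apache integrate closely with the underlying Linux (BSD) system and with each other, which makes the stack very efficient;
4. All of them are written in C/C++, so they are fast and reliable;
5. PHP's syntax is largely C-like, and it also borrows many design ideas from Java and C++;
6. Most important of all, PHP extension modules are easy to write in C/C++, which makes PHP almost unlimitedly extensible!
For these reasons I am very fond of PHP-based architectures, and the last point is the decisive one. I promoted this platform at both Yahoo and mop and picked up some experience extending PHP in C along the way. I am sharing it here in the hope that it prompts others to contribute better ideas.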
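There are several ways to write a PHP extension module in C. In terms of the end result there are two: the extension is either compiled directly into PHP, or built as a shared object (.so) that PHP loads. In terms of build tooling there are also two: the phpize tool (available once PHP has been built and installed) and the ext_skel tool (shipped in the PHP source tree). The approach we use most, and the most convenient one, is to write a .so extension module with ext_skel, so that is what this article mainly covers.
In the PHP source tree there is an ext directory (everything I say about PHP here refers to PHP on Linux, not on Windows). Inside ext is a tool called ext_skel, which makes it easy to develop a PHP extension module: it provides a generic workflow and template for extension development. As a worked example, we will develop an extension module that converts strings between Chinese character encodings in PHP (UTF-8, GBK/GB2312 and BIG5). The module will ultimately expose the following functions: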
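(1) string toplee_big52gbk(string s)
Converts the input string from BIG5 to GBK
(2) string toplee_gbk2big5(string s)
Converts the input string from GBK to BIG5
(3) string toplee_normalize_name(string s)
Normalizes the input string: full-width characters to half-width, whitespace trimmed, upper case to lower case
(4) string toplee_fan2jian(int code, string s)
Converts a GBK string in traditional Chinese to simplified Chinese
(5) string toplee_decode_utf(string s)
Converts a UTF-encoded string to Unicode
(6) string toplee_decode_utf_gb(string s)
Converts a UTF-encoded string to GB
(7) string toplee_decode_utf_big5(string s)
Converts a UTF-encoded string to BIG5
(8) string toplee_encode_utf_gb(string s)
Converts a GBK-encoded input string to UTF
First, go into the ext directory and run the following command: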
# ./ext_skel --extname=toplee
ext_skel then generates a directory named toplee under ext, containing the following files:
.cvsignore
CREDITS
EXPERIMENTAL
config.m4
php_toplee.h
tests
toplee.c
toplee.php
The most useful of these are config.m4 and toplee.c.
Next, edit config.m4:
# vi ./config.m4
Find the lines that look like this:
dnl PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
dnl [ --with-toplee Include toplee support])
dnl Otherwise use enable:
dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [ --enable-toplee Enable toplee support])
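These lines tell the PHP build how our extension module toplee is switched on at configure time. We use the --with-toplee form, so we change them to read as follows (dnl starts an m4 comment, so removing it from a line activates that line):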
PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
[ --with-toplee Include toplee support])
dnl Otherwise use enable:
dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [ --enable-toplee Enable toplee support])
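The key job after that is writing toplee.c, the module's main source file. Even if you change nothing in it, you already have a complete PHP extension module; the generated file contains code like the following: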
PHP_FUNCTION(confirm_toplee_compiled)
{
char *arg = NULL;
int arg_len, len;
char string[256];
if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &arg, &arg_len) == FAILURE) {
return;
}
len = sprintf(string, "Congratulations! You have successfully modified ext/%.78s/config.m4. Module %.78s is now compiled into PHP.", "toplee", arg);
RETURN_STRINGL(string, len, 1);
}
If we build this new module into PHP later on, a PHP script can call confirm_toplee_compiled("toplee"), which returns the string “Congratulations! You have successfully modified ext/toplee/config.m4. Module toplee is now compiled into PHP.”
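The skeleton function formats its message into a fixed 256-byte buffer and relies on the %.78s precision to cap the length of each argument. As a rough standalone sketch of that formatting outside PHP (the helper name format_confirmation is hypothetical, and snprintf is used here as an extra guard that the generated code does not have):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of the generated skeleton.
 * Writes the confirmation message into out and returns its length,
 * or -1 if nothing could be written. */
int format_confirmation(char *out, size_t out_size,
                        const char *ext, const char *arg)
{
    int len;

    if (out == NULL || out_size == 0)
        return -1;

    /* %.78s copies at most 78 bytes of each argument, which keeps the
     * whole message well inside a 256-byte buffer. */
    len = snprintf(out, out_size,
                   "Congratulations! You have successfully modified "
                   "ext/%.78s/config.m4. Module %.78s is now compiled into PHP.",
                   ext, arg);
    if (len < 0)
        return -1;
    if ((size_t)len >= out_size)
        len = (int)out_size - 1;   /* output was truncated by snprintf */
    return len;
}
```

Called with a 256-byte buffer and "toplee" for both arguments, it produces the message quoted above.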
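Below is toplee.c after our changes, so that it supports the functions and interfaces planned above. Note that this listing names the module gbk (gbk_functions, gbk_module_entry, php_gbk.h and so on), whereas a skeleton generated with --extname=toplee uses toplee in those places, so keep the names consistent with your own skeleton. Here is the source of toplee.c: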
/*
+----------------------------------------------------------------------+
| PHP Version 4 |
+----------------------------------------------------------------------+
| Copyright (c) 1997-2002 The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 2.02 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available at through the world-wide-web at |
| http://www.php.net/license/2_02.txt. |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| license@php.net so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Author: |
+----------------------------------------------------------------------+
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "php_gbk.h"
#include "toplee_util.h"
/* If you declare any globals in php_gbk.h uncomment this:
ZEND_DECLARE_MODULE_GLOBALS(gbk)
*/
/* True global resources - no need for thread safety here */
static int le_gbk;
/* {{{ gbk_functions[]
*
* Every user visible function must have an entry in gbk_functions[].
*/
function_entry gbk_functions[] = {
PHP_FE(toplee_decode_utf, NULL)
PHP_FE(toplee_decode_utf_gb, NULL)
PHP_FE(toplee_decode_utf_big5, NULL)
PHP_FE(toplee_encode_utf_gb, NULL)
PHP_FE(toplee_big52gbk, NULL)
PHP_FE(toplee_gbk2big5, NULL)
PHP_FE(toplee_fan2jian, NULL)
PHP_FE(toplee_normalize_name, NULL)
{NULL, NULL, NULL} /* Must be the last line in gbk_functions[] */
};
/* }}} */
/* {{{ gbk_module_entry
*/
zend_module_entry gbk_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
STANDARD_MODULE_HEADER,
#endif
"gbk",
gbk_functions,
PHP_MINIT(gbk),
PHP_MSHUTDOWN(gbk),
PHP_RINIT(gbk), /* Replace with NULL if there's nothing to do at request start */
PHP_RSHUTDOWN(gbk), /* Replace with NULL if there's nothing to do at request end */
PHP_MINFO(gbk),
#if ZEND_MODULE_API_NO >= 20010901
"0.1", /* Replace with version number for your extension */
#endif
STANDARD_MODULE_PROPERTIES
};
/* }}} */
#ifdef COMPILE_DL_GBK
ZEND_GET_MODULE(gbk)
#endif
/* {{{ PHP_INI
*/
/* Remove comments and fill if you need to have entries in php.ini*/
PHP_INI_BEGIN()
PHP_INI_ENTRY("gbk2uni", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("uni2gbk", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("uni2big5", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("big52uni", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("big52gbk", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("gbk2big5", "", PHP_INI_SYSTEM, NULL)
// STD_PHP_INI_ENTRY("gbk.global_value", "42", PHP_INI_ALL, OnUpdateInt, global_value, zend_gbk_globals, gbk_globals)
// STD_PHP_INI_ENTRY("gbk.global_string", "foobar", PHP_INI_ALL, OnUpdateString, global_string, zend_gbk_globals, gbk_globals)
PHP_INI_END()
/* }}} */
/* {{{ php_gbk_init_globals
*/
/* Uncomment this function if you have INI entries
static void php_gbk_init_globals(zend_gbk_globals *gbk_globals)
{
gbk_globals->global_value = 0;
gbk_globals->global_string = NULL;
}
*/
/* }}} */
char gbk2uni_file[256];
char uni2gbk_file[256];
char big52uni_file[256];
char uni2big5_file[256];
char gbk2big5_file[256];
char big52gbk_file[256];
//utf file init flag
static int initutf=0;
/* {{{ PHP_MINIT_FUNCTION
*/
PHP_MINIT_FUNCTION(gbk)
{
/* If you have INI entries, uncomment these lines
ZEND_INIT_MODULE_GLOBALS(gbk, php_gbk_init_globals, NULL);*/
REGISTER_INI_ENTRIES();
memset(gbk2uni_file, 0, sizeof(gbk2uni_file));
memset(uni2gbk_file, 0, sizeof(uni2gbk_file));
memset(big52uni_file, 0, sizeof(big52uni_file));
memset(uni2big5_file, 0, sizeof(uni2big5_file));
memset(gbk2big5_file, 0, sizeof(gbk2big5_file));
memset(big52gbk_file, 0, sizeof(big52gbk_file));
strncpy(gbk2uni_file, INI_STR("gbk2uni"), sizeof(gbk2uni_file)-1);
strncpy(uni2gbk_file, INI_STR("uni2gbk"), sizeof(uni2gbk_file)-1);
strncpy(big52uni_file, INI_STR("big52uni"), sizeof(big52uni_file)-1);
strncpy(uni2big5_file, INI_STR("uni2big5"), sizeof(uni2big5_file)-1);
strncpy(gbk2big5_file, INI_STR("gbk2big5"), sizeof(gbk2big5_file)-1);
strncpy(big52gbk_file, INI_STR("big52gbk"), sizeof(big52gbk_file)-1);
//InitMMResource();
InitResource();
if ((uni2gbk_file[0] == '\0') || (uni2big5_file[0] == '\0')
|| (gbk2big5_file[0] == '\0') || (big52gbk_file[0] == '\0')
|| (gbk2uni_file[0] == '\0') || (big52uni_file[0] == '\0'))
{
return FAILURE;
}
if (gbk2uni_file[0] != '\0')
{
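/* A non-NULL return is treated as a load failure: release whatever was mapped and abort module startup. */
if (LoadOneCodeTable(CODE_GBK2UNI, gbk2uni_file) != NULL)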
{
toplee_cleanup_mmap(NULL);
return FAILURE;
}
}
if (uni2gbk_file[0] != '\0')
{
if (LoadOneCodeTable(CODE_UNI2GBK, uni2gbk_file) != NULL)
{
toplee_cleanup_mmap(NULL);
return FAILURE;
}
}
if (big52uni_file[0] != '\0')
{
if (LoadOneCodeTable(CODE_BIG52UNI, big52uni_file) != NULL)
{
toplee_cleanup_mmap(NULL);
return FAILURE;
}
}
if (uni2big5_file[0] != '\0')
{
if (LoadOneCodeTable(CODE_UNI2BIG5, uni2big5_file) != NULL)
{
toplee_cleanup_mmap(NULL);
return FAILURE;
}
}
if (gbk2big5_file[0] != '\0')
{
if (LoadOneCodeTable(CODE_GBK2BIG5, gbk2big5_file) != NULL)
{
toplee_cleanup_mmap(NULL);
return FAILURE;
}
}
if (big52gbk_file[0] != '\0')
{
if (LoadOneCodeTable(CODE_BIG52GBK, big52gbk_file) != NULL)
{
toplee_cleanup_mmap(NULL);
return FAILURE;
}
}
initutf = 1;
return SUCCESS;
}
/* }}} */
/* {{{ PHP_MSHUTDOWN_FUNCTION
*/
PHP_MSHUTDOWN_FUNCTION(gbk)
{
/* uncomment this line if you have INI entries*/
UNREGISTER_INI_ENTRIES();
toplee_cleanup_mmap(NULL);
return SUCCESS;
}
/* }}} */
/* Remove if there's nothing to do at request start */
/* {{{ PHP_RINIT_FUNCTION
*/
PHP_RINIT_FUNCTION(gbk)
{
return SUCCESS;
}
/* }}} */
/* Remove if there's nothing to do at request end */
/* {{{ PHP_RSHUTDOWN_FUNCTION
*/
PHP_RSHUTDOWN_FUNCTION(gbk)
{
return SUCCESS;
}
/* }}} */
/* {{{ PHP_MINFO_FUNCTION
*/
PHP_MINFO_FUNCTION(gbk)
{
php_info_print_table_start();
php_info_print_table_header(2, "gbk support", "enabled");
php_info_print_table_end();
/* Remove comments if you have entries in php.ini*/
DISPLAY_INI_ENTRIES();
}
/* }}} */
/* The user-visible conversion functions follow. */
/* {{{ proto toplee_decode_utf(string s)
*/
PHP_FUNCTION(toplee_decode_utf)
{
char *s = NULL, *t=NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
t = strdup(s);
if (t==NULL)
RETURN_FALSE
DecodePureUTF(t, KEEP_UNICODE);
RETVAL_STRING(t,1);
free(t);
return;
}
/* }}} */
/* {{{ proto toplee_decode_utf_gb(string s)
*/
PHP_FUNCTION(toplee_decode_utf_gb)
{
char *s = NULL, *t=NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
t = strdup(s);
if (t==NULL)
RETURN_FALSE
DecodePureUTF(t, DECODE_UNICODE);
RETVAL_STRING(t,1);
free(t);
return;
}
/* }}} */
/* {{{ proto toplee_decode_utf_big5(string s)
*/
PHP_FUNCTION(toplee_decode_utf_big5)
{
char *s = NULL, *t=NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
t = strdup(s);
if (t==NULL)
RETURN_FALSE
DecodePureUTF(t, DECODE_UNICODE | DECODE_BIG5);
RETVAL_STRING(t,1);
free(t);
return;
}
/* }}} */
int EncodePureUTF(unsigned char* strSrc,
unsigned char* strDst, int nDstLen, int nFlag)
{
int nRet;
int pos;
unsigned short c;
unsigned short* uBuf;
int nSize;
int nLen;
int nReturn;
nLen=strlen((const char*)strSrc);
if(nDstLen < nLen*2+1)
return 0;
nSize=nLen+1;
uBuf=(unsigned short*)emalloc(sizeof(unsigned short)*nSize);
nRet=MultiByteToWideChar(936, 0, (char*)strSrc, nLen, uBuf, nSize);
nReturn=0;
/* Index into uBuf instead of advancing it, so that the pointer passed to efree() is the one emalloc() returned. */
for(pos=0; pos<nRet; pos++)
{
c = uBuf[pos];
if (c < 0x80) {
strDst[nReturn++] = (unsigned char) c;
} else if (c < 0x800) {
strDst[nReturn++] = (0xc0 | (c >> 6));
strDst[nReturn++] = (0x80 | (c & 0x3f));
} else {
/* c holds a 16-bit code unit, so every remaining value fits the three-byte form. */
strDst[nReturn++] = (0xe0 | (c >> 12));
strDst[nReturn++] = (0x80 | ((c >> 6) & 0x3f));
strDst[nReturn++] = (0x80 | (c & 0x3f));
}
}
efree(uBuf);
strDst[nReturn]='\0';
return nReturn;
}
/* {{{ proto toplee_encode_utf_gb(string s)
*/
PHP_FUNCTION(toplee_encode_utf_gb)
{
char *s = NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
char* sRet;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
sRet=emalloc(strlen(s)*2+1);
EncodePureUTF(s, sRet, strlen(s)*2+1, 0);
RETVAL_STRING(sRet,0); /* sRet is emalloc'ed: hand it to PHP instead of copying it and leaking the original */
return;
}
/* }}} */
/* {{{ proto toplee_big52gbk(string s)
*/
PHP_FUNCTION(toplee_big52gbk)
{
char *s = NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
char* sRet = NULL;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
sRet=estrdup(s);
if (NULL == sRet)
RETURN_FALSE
BIG52GBK(sRet, strlen(sRet));
RETVAL_STRING(sRet,0); /* sRet comes from estrdup(): hand it to PHP instead of copying it and leaking the original */
return;
}
/* }}} */
/* {{{ proto toplee_gbk2big5(string s)
*/
PHP_FUNCTION(toplee_gbk2big5)
{
char *s = NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
char* sRet = NULL;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
sRet=estrdup(s);
if (NULL == sRet)
RETURN_FALSE
GBK2BIG5(sRet, strlen(sRet));
RETVAL_STRING(sRet,0); /* sRet comes from estrdup(): hand it to PHP instead of copying it and leaking the original */
return;
}
/* }}} */
/* {{{ proto toplee_normalize_name(string s)
*/
PHP_FUNCTION(toplee_normalize_name)
{
char *s = NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
char* sRet = NULL;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
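/* NormalizeName() edits the buffer in place and s points into the caller's PHP variable, so work on a private copy. */
sRet = estrdup(s);
NormalizeName( sRet );
RETURN_STRING(sRet, 0); /* 0: PHP takes ownership of the emalloc'ed copy */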
}
/* }}} */
/* {{{ proto toplee_fan2jian(int code, string s)
*/
PHP_FUNCTION(toplee_fan2jian)
{
char *s = NULL;
int argc = ZEND_NUM_ARGS();
int s_len;
long code; /* the "l" specifier of zend_parse_parameters stores into a long */
char* sRet = NULL;
char *pSource;
char *pDest1=NULL, *pDest2=NULL;
int nSourceLen, nDestLen;
if (zend_parse_parameters(argc TSRMLS_CC, "ls", &code, &s, &s_len) == FAILURE)
return;
if (!initutf)
RETURN_FALSE
pSource = s;
nSourceLen = s_len;
pDest1 = malloc(nSourceLen * 2);
pDest2 = malloc(nSourceLen+1);
if (NULL == pDest1 || NULL == pDest2)
goto _f2j_err;
memset(pDest1, 0, nSourceLen * 2);
memset(pDest2, 0, nSourceLen + 1);
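/* pDest1 is nSourceLen * 2 bytes, i.e. room for nSourceLen 16-bit units. */
nDestLen = MultiByteToWideChar((unsigned int)code, 0, pSource, nSourceLen, (unsigned short *)pDest1, nSourceLen);
if (0 >= nDestLen)
goto _f2j_err;
nDestLen = WideCharToMultiByte((unsigned int)code, 0, (unsigned short *)pDest1, nDestLen, pDest2, nSourceLen, NULL, NULL);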
if (0 >= nDestLen)
goto _f2j_err;
RETVAL_STRING(pDest2, 1);
if (pDest1 != NULL)
free(pDest1);
if (pDest2 != NULL)
free(pDest2);
return;
_f2j_err:
if (pDest1 != NULL)
free(pDest1);
if (pDest2 != NULL)
free(pDest2);
RETURN_FALSE;
}
/* }}} */
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* End:
* vim600: noet sw=4 ts=4 fdm=marker
* vim<600: noet sw=4 ts=4
*/
.
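The branches in EncodePureUTF follow the standard UTF-8 bit layout for code points below U+10000. As a minimal standalone sketch of just that step (the function name utf8_encode_bmp is made up for illustration and is not part of the module, and surrogate pairs are deliberately not handled):

```c
#include <stddef.h>

/* Illustrative helper: encode one 16-bit code unit as UTF-8.
 * out must have room for 3 bytes; returns the number of bytes written. */
size_t utf8_encode_bmp(unsigned short c, unsigned char *out)
{
    if (c < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

For example, U+4E2D encodes to the three bytes E4 B8 AD, and U+00E9 to the two bytes C3 A9.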
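In effect this file defines every interface we want to expose; what remains is to write the C code that actually implements them. The implementation details are beyond the scope of this article. The one key point is that you can put all the C source and header files that implement the interfaces defined in toplee.c into the ext/toplee directory, so that toplee.c can use them at build time (they also have to be part of the extension's build so that they are compiled and linked). All of this is standard C, so I will not elaborate further and simply paste the implementation files below: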
chn_util.h
#ifndef __CHN_UTIL_H__
#define __CHN_UTIL_H__
#include "common.h"
#define LANG_GB 1
#define LANG_B5 2
#define GB_FULL_COUNT (20+26*2+5+4+26)
#define B5_FULL_COUNT (20+26*2+5+4+24)
BOOL FullToHalf(char *str, int nLang);
void LowerString(char* str);
void TrimString(char* str);
#endif // __CHN_UTIL_H__
.
chn_util.c
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include "common.h"
#include "chn_util.h"
// 0123456789!@()-_+'<>
// The keys are the full-width forms; save this file as GBK so that each literal is two bytes.
static char *GBFull[GB_FULL_COUNT] =
{"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
"", "@", "(", ")", "-", "_", "+", "'", "<", ">",
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
"l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
"w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R",
"S", "T", "U", "V", "W", "X", "Y", "Z",
"。", "·", ".", "﹒", "&",
"《", "〈", "〉", "》",
"﹐", ",", "﹔", ";", "﹕", ":", "﹖", "?", "﹗", "!", "—",
"‘", "’", "“", "”", "~", "∶", "`", "|", "[", "]", "{",
"}", "#", "$", "%"
};
static char GBEnHalf[GB_FULL_COUNT+1] =
"0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
"....&<<>>,,;;::\?\?!!-\'\'\"\"~:`|[]{}#$%";
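// The same kind of table as BIG5 byte pairs; viewed under a GBK decoding they show up as unrelated symbols or as empty strings.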
static char *B5Full[B5_FULL_COUNT] =
{"", "", "⒈", "⒉", "⒊", "⒋", "⒌", "⒍", "⒎", "⒏",
"", "", "", "", "⌒", "∨", "∠", "ˇ", "≌", "≈",
"㈤", "㈥", "㈦", "㈧", "㈨", "㈩", "", "", "Ⅰ", "Ⅱ", "Ⅲ",
"Ⅳ", "Ⅴ", "Ⅵ", "Ⅶ", "Ⅷ", "Ⅸ", "Ⅹ", "Ⅺ", "Ⅻ", "", "",
"", "", "", "", "⑾", "⑿", "⒀", "⒁", "⒂", "⒃", "⒄",
"⒅", "⒆", "⒇", "①", "②", "③", "④", "⑤", "⑥", "⑦", "⑧",
"⑨", "⑩", "", "", "㈠", "㈡", "㈢", "㈣",
"", "", "", "", "‘",
"", "", "", "",
"", "", "", "", "", "", "", "", "", "", "",
"ˉ", "ˇ", "¨", "〃", "°", "", "", "", "", "", "…",
"", ""
};
static char B5EnHalf[B5_FULL_COUNT+1] =
"0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
"....&<<>>,,;;::\?\?!!-\'\'\"\"~|[]{}#$%";
static int _bFHSortFlag=0;
static void _sorttable(char* tableFull[], char* tableHalf, int nSize)
{
int i,j;
char* p;
char cTemp;
for(i=0;i<nSize;i++)
{
for(j=i+1;j<nSize;j++)
{
if(strcmp(tableFull[i],tableFull[j])<0)
{
p=tableFull[i];
tableFull[i]=tableFull[j];
tableFull[j]=p;
cTemp=tableHalf[i];
tableHalf[i]=tableHalf[j];
tableHalf[j]=cTemp;
}
}
}
}
BOOL FullToHalf(char *str, int nCodePage)
{
char *pSrc = str;
char *pDest = str;
char **pFull;
char *pEnHalf;
int nCount;
BOOL bContinue = FALSE;
int nHigh,nLow,nMid,nResult;
if(!_bFHSortFlag)
{
_sorttable(GBFull,GBEnHalf, GB_FULL_COUNT);
_sorttable(B5Full,B5EnHalf, B5_FULL_COUNT);
_bFHSortFlag=1;
}
assert(NULL != str);
if ((LANG_GB == nCodePage) || (936==nCodePage))
{
pFull = GBFull;
pEnHalf = GBEnHalf;
nCount = GB_FULL_COUNT;
}
else if ((LANG_B5 == nCodePage) || (950==nCodePage))
{
pFull = B5Full;
pEnHalf = B5EnHalf;
nCount = B5_FULL_COUNT;
}
else
{
assert( FALSE );
return FALSE;
}
while ('\0' != *pSrc)
{
if (0x81 <= (BYTE)*pSrc)
{
// Binary search over the sorted table, which is far faster than a linear scan
nLow=0;
nHigh=nCount-1;
while(nLow <= nHigh)
{
nMid = (nLow+nHigh) / 2;
nResult = strncmp(pSrc, pFull[nMid], 2);
if( 0 == nResult)
{
*pDest++ = pEnHalf[nMid];
pSrc+=2;
bContinue=TRUE;
break;
}
if( nResult > 0)
nHigh=nMid-1;
else
nLow=nMid+1;
}
if( !bContinue )
{
// Other symbols: any remaining pair whose lead byte is in 0xA1..0xA9 becomes a space
if( ( 0xA1 <= (BYTE)*pSrc ) &&
( 0xA9 >= (BYTE)*pSrc ) )
{
*pDest++ = ' ';
pSrc+=2;
bContinue=TRUE;
}
}
/* for (nIndex = 0; nIndex < nCount; nIndex++)
{
assert(NULL != pFull[nIndex]);
if (NULL != pFull[nIndex])
{
if (0 == strncmp(pSrc, pFull[nIndex], 2))
{
*pDest++ = pEnHalf[nIndex]; // convert full to half
pSrc += 2;
bContinue = TRUE;
break;
}
}
}*/
if (bContinue)
{
bContinue = FALSE;
continue;
}
*pDest++ = *pSrc++; // copy head char, and the next statement copy tail char
if(*pSrc == '\0')
break;
}
*pDest++ = *pSrc++; // ascii code
}
*pDest = '\0';
return TRUE;
}
BOOL MyIsDBCSLeadByte(BYTE TestChar)
{
if((TestChar>0X80) && (TestChar<0xFF))
return TRUE;
else
return FALSE;
}
void LowerString(char* str)
{
while(*str)
{
if(!MyIsDBCSLeadByte(*str))
{
if( (*str>='A') && (*str<='Z') )
*str = (char)(*str+('a'-'A'));
}
else
{
str++;
if(!*str)
break;
}
str++;
}
return ;
}
BOOL myisspace(char c)
{
return ((c==' ') || (c=='\t') || (c=='\r') || (c=='\n'));
}
void TrimString(char* str)
{
char* pDst;
char* pSrc;
char* pLast;
char cCurrent;
int nState;
pLast=pDst=pSrc=str;
nState=0;
while(*pSrc)
{
cCurrent=*pSrc;
switch(nState)
{
case 0:
if(!myisspace(cCurrent))
{
nState=1;
continue;
}
break;
case 1:
if(myisspace(cCurrent))
{
nState=2;
*pDst=cCurrent;
}
else
{
*pDst=cCurrent;
pLast=pDst+1;
}
pDst++;
break;
case 2:
if(myisspace(cCurrent))
{
*pDst=cCurrent;
}
else
{
*pDst=cCurrent;
pLast=pDst+1;
}
pDst++;
break;
}
pSrc++;
}
*pLast='\0';
return;
}
.
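FullToHalf sorts its table in descending order (see _sorttable) and then binary-searches it by comparing the first two bytes of the input. The sketch below isolates that lookup on a small made-up table; the keys, values and the name lookup_pair are illustrative assumptions, with ASCII pairs standing in for the two-byte full-width characters:

```c
#include <string.h>

/* Illustrative data: two-byte keys in DESCENDING order, the order
 * _sorttable produces, each paired with a single-byte replacement. */
static const char *const pair_keys[] = { "zz", "mm", "dd", "aa" };
static const char pair_values[]      = { 'z',  'm',  'd',  'a'  };

/* Returns the replacement for the first two bytes of src,
 * or 0 when the pair is not in the table. */
char lookup_pair(const char *src)
{
    int low = 0;
    int high = (int)(sizeof pair_keys / sizeof pair_keys[0]) - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;
        int r = strncmp(src, pair_keys[mid], 2);

        if (r == 0)
            return pair_values[mid];
        if (r > 0)
            high = mid - 1;   /* larger keys sit at lower indexes */
        else
            low = mid + 1;
    }
    return 0;
}
```

Because the table is descending, the branch directions are the reverse of a textbook ascending binary search; sorting and searching must agree on the order, which is why FullToHalf sorts once before its first lookup.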
toplee_util.c
......
int ToBase64(void* pSrc,int nSrcLen, char* strBase64, int* nBase64Len)
{
static char *v = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
.......... more than 3,000 lines of code in between are omitted from this article ........
void NormalizeName( char *p )
{
FullToHalf( p, CODE_PAGE_GBK );
TrimString( p );
LowerString( p );
}
.
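NormalizeName finishes with LowerString, which has to skip double-byte characters: in GBK the trail byte of a pair can fall in the ASCII letter range, so lower-casing every byte blindly would corrupt Chinese text. A minimal standalone sketch of that idea follows (lower_dbcs and is_dbcs_lead are illustrative names, and the lead-byte range 0x81..0xFE mirrors MyIsDBCSLeadByte above):

```c
/* Lead bytes of a double-byte character, as in MyIsDBCSLeadByte. */
static int is_dbcs_lead(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/* Lower-case ASCII letters in place, leaving double-byte pairs untouched. */
void lower_dbcs(char *s)
{
    while (*s) {
        if (is_dbcs_lead((unsigned char)*s)) {
            s++;                 /* skip the lead byte */
            if (!*s)
                break;           /* truncated pair at end of string */
            s++;                 /* skip the trail byte unchanged */
        } else {
            if (*s >= 'A' && *s <= 'Z')
                *s = (char)(*s - 'A' + 'a');
            s++;
        }
    }
}
```

With the bytes 0xC4 0x41 followed by 'Z', only the final 'Z' is lower-cased; the 0x41 is a trail byte and stays as it is.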
toplee_util.h
#ifndef __TOPLEE_UTIL_INCLUDE__
#define __TOPLEE_UTIL_INCLUDE__ 1
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>
#ifdef LINUX
#include <time.h>
#endif
#include "common.h"
//#include "euc2uni.h"
/*
typedef int BOOL;
*/
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif
#define ASCII 0
#define HZ_HEAD 1
#define HZ_TAIL 2
#ifdef BIG_ENDDING
#define DEFAULT_UNICODE 0x3000
#define DEFAULT_GBK_CODE 0xA1A1
#define DEFAULT_BIG5_CODE 0xA140
#else
#define DEFAULT_UNICODE 0x0030
#define DEFAULT_GBK_CODE 0xA1A1
#define DEFAULT_BIG5_CODE 0x40A1
#endif
#define CODE_PAGE_GBK 936
#define CODE_PAGE_BIG5 950
#define CODE_PAGE_EUC 932
#define CHARSET_DEFAULT 0
#define CHARSET_UNICODE 1
#define CHARSET_UTF8 2
// 24066 = ( 0xFE - 0x81 + 1 ) * ( 0xFE - 0x40 + 1)
#define GBK_COUNT 24066
// 16999 = ( 0xF9 - 0xA1 + 1 ) * ( 0xFE - 0x40 + 1)
#define BIG5_COUNT 16999
typedef struct tagMMapFile2
{
BOOL bUsed;
struct stat finfo;
void *mm;
} MMapFile;
//int LoadEuc2UniTable(char *strFileName);
//void FreeEuc2UniTable(void);
int ToBase64(void* pSrc,int nSrcLen, char* strBase64, int* nBase64Len);
int FromBase64(char* strSrc, int nSrcLen, void* pDest, int* nDestLen);
int htmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
int MultiByteToWideChar(unsigned int uCodePage, unsigned long lFlags,
char *pMultiByteStr, int nMultiByte,
unsigned short *pWideChar, int nWideChar);
int WideCharToMultiByte(unsigned int uCodePage, unsigned long dwFlags,
unsigned short *pWideCharStr, int nWideChar,
char *pMultiByteStr, int nMultiByte,
const char* lpDefaultChar, int* lpUseDefaultChar);
#define ASCII 0
#define HZ_HEAD 1
#define HZ_TAIL 2
void GBK2BIG5(char *lpString, int cbString);
void BIG52GBK(char *lpString, int cbString);
void LowerString(char *str);
void TrimString(char *str);
void DecodeFormString(char *str);
void DecodeUTF(char *str);
#define DECODE_UNICODE 0
#define KEEP_UNICODE 1
#define DECODE_GBK 0
#define DECODE_BIG5 2
int DecodePureUTF(unsigned char *str, int nFlag);
#define LANG_GB 1 // used by httpstrtoint and FullToHalf
#define LANG_B5 2
#define LANG_ENG 3
#define LANG_UNKNOWN 4
int httpstrtoint(char* strHttp);
void lowerhttpprefix(char* strUrl);
#define FULL_COUNT (21+26*2+5)
BOOL FullToHalf(char *str, int nLang);
#define URLDESCSEPCHAR '|'
char* DescriptFromUrl(char* strUrl);
#define CODE_GBK2UNI 1
#define CODE_UNI2GBK 2
#define CODE_BIG52UNI 3
#define CODE_UNI2BIG5 4
#define CODE_GBK2BIG5 5
#define CODE_BIG52GBK 6
const char *mmapOneFile(char *pFileName, MMapFile *mmapfile);
void toplee_cleanup_mmap(void *dummy);
void InitMMResource(void);
const char* LoadOneCodeTable(int nType, char* strFileName);
int getcuryear();
char* mstrncpy(char* strDest, char* strSrc, size_t nCount);
int formurlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
int wmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
int htmlencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
#define MAX_INTERNAL_BUFF 16384
int gb2uni_encode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
int unicodeencode(char* strInput, int nInputLen, char* strOutBuf, int nOutBufLen);
char *stristr(const char *big, const char *little);
typedef struct auto_string
{
int len, inc_len;
char *strval;
}struAutoString;
#define DEF_INC_LEN (1024)
#define DEF_INT_LEN 12
void init_auto_string(struAutoString *astr, int inc_len);
int add_auto_string(struAutoString *astr, char *new_str);
void free_auto_string(struAutoString *astr);
int unistrcmp(const char *str1, int str1len, const char *str2, int str2len);
void NormalizeName( char *p );
#endif // __TOPLEE_UTIL_INCLUDE__
.
php_toplee.h
/*
+----------------------------------------------------------------------+
| PHP Version 4 |
+----------------------------------------------------------------------+
| Copyright (c) 1997-2002 The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 2.02 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available at through the world-wide-web at |
| http://www.php.net/license/2_02.txt. |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| license@php.net so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Author: |
+----------------------------------------------------------------------+
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/
#ifndef PHP_GBK_H
#define PHP_GBK_H
extern zend_module_entry gbk_module_entry;
#define phpext_gbk_ptr &gbk_module_entry
#ifdef PHP_WIN32
#define PHP_GBK_API __declspec(dllexport)
#else
#define PHP_GBK_API
#endif
#ifdef ZTS
#include "TSRM.h"
#endif
PHP_MINIT_FUNCTION(gbk);
PHP_MSHUTDOWN_FUNCTION(gbk);
PHP_RINIT_FUNCTION(gbk);
PHP_RSHUTDOWN_FUNCTION(gbk);
PHP_MINFO_FUNCTION(gbk);
PHP_FUNCTION(confirm_gbk_compiled); /* For testing, remove later. */
PHP_FUNCTION(toplee_decode_utf);
PHP_FUNCTION(toplee_decode_utf_gb);
PHP_FUNCTION(toplee_decode_utf_big5);
PHP_FUNCTION(toplee_encode_utf_gb);
PHP_FUNCTION(toplee_big52gbk);
PHP_FUNCTION(toplee_gbk2big5);
PHP_FUNCTION(toplee_fan2jian);
PHP_FUNCTION(toplee_normalize_name);
/*
Declare any global variables you may need between the BEGIN
and END macros here:
ZEND_BEGIN_MODULE_GLOBALS(gbk)
int global_value;
char *global_string;
ZEND_END_MODULE_GLOBALS(gbk)
*/
/* In every utility function you add that needs to use variables
in php_gbk_globals, call TSRM_FETCH(); after declaring other
variables used by that function, or better yet, pass in TSRMLS_CC
after the last function argument and declare your utility function
with TSRMLS_DC after the last declared argument. Always refer to
the globals in your function as GBK_G(variable). You are
encouraged to rename these macros something shorter, see
examples in any other php module directory.
*/
#ifdef ZTS
#define GBK_G(v) TSRMG(gbk_globals_id, zend_gbk_globals *, v)
#else
#define GBK_G(v) (gbk_globals.v)
#endif
#endif /* PHP_GBK_H */
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* indent-tabs-mode: t
* End:
*/
.
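That completes all of the C code. The module also needs several code table files, such as gb2b5.tab and uni2gb.tab. Their paths are read from the php.ini entries gbk2uni, uni2gbk, big52uni, uni2big5, big52gbk and gbk2big5 registered in toplee.c, and module startup fails if any of them is empty. I am not providing the table files here; you can look up how to generate them, and many such .tab files can be downloaded online.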
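The layout of those table files is not shown in this article. The GBK_COUNT comment in toplee_util.h, 24066 = (0xFE - 0x81 + 1) * (0xFE - 0x40 + 1), does suggest one slot per (lead byte, trail byte) pair. Purely as a sketch under that assumption (the row-major ordering and the name gbk_table_index are guesses, not the module's actual format), the slot for a GBK character could be computed like this:

```c
#define GBK_LEAD_MIN   0x81
#define GBK_LEAD_MAX   0xFE
#define GBK_TRAIL_MIN  0x40
#define GBK_TRAIL_MAX  0xFE
#define GBK_ROW_WIDTH  (GBK_TRAIL_MAX - GBK_TRAIL_MIN + 1)   /* 191 */

/* Assumed row-major layout: returns a slot in 0..24065,
 * or -1 when either byte is outside the GBK ranges above. */
int gbk_table_index(unsigned char lead, unsigned char trail)
{
    if (lead < GBK_LEAD_MIN || lead > GBK_LEAD_MAX)
        return -1;
    if (trail < GBK_TRAIL_MIN || trail > GBK_TRAIL_MAX)
        return -1;
    return (lead - GBK_LEAD_MIN) * GBK_ROW_WIDTH + (trail - GBK_TRAIL_MIN);
}
```

The first pair (0x81, 0x40) maps to slot 0 and the last pair (0xFE, 0xFE) to slot 24065, which matches the GBK_COUNT of 24066 entries.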
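Next we can build and test.
Go back to the root of the PHP source tree and run:
# ./buildconf
# ./configure --with-toplee=shared ……
# make
# make install
This builds the module along with PHP. Because of the shared argument, toplee is built as toplee.so. It can be loaded with extension=toplee.so in php.ini or in an extensions.ini file, or loaded at run time from a script with dl(); after that the functions defined above are available in PHP.
My technical ability is limited, so please point out anything that is wrong in this article. I also hope it serves as a starting point that encourages more PHP enthusiasts to share their own valuable experience!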