GB2312 to UTF-8 (VBS + JS) - 王朝网络宽屏版

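Yesterday I looked through the cocoon counter code and found that it does the conversion in VBScript. I spent a whole morning studying it and still came away confused.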

Its VB conversion function looks like this:

' Decodes a percent-escaped ANSI (GB2312) string such as "%D6%D0" into text.
Function DeCodeAnsi(s)
    Dim i, sTmp, sResult, sTmp1
    sResult = ""
    For i = 1 To Len(s)
        If Mid(s, i, 1) = "%" Then
            sTmp = "&H" & Mid(s, i + 1, 2)
            If IsNumeric(sTmp) Then
                If CInt(sTmp) = 0 Then
                    ' %00: drop the escape
                    i = i + 2
                ElseIf CInt(sTmp) > 0 And CInt(sTmp) < 128 Then
                    ' Single-byte ASCII character
                    sResult = sResult & Chr(sTmp)
                    i = i + 2
                Else
                    If Mid(s, i + 3, 1) = "%" Then
                        ' Lead byte followed by a second escape: combine both bytes into one code
                        sTmp1 = "&H" & Mid(s, i + 4, 2)
                        If IsNumeric(sTmp1) Then
                            sResult = sResult & Chr(CInt(sTmp) * 16 * 16 + CInt(sTmp1))
                            i = i + 5
                        End If
                    Else
                        sResult = sResult & Chr(sTmp)
                        i = i + 2
                    End If
                End If
            Else
                ' Not a valid hex escape: keep the "%" as-is
                sResult = sResult & Mid(s, i, 1)
            End If
        Else
            sResult = sResult & Mid(s, i, 1)
        End If
    Next
    DeCodeAnsi = sResult
End Function

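In other words, it uses the Chr() function to turn a decimal ANSI character code into a character. The resulting string should itself be Unicode, which means VBScript carries out the GB-to-Unicode conversion automatically. Here is some data from my tests: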
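The number handed to Chr() is simply the two escaped bytes glued together. The following is a minimal sketch of that arithmetic in JavaScript; `ansiCodeFromEscapes` is a hypothetical helper written for illustration and is not part of cocoon counter. It only computes the code and does not perform the codepage lookup that Chr() does.

```javascript
// Hypothetical helper: combine the two bytes of a percent-escaped
// double-byte character into the single number passed to Chr().
function ansiCodeFromEscapes(pair) {
  // Expect exactly two escapes, e.g. "%D6%D0".
  var m = /^%([0-9A-Fa-f]{2})%([0-9A-Fa-f]{2})$/.exec(pair);
  if (!m) {
    throw new Error("expected two %XX escapes, got: " + pair);
  }
  var lead = parseInt(m[1], 16);   // 0xD6 = 214
  var trail = parseInt(m[2], 16);  // 0xD0 = 208
  return lead * 16 * 16 + trail;   // same formula as in DeCodeAnsi
}

console.log(ansiCodeFromEscapes("%D6%D0")); // 54992
```

This is where the value 54992 in the test code comes from: D6 D0 is the GB2312 byte pair for 中.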

Test code (the code above has to be placed in front of it):

Response.write("<br/>strx = chr(54992):");
Response.write(strx);
Response.write("<br/>strx.charCodeAt(0):");
Response.write(strx.charCodeAt(0));
Response.write("<br/>\"中\".charCodeAt(0):");
Response.write("中".charCodeAt(0));
Response.write("<br/>escape(strx):");
Response.write(escape(strx));
Response.write("<br/>encodeURI(strx):");
Response.write(encodeURI(strx));
Response.write("<br/>escape(\"中\"):");
Response.write(escape("中"));
Response.write("<br/>String.fromCharCode(20013):");
Response.write(String.fromCharCode(20013));
</SCRIPT>

Results obtained by varying the file's storage format, the codepage, and the charset in turn:

File saved as ANSI:

codepage=936:

Response.Charset = "gb2312";

strx = chr(54992)

strx:中

strx.charCodeAt(0):20013

"中".charCodeAt(0):20013

escape(strx):%u4E2D

encodeURI(strx):%E4%B8%AD

escape("中"):%u4E2D

String.fromCharCode(20013):中

Response.Charset = "utf-8";

strx = chr(54992)

strx:֐

strx.charCodeAt(0):20013

"֐".charCodeAt(0):20013

escape(strx):%u4E2D

encodeURI(strx):%E4%B8%AD

escape("֐"):%u4E2D

String.fromCharCode(20013):֐

codepage=65001:

Response.Charset = "gb2312";

strx = chr(54992)

strx:涓

strx.charCodeAt(0):20013

"".charCodeAt(0):-1.#IND

escape(strx):%u4E2D

encodeURI(strx):%E4%B8%AD

escape(""):

String.fromCharCode(20013):涓

Response.Charset = "utf-8";

strx = chr(54992)

strx:㝤

strx.charCodeAt(0):14180

"".charCodeAt(0):-1.#IND

escape(strx):%u3764

encodeURI(strx):%E3%9D%A4

escape(""):

String.fromCharCode(20013):中

File saved as UTF-8:

codepage=65001:

Response.Charset = "gb2312";

strx = chr(54992)

strx:涓

strx.charCodeAt(0):20013

"涓?.charCodeAt(0):20013

escape(strx):%u4E2D

encodeURI(strx):%E4%B8%AD

escape("涓?):%u4E2D

String.fromCharCode(20013):涓

Response.Charset = "utf-8";

strx = chr(54992)

strx:中

strx.charCodeAt(0):20013

"中".charCodeAt(0):20013

escape(strx):%u4E2D

encodeURI(strx):%E4%B8%AD

escape("中"):%u4E2D

String.fromCharCode(20013):中

codepage=936:

Active Server Pages error 'ASP 0245'

Mixed usage of Code Page values

/referer_alapha/test2.asp, line 1

The @CODEPAGE value specified differs from that of the including file's CODEPAGE or the file's saved format.

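Haha, dizzy yet? So am I. I can't work out why the file's storage format has anything to do with the chr(54992) call, while String.fromCharCode(20013) still gives the correct result (the fourth block of test data). VBScript's internal logic is probably just too muddled.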

In any case, with this method, converting GB2312 to UTF-8 becomes much simpler.
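To make that last step concrete: once the string is Unicode (charCodeAt(0) returning 20013 in the results above), encodeURI emits its UTF-8 bytes. Below is a minimal sketch of the same encoding done by hand; `utf8Escape` is a hypothetical helper written for illustration, and it assumes a single code point no larger than U+FFFF that is not a surrogate.

```javascript
// Hypothetical helper: percent-escape the UTF-8 bytes of one code point
// in the range U+0000..U+FFFF (surrogates are not handled).
function utf8Escape(code) {
  var bytes;
  if (code < 0x80) {
    bytes = [code];                                   // 1 byte: 0xxxxxxx
  } else if (code < 0x800) {
    bytes = [0xC0 | (code >> 6),                      // 2 bytes: 110xxxxx
             0x80 | (code & 0x3F)];                   //          10xxxxxx
  } else {
    bytes = [0xE0 | (code >> 12),                     // 3 bytes: 1110xxxx
             0x80 | ((code >> 6) & 0x3F),             //          10xxxxxx
             0x80 | (code & 0x3F)];                   //          10xxxxxx
  }
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    // Pad to two hex digits so that small byte values stay well-formed.
    out += "%" + (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16).toUpperCase();
  }
  return out;
}

console.log(utf8Escape(20013));                     // %E4%B8%AD
console.log(encodeURI(String.fromCharCode(20013))); // %E4%B8%AD
```

Both lines print the same escape sequence that appears in the encodeURI(strx) rows of the test output.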