Inside MSXML Performance

王朝other · author unknown · 2006-01-08


Chris Lovett

Microsoft Corporation

February 21, 2000

Download the source code for this article (1.17MB)

Contents

Metrics

MSXML Features

Working Set

Megabytes Per Second

Attributes vs. Elements

First DOM Walk Working Set Delta

createNode Overhead

Walk vs. selectSingleNode

Save

Namespaces

Free-Threaded Documents

Delayed Memory Cleanup

Virtual Memory

IDispatch

Scripting

The Dreaded "//" Operator

Prune the Search Tree

Cross-Threading Models

Conclusion

I definitely got the message from your online comments that we need more "novice-level" material and some real XML applications. However, this article was already in the pipeline, and it is intended for the advanced XML developer. (After all, this column is called "Extreme XML"!) That said, this article assumes you are familiar with XML and the Microsoft XML Parser (MSXML) in particular. See the MSDN XML Developer's Center for more information.

So, you're designing your XML-based Web application and you need to know what kind of performance to expect from your XML server. Obviously, this depends a lot on what processing you plan to do. It is hard to generalize, because there are so many variables, such as the size of the XML documents, the amount of script code required to process the documents, and the amount of output generated.

For example, major variables that can affect the performance of MSXML include:

· The kind of XML data

· The ratio of tags to text

· The ratio of attributes to elements

· The amount of discarded white space

To illustrate some of these variables, I'll use four sample data files. A snippet from each file is shown below.

Ado.xml

This sample file is a persisted ADO Recordset object and is extremely attribute-heavy. Each attribute value is short, with little wasted white space, making it a data-dense document.

<rsSchema:row au_id='267-41-2394' au_lname='O&apos;Leary' au_fname='Michael'
phone='408 286-2428' address='22 Cleveland Av. #14' city='San Jose' state='CA'
zip='95128' contract='True' name='systypes' id='4' uid='1' type='S ' userstat='0'
sysstat='113' indexdel='0' schema_ver='1' refdate='1900-01-01T00:00:00'
crdate='1996-04-03T03:38:57.387000000' version='0' deltrig='0' instrig='0'
updtrig='0' seltrig='0' category='0' cache='0'/>

Hamlet.xml

This sample file consists of Shakespeare's play "Hamlet." The file is a well-balanced combination of text and element markup, with no attributes.

<SCENE><TITLE>SCENE I. Elsinore. A platform before the castle.</TITLE>
<STAGEDIR>FRANCISCO at his post. Enter to him BERNARDO</STAGEDIR>
<SPEECH>
<SPEAKER>BERNARDO</SPEAKER>
<LINE>Who's there?</LINE>
</SPEECH>

Ot.xml

This sample file consists of the entire Old Testament. Each tag is only one or two characters, which reduces the tag-to-text ratio.

<book>
<bktlong>The First Book of Moses, Called GENESIS.</bktlong>
<bktshort>Genesis</bktshort>
<chapter><chtitle>Chapter 1</chtitle>
<v><vn>1</vn><p>In the beginning God created the heaven and the earth.</p></v>
...

Northwind.xml

This sample file contains a portion of the Northwind database that ships with Microsoft Access. It uses elements instead of attributes, has a high tag-to-text ratio, and contains a lot of extra white space.

<OrderIDs>
<Item>
<OrderID> 10326</OrderID>
<OrderDate> 11/10/94</OrderDate>
<ShipAddress> C/ Araquil, 67</ShipAddress>
</Item>
...

Another major factor is whether the original file is stored as UCS-2. For most XML documents in English, UTF-8 is half the size of UCS-2 because the Latin characters compress down to a single byte in UTF-8. But this is not true for all languages. For some Asian languages, UTF-8 is actually larger than UCS-2, because it can expand to three bytes per character in the worst case. To be fair, the best format to use for measuring performance is UCS-2 on disk so that the numbers are more globally meaningful.
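
The size difference between the two encodings is easy to check. The following is a minimal sketch in Python, not anything MSXML does itself; the helper name `encoded_sizes` and the sample strings are illustrative assumptions, and `utf-16-le` is used as a stand-in for UCS-2, which is equivalent for characters in the Basic Multilingual Plane.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, ucs2_bytes) for text, without a byte-order mark."""
    # Assumption: utf-16-le stands in for UCS-2. The two agree for every
    # character in the Basic Multilingual Plane.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# A line of Latin-only markup: one byte per character in UTF-8.
english = "<LINE>Who's there?</LINE>"

# Four CJK characters: three bytes each in UTF-8, two each in UCS-2.
chinese = "性能分析"

for label, sample in (("english", english), ("chinese", chinese)):
    utf8_bytes, ucs2_bytes = encoded_sizes(sample)
    print(f"{label}: UTF-8 {utf8_bytes} bytes, UCS-2 {ucs2_bytes} bytes")
```

For the Latin-only sample the UTF-8 form is half the UCS-2 size, while for the CJK sample it is one and a half times larger, which is the asymmetry described above.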

The following table shows the UCS-2 file sizes, number of unique names, number of elements and attributes, number of text nodes, and amount of text content (in Unicode characters) for each of our sample files. It also shows a "tagginess factor," which is the percentage of the file size taken up by element and attribute names.

| Sample | File size (bytes) | Unique names | Elements and attributes | Text nodes | Text content (characters) | Tagginess (percentage) |
|---|---|---|---|---|---|---|
| Ado.xml | 2,171,812 | 53 | 63,722 | 61,462 | 3,890 | 18.7 |
| Hamlet.xml | 559,260 | 17 | 6,637 | 5,472 | 170,545 | 5.9 |
| Ot.xml | 7,663,624 | 12 | 71,417 | 47,302 | 3,236,900 | 1.4 |
| Northwind.xml | 488,140 | 12 | 3,680 | 2,761 | 31,155 | 6.0 |
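
Counts like those in the table can be gathered with a simple DOM walk. The sketch below uses Python's standard `xml.dom.minidom` rather than MSXML, and the counting rules are assumptions rather than the rules used for the table: each element and each attribute counts once, and whitespace-only text nodes are skipped. The function name `collect_metrics` is hypothetical.

```python
from xml.dom import minidom


def collect_metrics(xml_text):
    """Count names, elements, attributes and text in an XML string."""
    doc = minidom.parseString(xml_text)
    names = set()
    counts = {"elements_and_attributes": 0, "text_nodes": 0, "text_chars": 0}

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            names.add(node.tagName)
            counts["elements_and_attributes"] += 1
            attrs = node.attributes
            for i in range(attrs.length):
                names.add(attrs.item(i).name)
                counts["elements_and_attributes"] += 1
        elif node.nodeType == node.TEXT_NODE:
            # Assumption: whitespace-only text is treated as discarded.
            if node.data.strip():
                counts["text_nodes"] += 1
                counts["text_chars"] += len(node.data)
        for child in node.childNodes:
            walk(child)

    walk(doc)
    counts["unique_names"] = len(names)
    return counts


speech = "<SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>"
row = "<row id='4' uid='1'/>"
print(collect_metrics(speech))
print(collect_metrics(row))
```

The walk is recursive, so it suits small samples; a multi-megabyte file such as Ot.xml would be better served by a streaming parser.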

The number of unique names is interesting because MSXML "atomizes" element and attribute names: it creates only one string object for each unique name and points to that object from every element or attribute that shares the name. This matters because element and attribute names are typically highly repetitive. For example, the Ado.xml sample contains 63,722 element and attribute names, which consume a total of 407,148 bytes of the overall file size. That is a tag-to-file-size ratio of over 18 percent! But among all these names there are only 53 unique ones. So instead of using 407 KB of memory to store them, MSXML can store them in just a few kilobytes.
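
The idea behind atomization can be sketched in a few lines of Python. This illustrates the general technique only and is not MSXML's internal implementation; the `NameTable` class and its `atomize` method are hypothetical names. The last lines recompute the Ado.xml tagginess from the byte counts quoted above.

```python
class NameTable:
    """Hypothetical name table: keeps one string object per unique name."""

    def __init__(self):
        self._atoms = {}

    def atomize(self, name):
        # setdefault returns the first stored string equal to `name`,
        # so every later request for that name shares one object.
        return self._atoms.setdefault(name, name)

    def __len__(self):
        return len(self._atoms)


table = NameTable()
# Build equal names at run time so they start out as separate strings.
first = table.atomize("".join(["au_", "id"]))
second = table.atomize("".join(["au_", "id"]))
other = table.atomize("au_lname")

shares_one_object = first is second
unique_names = len(table)

# Figures quoted in the text for Ado.xml.
name_bytes = 407_148
file_bytes = 2_171_812
tagginess_percent = round(100 * name_bytes / file_bytes, 1)

print(shares_one_object, unique_names, tagginess_percent)
```

Each node then holds a reference to the shared name instead of its own copy, so the memory cost of names scales with the number of unique names rather than the number of nodes.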

 
 
 