Inside MSXML Performance (3)

王朝other · Author unknown · 2006-01-08

MSXML Features

Next, let's examine some important scenarios associated with the Document Object Model (DOM), including loading, saving, walking a DOM tree, and creating a new DOM tree in memory.

DOM

The MSXML Document Object Model ("Microsoft.XMLDOM," CLSID_DOMDocument, IID_IXMLDOMDocument) is the starting point for all XML processing within the MSXML parser. The fastest way to load an XML document is to use the default "rental" threading model (which means the DOM document can be used by only one thread at a time; it doesn't matter which thread) with validateOnParse, resolveExternals, and preserveWhiteSpace all disabled:

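    // Create the DOM document; the default threading model is "rental".
    var doc = new ActiveXObject("Microsoft.XMLDOM");
    // Disable validation, external resolution, and preservation of
    // white space between elements to keep loading as fast as possible.
    doc.validateOnParse = false;
    doc.resolveExternals = false;
    doc.preserveWhiteSpace = false;
    doc.load("test.xml");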
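The settings above can be wrapped in a small helper. The following is only a sketch, not part of the original article: the helper name `loadFast` and the injected `createDocument` factory are hypothetical, and setting `async` to false (so that `load` returns only after parsing finishes) is an added assumption that suits simple scripts. It reports a failed load through the document's `parseError.reason`.

```javascript
// Hypothetical helper: apply the fast-load settings, load synchronously,
// and fail loudly if the parser reports an error.
// createDocument is a factory so the helper does not depend on ActiveXObject.
function loadFast(createDocument, url) {
  var doc = createDocument();
  doc.async = false;               // assumption: block until the load completes
  doc.validateOnParse = false;     // no DTD/schema validation
  doc.resolveExternals = false;    // do not resolve external definitions
  doc.preserveWhiteSpace = false;  // do not keep white space between elements
  if (!doc.load(url)) {
    throw new Error("Failed to load " + url + ": " + doc.parseError.reason);
  }
  return doc;
}

// Usage under Windows Script Host or Internet Explorer (not run here):
// var doc = loadFast(function () {
//   return new ActiveXObject("Microsoft.XMLDOM");
// }, "test.xml");
```

Passing the factory in keeps the helper independent of how the document object is created, so the same function can be exercised with a stand-in object outside Windows.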

Working Set

When using the DOM, the first metric to consider is the working set. Memory is used to load Msxml.dll and the other .dll files on which it depends. Some of these other .dll files are "delay loaded," which means the working set won't be affected until that .dll is used. MSXML is a COM DLL, so you typically use the standard COM APIs (CoInitialize and CoCreateInstance) to create a new XML document object. The minimum working set for a simple Visual C++ 6.0 command line application that uses COM is about one megabyte. (This includes the following .dll files: Ntdll.dll, Kernel32.dll, Ole32.dll, Rpcrt4.dll, Advapi32.dll, Gdi32.dll, User32.dll, and Oleaut32.dll.) The first call to CoCreateInstance of an IXMLDOMDocument object loads Msxml.dll and Shlwapi.dll, which adds another 745 KB on top of this. Once all the .dll files are loaded, each new IXMLDOMDocument object takes only about 8 KB.

The memory used by the XML data loaded into an XML document is anywhere from one to four times the size of the XML file on disk, depending on the "tagginess" of the data being loaded and whether the file was already in a Unicode format on disk. The following is a very rough formula for estimating the memory required for a given XML document:

ws = 32(n+t) + 12t + 50u + 2w;

The following table describes the parts of the formula:

| Part | Description |
|------|-------------|
| ws | The working set in bytes. |
| n | The number of element and attribute nodes in the tree. Each element, attribute, attribute value, and text content has one node (for example, `<element attribute = "value">text</element>` = four nodes). |
| t | The number of text nodes. |
| u | The number of unique element and attribute names. |
| w | The number of Unicode characters in text content (including attribute values). Note that loading single-byte ANSI text into memory results in twice the number, because all text is stored as Unicode characters, which are two bytes each. |

This assumes you do not set the preserveWhiteSpace flag; when you do, more nodes are created to preserve the white space between elements, using more memory.
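As a quick illustration, the formula can be written as a function. This is a minimal sketch: the function name `estimateWorkingSet` and the sample counts are hypothetical values chosen for the arithmetic, not measurements from the sample files.

```javascript
// Rough working-set estimate in bytes, following
// ws = 32(n+t) + 12t + 50u + 2w from the text.
//   n: element and attribute nodes    t: text nodes
//   u: unique element/attribute names w: Unicode characters of text content
function estimateWorkingSet(n, t, u, w) {
  return 32 * (n + t) + 12 * t + 50 * u + 2 * w;
}

// Hypothetical document: 1,000 nodes, 400 text nodes,
// 20 unique names, 5,000 characters of text.
var estimate = estimateWorkingSet(1000, 400, 20, 5000);
// 32 * 1400 + 12 * 400 + 50 * 20 + 2 * 5000 = 60600 bytes
```

Because the formula is described as very rough, the result is best read as an order-of-magnitude guide rather than a prediction.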

For the sample data above, we see the following working set numbers (not including the initial startup working set):

| Sample | Working set (bytes) | Ratio to file size |
|--------|---------------------|--------------------|
| Ado.xml | 4,689,920 | 2.16 |
| Hamlet.xml | 704,512 | 1.25 |
| Ot.xml | 10,720,000 | 1.39 |
| Northwind.xml | 249,856 | 0.51 |

An element-heavy XML document containing a lot of white space between elements and stored in Unicode can actually be smaller in memory than on disk. Files that have a more balanced ratio of elements to text content, such as Hamlet.xml and Ot.xml, end up at about 1.25 to 1.5 times the UCS-2 file size when in memory. Files that are very data-dense, such as Ado.xml, end up more than twice the disk-file size when loaded into memory.

Megabytes Per Second

For the megabytes-per-second metric, I loaded each sample file 10 times in a loop on a Pentium II 450-MHz dual-processor computer running Windows 2000, measured the load times, and averaged the results.

| Sample | Load time (milliseconds) | MB/second | Nodes/second |
|--------|--------------------------|-----------|--------------|
| Ado.xml | 677 | 3.2 | 184,909 |
| Hamlet.xml | 104 | 5.3 | 116,432 |
| Ot.xml | 1,063 | 7.2 | 111,682 |
| Northwind.xml | 62 | 7.8 | 103,887 |

Also shown in this table is a measure of nodes per second. Notice that it correlates inversely with megabytes per second: the more nodes processed per buffer of input data, the slower the absolute throughput. Conversely, the more compact the nodes are (as in Ado.xml), the higher the nodes per second.
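The measurement procedure (load a file repeatedly, average the times, then derive the two rates) might be sketched as follows. The function names, the injectable `now` clock, and the choice of 1 MB = 1,000,000 bytes are all assumptions made for illustration; the article does not show its timing code or say which megabyte it uses.

```javascript
// Call loadOnce() `runs` times and return the mean duration in milliseconds.
// `now` defaults to Date.now; it is injectable so a fake clock can be used.
function averageLoadMs(loadOnce, runs, now) {
  now = now || Date.now;
  var total = 0;
  for (var i = 0; i < runs; i++) {
    var start = now();
    loadOnce();
    total += now() - start;
  }
  return total / runs;
}

// Throughput in megabytes per second (assumes 1 MB = 1,000,000 bytes).
function megabytesPerSecond(fileBytes, loadMs) {
  return (fileBytes / 1e6) / (loadMs / 1000);
}

// Throughput in DOM nodes per second.
function nodesPerSecond(nodeCount, loadMs) {
  return nodeCount / (loadMs / 1000);
}

// Usage (hypothetical): time 10 loads of one file, then convert.
// var ms = averageLoadMs(function () { doc.load("test.xml"); }, 10);
// var mbps = megabytesPerSecond(fileSizeInBytes, ms);
```

Averaging over several runs smooths out one-off costs such as the first load pulling the file into the disk cache.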

Attributes vs. Elements

You could conclude from this that attribute-heavy formats (such as that of Ado.xml) deliver more data per second than element-heavy formats. But this should not be the reason for you to switch everything to attributes. There are many other factors to consider in the decision to use attributes versus elements.

 
 
 