Parsing html markup text using MSHTML

王朝 C# · Author: anonymous · 2006-12-17

By Hendrik Swanepoel

Introduction:

I often work with content in the form of HTML and need to manipulate it intelligently. Until now I have done this with regular expressions that 'parse' the HTML, which let me look for particular tags with particular attributes, and so on.

This works well enough, but some people aren't familiar with regular expression syntax and struggle to maintain and extend the code for manipulating the markup.

A much simpler and more developer-friendly option is to reference the mshtml object. I will illustrate its use with an oversimplified example. I will mention regular expressions, but I won't go into their syntax or show any expressions; that is a different subject altogether.

-----------------------------------------

Problem scenario:

The pages in my website contain elements with formatting attributes hard-coded onto them, instead of having all the formatting set through a class reference to a stylesheet.

This means that an element will have its bgcolor attribute set to 'blue' and its border attribute set to '1'. For example:

<p bgcolor='blue' color='red' border='1'>bla di bla bla</p>

I want to set a class name attribute on every element that has both of these attributes with these values, meaning any element with a bgcolor of 'blue' and a border of '1', whatever the tag. The following qualifies too:

<td bgcolor=blue id='mytd' onclick="alert('clicked');" border='1'>Hello</td>

So how can I find all the tags in the markup that have these two attributes with the correct values? A plain string search will not suffice, so a regular expression looks like the obvious solution. But when the order of the border and bgcolor attributes is switched, the regular expression becomes considerably more complex. For example:

<td border='1' id='mytd' onclick="alert('clicked');" bgcolor=blue>Hello</td>

Now we can't assume that the bgcolor attribute will be found first and then the border attribute. And what about when we want to search on three attributes?

Solution

What we want to do is loop through the HTML elements in the markup and look for elements that satisfy our requirements. We check each element by reading its attributes by name, so their order in the source does not matter. If all the required attributes match, the tag qualifies for the update.

We need a way to tell our method which attributes to look for, their expected values, and the new attribute key/value pairs to set on each qualifying element.
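The matching rule described above can be sketched without MSHTML at all. In this minimal illustration an element's attributes are assumed to be available as a dictionary; the class and method names (AttributeMatchSketch, Qualifies, Apply) are hypothetical and are not part of the article's code.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeMatchSketch
{
    // Returns true only when every required attribute is present with the
    // expected value. Attributes are looked up by name, so the order in
    // which they appear on the tag does not matter.
    public static bool Qualifies(
        IDictionary<string, string> elementAttributes,
        IDictionary<string, string> required)
    {
        foreach (KeyValuePair<string, string> pair in required)
        {
            string actual;
            if (!elementAttributes.TryGetValue(pair.Key, out actual) ||
                !string.Equals(actual, pair.Value, StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }
        return true;
    }

    // Copies the new key/value pairs onto a qualifying element.
    public static void Apply(
        IDictionary<string, string> elementAttributes,
        IDictionary<string, string> toSet)
    {
        foreach (KeyValuePair<string, string> pair in toSet)
        {
            elementAttributes[pair.Key] = pair.Value;
        }
    }

    public static void Main()
    {
        var required = new Dictionary<string, string>
        {
            { "bgcolor", "blue" },
            { "border", "1" }
        };
        var toSet = new Dictionary<string, string> { { "className", "blueBorder" } };

        // Same attributes as the second <td> example: border comes first.
        var td = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "border", "1" },
            { "id", "mytd" },
            { "bgcolor", "blue" }
        };

        if (Qualifies(td, required))
        {
            Apply(td, toSet);
        }
        Console.WriteLine(td["className"]); // prints "blueBorder"
    }
}
```

Adding a third search attribute only means adding one more entry to the required dictionary; the loop itself does not change, which is exactly what the regular expression approach struggles with.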

Code

We have to add a reference to the mshtml assembly:

In the Solution Explorer, highlight the project to which you want to add the parsing functionality.

In the menu, click Project -> Add Reference.

In the dialog box that appears, under the .NET tab, choose the Microsoft.mshtml assembly.

Click the Select button, then click OK.

Now we can reference this assembly:

using mshtml;

Our class will contain one method, which takes three parameters: a string containing the markup to parse, an ArrayList of key/value pairs that must be present on an element for it to qualify for the update, and an ArrayList of new key/value pairs to set on the qualifying elements.

We also have a struct that serves as a container for our attribute key/value pairs.

IHTMLDocument2 doc = ...
...
el.setAttribute(setAtt.key, setAtt.val, 0);
} } }
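A minimal sketch of how such a class and struct could look, assuming the Microsoft.mshtml interop assembly is referenced and the code runs on Windows. It is not the author's original listing. The names ServerParse, UpdateAttributes, key and val follow the fragments in this article; the struct name AttributeItem, its constructor, and the choice to return doc.body.innerHTML are assumptions made for illustration.

```csharp
using System;
using System.Collections;
using mshtml; // Microsoft.mshtml interop assembly (Windows only)

// Container for one attribute key/value pair (struct name is an assumption).
public struct AttributeItem
{
    public string key;
    public string val;

    public AttributeItem(string key, string val)
    {
        this.key = key;
        this.val = val;
    }
}

public static class ServerParse
{
    public static string UpdateAttributes(
        string markup, ArrayList searchList, ArrayList setList)
    {
        // Let MSHTML build a DOM from the markup string.
        IHTMLDocument2 doc = (IHTMLDocument2)new HTMLDocumentClass();
        doc.write(new object[] { markup });
        doc.close();

        foreach (IHTMLElement el in doc.all)
        {
            bool qualifies = true;
            foreach (AttributeItem att in searchList)
            {
                // getAttribute returns an object; a missing attribute
                // converts to an empty string and fails the comparison.
                object actual = el.getAttribute(att.key, 0);
                if (!string.Equals(Convert.ToString(actual), att.val,
                                   StringComparison.OrdinalIgnoreCase))
                {
                    qualifies = false;
                    break;
                }
            }
            if (!qualifies)
            {
                continue;
            }

            // Every search attribute matched: apply the new values.
            foreach (AttributeItem setAtt in setList)
            {
                el.setAttribute(setAtt.key, setAtt.val, 0);
            }
        }

        // Assumption: only the body markup is of interest to the caller.
        return doc.body.innerHTML;
    }

    [STAThread]
    public static void Main()
    {
        ArrayList searchList = new ArrayList();
        searchList.Add(new AttributeItem("bgcolor", "blue"));
        searchList.Add(new AttributeItem("border", "1"));

        ArrayList setList = new ArrayList();
        setList.Add(new AttributeItem("className", "blueBorder"));

        string markupContent =
            "<table><tr><td border='1' id='mytd' bgcolor=blue>Hello</td></tr></table>";
        markupContent = ServerParse.UpdateAttributes(markupContent, searchList, setList);
        Console.WriteLine(markupContent);
    }
}
```

MSHTML serializes the document in its own normalized form, so the returned markup may differ from the input in tag case, quoting and attribute order, and attribute values are compared as the DOM reports them rather than as they were typed. If the project embeds interop types, HTMLDocumentClass may need to be replaced with HTMLDocument.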

Using the code

We want to parse this HTML and look for tags (of any kind) that have the following attributes:

bgcolor=blue border=1

When a qualifying tag is found, the className property of the element is set to 'blueBorder'. (The attribute is called class in HTML, but the corresponding DOM property is className.)

ArrayList searchList = ...
ArrayList setList = ...
markupContent = ServerParse.UpdateAttributes(markupContent, searchList, setList);

The resulting text

Conclusion

You can use this technique wherever you want to manipulate markup based on a search, and it is a much simpler process than using regular expressions.

It can also be used to process markup in a Windows application.

Hendrik lives in South Africa, has been developing for 4+ years and specializes in the .NET Framework. You can learn more about him at http://dotnet.org.za/hendrik

 
 
 