JDOM and XML Parsing, Part 1 (bookmarked from www.oracle.com)

王朝 java/jsp · Author: unknown · 2006-01-09

JDOM makes working with XML in Java much simpler!

Documents are represented by the org.jdom.Document class. You can construct a document from scratch:

// This builds: <root/>

Document doc = new Document(new Element("root"));

Or you can build a document from a file, stream, system ID, or URL:

// This builds a document of whatever's in the given resource

SAXBuilder builder = new SAXBuilder();

Document doc = builder.build(url);

Putting together a few calls makes it easy to create a simple document in JDOM:

// This builds: <root>This is the root</root>

Document doc = new Document();

Element e = new Element("root");

e.setText("This is the root");

doc.addContent(e);

If you're a power user, you may prefer to use "method chaining," in which multiple methods are called in sequence. This works because the set methods return the object on which they acted. Here's how that looks:

Document doc = new Document(

new Element("root").setText("This is the root"));

For a little comparison, here's how you'd create the same document, using JAXP/DOM:

// JAXP/DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

DocumentBuilder builder = factory.newDocumentBuilder();

Document doc = builder.newDocument();

Element root = doc.createElement("root");

Text text = doc.createTextNode("This is the root");

root.appendChild(text);

doc.appendChild(root);
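For readers who want to run the DOM side of the comparison, here is a minimal self-contained sketch that uses only the JAXP classes shipped with the JDK. The class name DomBuildSketch and the helper buildAndSerialize() are made up for illustration; the Transformer step is just one convenient way to see the result as a string and is not part of the comparison itself.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class DomBuildSketch {

    // Builds <root>This is the root</root> with JAXP/DOM and returns it as text.
    static String buildAndSerialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();

            // In DOM, every node is created by the owning document.
            Element root = doc.createElement("root");
            root.appendChild(doc.createTextNode("This is the root"));
            doc.appendChild(root);

            // An identity transform serializes the tree; the XML
            // declaration is omitted to keep the output short.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked JAXP exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```

Even in this small case the factory, builder, and per-node create calls add up, which is the contrast the JDOM one-liner is meant to highlight.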

Writing with XMLOutputter

A document can be output to many different formats, but the most common is a stream of bytes. In JDOM, the XMLOutputter class provides this capability. Its default no-argument constructor attempts to faithfully output a document exactly as stored in memory. The following code produces a raw representation of a document to a file.

// Raw output

XMLOutputter outp = new XMLOutputter();

outp.output(doc, fileStream);

If you don't care about whitespace, you can enable trimming of text blocks and save a little bandwidth:

// Compressed output

outp.setTextTrim(true);

outp.output(doc, socketStream);

If you'd like the document pretty-printed for human display, you can add some indent whitespace and turn on new lines:

outp.setTextTrim(true);

outp.setIndent(" ");

outp.setNewlines(true);

outp.output(doc, System.out);

When pretty-printing a document that already has formatting whitespace, be sure to enable trimming. Otherwise, you'll add formatting on top of formatting and make something ugly.

Navigating the Element Tree

JDOM makes navigating the element tree quite easy. To get the root element, call:

Element root = doc.getRootElement();

To get a list of all its child elements:

List allChildren = root.getChildren();

To get just the elements with a given name:

List namedChildren = root.getChildren("name");

And to get just the first element with a given name:

Element child = root.getChild("name");

The List returned by the getChildren() call is a java.util.List, an implementation of the List interface all Java programmers know. What's interesting about the List is that it's live. Any changes to the List are immediately reflected in the backing document.

// Remove the fourth child

allChildren.remove(3);

// Remove children named "jack"

allChildren.removeAll(root.getChildren("jack"));

// Add a new child, at the tail or at the head

allChildren.add(new Element("jane"));

allChildren.add(0, new Element("jill"));

Using the List metaphor makes possible many element manipulations without adding a plethora of methods. For convenience, however, the common tasks of adding elements at the end or removing named elements have methods on Element itself and don't require obtaining the List first:

root.removeChildren("jill");

root.addContent(new Element("jenny"));

One nice perk with JDOM is how easy it can be to move elements within a document or between documents. It's the same code in both cases:

Element movable = new Element("movable");

parent1.addContent(movable); // place

parent1.removeContent(movable); // remove

parent2.addContent(movable); // add

With DOM, moving elements is not as easy, because DOM nodes are tied to the document (and implementation) that created them. Thus a DOM element must be "imported" when moving between documents.
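To make the DOM side concrete, the following sketch moves an element between two documents using only the JDK's org.w3c.dom API. The class DomImportSketch and its methods newDoc(), moveInto(), and demo() are hypothetical names chosen for this example. It relies on Document.importNode(), which creates a copy owned by the target document, so the original has to be removed separately.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
class DomImportSketch {

    // Creates a new DOM document with a single, empty root element.
    static Document newDoc(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Moves" node under the root of target: import a deep copy,
    // drop the original from its old parent, then attach the copy.
    static Node moveInto(Document target, Node node) {
        Node copy = target.importNode(node, true);
        Node oldParent = node.getParentNode();
        if (oldParent != null) {
            oldParent.removeChild(node);
        }
        target.getDocumentElement().appendChild(copy);
        return copy;
    }

    // Runs the move once and summarizes the state of both documents.
    static String demo() {
        Document doc1 = newDoc("parent1");
        Document doc2 = newDoc("parent2");
        Element movable = doc1.createElement("movable");
        doc1.getDocumentElement().appendChild(movable);

        Node moved = moveInto(doc2, movable);
        return "parent1 children="
                + doc1.getDocumentElement().getChildNodes().getLength()
                + ", parent2 children="
                + doc2.getDocumentElement().getChildNodes().getLength()
                + ", owned by doc2=" + (moved.getOwnerDocument() == doc2);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the node returned by moveInto() is a different object from the one passed in, whereas in the JDOM code above the same Element instance travels between parents.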

With JDOM the only thing you need to remember is to remove an element before adding it somewhere else, so that you don't create loops in the tree. There's a detach() method that makes the detach/add a one-liner:

parent3.addContent(movable.detach());

If you forget to detach an element before adding it to another parent, the library will throw an exception (with a truly precise and helpful error message). The library also checks Element names and content to make sure they don't include inappropriate characters such as spaces. It also verifies other rules, such as having only one root element, consistent namespace declarations, lack of forbidden character sequences in comments and CDATA sections, and so on. This feature pushes "well-formedness" error checking as early in the process as possible.

Handling Element Attributes

Element attributes look like this:

<table width="100%" border="0"> ... </table>

With a reference to an element, you can ask the element for any named attribute value:

String val = table.getAttributeValue("width");

You can also get the attribute as an object, for performing special manipulations such as type conversions:

Attribute border = table.getAttribute("border");

int size = border.getIntValue();

To set or change an attribute, use setAttribute():

table.setAttribute("vspace", "0");

To remove an attribute, use removeAttribute():

table.removeAttribute("vspace");

Working with Element Text Content

An element with text content looks like this:

<description>

A cool demo

</description>

In JDOM, the text is directly available by calling:

String desc = description.getText();

Just remember, because the XML 1.0 specification requires whitespace to be preserved, this returns "\n A cool demo\n". Of course, as a practical programmer you often don't want to be so literal about formatting whitespace, so there's a convenient method for retrieving the text while ignoring surrounding whitespace:

String betterDesc = description.getTextTrim();

If you really want whitespace out of the picture, there's even a getTextNormalize() method that trims surrounding whitespace and collapses each run of internal whitespace to a single space. It's handy for text content like this:

<description>

Sometimes you have text content with formatting

space within the string.

</description>
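As a rough illustration of what this kind of normalization amounts to, here is a sketch in plain Java string operations. It is not JDOM's implementation, and the class NormalizeSketch and method normalize() are hypothetical; JDOM may differ in exactly which characters it treats as whitespace.

```java
// Hypothetical helper class, for illustration only.
class NormalizeSketch {

    // Trims surrounding whitespace, then collapses every internal run
    // of whitespace (spaces, tabs, newlines) into a single space.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "\n  Sometimes you have text content with formatting\n"
                + "  space within the string.\n";
        System.out.println("[" + normalize(raw) + "]");
    }
}
```

Applied to the description above, the line break and indentation between "formatting" and "space" end up as one space, which is usually what you want before displaying or comparing the text.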

To change text content, there's a setText() method:

description.setText("A new description");

Any special characters within the text will be interpreted correctly as a character and escaped on output as needed to maintain the appropriate semantics. Let's say you make this call:

element.setText("<xml/> content");

The internal store will keep that literal string as characters. There will be no implicit parsing of the content. On output, you'll see this:

&lt;xml/&gt; content

This behavior preserves the semantic meaning of the earlier setText() call. If you want XML content held within an element, you must add the appropriate JDOM child element objects.
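The same principle (text is stored as literal characters and escaped only on output) can be tried without JDOM on the classpath. The sketch below uses the JDK's built-in DOM and Transformer rather than JDOM's Element and XMLOutputter, so treat it as an analogy, not as a description of JDOM internals. The class EscapeSketch, its methods, and the element name "elt" are assumptions made for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class EscapeSketch {

    // Creates a document whose <elt> root holds the given string as text.
    static Element elementWithText(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element elt = doc.createElement("elt");
            elt.setTextContent(text); // stored as characters, never parsed
            doc.appendChild(elt);
            return elt;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the element; markup characters in text are escaped here.
    static String serialize(Element elt) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(elt), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element elt = elementWithText("<xml/> content");
        // No child element was created from the string.
        System.out.println(elt.getElementsByTagName("xml").getLength());
        System.out.println(serialize(elt));
    }
}
```

The in-memory text still reads back unchanged through getTextContent(); only the serialized form carries the escapes.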

Handling CDATA sections is also possible within JDOM. A CDATA section indicates a block of text that shouldn't be parsed. It is essentially "syntactic sugar" that allows the easy inclusion of HTML or XML content without so many &lt; and &gt; escapes. To build a CDATA section, just wrap the string with a CDATA object:

element.addContent(new CDATA("<xml/> content"));

What's terrific about JDOM is that a getText() call returns the string of characters without bothering the caller with whether or not it's represented by a CDATA section.
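The idea that CDATA is only a different spelling of the same characters can be checked with the JDK's built-in DOM as well. This is again an analogy rather than JDOM code: the class CdataSketch and method textOf() are hypothetical, and the DOM getTextContent() call stands in for JDOM's getText().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class CdataSketch {

    // Stores content either as a CDATA section or as ordinary text,
    // then reads it back as a plain string.
    static String textOf(boolean useCdata, String content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element elt = doc.createElement("elt");
            doc.appendChild(elt);
            if (useCdata) {
                elt.appendChild(doc.createCDATASection(content));
            } else {
                elt.appendChild(doc.createTextNode(content));
            }
            return elt.getTextContent();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String content = "<xml/> content";
        // Both calls return the same characters.
        System.out.println(textOf(true, content).equals(textOf(false, content)));
    }
}
```

The caller sees identical strings either way; the choice of CDATA versus escaped text only matters to whoever reads the serialized file.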

Dealing with Mixed Content

Some elements contain many things such as whitespace, comments, text, child elements, and more:

<table>

<!-- Some comment -->

Some text

<tr>Some child element</tr>

</table>

When an element contains both text and child elements, it's said to contain "mixed content." Handling mixed content can be difficult, but JDOM makes it easy. The standard use cases, retrieving text content and navigating child elements, are kept simple:

String text = table.getTextTrim(); // "Some text"

Element tr = table.getChild("tr"); // A straight reference

For more advanced uses needing the comment, whitespace blocks, processing instructions, and entity references, the raw mixed content is available as a List:

List mixedCo = table.getContent();

Iterator itr = mixedCo.iterator();

while (itr.hasNext()) {

Object o = itr.next();

if (o instanceof Comment) {

...

}

// Types include Comment, Element, CDATA, DocType,

// ProcessingInstruction, EntityRef, and Text

}

As with child element lists, changes to the raw content list affect the backing document:

// Remove the Comment. It's "1" because "0" is a whitespace block.

mixedCo.remove(1);

If you have sharp eyes, you'll notice that there's a Text class here. Internally, JDOM uses a Text class to store string content in order to allow the string to have parentage and more easily support XPath access. As a programmer, you don't need to worry about the class when retrieving or setting text, only when accessing the raw content list.

For details on the DocType, ProcessingInstruction, and EntityRef classes, see the API documentation at jdom.org.

 
 
 