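This article is the record of a tutorial given at the XML Europe 2002 conference. It describes in detail the characteristics and uses of the various schema languages used to define what the structure of an XML document should be. The translation was split into three parts; this is the first, which covers how rule-based schemas constrain XML.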
1. Introduction
What is an XML schema language?
I will insist more on this point during my comparison of XML schema languages on Wednesday morning, but one thing is sure: an XML schema language is probably not what you're expecting. Its main feature is not (or not always) to describe a class of XML documents, but rather to act as a filter or firewall that protects applications from the wide diversity of well-formed XML documents.
Throughout this tutorial we will use the following example:
<?xml version="1.0"?>
<library>
<book id="_0836217462">
<isbn>
0836217462
</isbn>
<title>
Being a Dog Is a Full-Time Job
</title>
<author-ref id="Charles-M.-Schulz"/>
<character-ref id="Peppermint-Patty"/>
<character-ref id="Snoopy"/>
<character-ref id="Schroeder"/>
<character-ref id="Lucy"/>
</book>
<book id="_0805033106">
<isbn>
0805033106
</isbn>
<title>
Peanuts Every Sunday
</title>
<author-ref id="Charles-M.-Schulz"/>
<character-ref id="Sally-Brown"/>
<character-ref id="Snoopy"/>
<character-ref id="Linus"/>
<character-ref id="Snoopy"/>
</book>
<author id="Charles-M.-Schulz">
<name>
Charles M. Schulz
</name>
<nickName>
SPARKY
</nickName>
<born>
1922-11-26
</born>
<dead>
2000-02-12
</dead>
</author>
<character id="Peppermint-Patty">
<name>
Peppermint Patty
</name>
<since>
1966-08-22
</since>
<qualification>
bold, brash and tomboyish
</qualification>
</character>
<character id="Snoopy">
<name>
Snoopy
</name>
<since>
1950-10-04
</since>
<qualification>
extroverted beagle
</qualification>
</character>
<character id="Schroeder">
<name>
Schroeder
</name>
<since>
1951-05-30
</since>
<qualification>
brought classical music to the Peanuts strip
</qualification>
</character>
<character id="Lucy">
<name>
Lucy
</name>
<since>
1952-03-03
</since>
<qualification>
bossy, crabby and selfish
</qualification>
</character>
<character id="Sally-Brown">
<name>
Sally Brown
</name>
<since>
1960-08-22
</since>
<qualification>
always looks for the easy way out
</qualification>
</character>
<character id="Linus">
<name>
Linus
</name>
<since>
1952-09-19
</since>
<qualification>
the intellectual of the gang
</qualification>
</character>
</library>
An application managing the library described by this document, or even an XSLT stylesheet designed to display it, would probably be very confused if the names or content of the elements were not what it expects. The main feature of an XML schema language is to provide a formal way to describe what is expected, protecting applications from these risks of errors.
2. Rule-based languages (XSLT & Schematron)
The most basic way to implement this firewall is to give a set of rules which need to be followed by the instance documents.
This is the approach followed by rule-based XML schema languages, whose main representative is Schematron. Before presenting Schematron itself, we will have a look at how XSLT may be used as an XML schema language, since this is a good exercise for understanding the basics of those schema languages.
2.1. XSLT as a rule-based XML schema language
We can use "classical" programming languages to write a rule-based XML schema: either general-purpose languages using an XML API, or XML-specific ones such as XSLT or XQuery.
To illustrate this point, let's take the following very simple snippet of our example:
<?xml version="1.0"?>
<library>
<book id="_0836217462"/>
<book id="_0805033106"/>
</library>
Why so simple? Because we will see that, even though we can use XSLT as a rule-based XML schema language, it is quite verbose, and I don't want to spend all the time allocated to this tutorial developing our schema!
To write this schema, we have basically two options, the same ones we have when configuring a firewall: the closed one, where everything that is not allowed is forbidden, and the open one, where everything that is not forbidden is allowed. We will implement both schemas.
The first conclusion from this simple example is that XML applications tend to forbid much more than they allow: closed schemas are often easier to write than open schemas.
On the other hand, it's easier to define user-friendly error messages in an open schema, since the context in which something is forbidden is always determined.
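The two firewall-style policies can also be sketched outside XSLT, with a general-purpose language and an XML API as mentioned above. What follows is only a minimal illustration in Python using the standard library's xml.etree.ElementTree. The function names, the ALLOWED_CHILDREN table and the extra magazine element are assumptions made for this example, not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# The tutorial's snippet plus one unexpected element (added for illustration).
DOC = """<library>
  <book id="_0836217462"/>
  <book id="_0805033106"/>
  <magazine/>
</library>"""


def open_check(root):
    """Open policy: everything is accepted unless a rule forbids it."""
    errors = []
    if root.tag != "library":
        errors.append(f'The document element should be "library", not "{root.tag}".')
    for name in root.attrib:
        errors.append(f'The document element should have no attribute, found "{name}".')
    return errors


# Closed policy: the only element children each parent may have (hypothetical table).
ALLOWED_CHILDREN = {"library": {"book"}, "book": set()}


def closed_check(root):
    """Closed policy: everything is rejected unless a rule allows it."""
    errors = []
    if root.tag != "library":
        errors.append(f'The document element should be "library", not "{root.tag}".')
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append(f'Forbidden element "{child.tag}" under "{parent.tag}".')
    return errors


root = ET.fromstring(DOC)
print(open_check(root))    # the open rules say nothing about <magazine/>
print(closed_check(root))  # the closed rules reject it
```

The open checker has to name every forbidden case explicitly, while the closed one only lists the few things that are allowed, which is the trade-off described above.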
2.1.1. An open XSLT schema
To implement an open schema with XSLT, we will start by defining a default template which accepts anything:
<xsl:template match="*|@*|text()">
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
With this single template, our "schema" would accept any well-formed XML document and never raise any error, so we need to add templates to define what's forbidden.
As with the design of any XSLT transformation, we have the choice of implementing the tests as conditions in the "match" attribute of templates, or within the templates using if or choose statements. When we use if or choose statements, we also have the choice of the location where we will do the test.
To check that the document element is "library", we can for instance test in the match expression (2.1.1.1) or inside a template (2.1.1.2).
Once this background is set up, we can generalize it, and a pretty much complete "schema", including a test for the uniqueness of the identifiers, is given in 2.1.1.3.
Note that this schema leaves a degree of openness: arbitrary element and text nodes can be added to the book element.
2.1.1.1. Testing in the match expression
We can write a template to allow library as the document element:
<xsl:template match="/library">
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
But we also need to forbid other document elements:
<xsl:template match="/*">
<xsl:message terminate="no">
The document element should be "library".
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
Or, alternatively, we can rely on the default template and replace both templates with a single one using a slightly more complex match expression:
<xsl:template match="/*[not(self::library)]">
<xsl:message terminate="no">
The document element should be "library".
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
2.1.1.2. Testing inside the template
We can also perform the test in a template for the root of the document:
<xsl:template match="/">
<xsl:if test="not(library)">
<xsl:message terminate="no">
The document element should be "library".
</xsl:message>
</xsl:if>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
Or do the same test in a template for the document element:
<xsl:template match="/*">
<xsl:if test="not(self::library)">
<xsl:message terminate="no">
The document element should be "library".
</xsl:message>
</xsl:if>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
2.1.1.3. The complete open schema implemented in XSLT
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:template match="*|@*|text()">
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="/*[not(self::library)]">
<xsl:message terminate="no">
<xsl:text>The document element should be "library", not "</xsl:text>
<xsl:value-of select="name()"/>
<xsl:text>"!</xsl:text>
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="/*/*[not(self::book)]">
<xsl:message terminate="no">
<xsl:text>The children elements of library should be "book", not "</xsl:text>
<xsl:value-of select="name()"/>
<xsl:text>"!</xsl:text>
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="library/@*">
<xsl:message terminate="no">
<xsl:text>The "library" element should have no attribute, </xsl:text>
<xsl:value-of select="name()"/>
<xsl:text> shouldn't appear!</xsl:text>
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="library/text()[normalize-space()]">
<xsl:message terminate="no">
<xsl:text>The "library" element should have no text, "</xsl:text>
<xsl:value-of select="normalize-space()"/>
<xsl:text>" shouldn't appear!</xsl:text>
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="book/@*">
<xsl:message terminate="no">
<xsl:text>The "book" element should have no other attribute than "id", </xsl:text>
<xsl:value-of select="name()"/>
<xsl:text> shouldn't appear!</xsl:text>
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="book/@id">
<xsl:if test=". = ../preceding-sibling::book/@id">
<xsl:message terminate="no">
<xsl:text>The "book" id should be unique, </xsl:text>
<xsl:value-of select="."/>
<xsl:text> is duplicated.</xsl:text>
</xsl:message>
</xsl:if>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
</xsl:stylesheet>
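The uniqueness test in the last template compares each id with the ids of the preceding book siblings. As a rough cross-check outside XSLT, the same rule could be expressed with a general-purpose XML API. This sketch uses Python's standard xml.etree.ElementTree; the function name duplicate_book_ids is hypothetical and the code is an illustration rather than a replacement for the stylesheet.

```python
import xml.etree.ElementTree as ET


def duplicate_book_ids(xml_text):
    """Return every book id that repeats the id of an earlier sibling book."""
    root = ET.fromstring(xml_text)
    seen = set()
    duplicates = []
    for book in root.findall("book"):  # direct children only, like library/book
        book_id = book.get("id")
        if book_id is None:
            continue  # a missing id is a different rule, not checked here
        if book_id in seen:
            duplicates.append(book_id)
        seen.add(book_id)
    return duplicates


print(duplicate_book_ids('<library><book id="a"/><book id="b"/><book id="a"/></library>'))
```

Like the XSLT test, this reports the later occurrence of a repeated id, once per repetition.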
2.1.2. A closed XSLT schema
A closed schema works the other way round: it defines default templates which forbid everything (except possibly "empty" text nodes):
<xsl:template match="*">
<xsl:message terminate="no">
<xsl:text>Forbidden element:</xsl:text>
<xsl:value-of select="name()"/>
</xsl:message>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="@*">
<xsl:message terminate="no">
<xsl:text>Forbidden attribute:</xsl:text>
<xsl:value-of select="name()"/>
</xsl:message>
</xsl:template>
<xsl:template match="text()"/>
<xsl:template match="text()[normalize-space()]">
<xsl:message terminate="no">
<xsl:text>Forbidden text:</xsl:text>
<xsl:value-of select="."/>
</xsl:message>
</xsl:template>
and then define everything which is allowed, i.e. in fact very few things:
<xsl:template match="/library">
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="library/book">
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
<xsl:template match="book/@id[not (.=../preceding-sibling::book/@id)]">
<xsl:apply-templates select="*|@*|text()"/>
</xsl:template>
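The closed defaults above tolerate whitespace-only text nodes and reject any other text. A minimal sketch of that single rule in Python follows, assuming the standard xml.etree.ElementTree API, which stores text in the .text and .tail attributes instead of separate text nodes. The function name forbidden_text is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"  # the characters XPath's normalize-space() treats as whitespace


def forbidden_text(root):
    """List (parent tag, text) pairs for every text chunk that is not only whitespace."""
    found = []
    for elem in root.iter():
        # Text directly inside elem: its leading text plus the tail of each child.
        chunks = [elem.text] + [child.tail for child in elem]
        for chunk in chunks:
            if chunk and chunk.strip(XML_WHITESPACE):
                found.append((elem.tag, chunk.strip(XML_WHITESPACE)))
    return found


doc = ET.fromstring('<library>\n  <book id="a"/> stray \n</library>')
print(forbidden_text(doc))
```

Indentation between elements is ignored, which corresponds to the empty template for text(), while the stray word is reported, which corresponds to the template matching text()[normalize-space()].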
2.2. Schematron
Technically speaking, Schematron is a concise formalization of one of the examples which we have seen: it generates an XSLT transformation which is an open schema (everything which has not been forbidden is allowed) with tests inside the templates.
That being said, XSLT is totally hidden from the Schematron user, who only needs to know the Schematron syntax and XPath, which is used to express the rules.
A Schematron schema is composed of a set of patterns, each pattern including one or more rules, and each rule being composed of asserts and reports. However, to present the syntax used by Schematron, we'll take it "bottom-up" and start with asserts and reports before seeing how they are assembled into rules, patterns and schemas.
2.2.1. assert(s) and report(s)
The "assert" and "report" elements are where the rules are defined in a Schematron schema. Both carry a "test" attribute which is an XPath expression; they differ in a couple of ways:
They are the opposite of each other: the test must be true to pass in an "assert" element and false to pass in a "report" element.
The original purpose of assert was more for "fatal errors" and report more for things that should just be "reported", but this distinction is no longer relevant in the latest version (1.5).
There are some goodies which we will not cover in this tutorial, but the basic syntax is:
<sch:assert test="library">
The document element should be 'library'.
</sch:assert>
This raises an error with the corresponding message if there is no "library" element under the context node. Or:
<sch:report test="@*">
The library element should not contain attributes.
</sch:report>
This raises an error if there is any attribute on the context node.
In both cases, the context node is set by the "rule" parent element of the report or assert element.
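The opposite polarity of assert and report can be summarized with a small model. This is only a sketch in Python with the standard xml.etree.ElementTree, not Schematron's implementation: the helper names check_assert and check_report, the sample document and the "should contain a book element" message are made up for the illustration, and the context node is chosen by hand where Schematron would take it from the enclosing rule.

```python
import xml.etree.ElementTree as ET


def check_assert(context, test, message):
    """assert: the message is raised when the test is false."""
    return [] if test(context) else [message]


def check_report(context, test, message):
    """report: the message is raised when the test is true."""
    return [message] if test(context) else []


# Context node chosen for the example; in Schematron the enclosing rule sets it.
library = ET.fromstring('<library owner="me"><book id="a"/></library>')

messages = []
messages += check_assert(library, lambda node: node.find("book") is not None,
                         "The library element should contain a book element.")
messages += check_report(library, lambda node: bool(node.attrib),
                         "The library element should not contain attributes.")
print(messages)
```

Here the assert passes because its test is true, while the report fires because its test is also true.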
2.2.2. rule(s)
Schematron rule elements are roughly equivalent to XSLT templates and are used to define the context in which a set of assert and report elements will be evaluated.
An example of a rule (without bells and whistles) performing the tests done in our open schema on the book element could be:
<sch:rule context="book">
<sch:report test="@*[name() != 'id']">
The book element should not include any attribute other than "id".
</sch:report>
<sch:report test="@*[namespace-uri() != '']">
The book element should not include any attribute other than "id" (namespace).
</sch:report>
<sch:report test="@id = preceding-sibling::book/@id">
The book id should be unique.
</sch:report>
</sch:rule>
Some notes about rules:
The context of a rule cannot be set on an attribute.
The order in which the tests (report and assert elements) are performed is not guaranteed, and the sequence of tests stops after the first failure.
2.2.3. pattern(s)
Pattern elements are sets of rules which are evaluated independently (technically, using different modes in the XSLT stylesheet generated from the Schematron schema).
An example of a pattern roughly equivalent to our open XSLT schema could be:
<sch:pattern>
<sch:rule context="/">
<sch:assert test="library">
The document element should be 'library'.
</sch:assert>
</sch:rule>
<sch:rule context="library">
<sch:report test="*[not(self::book)]">
The library element should contain only book elements.
</sch:report>
<sch:report test="@*">
The library element should not contain attributes.
</sch:report>
<sch:report test="text()[normalize-space()]">
The library element should not contain text.
</sch:report>
</sch:rule>
<sch:rule context="book">
<sch:report test="@*[name() != 'id']">
The book element should not include any attribute other than "id".
</sch:report>
<sch:report test="@*[namespace-uri() != '']">
The book element should not include any attribute other than "id" (namespace).
</sch:report>
<sch:report test="@id = preceding-sibling::book/@id">
The book id should be unique.
</sch:report>
</sch:rule>
</sch:pattern>
One of the differences from what we implemented earlier is that Schematron will stop the evaluation of a pattern after the first error found (following the order of the source tree); if we wanted to be able to raise several errors, we would have to spread our rules across several different patterns.
2.2.4. schema
Finally, the schema element is the document element of a Schematron schema and basically includes a title and one or more patterns. To implement our rules within separate patterns, we could write:
<?xml version="1.0" encoding="utf-8"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:title>Example Schematron Schema</sch:title>
<sch:pattern>
<sch:rule context="/">
<sch:assert test="library">
The document element should be 'library'.
</sch:assert>
</sch:rule>
</sch:pattern>
<sch:pattern>
<sch:rule context="library">
<sch:report test="*[not(self::book)]">
The library element should contain only book elements.
</sch:report>
<sch:report test="@*">
The library element should not contain attributes.
</sch:report>
<sch:report test="text()[normalize-space()]">
The library element should not contain text.
</sch:report>
</sch:rule>
</sch:pattern>
<sch:pattern>
<sch:rule context="book">
<sch:report test="@*[name() != 'id']">
The book element should not include any attribute other than "id".
</sch:report>
<sch:report test="@*[namespace-uri() != '']">
The book element should not include any attribute other than "id" (namespace).
</sch:report>
<sch:report test="@id = preceding-sibling::book/@id">
The book id should be unique.
</sch:report>
</sch:rule>
</sch:pattern>
</sch:schema>
2.2.5. Putting it together
Solving the same problem with Schematron and XSLT shows the nature of Schematron, which is a subset of XSLT tailored to XML validation through open rule-based schemas.
Why do I insist so much on the openness of Schematron schemas?
Because the default behavior of Schematron is to be open, but it is still possible to write closed (or semi-closed) schemas with Schematron, even though it isn't common practice.
The main trick when doing so is to note that the rules within a Schematron pattern are evaluated in lexical order instead of following the priority rules defined by XSLT. The default rules which forbid content not described in any rule therefore need to be located after all the other rules, such as in:
<?xml version="1.0" encoding="utf-8"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:title>Example Schematron Schema</sch:title>
<sch:pattern>
<sch:rule context="/library">
<sch:report test="@*">
The library element should not contain attributes.
</sch:report>
</sch:rule>
<sch:rule context="library/book">
<sch:report test="@*[name() != 'id']">
The book element should not include any attribute other than "id".
</sch:report>
<sch:report test="@*[namespace-uri() != '']">
The book element should not include any attribute other than "id" (namespace).
</sch:report>
<sch:report test="@id = preceding-sibling::book/@id">
The book id should be unique.
</sch:report>
</sch:rule>
<sch:rule context="*">
<sch:report test="1">
Element "<sch:name/>" forbidden under "<sch:name path=".."/>".
</sch:report>
</sch:rule>
<sch:rule context="text()[normalize-space()]">
<sch:report test="1">
Text forbidden in "<sch:name path=".."/>" element.
</sch:report>
</sch:rule>
</sch:pattern>
</sch:schema>
Note also the usage of the "name" element and the fact that rules can't be defined for attributes.
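The lexical-order behaviour that makes this closed schema work can be modelled in a few lines. The following Python sketch, which uses only the standard xml.etree.ElementTree, is a simplified assumption-laden model rather than Schematron itself: it handles elements only (no text nodes), and the names RULES, run_pattern and the three rule functions are hypothetical. Each element is given to the first rule whose context matches it, so a catch-all rule only sees what the earlier rules did not claim.

```python
import xml.etree.ElementTree as ET


def library_rule(node, parent):
    return ["The library element should not contain attributes."] if node.attrib else []


def book_rule(node, parent):
    extra = [name for name in node.attrib if name != "id"]
    return ['The book element should not include any attribute other than "id".'] if extra else []


def catch_all_rule(node, parent):
    where = parent.tag if parent is not None else "/"
    return [f'Element "{node.tag}" forbidden under "{where}".']


# (context test, checks) pairs in lexical order; the catch-all comes last.
RULES = [
    (lambda node, parent: parent is None and node.tag == "library", library_rule),
    (lambda node, parent: parent is not None and parent.tag == "library"
        and node.tag == "book", book_rule),
    (lambda node, parent: True, catch_all_rule),
]


def run_pattern(root, rules):
    """Apply to each element only the first rule whose context matches it."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    messages = []
    for node in root.iter():
        parent = parent_of.get(node)
        for matches, checks in rules:
            if matches(node, parent):
                messages.extend(checks(node, parent))
                break  # later rules never see this node
    return messages


doc = ET.fromstring('<library><book id="a"/><dvd/></library>')
print(run_pattern(doc, RULES))        # only the unexpected <dvd/> is reported
print(run_pattern(doc, RULES[::-1]))  # catch-all first: every element is rejected
```

Reversing the list shows why the order matters: with the catch-all rule in first position, the specific rules for library and book are never reached.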