Rich Text Format (RTF) Specification, Version 1.6
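Contents: Introduction; RTF Syntax; Conventions of an RTF Reader; Formal Syntax; Contents of an RTF File; Header; RTF Version; Character Set; Unicode RTF; Font Table; File Table; Color Table; Style Sheet; List Tables; Track Changes (Revision Marks); Document Area; Information Group; Document Formatting Properties; Section Text; Paragraph Text; Character Text; Document Variables; Bookmarks; Pictures; Objects; Drawing Objects; Word 97-2000 RTF for Drawing Objects (Shapes); Footnotes; Comments (Annotations); Fields; Form Fields; Index Entries; Table of Contents Entries; Bidirectional Language Support; Far East Support; Appendix A: Sample RTF Reader Application; How to Write an RTF Reader; A Sample RTF Reader Implementation; Notes on Implementing Other RTF Features; Other Problem Areas in RTF; Appendix B: Index of RTF Control Words; Appendix C: Control Words Introduced by Other Microsoft Products; Pocket Word; Exchange (Used in RTF<->HTML Conversions)
Introduction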
The Rich Text Format (RTF) Specification is a method of encoding formatted text and graphics for easy transfer between applications. Currently, users depend on special translation software to move word-processing documents between different MS-DOS™, Microsoft™ Windows™, OS/2, Macintosh™, and Power Macintosh™ applications.
The RTF Specification provides a format for text and graphics interchange that can be used with different output devices, operating environments, and operating systems. RTF uses the American National Standards Institute (ANSI), PC-8, Macintosh, or IBM PC character set to control the representation and formatting of a document, both on the screen and in print. With the RTF Specification, documents created under different operating systems and with different software applications can be transferred between those operating systems and applications. RTF files created in Word 6.0 (and later) for the Macintosh and Power Macintosh have a file type of "RTF."
Software that takes a formatted file and turns it into an RTF file is called a writer. An RTF writer separates the application's control information from the actual text and writes a new file containing the text and the RTF groups associated with that text. Software that translates an RTF file into a formatted file is called a reader.
A sample RTF reader application is available (see the Appendix A: Sample RTF Reader Application section of this document). It is designed for use with the specification to assist those interested in developing their own RTF readers. This application and its use are described in Appendix A. The sample RTF reader is not a for-sale product, and Microsoft does not provide technical or any other type of support for the sample RTF reader code or the RTF specification. For more information on how to download the sample RTF reader from the Microsoft Download Center, please visit the following Web address: www.microsoft.com/downloads/search.asp and then search on "RTF Reader."
RTF Version 1.6 includes all new control words introduced by Microsoft Word for Windows 95 version 7.0, Word 97 for Windows, Word 98 for the Macintosh, and Word 2000 for Windows, as well as other Microsoft products.
RTF Syntax
An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.
A control word is a specially formatted command that RTF uses to mark printer control codes and information that applications use to manage documents. A control word cannot be longer than 32 characters. A control word takes the following form:
\LetterSequence<Delimiter>
Note that a backslash begins each control word.
The LetterSequence is made up of lowercase alphabetic characters (a-z). RTF is case sensitive.
The following Word 97-2000 keywords do not currently follow the requirement that keywords may not contain any uppercase alphabetic characters. All writers should still follow this rule, and Word will also emit completely lowercase versions of all these keywords in the next version. In the meantime, those implementing readers are advised to treat them as exceptions:
\clFitText \clftsWidthN \clNoWrap \clwWidthN \tdfrmtxtBottomN \tdfrmtxtLeftN \tdfrmtxtRightN \tdfrmtxtTopN \trftsWidthAN \trftsWidthBN \trftsWidthN \trwWidthAN \trwWidthBN \trwWidthN \sectspecifygenN
The delimiter marks the end of an RTF control word, and can be one of the following:
A space. In this case, the space is part of the control word.
A digit or a hyphen (-), which indicates that a numeric parameter follows. The subsequent digit sequence is then delimited by a space or any character other than a letter or a digit. The parameter can be a positive or a negative number. The range of the values for the number is generally -32767 through 32767. However, Word tends to restrict the range to -31680 through 31680. Word allows values in the signed 32-bit range, -2,147,483,648 to 2,147,483,647, for a small number of keywords (specifically \bin, \revdttm, and some picture properties). An RTF parser must handle an arbitrary string of digits as a legal value for a keyword. If a numeric parameter immediately follows the control word, this parameter becomes part of the control word. The control word is then delimited by a space or a nonalphabetic or nonnumeric character in the same manner as any other control word.
Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not actually part of the control word.
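The delimiter rules above can be sketched as a small Python helper. This is an illustrative sketch, not part of the specification: the names `CONTROL_WORD` and `read_control_word` are hypothetical, uppercase letters are accepted only so that the Word 97-2000 exception keywords still parse, and the 32-character limit and the value-range checks are left out.

```python
import re

# A backslash, a run of letters, an optional signed integer parameter, and an
# optional single space. The space, when present, belongs to the control word.
# Uppercase is accepted (an assumption of this sketch) so that exception
# keywords such as \clFitText still match.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?")


def read_control_word(rtf, pos):
    # Parse the control word that starts at rtf[pos].
    # Returns (name, parameter or None, index of the first character after it).
    match = CONTROL_WORD.match(rtf, pos)
    if match is None:
        raise ValueError("no control word at position %d" % pos)
    name = match.group(1)
    # int() accepts a digit string of any length, so no value is rejected here.
    param = int(match.group(2)) if match.group(2) is not None else None
    return name, param, match.end()


print(read_control_word(r"\fs24 Hello", 0))  # the space is consumed
print(read_control_word(r"\b\i", 0))         # the next backslash is not consumed
print(read_control_word(r"\li-720,", 0))     # the comma is not consumed
```

A delimiting space is consumed together with the control word, while any other delimiter is left in place for the caller to process next.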
If a space delimits the control word, the space does not appear in the document. Any characters following the delimiter, including spaces, will appear in the document. For this reason, you should use spaces only where necessary; do not use spaces merely to break up RTF code.
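From a writer's point of view, this means a delimiting space is needed only when the next character would otherwise be misread. The following Python sketch shows one way to make that decision; the function name `join_control_word` is hypothetical, and the rule it applies is an interpretation of the delimiter rules in this section rather than text from the specification.

```python
def join_control_word(control_word, following):
    # Concatenate a control word with whatever comes after it, adding a
    # delimiting space only when it is required.
    first = following[:1]
    # A letter, digit, or hyphen could be read as part of the control word or
    # its numeric parameter, and a leading space would be consumed as the
    # delimiter instead of appearing in the document.
    needs_space = first.isalnum() or first in ("-", " ")
    return control_word + (" " if needs_space else "") + following


print(join_control_word(r"\b", "bold"))      # a space is inserted
print(join_control_word(r"\b", r"\i text"))  # no space before a backslash
print(join_control_word(r"\par", "{x}"))     # no space before a brace
```

When the text itself starts with a space, the helper emits two spaces: the first is the delimiter and the second is the one that appears in the document.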
Contents of an RTF File
An RTF file has the following syntax:
<File> '{' <header> <document> '}'
This syntax is the standard RTF syntax; any RTF reader must be able to correctly interpret RTF written to this syntax. It is worth mentioning again that RTF readers do not have to use all control words, but they must be able to harmlessly ignore unknown (or unused) control words, and they must correctly skip over destinations marked with the \* control symbol. There may, however, be RTF writers that generate RTF that does not conform to this syntax, and as such, RTF readers should be robust enough to handle some minor variations. Nonetheless, if an RTF writer generates RTF conforming to this specification, then any correct RTF reader should be able to interpret it.
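The two reader obligations named above, ignoring unknown control words and skipping \* destinations, can be illustrated with a deliberately minimal Python sketch. It is not the sample reader from Appendix A. The name `extract_text` is hypothetical, and the sketch treats every control word as unknown, so it does not handle hexadecimal character escapes or \bin data, and it would also emit text found in header groups that are not marked with \*.

```python
import re

# Same control-word shape as described in the RTF Syntax section; uppercase
# is accepted as an assumption of this sketch.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?")


def extract_text(rtf):
    # Return the document text, ignoring control words and \* destinations.
    if not (rtf.startswith("{") and rtf.rstrip().endswith("}")):
        raise ValueError("an RTF file must be a single group: '{' ... '}'")
    out = []
    depth = 0          # current group nesting level
    skip_depth = None  # nesting level of the \* destination being skipped
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "{":
            depth += 1
            i += 1
            # A group that opens with \* is a destination to be skipped.
            if skip_depth is None and rtf.startswith("\\*", i):
                skip_depth = depth
        elif ch == "}":
            if skip_depth == depth:
                skip_depth = None
            depth -= 1
            i += 1
        elif ch == "\\":
            match = CONTROL_WORD.match(rtf, i)
            if match:
                i = match.end()  # control word: harmlessly ignored
            else:
                # Control symbol: a backslash plus one non-letter character.
                symbol = rtf[i + 1:i + 2]
                if symbol in ("\\", "{", "}") and skip_depth is None:
                    out.append(symbol)  # escaped literal character
                i += 2
        else:
            # Raw line breaks in the file are not part of the document text.
            if skip_depth is None and ch not in "\r\n":
                out.append(ch)
            i += 1
    return "".join(out)


sample = r"{\rtf1\ansi{\*\generator Foo 1.0;}Hello \b world\b0 !}"
print(extract_text(sample))  # prints "Hello world!"
```

Tracking the nesting depth at which the \* group opened lets the sketch skip nested groups inside the destination and resume output exactly when that group closes.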