Partial translation and original text of RFC 1952

What follows is only part of RFC 1952; for the rest, refer to the original document.

2. Detailed specification

2.1. Overall conventions

The following figure represents one byte:

+---+
|   | <-- the vertical bars might be missing
+---+

The following figure represents a variable number of bytes:

+==============+
|              |
+==============+

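Bytes stored in a computer have no "bit order", because a byte is always
treated as a unit. When a byte is viewed as an integer between 0 and 255,
however, it does have a most-significant and a least-significant bit. By
convention the most-significant bit of a byte is written on the left, and
so is the most-significant byte of a multi-byte value. In the diagrams,
the bits of a byte are numbered so that bit 0 is the least-significant bit: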

Bytes stored within a computer do not have a "bit order", since

they are always treated as a unit. However, a byte considered as

an integer between 0 and 255 does have a most- and least-

significant bit, and since we write numbers with the most-

significant digit on the left, we also write bytes with the most-

significant bit on the left. In the diagrams below, we number the

bits of a byte so that bit 0 is the least-significant bit, i.e.,

the bits are numbered:

+--------+

|76543210|

+--------+

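This document does not cover the order in which the bits of a byte are sent on a bit-serial medium, because the data format described here is organized in bytes, not bits.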

This document does not address the issue of the order in which

bits of a byte are transmitted on a bit-sequential medium, since

the data format described here is byte- rather than bit-oriented.

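Within a computer, a number may occupy several bytes. All multi-byte
numbers in this format are stored with the least-significant byte at the
lower address; for example, 520 is stored as: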

Within a computer, a number may occupy multiple bytes. All

multi-byte numbers in the format described here are stored with

the least-significant byte first (at the lower memory address).

For example, the decimal number 520 is stored as:

    0        1
+--------+--------+
|00001000|00000010|
+--------+--------+
 ^        ^
 |        |
 |        + more significant byte = 2 x 256
 + less significant byte = 8
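
The byte order in the 520 example can be checked with a short Python sketch. It relies only on the standard `int.to_bytes` method and the `struct` module; the variable names are illustrative and do not come from the RFC.

```python
import struct

value = 520

# Least-significant byte first ("little-endian"), two bytes wide.
encoded = value.to_bytes(2, "little")

# struct's "<H" format describes the same layout: unsigned 16-bit, little-endian.
encoded_struct = struct.pack("<H", value)

# Decoding reverses the process: byte 0 plus byte 1 times 256.
decoded = encoded[0] + encoded[1] * 256
```

Here `encoded[0]` is 8 and `encoded[1]` is 2, matching the diagram.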

2.2. File format

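A gzip file consists of a sequence of members (compressed data sets). The
format of each member is described in the following section. The members
are laid out one after another in the file, with no additional information
before, between, or after them.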

A gzip file consists of a series of "members" (compressed data

sets). The format of each member is specified in the following

section. The members simply appear one after another in the file,

with no additional information before, between, or after them.
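
As an illustration of the multi-member layout, the sketch below builds two members with Python's standard `gzip` module and joins them by plain concatenation. It assumes the decoder in use accepts multi-member input, which Python's `gzip.decompress` does; the variable names are made up for the example.

```python
import gzip

# Two independently compressed members.
member_a = gzip.compress(b"hello, ")
member_b = gzip.compress(b"world")

# A multi-member gzip file is just the members back to back.
combined = member_a + member_b

# Python's gzip module decodes every member in turn and joins the output.
restored = gzip.decompress(combined)
```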

2.3. Member format

Every member has the following structure:

Each member has the following structure:

+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+

(if FLG.FEXTRA set)

+---+---+=================================+
| XLEN  |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+

(if FLG.FNAME set)

+=========================================+
|...original file name, zero-terminated...| (more-->)
+=========================================+

(if FLG.FCOMMENT set)

+===================================+
|...file comment, zero-terminated...| (more-->)
+===================================+

(if FLG.FHCRC set)

+---+---+
| CRC16 |
+---+---+

+=======================+
|...compressed blocks...| (more-->)
+=======================+

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|     CRC32     |     ISIZE     |
+---+---+---+---+---+---+---+---+
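
The fixed ten-byte part of the header maps directly onto a `struct` format string. The following is a minimal sketch, not a full parser: `parse_fixed_header` is a hypothetical helper name, and the sample member is produced with Python's `gzip.compress` purely to have something to inspect.

```python
import gzip
import struct

def parse_fixed_header(data: bytes) -> dict:
    """Unpack the 10 fixed header bytes; MTIME is a 4-byte little-endian value."""
    if len(data) < 10:
        raise ValueError("a gzip member header needs at least 10 bytes")
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    return {"ID1": id1, "ID2": id2, "CM": cm, "FLG": flg,
            "MTIME": mtime, "XFL": xfl, "OS": os_code}

# Sample input: a member written with MTIME forced to 0.
sample = gzip.compress(b"example data", mtime=0)
header = parse_fixed_header(sample)
```

The `<` prefix selects little-endian byte order without padding, so the format covers exactly ten bytes.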

2.3.1. Member header and trailer

The fields of the member header and trailer are as follows:

ID1 (IDentification 1)

ID2 (IDentification 2)

These two bytes identify the file as gzip and have the fixed values ID1 = 31 and ID2 = 139.

These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139

(0x8b, \213), to identify the file as being in gzip format.

CM (Compression Method)

This byte identifies the compression method used in the file. The values
CM = 0-7 are reserved; CM = 8 denotes the "deflate" method, the one
normally used by gzip.

This identifies the compression method used in the file. CM

= 0-7 are reserved. CM = 8 denotes the "deflate"

compression method, which is the one customarily used by

gzip and which is documented elsewhere.

FLG (FLaGs)

This byte is split into individual bits:

This flag byte is divided into individual bits as follows:

bit 0 FTEXT

bit 1 FHCRC

bit 2 FEXTRA

bit 3 FNAME

bit 4 FCOMMENT

bit 5 reserved

bit 6 reserved

bit 7 reserved

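If the FTEXT bit is set, the file is probably ASCII text. The flag is
optional: a compressor may examine a small amount of the input for
non-ASCII characters and set the bit if it finds none. In case of doubt
the bit is cleared, indicating binary data. On systems with different file
formats for ASCII text and binary data, the decompressor can use FTEXT to
choose the appropriate format. No rule for setting the bit is specified,
because a compressor may always leave it at 0 and a decompressor may
always ignore it and leave data conversion to another program.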

If FTEXT is set, the file is probably ASCII text. This is

an optional indication, which the compressor may set by

checking a small amount of the input data to see whether any

non-ASCII characters are present. In case of doubt, FTEXT

is cleared, indicating binary data. For systems which have

different file formats for ascii text and binary data, the

decompressor can use FTEXT to choose the appropriate format.

We deliberately do not specify the algorithm used to set

this bit, since a compressor always has the option of

leaving it cleared and a decompressor always has the option

of ignoring it and letting some other program handle issues

of data conversion.

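If the FHCRC bit is set, the gzip header contains a CRC16 immediately
before the compressed data. The CRC16 is two bytes long and holds the two
least-significant bytes of the CRC32 of the whole header, not including
the CRC16 itself. [Versions of gzip up to 1.2.4 never set the FHCRC bit,
even though gzip 1.2.4 documented it with a different meaning.]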

If FHCRC is set, a CRC16 for the gzip header is present,

immediately before the compressed data. The CRC16 consists

of the two least significant bytes of the CRC32 for all

bytes of the gzip header up to and not including the CRC16.

[The FHCRC bit was never set by versions of gzip up to

1.2.4, even though it was documented with a different

meaning in gzip 1.2.4.]
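
The CRC16 rule can be sketched in a few lines of Python. The example assumes that `zlib.crc32` computes the same CRC-32 that the trailer uses; `header_crc16` is a hypothetical helper name, and the header bytes are a hand-built example with only FHCRC set.

```python
import zlib

def header_crc16(header_bytes: bytes) -> int:
    """Low 16 bits of the CRC-32 over the header bytes that precede the CRC16."""
    return zlib.crc32(header_bytes) & 0xFFFF

# A 10-byte header with FHCRC (bit 1) set and no other optional fields:
# ID1, ID2, CM, FLG, four MTIME bytes, XFL, OS.
header = bytes([31, 139, 8, 0x02, 0, 0, 0, 0, 0, 255])

# Like every multi-byte field, the CRC16 is stored least-significant byte first.
crc16_field = header_crc16(header).to_bytes(2, "little")
```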

If the FEXTRA bit is set, optional extra fields are present; they are described in a later section.

If FEXTRA is set, optional extra fields are present, as

described in a following section.

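If the FNAME bit is set, the original file name is present, terminated by
a zero byte. The name must consist of ISO 8859-1 (LATIN-1) characters; on
operating systems that use EBCDIC or another character set for file names,
the name must be translated to ISO LATIN-1. This is the original name of
the compressed file, without any directory components. If the file system
is case insensitive, the name is forced to lower case. If the data was not
compressed from a named file, there is no original file name.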

If FNAME is set, an original file name is present,

terminated by a zero byte. The name must consist of ISO

8859-1 (LATIN-1) characters; on operating systems using

EBCDIC or any other character set for file names, the name

must be translated to the ISO LATIN-1 character set. This

is the original name of the file being compressed, with any

directory components removed, and, if the file being

compressed is on a file system with case insensitive names,

forced to lower case. There is no original file name if the

data was compressed from a source other than a named file;

for example, if the source was stdin on a Unix system, there

is no file name.
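
Reading a zero-terminated ISO 8859-1 string from a header buffer might look like the sketch below. The helper name `read_zero_terminated` and the sample buffer are assumptions made for the example.

```python
def read_zero_terminated(data: bytes, offset: int):
    """Return (text, next_offset) for a zero-terminated ISO 8859-1 string."""
    end = data.index(b"\x00", offset)      # raises ValueError if unterminated
    text = data[offset:end].decode("latin-1")
    return text, end + 1                   # step past the terminating zero byte

# Sample buffer: a Latin-1 name, its terminator, then unrelated bytes.
buffer = b"caf\xe9.txt\x00rest"
name, next_offset = read_zero_terminated(buffer, 0)
```

Decoding with `latin-1` never fails, since every byte value maps to a character in that set.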

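If the FCOMMENT bit is set, a zero-terminated file comment is present. The
comment is not interpreted; it is meant only for human readers. It must
consist of ISO 8859-1 (LATIN-1) characters, and line breaks should be a
single line feed character (0x0A).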

If FCOMMENT is set, a zero-terminated file comment is

present. This comment is not interpreted; it is only

intended for human consumption. The comment must consist of

ISO 8859-1 (LATIN-1) characters. Line breaks should be

denoted by a single line feed character (10 decimal).

Reserved FLG bits must be 0.

Reserved FLG bits must be zero.
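
The FLG bit assignments can be turned into masks as in the following sketch. The constant names mirror the flag names in the text; `decode_flags` and `RESERVED_MASK` are hypothetical names chosen for the example.

```python
# Bit masks for FLG bits 0 to 4, in the order listed in the text.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10
RESERVED_MASK = 0xE0  # bits 5, 6 and 7

def decode_flags(flg: int) -> set:
    """Return the names of the set flags; raise if a reserved bit is set."""
    if flg & RESERVED_MASK:
        raise ValueError(f"reserved FLG bits set: {flg:#04x}")
    names = {"FTEXT": FTEXT, "FHCRC": FHCRC, "FEXTRA": FEXTRA,
             "FNAME": FNAME, "FCOMMENT": FCOMMENT}
    return {name for name, bit in names.items() if flg & bit}
```

For example, `decode_flags(0x08)` reports only `FNAME`, while `decode_flags(0x20)` raises because bit 5 is reserved.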

MTIME (Modification TIME)

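MTIME is the modification time: the most recent modification time of the
original file before compression. The time is in Unix format, the number
of seconds since 00:00:00 GMT on January 1, 1970. If the compressed data
did not come from a file, MTIME is set to the time at which compression
started.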

This gives the most recent modification time of the original

file being compressed. The time is in Unix format, i.e.,

seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this

may cause problems for MS-DOS and other systems that use

local rather than Universal time.) If the compressed data

did not come from a file, MTIME is set to the time at which

compression started. MTIME = 0 means no time stamp is

available.
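
A reader might convert MTIME to a calendar time as sketched below, using Python's standard `datetime` module. The helper name `mtime_to_datetime` is an assumption, and returning `None` for MTIME = 0 is one possible way to represent "no time stamp".

```python
from datetime import datetime, timezone

def mtime_to_datetime(mtime: int):
    """Convert MTIME to an aware UTC datetime, or None when MTIME is 0."""
    if mtime == 0:
        return None            # 0 means no time stamp is available
    return datetime.fromtimestamp(mtime, tz=timezone.utc)
```

Passing an explicit UTC time zone avoids the local-time ambiguity that the text warns about.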

XFL (eXtra FLags)

These flags are available to specific compression methods. The "deflate" method sets them as follows:

These flags are available for use by specific compression

methods. The "deflate" method (CM = 8) sets these flags as

follows:

Maximum compression, slowest algorithm:

XFL = 2 - compressor used maximum compression,

slowest algorithm

Fastest algorithm:

XFL = 4 - compressor used fastest algorithm

OS (Operating System)

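This field identifies the type of system on which compression took place.
It can be useful for determining the end-of-line convention of text files.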

This identifies the type of file system on which compression

took place. This may be useful in determining end-of-line

convention for text files. The currently defined values are

as follows:

0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)

1 - Amiga

2 - VMS (or OpenVMS)

3 - Unix

4 - VM/CMS

5 - Atari TOS

6 - HPFS filesystem (OS/2, NT)

7 - Macintosh

8 - Z-System

9 - CP/M

10 - TOPS-20

11 - NTFS filesystem (NT)

12 - QDOS

13 - Acorn RISCOS

255 - unknown

XLEN (eXtra LENgth)

If FLG.FEXTRA is set, these two bytes give the length of the optional extra field.

If FLG.FEXTRA is set, this gives the length of the optional

extra field. See below for details.

CRC32 (CRC-32)

This is the cyclic redundancy check value of the uncompressed data.

This contains a Cyclic Redundancy Check value of the

uncompressed data computed according to CRC-32 algorithm

used in the ISO 3309 standard and in section 8.1.1.6.2 of

ITU-T recommendation V.42. (See http://www.iso.ch for

ordering ISO documents. See gopher://info.itu.ch for an

online version of ITU-T V.42.)

ISIZE (Input SIZE)

This is the length of the original data modulo 2^32.

This contains the size of the original (uncompressed) input

data modulo 2^32.
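
The two trailer fields can be checked against known input as in this sketch. It assumes a single-member file, so that the trailer is the last eight bytes, and it assumes `zlib.crc32` implements the CRC-32 named above; `check_trailer` is a hypothetical helper.

```python
import gzip
import struct
import zlib

def check_trailer(member: bytes, original: bytes) -> bool:
    """Compare the trailer of a single-member file against the original data."""
    crc32, isize = struct.unpack("<II", member[-8:])
    return crc32 == zlib.crc32(original) and isize == len(original) % 2**32

data = b"some uncompressed input"
ok = check_trailer(gzip.compress(data), data)
```

Because ISIZE is taken modulo 2^32, it cannot on its own give the true size of inputs of 4 GiB or more.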

2.3.1.1. Extra field

If the FLG.FEXTRA bit is set, the header contains an extra field whose
total length is XLEN bytes. It consists of a series of subfields:

If the FLG.FEXTRA bit is set, an "extra field" is present in

the header, with total length XLEN bytes. It consists of a

series of subfields, each of the form:

+---+---+---+---+==================================+

|SI1|SI2| LEN |... LEN bytes of subfield data ...|

+---+---+---+---+==================================+

SI1 and SI2 give the subfield ID, typically two mnemonic ASCII letters.
IDs with SI2 = 0 are reserved for future use. The following IDs are
currently defined:

SI1 and SI2 provide a subfield ID, typically two ASCII letters

with some mnemonic value. Jean-Loup Gailly

<gzip@prep.ai.mit.edu> is maintaining a registry of subfield

IDs; please send him any subfield ID you wish to use. Subfield

IDs with SI2 = 0 are reserved for future use. The following

IDs are currently defined:

SI1         SI2         Data
----------  ----------  ----
0x41 ('A')  0x70 ('P')  Apollo file type information

The LEN field gives the length of the subfield data, not counting the four initial bytes.

LEN gives the length of the subfield data, excluding the 4

initial bytes.
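
Walking the subfields of an extra field could be sketched as follows. `parse_extra_field` is a hypothetical helper, and the sample bytes are invented: they use the ID bytes 0x41 and 0x70 from the table with three arbitrary data bytes, not real Apollo file type information.

```python
import struct

def parse_extra_field(extra: bytes) -> list:
    """Split an extra field (XLEN bytes) into (SI1, SI2, data) tuples."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        # SI1, SI2, then LEN as a 2-byte little-endian value.
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4
        if pos + length > len(extra):
            raise ValueError("subfield data runs past the extra field")
        subfields.append((si1, si2, extra[pos:pos + length]))
        pos += length
    return subfields

# Invented sample: one subfield with ID bytes 0x41, 0x70 and LEN = 3.
sample = bytes([0x41, 0x70]) + struct.pack("<H", 3) + b"xyz"
parsed = parse_extra_field(sample)
```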

2.3.1.2. Compliance

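A compliant compressor must produce files with correct ID1, ID2, CM,
CRC32, and ISIZE, but may set all other fields in the fixed-length part of
the header to default values (255 for OS, 0 for the rest). It must set all
reserved bits to 0.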

A compliant compressor must produce files with correct ID1,

ID2, CM, CRC32, and ISIZE, but may set all the other fields in

the fixed-length part of the header to default values (255 for

OS, 0 for all others). The compressor must set all reserved

bits to zero.

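A compliant decompressor must check ID1, ID2, and CM, and report an error
if any of them is incorrect. It must examine FEXTRA/XLEN, FNAME, FCOMMENT,
and FHCRC, at least so that it can skip the optional fields. It need not
examine any other field of the header or trailer; in particular, it may
ignore FTEXT and OS and always produce binary output. If a reserved bit is
non-zero it must report an error, because such a bit could signal a new
field that would cause the data that follows to be misinterpreted.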

A compliant decompressor must check ID1, ID2, and CM, and

provide an error indication if any of these have incorrect

values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC

at least so it can skip over the optional fields if they are

present. It need not examine any other part of the header or

trailer; in particular, a decompressor may ignore FTEXT and OS

and always produce binary output, and still be compliant. A

compliant decompressor must give an error indication if any

reserved bit is non-zero, since such a bit could indicate the

presence of a new field that would cause subsequent data to be

interpreted incorrectly.
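
The decompressor checks listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete decoder: it handles a single member, does not verify CRC16, CRC32 or ISIZE, and leaves the actual inflation to `zlib`. The names `skip_header` and `decompress_member` are hypothetical.

```python
import struct
import zlib

FHCRC, FEXTRA, FNAME, FCOMMENT, RESERVED = 0x02, 0x04, 0x08, 0x10, 0xE0

def skip_header(member: bytes) -> int:
    """Validate the required fields and return the offset of the compressed data."""
    id1, id2, cm, flg = member[0], member[1], member[2], member[3]
    if (id1, id2) != (31, 139):
        raise ValueError("not a gzip member")
    if cm != 8:
        raise ValueError("unsupported compression method")
    if flg & RESERVED:
        raise ValueError("reserved FLG bit set")
    pos = 10                                   # end of the fixed-length part
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", member, pos)
        pos += 2 + xlen                        # XLEN itself plus the extra field
    if flg & FNAME:
        pos = member.index(b"\x00", pos) + 1   # skip the zero-terminated name
    if flg & FCOMMENT:
        pos = member.index(b"\x00", pos) + 1   # skip the zero-terminated comment
    if flg & FHCRC:
        pos += 2                               # skip the CRC16 without checking it
    return pos

def decompress_member(member: bytes) -> bytes:
    """Inflate the raw deflate stream that follows the header."""
    inflater = zlib.decompressobj(wbits=-15)   # negative wbits: raw deflate data
    return inflater.decompress(member[skip_header(member):])
```

The optional fields are skipped in the order in which they appear in the member layout: extra field, file name, comment, then CRC16.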