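2 bytes magic header: 0x1f, 0x8b (\037 \213)
1 byte compression method (0..7 reserved, 8 = deflate)
1 byte flags:
bit 0 set: file is probably ASCII text
bit 1 set: continuation of a multi-part gzip file
bit 2 set: extra field present
bit 3 set: original file name present
bit 4 set: file comment present (zero terminated)
bit 5 set: file is encrypted
bit 6,7: reserved
4 bytes file modification time (Unix time)
1 byte extra flags (depend on the compression method); 2 = maximum compression, slowest algorithm
1 byte operating system on which compression took place: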
0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
1 - Amiga
2 - VMS (or OpenVMS)
3 - Unix
4 - VM/CMS
5 - Atari TOS
6 - HPFS filesystem (OS/2, NT)
7 - Macintosh
8 - Z-System
9 - CP/M
10 - TOPS-20
11 - NTFS filesystem (NT)
12 - QDOS
13 - Acorn RISCOS
255 - unknown
2 bytes optional part number (second part=1)
2 bytes optional extra field length
? bytes optional extra field
? bytes optional original file name, zero terminated
? bytes optional file comment, zero terminated
12 bytes optional encryption header
? bytes compressed data
4 bytes crc32 of the uncompressed data
4 bytes uncompressed input size modulo 2^32
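
The layout above can be read field by field. The following is a minimal sketch in Python, not the code used by gzip itself. It assumes that every multi-byte integer is little-endian (the text states this only for the subfield size) and that the optional fields appear in the order listed. The names parse_gzip_header and FLAG_* are hypothetical.

```python
import struct

# Flag bits, numbered as in the layout above (names are illustrative only).
FLAG_ASCII, FLAG_CONTINUATION, FLAG_EXTRA = 0x01, 0x02, 0x04
FLAG_NAME, FLAG_COMMENT, FLAG_ENCRYPTED = 0x08, 0x10, 0x20


def parse_gzip_header(buf):
    """Parse the header described above.

    Returns (fields, offset), where offset is the position of the
    compressed data inside buf.
    """
    # Fixed 10-byte part: magic, method, flags, mtime, extra flags, OS.
    magic, method, flags, mtime, xflags, os_code = struct.unpack_from(
        "<2sBBIBB", buf, 0
    )
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip file")
    if method != 8:
        raise ValueError("unsupported compression method %d" % method)
    fields = {"flags": flags, "mtime": mtime, "xflags": xflags, "os": os_code}
    pos = 10

    def read_zero_terminated(start):
        end = buf.index(b"\x00", start)  # raises ValueError if missing
        return buf[start:end], end + 1

    # Optional fields, each present only when its flag bit is set.
    if flags & FLAG_CONTINUATION:
        (fields["part"],) = struct.unpack_from("<H", buf, pos)
        pos += 2
    if flags & FLAG_EXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        fields["extra"] = buf[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    if flags & FLAG_NAME:
        fields["name"], pos = read_zero_terminated(pos)
    if flags & FLAG_COMMENT:
        fields["comment"], pos = read_zero_terminated(pos)
    if flags & FLAG_ENCRYPTED:
        fields["encryption_header"] = buf[pos : pos + 12]
        pos += 12
    return fields, pos
```

The last 8 bytes of the buffer would then hold the crc32 and the size, and everything between the returned offset and those 8 bytes is compressed data.
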
The format was designed to allow single pass compression without any
backwards seek, and without a priori knowledge of the uncompressed
input size or the available size on the output media. If input does
not come from a regular disk file, the file modification time is set
to the time at which compression started.
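
Placing the crc32 and the size after the compressed data is what makes a single pass possible: both values can be accumulated while the data streams through. The sketch below illustrates the idea with Python's zlib module. The function name gzip_single_pass, the compression level 6, and the OS code 255 (unknown) are choices made for this example, and no optional header fields are written.

```python
import struct
import time
import zlib


def gzip_single_pass(chunks, out, mtime=None):
    """Write a gzip stream to out from an iterable of byte chunks.

    The output is only ever appended to; nothing is patched afterwards.
    """
    if mtime is None:
        # Input is not a regular file: use the time compression started.
        mtime = int(time.time())
    # magic, method 8 (deflate), flags 0, mtime, extra flags 0, OS 255
    out.write(struct.pack("<2sBBIBB", b"\x1f\x8b", 8, 0, mtime & 0xFFFFFFFF, 0, 255))

    # Negative window bits ask zlib for a raw deflate stream (no zlib wrapper).
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)
    crc, size = 0, 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # running CRC of the uncompressed data
        size += len(chunk)            # running uncompressed size
        out.write(deflater.compress(chunk))
    out.write(deflater.flush())

    # Trailer: crc32, then uncompressed size modulo 2^32.
    out.write(struct.pack("<II", crc & 0xFFFFFFFF, size & 0xFFFFFFFF))
```

Any object with a write method that accepts bytes can serve as out, for example a pipe or a file opened in binary mode.
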
The time stamp is useful mainly when one gzip file is transferred over
a network. In this case it would not help to keep ownership
attributes. In the local case, the ownership attributes are preserved
by gzip when compressing/decompressing the file. A time stamp of zero
is ignored.
Bit 0 in the flags is only an optional indication, which can be set by
a small lookahead in the input data. In case of doubt, the flag is
cleared, indicating binary data. For systems that have different
file formats for ASCII text and binary data, the decompressor can
use the flag to choose the appropriate format.
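
One way such a lookahead could work is sketched below. This is a hypothetical heuristic written for illustration, not the test that gzip applies: it treats a sample as text only if every byte is printable ASCII or common whitespace, and it reports binary whenever there is doubt.

```python
def guess_ascii_flag(sample):
    """Return 1 if a short lookahead sample looks like ASCII text, else 0."""
    if not sample:
        return 0  # nothing to inspect: in case of doubt, report binary
    # Printable ASCII plus tab, line feed, and carriage return.
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return 1 if all(byte in allowed for byte in sample) else 0
```

A compressor might call this on the first few hundred bytes of input and store the result in bit 0 of the flags byte.
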
The extra field, if present, must consist of one or more subfields,
each with the following format:
subfield id : 2 bytes
subfield size : 2 bytes (little-endian format)
subfield data
The subfield id can consist of two letters with some mnemonic value.
Please send any such id to jloup@chorus.fr. Ids with a zero second
byte are reserved for future use. The following ids are defined:
Ap (0x41, 0x70) : Apollo file type information
The subfield size is the size of the subfield data and does not
include the id and the size itself. The field 'extra field length' is
the total size of the extra field, including subfield ids and sizes.
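
The subfield rules above can be sketched as a small iterator over the bytes of the extra field (that is, the bytes that follow the 2-byte 'extra field length'). The name iter_subfields is hypothetical, and the error handling is one possible choice rather than required behavior.

```python
import struct


def iter_subfields(extra):
    """Yield (subfield_id, data) for each subfield in a gzip extra field."""
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        # 2-byte id followed by a 2-byte little-endian size.
        subfield_id, size = struct.unpack_from("<2sH", extra, pos)
        pos += 4
        # The size counts the data only, not the id or the size field.
        if pos + size > len(extra):
            raise ValueError("subfield data overruns the extra field")
        yield subfield_id, extra[pos : pos + size]
        pos += size
```

For an extra field holding a single 'Ap' subfield with 3 bytes of data, the extra field length is 2 + 2 + 3 = 7 while the subfield size is 3.
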
It must be possible to detect the end of the compressed data with any
compression format, regardless of the actual size of the compressed
data. If the compressed data cannot fit in one file (in particular for
diskettes), each part starts with a header as described above, but
only the last part has the crc32 and uncompressed size. A decompressor
may prompt for additional data for multipart compressed files. It is
desirable but not mandatory that multiple parts be extractable
independently so that partial data can be recovered if one of the
parts is damaged. This is possible only if no compression state is
kept from one part to the other. The compression-type dependent flags
can indicate this.
If the file being compressed is on a file system with case insensitive
names, the original name field must be forced to lower case. There is
no original file name if the data was compressed from standard input.
Compression is always performed, even if the compressed file is
slightly larger than the original. The worst case expansion is
a few bytes for the gzip file header, plus 5 bytes every 32K block,
or an expansion ratio of 0.015% for large files. Note that the actual
number of used disk blocks almost never increases.
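
The 0.015% figure follows from 5 bytes of overhead per 32K block: 5 / 32768 is about 0.000153. The sketch below works the arithmetic through. The fixed overhead of 18 bytes (a 10-byte header with no optional fields plus the 8-byte trailer) is an assumption made for this example; the text only says "a few bytes". The name worst_case_size is hypothetical.

```python
def worst_case_size(n, block=32 * 1024, per_block=5, fixed=18):
    """Upper bound on the output size for n input bytes.

    Assumes every block is stored uncompressed with per_block bytes of
    overhead, and fixed bytes of header and trailer (assumed value).
    """
    blocks = max(1, -(-n // block))  # ceiling division, at least one block
    return n + fixed + per_block * blocks


# Expansion ratio for large files, as a percentage.
ratio_percent = 100 * 5 / (32 * 1024)
```

For a 1 GiB input this bound adds 163,858 bytes (32,768 blocks at 5 bytes each, plus 18), which is consistent with the stated ratio.
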
The encryption is that of zip 1.9. For the encryption check, the
last byte of the decoded encryption header must be zero. The time
stamp of an encrypted file might be set to zero to avoid giving a clue
about the construction of the random header.