Graphics File Formats FAQ

王朝other · Author: unknown · 2006-01-08

Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade

==================================================================

0. What's the best way to read a file header?

1. What's this business about endianness?

2. How can I determine the byte-order of a system at run-time?

3. How can I identify the format of a graphics file?

4. What are the format identifiers of some popular file formats?

--------------------------------------------------------------------------------

Subject: 0. What's the best way to read a file header?

You wouldn't think there's a lot of mystery about reading a few bytes from

a disk file, eh? Programmers, however, constantly lose time because they

overlook a few problems that can occur. Consider the following code:

typedef struct _Header

{

BYTE Id;

WORD Height;

WORD Width;

BYTE Colors;

} HEADER;

HEADER Header;

void ReadHeader(FILE *fp)

{

if (fp != (FILE *)NULL)

fread(&Header, sizeof(HEADER), 1, fp);

}

Looks good, right? The fread() will read the next sizeof(HEADER) bytes from

a valid FILE pointer into the Header data structure. So what could go

wrong?

The problem often encountered with this method is one of element alignment

within structures. Compilers may pad structures with "invisible" elements

to allow each "visible" element to align on a 2- or 4-byte address

boundary. This is done for efficiency in accessing the element while in

memory. Padding may also be added to the end of the structure to bring

its total length to an even number of bytes. This is done so the data

following the structure in memory will also align on a proper address

boundary.

If the above code is compiled with no (or 1-byte) structure alignment the

code will operate as expected. With 2-byte alignment an extra two bytes

would be added to the HEADER structure in memory and make it appear as

such:

typedef struct _Header

{

BYTE Id;

BYTE Pad1; // Added padding

WORD Height;

WORD Width;

BYTE Colors;

BYTE Pad2; // Added padding

} HEADER;

As you can see the fread() will store the correct value in Id, but the

first byte of Height will be stored in the padding byte. This will throw

off the correct storage of data in the remaining part of the structure

causing the values to be garbage.

A compiler that pads to 4-byte boundaries might lay out the HEADER in memory

as such (the exact result varies from compiler to compiler):

typedef struct _Header

{

BYTE Id;

BYTE Pad1; // Added padding

BYTE Pad2; // Added padding

BYTE Pad3; // Added padding

WORD Height;

WORD Width;

BYTE Colors;

BYTE Pad4; // Added padding

BYTE Pad5; // Added padding

BYTE Pad6; // Added padding

} HEADER;

What started off as a 6-byte header increased to 8 and 12 bytes thanks to

alignment. But what can you do? All the documentation and makefiles you

write will not prevent someone from compiling with the wrong options flag

and then pulling their (or your) hair out when your software appears not

to work correctly.
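One way to find out what a particular compiler has done is to ask it. The following is a minimal sketch, not part of the original FAQ: it assumes BYTE and WORD are the 8- and 16-bit unsigned types from <stdint.h>, and the names HEADER_FILE_SIZE, HeaderPaddingBytes() and PrintHeaderLayout() are hypothetical helpers introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint8_t  BYTE;  /* assumption: BYTE is an 8-bit unsigned type  */
typedef uint16_t WORD;  /* assumption: WORD is a 16-bit unsigned type */

typedef struct
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/* The file stores 1 + 2 + 2 + 1 = 6 bytes of header data. */
#define HEADER_FILE_SIZE 6

/* Returns how many padding bytes this compiler added to HEADER. */
size_t HeaderPaddingBytes(void)
{
    return sizeof(HEADER) - HEADER_FILE_SIZE;
}

/* Prints the in-memory layout so it can be compared with the file layout. */
void PrintHeaderLayout(void)
{
    printf("sizeof(HEADER) = %u\n", (unsigned) sizeof(HEADER));
    printf("Id     at offset %u\n", (unsigned) offsetof(HEADER, Id));
    printf("Height at offset %u\n", (unsigned) offsetof(HEADER, Height));
    printf("Width  at offset %u\n", (unsigned) offsetof(HEADER, Width));
    printf("Colors at offset %u\n", (unsigned) offsetof(HEADER, Colors));
    printf("padding bytes  = %u\n", (unsigned) HeaderPaddingBytes());
}
```

If HeaderPaddingBytes() returns anything other than zero, a single fread() of sizeof(HEADER) bytes will not line up with the six bytes stored in the file.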

Now consider this alternative to the ReadHeader() function:

HEADER Header;

void ReadHeader(FILE *fp)

{

if (fp != (FILE *)NULL)

{

fread(&Header.Id, sizeof(Header.Id), 1, fp);

fread(&Header.Height, sizeof(Header.Height), 1, fp);

fread(&Header.Width, sizeof(Header.Width), 1, fp);

fread(&Header.Colors, sizeof(Header.Colors), 1, fp);

}

}

What both you and your compiler now see is a lot more code. Rather than

reading the entire structure in one, elegant shot, you read in each

element separately using multiple calls to fread(). The trade-off here is

increased code size for not caring what the structure alignment option of

the compiler is set to. These cases are also true for writing structures

to files using fwrite(). Write only the data and not the padding please.
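The writing side can be sketched the same way. The function name WriteHeader() and its return convention are assumptions made for this example, as are the <stdint.h> definitions of BYTE and WORD. Note that this version still writes each field in the byte order of the host system.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t  BYTE;  /* assumption: BYTE is an 8-bit unsigned type  */
typedef uint16_t WORD;  /* assumption: WORD is a 16-bit unsigned type */

typedef struct
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/* Writes the four fields one at a time so that no padding reaches the file.
   Returns 1 on success, 0 if an argument is NULL or any write fails. */
int WriteHeader(const HEADER *h, FILE *fp)
{
    if (h == NULL || fp == NULL)
        return 0;

    if (fwrite(&h->Id,     sizeof(h->Id),     1, fp) != 1) return 0;
    if (fwrite(&h->Height, sizeof(h->Height), 1, fp) != 1) return 0;
    if (fwrite(&h->Width,  sizeof(h->Width),  1, fp) != 1) return 0;
    if (fwrite(&h->Colors, sizeof(h->Colors), 1, fp) != 1) return 0;

    return 1;
}
```

Whatever sizeof(HEADER) happens to be, the file grows by exactly six bytes per call.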

But is there still anything we've overlooked? Will fread() (fscanf(),

fgetc(), and so forth) always return the data we expect? Will fwrite()

(fprintf(), fputc(), and so forth) ever write data that we don't want, or

in a way we don't expect? Read on to the next section...

--------------------------------------------------------------------------------

Subject: 1. What's this business about endianness?

So you've been pulling your hair out trying to discover why your elegant

and perfect-beyond-reproach code, running on your Macintosh or Sun, is

reading garbage from PCX and TGA files. Or perhaps your MS-DOS or Windows

application just can't seem to make heads or tails out of that Sun Raster

file. And, to make matters even more mysterious, it seems your most

illustrious creation will read some TIFF files, but not others.

As was hinted at in the previous section, just reading the header of a

graphics file one field at a time is not enough to ensure data is always read

correctly (not enough for portable code, anyway). In addition to structure

alignment, we must also consider the endianness of the file's data, and the

endianness of the architecture our code is running on.

Here are some baseline rules to follow:

1) Graphics files typically use a fixed byte-ordering scheme. For example,

PCX and TGA files are always little-endian; Sun Raster and Macintosh

PICT are always big-endian.

2) Graphics files that may contain data using either byte-ordering scheme

(for example TIFF) will have an identifier that indicates the

endianness of the data.

3) ASCII-based graphics files (such as DXF and most 3D object files),

have no endianness and are always read in the same way on any system.

4) Most CPUs use a fixed byte-ordering scheme. For example, the 80486

is little-endian and the 68040 is big-endian.

5) You can test the endianness of a system in software.

6) A few systems are neither big- nor little-endian (the PDP-11, for

example, stores 32-bit values in a mixed order); these middle-endian

systems may cause such byte-order detection tests to return erroneous

results.
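Rule 2 can be illustrated with TIFF, whose first four bytes announce the byte order of everything that follows. This is a rough sketch only: the function name TiffByteOrder() and the ORDER_* constants are hypothetical, and the caller is assumed to have already read at least four bytes from the start of the file.

```c
#include <stddef.h>

enum { ORDER_UNKNOWN = 0, ORDER_LITTLE = 1, ORDER_BIG = 2 };

/* Inspects the first four bytes of a TIFF file.
   Bytes 0-1 are 49h 49h ("II", little-endian) or 4Dh 4Dh ("MM", big-endian).
   Bytes 2-3 hold the value 002Ah stored in the byte order just announced. */
int TiffByteOrder(const unsigned char *hdr, size_t len)
{
    if (hdr == NULL || len < 4)
        return ORDER_UNKNOWN;

    if (hdr[0] == 0x49 && hdr[1] == 0x49 && hdr[2] == 0x2A && hdr[3] == 0x00)
        return ORDER_LITTLE;

    if (hdr[0] == 0x4D && hdr[1] == 0x4D && hdr[2] == 0x00 && hdr[3] == 0x2A)
        return ORDER_BIG;

    return ORDER_UNKNOWN;
}
```

Because the bytes are compared one at a time, the result does not depend on the endianness of the system running the test.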

Now we know that using fread() on a big-endian system to read data from a

file that was originally written in little-endian order will return

incorrect data. Actually, the data is correct, but the bytes that make up

the data are arranged in the wrong order. If we attempt to read the 16-bit

value 1234h from a little-endian file, it would be stored in memory using

the big-endian byte-ordering scheme and the value 3412h would result. What

we need is a swap function to change the resulting position of the bytes:

WORD SwapTwoBytes(WORD w)

{

register WORD tmp;

tmp = (w & 0x00FF);

tmp = ((w & 0xFF00) >> 0x08) | (tmp << 0x08);

return(tmp);

}

Now we can read a two-byte header value and swap the bytes as such:

fread(&Header.Height, sizeof(Header.Height), 1, fp);

Header.Height = SwapTwoBytes(Header.Height);

But what about four-byte values? The value 12345678h would be stored as

78563412h. What we need is a swap function to handle four-byte values:

DWORD SwapFourBytes(DWORD dw)

{

register DWORD tmp;

tmp = (dw & 0x000000FF);

tmp = ((dw & 0x0000FF00) >> 0x08) | (tmp << 0x08);

tmp = ((dw & 0x00FF0000) >> 0x10) | (tmp << 0x08);

tmp = ((dw & 0xFF000000) >> 0x18) | (tmp << 0x08);

return(tmp);

}
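Both swap functions depend on WORD and DWORD being unsigned types of exactly 16 and 32 bits, which the text does not spell out. As a hedged sketch of the same idea, here are single-expression versions using the fixed-width types from <stdint.h>; the names Swap16() and Swap32() are hypothetical and are not part of the original code.

```c
#include <stdint.h>

/* Exchanges the two bytes of a 16-bit value: 1234h becomes 3412h. */
uint16_t Swap16(uint16_t w)
{
    return (uint16_t) ((w >> 8) | (w << 8));
}

/* Reverses the four bytes of a 32-bit value: 12345678h becomes 78563412h. */
uint32_t Swap32(uint32_t dw)
{
    return ((dw & 0x000000FFu) << 24) |
           ((dw & 0x0000FF00u) <<  8) |
           ((dw & 0x00FF0000u) >>  8) |
           ((dw & 0xFF000000u) >> 24);
}
```

Applying either function twice returns the original value, which makes for a convenient sanity check.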

But how do we know when to swap and when not to swap? We always know the

byte-order of a graphics file that we are reading, but how do we check

the endianness of the system we are running on? Using the C language,

we might use preprocessor switches to cause a conditional compile based on

a system definition flag:

#define MSDOS 1

#define WINDOWS 2

#define MACINTOSH 3

#define AMIGA 4

#define SUNUNIX 5

#define SYSTEM MSDOS

#if SYSTEM == MSDOS

// Little-endian code here

#elif SYSTEM == WINDOWS

// Little-endian code here

#elif SYSTEM == MACINTOSH

// Big-endian code here

#elif SYSTEM == AMIGA

// Big-endian code here

#elif SYSTEM == SUNUNIX

// Big-endian code here

#else

#error Unknown SYSTEM definition

#endif

My reaction to the above code was *YUCK!* (and I hope yours was too!). A

snarl of fread(), fwrite(), SwapTwoBytes(), and SwapFourBytes() functions

laced between preprocessor statements is hardly elegant code, although

sometimes it is our best choice. Fortunately, this is not one of those

times.

What we first need is a set of functions to read the data from a file

using the byte-ordering scheme of the data. This effectively combines the

read/write and swap operations into one set of functions. Consider the

following:

WORD GetBigWord(FILE *fp)

{

register WORD w;

w = (WORD) (fgetc(fp) & 0xFF);

w = ((WORD) (fgetc(fp) & 0xFF)) | (w << 0x08);

return(w);

}

WORD GetLittleWord(FILE *fp)

{

register WORD w;

w = (WORD) (fgetc(fp) & 0xFF);

w |= ((WORD) (fgetc(fp) & 0xFF) << 0x08);

return(w);

}

DWORD GetBigDoubleWord(FILE *fp)

{

register DWORD dw;

dw = (DWORD) (fgetc(fp) & 0xFF);

dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);

dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);

dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);

return(dw);

}

DWORD GetLittleDoubleWord(FILE *fp)

{

register DWORD dw;

dw = (DWORD) (fgetc(fp) & 0xFF);

dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x08);

dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x10);

dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x18);

return(dw);

}

void PutBigWord(WORD w, FILE *fp)

{

fputc((w >> 0x08) & 0xFF, fp);

fputc(w & 0xFF, fp);

}

void PutLittleWord(WORD w, FILE *fp)

{

fputc(w & 0xFF, fp);

fputc((w >> 0x08) & 0xFF, fp);

}

void PutBigDoubleWord(DWORD dw, FILE *fp)

{

fputc((dw >> 0x18) & 0xFF, fp);

fputc((dw >> 0x10) & 0xFF, fp);

fputc((dw >> 0x08) & 0xFF, fp);

fputc(dw & 0xFF, fp);

}

void PutLittleDoubleWord(DWORD dw, FILE *fp)

{

fputc(dw & 0xFF, fp);

fputc((dw >> 0x08) & 0xFF, fp);

fputc((dw >> 0x10) & 0xFF, fp);

fputc((dw >> 0x18) & 0xFF, fp);

}

If we were reading a little-endian file on a big-endian system (or vice

versa), the previous code:

fread(&Header.Height, sizeof(Header.Height), 1, fp);

Header.Height = SwapTwoBytes(Header.Height);

Would be replaced by:

Header.Height = GetLittleWord(fp);

The code to write the same value to a file would be changed from:

Header.Height = SwapTwoBytes(Header.Height);

fwrite(&Header.Height, sizeof(Header.Height), 1, fp);

To the slightly more readable:

PutLittleWord(Header.Height, fp);

Note that these functions are the same regardless of the endianness of a

system. For example, GetLittleWord() will always read a two-byte value

from a little-endian file regardless of the endianness of the system;

PutBigDoubleWord() will always write a four-byte big-endian value, and so

forth.
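One thing the Get functions do not do is report end-of-file or a read error: fgetc() returns EOF, the value is masked with 0xFF, and the result is folded in as if it were data. The following sketch shows one possible checked variant together with a round trip through a temporary file. GetLittleWordChecked() and RoundTripLittleWord() are hypothetical names, and WORD is assumed to be uint16_t.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint16_t WORD;  /* assumption: WORD is a 16-bit unsigned type */

/* Same as the PutLittleWord() shown earlier. */
void PutLittleWord(WORD w, FILE *fp)
{
    fputc(w & 0xFF, fp);
    fputc((w >> 0x08) & 0xFF, fp);
}

/* Hypothetical checked variant: stores the value in *out and returns 1,
   or returns 0 if the file ends (or a read error occurs) first. */
int GetLittleWordChecked(FILE *fp, WORD *out)
{
    int lo = fgetc(fp);
    int hi = (lo == EOF) ? EOF : fgetc(fp);

    if (lo == EOF || hi == EOF)
        return 0;

    *out = (WORD) (lo | (hi << 8));
    return 1;
}

/* Writes a value, reads it back, and returns 1 if it survived the trip. */
int RoundTripLittleWord(WORD value)
{
    WORD result = 0;
    FILE *fp = tmpfile();
    int ok;

    if (fp == NULL)
        return 0;

    PutLittleWord(value, fp);
    rewind(fp);
    ok = GetLittleWordChecked(fp, &result) && result == value;
    fclose(fp);
    return ok;
}
```

The round trip succeeds on both big- and little-endian systems because neither function ever looks at how the value is laid out in memory.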

--------------------------------------------------------------------------------

Subject: 2. How can I determine the byte-order of a system at run-time?

You may wish to optimize how you read (or write) data from a graphics file

based on the endianness of your system. Using the GetBigDoubleWord()

function mentioned in the previous section to read big-endian data from a

file on a big-endian system imposes extra overhead we don't really need

(although if the actual number of read/write operations in your program is

small you might not consider this overhead to be too bad).

If our code could tell what the endianness of the system was at run-time,

it could choose (using function pointers) what set of read/write functions

to use. Look at the following function:

#define BIG_ENDIAN 0

#define LITTLE_ENDIAN 1

int TestByteOrder(void)

{

short int word = 0x0001;

char *byte = (char *) &word;

return(byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);

}

This code assigns the value 0001h to a 16-bit integer. A char pointer is

then assigned to point at the first (lowest-addressed) byte of the

integer value. If the first byte of the integer is 01h, then the system

is little-endian (the least-significant byte is stored at the lowest

address). If it is 00h then the system is big-endian.
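The function-pointer selection mentioned above might look something like the sketch below. It is only an illustration under stated assumptions: GETWORDFN, GetLittleWordNative(), GetLittleWordPortable() and SelectGetLittleWord() are hypothetical names, WORD is assumed to be uint16_t, and the constants are renamed ORDER_BIG and ORDER_LITTLE because some system headers already define BIG_ENDIAN and LITTLE_ENDIAN.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint16_t WORD;  /* assumption: WORD is a 16-bit unsigned type */

#define ORDER_BIG    0
#define ORDER_LITTLE 1

/* Same test as above, with the constants renamed. */
int TestByteOrder(void)
{
    short int word = 0x0001;
    char *byte = (char *) &word;

    return (byte[0] ? ORDER_LITTLE : ORDER_BIG);
}

/* Portable reader: assembles the value one byte at a time. */
WORD GetLittleWordPortable(FILE *fp)
{
    WORD w;

    w  = (WORD) (fgetc(fp) & 0xFF);
    w |= (WORD) ((fgetc(fp) & 0xFF) << 0x08);
    return w;
}

/* Fast path, valid only on a little-endian system, where the byte order
   in the file already matches the byte order in memory. */
WORD GetLittleWordNative(FILE *fp)
{
    WORD w = 0;

    if (fread(&w, sizeof(w), 1, fp) != 1)
        w = 0;
    return w;
}

typedef WORD (*GETWORDFN)(FILE *);

/* Chooses a reader for little-endian files once, at run-time. */
GETWORDFN SelectGetLittleWord(void)
{
    return (TestByteOrder() == ORDER_LITTLE) ? GetLittleWordNative
                                             : GetLittleWordPortable;
}
```

A program would call SelectGetLittleWord() once at start-up, keep the returned pointer, and use it for every subsequent read. Whether the saving is worth the extra code depends on how much data is read this way.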

--------------------------------------------------------------------------------

Subject: 3. How can I identify the format of a graphics file?

When writing any type of file or data stream reader it is very important

to implement some sort of method for verifying that the input data is in

the format you expect. Here are a few methods:

1) Trust the user of your program to always supply the correct data,

thereby freeing you from the tedious task of writing any type of format

identification routines. Choose this method and you will provide solid

proof that contradicts the popular claim that users are inherently far

more stupid than programmers.

2) Read the file extension or descriptor. A GIF file will always have the

extension .GIF, right? Targa files .TGA, yes? And TIFF files will have an

extension of .TIF or a descriptor of TIFF. So no problem?

Well, for the most part, this is true. This method certainly isn't

bulletproof, however. Your reader will occasionally be fed the odd batch

of mislabeled files ("I thought they were PCX files!"). Or files with

unrecognized mangled extensions (.TAR rather than .TGA or .JFI rather

than .JPG) that your reader knows how to read, but won't read because it

doesn't recognize the extensions. File extensions also won't usually tell

you the revision of the file format you are reading (with some revisions

creating an almost entirely new format). And more than one file format

shares the more common file extensions (such as .IMG and .PIC). And last of

all, data streams have no file extensions or descriptors to read at all.

3) Read the file and attempt to recognize the format by specific patterns

in the data. Most file formats contain some sort of identifying pattern of

data that is identical in all files. In some cases this pattern gives an

indication of the revision of the format (such as GIF87a and GIF89a) or

the endianness of the data format.

Nothing is easy, however. Not all formats contain such identifiers (such

as PCX). And those that do don't necessarily put it at the beginning of

the file. This means if the data is in the format of a stream you may

have to read (and buffer) most or all of the data before you can determine

the format. Of course, not all graphics formats are suitable to be read as

a data stream anyway.

Your best bet for a method of format detection is a combination of methods

two and three. First believe the file extension or descriptor, read some

data, and check for identifying data patterns. If this test fails, then

attempt to recognize all other known patterns.
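The two-pass strategy can be sketched as follows. This is an illustration rather than a complete detector: SIGNATURE, Known[] and GuessFormat() are hypothetical names, the signature bytes are the ones listed in section 4, the extensions in the table (notably "ras" for Sun Raster) are assumptions, and only formats identified by their leading bytes are covered.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char *name;            /* label returned to the caller     */
    const char *ext;             /* usual extension, in lower case   */
    const unsigned char *magic;  /* identifying bytes at offset 0    */
    size_t len;                  /* number of identifying bytes      */
} SIGNATURE;

static const unsigned char MagicGif87[]  = { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 };
static const unsigned char MagicGif89[]  = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 };
static const unsigned char MagicPng[]    = { 0x89, 0x50, 0x4E, 0x47,
                                             0x0D, 0x0A, 0x1A, 0x0A };
static const unsigned char MagicJpeg[]   = { 0xFF, 0xD8, 0xFF };
static const unsigned char MagicSun[]    = { 0x59, 0xA6, 0x6A, 0x95 };
static const unsigned char MagicTiffBE[] = { 0x4D, 0x4D, 0x00, 0x2A };
static const unsigned char MagicTiffLE[] = { 0x49, 0x49, 0x2A, 0x00 };

static const SIGNATURE Known[] =
{
    { "GIF87a",               "gif", MagicGif87,  sizeof MagicGif87  },
    { "GIF89a",               "gif", MagicGif89,  sizeof MagicGif89  },
    { "PNG",                  "png", MagicPng,    sizeof MagicPng    },
    { "JPEG",                 "jpg", MagicJpeg,   sizeof MagicJpeg   },
    { "Sun Raster",           "ras", MagicSun,    sizeof MagicSun    },
    { "TIFF (big-endian)",    "tif", MagicTiffBE, sizeof MagicTiffBE },
    { "TIFF (little-endian)", "tif", MagicTiffLE, sizeof MagicTiffLE }
};

/* Case-insensitive test of a file name's extension against ext. */
static int HasExtension(const char *filename, const char *ext)
{
    const char *dot = (filename != NULL) ? strrchr(filename, '.') : NULL;

    if (dot == NULL)
        return 0;
    for (dot++; *dot != '\0' && *ext != '\0'; dot++, ext++)
        if (tolower((unsigned char) *dot) != *ext)
            return 0;
    return *dot == '\0' && *ext == '\0';
}

/* True if the first bytes of data equal the signature. */
static int Matches(const SIGNATURE *s, const unsigned char *data, size_t n)
{
    return n >= s->len && memcmp(data, s->magic, s->len) == 0;
}

/* Returns a format label, or NULL if nothing is recognized. */
const char *GuessFormat(const char *filename,
                        const unsigned char *data, size_t n)
{
    const size_t count = sizeof Known / sizeof Known[0];
    size_t i;

    /* Pass 1: believe the extension, then confirm it with the data. */
    for (i = 0; i < count; i++)
        if (HasExtension(filename, Known[i].ext) && Matches(&Known[i], data, n))
            return Known[i].name;

    /* Pass 2: the extension was missing or wrong; try every signature. */
    for (i = 0; i < count; i++)
        if (Matches(&Known[i], data, n))
            return Known[i].name;

    return NULL;
}
```

With signatures as distinctive as these, the first pass is little more than a shortcut. It earns its keep once the table includes formats whose identifying patterns are weak or shared, where the extension is the only hint about which test to trust first.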

Run-time file format identification is a black art at best.

--------------------------------------------------------------------------------

Subject: 4. What are the format identifiers of some popular file formats?

Here are a few algorithms that you can use to determine the format of a

graphics file at run-time.

GIF: The first six bytes of a GIF file will be the byte pattern of

474946383761h ("GIF87a") or 474946383961h ("GIF89a").

JFIF: The first three bytes are ffd8ffh (i.e., an SOI marker followed

by any marker). Do not check the fourth byte, as it will vary.

JPEG: The first three bytes are ffd8ffh (i.e., an SOI marker followed

by any marker). Do not check the fourth byte, as it will vary.

This works with most variants of "raw JPEG" as well.

PNG: The first eight bytes of all PNG files are 89504e470d0a1a0ah.

SPIFF: The first three bytes are ffd8ffh (i.e., an SOI marker followed

by any marker). Do not check the fourth byte, as it will vary.

Sun: The first four bytes of a Sun Rasterfile are 59a66a95h. If you have

accidentally read this identifier using the little-endian byte order

this value will be read as 956aa659h.

TGA: The last 18 bytes of a TGA Version 2 file are the string

"TRUEVISION-XFILE.\0". If this string is not present, then the file

is assumed to be a TGA Version 1 file.

TIFF: The first four bytes of a big-endian TIFF file are 4d4d002ah; those

of a little-endian TIFF file are 49492a00h.
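TGA is the one entry in this list whose identifier sits at the end of the file rather than the beginning, so it needs a seek. A minimal sketch follows, assuming the file was opened in binary mode and is seekable. The function name TgaVersion() and its return values are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 2 if the TGA Version 2 footer signature is present,
   1 if it is absent (assumed Version 1, though the file may not be
   a TGA at all), and 0 if the last 18 bytes could not be read. */
int TgaVersion(FILE *fp)
{
    /* 17 visible characters plus the terminating '\0' make 18 bytes. */
    static const char signature[18] = "TRUEVISION-XFILE.";
    char tail[18];

    if (fp == NULL || fseek(fp, -18L, SEEK_END) != 0)
        return 0;

    if (fread(tail, 1, sizeof tail, fp) != sizeof tail)
        return 0;

    return (memcmp(tail, signature, sizeof signature) == 0) ? 2 : 1;
}
```

Since a result of 1 only means that the signature was not found, it is best combined with the file extension or other header checks before treating the file as a Version 1 TGA. The function leaves the file position at the end of the file, so the caller should rewind() before reading the header.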

 
 
 