分享
 
 
 

Big Endian 和 Little Endian (1)

王朝java/jsp·作者佚名  2006-01-09
窄屏简体版  字體: |||超大  

谈到字节序的问题,必然牵涉到两大CPU派系。那就是Motorola的PowerPC系列CPU和Intel的x86系列CPU。PowerPC系列采用big endian方式存储数据,而x86系列则采用little endian方式存储数据。那么究竟什么是big endian,什么又是little endian呢?

其实big endian是指低地址存放最高有效字节(MSB),而little endian则是低地址存放最低有效字节(LSB)。 用文字说明可能比较抽象,下面用图像加以说明。比如数字0x12345678在两种不同字节序CPU中的存储顺序如下所示:

Big Endian

低地址 高地址

----------------------------------------->

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

| 12 | 34 | 56 | 78 |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Little Endian

低地址 高地址

----------------------------------------->

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

| 78 | 56 | 34 | 12 |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

从上面两图可以看出,采用big endian方式存储数据是符合我们人类的思维习惯的。而little endian,!@#$%^&*,见鬼去吧 -_-|||

为什么要注意字节序的问题呢?你可能这么问。当然,如果你写的程序只在单机环境下面运行,并且不和别人的程序打交道,那么你完全可以忽略字节序的存在。但是,如果你的程序要跟别人的程序产生交互呢?在这里我想说说两种语言。C/C++语言编写的程序里数据存储顺序是跟编译平台所在的CPU相关的,而JAVA编写的程序则唯一采用big endian方式来存储数据。试想,如果你用C/C++语言在x86平台下编写的程序跟别人的JAVA程序互通时会产生什么结果?就拿上面的0x12345678来说,你的程序传递给别人的一个数据,将指向0x12345678的指针传给了JAVA程序,由于JAVA采取big endian方式存储数据,很自然的它会将你的数据翻译为0x78563412。什么?竟然变成另外一个数字了?是的,就是这种后果。因此,在你的C程序传给JAVA程序之前有必要进行字节序的转换工作。

无独有偶,所有网络协议也都是采用big endian的方式来传输数据的。所以有时我们也会把big endian方式称之为网络字节序。当两台采用不同字节序的主机通信时,在发送数据之前都必须经过字节序的转换成为网络字节序后再进行传输。ANSI C中提供了下面四个转换字节序的宏。

big endian:最高字节在地址最低位,最低字节在地址最高位,依次排列。

little endian:最低字节在最低位,最高字节在最高位,反序排列。

endian指的是当物理上的最小单元比逻辑上的最小单元小时,逻辑到物理的单元排布关系。咱们接触到的物理单元最小都是byte,在通信领域中,这里往往是bit,不过原理也是类似的。

一个例子:

如果我们将0x1234abcd写入到以0x0000开始的内存中,则结果为

big-endian little-endian

0x0000 0x12 0xcd

0x0001 0x34 0xab

0x0002 0xab 0x34

0x0003 0xcd 0x12

目前应该little endian是主流,因为在数据类型转换的时候(尤其是指针转换)不用考虑地址问题。

PS:

这两个术语来自于 Jonathan Swift 的《《格利佛游记》其中交战的两个派别无法就应该从哪一端--小端还是大端--打开一个半熟的鸡蛋达成一致。:)

在那个时代,Swift是在讽刺英国和法国之间的持续冲突,Danny Cohen,一位网络协议的早期开创者,第一次使用这两个术语来指代字节顺序,后来这个术语被广泛接纳了

摘自《深入理解计算机系统》

很好的一本书:)

An Essay on Endian Order

Copyright (C) Dr. William T. Verts, April 19, 1996 Depending on which computing system you use, you will have to consider the byte order in which multibyte numbers are stored, particularly when you are writing those numbers to a file. The two orders are called "Little Endian" and "Big Endian".

The Basics

"Little Endian" means that the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address. (The little end comes first.) For example, a 4 byte LongInt

Byte3 Byte2 Byte1 Byte0

will be arranged in memory as follows: Base Address+0 Byte0

Base Address+1 Byte1

Base Address+2 Byte2

Base Address+3 Byte3

Intel processors (those used in PC's) use "Little Endian" byte order. "Big Endian" means that the high-order byte of the number is stored in memory at the lowest address, and the low-order byte at the highest address. (The big end comes first.) Our LongInt, would then be stored as:

Base Address+0 Byte3

Base Address+1 Byte2

Base Address+2 Byte1

Base Address+3 Byte0

Motorola processors (those used in Mac's) use "Big Endian" byte order. Which is Better?

You may see a lot of discussion about the relative merits of the two formats, mostly religious arguments based on the relative merits of the PC versus the Mac. Both formats have their advantages and disadvantages.

In "Little Endian" form, assembly language instructions for picking up a 1, 2, 4, or longer byte number proceed in exactly the same way for all formats: first pick up the lowest order byte at offset 0. Also, because of the 1:1 relationship between address offset and byte number (offset 0 is byte 0), multiple precision math routines are correspondingly easy to write.

In "Big Endian" form, by having the high-order byte come first, you can always test whether the number is positive or negative by looking at the byte at offset zero. You don't have to know how long the number is, nor do you have to skip over any bytes to find the byte containing the sign information. The numbers are also stored in the order in which they are printed out, so binary to decimal routines are particularly efficient.

What does that Mean for Us?

What endian order means is that any time numbers are written to a file, you have to know how the file is supposed to be constructed. If you write out a graphics file (such as a .BMP file) on a machine with "Big Endian" integers, you must first reverse the byte order, or a "standard" program to read your file won't work.

The Windows .BMP format, since it was developed on a "Little Endian" architecture, insists on the "Little Endian" format. You must write your Save_BMP code this way, regardless of the platform you are using.

Common file formats and their endian order are as follows:

Adobe Photoshop -- Big Endian BMP (Windows and OS/2 Bitmaps) -- Little Endian DXF (AutoCad) -- Variable GIF -- Little Endian IMG (GEM Raster) -- Big Endian JPEG -- Big Endian FLI (Autodesk Animator) -- Little Endian MacPaint -- Big Endian PCX (PC Paintbrush) -- Little Endian PostScript -- Not Applicable (text!) POV (Persistence of Vision ray-tracer) -- Not Applicable (text!) QTM (Quicktime Movies) -- Little Endian (on a Mac!) Microsoft RIFF (.WAV & .AVI) -- Both Microsoft RTF (Rich Text Format) -- Little Endian SGI (Silicon Graphics) -- Big Endian Sun Raster -- Big Endian TGA (Targa) -- Little Endian TIFF -- Both, Endian identifier encoded into file WPG (WordPerfect Graphics Metafile) -- Big Endian (on a PC!) XWD (X Window Dump) -- Both, Endian identifier encoded into file Correcting for the Non-Native Order

It is pretty easy to reverse a multibyte integer if you find you need the other format. A single function can be used to switch from one to the other, in either direction. A simple and not very efficient version might look as follows: Function Reverse (N:LongInt) : LongInt ;

Var B0, B1, B2, B3 : Byte ;

Begin

B0 := N Mod 256 ;

N := N Div 256 ;

B1 := N Mod 256 ;

N := N Div 256 ;

B2 := N Mod 256 ;

N := N Div 256 ;

B3 := N Mod 256 ;

Reverse := (((B0 * 256 + B1) * 256 + B2) * 256 + B3) ;

End ;

A more efficient version that depends on the presence of hexadecimal numbers, bit masking operators AND, OR, and NOT, and shift operators SHL and SHR might look as follows: Function Reverse (N:LongInt) : LongInt ;

Var B0, B1, B2, B3 : Byte ;

Begin

B0 := (N AND $000000FF) SHR 0 ;

B1 := (N AND $0000FF00) SHR 8 ;

B2 := (N AND $00FF0000) SHR 16 ;

B3 := (N AND $FF000000) SHR 24 ;

Reverse := (B0 SHL 24) OR (B1 SHL 16) OR (B2 SHL 8) OR (B3 SHL 0) ;

End ;

用文字说明可能比较抽象,下面用图像加以说明。比如数字0x12345678在两种不同字节序CPU中的存储顺序如下所示:

Big Endian

低地址 高地址

----------------------------------------->

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

| 12 | 34 | 56 | 78 |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Little Endian

低地址 高地址

----------------------------------------->

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

| 78 | 56 | 34 | 12 |

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

从上面两图可以看出,采用big endian方式存储数据是符合我们人类的思维习惯的。而little endian,!@#$%^&*,见鬼去吧 -_-|||

为什么要注意字节序的问题呢?你可能这么问。当然,如果你写的程序只在单机环境下面运行,并且不和别人的程序打交道,那么你完全可以忽略字节序的存在。但是,如果你的程序要跟别人的程序产生交互呢?在这里我想说说两种语言。C/C++语言编写的程序里数据存储顺序是跟编译平台所在的CPU相关的,而JAVA编写的程序则唯一采用big endian方式来存储数据。试想,如果你用C/C++语言在x86平台下编写的程序跟别人的JAVA程序互通时会产生什么结果?就拿上面的0x12345678来说,你的程序传递给别人的一个数据,将指向0x12345678的指针传给了JAVA程序,由于JAVA采取big endian方式存储数据,很自然的它会将你的数据翻译为0x78563412。什么?竟然变成另外一个数字了?是的,就是这种后果。因此,在你的C程序传给JAVA程序之前有必要进行字节序的转换工作。

无独有偶,所有网络协议也都是采用big endian的方式来传输数据的。所以有时我们也会把big endian方式称之为网络字节序。当两台采用不同字节序的主机通信时,在发送数据之前都必须经过字节序的转换成为网络字节序后再进行传输。ANSI C中提供了下面四个转换字节序的宏。

big endian:最高字节在地址最低位,最低字节在地址最高位,依次排列。

little endian:最低字节在最低位,最高字节在最高位,反序排列。

endian指的是当物理上的最小单元比逻辑上的最小单元小时,逻辑到物理的单元排布关系。咱们接触到的物理单元最小都是byte,在通信领域中,这里往往是bit,不过原理也是类似的。

一个例子:

如果我们将0x1234abcd写入到以0x0000开始的内存中,则结果为

big-endian little-endian

0x0000 0x12 0xcd

0x0001 0x34 0xab

0x0002 0xab 0x34

0x0003 0xcd 0x12

目前应该little endian是主流,因为在数据类型转换的时候(尤其是指针转换)不用考虑地址问题。

PS:

这两个术语来自于 Jonathan Swift 的《《格利佛游记》其中交战的两个派别无法就应该从哪一端--小端还是大端--打开一个半熟的鸡蛋达成一致。:)

在那个时代,Swift是在讽刺英国和法国之间的持续冲突,Danny Cohen,一位网络协议的早期开创者,第一次使用这两个术语来指代字节顺序,后来这个术语被广泛接纳了

摘自《深入理解计算机系统》

很好的一本书:)

An Essay on Endian Order

Copyright (C) Dr. William T. Verts, April 19, 1996 Depending on which computing system you use, you will have to consider the byte order in which multibyte numbers are stored, particularly when you are writing those numbers to a file. The two orders are called "Little Endian" and "Big Endian".

The Basics

"Little Endian" means that the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address. (The little end comes first.) For example, a 4 byte LongInt

Byte3 Byte2 Byte1 Byte0

will be arranged in memory as follows: Base Address+0 Byte0

Base Address+1 Byte1

Base Address+2 Byte2

Base Address+3 Byte3

Intel processors (those used in PC's) use "Little Endian" byte order. "Big Endian" means that the high-order byte of the number is stored in memory at the lowest address, and the low-order byte at the highest address. (The big end comes first.) Our LongInt, would then be stored as:

Base Address+0 Byte3

Base Address+1 Byte2

Base Address+2 Byte1

Base Address+3 Byte0

Motorola processors (those used in Mac's) use "Big Endian" byte order. Which is Better?

You may see a lot of discussion about the relative merits of the two formats, mostly religious arguments based on the relative merits of the PC versus the Mac. Both formats have their advantages and disadvantages.

In "Little Endian" form, assembly language instructions for picking up a 1, 2, 4, or longer byte number proceed in exactly the same way for all formats: first pick up the lowest order byte at offset 0. Also, because of the 1:1 relationship between address offset and byte number (offset 0 is byte 0), multiple precision math routines are correspondingly easy to write.

In "Big Endian" form, by having the high-order byte come first, you can always test whether the number is positive or negative by looking at the byte at offset zero. You don't have to know how long the number is, nor do you have to skip over any bytes to find the byte containing the sign information. The numbers are also stored in the order in which they are printed out, so binary to decimal routines are particularly efficient.

What does that Mean for Us?

What endian order means is that any time numbers are written to a file, you have to know how the file is supposed to be constructed. If you write out a graphics file (such as a .BMP file) on a machine with "Big Endian" integers, you must first reverse the byte order, or a "standard" program to read your file won't work.

The Windows .BMP format, since it was developed on a "Little Endian" architecture, insists on the "Little Endian" format. You must write your Save_BMP code this way, regardless of the platform you are using.

Common file formats and their endian order are as follows:

Adobe Photoshop -- Big Endian BMP (Windows and OS/2 Bitmaps) -- Little Endian DXF (AutoCad) -- Variable GIF -- Little Endian IMG (GEM Raster) -- Big Endian JPEG -- Big Endian FLI (Autodesk Animator) -- Little Endian MacPaint -- Big Endian PCX (PC Paintbrush) -- Little Endian PostScript -- Not Applicable (text!) POV (Persistence of Vision ray-tracer) -- Not Applicable (text!) QTM (Quicktime Movies) -- Little Endian (on a Mac!) Microsoft RIFF (.WAV & .AVI) -- Both Microsoft RTF (Rich Text Format) -- Little Endian SGI (Silicon Graphics) -- Big Endian Sun Raster -- Big Endian TGA (Targa) -- Little Endian TIFF -- Both, Endian identifier encoded into file WPG (WordPerfect Graphics Metafile) -- Big Endian (on a PC!) XWD (X Window Dump) -- Both, Endian identifier encoded into file Correcting for the Non-Native Order

It is pretty easy to reverse a multibyte integer if you find you need the other format. A single function can be used to switch from one to the other, in either direction. A simple and not very efficient version might look as follows: Function Reverse (N:LongInt) : LongInt ;

Var B0, B1, B2, B3 : Byte ;

Begin

B0 := N Mod 256 ;

N := N Div 256 ;

B1 := N Mod 256 ;

N := N Div 256 ;

B2 := N Mod 256 ;

N := N Div 256 ;

B3 := N Mod 256 ;

Reverse := (((B0 * 256 + B1) * 256 + B2) * 256 + B3) ;

End ;

A more efficient version that depends on the presence of hexadecimal numbers, bit masking operators AND, OR, and NOT, and shift operators SHL and SHR might look as follows: Function Reverse (N:LongInt) : LongInt ;

Var B0, B1, B2, B3 : Byte ;

Begin

B0 := (N AND $000000FF) SHR 0 ;

B1 := (N AND $0000FF00) SHR 8 ;

B2 := (N AND $00FF0000) SHR 16 ;

B3 := (N AND $FF000000) SHR 24 ;

Reverse := (B0 SHL 24) OR (B1 SHL 16) OR (B2 SHL 8) OR (B3 SHL 0) ;

End ;

 
 
 
免责声明:本文为网络用户发布,其观点仅代表作者个人观点,与本站无关,本站仅提供信息存储服务。文中陈述内容未经本站证实,其真实性、完整性、及时性本站不作任何保证或承诺,请读者仅作参考,并请自行核实相关内容。
2023年上半年GDP全球前十五强
 百态   2023-10-24
美众议院议长启动对拜登的弹劾调查
 百态   2023-09-13
上海、济南、武汉等多地出现不明坠落物
 探索   2023-09-06
印度或要将国名改为“巴拉特”
 百态   2023-09-06
男子为女友送行,买票不登机被捕
 百态   2023-08-20
手机地震预警功能怎么开?
 干货   2023-08-06
女子4年卖2套房花700多万做美容:不但没变美脸,面部还出现变形
 百态   2023-08-04
住户一楼被水淹 还冲来8头猪
 百态   2023-07-31
女子体内爬出大量瓜子状活虫
 百态   2023-07-25
地球连续35年收到神秘规律性信号,网友:不要回答!
 探索   2023-07-21
全球镓价格本周大涨27%
 探索   2023-07-09
钱都流向了那些不缺钱的人,苦都留给了能吃苦的人
 探索   2023-07-02
倩女手游刀客魅者强控制(强混乱强眩晕强睡眠)和对应控制抗性的关系
 百态   2020-08-20
美国5月9日最新疫情:美国确诊人数突破131万
 百态   2020-05-09
荷兰政府宣布将集体辞职
 干货   2020-04-30
倩女幽魂手游师徒任务情义春秋猜成语答案逍遥观:鹏程万里
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案神机营:射石饮羽
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案昆仑山:拔刀相助
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案天工阁:鬼斧神工
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案丝路古道:单枪匹马
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案镇郊荒野:与虎谋皮
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案镇郊荒野:李代桃僵
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案镇郊荒野:指鹿为马
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案金陵:小鸟依人
 干货   2019-11-12
倩女幽魂手游师徒任务情义春秋猜成语答案金陵:千金买邻
 干货   2019-11-12
 
推荐阅读
 
 
 
>>返回首頁<<
 
靜靜地坐在廢墟上,四周的荒凉一望無際,忽然覺得,淒涼也很美
© 2005- 王朝網路 版權所有