A Closer Look at union and Some Deeper Applications

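Although union is rarely used in day-to-day development, although my teacher skipped the union chapter when I first learned C, and although I believed I already understood union well enough, the online discussion that followed my previous article, <(David's Reading Notes) Some Thoughts on Using union in C++>, pushed me to spend more time on this basic language feature. This article is the result.

I start with the MSDN overview of union. That may seem dull, but reading specification-style material often gives useful hints: when we look at a feature only from the application side, we overlook many deeper considerations. So read on; it may contain exactly what you have missed.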

union

union [tag] { member-list } [declarators];

[union] tag declarators;

The union keyword declares a union type and/or a variable of a union type.

A union is a user-defined data type that can hold values of different types at different times. It is similar to a structure

except that all of its members start at the same location in memory. A union variable can contain only one of its members at

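a time. The size of the union is at least the size of the largest member (David's note: it can be larger when alignment padding is added; for example, a union of char[5] and int usually occupies 8 bytes).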

For related information, see class, struct, and Anonymous Union.

Declaring a Union

Begin the declaration of a union with the union keyword, and enclose the member list in curly braces:

union UNKNOWN // Declare union type

{

char ch;

int i;

long l;

float f;

double d;

} var1; // Optional declaration of union variable

Using a Union

A C++ union is a limited form of the class type. It can contain access specifiers (public, protected, private), member data,

and member functions, including constructors and destructors. It cannot contain virtual functions or static data members. It

cannot be used as a base class, nor can it have base classes. Default access of members in a union is public.

A C union type can contain only data members.

In C, you must use the union keyword to declare a union variable. In C++, the union keyword is unnecessary:

Example 1

union UNKNOWN var2; // C declaration of a union variable

UNKNOWN var3; // C++ declaration of a union variable

Example 2

A variable of a union type can hold one value of any type declared in the union. Use the member-selection operator (.) to access a member of a union:

var1.i = 6; // Use variable as integer

var2.d = 5.327; // Use variable as double

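To avoid distorting the text above, I have quoted it verbatim; here is a summary:

1.A union is a special kind of struct/class: a type that can hold values of several types. Unlike a struct/class, all of its members share the same storage (at least the size of the largest member). This makes it flexible: you can switch between members without an explicit cast. It also means that you cannot modify one member without affecting the others;

2.A union can have constructors and destructors and can contain access specifiers, but it cannot contain virtual functions or static data members.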
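
The shared-storage point in the summary can be checked directly. The following is a minimal sketch; the unions Sample and Padded and the helper members_share_address are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>

// Hypothetical union used only for this illustration.
union Sample {
    char   ch;
    int    i;
    double d;
};

// Hypothetical union whose size may exceed its largest member: the 5-byte
// array is padded so that the size is a multiple of int's alignment.
union Padded {
    char bytes[5];
    int  i;
};

// True when every member starts at the address of the union object itself.
// Only addresses are taken here; no inactive member is read.
bool members_share_address(const Sample& s) {
    const void* base = &s;
    return base == static_cast<const void*>(&s.ch)
        && base == static_cast<const void*>(&s.i)
        && base == static_cast<const void*>(&s.d);
}

// A union is at least as large as its largest member ...
static_assert(sizeof(Sample) >= sizeof(double), "must hold the largest member");
static_assert(sizeof(Padded) >= sizeof(char[5]), "must hold the largest member");
// ... and its size is a multiple of the strictest member alignment.
static_assert(sizeof(Padded) % alignof(int) == 0, "size is a multiple of alignment");
```

On a typical platform with a 4-byte int, sizeof(Padded) is 8 rather than 5, which is the "larger than the largest member" case.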

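For the pitfalls to watch for when using union, see my previous article: <(David's Reading Notes) Some Thoughts on Using union in C++>.

Below are some interesting and useful applications of union.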

1.in_addr

struct in_addr {

union {

struct { u_char s_b1,s_b2,s_b3,s_b4; } S_un_b;

struct { u_short s_w1,s_w2; } S_un_w;

u_long S_addr;

} S_un;

};

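Anyone who has written socket code has certainly used the struct above. You may not have noticed that it contains an interesting union whose members all have the same size and represent the same information in different forms. You can use the same technique in your own designs to expose one piece of information in several forms, but be aware that in cross-platform code byte order can cause unexpected trouble.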
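
The idea of one piece of information in several forms, together with the byte-order caveat, can be sketched as follows. Ipv4, make_ipv4, raw_value and host_value are hypothetical names modelled loosely on in_addr, not part of any real header. The 32-bit view is obtained with memcpy rather than by reading the other union member, because ISO C++ does not define the result of reading a member other than the one last written.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical type modelled on in_addr; not taken from any real header.
struct Ipv4 {
    union {
        std::uint8_t  bytes[4];  // the address as four octets
        std::uint32_t value;     // the same storage as one 32-bit number
    } u;
};

// Builds an address from its octets in network (big-endian) order.
Ipv4 make_ipv4(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
    Ipv4 ip;
    ip.u.bytes[0] = a;
    ip.u.bytes[1] = b;
    ip.u.bytes[2] = c;
    ip.u.bytes[3] = d;
    return ip;
}

// The 32-bit number stored in the same four bytes. The result depends on
// the byte order of the machine the code runs on.
std::uint32_t raw_value(const Ipv4& ip) {
    std::uint32_t v;
    std::memcpy(&v, ip.u.bytes, sizeof v);
    return v;
}

// Byte-order independent numeric value, assembled with shifts.
std::uint32_t host_value(const Ipv4& ip) {
    return (static_cast<std::uint32_t>(ip.u.bytes[0]) << 24)
         | (static_cast<std::uint32_t>(ip.u.bytes[1]) << 16)
         | (static_cast<std::uint32_t>(ip.u.bytes[2]) << 8)
         |  static_cast<std::uint32_t>(ip.u.bytes[3]);
}
```

For 192.168.0.1, host_value returns 0xC0A80001 everywhere, while raw_value returns a different number on little-endian and big-endian machines. That difference is the cross-platform trouble mentioned above.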

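2.Anonymous union

An anonymous union is a union with neither a tag name nor a declarator list, which is not the same thing as a '__unnamed' union. It is declared as follows:

union { member-list } ;

An anonymous union only tells the compiler that its members share one address; the members themselves are referenced directly, without the usual dot-operator syntax. For the same reason, the members of an anonymous union have the same scope as other variables in the enclosing block, so watch out for name clashes.

Consider the following example: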

#include <iostream>

using namespace std;

struct DataForm

{

enum DataType { CharData = 1, IntData, StringData };

DataType type;

// Declare an anonymous union.

union

{

char chCharMem;

char *szStrMem;

int iIntMem;

};

void print();

};

void DataForm::print()

{

// Based on the type of the data, print the

// appropriate data type.

switch( type )

{

case CharData:

cout << chCharMem;

break;

case IntData:

cout << iIntMem;

break;

case StringData:

cout << szStrMem;

break;

}

}

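In addition, anonymous unions are subject to the following restrictions:

1).Because an anonymous union does not use the dot operator, everything it contains must be data: member functions are not allowed, and neither are private or protected members;

2).A global anonymous union must be declared static; otherwise it must be placed in an unnamed namespace.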
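
The rules above can be illustrated with a short sketch. All names here (g_count, g_ratio, Value, describe, bump_global_count) are hypothetical and chosen only for the example.

```cpp
#include <string>

// A namespace-scope anonymous union has to be declared static.
static union {
    int   g_count;
    float g_ratio;
};

// Hypothetical tagged record with an anonymous union member.
struct Value {
    enum Kind { IntKind, CharKind } kind;
    union {        // anonymous: members are reached as v.i and v.c
        int  i;
        char c;
    };
};

// Reads only the member that the tag says is active.
std::string describe(const Value& v) {
    switch (v.kind) {
    case Value::IntKind:  return "int:" + std::to_string(v.i);
    case Value::CharKind: return std::string("char:") + v.c;
    }
    return "unknown";
}

// Members of the static anonymous union are used like ordinary variables,
// with no object name and no dot operator.
static int bump_global_count() {
    g_count = 41;
    return g_count + 1;
}
```

Because g_count and g_ratio live in the enclosing scope, declaring another variable named g_count in the same scope would be a name clash.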

Note:

The notion of an anonymous union may be unfamiliar, but Windows developers regularly use a structure that contains one, VARIANT, perhaps without noticing:

typedef struct FARSTRUCT tagVARIANT VARIANT;

typedef struct FARSTRUCT tagVARIANT VARIANTARG;

typedef struct tagVARIANT {

VARTYPE vt;

unsigned short wReserved1;

unsigned short wReserved2;

unsigned short wReserved3;

union {

Byte bVal; // VT_UI1.

Short iVal; // VT_I2.

long lVal; // VT_I4.

float fltVal; // VT_R4.

// ...

}; // end of the anonymous union

};

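3.Type conversion with union

As noted above, a union is flexible: you can switch between its members without an explicit cast. The example below illustrates this (example 1 already illustrates it well):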

#include <iostream>

using namespace std;

struct DATA

{

char c1;

char c2;

};

int main()

{

union {

int i;

DATA data;

} _ut;

_ut.i = 0x6162;

cout << "_ut.data.c1 = " << _ut.data.c1 << endl

<< "_ut.data.c2 = " << _ut.data.c2 << endl;

return 0;

}

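Be warned that type conversion is not what union is designed for; it is merely a property you can exploit. Conversion through a union is highly platform-dependent. For example, on Intel x86 + Windows 2000 + VC6 the program above prints: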

_ut.data.c1 = b

_ut.data.c2 = a

(Note: Intel CPUs are little-endian)

On a Sun SPARC, however, the result is:

_ut.data.c1 =

_ut.data.c2 =

(Note: with big-endian byte order, the first two bytes of the int are 0x00 and 0x00)
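
The byte-order dependence shown above can be detected at run time, and it can be avoided altogether by extracting bytes with shifts. This is a minimal sketch; is_little_endian, TwoChars and split_low16 are hypothetical helpers, and the character values assume an ASCII-compatible character set.

```cpp
#include <cstdint>
#include <cstring>

// True when the least significant byte of a multi-byte integer is stored first.
bool is_little_endian() {
    const std::uint16_t probe = 0x6162;   // 0x61 is 'a', 0x62 is 'b' in ASCII
    unsigned char first;
    std::memcpy(&first, &probe, 1);       // copy the byte at the lowest address
    return first == 0x62;
}

// Hypothetical result type: the two low-order bytes of an integer.
struct TwoChars {
    char high;   // bits 8..15
    char low;    // bits 0..7
};

// Splits the low 16 bits of a value arithmetically, so the result is the
// same on little-endian and big-endian machines.
TwoChars split_low16(unsigned value) {
    TwoChars r;
    r.high = static_cast<char>((value >> 8) & 0xFF);
    r.low  = static_cast<char>(value & 0xFF);
    return r;
}
```

With this approach split_low16(0x6162) yields 'a' as the high byte and 'b' as the low byte on both platforms, whereas the union version depends on where those bytes happen to sit in memory.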

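Even on a single platform, do not use a union to convert between integer and floating-point types, or you will get baffling results (this comes from how the CPU represents floating-point values, which differs considerably between platforms; moreover, under the C++ Standard, reading a union member other than the one most recently written is "undefined behavior").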
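
When the bit pattern of a floating-point value really is needed, a commonly used alternative that stays within defined behavior is to copy the bytes with memcpy. The sketch below is not from the original article; float_bits and bits_to_float are hypothetical helper names, and the code assumes that float is 32 bits wide.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Returns the object representation of a float as a 32-bit integer.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // byte copy, no union involved
    return bits;
}

// Rebuilds a float from a bit pattern produced by float_bits.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

The numeric value of the pattern still depends on the platform's floating-point format, so this removes the undefined behavior but not the portability concern described above.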

For type conversion using references, see <Using References in Type Casts>.