Linux Kernel Internals (1): Booting


1. Booting

1.1 Building the Linux Kernel Image

This section explains the steps taken during compilation of the Linux kernel and the output produced at each stage. The build process depends on the architecture, so I would like to emphasize that we consider only building a Linux/x86 kernel.

When the user types 'make zImage' or 'make bzImage' the resulting bootable kernel image is stored as arch/i386/boot/zImage or arch/i386/boot/bzImage respectively. Here is how the image is built:

C and assembly source files are compiled into ELF relocatable object format (.o) and some of them are grouped logically into archives (.a) using ar(1).

Using ld(1), the above .o and .a are linked into vmlinux which is a statically linked, non-stripped ELF 32-bit LSB 80386 executable file.

System.map is produced by running nm on vmlinux; irrelevant or uninteresting symbols are filtered out with grep.

Enter directory arch/i386/boot.

Bootsector asm code bootsect.S is preprocessed either with or without -D__BIG_KERNEL__, depending on whether the target is bzImage or zImage, into bbootsect.s or bootsect.s respectively.

bbootsect.s is assembled and then converted into 'raw binary' form called bbootsect (or bootsect.s assembled and raw-converted into bootsect for zImage).

Setup code setup.S (setup.S includes video.S) is preprocessed into bsetup.s for bzImage or setup.s for zImage. As with the bootsector code, the difference is that -D__BIG_KERNEL__ is present for bzImage. The result is then converted into 'raw binary' form called bsetup (or setup for zImage).

Enter directory arch/i386/boot/compressed and convert /usr/src/linux/vmlinux to $tmppiggy (tmp filename) in raw binary format, removing .note and .comment ELF sections.

gzip -9 < $tmppiggy > $tmppiggy.gz

Link $tmppiggy.gz into ELF relocatable (ld -r) piggy.o.

Compile compression routines head.S and misc.c (still in arch/i386/boot/compressed directory) into ELF objects head.o and misc.o.

Link together head.o, misc.o and piggy.o into bvmlinux (or vmlinux for zImage; don't mistake this for /usr/src/linux/vmlinux!). Note the difference between -Ttext 0x1000 used for vmlinux and -Ttext 0x100000 used for bvmlinux, i.e. for bzImage the compression loader is loaded high.

Convert bvmlinux to 'raw binary' bvmlinux.out removing .note and .comment ELF sections.

Go back to the arch/i386/boot directory and, using the program tools/build, concatenate bbootsect, bsetup and compressed/bvmlinux.out into bzImage (drop the extra 'b' from each name above for zImage). tools/build also writes important variables such as setup_sects and root_dev at the end of the bootsector.

The size of the bootsector is always 512 bytes. The size of the setup must be at least 4 sectors but is limited above by about 12K. The rule is:

0x4000 bytes >= 512 + setup_sects * 512 + room for stack while running bootsector/setup

We will see later where this limitation comes from.
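The rule above can be checked mechanically. The following is a minimal sketch in C, not code from the kernel tree: the name setup_fits and the stack_room parameter are assumptions made for illustration, since the text does not say how much room the stack needs.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u
#define STACK_TOP   0x4000u   /* limit taken from the rule above */

/* Hypothetical helper: returns true when a setup of `setup_sects` sectors
 * satisfies 0x4000 >= 512 + setup_sects * 512 + stack_room.
 * `stack_room` (in bytes) is a caller-supplied assumption. */
static bool setup_fits(unsigned setup_sects, size_t stack_room)
{
    size_t used = SECTOR_SIZE                        /* the bootsector itself */
                + (size_t)setup_sects * SECTOR_SIZE  /* the setup code */
                + stack_room;                        /* stack below 0x4000 */
    return used <= STACK_TOP;
}
```

If, purely as an assumption, 3.5K is reserved for the stack, setup_fits(24, 3584) holds and setup_fits(25, 3584) does not, which agrees with the "about 12K" figure (24 sectors = 12K).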

The upper limit on the bzImage size produced at this step is about 2.5M for booting with LILO and 0xFFFF paragraphs (0xFFFF0 = 1048560 bytes) for booting raw image, e.g. from floppy disk or CD-ROM (El-Torito emulation mode).
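The raw-image limit is expressed in 16-byte paragraphs. As a small illustration of that arithmetic, here is a hedged sketch in C; the function names are hypothetical and the check is not part of tools/build.

```c
#include <stdbool.h>
#include <stdint.h>

#define PARAGRAPH_SIZE     16u      /* one real-mode paragraph is 16 bytes */
#define MAX_RAW_PARAGRAPHS 0xFFFFu  /* raw-image limit quoted above */

/* Round a byte count up to whole 16-byte paragraphs.
 * Assumes `bytes` is small enough that adding 15 does not overflow. */
static uint32_t bytes_to_paragraphs(uint32_t bytes)
{
    return (bytes + PARAGRAPH_SIZE - 1) / PARAGRAPH_SIZE;
}

/* Hypothetical check: does an image of `bytes` bytes fit the raw-boot limit? */
static bool fits_raw_boot(uint32_t bytes)
{
    return bytes_to_paragraphs(bytes) <= MAX_RAW_PARAGRAPHS;
}
```

An image of exactly 1048560 bytes occupies 0xFFFF paragraphs and passes; one byte more rounds up to 0x10000 paragraphs and fails.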

Note that while tools/build does validate the size of the boot sector, the size of the kernel image and the lower bound of the setup size, it does not check the *upper* bound of the setup size. Therefore it is easy to build a broken kernel by just adding some large ".space" at the end of setup.S.

1.2 Booting: Overview

The boot process details are architecture-specific, so we shall focus our attention on the IBM PC/IA32 architecture. Due to old design and backward compatibility, the PC firmware boots the operating system in an old-fashioned manner. This process can be separated into the following six logical stages:

BIOS selects the boot device.

BIOS loads the bootsector from the boot device.

Bootsector loads setup, decompression routines and compressed kernel image.

The kernel is uncompressed in protected mode.

Low-level initialisation is performed by asm code.

High-level C initialisation.

1.3 Booting: BIOS POST

The power supply starts the clock generator and asserts the #POWERGOOD signal on the bus.

CPU #RESET line is asserted (CPU now in real 8086 mode).

%ds=%es=%fs=%gs=%ss=0, %cs=0xF000 (with a hidden segment base of 0xFFFF0000), %eip=0x0000FFF0 (ROM BIOS POST code).

All POST checks are performed with interrupts disabled.

IVT (Interrupt Vector Table) initialised at address 0.

The BIOS Bootstrap Loader function is invoked via int 0x19, with %dl containing the boot device 'drive number'. This loads track 0, sector 1 at physical address 0x7C00 (0x07C0:0000).
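The notation 0x07C0:0000 is a real-mode segment:offset pair. A minimal sketch of the translation to a physical address follows; the function name is an assumption for illustration, and address wrap-around at 1M (the A20 line) is deliberately ignored.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
 * Wrap-around above 1M (A20 gating) is not modelled in this sketch. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Both 0x07C0:0x0000 and 0x0000:0x7C00 translate to physical address 0x7C00, which is why the load address can be written either way.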

1.4 Booting: bootsector and setup

The bootsector used to boot the Linux kernel can be one of the following:

Linux bootsector (arch/i386/boot/bootsect.S),

LILO (or other bootloader's) bootsector, or

no bootsector (loadlin etc)

We consider here the Linux bootsector in detail. The first few lines initialise the convenience macros to be used for segment values:

--------------------------------------------------------------------------------

29 SETUPSECS = 4            /* default nr of setup-sectors */
30 BOOTSEG   = 0x07C0       /* original address of boot-sector */
31 INITSEG   = DEF_INITSEG  /* we move boot here - out of the way */
32 SETUPSEG  = DEF_SETUPSEG /* setup starts here */
33 SYSSEG    = DEF_SYSSEG   /* system loaded at 0x10000 (65536) */
34 SYSSIZE   = DEF_SYSSIZE  /* system size: # of 16-byte clicks */

--------------------------------------------------------------------------------

(The numbers on the left are the line numbers in bootsect.S.) The values of DEF_INITSEG, DEF_SETUPSEG, DEF_SYSSEG and DEF_SYSSIZE are taken from include/asm/boot.h:

--------------------------------------------------------------------------------

/* Don't touch these, unless you really know what you're doing. */
#define DEF_INITSEG  0x9000
#define DEF_SYSSEG   0x1000
#define DEF_SETUPSEG 0x9020
#define DEF_SYSSIZE  0x7F00

--------------------------------------------------------------------------------
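To make these constants more concrete, the sketch below derives a few byte quantities from them. The four DEF_* values are copied from the excerpt; the enum names are hypothetical and exist only for this illustration.

```c
/* Values copied from the boot.h excerpt. */
#define DEF_INITSEG  0x9000
#define DEF_SYSSEG   0x1000
#define DEF_SETUPSEG 0x9020
#define DEF_SYSSIZE  0x7F00

/* A segment value, or a count of 16-byte clicks, becomes a byte
 * quantity when multiplied by 16. The names below are illustrative. */
enum {
    /* distance from the relocated bootsector to the setup code */
    SETUP_OFFSET_BYTES = (DEF_SETUPSEG - DEF_INITSEG) * 16,
    /* default system size expressed in bytes */
    SYS_MAX_BYTES      = DEF_SYSSIZE * 16,
    /* gap between the system load address and the relocated bootsector */
    SYS_WINDOW_BYTES   = (DEF_INITSEG - DEF_SYSSEG) * 16
};
```

SETUP_OFFSET_BYTES comes out as 512, so the setup segment begins exactly one sector after the start of INITSEG. SYS_MAX_BYTES is 0x7F000 (508K), slightly below the 0x80000 bytes that separate 0x10000 from 0x90000.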

Now, let us consider the actual code of bootsect.S:

--------------------------------------------------------------------------------

54      movw    $BOOTSEG, %ax
55      movw    %ax, %ds
56      movw    $INITSEG, %ax
57      movw    %ax, %es
58      movw    $256, %cx
59      subw    %si, %si
60      subw    %di, %di
61      cld
62      rep
63      movsw
64      ljmp    $INITSEG, $go
65 # bde - changed 0xff00 to 0x4000 to use debugger at 0x6400 up (bde). We
66 # wouldn't have to worry about this if we checked the top of memory. Also
67 # my BIOS can be configured to put the wini drive tables in high memory
68 # instead of in the vector table. The old stack might have clobbered the
69 # drive table.
70 go:  movw    $0x4000-12, %di     # 0x4000 is an arbitrary value >=
71                                  # length of bootsect + length of
72                                  # setup + room for stack;
73                                  # 12 is disk parm size.
74      movw    %ax, %ds            # ax and es already contain INITSEG
75      movw    %ax, %ss
76      movw    %di, %sp            # put stack at INITSEG:0x4000-12.

--------------------------------------------------------------------------------

Lines 54-63 move the bootsector code from address 0x7C00 to 0x90000. This is achieved by:

set %ds:%si to $BOOTSEG:0 (0x7C0:0 = 0x7C00)

set %es:%di to $INITSEG:0 (0x9000:0 = 0x90000)

set the number of 16bit words in %cx (256 words = 512 bytes = 1 sector)

clear DF (direction) flag in EFLAGS to auto-increment addresses (cld)

go ahead and copy 512 bytes (rep movsw)

The code deliberately uses rep movsw rather than rep movsd (hint: .code16).
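For readers less familiar with x86 string instructions, the copy performed by lines 58-63 can be modelled in C roughly as follows. This is a sketch of the semantics only; the function name is an assumption, and segment registers are replaced by plain pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of `cld; rep movsw` with %cx = count:
 * copy `count` 16-bit words from src to dst, lowest address first. */
static void rep_movsw_forward(uint16_t *dst, const uint16_t *src, size_t count)
{
    while (count != 0) {   /* rep: repeat until %cx reaches 0 */
        *dst++ = *src++;   /* movsw with DF = 0: copy one word, then
                              advance %si and %di by 2 bytes */
        count--;
    }
}
```

Calling it with a count of 256 copies 512 bytes, i.e. exactly one sector, matching the value loaded into %cx on line 58.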

Line 64 jumps to the label go: in the newly made copy of the bootsector, i.e. in segment 0x9000. The instructions on lines 70-76 then set up the stack at $INITSEG:0x4000-0xC, i.e. %ss = $INITSEG (0x9000) and %sp = 0x3FF4 (0x4000-0xC). This stack placement is where the limit on setup size mentioned earlier comes from (see Building the Linux Kernel Image).

Lines 77-103 patch the disk parameter table for the first disk to allow multi-sector reads:

--------------------------------------------------------------------------------

77  # Many BIOS's default disk parameter tables will not recognise
78  # multi-sector reads beyond the maximum sector number specified
79  # in the default diskette parameter tables - this may mean 7
80  # sectors in some cases.
81  #
82  # Since single sector reads are slow and out of the question,
83  # we must take care of this by creating new parameter tables
84  # (for the first disk) in RAM. We will set the maximum sector
85  # count to 36 - the most we will encounter on an ED 2.88.
86  #
87  # High doesn't hurt. Low does.
88  #
89  # Segments are as follows: ds = es = ss = cs - INITSEG, fs = 0,
90  # and gs is unused.
91      movw    %cx, %fs            # set fs to 0
92      movw    $0x78, %bx          # fs:bx is parameter table address
93      pushw   %ds
94      ldsw    %fs:(%bx), %si      # ds:si is source
95      movb    $6, %cl             # copy 12 bytes
96      pushw   %di                 # di = 0x4000-12.
97      rep                         # don't need cld -> done on line 66
98      movsw
99      popw    %di
100     popw    %ds
101     movb    $36, 0x4(%di)       # patch sector count
102     movw    %di, %fs:(%bx)
103     movw    %es, %fs:2(%bx)

--------------------------------------------------------------------------------
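The effect of lines 93-101 can be summarised in C as below. This is a hedged sketch rather than a translation: the function and macro names are hypothetical, plain arrays stand in for real-mode memory, and the final step on lines 102-103 (storing the far pointer to the new table at 0:0x78) is not modelled.

```c
#include <stdint.h>
#include <string.h>

#define DPT_SIZE          12  /* "12 is disk parm size" in the listing */
#define DPT_SECTOR_OFFSET 4   /* byte patched by `movb $36, 0x4(%di)` */
#define MAX_SECTORS       36  /* ED 2.88 value from the source comment */

/* Hypothetical helper mirroring lines 93-101: copy the BIOS table into
 * `new_table` (RAM owned by the bootsector) and raise the sector count.
 * The BIOS copy itself is left untouched. */
static void patch_disk_param_table(uint8_t new_table[DPT_SIZE],
                                   const uint8_t bios_table[DPT_SIZE])
{
    memcpy(new_table, bios_table, DPT_SIZE);     /* rep movsw, 6 words */
    new_table[DPT_SECTOR_OFFSET] = MAX_SECTORS;  /* patch sector count */
}
```

Copying the table before patching it matters because the original may live in ROM or in an area the BIOS still relies on; only the private copy at INITSEG:0x4000-12 is modified.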

The floppy disk controller is reset using BIOS service int 0x13 function 0 (reset FDC) and setup sectors a
