Reading notes: an x86 system call primer

王朝 other · author unknown · 2006-01-31

Original author: Russ Blaine

Original article:

http://blogs.sun.com/roller/page/rab

Translator and annotator: Badcoffee

Email: blog.oliver@gmail.com

Blog: http://blog.csdn.net/yayong

July 2005

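Preface: Getting started on something as complex as an operating system is a headache. To help newcomers find their bearings, this article looks at the basic system call infrastructure of Solaris x86 and Solaris x64.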

x86 syscall primer

Getting started on a project as complex as an operating system can be quite a daunting task. To help OpenSolaris newcomers sort out their head from their tail, here's a look at the system call infrastructure on Solaris x86 and Solaris x64.

I'll go over the different system call methods used, their departure points in userland and entry points in the kernel, and then we'll actually follow one into the kernel with the debugger to see it all in action.

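Notes:

1. In my view the best place to start learning an operating system is the system call, because system calls are the entry point from user mode into kernel mode. Evidently we are not the only ones who find operating systems complex: even a kernel developer calls it "quite a daunting task". So don't lose heart when you get stuck.

2. "Sort out their head from their tail" is an idiom meaning roughly "get one's bearings".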

Background

Processors in the x86 world support a number of different system call methods, and some are faster than others. In Solaris, unoptimized system calls take one of three possible paths into the kernel:

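Notes:

3. x86 processors support several system call methods, and some are faster than others. In Solaris, an unoptimized system call takes one of three possible paths into the kernel.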

lcall $0x27
Used for years as the standard Solaris syscall method.

int $0x91
Used by Linux for years. Solaris finally adopted int as the base syscall method in Solaris 11 (under development) and earned a significant performance increase as a result. It will be available soon in a Solaris 10 update.

lcall $0x7
Used by some (very old) statically linked binaries.

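Notes:

4. lcall uses the x86 call gate mechanism. lcall $0x27 is the standard way Solaris makes system calls; lcall $0x7 appears in very old statically linked Solaris binaries.

5. The int method uses the interrupt gate that x86 provides. int $0x91 is the method Solaris is about to adopt in Solaris 11. It improves performance significantly and will soon appear in a Solaris 10 update as well.

Linux and FreeBSD use the same mechanism, except that they use int $0x80; only the interrupt vector number differs.

x86 CPUs support four kinds of gate:

Interrupt gate -- used by Windows/Linux/Unix systems for interrupt handling and system calls

Trap gate -- generally used for exception handling

Call gate -- used by Linux/Unix to implement system services, for compatibility with older applications

Task gate -- not used by modern operating systems because it is slow and limits the number of tasks; only early Linux 2.0 used it

For details on x86 gates, see Volume 3 (System Programming) of the Intel Pentium 4 manuals.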

Fast Syscalls and Hardware Capability Libraries

When a well-behaved application makes a system call, it jumps through a wrapper function in libc. Changing the instruction used to enter the kernel becomes a matter of changing the wrappers in libc. Recently I integrated support for faster, chip-proprietary system calls into Solaris 10: sysenter (from Intel) and syscall (from AMD). Along with new kernel entry points, new hwcap (as in "hardware capability") versions of libc were provided to take advantage of these new, faster instructions (Tim Marsland has written about the hw capability architecture and Darren Moffat has written about how the system goes about selecting and using a hwcap libc).

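Notes:

6. An application normally makes a system call through a wrapper function in libc. The wrapper ultimately enters kernel mode using one of the kernel-entry instructions the CPU provides.

7. The author recently integrated support for the faster, chip-specific system call instructions into Solaris 10: sysenter from Intel and syscall from AMD. Along with new kernel entry points, new hardware capability versions of libc were provided that use these faster instructions. The author also links to two related articles, both about the hwcap libraries.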

I often get confused about which system call method is used on which type of system. For the record, the following table shows which methods are supported by the various flavor combinations of x86 kernels, CPUs, and user application types shipping today:

u64 = 64-bit user applications
u32 = 32-bit user applications

                              syscall                          sysenter
64-bit kernel  Intel Xeon     u64 (64-bit libc)                u32 (hwcap1)
               AMD Opteron    u64 (64-bit libc), u32 (hwcap2)  -
32-bit kernel  Intel Xeon     -                                u32 (hwcap1)
               AMD Opteron    u32                              u32 (hwcap1)

(The hwcap libraries referenced live in the /usr/lib/libc directory.)
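The table above can also be read as a lookup from (kernel, CPU, application type) to the supported fast system call instruction. The Python sketch below only transcribes the table; the names FAST_SYSCALL and fast_syscall_methods are made up for illustration, and the row for 32-bit applications on a 32-bit Opteron kernel reflects one reading of the table, which lists a plain "u32" under syscall without naming a libc.

```python
# Transcription of the table above: which fast system call instructions
# are listed for each (kernel bits, CPU, application type) combination.
# Each value maps instruction -> libc variant named in the table.
FAST_SYSCALL = {
    (64, "Intel Xeon", "u64"): {"syscall": "64-bit libc"},
    (64, "Intel Xeon", "u32"): {"sysenter": "hwcap1"},
    (64, "AMD Opteron", "u64"): {"syscall": "64-bit libc"},
    (64, "AMD Opteron", "u32"): {"syscall": "hwcap2"},
    (32, "Intel Xeon", "u32"): {"sysenter": "hwcap1"},
    # The table lists plain "u32" under syscall here, with no libc named.
    (32, "AMD Opteron", "u32"): {"syscall": None, "sysenter": "hwcap1"},
}


def fast_syscall_methods(kernel_bits, cpu, app):
    """Return {instruction: libc variant} for one combination, or {} if none."""
    return FAST_SYSCALL.get((kernel_bits, cpu, app), {})


if __name__ == "__main__":
    for key in sorted(FAST_SYSCALL):
        print(key, "->", fast_syscall_methods(*key))
```

Combinations that cannot occur, such as a 64-bit application on a 32-bit kernel, simply return an empty dictionary.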

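Notes:

8. The table above shows which libc variant is used on Intel Xeon and AMD Opteron under 32-bit and 64-bit kernels.

With a 64-bit Solaris kernel, the 64-bit libc (u64) uses the syscall instruction on both Xeon and Opteron, presumably because AMD led the way in 64-bit technology and Intel had to follow.

A 64-bit Solaris kernel also provides a 32-bit libc for 32-bit applications, and here Solaris supplies a different 32-bit libc for each CPU:

u32 (hwcap1) -- the hardware capability 1 version of libc, which supports Intel's fast system call instructions SYSENTER/SYSEXIT

u32 (hwcap2) -- the hardware capability 2 version of libc, which supports AMD's fast system call instructions SYSCALL/SYSRET

With a 32-bit Solaris kernel, both AMD and Intel use the hardware capability 1 version of libc.

Intel has supported the fast system call instructions SYSENTER/SYSEXIT since the Pentium II 300 (Family 6, Model 3, Stepping 3). AMD's Opteron is compatible with them in 32-bit mode. In 64-bit mode AMD took the lead with its SYSCALL/SYSRET fast system call instructions, and Intel's EM64T had to be compatible with them.

For more on the Intel and AMD fast system call instructions, see the article "Linux 2.6 对新型 CPU 快速系统调用的支持" (Linux 2.6 support for fast system calls on new CPUs). For the full story, read the Intel and AMD system programming manuals.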

Digging In

To illustrate this, let's take a look at the libc source code. It lives under the usr/src/lib/libc directory. The important entries here are:

i386/ - 32-bit source code and unoptimized binary
amd64/ - 64-bit source code and binary
i386_hwcap1/ - Intel CPU-specific source code and binary
i386_hwcap2/ - AMD CPU-specific source code and binary

Notes:

9. This gives the libc source paths. Reading syscall.s under i386/sys and amd64/sys, together with the macro definitions in the Makefiles under i386_hwcap1 and i386_hwcap2, shows how the four libc variants differ.

A simple system call to use for this example is mkdir(2). We can use mdb to disassemble the text bits and see how libc jumps into the kernel:

rab> mdb /lib/libc.so.1
Loading modules: [ libc.so.1 ]
> mkdir::dis
mkdir: movl $0x50,%eax
mkdir+5: syscall
mkdir+7: jb -0x82847 <__cerror>
mkdir+0xd: ret

We can see that the system call number (see Eric Schrock's post for more information on system call numbers) is stashed away in register %eax so the kernel can find it later, and then the syscall instruction is executed to transfer control to the kernel.

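Notes:

10. Disassembling the libc mkdir(2) system call with mdb shows that it is just a simple wrapper function: it puts the system call number into the eax register and then uses the syscall instruction to enter the kernel.

12. The system call number of mkdir is 0x50, which is 80 in decimal. Its definition can be found in syscall.h:

#define SYS_mkdir 80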

This example is on an AMD Opteron system; otherwise we'd expect to find either lcall $0x27 or sysenter as the control transfer instruction. We can get at the unoptimized libc by unmounting the hwcap library:

rab> su
Password:
# umount /lib/libc.so.1
rab> mdb /lib/libc.so.1
Loading modules: [ libc.so.1 ]
> mkdir::dis
mkdir: movl $0x50,%eax
mkdir+5: lcall $0x27,$0x0
mkdir+0xc: jb -0x82b2c <__cerror>
mkdir+0x12: ret
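The offsets in the two disassembly listings follow directly from the instruction lengths. The Python sketch below uses the standard 32-bit x86 encodings (my addition, not something the article states) to recompute where each instruction starts; LENGTHS and offsets are illustrative names.

```python
# Byte lengths of the 32-bit x86 encodings involved:
#   movl $imm32,%eax   -> B8 + 4-byte immediate                 = 5 bytes
#   syscall            -> 0F 05                                  = 2 bytes
#   lcall $seg,$off    -> 9A + 4-byte offset + 2-byte selector   = 7 bytes
#   jb rel32           -> 0F 82 + 4-byte displacement            = 6 bytes
#   ret                -> C3                                     = 1 byte
LENGTHS = {"movl": 5, "syscall": 2, "lcall": 7, "jb": 6, "ret": 1}


def offsets(instructions):
    """Return the starting offset of each instruction in the sequence."""
    result, position = [], 0
    for name in instructions:
        result.append(position)
        position += LENGTHS[name]
    return result


if __name__ == "__main__":
    hwcap2_body = ["movl", "syscall", "jb", "ret"]
    base_body = ["movl", "lcall", "jb", "ret"]
    for body in (hwcap2_body, base_body):
        print([f"mkdir+{off:#x}" for off in offsets(body)])
```

The syscall variant gives offsets 0, 5, 7 and 0xd, and the lcall variant gives 0, 5, 0xc and 0x12, matching both mdb listings. The 5-byte difference is just the longer far-call encoding.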

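Notes:

13. With the hwcap libc.so.1 unmounted, what remains is the unoptimized libc, and the instruction that issues the system call has become lcall $0x27. The author presumably ran this experiment on Solaris 10; on OpenSolaris the unoptimized libc should already use int $0x91 for system calls. See notes 15 and 16 below.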

Tracing it back to the source

Ah-hah - now let's look at the source for the libc mkdir(2) wrapper to complete the userland picture:

rab> pwd
.../usr/src/lib/libc/common/sys
rab> cat mkdir.s
[ snip ]
#include "SYS.h"
SYSCALL_RVAL1(mkdir)
RET
SET_SIZE(mkdir)

Notes:

14. This shows how mkdir is implemented in libc: it is just the SYSCALL_RVAL1 macro. Judging by the name, this macro is meant for system calls that return a single value.

In order to organize the source in a portable way that avoids reproducing the same code in more than one place, many portions of libc are implemented as preprocessor macros. mkdir(2) is so simple that it needs nothing but the SYSCALL macro, found in SYS.h. For reasons too boring to repeat here, the SYSCALL macro eventually expands into a corresponding SYSTRAP macro. All 32-bit variants of libc share one SYS.h, and preprocessor macros defined via Makefiles in the binary directories determine which instructions go into the SYSTRAP macro:

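Notes:

15. The SYSCALL* macros exist mainly to avoid duplicating code in many places; each expands into a corresponding SYSTRAP macro. Which SYSTRAP macro the SYSCALL* macros in SYS.h use is decided by the macro definitions in the Makefiles under the i386_hwcap1 and i386_hwcap2 directories.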

rab> pwd
.../usr/src/lib/libc/i386/inc
rab> grep SYSTRAP_RVAL1 SYS.h
#define SYSTRAP_RVAL1(name) __SYSCALL(name)
#define SYSTRAP_RVAL1(name) __SYSENTER(name)
#define SYSTRAP_RVAL1(name) __SYSLCALL(name)

One of the above macros is used depending on which libc is being built: __SYSCALL() for hwcap2, __SYSENTER() for hwcap1, and __SYSLCALL() for the unoptimized base libc at /lib/libc.so.1.
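As a quick reference, the choice among the three definitions can be written down as a small table. The Python sketch below just restates the paragraph above; the dictionary keys and the describe helper are illustrative names, and the instruction shown for the base libc is the lcall $0x27 seen in the earlier disassembly.

```python
# Which SYSTRAP_RVAL1 definition each 32-bit libc build uses, per the text.
# Keys and helper name are illustrative only.
SYSTRAP_RVAL1 = {
    "hwcap2": ("__SYSCALL", "syscall"),     # AMD-specific libc
    "hwcap1": ("__SYSENTER", "sysenter"),   # Intel-specific libc
    "base": ("__SYSLCALL", "lcall $0x27"),  # unoptimized /lib/libc.so.1
}


def describe(variant):
    """Spell out the macro chain for one libc build variant."""
    macro, instruction = SYSTRAP_RVAL1[variant]
    return f"{variant}: SYSTRAP_RVAL1(name) -> {macro}(name) -> {instruction}"


if __name__ == "__main__":
    for variant in SYSTRAP_RVAL1:
        print(describe(variant))
```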

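Notes:

16. As shown, depending on the macro definitions in the Makefiles under the i386_hwcap1 and i386_hwcap2 directories, libc is built as the hwcap2 version using __SYSCALL(), the hwcap1 version using __SYSENTER(), or the unoptimized version (as noted earlier, lcall $0x27 on Solaris 10 and int $0x91 on OpenSolaris).

In fact, no 32-bit libc, not even the hwcap1 libc, implements every system call with __SYSENTER(). System calls with multiple return values still use lcall $0x27 or int $0x91. The OpenSolaris 32-bit libc source file SYS.h contains these definitions:

#define SYSTRAP_RVAL2(name) __SYSCALLINT(name)
#define SYSTRAP_2RVALS(name) __SYSCALLINT(name)
#define SYSTRAP_64RVAL(name) __SYSCALLINT(name)

So OpenSolaris implements system calls with multiple return values using int $0x91.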

rab> cat SYS.h
[ snip ]
#define __SYSLCALL(name) /* CSTYLED */ movl $SYS_/**/name, %eax; lcall $SYSCALL_TRAPNUM, $0
[ snip ]
#define __SYSCALL(name) /* CSTYLED */ movl $SYS_/**/name, %eax; .byte 0xf, 0x5 /* syscall */

We added support for AMD's syscall instruction to Solaris, but we were using a slightly older version of our assembler which (embarrassingly enough) didn't yet recognize the instruction, so its opcode had to be manually hard-coded into libc.
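To make the hard-coded opcode concrete, here is a rough Python sketch of the bytes a __SYSCALL(mkdir) expansion would assemble to. It assumes the assembler uses the usual 5-byte B8 encoding for the movl, which is consistent with the mkdir+5 offset in the disassembly; expand_syscall is a made-up helper name.

```python
import struct

SYS_mkdir = 80  # 0x50, the value seen in the disassembly of mkdir


def expand_syscall(sysnum):
    """Bytes for 'movl $sysnum, %eax; .byte 0xf, 0x5' in 32-bit code."""
    mov_eax_imm32 = b"\xb8" + struct.pack("<I", sysnum)  # movl $sysnum, %eax
    syscall_opcode = bytes([0x0F, 0x05])                 # .byte 0xf, 0x5
    return mov_eax_imm32 + syscall_opcode


if __name__ == "__main__":
    code = expand_syscall(SYS_mkdir)
    print(" ".join(f"{byte:02x}" for byte in code))
```

The last two bytes are exactly what the .byte directive emits, so an assembler that does not know the syscall mnemonic can still produce the instruction.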

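Notes:

17. Because the assembler used for the build was slightly out of date and did not yet recognize the AMD Opteron syscall instruction, the __SYSCALL macro uses the instruction's machine code directly.

The implementation that supports the new int $0x91 can also be found in the OpenSolaris SYS.h:

#define __SYSCALLINT(name) /* CSTYLED */ movl $SYS_/**/name, %eax; int $T_SYSCALLINT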

Jumping Over the Fence

That's all for userland; the easy part is over. Because the actual workings of the differing system call instructions vary widely, the kernel uses separate code paths to deal with each. The function entry points used are (shown are only those for 32-bit applications making system calls):

                Entry Instruction   Kernel Entry Point
64-bit kernel   lcall*              trap()
                syscall             sys_syscall32()
                sysenter            sys_sysenter()
32-bit kernel   lcall               sys_call()
                sysenter            sys_sysenter()

* In the 64-bit kernel, 32-bit system calls made via lcall come in to the system via a segment-not-present trap (#np), a matter which is beyond the scope of this document. Trust me, you don't want to get into segmentation now...
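When deciding where to set a breakpoint, the table above works as a simple lookup. The Python sketch below transcribes it for 32-bit applications only, as the table does; ENTRY_POINTS and kernel_entry are illustrative names.

```python
# Kernel entry points for system calls made by 32-bit applications,
# transcribed from the table above.
ENTRY_POINTS = {
    (64, "lcall"): "trap()",  # arrives via a segment-not-present trap (#np)
    (64, "syscall"): "sys_syscall32()",
    (64, "sysenter"): "sys_sysenter()",
    (32, "lcall"): "sys_call()",
    (32, "sysenter"): "sys_sysenter()",
}


def kernel_entry(kernel_bits, instruction):
    """Return the entry point name, or None if the table has no such row."""
    return ENTRY_POINTS.get((kernel_bits, instruction))


if __name__ == "__main__":
    print(kernel_entry(64, "syscall"))
```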

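Notes:

18. The table above lists only the kernel entry points for system calls made by 32-bit applications. To support the various system call instructions, the kernel implements a separate handler code path for each.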

Seeing it in Action

Using the kernel debugger we can step out of the classroom and watch these creatures in their native wild habitats. Boot a machine and from the system console get the kernel debugger loaded and ready. Enter the debugger, and then set a breakpoint on the syscall entry point. I'm still using the same Opteron machine as above (running the 64-bit kernel), so I need to re-mount the hwcap library:

root> mount -O -F lofs /usr/lib/libc/libc_hwcap2.so.1 /lib/libc.so.1

Notes:

19. Because the author unmounted the hwcap2 libc earlier, the library has to be mounted on /lib/libc.so.1 again in order to use the hwcap2 version.

root> mdb -K
Welcome to kmdb
Loaded modules: [ cpc ptm ufs unix krtld sppp nca lofs genunix ip logindmux usba specfs nfs random sctp ]
[0]> sys_syscall32:b
[0]> :c
kmdb: stop at sys_syscall32
kmdb: target stopped at:
sys_syscall32: swapgs
[1]> ::cpuinfo
ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
0  fffffffffbc230a0 1b  0    0    60  no   no    t-0    ffffffff82b38520 fsflush
1  ffffffff8bdd1800 1b  0    0    49  no   no    t-0    ffffffff8cc991e0 ksh

We set a breakpoint, and tripped over it immediately after continuing (because system calls are a very common occurrence on even an idle machine). We can see that CPU1 tripped the breakpoint first (as evidenced by the [1] in the kmdb prompt), and that ksh is the process running. Which system call is the shell making?

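Notes:

20. The author uses mdb -K to enter kmdb and sets a kernel breakpoint on sys_syscall32, the entry point for system calls made by 32-bit applications on the 64-bit kernel (see the table above), then uses :c to resume the kernel:

[0]> sys_syscall32:b ; set the breakpoint
[0]> :c ; resume execution

21. Even on an idle machine system calls happen very frequently, so CPU 1 quickly reaches the breakpoint. The kmdb prompt then shows [1], meaning execution stopped on CPU 1. ::cpuinfo shows that the user process ksh is running on CPU 1.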

Remember that the libc wrapper function stashed the system call number in register %eax. When we are in the 64-bit kernel, %eax is the lower 32 bits of register %rax:

[1]> <rax=D
98
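The relationship between the two registers is a plain bit mask. The Python sketch below shows it with a made-up %rax value whose upper half holds unrelated bits; eax_from_rax is an illustrative helper, not anything mdb provides.

```python
def eax_from_rax(rax):
    """%eax is the low 32 bits of the 64-bit register %rax."""
    return rax & 0xFFFFFFFF


if __name__ == "__main__":
    # Hypothetical %rax value: the low 32 bits hold 0x62.
    rax = 0xDEADBEEF00000062
    print(eax_from_rax(rax))  # the system call number, in decimal
```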

Notes:

22. The libc wrapper stores the system call number in %eax. On the Opteron, %eax is the low 32 bits of %rax, so the author inspects %rax directly and prints it in decimal.

syscall 98, which -- according to the sysent table (see sysent.c) -- is the shell doing a sigaction(2) (which makes sense, because shells are always messing around with signals).

Notes:

23. System call 98 is sigaction(2), which makes sense because shells work with signals all the time.

Clear the breakpoint and try the same thing with the 64-bit entry point (it is sys_syscall()), but this time enter the debugger by sending a break over the console (how one does this varies depending on the terminal being used to access the console):

[1]> :z
[1]> sys_syscall:b
[1]> :c
root>
root>
root>

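Notes:

24. Clear the previous breakpoint, set a breakpoint on sys_syscall, the entry point for system calls made by 64-bit applications on the 64-bit kernel, and resume.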

Because this is an otherwise idle machine, nothing trips the 64-bit syscall breakpoint just yet. There just aren't very many 64-bit processes running. We can run one manually to trigger the breakpoint:

root> /usr/bin/amd64/ls
kmdb: stop at sys_syscall
kmdb: target stopped at:
sys_syscall: swapgs
[1]> <rax=D
115

We see that the first 64-bit system call made by the 64-bit ls is mmap(2), which makes sense because the 64-bit dynamic linker needs to begin setting up the new process's address space.

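Notes:

25. Because this is an idle machine with few 64-bit applications running, the breakpoint is not hit after resuming. The author therefore runs the 64-bit ls by hand to trigger it. The system call number turns out to be that of mmap(2), which also makes sense: when a program starts, the 64-bit dynamic linker first uses mmap to set up the new process's address space.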
