Reading notes: Library Bindings - Let's Be a Little Bit More Precise
Original author: Michael Walker
Original article: http://blogs.sun.com/roller/page/msw
Translator and annotator: Badcoffee
Email: blog.oliver@gmail.com
Blog: http://blog.csdn.net/yayong
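June 2005
Translator's preface: How do ELF executables on Linux and Solaris load their dependent shared libraries and bind symbols at run time through the runtime linker (ld.so.1 on Solaris)? This article describes a new symbol binding method on Solaris: direct binding.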
Library Bindings - let's be a little bit more precise shall we
Well, now that OpenSolaris is officially open for business, I think it's well past time for another blog entry from me. This entry gives a little history on how an executable (or shared object) on Solaris binds to its various dependencies. My goal is to give some insight into what we do today, as well as highlight an alternative binding technology you may or may not be familiar with: Direct Bindings with ld -Bdirect. The Direct Bindings model is a much more precise model when resolving references between objects.
But first, a little history on what we currently do. Solaris (and Unix-like systems in general) does the following when a process is executed. The kernel loads the required program (an ELF object) into memory and also loads the runtime linker (ld.so.1(1)) into memory. The kernel then initially transfers control to the runtime linker. It's the runtime linker's job to examine the program loaded and find any dependencies it has (in the form of shared objects), load those shared objects into memory, and then perform all of the symbol bindings (function calls, data references, etc.) from the program to each of those dependencies. Of course, as it loads each shared object it must in turn do the same examination on each of them and load any dependencies they require. Once all of the dependencies are loaded and their symbols have been bound, the runtime linker fires the .init sections of each shared object loaded and finally transfers control to the executable, which calls main(). Most people think a process starts with main(), but amazing things happen before we even get there.
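Translator's notes:
1. When a process is executed, the ELF file is loaded into memory and, at the same time, the runtime linker (ld.so.1) is also mapped into memory.
2. The kernel initially passes control to the runtime linker, whose job is to examine the shared libraries the program depends on, map those libraries into memory, and perform the symbol binding.
3. Once all dependencies are loaded into memory and their symbols are bound, the runtime linker calls the .init section of each shared library and transfers control to the executable, which calls main.
All of the above works similarly on Linux and Solaris.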
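The recursive loading described above (each newly loaded object is examined in turn for its own dependencies) can be sketched as a breadth-first walk over a dependency table. This is a toy model, not ld.so.1's code: the object names mirror the prog example below, the dependency edges (in particular libc.so.1 needing libm.so.2) are assumptions made for illustration, and breadth-first order is assumed because it reproduces the link-map order shown later in the article. `struct object`, `find_object`, `already_loaded` and `load_order` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 8

/* Hypothetical dependency table modelled on the prog example.
 * The libc.so.1 -> libm.so.2 edge is an assumption for illustration. */
struct object {
    const char *name;
    const char *needed[MAX_DEPS]; /* NULL-terminated list of dependencies */
};

static const struct object objects[] = {
    { "prog",      { "./foo.so", "./bar.so", "libc.so.1", NULL } },
    { "./foo.so",  { "libc.so.1", NULL } },
    { "./bar.so",  { "libc.so.1", NULL } },
    { "libc.so.1", { "libm.so.2", NULL } },
    { "libm.so.2", { NULL } },
};

static const struct object *find_object(const char *name) {
    for (size_t i = 0; i < sizeof objects / sizeof objects[0]; i++)
        if (strcmp(objects[i].name, name) == 0)
            return &objects[i];
    return NULL;
}

static int already_loaded(const char *const *order, size_t count,
                          const char *name) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(order[i], name) == 0)
            return 1;
    return 0;
}

/* Breadth-first load: order[] doubles as the work queue. Each object is
 * appended once, the first time it is seen, and is examined for its own
 * dependencies when the walk reaches it. Returns the number of objects. */
size_t load_order(const char *root, const char **order, size_t max) {
    size_t count = 0;
    if (max == 0)
        return 0;
    order[count++] = root;
    for (size_t next = 0; next < count; next++) {
        const struct object *obj = find_object(order[next]);
        if (obj == NULL)
            continue; /* no recorded dependencies */
        for (size_t d = 0; d < MAX_DEPS && obj->needed[d] != NULL; d++) {
            const char *dep = obj->needed[d];
            if (count < max && !already_loaded(order, count, dep))
                order[count++] = dep;
        }
    }
    return count;
}
```

For this table, `load_order("prog", order, 8)` yields prog, ./foo.so, ./bar.so, libc.so.1, libm.so.2. libc.so.1 is needed by three objects but loaded once; a depth-first walk would instead have placed libc.so.1 ahead of ./bar.so, which does not match the link-map order the article shows.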
Here we will look specifically at how the runtime linker binds the various symbol references between all of the objects loaded into memory. Let's take a simple example first: an application that links against a couple of shared objects and libc.
% more *.c
::::::::::::::
bar.c
::::::::::::::
#include <stdio.h>
void bar()
{
printf("inside of bar\n");
}
::::::::::::::
foo.c
::::::::::::::
#include <stdio.h>
void foo() {
printf("inside of foo\n");
}
::::::::::::::
prog.c
::::::::::::::
#include <stdio.h>
int
main(int argc, char *argv[]){
extern void foo();
extern void bar();
foo();
bar();
return (0);
}
% cc -G -o foo.so -Kpic foo.c -lc
% cc -G -o bar.so -Kpic bar.c -lc
% cc -o prog prog.c ./foo.so ./bar.so
We've now got a program, prog, which is bound against three shared objects: foo.so, bar.so and libc.so.1. The program makes two function calls, one to foo() and one to bar(), located in its dependent shared objects. Running ldd on the executable shows its dependencies, and running the program shows the execution flow:
% ldd prog
./foo.so => ./foo.so
./bar.so => ./bar.so
libc.so.1 => /lib/libc.so.1
libm.so.2 => /lib/libm.so.2
/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1
% ./prog
inside of foo
inside of bar
%
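Translator's notes:
4. When ld produces an ELF file, it records the dependencies on all libraries inside the file itself; elfdump can be used to view the .dynamic and .SUNW_version sections of any ELF file.
5. ldd reads the dependencies of an ELF file and prints them.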
Nothing too fancy really, but it's an example we can use to examine what bindings are going on. When the program prog makes references to foo and bar, it's up to the runtime linker to find definitions for these functions and bind the program to them. First the runtime linker loads the dependent shared objects listed above. As each object is loaded into memory, a Link Map entry is created for it, and the entries are appended onto a Link Map list in the order in which the objects are loaded. In the case above the Link Map list would contain:
prog -> foo.so -> bar.so -> libc.so.1 -> libm.so.2 -> libc_psr.so.1
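Translator's notes:
6. As the runtime linker loads the dependent shared libraries, it builds a data structure called the link map. What is shown here is a simplified representation of the link map, essentially a linear list. Linux has a similar link map.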
When the runtime linker needs to find a definition for a symbol, it starts at the head of the list and searches each object for that symbol. If the symbol is found, the runtime linker binds to it; if not, it proceeds to the next object on the list. The following should help demonstrate what's happening. I will run the prog program with some runtime linker diagnostics turned on to trace what it is doing. I'm concentrating specifically on foo and bar for this example; of course there are thousands of other bindings going on:
% LD_DEBUG=symbols,bindings ./prog
...
20579: 1: symbol=foo; lookup in file=./prog [ ELF ]
20579: 1: symbol=foo; lookup in file=./foo.so [ ELF ]
20579: 1: binding file=./prog to file=./foo.so: symbol `foo'
...
20579: 1: symbol=bar; lookup in file=./prog [ ELF ]
20579: 1: symbol=bar; lookup in file=./foo.so [ ELF ]
20579: 1: symbol=bar; lookup in file=./bar.so [ ELF ]
20579: 1: binding file=./prog to file=./bar.so: symbol `bar'
...
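Translator's notes:
7. Symbol binding means searching the link map in order until an object that defines the requested symbol is found, and then binding to that symbol. The whole process is a linear-list search.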
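The linear search in note 7 can be sketched in a few lines of C. This is a toy model under stated assumptions: the link map is a singly linked list in load order, truncated to four objects (libm.so.2 and libc_psr.so.1 are omitted), the symbol sets are hypothetical, and a `strcmp` scan stands in for the per-object ELF hash-table lookup the real runtime linker performs. The hypothetical `probes` counter records how many objects are examined.

```c
#include <stddef.h>
#include <string.h>

/* Toy link-map entry. Real entries hold much more, and each object's
 * symbols are found through its ELF hash table rather than a strcmp scan. */
struct link_map_entry {
    const char *name;
    const char *const *symbols;        /* NULL-terminated list of definitions */
    const struct link_map_entry *next; /* next object in load order */
};

/* Does this object define sym? */
static int defines(const struct link_map_entry *obj, const char *sym) {
    for (size_t i = 0; obj->symbols[i] != NULL; i++)
        if (strcmp(obj->symbols[i], sym) == 0)
            return 1;
    return 0;
}

/* Default model: start at the head of the link map and try each object in
 * turn until one defines sym. *probes receives the number of objects examined. */
const struct link_map_entry *lookup_default(const struct link_map_entry *head,
                                            const char *sym, int *probes) {
    *probes = 0;
    for (const struct link_map_entry *lm = head; lm != NULL; lm = lm->next) {
        (*probes)++;
        if (defines(lm, sym))
            return lm;
    }
    return NULL; /* not found anywhere on the list */
}

/* Hypothetical symbol sets for the prog example. */
static const char *const prog_syms[] = { "main", NULL };
static const char *const foo_syms[]  = { "foo", NULL };
static const char *const bar_syms[]  = { "bar", NULL };
static const char *const libc_syms[] = { "printf", NULL };

/* Link map in load order: ./prog -> ./foo.so -> ./bar.so -> libc.so.1 */
static const struct link_map_entry libc_lm = { "libc.so.1", libc_syms, NULL };
static const struct link_map_entry bar_lm  = { "./bar.so",  bar_syms,  &libc_lm };
static const struct link_map_entry foo_lm  = { "./foo.so",  foo_syms,  &bar_lm };
static const struct link_map_entry prog_lm = { "./prog",    prog_syms, &foo_lm };
```

`lookup_default(&prog_lm, "bar", &probes)` returns `&bar_lm` with `probes` set to 3, the same three files (./prog, ./foo.so, ./bar.so) that the LD_DEBUG trace above visits for bar; for foo it stops after 2.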
Not so bad really, but it's not the most efficient way to find a symbol, is it? When we were looking for the symbol bar we had to go through 3 objects until we found it. Now imagine what happens when you have a more complex application with many more shared objects and much larger symbol tables. If I look at firefox, I can see that it has about 50 shared objects loaded:
% pldd `pgrep firefox-bin`
28294: /disk3/local/firefox/firefox-bin
/lib/libpthread.so.1
/lib/libthread.so.1
/lib/libc.so.1
/disk3/local/firefox/libmozjs.so
/disk3/local/firefox/libxpcom.so
/usr/sfw/lib/libgtk-1.2.so.0.9.1
/usr/sfw/lib/libgmodule-1.2.so.0.0.10
/usr/sfw/lib/libglib-1.2.so.0.0.10
/usr/openwin/lib/libXext.so.0
/usr/openwin/lib/libX11.so.4
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libm.so.2
/usr/sfw/lib/libgdk-1.2.so.0.9.1
/disk3/local/firefox/libssl3.so
/disk3/local/firefox/libnss3.so
/disk3/local/firefox/libplc4.so
/disk3/local/firefox/libplds4.so
/disk3/local/firefox/libnspr4.so
/disk3/local/firefox/libsoftokn3.so
/lib/librt.so.1
/lib/libdl.so.1
/lib/libaio.so.1
/lib/libmd5.so.1
/usr/openwin/lib/libXt.so.4
/platform/sun4u-us3/lib/libc_psr.so.1
/usr/lib/libCrun.so.1
/usr/lib/libdemangle.so.1
/disk3/local/firefox/cpu/sparcv8plus/libnspr_flt4.so
/lib/libm.so.1
/disk3/local/firefox/libsmime3.so
/usr/openwin/lib/libXp.so.1
/disk3/local/firefox/libxpcom_compat.so
/usr/lib/libCstd.so.1
/usr/lib/cpu/sparcv8plus/libCstd_isa.so.1
/lib/libw.so.1
/lib/libmp.so.2
/lib/libscf.so.1
/lib/libuutil.so.1
/usr/openwin/lib/libSM.so.6
/usr/openwin/lib/libICE.so.6
/usr/lib/iconv/646%UTF-16BE.so
/usr/lib/iconv/UTF-16BE%646.so
/usr/jdk/instances/jdk1.5.0/jre/plugin/sparc/ns7/libjavaplugin_oji.so
/platform/sun4u/lib/libmd5_psr.so.1
/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/libjavaplugin_nscp.so
/disk3/local/firefox/components/libjar50.so
/usr/dt/lib/libXm.so.4
/disk3/local/firefox/libfreebl_hybrid_3.so
/usr/sfw/lib/mozilla/libnssckbi.so
%
On average, each of those objects has a symbol table with over 2,500 symbols. Doing a linear search from the head of the link-map list until you find the symbol just doesn't seem that practical anymore. Firefox is typical of modern applications these days; take a look at StarOffice and you'll find a single program that depends on over 90 different shared objects.
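Translator's notes:
8. It is easy to see that when a program has many dependent libraries, each with many symbols, this linear-list search adds considerable overhead.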
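The overhead in note 8 can be put in rough numbers. Assume, purely for illustration, that the definition of a symbol is equally likely to be in any of the n objects on the link map, and that the search stops at the first object that defines it. The expected number of objects examined is then:

```latex
E[\text{objects examined}]
  = \sum_{k=1}^{n} k \cdot \frac{1}{n}
  = \frac{1}{n} \cdot \frac{n(n+1)}{2}
  = \frac{n+1}{2}
```

For the roughly 50 objects in the firefox listing this is 25.5 objects per lookup, and for a program with 90 objects it is 45.5, compared with exactly 1 when the defining object has been recorded in advance. Each examination is a lookup in that object's ELF hash table, and real symbol distributions are not uniform, so treat these figures as an illustration rather than a measurement.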
There's got to be a better way, right? There is: we call it direct bindings. Instead of doing the linear search at run time, you can simply ask the link-editor to record not only which shared objects you bound against, but also which symbols you obtained from each shared object. If an object is built with Direct Bindings, the runtime linker changes how it looks up symbols: it binds directly to the object that offered the symbol when the object was link-edited. This is a much more efficient model. Here's the same prog, this time built with direct bindings by passing the -Bdirect link-editor option on the link line:
% cc -Bdirect -o prog prog.c ./foo.so ./bar.so
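Translator's notes:
9. Hence direct bindings. The information that makes direct binding possible is gathered not at run time but when the program is link-edited, by specifying a link-editor option (-Bdirect).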
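The division of labour in note 9 can be sketched as follows: the search over the dependencies is done once, when the program is link-edited, and its result (which dependency satisfied each undefined symbol) is stored for later use. This is a toy model under stated assumptions, not the link-editor's implementation; `struct dependency`, `record_bindings` and the symbol sets used with them are hypothetical, and -1 stands in for an unresolved symbol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dependency as seen by the link-editor. */
struct dependency {
    const char *name;
    const char *const *exports; /* NULL-terminated list of defined symbols */
};

static int exports_symbol(const struct dependency *dep, const char *sym) {
    for (size_t i = 0; dep->exports[i] != NULL; i++)
        if (strcmp(dep->exports[i], sym) == 0)
            return 1;
    return 0;
}

/* For each undefined symbol, search the dependencies in link-line order
 * once, at link time, and record the index of the first one that defines
 * it (-1 if none does). A direct-binding runtime linker can later consult
 * bound_to[] instead of searching. */
void record_bindings(const struct dependency *deps, size_t ndeps,
                     const char *const *undefined, size_t nundef,
                     int *bound_to) {
    for (size_t s = 0; s < nundef; s++) {
        bound_to[s] = -1;
        for (size_t d = 0; d < ndeps; d++) {
            if (exports_symbol(&deps[d], undefined[s])) {
                bound_to[s] = (int)d;
                break;
            }
        }
    }
}
```

With the dependencies ./foo.so, ./bar.so and libc.so.1 in that order, foo is recorded against index 0 and bar against index 1; the cost of the search is paid once per build instead of once per process start.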
When you link with -Bdirect, the link-editor stores additional information in an object, including where each symbol was found at link time. This can be viewed with elfdump as follows:
% elfdump -y prog
Syminfo Section: .SUNW_syminfo
index flgs bound to symbol
...
[15] DBL [1] ./foo.so foo
[19] DBL [3] ./bar.so bar
...
%
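Translator's notes:
10. With direct bindings, the ELF file produced by ld gains an extra section called .SUNW_syminfo, which can be viewed with elfdump -y.
11. .SUNW_syminfo records, for each symbol, which of the ELF file's dependencies it is bound to.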
If we repeat the earlier experiment, running the program and examining the actual bindings the runtime linker performs, we see a much more efficient search:
% LD_DEBUG=symbols,bindings ./prog
...
20728: 1: symbol=foo; lookup in file=./foo.so [ ELF ]
20728: 1: binding file=./prog to file=./foo.so: symbol `foo'
...
20728: 1: symbol=bar; lookup in file=./bar.so [ ELF ]
20728: 1: binding file=./prog to file=./bar.so: symbol `bar'
...
%
Notice that we now find each symbol in the first object we look in. Much better.
Translator's notes:
12. As shown, with direct bindings the runtime symbol binding no longer needs a linear search of the link map.
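The runtime half of direct binding can be sketched like this. It is a toy model under stated assumptions: the syminfo array is taken to be parallel to the object's dynamic symbol table, as .SUNW_syminfo is (hence the symbol indices [15] and [19] in the elfdump output above); `BOUND_TO_SELF` is modelled on the ELF `SYMINFO_BT_SELF` value (0xffff), which elfdump prints as `<self>`; and the bound-to field indexes a plain array of dependency names, whereas in the real section it refers to the .dynamic entry that names the dependency. `struct syminfo_entry` and `direct_target` are hypothetical names.

```c
#include <stddef.h>

/* Sentinel modelled on SYMINFO_BT_SELF (0xffff) in the ELF headers:
 * the symbol is bound to the object that references it. */
#define BOUND_TO_SELF 0xffffu

/* Toy syminfo record. The array is assumed to be parallel to the
 * dynamic symbol table, so symbol index i uses info[i]. */
struct syminfo_entry {
    unsigned bound_to; /* index into deps[] in this sketch, or BOUND_TO_SELF */
};

/* Direct model: the recorded binding names the one object to search, so
 * the link map is never walked. Returns that object's name, or NULL when
 * the symbol index or the recorded dependency index is out of range. */
const char *direct_target(const struct syminfo_entry *info, size_t ninfo,
                          size_t symndx, const char *self,
                          const char *const *deps, size_t ndeps) {
    if (symndx >= ninfo)
        return NULL;
    if (info[symndx].bound_to == BOUND_TO_SELF)
        return self;
    if (info[symndx].bound_to < ndeps)
        return deps[info[symndx].bound_to];
    return NULL;
}
```

With `info = { BOUND_TO_SELF, 0, 1 }` and `deps = { "./foo.so", "./bar.so" }`, symbol 1 resolves to ./foo.so and symbol 2 to ./bar.so without any other object being examined, matching the single lookup per symbol in the trace above.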
Direct Bindings have been in Solaris for a few releases now, although, because they are not the default, not everyone is familiar with them. The feature has matured quite a bit over the last few years and we are now starting to use it for some of our core shared objects. If you look at the X11 shared objects delivered with Solaris, you'll find that they are bound with direct bindings:
% elfdump -y /usr/lib/libX11.so | head
Syminfo Section: .SUNW_syminfo
index flgs bound to symbol
[1] D <self> _XimXTransDisconnect
[2] D [8] libc.so.1 snprintf
[3] D <self> _XcmsFreeIntensityMaps
[4] D <self> _XcmsTableSearch
[5] D <self> _XDeq
[6] D <self> XGetWMSizeHints
[7] D <self> XUnmapWindow
%
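Translator's notes:
13. Direct binding is not the default link option on Solaris; users must request it explicitly when linking. However, some core Solaris libraries, such as the X11 libraries, already use it to improve efficiency.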
Besides being more efficient, Direct Bindings are also much more precise. It can get very tricky to control the name space when you start to combine all of the shared objects you see in modern applications. If two shared objects happen to offer a symbol of the same name (unintentionally), the default binding lookup binds to the first one found, which is probably not what the user intended. If, however, we bind to exactly the definition that was found when the object was built, there will be far fewer surprises.
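Translator's notes:
14. Besides being more efficient, direct binding is also more precise. If the same symbol exists in different objects, the default binding model binds to the first matching object in the link map, whereas direct binding binds to the object recorded at link time.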
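The precision argument can be illustrated with two hypothetical libraries, liba.so and libb.so, that both (unintentionally) define a symbol `hash_init`. In this sketch the referencing object was linked against libb.so, so its recorded binding is libb.so's position on the link map, yet the default model picks liba.so because it was loaded first. All names are invented for illustration, and when the recorded object does not define the symbol this sketch simply returns NULL rather than modelling whatever ld.so.1 does in that case.

```c
#include <stddef.h>
#include <string.h>

/* Toy object on the link map: name plus NULL-terminated list of definitions. */
struct object {
    const char *name;
    const char *const *defs;
};

static int object_defines(const struct object *obj, const char *sym) {
    for (size_t i = 0; obj->defs[i] != NULL; i++)
        if (strcmp(obj->defs[i], sym) == 0)
            return 1;
    return 0;
}

/* Default model: the first definition found in link-map order wins. */
const struct object *bind_default(const struct object *map, size_t n,
                                  const char *sym) {
    for (size_t i = 0; i < n; i++)
        if (object_defines(&map[i], sym))
            return &map[i];
    return NULL;
}

/* Direct model: use the object recorded at link time, provided it
 * defines the symbol; otherwise report failure (a simplification). */
const struct object *bind_direct(const struct object *map, size_t n,
                                 size_t recorded, const char *sym) {
    if (recorded < n && object_defines(&map[recorded], sym))
        return &map[recorded];
    return NULL;
}
```

With the link map ./app, liba.so, libb.so, `bind_default` returns liba.so for `hash_init` while `bind_direct` with the recorded index 2 returns libb.so. An application that silently relied on the first definition would therefore change behavior when re-linked with -Bdirect, which is exactly the situation the following cautionary note describes.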
Along these lines, it's worth giving a cautionary note to those re-linking their existing applications with Direct Bindings enabled. As we have applied Direct Bindings to more and more applications, we have found a few cases where there are multiple definitions of a single symbol; in such cases, changing the binding model can change the behavior of the application. In most, if not all, cases this was a bug in the design of the application, but a program can come to depend on it, and the application then fails when run with Direct Bindings.
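Translator's notes:
15. For programs in which the same symbol is defined in more than one place, direct binding can lead to run-time failures; in most cases this reflects a design error in the application.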
Further details on Direct Bindings specifically, and on the runtime linker (ld.so.1(1)) and link-editor (ld(1)) in general, can be found in the Linker and Libraries Guide, which is part of the standard Solaris documentation.
Examples of tracing what the runtime linker is doing can be found in a blog entry by Rod titled Tracing a link-edit.
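Translator's notes:
16. The Linker and Libraries Guide covers the basic concepts of the runtime linker (ld.so.1(1)) and the link-editor (ld(1)).
17. This article uses LD_DEBUG to trace the runtime linker; this technique is described in Rod's article Tracing a link-edit.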