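ELF is the most common executable file format on Linux. Plenty of material already covers how ELF files are loaded and dynamically linked, so this post focuses on one specific detail: the auxiliary vector.
Before an ELF file starts executing as a process, the loader passes it some information: the command-line arguments supplied by the user, the system environment (the familiar environment variables), and the subject of this post, the auxiliary vector.
Command-line arguments and environment variables need no introduction; we use them all the time, arguments especially. The auxiliary vector is far less well known. It is simply another mechanism by which the kernel passes information to the application.
Like the arguments and the environment variables, the auxiliary vector lives on the stack. The rough layout is: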
position            content                         size (bytes) + comment
(0xc0000000)        < bottom of stack >             0   (virtual)
(0xbffffffc)        [ end marker ]                  4   (= NULL)
                    [ environment ASCIIZ str. ]     >= 0
                    [ argument ASCIIZ strings ]     >= 0
                    [ padding ]                     0 - 16
                    [ auxv[term] (Elf32_auxv_t) ]   8   (= AT_NULL vector)
                    [ auxv[...] (Elf32_auxv_t) ]    8
                    [ auxv[1] (Elf32_auxv_t) ]      8
                    [ auxv[0] (Elf32_auxv_t) ]      8
                    [ envp[term] (pointer) ]        4   (= NULL)
                    [ envp[...] (pointer) ]         4
                    [ envp[1] (pointer) ]           4
                    [ envp[0] (pointer) ]           4
                    [ argv[n] (pointer) ]           4   (= NULL)
                    [ argv[n - 1] (pointer) ]       4
                    [ argv[...] (pointer) ]         4 * x
                    [ argv[1] (pointer) ]           4
                    [ argv[0] (pointer) ]           4   (program name)
stack pointer ->    [ argc = number of args ]       4
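The layout above is taken mainly from the first reference, reordered to match the Linux stack, which is full-descending: it grows toward lower addresses and the stack pointer points at the last item pushed. The sizes shown are for a 32-bit system. A 64-bit system uses the same layout, except that each pointer-sized slot is 8 bytes and each auxv entry (Elf64_auxv_t) is 16 bytes.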
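As a quick check of the entry sizes, the sketch below prints the size of one auxv entry for each ELF class, using the Elf32_auxv_t and Elf64_auxv_t types from glibc's <elf.h>. The function name print_auxv_entry_sizes is made up for this illustration.

```c
#include <elf.h>
#include <stdio.h>

/* Hypothetical helper: report the size of one auxv entry per ELF class.
 * Each entry is a (type, value) pair of two equally wide words. */
void print_auxv_entry_sizes(void)
{
    printf("Elf32_auxv_t: %zu bytes\n", sizeof(Elf32_auxv_t));
    printf("Elf64_auxv_t: %zu bytes\n", sizeof(Elf64_auxv_t));
}
```

Called from any main(), this should report 8 bytes for the 32-bit entry (matching the diagram) and 16 bytes for the 64-bit one.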
Let's verify the layout on a 32-bit system:
[root@lenky auxv]# uname -a
Linux lenky 2.6.30 #2 SMP Tue Sep 21 17:19:57 CST 2010 i686 i686 i386 GNU/Linux
[root@lenky auxv]# cat main.c
/**
 * filename: main.c
 */
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
    return 0;
}
[root@lenky auxv]# gcc -O0 -g main.c -o main
[root@lenky auxv]# echo 0 > /proc/sys/kernel/randomize_va_space
[root@lenky auxv]# gdb ./main -q
(gdb) b main
Breakpoint 1 at 0x8048395: file main.c, line 8.
(gdb) r a b c d
Starting program: /home/work/auxv/main a b c d
Breakpoint 1, main (argc=5, argv=0xbffffab4) at main.c:8
8       printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
(gdb) info reg esp
esp 0xbffffa00 0xbffffa00
(gdb) p &argc
$1 = (int *) 0xbffffa30
(gdb) dump memory /tmp/main.data 0xbffffa30 0xc0000000
(gdb) q
The program is running. Exit anyway? (y or n) y
[root@lenky auxv]#
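This is a very simple test program. With a breakpoint on main (note that ASLR was disabled beforehand so that the addresses stay stable and the logic is easier to follow), gdb stops with esp at 0xbffffa00, while the argc parameter lives at 0xbffffa30. This differs slightly from the layout above because, in C, arguments are pushed by the caller (here __libc_start_main). The argc that main sees is therefore a copy, and by the time main runs the stack already holds additional data, such as the saved return address and the space for main's locals, so esp is lower still. The original argc placed by the kernel sits at 0xbffffab0 (offset 0x80 in the dump below), immediately below argv[0] at 0xbffffab4.
Here is the memory we dumped: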
[root@lenky auxv]# hexdump -C /tmp/main.data
00000000 05 00 00 00 b4 fa ff bf cc fa ff bf 10 48 16 00 |…H…|
00000010 00 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 |…|
00000020 f4 cf 2a 00 a0 3c 16 00 00 00 00 00 88 fa ff bf |…*…<…|
00000030 b8 6d a4 83 e9 89 43 3c 00 00 00 00 00 00 00 00 |.m…C<…|
00000040 00 00 00 00 b0 c4 15 00 cd 1d 18 00 c0 3f 16 00 |…?..|
00000050 05 00 00 00 b0 82 04 08 00 00 00 00 d1 82 04 08 |…|
00000060 84 83 04 08 05 00 00 00 b4 fa ff bf d0 83 04 08 |…|
00000070 c0 83 04 08 c0 75 15 00 ac fa ff bf eb ff 15 00 |…u…|
00000080 05 00 00 00 f6 fb ff bf 0b fc ff bf 0d fc ff bf |…|
00000090 0f fc ff bf 11 fc ff bf 00 00 00 00 13 fc ff bf |…|
000000a0 22 fc ff bf 32 fc ff bf 3d fc ff bf 4b fc ff bf |"…2…=…K…|
000000b0 6d fc ff bf 80 fc ff bf 8a fc ff bf 4d fe ff bf |m…M…|
000000c0 58 fe ff bf c9 fe ff bf e3 fe ff bf f2 fe ff bf |X…|
000000d0 0d ff ff bf 21 ff ff bf 36 ff ff bf 47 ff ff bf |…!..6…G…|
000000e0 50 ff ff bf 5b ff ff bf 63 ff ff bf 70 ff ff bf |P…[…c…p…|
000000f0 7c ff ff bf b0 ff ff bf d2 ff ff bf 00 00 00 00 ||…|
00000100 20 00 00 00 14 f4 ff b7 21 00 00 00 00 f0 ff b7 | …!..|
00000110 10 00 00 00 ff fb eb 0f 06 00 00 00 00 10 00 00 |…|
00000120 11 00 00 00 64 00 00 00 03 00 00 00 34 80 04 08 |…d…4…|
00000130 04 00 00 00 20 00 00 00 05 00 00 00 07 00 00 00 |… …|
00000140 07 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 |…|
00000150 09 00 00 00 b0 82 04 08 0b 00 00 00 00 00 00 00 |…|
00000160 0c 00 00 00 00 00 00 00 0d 00 00 00 00 00 00 00 |…|
00000170 0e 00 00 00 00 00 00 00 17 00 00 00 00 00 00 00 |…|
00000180 19 00 00 00 db fb ff bf 1f 00 00 00 e7 ff ff bf |…|
00000190 0f 00 00 00 eb fb ff bf 00 00 00 00 00 00 00 00 |…|
000001a0 00 00 00 00 00 00 00 00 00 00 00 d8 28 62 8a b0 |…(b…|
000001b0 f6 f3 b2 fe 7b ae e7 aa 49 02 e2 69 36 38 36 00 |…{…I…i686.|
000001c0 00 00 00 00 00 00 2f 68 6f 6d 65 2f 77 6f 72 6b |…/home/work|
000001d0 2f 61 75 78 76 2f 6d 61 69 6e 00 61 00 62 00 63 |/auxv/main.a.b.c|
000001e0 00 64 00 48 4f 53 54 4e 41 4d 45 3d 6c 65 6e 6b |.d.HOSTNAME=lenk|
000001f0 79 00 53 48 45 4c 4c 3d 2f 62 69 6e 2f 62 61 73 |y.SHELL=/bin/bas|
00000200 68 00 54 45 52 4d 3d 78 74 65 72 6d 00 48 49 53 |h.TERM=xterm.HIS|
00000210 54 53 49 5a 45 3d 31 30 30 30 00 53 53 48 5f 43 |TSIZE=1000.SSH_C|
…
000005a0 73 00 47 5f 42 52 4f 4b 45 4e 5f 46 49 4c 45 4e |s.G_BROKEN_FILEN|
000005b0 41 4d 45 53 3d 31 00 2f 68 6f 6d 65 2f 77 6f 72 |AMES=1./home/wor|
000005c0 6b 2f 61 75 78 76 2f 6d 61 69 6e 00 00 00 00 00 |k/auxv/main…|
000005d0
[root@lenky auxv]#
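Note that the dump starts at the address of the argc parameter, so the first 4 bytes (one int, little-endian):
05 00 00 00
are the value of argc, 5. That matches "r a b c d": counting the 0th argument, the program name, there are 5 arguments in total.
Next comes:
b4 fa ff bf
which is main's second parameter, argv. It points to an array of pointers, which in C makes it (loosely speaking) a pointer to a pointer. Its offset within the dump is:
0xbffffab4 - 0xbffffa30 = 0x84
and the 4 bytes stored at that offset are:
f6 fb ff bf
This is argv[0], a char *, so the string it points to is at 0xbffffbf6:
0xbffffbf6 - 0xbffffa30 = 0x1c6
At that offset the dump contains:
000001c6 "/home/work/auxv/main"
The remaining pointers argv[1], argv[2], argv[3] and argv[4] are stored at 0xbffffab8, 0xbffffabc, 0xbffffac0 and 0xbffffac4. They hold 0xbffffc0b, 0xbffffc0d, 0xbffffc0f and 0xbffffc11, the addresses of the strings "a", "b", "c" and "d".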
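The arithmetic above can be replayed in a few lines of C. This is only a sketch: the helper names le32, dump_offset and show_argv_lookup are invented here, and the byte values and base address are the ones from the gdb session and hexdump above.

```c
#include <stdint.h>
#include <stdio.h>

/* Address at which the dump starts (&argc in the gdb session). */
#define DUMP_BASE 0xbffffa30u

/* Assemble a little-endian 32-bit value from 4 dump bytes. */
uint32_t le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Convert a stack address into an offset inside /tmp/main.data. */
uint32_t dump_offset(uint32_t addr)
{
    return addr - DUMP_BASE;
}

/* Replay the two lookups from the text: argv, then argv[0]. */
void show_argv_lookup(void)
{
    const unsigned char argv_bytes[4]  = { 0xb4, 0xfa, 0xff, 0xbf };
    const unsigned char argv0_bytes[4] = { 0xf6, 0xfb, 0xff, 0xbf };
    uint32_t argv  = le32(argv_bytes);
    uint32_t argv0 = le32(argv0_bytes);

    printf("argv    = 0x%x, offset 0x%x\n",
           (unsigned)argv, (unsigned)dump_offset(argv));
    printf("argv[0] = 0x%x, offset 0x%x\n",
           (unsigned)argv0, (unsigned)dump_offset(argv0));
}
```

Calling show_argv_lookup() reproduces the offsets 0x84 and 0x1c6 computed by hand.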
The argument array is terminated by NULL, and the environment-variable pointers follow immediately after it. For example:
[root@lenky auxv]# gdb ./main -q
(gdb) b main
Breakpoint 1 at 0x8048395: file main.c, line 8.
(gdb) r
Starting program: /home/work/auxv/main
Breakpoint 1, main (argc=1, argv=0xbffffac4) at main.c:8
8       printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
(gdb) p argc
$1 = 1
(gdb) p argv[0]
$2 = 0xbffffbfe "/home/work/auxv/main"
(gdb) p argv[1]
$3 = 0x0
(gdb) p argv[2]
$4 = 0xbffffc13 "HOSTNAME=lenky"
(gdb) p argv[3]
$5 = 0xbffffc22 "SHELL=/bin/bash"
(gdb) q
The program is running. Exit anyway? (y or n) y
[root@lenky auxv]#
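The session above reads past argv's NULL terminator and lands in the environment. In other words, with this layout the environment array starts at argv + argc + 1. A minimal sketch with hypothetical helper names follows; it relies on the Linux process-startup layout and is not something the C standard guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: locate the environment array from argc/argv alone.
 * Assumes the Linux startup layout: argv[0..argc-1], NULL, envp[0], ... */
char **envp_from_argv(int argc, char **argv)
{
    return argv + argc + 1;   /* skip the arguments and their NULL terminator */
}

/* Print the first environment string found that way, if there is one. */
void print_first_env(int argc, char **argv)
{
    char **envp = envp_from_argv(argc, argv);

    if (envp[0] != NULL)
        printf("envp[0]: %s\n", envp[0]);
}
```

Called from main as print_first_env(argc, argv), this should print the same string gdb showed for argv[2] when argc is 1.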
In fact, main can also be defined like this:
[root@lenky auxv]# cat env.c
/**
 * filename: env.c
 */
#include <stdio.h>
int main(int argc, char **argv, char **envp)
{
    printf("argc:%d, argv[0]:%s, envp[0]:%s\n", argc, argv[0], envp[0]);
    return 0;
}
[root@lenky auxv]# gcc env.c -o env
[root@lenky auxv]# ./env
argc:1, argv[0]:./env, envp[0]:HOSTNAME=lenky
[root@lenky auxv]#
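So why does main work fine when we leave out the third parameter envp, as we usually do? Because in C (under the cdecl calling convention used here) the caller pushes the arguments onto the stack, and it makes no difference whether the callee reads none, two, or all three of them. All of the following prototypes therefore work:
int main();
int main(int argc, char **argv);
int main(int argc, char **argv, char **envp);
Let's look at the assembly: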
[root@lenky auxv]# gdb ./main -q
(gdb) b main
Breakpoint 1 at 0x8048395: file main.c, line 8.
(gdb) r
Starting program: /home/work/auxv/main
Breakpoint 1, main (argc=1, argv=0xbffffac4) at main.c:8
8       printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
(gdb)
This is the two-parameter case. We are now inside main; let's see who called it:
(gdb) info reg ebp
ebp 0xbffffa28 0xbffffa28
(gdb) x/i *(0xbffffa28+4)
0x181e9c <__libc_start_main+220>: mov %eax,(%esp)
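Readers familiar with C stack frames will know that the word at ebp + 4 holds the return address. Disassembling the code at that address with the x command shows that the caller is __libc_start_main.
Now let's see how many arguments it passed: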
(gdb) x/10i *(0xbffffa28+4)-32
0x181e7c <__libc_start_main+188>: pusha
0x181e7d <__libc_start_main+189>: add %al,(%eax)
0x181e7f <__libc_start_main+191>: add %cl,-0x4b7d(%ebx)
0x181e85 <__libc_start_main+197>: decl 0x8b0c55(%ebx)
0x181e8b <__libc_start_main+203>: mov %edx,(%esp)
0x181e8e <__libc_start_main+206>: mov %eax,0x8(%esp)
0x181e92 <__libc_start_main+210>: mov 0x10(%ebp),%eax
0x181e95 <__libc_start_main+213>: mov %eax,0x4(%esp)
0x181e99 <__libc_start_main+217>: call *0x8(%ebp)
0x181e9c <__libc_start_main+220>: mov %eax,(%esp)
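The instruction at 0x181e99 is the call into main, and the next one, at 0x181e9c, is the return address. The first few lines of the listing are almost certainly mis-decoded, because the disassembly started 32 bytes back, in the middle of an instruction. The instructions just before the call set up the arguments: three mov instructions store to three different stack slots, esp, esp+4 and esp+8.
The relevant part of the glibc source shows the same thing: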
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char *__unbounded *__unbounded ubp_av,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *__unbounded auxvec,
#endif
__typeof (main) init,
void (*fini) (void),
void (*rtld_fini) (void), void *__unbounded stack_end)
{
LIBC_START_MAIN is __libc_start_main. Its first parameter is a function pointer to main. Note main's prototype there, and in particular the MAIN_AUXVEC_DECL macro:
#ifdef MAIN_AUXVEC_ARG
/* main gets passed a pointer to the auxiliary. */
# define MAIN_AUXVEC_DECL , void *
# define MAIN_AUXVEC_PARAM , auxvec
#else
# define MAIN_AUXVEC_DECL
# define MAIN_AUXVEC_PARAM
#endif
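So main is called with at least 3 arguments and, depending on whether MAIN_AUXVEC_ARG is defined, possibly a fourth one pointing to auxv. That macro does not appear to be enabled in my glibc, so the real prototype of main on my system should be:
int main(int argc, char **argv, char **envp);
No argument gives the number of entries in the environment array, but it is also NULL-terminated, so we can walk envp past its terminator to reach auxv. An example: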
[root@lenky auxv]# cat auxv.c
/**
 * filename: auxv.c
 */
#include <stdio.h>
#include <elf.h>
int main(int argc, char **argv, char **envp)
{
    Elf32_auxv_t *auxv;
    /* from stack diagram above: *envp = NULL marks end of envp */
    while (*envp++ != NULL);
    /* auxv->a_type = AT_NULL marks the end of auxv */
    for (auxv = (Elf32_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++)
    {
        if (auxv->a_type == AT_SYSINFO)
            printf("AT_SYSINFO is: 0x%x\n", auxv->a_un.a_val);
    }
}
[root@lenky auxv]# gcc auxv.c -g -o auxv
[root@lenky auxv]# ./auxv
AT_SYSINFO is: 0xb7fff414
[root@lenky auxv]# gdb -q ./auxv
(gdb) b 18
Breakpoint 1 at 0x80483e8: file auxv.c, line 18.
(gdb) r
Starting program: /home/work/auxv/auxv
AT_SYSINFO is: 0xb7fff414
Breakpoint 1, main (argc=1, argv=0xbffffac4, envp=0xbffffb30) at auxv.c:19
warning: Source file is more recent than executable.
19 }
(gdb) info auxv
32 AT_SYSINFO Special system info/entry points 0xb7fff414
33 AT_SYSINFO_EHDR System-supplied DSO's ELF header 0xb7fff000
16 AT_HWCAP Machine-dependent CPU capability hints 0xfebfbff
6 AT_PAGESZ System page size 4096
17 AT_CLKTCK Frequency of times() 100
3 AT_PHDR Program headers for program 0x8048034
4 AT_PHENT Size of program header entry 32
5 AT_PHNUM Number of program headers 7
7 AT_BASE Base address of interpreter 0x0
8 AT_FLAGS Flags 0x0
9 AT_ENTRY Entry point of program 0x80482b0
11 AT_UID Real user ID 0
12 AT_EUID Effective user ID 0
13 AT_GID Real group ID 0
14 AT_EGID Effective group ID 0
23 AT_SECURE Boolean, was exec setuid-like? 0
25 ??? 0xbffffbdb
31 ??? 0xbfffffe7
15 AT_PLATFORM String identifying platform 0xbffffbeb "i686"
0 AT_NULL End of vector 0x0
(gdb)
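The kernel also exposes the same vector as a file, /proc/self/auxv (or /proc/<pid>/auxv for another process), so it can be read without walking the stack. The sketch below is one way to do that; the helper name proc_auxv_lookup is hypothetical, and it assumes glibc's <link.h>, whose ElfW() macro selects the 32-bit or 64-bit entry type to match the build.

```c
#include <elf.h>
#include <link.h>    /* ElfW() picks Elf32_ or Elf64_ to match the build */
#include <stdio.h>

/* Hypothetical helper: look up one entry in /proc/self/auxv.
 * Returns 0 and stores the value on success, -1 if the file cannot be
 * opened or the requested type is not present. */
int proc_auxv_lookup(unsigned long type, unsigned long *value)
{
    ElfW(auxv_t) entry;
    int found = -1;
    FILE *fp = fopen("/proc/self/auxv", "rb");

    if (fp == NULL)
        return -1;

    /* The file is a raw array of auxv_t records terminated by AT_NULL. */
    while (fread(&entry, sizeof(entry), 1, fp) == 1 &&
           entry.a_type != AT_NULL) {
        if (entry.a_type == type) {
            *value = entry.a_un.a_val;
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For example, proc_auxv_lookup(AT_PAGESZ, &v) should leave the system page size in v, the same value shown on the AT_PAGESZ line of the gdb output.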
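The analysis above (I won't match every byte of the dump against main's parameters) shows that the layout given at the beginning is correct. auxv carries all sorts of information to the application, as the gdb output shows: the vsyscall entry point, the real uid, the real gid, and so on. It can also be displayed like this: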
[root@lenky auxv]# LD_SHOW_AUXV=1 ./auxv
AT_SYSINFO: 0xb7fff414
AT_SYSINFO_EHDR: 0xb7fff000
AT_HWCAP: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x8048034
AT_PHENT: 32
AT_PHNUM: 7
AT_BASE: 0x0
AT_FLAGS: 0x0
AT_ENTRY: 0x80482b0
AT_UID: 0
AT_EUID: 0
AT_GID: 0
AT_EGID: 0
AT_SECURE: 0
AT_??? (0x19): 0xbffffbeb
AT_??? (0x1f): 0xbffffff5
AT_PLATFORM: i686
AT_SYSINFO is: 0xb7fff414
[root@lenky auxv]#
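gdb printed AT_HWCAP as the raw number 0xfebfbff, while LD_SHOW_AUXV printed a list of feature names. The two agree if the value is read as a bitmask. The sketch below assumes that the bit positions follow the x86 CPUID leaf 1 EDX feature flags, which the original session does not state, and it checks only the handful of bits listed in its table. The helper and table names are made up.

```c
#include <stdio.h>

/* Assumed bit positions, following the x86 CPUID.1:EDX feature flags. */
struct hwcap_bit {
    unsigned bit;
    const char *name;
};

static const struct hwcap_bit hwcap_bits[] = {
    { 0, "fpu" }, { 4, "tsc" }, { 15, "cmov" },
    { 23, "mmx" }, { 25, "sse" }, { 26, "sse2" },
};

/* Return 1 if the given bit is set in the AT_HWCAP value, else 0. */
int hwcap_has(unsigned long hwcap, unsigned bit)
{
    return (int)((hwcap >> bit) & 1UL);
}

/* Print the names of the table entries whose bits are set. */
void print_hwcap(unsigned long hwcap)
{
    size_t i;

    for (i = 0; i < sizeof(hwcap_bits) / sizeof(hwcap_bits[0]); i++)
        if (hwcap_has(hwcap, hwcap_bits[i].bit))
            printf("%s ", hwcap_bits[i].name);
    printf("\n");
}
```

With the value from the session, print_hwcap(0x0febfbffUL) lists all six names, each of which also appears in the AT_HWCAP line of the LD_SHOW_AUXV output.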
On a 64-bit system:
[root@localhost auxv]# uname -a
Linux localhost.localdomain 3.7.0 #1 SMP Wed Jan 9 04:46:12 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost auxv]# cat /etc/issue
CentOS Linux release 6.0 (Final)
Kernel \r on an \m
[root@localhost auxv]# cat auxv64.c
/**
 * filename: auxv64.c
 */
#include <stdio.h>
#include <elf.h>
int main(int argc, char **argv, char **envp)
{
    Elf64_auxv_t *auxv;
    /* from stack diagram above: *envp = NULL marks end of envp */
    while (*envp++ != NULL);
    /* auxv->a_type = AT_NULL marks the end of auxv */
    for (auxv = (Elf64_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++)
    {
        /* %p already prints a 0x prefix on glibc, hence "0x0x" in the output */
        if (auxv->a_type == AT_SYSINFO_EHDR)
            printf("AT_SYSINFO_EHDR is: 0x%p\n", (void *)auxv->a_un.a_val);
    }
}
[root@localhost auxv]# gcc auxv64.c -o auxv64
[root@localhost auxv]# LD_SHOW_AUXV=1 ./auxv64
AT_SYSINFO_EHDR: 0x7fff89b11000
AT_HWCAP: febfbff
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x400040
AT_PHENT: 56
AT_PHNUM: 8
AT_BASE: 0x0
AT_FLAGS: 0x0
AT_ENTRY: 0x4003e0
AT_UID: 0
AT_EUID: 0
AT_GID: 0
AT_EGID: 0
AT_SECURE: 0
AT_RANDOM: 0x7fff89a07ca9
AT_EXECFN: ./auxv64
AT_PLATFORM: x86_64
AT_SYSINFO_EHDR is: 0x0x7fff89b11000
[root@localhost auxv]#
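Walking past envp by hand means choosing Elf32_auxv_t or Elf64_auxv_t to match the build. glibc 2.16 and later provide getauxval() in <sys/auxv.h>, which looks up a single entry without depending on the stack layout. A minimal sketch follows; the wrapper names are hypothetical.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), AT_* constants */

/* Hypothetical wrapper: page size as reported by the auxiliary vector.
 * getauxval() returns 0 when the requested type is absent. */
unsigned long auxv_page_size(void)
{
    return getauxval(AT_PAGESZ);
}

/* Print a few entries that also appear in the LD_SHOW_AUXV output. */
void print_some_auxv(void)
{
    const char *platform = (const char *)getauxval(AT_PLATFORM);

    printf("AT_PAGESZ:   %lu\n", auxv_page_size());
    printf("AT_CLKTCK:   %lu\n", getauxval(AT_CLKTCK));
    printf("AT_PLATFORM: %s\n", platform ? platform : "(absent)");
}
```

The same source builds unchanged for 32-bit and 64-bit targets, because getauxval() returns an unsigned long in both cases.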
References:
http://articles.manugarg.com/aboutelfauxiliaryvectors.html
http://lwn.net/Articles/519085/
http://www.gnu.org/software/libc/manual/html_node/Auxiliary-Vector.html
http://www.win.tue.nl/~aeb/linux/hh/hh-14.html
http://articles.manugarg.com/systemcallinlinux2_6.html