On 2017-06-03 04:56, Jean Wolter wrote:
Hello Leslie,
On 02/06/17 11:33, Jean Wolter wrote:
I replaced my "ned" binary with yours and the system boots up fine.
Now I wonder how I could reproduce your problem. It looks like I
- either need all binaries or
- need to build them the same way you are building them.
I used a fresh checkout from the svn repositories and gcc version 4.9.2 (Debian 4.9.2-10) to build the components.
Since I neither have your binaries nor know how you build them I thought about how you could diagnose the problem further. To demonstrate how to do this I intentionally added a null pointer dereference to ned and would like to discuss how I would diagnose this.
--- ned/server/src/main.cc (revision 72)
+++ ned/server/src/main.cc (working copy)
@@ -35,6 +35,8 @@
   Dbg::set_level(Dbg::Warn);
   info.printf("Hello from Ned\n");
 
+  *(int *)0 = 0;
+
   boot_info.printf("cmdline: ");
   for (int i = 0; i < argc; ++i)
     boot_info.cprintf("%s ", argv[i]);
Add -serial_esc -wait to kernel options in conf/modules.list
--- modules.list (revision 72)
+++ modules.list (working copy)
@@ -80,6 +80,7 @@
 module libuc_c.so
 
 entry framebuffer-example-x86
+kernel fiasco -serial_esc -wait
 roottask moe rom/x86-fb.cfg
 module x86-fb.cfg
 module l4re
If you boot this the kernel will enter the kernel debugger before doing anything else. Enter the following commands:
- P+ /* show every pagefault before forwarding it to the pager */
- Prx0<space>100<space> /* restrict pagefault logging to pagefaults between [0-100] */.
- g /* go */
The kernel should stop when the access to 0x18 happens. You can then enter the kernel debugger with 'i', check who is responsible, and maybe get a correct instruction pointer. When I do this here, the output looks as follows (I added -serial stdio to the qemu options):
qemu-system-x86_64 -kernel /home/jw5/build/tmp/l4re/bin/amd64_K8/bootstrap -append "bootstrap -modaddr 0x01100000" -serial stdio -initrd "/home/jw5/build/tmp/fiasco//fiasco -serial_esc -wait,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/sigma0 ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe rom/x86-fb.cfg,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.cfg ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/l4re ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ned ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/io ,/home/jw5/src/l4resvn/src/l4/pkg/io/io/config//x86-legacy.devs ,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.io ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/fb-drv ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/mag ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ex_fb_spectrum_cc "
...
Freeing init code/data: 49152 bytes (12 pages)
Calibrating timer loop... done.
CPU 0 [fffffffff003df99]: Wait
jdb: P+
PF logging enabled
jdb: Pr restrict to addr in [0-100]    /* pressed Prx0<space>100<space> */
PF logging enabled, restricted to 0000000000000000 <= pfa <= 0000000000000100
jdb: g    /* g does not show up in the output */
...
MOE: cmdline: /home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe rom/x86-fb.cfg
MOE: Starting: rom/ned rom/x86-fb.cfg
MOE: loading 'rom/ned'
pf: 0022 pfa=0000000000000000 ip=00000000010038a2 (w-) spc=0xffffffff807c3dd8    /* press i */
CPU 0 [fffffffff0062116]: LOG
jdb:
Now you can use the kernel debugger to inspect the current state of the system. Here I simply use addr2line to find the error:
addr2line -p -i -e bin/amd64_K8/l4f/ned -a 10038a2
0x00000000010038a2: /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:38
 (inlined by) /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:77
Line 38 is the line with the null pointer dereference.
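The step from the logged page fault to the addr2line call can be sketched in a few lines of C. This is only an illustration, assuming the log line has exactly the "pf: ... pfa=... ip=..." layout shown in the jdb output above; the helper names parse_pf_line and format_addr2line are made up for this sketch and are not part of jdb or L4Re.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: parse a jdb page-fault log line such as
 *   pf: 0022 pfa=0000000000000000 ip=00000000010038a2 (w-) spc=...
 * Returns 1 and fills *pfa and *ip on success, 0 otherwise. */
int parse_pf_line(const char *line, uint64_t *pfa, uint64_t *ip)
{
  unsigned err; /* error-code field; read but not interpreted here */
  return sscanf(line, "pf: %x pfa=%" SCNx64 " ip=%" SCNx64,
                &err, pfa, ip) == 3;
}

/* Hypothetical helper: build the addr2line command for a faulting ip.
 * 'binary' is the path to the unstripped ELF file of the faulting task. */
int format_addr2line(char *buf, size_t len, const char *binary, uint64_t ip)
{
  return snprintf(buf, len, "addr2line -p -i -e %s -a %" PRIx64, binary, ip);
}
```

Feeding it the pf line from the log and the path bin/amd64_K8/l4f/ned yields the same addr2line command that was typed by hand above.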
Thanks for your help! It worked :)
addr2line -p -i -e bin/amd64_K8/l4f/ned -a 100398d
0x000000000100398d: /home/zhaixiang/project/l4re/l4/pkg/l4re-core/ned/server/src/main.cc:38
 (inlined by) /home/zhaixiang/project/l4re/l4/pkg/l4re-core/ned/server/src/main.cc:77
I also need to read the JDB manual carefully.
I hope this helps, Jean
Instead of the intentional null pointer dereference patch, I just used printf to debug, because I do not know how to set a breakpoint in jdb when running L4Re/Fiasco under qemu.
Index: pkg/l4re-core/ned/server/src/main.cc
===================================================================
--- pkg/l4re-core/ned/server/src/main.cc (revision 72)
+++ pkg/l4re-core/ned/server/src/main.cc (working copy)
@@ -57,9 +57,11 @@
   Ned::Server svr;
   Ned::server = &svr;
+  printf("DEBUG: %s, %s, line %d\n", __FILE__, __PRETTY_FUNCTION__, __LINE__);
   lua(argc, argv);
+  printf("DEBUG: %s, %s, line %d\n", __FILE__, __PRETTY_FUNCTION__, __LINE__);
   while (1)
     l4_sleep_forever();
Index: pkg/l4re-core/lua/lib/contrib/src/lauxlib.c
===================================================================
--- pkg/l4re-core/lua/lib/contrib/src/lauxlib.c (revision 72)
+++ pkg/l4re-core/lua/lib/contrib/src/lauxlib.c (working copy)
@@ -968,7 +968,9 @@
     lua_pop(L, 1);  /* remove field */
     lua_pushcfunction(L, openf);
     lua_pushstring(L, modname);  /* argument to open function */
+    printf("DEBUG: %s, %s, line %d: %s\n", __FILE__, __func__, __LINE__, modname);
     lua_call(L, 1, 1);  /* call 'openf' to open module */
+    printf("DEBUG: %s, %s, line %d: %s\n", __FILE__, __func__, __LINE__, modname);
     lua_pushvalue(L, -1);  /* make copy of module (call result) */
     lua_setfield(L, -3, modname);  /* _LOADED[modname] = module */
   }
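The repeated printf lines in the patches above could be wrapped in one small macro. The following is a minimal sketch in standard C99; TRACE and trace_format are hypothetical names that exist neither in L4Re nor in Lua. The fflush call is a general stdio precaution so that the last message is not lost in a buffer if the program stops right afterwards; whether it is needed in this particular environment is an assumption.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "DEBUG: <file>, <func>, line <n>: <message>"
 * into buf. Returns the length the full message needs, or a negative
 * value on a formatting error. */
static int trace_format(char *buf, size_t len, const char *file,
                        const char *func, int line, const char *fmt, ...)
{
  va_list ap;
  int n = snprintf(buf, len, "DEBUG: %s, %s, line %d: ", file, func, line);
  if (n < 0 || (size_t)n >= len)
    return n; /* error, or prefix already truncated */
  va_start(ap, fmt);
  int m = vsnprintf(buf + n, len - n, fmt, ap);
  va_end(ap);
  return m < 0 ? m : n + m;
}

/* Print one trace line with the current source position and flush it. */
#define TRACE(...)                                              \
  do {                                                          \
    char trace_buf_[256];                                       \
    trace_format(trace_buf_, sizeof trace_buf_,                 \
                 __FILE__, __func__, __LINE__, __VA_ARGS__);    \
    puts(trace_buf_);                                           \
    fflush(stdout);                                             \
  } while (0)
```

With this, the two lines around lua_call would read TRACE("before lua_call: %s", modname); and TRACE("after lua_call: %s", modname);.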
Perhaps the call chain lua_call -> lua_callk -> luaD_callnoyield -> luaD_call -> luaD_precall does not work; see https://pbs.twimg.com/media/DBiZeRPU0AEs6xo.png
However, I have not found the root cause yet, and I do not know whether the luaopen_package function leads to this issue:
static const luaL_Reg libs[] =
{
  { "_G", luaopen_base },
  {LUA_LOADLIBNAME, luaopen_package},
...
int lua(int argc, char const *const *argv)
{
  printf("Ned says: Hi World!\n");

  bool interactive = false;
  bool noexit = false;

  if (argc < 2)
    interactive = true;

  lua_State *L;
  L = luaL_newstate();

  if (!L)
    return 1;

  for (int i = 0; libs[i].func; ++i)
    {
      luaL_requiref(L, libs[i].name, libs[i].func, 1);
...
I suspect it because luaL_requiref calls lua_call, and here name is LUA_LOADLIBNAME ("package") and func is luaopen_package.
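To narrow down which entry of such a table never returns, the loop itself can log before and after every opener. The sketch below uses stand-in types and functions only (lib_entry, open_base, open_package and open_all are hypothetical, not the Lua API); it just mirrors the shape of the libs[] loop, where an entry with a NULL func ends the table.

```c
#include <stdio.h>

/* Stand-in for luaL_Reg: a name plus an opener function. */
typedef int (*opener_fn)(void);
struct lib_entry { const char *name; opener_fn func; };

static int open_base(void)    { return 1; } /* hypothetical opener */
static int open_package(void) { return 1; } /* hypothetical opener */

static const struct lib_entry demo_libs[] = {
  { "_G",      open_base },
  { "package", open_package },
  { NULL,      NULL } /* sentinel: ends the loop below */
};

/* Call every opener in the table, logging before and after each call.
 * Returns the number of openers that returned. */
static int open_all(const struct lib_entry *libs)
{
  int done = 0;
  for (int i = 0; libs[i].func; ++i) {
    printf("DEBUG: opening '%s'\n", libs[i].name);
    fflush(stdout);
    libs[i].func();
    printf("DEBUG: opened '%s'\n", libs[i].name);
    fflush(stdout);
    ++done;
  }
  return done;
}
```

If the output ends with an "opening" line that has no matching "opened" line, that entry is the one whose opener did not come back.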
On 2017-06-05 12:00, Leslie Zhai wrote:
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
l4-hackers@os.inf.tu-dresden.de