Instead of applying the deliberate null pointer dereference patch, I just used printf to debug, because I have no idea how to set a breakpoint in jdb when running L4Re/Fiasco under QEMU.


Index: pkg/l4re-core/ned/server/src/main.cc
===================================================================
--- pkg/l4re-core/ned/server/src/main.cc    (revision 72)
+++ pkg/l4re-core/ned/server/src/main.cc    (working copy)
@@ -57,9 +57,11 @@
   Ned::Server svr;
   Ned::server = &svr;
 
+  printf("DEBUG: %s, %s, line %d\n", __FILE__, __PRETTY_FUNCTION__, __LINE__);
 
   lua(argc, argv);
 
+  printf("DEBUG: %s, %s, line %d\n", __FILE__, __PRETTY_FUNCTION__, __LINE__);
 
   while (1)
     l4_sleep_forever();
Index: pkg/l4re-core/lua/lib/contrib/src/lauxlib.c
===================================================================
--- pkg/l4re-core/lua/lib/contrib/src/lauxlib.c    (revision 72)
+++ pkg/l4re-core/lua/lib/contrib/src/lauxlib.c    (working copy)
@@ -968,7 +968,9 @@
     lua_pop(L, 1);  /* remove field */
     lua_pushcfunction(L, openf);
     lua_pushstring(L, modname);  /* argument to open function */
+    printf("DEBUG: %s, %s, line %d: %s\n", __FILE__, __func__, __LINE__, modname);
     lua_call(L, 1, 1);  /* call 'openf' to open module */
+    printf("DEBUG: %s, %s, line %d: %s\n", __FILE__, __func__, __LINE__, modname);
     lua_pushvalue(L, -1);  /* make copy of module (call result) */
     lua_setfield(L, -3, modname);  /* _LOADED[modname] = module */
   }
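
A minimal sketch of how the repeated printf lines in the patch above could be wrapped in one macro. The names format_trace and DEBUG_TRACE are hypothetical and not part of L4Re or Lua. The explicit flush assumes stdout may be buffered, so that a message is not lost if the following call faults.

```c
#include <stdio.h>

/* Hypothetical helper (not part of L4Re or Lua): formats one trace line
 * in the same shape as the printf calls in the patch above. */
static int format_trace(char *buf, size_t size, const char *file,
                        const char *func, int line, const char *detail)
{
  return snprintf(buf, size, "DEBUG: %s, %s, line %d: %s\n",
                  file, func, line, detail);
}

/* Prints the trace line and flushes stdout, so the message is not left
 * in a buffer if the next call faults. */
#define DEBUG_TRACE(detail)                                              \
  do {                                                                   \
    char trace_buf_[256];                                                \
    format_trace(trace_buf_, sizeof trace_buf_, __FILE__, __func__,      \
                 __LINE__, (detail));                                    \
    fputs(trace_buf_, stdout);                                           \
    fflush(stdout);                                                      \
  } while (0)
```

With this in place, the two added lines in lauxlib.c would each become DEBUG_TRACE(modname); placed before and after lua_call(L, 1, 1).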

Perhaps the call chain lua_call -> lua_callk -> luaD_callnoyield -> luaD_call -> luaD_precall does not work: https://pbs.twimg.com/media/DBiZeRPU0AEs6xo.png

But I have not found the root cause yet, that is, whether or not the luaopen_package function leads to this issue:


static const luaL_Reg libs[] =
{
  { "_G", luaopen_base },
  {LUA_LOADLIBNAME, luaopen_package},

...

int lua(int argc, char const *const *argv)
{
  printf("Ned says: Hi World!\n");

  bool interactive = false;
  bool noexit = false;

  if (argc < 2)
    interactive = true;

  lua_State *L;
  L = luaL_newstate();

  if (!L)
    return 1;

  for (int i = 0; libs[i].func; ++i)
    {
      luaL_requiref(L, libs[i].name, libs[i].func, 1);

...

I suspect this because luaL_requiref calls lua_call, and in the failing case name is LUA_LOADLIBNAME ("package") and func is luaopen_package.
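
To narrow down which opener never returns, each call in the libs[] loop can be bracketed with a message. The following is only a stand-alone sketch: lib_entry, open_base, open_package and open_all_traced are hypothetical stand-ins for luaL_Reg, the luaopen_* functions and the loop in lua(), so that it builds without the Lua headers.

```c
#include <stdio.h>

/* Hypothetical stand-ins for luaL_Reg and the luaopen_* functions. */
typedef int (*open_fn)(void);

struct lib_entry {
  const char *name;
  open_fn func;
};

static int open_base(void)    { return 1; }
static int open_package(void) { return 1; }

static const struct lib_entry demo_libs[] = {
  { "_G",      open_base },
  { "package", open_package },
  { NULL,      NULL }          /* sentinel, as in the real libs[] table */
};

/* Calls every opener in the table and brackets each call with a message.
 * If an opener faults, the last "opening" line without a matching
 * "opened" line names the culprit. Returns how many openers returned. */
static int open_all_traced(const struct lib_entry *libs)
{
  int done = 0;
  for (int i = 0; libs[i].func; ++i)
    {
      printf("DEBUG: opening '%s'\n", libs[i].name);
      fflush(stdout);
      libs[i].func();
      printf("DEBUG: opened '%s'\n", libs[i].name);
      fflush(stdout);
      ++done;
    }
  return done;
}
```

In ned itself the same bracketing would go around the luaL_requiref call, printing libs[i].name before and after it.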

On 2017-06-05 12:00, Leslie Zhai wrote:



On 2017-06-03 04:56, Jean Wolter wrote:
Hello Leslie,

On 02/06/17 11:33, Jean Wolter wrote:
I replaced my "ned" binary with yours and the system boots up fine.

Now I wonder how I could reproduce your problem. It looks like I
  • either need all binaries or
  • need to build them the same way you are building them. I used a fresh checkout from the svn repositories and gcc version 4.9.2 (Debian 4.9.2-10) to build the components.

Since I neither have your binaries nor know how you build them I thought about how you could diagnose the problem further. To demonstrate how to do this I intentionally added a null pointer dereference to ned and would like to discuss how I would diagnose this.
--- ned/server/src/main.cc    (revision 72)
+++ ned/server/src/main.cc    (working copy)
@@ -35,6 +35,8 @@
   Dbg::set_level(Dbg::Warn);
   info.printf("Hello from Ned\n");
 
+  *(int *)0 = 0;
+
   boot_info.printf("cmdline: ");
   for (int i = 0; i < argc; ++i)
     boot_info.cprintf("%s ", argv[i]);
1. Add -serial_esc -wait to kernel options in conf/modules.list
--- modules.list    (revision 72)
+++ modules.list    (working copy)
@@ -80,6 +80,7 @@
 module libuc_c.so
 
 entry framebuffer-example-x86
+kernel fiasco -serial_esc -wait
 roottask moe rom/x86-fb.cfg
 module x86-fb.cfg
 module l4re
If you boot this, the kernel will enter the kernel debugger before doing anything else. Enter the commands P+, Pr (restricted to the address range 0 to 100) and g, as shown in the transcript below.
The kernel should stop when the access to 0x18 happens; you can then enter the kernel debugger using 'i', check who is responsible and maybe get a correct instruction pointer. If I do this here it looks as follows (I added -serial stdio to the qemu options):

qemu-system-x86_64 -kernel /home/jw5/build/tmp/l4re/bin/amd64_K8/bootstrap -append "bootstrap -modaddr 0x01100000" -serial stdio -initrd "/home/jw5/build/tmp/fiasco//fiasco -serial_esc -wait,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/sigma0 ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe rom/x86-fb.cfg,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.cfg ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/l4re ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ned ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/io ,/home/jw5/src/l4resvn/src/l4/pkg/io/io/config//x86-legacy.devs ,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.io ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/fb-drv ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/mag ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ex_fb_spectrum_cc "
...
Freeing init code/data: 49152 bytes (12 pages)

Calibrating timer loop... done.

    ---------------------------------------------------------------------     
    CPU 0 [fffffffff003df99]: Wait
jdb: P+
PF logging enabled
jdb: Pr restrict to addr in [0-100]     /* pressed Prx0<space>100<space> */
PF logging enabled, restricted to 0000000000000000 <= pfa <= 0000000000000100
jdb: g     /* g does not show up in output */
...
MOE: cmdline: /home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe rom/x86-fb.cfg
MOE: Starting: rom/ned rom/x86-fb.cfg
MOE: loading 'rom/ned'
pf:  0022 pfa=0000000000000000 ip=00000000010038a2 (w-) spc=0xffffffff807c3dd8
/* press i */
    ---------------------------------------------------------------------     
    CPU 0 [fffffffff0062116]: LOG
jdb:

Now you can use the kernel debugger to inspect the current state of the system. Here I simply use addr2line to find the error:
addr2line -p -i -e bin/amd64_K8/l4f/ned -a 10038a2
0x00000000010038a2: /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:38
 (inlined by) /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:77

Line 38 is the line with the null pointer dereference.
Thanks for your help! It worked :)
addr2line -p -i -e bin/amd64_K8/l4f/ned -a 100398d
0x000000000100398d: /home/zhaixiang/project/l4re/l4/pkg/l4re-core/ned/server/src/main.cc:38
 (inlined by) /home/zhaixiang/project/l4re/l4/pkg/l4re-core/ned/server/src/main.cc:77
And I need to read the JDB manual carefully.
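
The step from the page fault log line to the addr2line call can also be scripted. This is a rough sketch that assumes the log line always has the "pfa=... ip=..." shape shown in the transcript above; parse_pf_line and format_addr2line are hypothetical helper names, not part of Fiasco or binutils.

```c
#include <stdio.h>
#include <string.h>

/* Extracts the faulting address (pfa) and instruction pointer (ip) from a
 * page fault log line such as
 *   "pf:  0022 pfa=0000000000000000 ip=00000000010038a2 (w-) spc=..."
 * Returns 1 on success and 0 if the line does not have that shape. */
static int parse_pf_line(const char *line,
                         unsigned long long *pfa, unsigned long long *ip)
{
  const char *p = strstr(line, "pfa=");
  if (!p)
    return 0;
  return sscanf(p, "pfa=%llx ip=%llx", pfa, ip) == 2;
}

/* Builds the addr2line command line for the given binary and ip.
 * Returns the snprintf result (length of the full command). */
static int format_addr2line(char *buf, size_t size,
                            const char *binary, unsigned long long ip)
{
  return snprintf(buf, size, "addr2line -p -i -e %s -a %llx", binary, ip);
}
```

Feeding it the pf line from the transcript and bin/amd64_K8/l4f/ned as the binary produces the same addr2line command that was typed by hand above.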



I hope this helps,
Jean




-- 
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
