Hi Jean,

Thanks for your help!

On 2017-06-09 00:53, Jean Wolter wrote:

On 08/06/17 04:18, Leslie Zhai wrote:

Hi Matthias,

Jean taught me how to debug L4Re using jdb in qemu http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008038.html He used a deliberate bug (a null pointer dereference) to crash Ned. L4Re then reported: unhandled write page fault at 0x0 pc=0x100398d, and addr2line ... -e ned -a 100398d pointed to the line that caused it.

But how do I find the root cause when it is unclear which component introduced the issue?

I think there might be a misunderstanding. I only introduced the null pointer dereference to demonstrate how to do it using a known problem. You can apply exactly the same steps in a different situation.

I only meant to say that the debug patch is a demo showing me how to debug with jdb in qemu ('on purpose' was the wrong phrase, sorry for my poor English) :) You are my mentor, teaching me patiently and carefully!

But I would like to add something. You actually had all the information you needed:

   MOE: loading 'rom/ned'
   Ned says: Hi World!
[1] 0 pf: 0022 pfa=0000000000000018 ip=fffffffff0031ea9 (R-) spc=0xffffffff807c3dd
[2] L4Re[rm]: unhandled read page fault at 0x18 pc=0x102e893
[3] L4Re: unhandled exception: pc=0xfffffffff0031ea9 (pfa=18)
     L4Re: Global::l4re_aux->ldr_flags=0

In [2] you see the message from the local pager, which is unable to find a valid region for the page fault address and complains. It shows 0x18 as the page fault address and an instruction pointer of 0x102e893. The instruction pointer did not make any sense at that time. The local pager then triggers an exception.
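As a rough illustration, the addresses in lines [2] and [3] can be pulled out of a captured console log with a few lines of Python. This is only a sketch: the regular expressions are written against the exact wording of the three lines quoted above, the function name parse_fault_lines is made up for this example, and the pfa value in line [3] is assumed to be hexadecimal (it matches the 0x18 in line [2]).

```python
import re

# The three console lines quoted above, without the [1]..[3] labels.
LOG = """\
0 pf: 0022 pfa=0000000000000018 ip=fffffffff0031ea9 (R-) spc=0xffffffff807c3dd
L4Re[rm]: unhandled read page fault at 0x18 pc=0x102e893
L4Re: unhandled exception: pc=0xfffffffff0031ea9 (pfa=18)
"""

# Patterns follow the wording seen in this log only; other versions may differ.
PAGER_RE = re.compile(
    r"unhandled (read|write) page fault at (0x[0-9a-fA-F]+) pc=(0x[0-9a-fA-F]+)"
)
EXC_RE = re.compile(
    r"unhandled exception: pc=(0x[0-9a-fA-F]+) \(pfa=([0-9a-fA-F]+)\)"
)


def parse_fault_lines(text):
    """Return (pager, exception) dicts; an entry is None if its line is absent."""
    pager = exc = None
    for line in text.splitlines():
        m = PAGER_RE.search(line)
        if m:
            pager = {
                "access": m.group(1),
                "pfa": int(m.group(2), 16),
                "pc": int(m.group(3), 16),
            }
            continue
        m = EXC_RE.search(line)
        if m:
            # Assumption: the bare "18" after pfa= is hexadecimal.
            exc = {"pc": int(m.group(1), 16), "pfa": int(m.group(2), 16)}
    return pager, exc


pager, exc = parse_fault_lines(LOG)
print("pager:     pfa=%#x pc=%#x" % (pager["pfa"], pager["pc"]))
print("exception: pfa=%#x pc=%#x" % (exc["pfa"], exc["pc"]))
```

Both lines report the same fault address (0x18) but different instruction pointers, which is the hint that the two messages need to be read together.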

In [3] you see the exception message. It shows the instruction pointer where the pagefault was actually raised: 0xfffffffff0031ea9. This is an address inside the kernel:

That is the key point! It is magic to me that 0xfffffffff0031ea9 is an address inside the kernel. I need to dig deeper into how Fiasco lays out its address space, correct?

~/build/tmp/l4re$ addr2line -p -i -e ../leslie/fiasco/build/fiasco.image -a fffffffff0031ea9
0xfffffffff0031ea9: /home/zhaixiang/project/l4re/kernel/fiasco/src/drivers/amd64/processor-amd64.cpp:67
...
(inlined by) /home/zhaixiang/project/l4re/kernel/fiasco/src/kern/ram_quota.cpp:53

/home/zhaixiang/project/l4re/kernel/fiasco/src/drivers/amd64/processor-amd64.cpp:67
fffffffff0031ea8:       fa                      cli
fffffffff0031ea9:       48 8b 47 18             mov    0x18(%rdi),%rax
fffffffff0031ead:       48 03 77 10             add    0x10(%rdi),%rsi
_ZN9Ram_quota5allocEl():
/home/zhaixiang/project/l4re/kernel/fiasco/src/kern/ram_quota.cpp:54
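One way to see at a glance that 0xfffffffff0031ea9 is likely a kernel address: on amd64 with 48-bit virtual addresses, a canonical address has bits 63..47 either all zero (lower half) or all one (upper half), and kernels conventionally live in the upper half. The Python sketch below checks that and picks the image to hand to addr2line. It is an illustration under assumptions: the helper names are hypothetical, the "upper half means kernel" rule is a convention that this log is consistent with rather than a statement about Fiasco's exact memory map, and the image paths are just the ones that appear in this thread.

```python
def is_upper_half(addr, va_bits=48):
    """True if addr is a canonical upper-half address.

    For va_bits=48 this checks that bits 63..47 are all set.
    """
    top = addr >> (va_bits - 1)
    return top == (1 << (64 - va_bits + 1)) - 1


def addr2line_command(addr, kernel_image, user_binary):
    """Build an addr2line argument list with the flags used in this thread.

    Assumption: upper-half addresses belong to the kernel image,
    everything else to the user binary.
    """
    image = kernel_image if is_upper_half(addr) else user_binary
    return ["addr2line", "-p", "-i", "-e", image, "-a", format(addr, "x")]


# Addresses taken from the log lines quoted in this thread.
for pc in (0xfffffffff0031ea9, 0x102e893):
    cmd = addr2line_command(pc, "../leslie/fiasco/build/fiasco.image", "ned")
    print(" ".join(cmd))
```

For the first address this prints the same addr2line invocation as shown above; for a lower-half address it falls back to the user binary.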

If it is not a kernel fault and you need to find out which component is responsible (or need more information about the current state), you can press 'i' when line [0] appears. This enters the kernel debugger, where you can look at the current thread using t<enter>. The thread has an id, which you can look up in the list of present threads (using 'lp'). Here it is thread 22:

id cpu    name             pr     sp wait    to stack state
   2e   0     -----             2     1e     -       ( 920) rcv_wait
   2b   0     -----            10     1e     -       (1072) rcv_wait
   22   0     -----             2     1e             (1776) ready
   1f   0     #ned             ff     1e     -       (1072) ready,rcv_wait

All threads shown here have the same address space and therefore the problem happened in the context of ned.

Similarly, how do I debug L4Linux? http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008047.html Please give me some advice, thanks a lot!

Maybe you can add "-serial stdio" to your qemu options and provide the complete backtrace for the problem? It looks like a framebuffer issue, but there should be more information in the lines above ...

I will try that instead of posting screenshots on Twitter. Sorry about those posts!

regards,
Jean

-- 
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
