Hi Jean,
Thanks for your help!
On 08/06/17 04:18, Leslie Zhai wrote:
Hi Matthias,
Jean taught me how to debug L4Re using jdb in QEMU (http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008038.html). That example used a deliberate bug (a null pointer dereference) to crash Ned. L4Re then reported "unhandled write page fault at 0x0 pc=0x100398d", and addr2line ... -e ned -a 100398d pointed to the line that caused it.
But how do I find the root cause when it is unclear which component introduces the issue?
I think there might be a misunderstanding. I only introduced the null pointer dereference to demonstrate how to do it using a known problem. You can apply exactly the same steps in a different situation.
What I wanted to say ("on purpose" was the wrong word, sorry for my poor English) is that the debug patch is a demo to guide me in debugging with jdb in QEMU :) You are my mentor, teaching me patiently and carefully!
But I would like to add something. You actually had all the information you needed:
MOE: loading 'rom/ned'
Ned says: Hi World!
[1] 0 pf: 0022 pfa=0000000000000018 ip=fffffffff0031ea9 (R-) spc=0xffffffff807c3dd
[2] L4Re[rm]: unhandled read page fault at 0x18 pc=0x102e893
[3] L4Re: unhandled exception: pc=0xfffffffff0031ea9 (pfa=18)
L4Re: Global::l4re_aux->ldr_flags=0
In [2] you see the message from the local pager, which is unable to find a valid region for the pagefault address and complains. It shows 0x18 as the pagefault address and an instruction pointer of 0x102e893. The instruction pointer did not make any sense at that time. The local pager then triggers an exception.
In [3] you see the exception message. It shows the instruction pointer where the pagefault was actually raised: 0xfffffffff0031ea9. This is an address inside the kernel:
That is the key point! It is like magic to me that 0xfffffffff0031ea9 is an address inside the kernel. I need to dig into how Fiasco lays out its address space, correct?
~/build/tmp/l4re$ addr2line -p -i -e ../leslie/fiasco/build/fiasco.image -a fffffffff0031ea9
0xfffffffff0031ea9: /home/zhaixiang/project/l4re/kernel/fiasco/src/drivers/amd64/processor-amd64.cpp:67
...
(inlined by) /home/zhaixiang/project/l4re/kernel/fiasco/src/kern/ram_quota.cpp:53
/home/zhaixiang/project/l4re/kernel/fiasco/src/drivers/amd64/processor-amd64.cpp:67
fffffffff0031ea8:  fa            cli
fffffffff0031ea9:  48 8b 47 18   mov    0x18(%rdi),%rax
fffffffff0031ead:  48 03 77 10   add    0x10(%rdi),%rsi
_ZN9Ram_quota5allocEl():
/home/zhaixiang/project/l4re/kernel/fiasco/src/kern/ram_quota.cpp:54
If it is not a kernel fault and you need to find out which component is responsible (or need more information about the current state), you can press 'i' when line [0] appears. You then enter the kernel debugger and can look at the current thread using t<enter>. The thread has an id, which you can look up in the list of present threads (using 'lp'). Here it is thread 22:
id cpu name pr sp wait to stack state
2e 0 ----- 2 1e - ( 920) rcv_wait
2b 0 ----- 10 1e - (1072) rcv_wait
22 0 ----- 2 1e (1776) ready
1f 0 #ned ff 1e - (1072) ready,rcv_wait
All threads shown here have the same address space, and therefore the problem happened in the context of ned.
Maybe you can add "-serial stdio" to your qemu options and provide the complete backtrace for the problem? It looks like a framebuffer issue, but there should be more information in the lines above ...
I will try that instead of posting screenshots on Twitter, sorry for those posts!
The same question applies to debugging L4Linux (http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008047.html). Please give me some advice, thanks a lot!
regards,
Jean
--
Regards,
Leslie Zhai - an LLVM hacker https://reviews.llvm.org/p/xiangzhai/