Hi Matthias,
Thank you so much for your very nice patch,
0001-Fix-invalid-initialization-in-new.patch
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008005.html
Jean taught me how to debug L4Re using jdb in qemu
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008038.html In that
mail he used an intentional bug (a null pointer dereference) to crash Ned.
L4Re then reported "unhandled write page fault at 0x0 pc=0x100398d", and
addr2line ... -e ned -a 100398d pointed to the offending source line.
But how do I find the root cause when it is unclear which component
introduces the issue? For example, when Jean investigated why
framebuffer-example-x86 failed to work, it turned out not to be an init
process issue: the `_quota` member of a thread in Fiasco was not correctly
initialized. 0001-Fix-invalid-initialization-in-new.patch is awesome!
How do I debug that deep into the system? It might be easy for a maintainer
of the Fiasco kernel, but it is really magic to me :)
The same question applies to L4Linux: how do I debug it?
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2017/008047.html Please
give me some advice, thanks a lot!
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
Hi Jean,
Happy holiday :)
On 2017-06-05 23:43, Jean Wolter wrote:
> Hello Leslie,
>
> today are public holidays in Germany, so I will try to follow up on
> the issue tomorrow. I downloaded your binaries and am able to
> reproduce the problem. It appears to be a problem with the fiasco
> build. When I use my fiasco image built with gcc 4.9
l4/build/.config
GCCDIR=/usr/lib/gcc/x86_64-redhat-linux/6.3.1
...
> the system boots. When I use your fiasco image it triggers the
> pagefault shown in your screen shots.
>
> There are some warnings in the boot sequence with your kernel:
> "Warning: Buddy::alloc: Size mismatch: e58 v 1000" ...
fiasco/build/globalconfig.out
# CONFIG_AMD64_K8 is not set
# CONFIG_AMD64_CORE2 is not set
CONFIG_AMD64_CORE_I=y
l4/build/.config
CONFIG_BUILD_ARCH="amd64"
CONFIG_BUILD_ARCH_amd64=y
CONFIG_CPU="K8"
CONFIG_CPU_X86_K8=y
Does the warning refer to this CPU mismatch between the two
configurations? But when I rebuilt fiasco with
CONFIG_AMD64_K8=y
# CONFIG_AMD64_CORE2 is not set
# CONFIG_AMD64_CORE_I is not set
the result was the same, so I need to dig deeper into
fiasco/kern/buddy_alloc.cpp
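One way to check whether the two builds really disagree about the CPU type is to pull the CPU-related options out of both files and look at them side by side. The following is only a sketch: the helper name `enabled_options` and the prefix filters are assumptions made for illustration, and the sample text is copied from the snippets in this mail rather than read from disk.

```python
import re

def enabled_options(config_text, prefix):
    """Return {name: value} for options starting with prefix that are set."""
    options = {}
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_X is not set" start with '#' and never match.
        match = re.match(r"^(%s\w*)=(.*)$" % re.escape(prefix), line.strip())
        if match:
            options[match.group(1)] = match.group(2).strip('"')
    return options

# Sample content taken from fiasco/build/globalconfig.out as quoted above.
fiasco_config = """\
# CONFIG_AMD64_K8 is not set
# CONFIG_AMD64_CORE2 is not set
CONFIG_AMD64_CORE_I=y
"""

# Sample content taken from l4/build/.config as quoted above.
l4re_config = """\
CONFIG_BUILD_ARCH="amd64"
CONFIG_BUILD_ARCH_amd64=y
CONFIG_CPU="K8"
CONFIG_CPU_X86_K8=y
"""

fiasco_cpu = enabled_options(fiasco_config, "CONFIG_AMD64_")
l4re_cpu = enabled_options(l4re_config, "CONFIG_CPU")

print("fiasco:", fiasco_cpu)
print("l4re:  ", l4re_cpu)
```

With real builds the two strings would be read from fiasco/build/globalconfig.out and l4/build/.config. The sketch only makes the difference visible; it does not say whether that difference is what triggers the Buddy::alloc warning.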
>
>
> regards,
>
> Jean
>
>
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
Dear Rui,
Thank you for your reply!
> I believe the LLD's linker script is now fairly decent that you can
> expect it should work in most cases.
--
Regards,
Leslie Zhai https://reviews.llvm.org/p/xiangzhai/
On 2017-06-03 04:56, Jean Wolter wrote:
> Hello Leslie,
>
> On 02/06/17 11:33, Jean Wolter wrote:
>> I replaced my "ned" binary with yours and the system boots up fine.
>>
>> Now I wonder how I could reproduce your problem. It looks like I
>>
>> * either need all binaries or
>> * need to build them the same way you are building them. I used a
>> fresh checkout from the svn repositories and use gcc version
>> 4.9.2 (Debian 4.9.2-10) to build the components.
>>
>
> Since I neither have your binaries nor know how you build them
https://drive.google.com/open?id=0ByE8c-y74l_uLUdKWG9NWndnSUk
and
https://drive.google.com/open?id=0ByE8c-y74l_udUtJOTZvc3BycG8
> I thought about how you could diagnose the problem further. To
> demonstrate how to do this I intentionally added a null pointer
> dereference to ned and would like to discuss how I would diagnose this.
>
> --- ned/server/src/main.cc (revision 72)
> +++ ned/server/src/main.cc (working copy)
> @@ -35,6 +35,8 @@
> Dbg::set_level(Dbg::Warn);
> info.printf("Hello from Ned\n");
>
> + *(int *)0 = 0;
> +
> boot_info.printf("cmdline: ");
> for (int i = 0; i < argc; ++i)
> boot_info.cprintf("%s ", argv[i]);
>
> 1. Add -serial_esc -wait to kernel options in conf/modules.list
>
> --- modules.list (revision 72)
> +++ modules.list (working copy)
> @@ -80,6 +80,7 @@
> module libuc_c.so
>
> entry framebuffer-example-x86
> +kernel fiasco -serial_esc -wait
> roottask moe rom/x86-fb.cfg
> module x86-fb.cfg
> module l4re
>
> If you boot this the kernel will enter the kernel debugger before
> doing anything else. Enter the following commands:
>
> * P+ /* show every pagefault before forwarding it to the pager */
> * Prx0<space>100<space> /* restrict pagefault logging to pagefaults
> between [0-100] */.
> * g /* go */
>
> The kernel should stop when the access to 0x18 happens, then you can
> enter the kernel debugger using 'i' and can check who is responsible
> and maybe get a correct instruction pointer. If I do this here it
> looks like follows (I added -serial stdio to the qemu options):
>
> qemu-system-x86_64 -kernel
> /home/jw5/build/tmp/l4re/bin/amd64_K8/bootstrap -append "bootstrap
> -modaddr 0x01100000" -serial stdio -initrd
> "/home/jw5/build/tmp/fiasco//fiasco -serial_esc
> -wait,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/sigma0
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe
> rom/x86-fb.cfg,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.cfg
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/l4re
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ned
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/io
> ,/home/jw5/src/l4resvn/src/l4/pkg/io/io/config//x86-legacy.devs
> ,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.io
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/fb-drv
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/mag
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ex_fb_spectrum_cc "
> ...
> Freeing init code/data: 49152 bytes (12 pages)
>
> Calibrating timer loop... done.
>
> ---------------------------------------------------------------------
> CPU 0 [fffffffff003df99]: Wait
> jdb: P+
> PF logging enabled
> jdb: Pr restrict to addr in [0-100] /* pressed
> Prx0<space>100<space> */
> PF logging enabled, restricted to 0000000000000000 <= pfa <=
> 0000000000000100
> jdb: g /* g does not show up in output */
> ...
> MOE: cmdline: /home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe rom/x86-fb.cfg
> MOE: Starting: rom/ned rom/x86-fb.cfg
> MOE: loading 'rom/ned'
> pf: 0022 pfa=0000000000000000 ip=00000000010038a2 (w-)
> spc=0xffffffff807c3dd8
> /* press i */
> ---------------------------------------------------------------------
> CPU 0 [fffffffff0062116]: LOG
> jdb:
>
> Now you can use the kernel debugger to inspect the current state of
> the system. Here I simply use addr2line to find the error:
>
> addr2line -p -i -e bin/amd64_K8/l4f/ned -a 10038a2
> 0x00000000010038a2:
> /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:38
> (inlined by)
> /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:77
>
>
> Line 38 is the line with the null pointer dereference.
>
> I hope this helps,
> Jean
>
>
>
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
On 2017-06-03 04:56, Jean Wolter wrote:
> Now you can use the kernel debugger to inspect the current state of
> the system. Here I simply use addr2line to find the error:
>
> addr2line -p -i -e bin/amd64_K8/l4f/ned -a 10038a2
> 0x00000000010038a2:
> /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:38
> (inlined by)
> /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:77
>
>
> Line 38 is the line with the null pointer dereference.
Thanks for your help! It worked :)
addr2line -p -i -e bin/amd64_K8/l4f/ned -a 100398d
0x000000000100398d:
/home/zhaixiang/project/l4re/l4/pkg/l4re-core/ned/server/src/main.cc:38
(inlined by)
/home/zhaixiang/project/l4re/l4/pkg/l4re-core/ned/server/src/main.cc:77
I also need to read the JDB manual carefully.
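The step from the page-fault log line to the addr2line call can also be scripted. The sketch below is a minimal illustration, assuming the log line has the `pf: ... pfa=... ip=...` shape shown in Jean's jdb output. The function names are made up for this example, and the command is only printed, not executed.

```python
import re
import shlex

def parse_pf_line(line):
    """Extract the fault address and instruction pointer from a jdb pf log line."""
    match = re.search(r"pfa=([0-9a-fA-F]+)\s+ip=([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("not a page-fault log line: %r" % line)
    return int(match.group(1), 16), int(match.group(2), 16)

def addr2line_command(binary, ip):
    """Build the addr2line argument list with the options used in this thread."""
    return ["addr2line", "-p", "-i", "-e", binary, "-a", "%x" % ip]

# Log line copied from the jdb session quoted above.
log_line = "pf: 0022 pfa=0000000000000000 ip=00000000010038a2 (w-)"
pfa, ip = parse_pf_line(log_line)
command = addr2line_command("bin/amd64_K8/l4f/ned", ip)

# Print the command so it can be pasted into a shell in the build directory.
print(" ".join(shlex.quote(arg) for arg in command))
```

The binary passed to `addr2line_command` has to be the one that faulted (ned here); for a different task the path would change accordingly.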
>
> I hope this helps,
> Jean
>
>
>
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
Hi Jean,
Thanks for your help!
On 2017-06-03 04:56, Jean Wolter wrote:
> Hello Leslie,
>
> On 02/06/17 11:33, Jean Wolter wrote:
>> I replaced my "ned" binary with yours and the system boots up fine.
>>
>> Now I wonder how I could reproduce your problem. It looks like I
>>
>> * either need all binaries or
>> * need to build them the same way you are building them. I used a
>> fresh checkout from the svn repositories and use gcc version
>> 4.9.2 (Debian 4.9.2-10) to build the components.
>>
>
> Since I neither have your binaries nor know how you build them
I followed http://os.inf.tu-dresden.de/fiasco/build.html and
http://l4re.org/build.html
My environment is Fedora 24, gcc 6.3.1, qemu 2.6.2 and clang-5.0svn.
The hello example works :)
> I thought about how you could diagnose the problem further. To
> demonstrate how to do this I intentionally added a null pointer
> dereference to ned and would like to discuss how I would diagnose this.
>
> --- ned/server/src/main.cc (revision 72)
> +++ ned/server/src/main.cc (working copy)
> @@ -35,6 +35,8 @@
> Dbg::set_level(Dbg::Warn);
> info.printf("Hello from Ned\n");
>
> + *(int *)0 = 0;
> +
> boot_info.printf("cmdline: ");
> for (int i = 0; i < argc; ++i)
> boot_info.cprintf("%s ", argv[i]);
>
> 1. Add -serial_esc -wait to kernel options in conf/modules.list
>
> --- modules.list (revision 72)
> +++ modules.list (working copy)
> @@ -80,6 +80,7 @@
> module libuc_c.so
>
> entry framebuffer-example-x86
> +kernel fiasco -serial_esc -wait
> roottask moe rom/x86-fb.cfg
> module x86-fb.cfg
> module l4re
>
> If you boot this the kernel will enter the kernel debugger before
> doing anything else. Enter the following commands:
>
> * P+ /* show every pagefault before forwarding it to the pager */
> * Prx0<space>100<space> /* restrict pagefault logging to pagefaults
> between [0-100] */.
> * g /* go */
>
> The kernel should stop when the access to 0x18 happens, then you can
> enter the kernel debugger using 'i' and can check who is responsible
> and maybe get a correct instruction pointer. If I do this here it
> looks like follows (I added -serial stdio to the qemu options):
>
> qemu-system-x86_64 -kernel
> /home/jw5/build/tmp/l4re/bin/amd64_K8/bootstrap -append "bootstrap
> -modaddr 0x01100000" -serial stdio -initrd
> "/home/jw5/build/tmp/fiasco//fiasco -serial_esc
> -wait,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/sigma0
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe
> rom/x86-fb.cfg,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.cfg
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/l4re
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ned
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/io
> ,/home/jw5/src/l4resvn/src/l4/pkg/io/io/config//x86-legacy.devs
> ,/home/jw5/src/l4resvn/src/l4/conf/examples//x86-fb.io
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/fb-drv
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/mag
> ,/home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/ex_fb_spectrum_cc "
> ...
> Freeing init code/data: 49152 bytes (12 pages)
>
> Calibrating timer loop... done.
>
> ---------------------------------------------------------------------
> CPU 0 [fffffffff003df99]: Wait
> jdb: P+
When I typed p+ and then pressed Enter, jdb did not print "PF
logging enabled" for me: https://pbs.twimg.com/media/DBcu1DUUwAAkiDl.png
> PF logging enabled
> jdb: Pr restrict to addr in [0-100] /* pressed
> Prx0<space>100<space> */
> PF logging enabled, restricted to 0000000000000000 <= pfa <=
> 0000000000000100
> jdb: g /* g does not show up in output */
But typing g and then pressing Enter does continue execution:
https://pbs.twimg.com/media/DBc0XlWUAAAmgSN.png Sorry, I am not familiar
with jdb, but I have some experience with gdb.
> ...
> MOE: cmdline: /home/jw5/build/tmp/l4re/bin/amd64_K8/l4f/moe rom/x86-fb.cfg
> MOE: Starting: rom/ned rom/x86-fb.cfg
> MOE: loading 'rom/ned'
> pf: 0022 pfa=0000000000000000 ip=00000000010038a2 (w-)
> spc=0xffffffff807c3dd8
> /* press i */
> ---------------------------------------------------------------------
> CPU 0 [fffffffff0062116]: LOG
> jdb:
>
> Now you can use the kernel debugger to inspect the current state of
> the system. Here I simply use addr2line to find the error:
So I failed to debug the intentional null pointer dereference
*(int *)0 = 0; from your patch via jdb.
> addr2line -p -i -e bin/amd64_K8/l4f/ned -a 10038a2
> 0x00000000010038a2:
> /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:38
> (inlined by)
> /home/jw5/src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:77
>
>
> Line 38 is the line with the null pointer dereference.
>
> I hope this helps,
> Jean
>
>
>
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
On 2017-06-02 15:34, Jean Wolter wrote:
>
> Hello Leslie,
>
> On 02/06/17 05:52, Leslie Zhai wrote:
>> Thanks for your reply! But I hit the same problem with L4Linux
>> https://pbs.twimg.com/media/DBSQtoNUIAEE2HY.png
> It looks like the very same error, an unhandled read page-fault at
> address 0x18 triggered by an instruction at 0x102d72a.
>
>> > Maybe you could use addr2line/objdump
>> > to figure out, what happens at address 0x102d72a in ned.
>>
>> cd /home/zhaixiang/project/l4re/l4/build
>> addr2line -p -e bin/amd64_K8/l4f/ned 0x102d72a
>>
>> but the output is not easy for humans to read. I am not familiar
>> with addr2line; please give me some advice, thanks a lot!
> If I try it with an instruction pointer somewhere in the main function
> it looks like follows:
>
> ~/build/tmp/l4re/bin/amd64_K8/l4f$ addr2line -p -i -e ned -a 10038b3
> 0x00000000010038b3:
> /home/.../src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:39
> (inlined by)
> /home/.../src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:75
>
>
>> objdump -D bin/amd64_K8/l4f/ned > ned.S
>>
> If you use -lSd you should see line number information in the
> disassembled output, e.g:
>
> /home/.../src/l4resvn/src/l4/pkg/l4re-core/ned/server/src/main.cc:39
> for (int i = 0; i < argc; ++i)
> 10038b3: 85 db test %ebx,%ebx
>
>
>> 102d72a: 66 0f 12 05 9e 7b 02 movlpd 0x27b9e(%rip),%xmm0 #
>> 10552d0 <_ZL7HOOKKEY+0x8>
>> Is that enough to figure out what happened? If not, I will upload the
>> disassembly to my Google Drive.
> This looks strange. There is nothing looking like an access to 0x18.
> Could you send me your ned binary?
https://drive.google.com/open?id=0ByE8c-y74l_uVlB6cnJBdWV1VzA
>
> regards,
> Jean
>
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/
Hi L4 hackers,
When I tried to build hello.iso as described at
http://os.inf.tu-dresden.de/fiasco/build.html via
make grub2iso E=hello MODULE_SEARCH_PATH=/path/to/fiasco-build-dir
it failed to generate the ISO; xorriso reported this error:
xorriso : FAILURE : Not a known command: '-f'
xorriso : aborting : -abort_on 'FAILURE' encountered 'FAILURE'
Failed to create ISO at tool/lib/L4/Grub.pm line 123.
Makefile:598: recipe for target 'grub2iso' failed
So I simply changed the option:
Index: tool/lib/L4/Grub.pm
===================================================================
--- tool/lib/L4/Grub.pm (revision 72)
+++ tool/lib/L4/Grub.pm (working copy)
@@ -116,7 +116,7 @@
close A;
};
my $cmd = "$mkr --output=\"$isofilename\" $dir ".
- join(' ', @morefiles)." --$opt -f";
+ join(' ', @morefiles)." --$opt -follow default";
system("$cmd");
die "Failed to create ISO" if $?;
# grub-mkrescue does not propagate internal tool errors
Please review my patch, thanks a lot!
--
Regards,
Leslie Zhai - a LLVM hacker https://reviews.llvm.org/p/xiangzhai/