Dear Reinier,
On 05.11.2014 at 23:22, Reinier Millo Sánchez wrote:
Hello Aaron, I'm working on a project to develop an embedded operating system for real-time purposes using Fiasco.OC as the microkernel. I have read your article "Capability Wrangling Made Easy: Debugging on a Microkernel with Valgrind". I'm interested in doing some profiling tests with Fiasco.OC and the Linux kernel, using Valgrind. Have you ported Valgrind to the Fiasco.OC microkernel interface? Is the port available?
please let us use the l4-hackers mailing list for further discussion.
Indeed we ported Valgrind to Fiasco.OC and L4Re. Its source code is available in the L4Re SVN in l4/pkg/valgrind. As you might have noticed, our paper was in 2010 and if I remember correctly, the Valgrind version in the repository was last updated around 2011. It might still work, but you may also encounter problems. Feel free to try this out and let us know about any questions you have.
Apart from that there may be other ways of achieving your profiling needs on top of Fiasco.OC. What exactly would you want to do?
Kind regards, Bjoern
Hello Hackers,
actually i have a bug in my app, and i can't pin it down to specific code in my app.
L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8
l4-hackers mailing list l4-hackers@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
Sorry for the mail before, this was a mistake.
Next try:
i run into an Error in my l4re-App. Till now, i did printf()-Debugging, but it is at its limit in this case.
The Error message is the following:
L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8
However, with printf() i can't narrow it down to specific code. But there is a function with some if-else stuff, and if i comment some lines out, the Error doesn't appear. Funnily enough, printf() tells me that the commented-out code never executes, i.e. the else-part is never executed, but the program still fails when it is not commented out.
So, i'm not sure what to try next. Does Valgrind work here?
And what about the Valgrind in the Snapshot? It has a strange Makefile. Does this work for an ARM system?
SYSTEMS = x86-l4f
all::
	if [ ! -e $(PKGDIR)/broken ]; then PWD=$(PWD)/build make -C build; fi
Or can i run my bootstrap.raw on a standard ARM Linux?
Anything else to try besides Valgrind?
Thanks
Hi,
i run into an Error in my l4re-App. Till now, i did printf()-Debugging, but it is at its limit in this case.
The Error message is the following:
L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8
This error message tells you that your program is executing an instruction at PC value 0x15c7a8. This instruction causes a write page fault at address 0x7ffff300, and L4Re does not know how to handle it.
This usually happens when you try to access an address with no memory mapped. From the address it looks like you are accessing an address right below your stack, i.e., you are exceeding L4Re's default stack size. Most likely you have a function with a large buffer on the stack or you are recursing very deeply?
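To compare the two numbers in such a message against objdump or JDB output, it can help to pull them out programmatically. The following is a minimal sketch in C, assuming the message has exactly the "@<hex> pc=<hex>" form quoted in this thread; the struct and function names are hypothetical and not part of L4Re.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one fault message. */
struct l4re_fault {
    char kind[8];         /* "read" or "write" */
    unsigned long addr;   /* faulting data address (after '@') */
    unsigned long pc;     /* program counter of the faulting instruction */
};

/*
 * Parse a line such as
 *   "myClient| L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8"
 * Returns 1 on success, 0 if the line does not match the assumed format.
 */
static int parse_l4re_fault(const char *line, struct l4re_fault *out)
{
    /* Skip any task prefix such as "myClient| ". */
    const char *p = strstr(line, "L4Re[rm]:");
    if (p == NULL)
        return 0;

    int n = sscanf(p, "L4Re[rm]: unhandled %7s page fault @%lx pc=%lx",
                   out->kind, &out->addr, &out->pc);
    return n == 3;
}
```

Other L4Re versions may print the message differently (for example with "at 0x..." instead of "@..."), in which case the format string would need adjusting.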
However, with printf() i can't make it up to specific code. But, there is a function with some if-else stuff, and if i comment some lines out, the Error doesnt appear. Funny, printf() tells me, that the commented code never executes. i.e. the else-part is never ever executed, but the program still fails when it is not commented out.
So, i'm not sure what to try next. Does Valgrind work here?
Probably it would. Valgrind tracks allocated memory regions and could at least tell you what I told you above: you are accessing memory out of any allocated region.
And what about the Valgrind in the Snapshot? It has some strange Makefile. Does this work for a ARM system?
SYSTEMS = x86-l4f
all::
	if [ ! -e $(PKGDIR)/broken ]; then PWD=$(PWD)/build make -C build; fi
Or can i run my bootstrap.raw on an standard ARM Linux?
Nope, we only ported it to x86/32, never tried ARM.
Anything else to try besides Valgrind?
As suggested above, check for deep recursion or arrays allocated on the stack. If that does not help, try increasing L4Re's default stack size for your application using the L4RE_ELF_AUX_ELEM_T macro defined in l4/re/elf_aux.h. For this, place something like this anywhere in your compilation unit:
#include <l4/re/elf_aux.h>
L4RE_ELF_AUX_ELEM_T(l4re_elf_aux_mword_t, stack_size, L4RE_ELF_AUX_T_STACK_SIZE, 65536);
(This example sets the stack size to 64k.)
Hth, Bjoern
One more hint:
The Error message is the following:
L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8
This error message tells you that your program is trying to execute an instruction at PC value 0x15c7a8. This instruction causes a write page fault at address 0x7ffff300 and L4re does not know what to do about it.
This usually happens when you try to access an address with no memory mapped. From the address it looks like you are accessing an address right below your stack, i.e., you are exceeding L4Re's default stack size. Most likely you have a function with a large buffer on the stack or you are recursing very deeply?
you might want to use standard binutils (e.g., objdump) on your binary to find out what instruction is at the respective PC and map this to whatever function it belongs to in your code.
Bjoern
Hello Hackers,
this is a follow-up to the thread below, but i didn't want to hijack it, so i'm starting a new one.
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007049.html
http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007050.html
This usually happens when you try to access an address with no memory mapped. From the address it looks like you are accessing an address right below your stack, i.e., you are exceeding L4Re's default stack size. Most likely you have a function with a large buffer on the stack or you are recursing very deeply?
Actually not. There is a tiny buffer (4096 bytes), and one function calls itself recursively one time.
If that does not help, try increasing L4Re's default stack size for your application using the L4RE_ELF_AUX_ELEM_T macro defined in l4/re/elf_aux.h. For this, place something like this anywhere in your compilation unit:
#include <l4/re/elf_aux.h>
L4RE_ELF_AUX_ELEM_T(l4re_elf_aux_mword_t, stack_size, L4RE_ELF_AUX_T_STACK_SIZE, 65536);
Didn't work. I even doubled the size. There is just a tiny difference in the Error Message, which most likely comes from the extra code.
before:
L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8
now:
L4Re[rm]: unhandled write page fault @7ffff2e0 pc=15c7a8
you might want to use standard binutils (e.g., objdump) on your binary to find out what instruction is at the respective PC and map this to whatever function it belongs to in your code.
I only have a pc=115c7a8, but since i start from 0x01000000, i guess that right?
115c7a8: e28cca13 add ip, ip, #77824 ; 0x13000
I couldn't find an image with debug information, so i dont know what C-code this belongs to.
Thanks, ba_f
On 12.11.2014 00:11, ba_f wrote:
Hello Hackers,
this is a following to the thread below, but i didnt wanna occupy it so i start a new one. http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007049.html http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007050.html
This usually happens when you try to access an address with no memory mapped. From the address it looks like you are accessing an address right below your stack, i.e., you are exceeding L4Re's default stack size. Most likely you have a function with a large buffer on the stack or you are recursing very deeply?
Actually not. There is a tiny buffer (4096 Bytes), and one function calles itself recursively one time.
Ok, so my guess was wrong.
before:
L4Re[rm]: unhandled write page fault @7ffff300 pc=15c7a8
now:
L4Re[rm]: unhandled write page fault @7ffff2e0 pc=15c7a8
you might want to use standard binutils (e.g., objdump) on your binary to find out what instruction is at the respective PC and map this to whatever function it belongs to in your code.
I only have a pc=115c7a8, but since i start from 0x01000000, i guess that right?
115c7a8: e28cca13 add ip, ip, #77824 ; 0x13000
No way. This instruction adds a constant to a register and does not touch memory at all. Hence it won't raise a page fault. Did you objdump your program for that?
I couldn't find an image with debug information, so i dont know what C-code this belongs to.
objdump's '-d' option did not help? Are you specifying your own compiler flags for this program? Otherwise, L4Re's build system by default compiles with debug info.
Bjoern
Hello,
I couldn't find an image with debug information, so i dont know what C-code this belongs to.
objdump's '-d' option did not help? Are you specifying your own compiler flags for this program? Otherwise, L4Re's build system by default compiles with debug info.
Indeed, i was objdumping the bootstrap.elf, which seems not to have Debug-Infos. But i found Debug-infos here: obj/l4/arm-ca/bin/arm_armv7a/l4f/crapApp
I only have a pc=115c7a8, but since i start from 0x01000000, i guess that right?
115c7a8: e28cca13 add ip, ip, #77824 ; 0x13000
No way. This instruction adds a constant to a register and does not touch memory at all. Hence it won't raise a page fault. Did you objdump your program for that?
I'm not sure what you mean. I'm not experienced with objdump, though. But i can't look at all Load & Store instructions, can i? Looking for pc=15c7a8 isn't the solution either, since it's not there. So, what would u recommend to look for?
My problem again: The function foo() is only called at one place in the code. I have to comment some stuff out in foo() to make the program work. But even if i dont call foo() (it is useless for this test-case), the program fails at the same place. So, the fault can't be in foo().
There are no threads, and no risky pointer stuff. Actually it is tested legacy code for Intel-Linux.
Thank u so far,
ba_f
Hi,
I only have a pc=115c7a8, but since i start from 0x01000000, i guess that right?
115c7a8: e28cca13 add ip, ip, #77824 ; 0x13000
No way. This instruction adds a constant to a register and does not touch memory at all. Hence it won't raise a page fault. Did you objdump your program for that?
I'm not sure what you mean.
The error you are seeing is a page fault. Page faults happen, when you access memory that is not mapped. For this you need to be executing an instruction that actually accesses memory. The instruction you found does not access memory.
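As a rough illustration of this point, the following C sketch checks whether a 32-bit ARM (A32) opcode belongs to the most common load/store encodings. It is a simplification under stated assumptions: only LDR/STR/LDRB/STRB and LDM/STM are recognised, while halfword, doubleword, exclusive and coprocessor/VFP transfers are ignored. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Returns 1 if the A32 opcode is a basic load/store (LDR/STR/LDRB/STRB
 * or LDM/STM), 0 otherwise. Deliberately incomplete: halfword,
 * doubleword, exclusive and coprocessor transfers are not handled.
 */
static int a32_is_basic_load_store(uint32_t insn)
{
    uint32_t cond = insn >> 28;            /* bits 31:28 */
    uint32_t op   = (insn >> 25) & 0x7u;   /* bits 27:25 */
    uint32_t bit4 = (insn >> 4) & 0x1u;

    if (cond == 0xFu)
        return 0;   /* unconditional instruction space, not handled here */
    if (op == 0x2u)
        return 1;   /* LDR/STR with immediate offset */
    if (op == 0x3u && bit4 == 0u)
        return 1;   /* LDR/STR with register offset */
    if (op == 0x4u)
        return 1;   /* LDM/STM, which includes PUSH/POP */
    return 0;
}
```

With this check, the opcode e28cca13 (add ip, ip, #77824) is classified as not accessing memory, which matches the reasoning above.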
I'm not experienced with objdump, though. But i can't look at all Load & Store instructions, do i? Looking for pc=15c7a8 isn't the solution, neither, since it's not there. So, what would u recommend to look for?
We are looking for an instruction at address 0x15c7a8. Things I would check now:
* If the instruction is not in your program (the one in the build directory), is it maybe in another module that gets packed into your bootstrap.elf image? You get a list of the packed modules when you run make E=.. - the binaries should all be in your build directory.
* At the time of the page fault you end up in the kernel debugger. Use 'lp' to see the list of present threads. One of them will be the one that is in the debugger now. It should be one of the L4Re threads (those have names starting with # in the thread list). Which other thread is currently blocking in IPC to the L4Re thread? What is this thread's state (command 't<id>')? At what instruction is the thread stuck? What binary belongs to this thread?
* Easy stuff: is your program even executing code in its main() function already? How far does the program get before the page fault happens?
My problem again: The function foo() is only called at one place in the code. I have to comment some stuff out in foo() to make the program work. But even if i dont call foo() (it is useless for this test-case), the program fails at the same place. So, the fault can't be in foo().
Then there seems to be no need for debugging foo().
Btw., are you debugging this on real hardware or in an emulator, such as qemu?
Bjoern
Hello,
We are looking for an instruction at address 0x15c7a8. Things I would check now:
- If the instruction is not in your program (the one in the build directory), is it maybe in another module that gets packed into your bootstrap.elf image? You get a list of the packed modules when you run make E=.. - the binaries should all be in your build directory.
I have found this in moe:
GC_try_to_collect_inner():
/src/l4/pkg/boehm_gc/contrib/alloc.c:404
15c7a8: e12fff37 blx r7
- At the time of the page fault you end up in the kernel debugger. Use 'lp' to see the list of present threads. One of them will be the one that is in the debugger now. It should be one of the L4Re threads (those have names starting with # in the thread list). Which other thread is currently blocking in IPC to the L4Re thread? What is this thread's state (command 't<id>')? At what instruction is the thread stuck? What binary belongs to this thread?
Are you saying JDB starts automatically when the error occurs? Typing 'lp <ENTER>' in the UART console doesn't have any effect. Do i have to add JDB via modules.list or something?
- Easy stuff: is your program even executing code in its main() function already? How far does the program get before the page fault happens?
Yes, it runs for a while. There are two L4 tasks running a protocol over IPC. The Client sends a message and the Server answers. Then the Client fails when preparing its next message (no IPC functions involved).
Btw., are you debugging this on real hardware or in an emulator, such as qemu?
Only on real HW: ARMv7 Cortex A9. Qemu doesn't work on my Debian. The window stays black. Maybe i'll try on an other Linux.
Have a nice day!
ba_f
Hello again,
and thanks to Martin.
- At the time of the page fault you end up in the kernel debugger. Use 'lp' to see the list of present threads.
Alright, let's see:
---------------------------------------------------------------------
CPU 0 [f002e898]: IRQ ENTRY
CPU(s) 0-1 entered JDB
jdb: l
  id cpu name       pr sp wait to state
  6e   0 &-          2 58 -       rcv_wait
  6b   0 -----      10 58 -       rcv_wait
  62   0 &-          2 49 -       rcv_wait
  60   0 -----       2 58 6e      rcv_wait,exc_progr
  5c   0 -----      10 49 -       rcv_wait
  59   0 #myClient  ff 58 -       rcv_wait
  4d   0 -----       2 49 -       rcv_wait,fpu
  4a   0 #myServer  ff 49 -       rcv_wait
  40   0 -----       2 31 -       rcv_wait
  3d   0 -----      10 31 -       rcv_wait
  35   0 -----       2 31 35      rcv_wait
  32   0 #ned       ff 31 -       rcv_wait
   c   1 -----       0  1         ready
   a   0 moe        ff  9 -       rcv_wait
   8   0 sigma0      1  7 -       rcv_wait
   6   0 -----       0  1         ready
One of them will be the one that is in the debugger now. It should be one of the L4Re threads (those have names starting with # in the thread list).
Is it thread 60 with 'exc_progr'? Or one of the 'ready' threads?
What is this thread's state (command 't<id>')? At what instruction is the thread stuck?
jdb: t60
thread  : 60 <0xf1195000>  CPU: 0:0  prio: 02  mode: Con
state   : 40008 rcv_wait,exc_progr
wait for: 6e  polling:  rcv descr: 00000000
lcked by:  timeout :
cpu time: 21.000 ms  timeslice: 8000/-1 µs
pager   : [C: 3] D: 59  task : D: 58
exc-hndl: [C: 415] D: 6e  UTCB : f118a200/b3000200
vCPU    : ---  vCPU : ---
PC=0015c7a8 USP=7ffff2f8
[0] 000a4a0c 00091f5c 0000050c 00025848
[4] 00031c20 ffffdabc 000aea70 00000001
[8] 80003e64 00091f5c 00000000 00000000
[c] 000ae9b8 0007cb14 004094b4 60000010
At least this thread has the PC of the Error-Message:
myClient| L4Re[rm]: unhandled write page fault @7ffff2e0 pc=15c7a8
myClient| No signal handler found
Well, is this the bad guy? If so, then where does it come from? And what to check next?
Greetings, ba_f
Alright, let's see:
CPU 0 [f002e898]: IRQ ENTRY
CPU(s) 0-1 entered JDB
jdb: l
  id cpu name       pr sp wait to state
  6e   0 &-          2 58 -       rcv_wait
  6b   0 -----      10 58 -       rcv_wait
  62   0 &-          2 49 -       rcv_wait
  60   0 -----       2 58 6e      rcv_wait,exc_progr
  5c   0 -----      10 49 -       rcv_wait
  59   0 #myClient  ff 58 -       rcv_wait
  4d   0 -----       2 49 -       rcv_wait,fpu
  4a   0 #myServer  ff 49 -       rcv_wait
  40   0 -----       2 31 -       rcv_wait
  3d   0 -----      10 31 -       rcv_wait
  35   0 -----       2 31 35      rcv_wait
  32   0 #ned       ff 31 -       rcv_wait
   c   1 -----       0  1         ready
   a   0 moe        ff  9 -       rcv_wait
   8   0 sigma0      1  7 -       rcv_wait
   6   0 -----       0  1         ready
One of them will be the one that is in the debugger now. It should be one of the L4Re threads (those have names starting with # in the thread list).
Is it thread 60 with 'exc_progr'? Or one of the 'ready' threads?
Very good. The interesting parts of this table are:
* id -> the respective thread's debug ID
* name -> the thread name set using l4_debugger_set_name(). Usually, an L4Re application's first thread has a name starting with # followed by the binary name.
* sp -> the address space (L4: task) this thread is running in
* wait -> the ID of the thread / object this thread is currently blocked waiting for
* state -> the thread's scheduling state
As you guessed correctly, 60 is the interesting one here. It is stuck in an exception sent to thread 6e. Both are running in address space 58. The L4Re thread for this space is 59, indicating that this is the myClient application.
One more interesting thing here:
* 60 is stuck in a message sent to a thread named '&-'. This is the name used by L4Re's POSIX signal library for the thread responsible for handling exceptions and generating signals from them. This indicates that the actual page fault is already gone and was transformed into an exception message to this thread.
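When a thread list is long, the rows worth looking at first can be filtered by their state column. The following C sketch assumes that the state is the last whitespace-separated field of a row, as in the listing quoted in this thread, and reports whether it contains exc_progr. The function name is hypothetical and the row layout may differ between Fiasco.OC versions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the last whitespace-separated field of a JDB thread-list
 * row contains "exc_progr", 0 otherwise. The row layout is an assumption
 * taken from the listing in this thread.
 */
static int jdb_row_has_exception(const char *row)
{
    const char *needle = "exc_progr";
    size_t nlen = strlen(needle);

    /* Trim trailing whitespace. */
    const char *end = row + strlen(row);
    while (end > row && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\n'))
        end--;

    /* Walk back to the start of the last field. */
    const char *start = end;
    while (start > row && start[-1] != ' ' && start[-1] != '\t')
        start--;

    /* Search for the needle inside [start, end). */
    size_t len = (size_t)(end - start);
    for (size_t i = 0; i + nlen <= len; i++) {
        if (memcmp(start + i, needle, nlen) == 0)
            return 1;
    }
    return 0;
}
```

Applied to the rows above, only the row for thread 60 (state rcv_wait,exc_progr) is flagged.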
What is this thread's state (command 't<id>')? At what instruction is the thread stuck?
jdb: t60
thread  : 60 <0xf1195000>  CPU: 0:0  prio: 02  mode: Con
state   : 40008 rcv_wait,exc_progr
wait for: 6e  polling:  rcv descr: 00000000
lcked by:  timeout :
cpu time: 21.000 ms  timeslice: 8000/-1 µs
pager   : [C: 3] D: 59  task : D: 58
exc-hndl: [C: 415] D: 6e  UTCB : f118a200/b3000200
vCPU    : ---  vCPU : ---
PC=0015c7a8 USP=7ffff2f8
[0] 000a4a0c 00091f5c 0000050c 00025848
[4] 00031c20 ffffdabc 000aea70 00000001
[8] 80003e64 00091f5c 00000000 00000000
[c] 000ae9b8 0007cb14 004094b4 60000010
At least this thread has the PC of the Error-Message:
Exactly. 60 is the one that caused the page fault. This means your myClient application binary should contain the respective PC and allow you to figure out where the faulting access happened.
myClient| L4Re[rm]: unhandled write page fault @7ffff2e0 pc=15c7a8
myClient| No signal handler found
Well, is this the bad guy? If so, then where does it come from? And what to check next?
Ideally you now find the faulting address in myClient and figure out where the access happens. Keep us posted if you need help with that.
Bjoern
Ideally you now find the faulting address in myClient and figure out where the access happens. Keep us posted if you need help with that.
Bjoern
Oh boy...
Looking at the TCB, i see the start address 0xf118b000; and what i really do care about is the address of the UTCB at f118a000, don't i?
jdb: t59
thread  : 59 <0xf118b000>  CPU: 0:0  prio: ff  mode: Con
state   : 008 rcv_wait
wait for: ---  polling:  rcv descr: 00000000
lcked by:  timeout :
cpu time: 1.000 ms  timeslice: 9000/-1 µs
pager   : [C: 5] D: 54  task : D: 58
exc-hndl: [C: 5] D: 54  UTCB : f118a000/b3000000
vCPU    : ---  vCPU : ---
PC=b0002b8c USP=b1007dc0
[0] 00000001 00000001 fffff80f 04000000
[4] 00000000 b001910c 00000000 b00190e4
[8] b1007e14 00000000 000000ff b0019284
[c] 00000007 b0002b8c fffffff8 00000010
f118be6c f1195000 f00101c0 f00101c4 f118b000 200000d3 f118b000 f005f6e8 f00640e0
e80 f1195000 f118b014 f005f6e8 ffff0440 00000000 f0010b78 f005f6b8 00000001
ea0 f11f6000 f118a000 00000008 f00640e0 f118bef8 f118bec8 f0054590 f118b000
ec0 00000000 f001e378 200000d3 f118b008 f1195040 f001e2e4 f005f6e8 f118a000
ee0 00000001 00330007 fffe0002 f118a000 00000000 00000001 f0054590 f001c4b0
f00 00000000 fff32011 f118bf4c f003581c ff000000 00000001 00060450 00000003
f20 00000000 04000000 00000001 f118bfb8 f118bf40 00000001 f118b000 f0022090
f40 00000001 00000000 04000000 f118bfb8 00000003 00000000 00000000 f118bfb8
tcb: 59 f118b11c [Thread ] {#myClient } C=0 S=D:58 R=1
Well, i dumped f118b000 and found f118a000.
f118b000:f0055f80 00000008 0 0 0 0 f118be68 0
f118b020:b3000000 f118a000 f12e45ec 0 0 f118b000 0 0
f118a040:b1000000 b1007fff 0 fffff800 b3000000 b3000fff 0 fffff800
Goto f118a000, gives me the PC=15c7a8 again, where the Error happens. And at 15c7a8 there is e12fff37 <=> blx r7 , which i've already found in MOE. But that's not what i'm looking for, is it?
f118a000: -1 0015c7a8 0000f000 0 0 0004000c 00414000 0000003e
f118a020:00414037 b001efff 0 fffff800 b0100000 b013ffff 0 fffff800
f118a040:b1000000 b1007fff 0 fffff800 b3000000 b3000fff 0 fffff800
Anyway, thank u again. ba_f
On 17.11.2014 23:00, ba_f wrote:
Ideally you now find the faulting address in myClient and figure out where the access happens. Keep us posted if you need help with that.
Bjoern
Oh boy...
Looking at the TCB, i see the start address 0xf118b000; and what i really do care about is the address of the UTCB at f118a000, don't i?
Why would you care about the UTCB? We are looking for the reason your thread caused a page fault.
[..]
Goto f118a000, gives me the PC=15c7a8 again, where the Error happens.
Yep, the UTCB contains the exception message informing the exception handler about the page fault. This is the effect of your page fault, not the cause.
And at 15c7a8 there is e12fff37 <=> blx r7 , which i've already found in MOE. But that's not what i'm looking for, is it?
This has nothing to do with MOE. When you objdump the myClient binary, can you find the address in there? Does the binary contain blx r7 as well?
Bjoern
Hi,
And at 15c7a8 there is e12fff37 <=> blx r7 , which i've already found in MOE. But that's not what i'm looking for, is it?
This has nothing to do with MOE. When you objdump the myClient binary, can you find the address in there? Does the binary contain blx r7 as well?
No, this instruction or PC is not in myClient nor in myServer. But i can find the instruction in a shared lib which myServer uses.
btw. i use l4re-snapshot-2014022818.
Goto f118a000, gives me the PC=15c7a8 again, where the Error happens.
Yep, the UTCB contains the exception message informing the exception handler about the page fault. This is the effect of your page fault, not the cause.
Well then, where do i find the cause for the page fault? Do i need a deeper understanding of the stack, or shall i just look at the instructions around PC=15c7a8?
greets, ba_f
What does it mean in the JDB lp command if there is an asterisk after the name of the thread in the wait column?
 id cpu name pr sp wait to state
109   0 cons  a ba 77*     rcv_wait
On 21.11.2014 19:35, teclis High Elf wrote:
What does it mean in the JDB lp command if there is an asterisk after the name of the thread in the wait column?
 id cpu name pr sp wait to state
109   0 cons  a ba 77*     rcv_wait
The asterisk indicates that the thread is waiting on an IRQ object.
Bjoern
Hi,
And at 15c7a8 there is e12fff37 <=> blx r7 , which i've already found in MOE. But that's not what i'm looking for, is it?
This has nothing to do with MOE. When you objdump the myClient binary, can you find the address in there? Does the binary contain blx r7 as well?
No, this instruction or PC is not in myClient nor in myServer. But i can find the instruction in a shared lib which myServer uses.
No. We are definitely looking at myClient as this is where the page fault happens. Please objdump myClient and find the page fault PC again.
Goto f118a000, gives me the PC=15c7a8 again, where the Error happens.
Yep, the UTCB contains the exception message informing the exception handler about the page fault. This is the effect of your page fault, not the cause.
Well then, where do i find the cause for the page error? Do i need a deeper unterstanding of the stack, or shall i just look at the instructions around PC=15c7a8 ?
Actually I still think we are looking for the instruction exactly at this location. What does the objdump around this area look like?
Bjoern
Hello,
well i guess we're stuck here.
Because i can find neither 15c7a8 nor e12fff37 in myClient. I also checked the one shared lib myClient uses, without success.
Maybe i'm objdumping the wrong files?
arm-linux-gnueabihf-objdump -Dlx l4re-snapshot-2014022818/obj/l4/arm-ca/bin/arm_armv7a/l4f/myClient | less
arm-linux-gnueabihf-objdump -Dlx l4re-snapshot-2014022818/obj/l4/arm-ca/lib/arm_armv7a/l4f/libClient.so | less
But hey, JDB tells me that at 15c7a8 there is the instruction e12fff37 <=> blx r7. And this is exactly what i get when objdump moe. PC and opcode match.
objdump -Dlx l4re-snapshot-2014022818/obj/l4/arm-ca/bin/arm_armv7a/l4f/moe | less
GC_try_to_collect_inner():
l4re-snapshot-2014022818/src/l4/pkg/boehm_gc/contrib/alloc.c:404
15c7a8: e12fff37 blx r7
Sounds logical to me that BOEHM_GC runs into the fault...
And at 15c7a8 there is e12fff37 <=> blx r7 , which i've already found in MOE. But that's not what i'm looking for, is it?
This has nothing to do with MOE. When you objdump the myClient binary, can you find the address in there? Does the binary contain blx r7 as well?
No, this instruction or PC is not in myClient nor in myServer. But i can find the instruction in a shared lib which myServer uses.
No. We are definitely looking at myClient as this is where the page fault happens. Please objdump myClient and find the page fault PC again.
Thanks.
On 22.11.2014 at 15:37, ba_f wrote:
Hello,
well i guess we're stuck here.
Because i can't find no 15c7a8 nor e12fff37 in myClient. I also checked the one shared lib myClient uses, without success.
Maybe i do objdump the wrong files?
arm-linux-gnueabihf-objdump -Dlx l4re-snapshot-2014022818/obj/l4/arm-ca/bin/arm_armv7a/l4f/myClient | less arm-linux-gnueabihf-objdump -Dlx l4re-snapshot-2014022818/obj/l4/arm-ca/lib/arm_armv7a/l4f/libClient.so | less
But hey, JDB tells me that at 15c7a8 there is the instruction e12fff37 <=> blx r7. And this is exactly what i get when objdump moe. PC and opcode match.
objdump -Dlx l4re-snapshot-2014022818/obj/l4/arm-ca/bin/arm_armv7a/l4f/moe | less
GC_try_to_collect_inner(): l4re-snapshot-2014022818/src/l4/pkg/boehm_gc/contrib/alloc.c:404 15c7a8: e12fff37 blx r7
Sounds logic to me, that BOEHM_GC runs into the fault...
Hi ba_f,
blx r7 is a false alarm, it cannot cause this type of write page fault. Even the instruction itself makes no sense since r7 has a value of 1.
Could you do the following: Insert a known write page fault into your client (maybe something like *(volatile int *)0x0=0xaffedead; ) and search for the pc in "objdump -d" on myClient. You can do the same with myServer. This should match and you will see opcode causing the write page fault.
Now enter JDB and dump the instruction @pc. It does not match the opcode that caused the write page fault. Check the last line of the dump screen and you will see the reason: "dump: d<010001fc> physical".
Martin.
Hi Martin,
thank you, but i'm afraid, i haven't learnt the full lesson, yet.
Hi ba_f,
blx r7 is a false alarm, it cannot cause this type of write page fault. Even the instruction itself makes no sense since r7 has a value of 1.
Could you do the following: Insert a known write page fault into your client (maybe something like *(volatile int *)0x0=0xaffedead; ) and search for the pc in "objdump -d" on myClient. You can do the same with myServer. This should match and you will see opcode causing the write page fault.
Now enter JDB and dump the instruction @pc. This does not match the opcode caused the write page fault. Check the last line of the dump screen and you will see the reason: "dump: d<010001fc> physical".
Martin.
All right, i inserted the faulty instruction, and the familiar Error Message occurs.
myClient| L4Re[rm]: unhandled write page fault @0 pc=7cddc
Now, objdump does not show this PC in myClient or libClient. But as before, the PC is found in moe.
Lesson learnt, this PC is useless for me???
Ok back to myClient. Since i don't know the PC of *(volatile int *)0x0=0xaffedead; i search for the opcode and found this.
1fdd4: e30d3ead movw r3, #57005 ; 0xdead
1fdd8: e34a3ffe movt r3, #45054 ; 0xaffe
Weird again, i can't find one of this instructions in bootstrap.elf.
So still, i have no clue, how the PC in the Error Message shall lead me to the faulty instruction.
thanks for patience,
ba_f
On 24.11.2014 at 23:37, ba_f wrote:
Can you repeat this with the hello example:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    for (;;) {
        puts("Hello World!");
        *(volatile int *)0x0 = 0xaffedead;
        sleep(1);
    }
}
When I run this in Qemu I get:
MOE: cmdline: moe --init=rom/hello
MOE: Starting: rom/hello
MOE: loading 'rom/hello'
Hello World!
L4Re[rm]: unhandled write page fault at 0x0 pc=0x10001fc
L4Re: unhandled exception: pc=0x10001fc
===
id cpu name pr sp wait to state
1d 0 ----- 2 19 1a rcv_wait,exc_progr
1a 0 #hello ff 19 - rcv_wait
a 0 moe ff 9 - rcv_wait
8 0 sigma0 1 7 - rcv_wait
6 1 ----- 0 1 ready
5 0 ----- 0 1 ready
===
thread : 1d <0xf11a4000> CPU: 0:0 prio: 02
state : 40008 rcv_wait,exc_progr
wait for: 1a polling: rcv descr: 00000000
lcked by: timeout :
cpu time: 13.000 ms timeslice: 2000/-1 µs
pager : [C: 3] D: 1a task : D: 19
exc-hndl: [C: 3] D: 1a UTCB : f11da200/b3000200
vCPU : --- vCPU : ---
PC=010001fc USP=80007ef8
[0] 0000000d 00000001 00000004 0100952c
[4] affedead 00000000 00000001 00000005
[8] 01018014 010001e0 00000005 80007f04
[c] 00000008 01009590 60000010 20000010
===
thread : 1a <0xf11a0000> CPU: 0:0 prio: ff
state : 008 rcv_wait
wait for: --- polling: rcv descr: 00000000
lcked by: timeout :
cpu time: 27.000 ms timeslice: 2000/-1 µs
pager : [C: 5] D: 16 task : D: 19
exc-hndl: [C: 5] D: 16 UTCB : f11da000/b3000000
vCPU : --- vCPU : ---
PC=b0001e78 USP=b1007d70
[0] 00000000 b3000000 fffff806 04000000
[4] 00000000 b001910c 00000000 fffffc18
[8] b00190e4 00000000 b1007d74 b1007d94
[c] 00000001 b0001e78 fffffff8 60000010
===
objdump -d pkg/hello/server/src/OBJ-arm_armv7a-l4f/hello
010001e0 <main>:
 10001e0: e92d4830 push {r4, r5, fp, lr}
 10001e4: e30d4ead movw r4, #57005 ; 0xdead
 10001e8: e28db00c add fp, sp, #12
 10001ec: e34a4ffe movt r4, #45054 ; 0xaffe
 10001f0: e3a05000 mov r5, #0
 10001f4: e59f0010 ldr r0, [pc, #16] ; 100020c <main+0x2c>
 10001f8: eb0024a2 bl 1009488 <puts>
 10001fc: e5854000 str r4, [r5]
 1000200: e3a00001 mov r0, #1
 1000204: eb001f4b bl 1007f38 <sleep>
 1000208: eafffff9 b 10001f4 <main+0x14>
 100020c: 01012b50 .word 0x01012b50
===
and pc=0x10001fc matches exactly the page fault.
Indeed, I didn't find this in objdump -d bootstrap.elf either.
Martin
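The matching step in this message can also be done mechanically. The following is a minimal sketch, not part of L4Re or any existing tool: it pulls the pc out of an error line of the form shown in this thread and checks it against the address range of a function taken from an objdump listing. The function names are hypothetical, and the start address and size of main (0x10001e0, 0x30 bytes) are read off the listing above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract the hexadecimal pc from a line such as
 *   "L4Re[rm]: unhandled write page fault @0 pc=7cddc"
 * Returns 1 and stores the value in *pc on success, 0 otherwise. */
static int parse_fault_pc(const char *line, unsigned long *pc)
{
    const char *p;
    char *end;

    assert(line != NULL && pc != NULL);
    p = strstr(line, "pc=");
    if (p == NULL)
        return 0;
    p += 3;                      /* skip "pc=" */
    *pc = strtoul(p, &end, 16);  /* base 16 accepts an optional 0x prefix */
    return end != p;             /* fail if no digits were consumed */
}

/* Check whether pc lies inside [start, start + size), e.g. the range of
 * one function as read off a disassembly listing. */
static int pc_in_range(unsigned long pc, unsigned long start,
                       unsigned long size)
{
    return pc >= start && pc - start < size;
}
```

With the hello example, the pc 0x10001fc falls inside main, while the pc 7cddc from the myClient run does not, which is the mismatch this thread is about.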
Hello,
> Can you repeat this with the hello example:
> and pc=0x10001fc matches exactly the page fault.
All right, this works for me, too.
I get "unhandled write page fault @0 pc=1000228", and the objdump of hello shows the same instruction as yours at that PC.
Btw, I executed it directly on ARM, the same as my actual project.
Now I revisited my project and found something. When I add the faulty code to myClient, I find the PC with objdump. But when I put the faulty code in libClient, i.e. a shared lib which myClient uses, I don't find the PC in myClient or libClient (but in moe).
So the error must happen somewhere in libClient. But the question is why the PC is so strange, and how it may lead me to the fault.
Thanks, ba_f
> Now I revisited my project and found something. When I add the faulty code to myClient, I find the PC with objdump. But when I put the faulty code in libClient, i.e. a shared lib which myClient uses, I don't find the PC in myClient or libClient (but in moe).
The fact that your PC has a value that is also valid in Moe does *not* imply that the code comes from Moe. Those are two completely distinct binaries executing in different address spaces.
What seems to happen is that your dynamic lib gets loaded into some free address range within myClient, and due to the nature of dynamic loading your PC may end up anywhere the linker chooses.
Debugging quickfix: link your program statically to identify the faulty location.
Alternative (not recommended before you have tried the quickfix): obtain dynamic loading info using the LD_DEBUG environment variable and figure out which library gets mapped to the respective address.
Also, to really understand what is going on here: http://www.iecc.com/linker/
Bjoern
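To illustrate the LD_DEBUG route: once the load address of each object is known, the faulting pc can be turned into an offset inside the library, and that offset can be searched for in "objdump -d" of the .so. The following is only a sketch under assumptions: the struct, the function name and any base addresses are hypothetical, the real values have to come from the loader's output, and it assumes the shared object is linked at address 0, so that the offset equals the address shown in its disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One mapped object: where it was loaded and how large it is.
 * The values have to be taken from the loader's debug output. */
struct mapping {
    const char *name;
    uintptr_t   base;
    size_t      size;
};

/* Find the mapping that contains pc and compute the offset of pc within it.
 * Returns NULL if pc lies outside all known mappings. */
static const struct mapping *
resolve_pc(const struct mapping *maps, size_t n, uintptr_t pc,
           uintptr_t *offset)
{
    assert(maps != NULL && offset != NULL);
    for (size_t i = 0; i < n; i++) {
        if (pc >= maps[i].base && pc - maps[i].base < maps[i].size) {
            *offset = pc - maps[i].base;
            return &maps[i];
        }
    }
    return NULL;
}
```

As a made-up example, if libClient.so were mapped at 0x60000, a pc of 0x7cddc would resolve to offset 0x1cddc inside libClient.so. The fact that the same number is also a valid address in Moe plays no role, since each task has its own address space.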
On 26.11.2014 at 23:24, Björn Döbel wrote:
> The fact that your PC has a value that is also valid in Moe does *not* imply that the code comes from Moe. Those are two completely distinct binaries executing in different address spaces.
But is it possible in JDB to display the memory of a certain thread in its address space to see the faulty instruction?
Martin
> But is it possible in JDB to display the memory of a certain thread in its address space to see the faulty instruction?
'd' -> dump memory for task
Bjoern
Hello,
> Debugging quickfix: link your program statically to identify the faulty location.
Unfortunately, no success either.
I linked libClients.a with myClient. There is still some dynamic linking that I couldn't get rid of, and MODE=shared must be set to build successfully. Is there a file listing all the files passed to the linker? Maybe I could check there which .so is needed for building.
Objdump gives me the following, but I guess that comes from MODE=shared:
NEEDED libc_be_sig.so
NEEDED libpthread.so
NEEDED libld-l4.so
NEEDED libdl.so
NEEDED libc_support_misc.so
NEEDED libc_be_socket_noop.so
NEEDED lib4re-util.so
NEEDED libc_be_l4refile.so
NEEDED libc_be_l4re.so
NEEDED libsupc++.so
NEEDED libuc_c.so
NEEDED lib4re.so
NEEDED libl4util.so
NEEDED libl4sys.so
Anyway...
> Alternative (not recommended before you tried the quickfix): obtain dynamic loading info using the LD_DEBUG environment variable and figure out which library gets mapped to the respective address.
...this looks promising.
But do you know how to get the output of LD_DEBUG? I run on an embedded ARM board with UART output.
I set 'LD_DEBUG=all' in the Makefile of myClient. Normally, I'd specify the output there, e.g. 'LD_DEBUG=all cat'.
Thanks, ba_f
On 30.11.2014 19:59, ba_f wrote:
> Hello,
>> Debugging quickfix: link your program statically to identify the faulty location.
> Unfortunately, no success either.
> I linked libClients.a with myClient. There is still some dynamic linking that I couldn't get rid of, and MODE=shared must be set to build successfully. Is there a file listing all the files passed to the linker? Maybe I could check there which .so is needed for building.
No .so files are needed for static linking. What error are you getting if you remove MODE=shared?
> Objdump gives me the following, but I guess that comes from MODE=shared:
> NEEDED libc_be_sig.so NEEDED libpthread.so NEEDED libld-l4.so NEEDED libdl.so NEEDED libc_support_misc.so NEEDED libc_be_socket_noop.so NEEDED lib4re-util.so NEEDED libc_be_l4refile.so NEEDED libc_be_l4re.so NEEDED libsupc++.so NEEDED libuc_c.so NEEDED lib4re.so NEEDED libl4util.so NEEDED libl4sys.so
> Anyway...
There are static library equivalents for all those. Sounds like setting the proper REQUIRES_LIBS in your Makefile should do the trick.
>> Alternative (not recommended before you tried the quickfix): obtain dynamic loading info using the LD_DEBUG environment variable and figure out which library gets mapped to the respective address.
> ...this looks promising.
> But do you know how to get the output of LD_DEBUG? I run on an embedded ARM board with UART output.
> I set 'LD_DEBUG=all' in the Makefile of myClient. Normally, I'd specify the output there, e.g. 'LD_DEBUG=all cat'.
LD_DEBUG is an environment variable, so specifying it at compile time in the Makefile is not useful. Instead, add it as an environment variable to your setup:
L4.default_loader.start( { caps ...}, "rom/myProgram", { LD_DEBUG=all } );
Again, I would go for the static linking route.
Bjoern
Hi,
>> Objdump gives me the following, but I guess that comes from MODE=shared:
>> NEEDED libc_be_sig.so NEEDED libpthread.so NEEDED libld-l4.so NEEDED libdl.so NEEDED libc_support_misc.so NEEDED libc_be_socket_noop.so NEEDED lib4re-util.so NEEDED libc_be_l4refile.so NEEDED libc_be_l4re.so NEEDED libsupc++.so NEEDED libuc_c.so NEEDED lib4re.so NEEDED libl4util.so NEEDED libl4sys.so
> There are static library equivalents for all those. Sounds like setting the proper REQUIRES_LIBS in your Makefile should do the trick.
> What error are you getting if you remove MODE=shared?
There are library dependencies missing: libld-l4 lib4re-util libuc_c lib4re libl4util libl4sys.
The .a files exist, though.
However, I can't find any .pc files in obj/pc/ for these libs, unlike the other REQUIRES_LIBS entries that the make process does find. I don't know why the .pc files haven't been built.
Btw, I had to build libld-l4.a. By default, only TARGET=libld-l4.so is set.
> LD_DEBUG is an environment variable, so specifying it at compile time in the Makefile is not useful. Instead, add it as an environment variable to your setup:
> L4.default_loader.start( { caps ...}, "rom/myProgram", { LD_DEBUG=all } );
> Again, I would go for the static linking route.
Have you tried this before? I don't get any output from LD_DEBUG.
Thanks again, ba_f
On 02.12.2014 12:36, ba_f wrote:
> Hi,
>>> Objdump gives me the following, but I guess that comes from MODE=shared:
>>> NEEDED libc_be_sig.so NEEDED libpthread.so NEEDED libld-l4.so NEEDED libdl.so NEEDED libc_support_misc.so NEEDED libc_be_socket_noop.so NEEDED lib4re-util.so NEEDED libc_be_l4refile.so NEEDED libc_be_l4re.so NEEDED libsupc++.so NEEDED libuc_c.so NEEDED lib4re.so NEEDED libl4util.so NEEDED libl4sys.so
>> There are static library equivalents for all those. Sounds like setting the proper REQUIRES_LIBS in your Makefile should do the trick.
>> What error are you getting if you remove MODE=shared?
> There are library dependencies missing: libld-l4 lib4re-util libuc_c lib4re libl4util libl4sys.
> The .a files exist, though.
Can you please share the whole Makefile? These are very standard L4Re libraries and they are usually linked against all programs.
> However, I can't find any .pc files in obj/pc/ for these libs, unlike the other REQUIRES_LIBS entries that the make process does find. I don't know why the .pc files haven't been built.
> Btw, I had to build libld-l4.a. By default, only TARGET=libld-l4.so is set.
libld-l4 is the dynamic linker. There is no need to build a static version of it as statically linked binaries do not require a dynamic linker.
>> LD_DEBUG is an environment variable, so specifying it at compile time in the Makefile is not useful. Instead, add it as an environment variable to your setup:
>> L4.default_loader.start( { caps ...}, "rom/myProgram", { LD_DEBUG=all } );
>> Again, I would go for the static linking route.
> Have you tried this before? I don't get any output from LD_DEBUG.
Try
{ LD_DEBUG="all" }
please.
Bjoern
Hello there,
I guess it's solved. I just went back to your very first answers, which suggested increasing the stack size: http://os.inf.tu-dresden.de/pipermail/l4-hackers/2014/007049.html
I don't know why it didn't work in the first place. Maybe I put the code in some stupid place...
Anyway, can you tell me what the default stack size is? 32kB?
-------------------------------
>> There are library dependencies missing: libld-l4 lib4re-util libuc_c lib4re libl4util libl4sys.
>> The .a files exist, though.
> Can you please share the whole Makefile? These are very standard L4Re libraries and they are usually linked against all programs.
Ah, I didn't know they are linked anyway. So one must not list those in REQUIRES_LIBS; then it works.
> Try
> { LD_DEBUG="all" }
> please.
Great, this works.
-----------------
Now, this was fun...
Thanks for all the help and patience.
l4-hackers@os.inf.tu-dresden.de