Booting on Raspberry Pi

Tue Sep 17 19:33:00 CEST 2013

Hi,

On 16.09.2013 19:21, Robert Kaiser wrote:
> Hello Adam
> 
> thanks for your helpful response
> 
> On 09/15/13 16:11, Adam Lackorzynski wrote:
>> On Thu Sep 12, 2013 at 17:13:07 +0200, Robert Kaiser wrote:
>>> Adam Lackorzynski wrote:
>>>>> Unfortunately, it *still* doesn't work. The last messages I see trying
>>>>> to run the bootstrap_hello example are:
>>>>>
>>>>> MOE: cmdline: moe --init=rom/hello
>>>>> MOE: Starting: rom/hello
>>>>> MOE: loading 'rom/hello'
>>>>> L4Re: unhandled exception: pc=0xffffff9c
>>>>>
>>>>> Any hints what could be wrong now?
>>>> Would be interesting to know where this is coming from (lr). Anyway,
>>>> this does not look so bad because quite a few things have happened
>>>> again.
>>> I agree. (My problem here is that I am only just learning how to use
>>> JDB.) With pagefault monitoring enabled, the last lines of output look
>>> like this:
>>>
>>> ......
>>> pf:  001d pfa=010191a4 ip=0100a7c8 (r-) spc=0xf12e56fc err=410007
>>>
>>> pf:  001d pfa=000012e0 ip=0100a830 (w-) spc=0xf12e56fc err=410807
>>>
>>> pf:  001a pfa=b000f070 ip=b000f070 (r-) spc=0xf12e56fc err=330007
>>>
>>> L4Re: unhandled exception: pc=0xffffff9c
>>>
>>> Am I right to interpret this as "last pagefault occurred due to an opcode
>>> fetch at virtual address b000f070"? AFAIK, none of the modules in the
>> Yes.
>>
>>> image has its text segment in the b0000000 range, so this must be the
>>> unhandled exception L4Re complains about (but if so, why does it say
>>> pc=0xffffff9c?).
>> The 'l4re' binary is linked to b0000000, so the pagefault looks ok. It's
>> your local region manager.
>>
>>> spc=0xf12e56fc would be the faulting thread's number, right?
>> That's the space aka task. 0x1d and 0x1a are the threads. Check with
>> 'lp'.
>>
>>> Giving an "s" command, I get:
>>>
>>>        1 f00567b8 [Task   ] {KERNEL} R=2
>>>        7 f12e5770 [Task   ] {sigma0          } R=3
>>>        9 f12e5720 [Task   ] {moe             } R=3
>>>       19 f12e56d0 [Task   ] {hello           } R=3
>>>
>>>
>>> The thread number, f12e56fc, does not appear. It is closest to
>>> f12e56d0, but does that really mean the fault happened in the hello task?
>> It happened in the hello task because that output can only come from
>> hello in your setup, and the thread numbers indicate that too.
>>
>>> I would like to derive the program address where the fault occurs from
>>> this, but frankly, not being familiar with JDB I'm at a loss here.
>> In 'lp', press enter on the 1d thread, that will give you the tcb view
>> in which you can see the registers for example.
> Ahaaa!
> 
> Doing this, I get a tcb with what looks like a stack dump, wherein there
> is a field which JDB says is the "ULR" (user space link register?). Its
> value is 0x100bb20. Disassembling the neighborhood of that location, I get:
> ....
> 0100bb0c     bl   
> 0100bb10     mvn    ip, #127    ; 0x7f
> 0100bb14     str    r8, [r0, #500]
> 0100bb18     mov    r0, r8
> 0100bb1c     blx    ip
> 0100bb20     str    r5, [r4, #544]
> ....
> 
> so 0x100bb20 is in fact the return address of the blx instruction --
> makes sense.
> 
> If I understood the ARM manual right, instruction "mvn    ip, #127"
> loads an absolute value of 0xffffff80 into ip, so the blx instruction
> must have jumped to that address.
> 
> disassembling that address gives me
> 
> ffffff80     push       {r4, lr}
> ffffff84     mrc    15, 0, r4, cr13, cr0, {2}
> ffffff88     str    r0, [r4, #4]
> ffffff8c     mov    r2, #167; 0x10
> ffffff90     str    r2, [r4]
> ffffff94     mov    r3, #0    ; 0x0
> ffffff98     movw       r2, #63491    ; 0xf803
> ffffff9c     mov    r0, #24,; 0x2
> ffffffa0     movt       r2, #65535540]  ; 0xffff
> 
> .. and 0xffffff9c is in fact the address where the fault happened!
> 
> 
> 
>>
>>> JDB Single stepping does not seem to work on ARM platforms.
>> Indeed that does not work.
> 
> do breakpoints work?
> 
>>
>>>  For which architecture version have you been building?
>> Looks good.
>>
>>
>> The problem is in the kernel-provided code that uses instructions that
>> are incompatible with rpi's CPU. 
> So that would be the instruction at 0xffffff9c, right?
> 
> ffffff9c     mov    r0, #24,; 0x2
> 
> This disassembly looks a little strange, maybe not only the CPU but also
> the disassembler is choking on this opcode.
> 
> Now, how do I find the place in the source code corresponding to this
> instruction?
> 
> (Disassembling fiasco.image doesn't help -- it ends long before that address)
> 
>> I'll fix it.
>>
> 
> I can't wait to see your fix! Please let me know ASAP. If you need any
> more input from my side, just tell me what to do.

Yay! Got it working! :-)

The offending instructions are movt and movw. The code in
sys_call_page-arm.cpp constructs a syscall entry sequence which uses
these instructions. (How can this ever have worked on the RPi?)
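
For reference: movw and movt were only added to the ARM instruction set with ARMv6T2, while the RPi's ARM1176 implements plain ARMv6, which would explain the exception. Below is a minimal sketch of how the two opcodes can be recognised in a table of raw instruction words. The function names are made up for illustration and are not part of Fiasco; the sketch assumes ARM-mode instructions with the "always" condition code, as in the syscall page.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [27:20] of an ARM-mode instruction word select the operation. */
static uint32_t op_bits(uint32_t insn) { return (insn >> 20) & 0xffu; }

/* MOVW is encoded as 0x30 and MOVT as 0x34 in bits [27:20]. */
static int is_movw(uint32_t insn) { return op_bits(insn) == 0x30u; }
static int is_movt(uint32_t insn) { return op_bits(insn) == 0x34u; }

/* Both carry a 16-bit immediate split into imm4 (bits [19:16])
 * and imm12 (bits [11:0]). */
static uint32_t wide_imm16(uint32_t insn)
{
  return ((insn >> 4) & 0xf000u) | (insn & 0x0fffu);
}

/* Check the helpers against the opcodes used in sys_call_page-arm.cpp. */
static void check_wide_moves(void)
{
  assert(is_movw(0xe30f2803u) && wide_imm16(0xe30f2803u) == 0xf803u);
  assert(is_movt(0xe34f2fffu) && wide_imm16(0xe34f2fffu) == 0xffffu);
  assert(is_movt(0xe34f0ff4u) && wide_imm16(0xe34f0ff4u) == 0xfff4u);
  /* A plain "mov r0, #2" is neither of the two. */
  assert(!is_movw(0xe3a00002u) && !is_movt(0xe3a00002u));
}
```

Calling check_wide_moves() from a small main() is enough to run the checks on any host; nothing here depends on ARM hardware.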

Anyway, here is my suggestion for a patch:

--- src/kernel/fiasco/src/kern/arm/sys_call_page-arm.cpp.orig 2013-09-17 19:10:18.773107154 +0200
+++ src/kernel/fiasco/src/kern/arm/sys_call_page-arm.cpp 2013-09-17 19:10:18.773107154 +0200
@@ -40,10 +40,20 @@
   sys_calls[offset++] = 0xe3a02010; // mov     r2, #0x10 -> set tls opcode
   sys_calls[offset++] = 0xe5842000; // str     r2, [r4]
   sys_calls[offset++] = 0xe3a03000; // mov     r3, #0
+#ifdef CONFIG_ARM_1176
+  sys_calls[offset++] = 0xe3a02003; // mov     r2, #3
+  sys_calls[offset++] = 0xe3a00002; // mov     r0, #2
+  sys_calls[offset++] = 0xe38224ff; // orr     r2, #0xff000000
+  sys_calls[offset++] = 0xe38004ff; // orr     r0, #0xff000000
+  sys_calls[offset++] = 0xe38228ff; // orr     r2, #0x00ff0000
+  sys_calls[offset++] = 0xe380073d; // orr     r0, #0x00f40000
+  sys_calls[offset++] = 0xe3822b3e; // orr     r2, #0x0000f800
+#else
   sys_calls[offset++] = 0xe30f2803; // movw    r2, #0xf803
   sys_calls[offset++] = 0xe3a00002; // mov     r0, #2
   sys_calls[offset++] = 0xe34f2fff; // movt    r2, #0xffff
   sys_calls[offset++] = 0xe34f0ff4; // movt    r0, #0xfff4
+#endif
   sys_calls[offset++] = 0xe1a0e00f; // mov     lr, pc
   sys_calls[offset++] = 0xe3e0f00b; // mvn     pc, #11
   sys_calls[offset++] = 0xe8bd8010; // pop     {r4, pc}


With this patch applied, my Raspberry Pi now happily prints "Hello
World!" (Strange how something as unspectacular as that can make someone
really happy ;-))
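
The constants built by the mov/orr sequence in the CONFIG_ARM_1176 branch can be cross-checked on a host machine. The following is only a sketch under stated assumptions: it emulates nothing but "mov rd, #imm" and "orr rd, rn, #imm" with the usual ARM rotated 8-bit immediate, it ignores condition codes, and all names (arm_expand_imm, run_mov_orr, arm1176_seq, check_arm1176_sequence) are hypothetical helpers, not Fiasco code. The expected values are the ones the movw/movt pairs produce: 0xfffff803 in r2 and 0xfff40002 in r0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand an ARM "modified immediate": the 8-bit value in bits [7:0]
 * rotated right by twice the 4-bit field in bits [11:8]. */
static uint32_t arm_expand_imm(uint32_t insn)
{
  uint32_t imm8 = insn & 0xffu;
  uint32_t rot = ((insn >> 8) & 0xfu) * 2u;
  return rot ? ((imm8 >> rot) | (imm8 << (32u - rot))) : imm8;
}

/* Emulate a sequence consisting only of mov-immediate and orr-immediate.
 * Returns 0 on success, -1 if any other instruction is encountered. */
static int run_mov_orr(const uint32_t *ops, size_t n, uint32_t regs[16])
{
  for (size_t i = 0; i < n; ++i) {
    uint32_t insn = ops[i];
    uint32_t kind = (insn >> 20) & 0xffu; /* bits [27:20] */
    uint32_t rn = (insn >> 16) & 0xfu;
    uint32_t rd = (insn >> 12) & 0xfu;

    if (kind == 0x3au)        /* mov rd, #imm */
      regs[rd] = arm_expand_imm(insn);
    else if (kind == 0x38u)   /* orr rd, rn, #imm */
      regs[rd] = regs[rn] | arm_expand_imm(insn);
    else
      return -1;
  }
  return 0;
}

/* The seven opcodes from the CONFIG_ARM_1176 branch of the patch. */
static const uint32_t arm1176_seq[] = {
  0xe3a02003u, /* mov r2, #3          */
  0xe3a00002u, /* mov r0, #2          */
  0xe38224ffu, /* orr r2, #0xff000000 */
  0xe38004ffu, /* orr r0, #0xff000000 */
  0xe38228ffu, /* orr r2, #0x00ff0000 */
  0xe380073du, /* orr r0, #0x00f40000 */
  0xe3822b3eu, /* orr r2, #0x0000f800 */
};

/* Run the sequence and compare with what movw/movt would have loaded. */
static void check_arm1176_sequence(void)
{
  uint32_t regs[16] = { 0 };
  size_t n = sizeof arm1176_seq / sizeof arm1176_seq[0];

  assert(run_mov_orr(arm1176_seq, n, regs) == 0);
  assert(regs[2] == 0xfffff803u);
  assert(regs[0] == 0xfff40002u);
}
```

Calling check_arm1176_sequence() from a main() on the build host exercises the check. It only confirms the register contents, not the behaviour of the syscall page on real hardware.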

Is the patch OK? If so: please apply.

Thanks, Adam, for your help! Without it, I would never have found this.

Cheers

Robert