Booting on Raspberry Pi

Tue Sep 17 19:53:45 CEST 2013

On Tue Sep 17, 2013 at 19:33:00 +0200, Robert Kaiser wrote:
> On 16.09.2013 19:21, Robert Kaiser wrote:
> > Hello Adam
> > 
> > thanks for your helpful response
> > 
> > On 09/15/13 16:11, Adam Lackorzynski wrote:
> >> On Thu Sep 12, 2013 at 17:13:07 +0200, Robert Kaiser wrote:
> >>> Adam Lackorzynski wrote:
> >>>>> Unfortunately, it *still* doesn't work. The last messages I see trying
> >>>>> to run the bootstrap_hello example are:
> >>>>>
> >>>>> MOE: cmdline: moe --init=rom/hello
> >>>>> MOE: Starting: rom/hello
> >>>>> MOE: loading 'rom/hello'
> >>>>> L4Re: unhandled exception: pc=0xffffff9c
> >>>>>
> >>>>> Any hints what could be wrong now?
> >>>> Would be interesting to know where this is coming from (lr). Anyway,
> >>>> this does not look so bad because quite a few things have happened
> >>>> again.
> >>> I agree. (My problem here is that I am only just learning how to use
> >>> JDB.) With pagefault monitoring enabled, the last lines of output look
> >>> like this:
> >>>
> >>> ......
> >>> pf:  001d pfa=010191a4 ip=0100a7c8 (r-) spc=0xf12e56fc err=410007
> >>>
> >>> pf:  001d pfa=000012e0 ip=0100a830 (w-) spc=0xf12e56fc err=410807
> >>>
> >>> pf:  001a pfa=b000f070 ip=b000f070 (r-) spc=0xf12e56fc err=330007
> >>>
> >>> L4Re: unhandled exception: pc=0xffffff9c
> >>>
> >>> Am I right to interpret this as "last pagefault occurred due to an opcode
> >>> fetch at virtual address b000f070"? AFAIK, none of the modules in the
> >> Yes.
> >>
> >>> image has its text segment in the b0000000 range, so this must be the
> >>> unhandled exception L4Re complains about (but if so, why does it say
> >>> pc=0xffffff9c?).
> >> The 'l4re' binary is linked to b0000000, so the pagefault looks ok. It's
> >> your local region manager.
> >>
> >>> spc=0xf12e56fc would be the faulting thread's number, right?
> >> That's the space aka task. 0x1d and 0x1a are the threads. Check with
> >> 'lp'.
> >>
> >>> Giving an "s" command, I get:
> >>>
> >>>        1 f00567b8 [Task   ] {KERNEL} R=2
> >>>        7 f12e5770 [Task   ] {sigma0          } R=3
> >>>        9 f12e5720 [Task   ] {moe             } R=3
> >>>       19 f12e56d0 [Task   ] {hello           } R=3
> >>>
> >>>
> >>> The thread number, f12e56fc, does not appear. It is closest to
> >>> f12e56d0, but does that really mean the fault happened in the hello task?
> >> It happened in the hello task because that output can only come from
> >> hello in your setup, and the thread numbers indicate that too.
> >>
> >>> I would like to derive the program address where the fault occurs from
> >>> this, but frankly, not being familiar with JDB I'm at a loss here.
> >> In 'lp', press enter on the 1d thread, that will give you the tcb view
> >> in which you can see the registers for example.
> > Ahaaa!
> > 
> > Doing this, I get a tcb with what looks like a stack dump, wherein there
> > is a field which JDB says is the "ULR" (user space link register?). Its
> > value is 0x100bb20. Disassembling the neighborhood of that location, I get:
> > ....
> > 0100bb0c     bl   
> > 0100bb10     mvn    ip, #127    ; 0x7f
> > 0100bb14     str    r8, [r0, #500]
> > 0100bb18     mov    r0, r8
> > 0100bb1c     blx    ip
> > 0100bb20     str    r5, [r4, #544]
> > ....
> > 
> > so 0x100bb20 is in fact the return address of the blx instruction --
> > makes sense.
> > 
> > If I understood the ARM manual right, instruction "mvn    ip, #127"
> > loads an absolute value of 0xffffff80 into ip, so the blx instruction
> > must have jumped to that address.
> > 
> > Disassembling that address gives me:
> > 
> > ffffff80     push       {r4, lr}
> > ffffff84     mrc    15, 0, r4, cr13, cr0, {2}
> > ffffff88     str    r0, [r4, #4]
> > ffffff8c     mov    r2, #167; 0x10
> > ffffff90     str    r2, [r4]
> > ffffff94     mov    r3, #0    ; 0x0
> > ffffff98     movw       r2, #63491    ; 0xf803
> > ffffff9c     mov    r0, #24,; 0x2
> > ffffffa0     movt       r2, #65535540]  ; 0xffff
> > 
> > .. and 0xffffff9c is in fact the address where the fault happened!
> > 
> > 
> > 
> >>
> >>> JDB Single stepping does not seem to work on ARM platforms.
> >> Indeed that does not work.
> > 
> > Do breakpoints work?
> > 
> >>
> >>>  For which architecture version have you been building?
> >> Looks good.
> >>
> >>
> >> The problem is in the kernel-provided code that uses instructions that
> >> are incompatible with rpi's CPU. 
> > So that would be the instruction at 0xffffff9c, right?
> > 
> > ffffff9c     mov    r0, #24,; 0x2
> > 
> > This disassembly looks a little strange, maybe not only the CPU but also
> > the disassembler is choking on this opcode.
> > 
> > Now, how do I find the place in the source code corresponding to this
> > instruction?
> > 
> > (Disassembling fiasco.image doesn't help -- it ends long before that address)
> > 
> >> I'll fix it.
> >>
> > 
> > I can't wait to see your fix! Please let me know ASAP. If you need any
> > more input from my side, just tell me what to do.
> 
> Yay! Got it working! :-)
> 
> The offending instructions are movt and movw. The code in
> sys_call_page-arm.cpp constructs a syscall entry sequence which uses
> these instructions. (How can this ever have worked on the RPi?)

This code is new and has never worked on the rpi, so thanks for pointing
that out.
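
For anyone retracing the quoted disassembly, the constants involved can be
checked with a few lines of Python. This is only a sketch of the arithmetic,
not code from the kernel; the helper names (mvn_imm, movw, movt,
arm_return_address) are made up for illustration. As background: movw/movt
were introduced with ARMv6T2, while the rpi's ARM1176 core implements an
ARMv6 variant without them.

```python
MASK32 = 0xFFFFFFFF  # ARM core registers are 32 bits wide

def mvn_imm(imm):
    """'mvn rd, #imm': rd = bitwise NOT of imm, truncated to 32 bits."""
    return ~imm & MASK32

def movw(imm16):
    """'movw rd, #imm16': low half = imm16, top half cleared."""
    return imm16 & 0xFFFF

def movt(reg, imm16):
    """'movt rd, #imm16': top half = imm16, low half of rd preserved."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)

def arm_return_address(branch_addr):
    """In ARM state, bl/blx set lr to the instruction after the branch (+4)."""
    return (branch_addr + 4) & MASK32

# Values taken from the disassembly quoted in this thread.
ip = mvn_imm(127)                     # mvn ip, #127
r2 = movt(movw(63491), 65535)         # movw r2, #63491 ; movt r2, #65535
ulr = arm_return_address(0x0100BB1C)  # blx ip at 0100bb1c

print(hex(ip))   # 0xffffff80, the blx target
print(hex(r2))   # 0xfffff803
print(hex(ulr))  # 0x100bb20, matches the ULR field shown by JDB
```

The movw/movt pair is thus just a two-instruction way to load the 32-bit
constant 0xfffff803 into r2.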

> Anyway, here is my suggestion for a patch:

I've done something similar in the meantime but wasn't so quick...

> With this patch applied, my Raspberry Pi now happily prints "Hello
> World!" (Strange how something as unspectacular as that can make someone
> really happy ;-))

I know that feeling :)

Adam
-- 
Adam                 adam at os.inf.tu-dresden.de
  Lackorzynski         http://os.inf.tu-dresden.de/~adam/