Re: Booting L4Re on the CI20: Panic in sigma0

18 Jul 2017

Hi Paul,

On 07/18/2017 01:43 AM, Paul Boddie wrote:
...
Well, I haven't really figured this out at all. I thought it might be useful 
to investigate what the message actually represents. First of all, it 
originates from the Dispatcher::dispatch method in...
pkg/l4re-core/l4re_kernel/server/src/dispatcher.cc
I think the "tag" breaks down into something like this:
0xfffb1026 -> label=0xfffb (-5) -> L4_PROTO_EXCEPTION
                flags=0x1 -> L4_MSGTAG_TRANSFER_FPU
                items=0x00
                words=0x26 (38)
(Reference: pkg/l4re-core/l4sys/include/types.h)
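To cross-check the breakdown above, here is a minimal C sketch that splits a raw tag into its fields. It assumes the layout read from types.h: words in bits 0-5, items in bits 6-11, flags in bits 12-15, and a signed 16-bit label in bits 16-31. `decode_tag` and `struct tag_fields` are hypothetical names, not the l4sys accessors.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the l4sys API. Assumed field layout of
 * a 32-bit message tag:
 *   bits  0..5   number of untyped words
 *   bits  6..11  number of typed items
 *   bits 12..15  flags (0x1 in this nibble = L4_MSGTAG_TRANSFER_FPU)
 *   bits 16..31  label, interpreted as a signed 16-bit value */
struct tag_fields {
    long     label;
    unsigned flags;
    unsigned items;
    unsigned words;
};

struct tag_fields decode_tag(uint32_t raw)
{
    struct tag_fields f;
    long label = (long)(raw >> 16);

    /* Sign-extend the 16-bit label explicitly rather than relying on an
     * arithmetic right shift of a negative signed value. */
    if (label >= 0x8000)
        label -= 0x10000;

    f.label = label;
    f.flags = (raw >> 12) & 0xfu;
    f.items = (raw >> 6) & 0x3fu;
    f.words = raw & 0x3fu;
    return f;
}
```

With this layout, 0xfffb1026 yields label -5, flag nibble 0x1, no items and 38 words, while the reply tag 0xfc170000 yields label -1001 with zero words.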
The code performing this logging doesn't indicate what the result of the 
message dispatch was, so I added a trace statement to see, which yielded this 
"tag" information:
0xfc170000 -> label=0xfc17 (-1001) -> L4_EMSGTOOSHORT
                flags=0x0
                items=0x00
                words=0x00
There is a bug in Fiasco where it sends the wrong message size.
Please apply the attached patch to Fiasco. Afterwards you should get
more useful error messages in your L4 applications when they throw
exceptions.
...
Attempting to determine the nature of the supposed exception, I managed to 
discover that...
l4_utcb_exc_is_pf returns 1 (page fault)
l4_utcb_exc_pfa returns 0x800d1308 (which is a kernel mode address on MIPS)
The program counter is given as 0x7000049c, and the exception cause value 
0x10 decodes to an "exception code" of 4 in the CP0_CAUSE register (address 
error, load or instruction fetch).
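Both values can be checked mechanically. The following is a minimal C sketch that assumes the MIPS32 Cause register layout (ExcCode in bits 6..2) and the standard 32-bit address map, in which user mode may only access kuseg. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ExcCode values from the MIPS32 privileged architecture (subset). */
enum mips_exccode {
    EXC_MOD  = 1,  /* TLB modification */
    EXC_TLBL = 2,  /* TLB exception, load or instruction fetch */
    EXC_TLBS = 3,  /* TLB exception, store */
    EXC_ADEL = 4,  /* address error, load or instruction fetch */
    EXC_ADES = 5   /* address error, store */
};

/* Cause.ExcCode occupies bits 6..2 of the CP0 Cause register. */
unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1fu;
}

/* In the 32-bit MIPS address map, user mode may only use kuseg
 * (0x00000000..0x7fffffff); the kernel segments start at 0x80000000. */
bool is_user_address(uint32_t vaddr)
{
    return vaddr < 0x80000000u;
}
```

Here (0x10 >> 2) & 0x1f gives 4 (AdEL). The address 0x800d1308 lies in kseg0, outside the range a user-mode load may use, whereas the faulting PC 0x7000049c is an ordinary user address.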
I thought that enabling more logging in sigma0 might help, presuming that the 
page fault would be propagated through the pager hierarchy. But changing 
debug_ipc to 1 in...
An address error generally means that you are trying to access a bad
address (which would be the case with the PFA given above). This is
different from a normal page fault, which corresponds to TLB exceptions.
That is why sigma0 is not involved.
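The distinction can be modelled roughly as follows. This is a hypothetical simplification for illustration, not Fiasco's actual MIPS exception code. In this model, TLB-related exception codes become page faults resolved along the pager hierarchy, while address errors are delivered as exceptions to the thread's exception handler.

```c
/* Hypothetical routing model, not Fiasco's code. */
enum fault_route {
    ROUTE_PAGER,         /* page-fault IPC along the pager hierarchy */
    ROUTE_EXC_HANDLER,   /* exception IPC to the thread's exception handler */
    ROUTE_NOT_MODELLED   /* other exception codes are outside this sketch */
};

enum fault_route route_for_exccode(unsigned exccode)
{
    switch (exccode) {
    case 1:  /* Mod:  store to a page whose TLB entry is read-only */
    case 2:  /* TLBL: TLB refill/invalid on load or instruction fetch */
    case 3:  /* TLBS: TLB refill/invalid on store */
        return ROUTE_PAGER;
    case 4:  /* AdEL: address error on load or instruction fetch */
    case 5:  /* AdES: address error on store */
        return ROUTE_EXC_HANDLER;
    default:
        return ROUTE_NOT_MODELLED;
    }
}
```

With the cause value reported above, code 4 takes the exception-handler path, which is consistent with the "L4Re[svr]" messages appearing instead of any sigma0 activity.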

Exceptions are sent directly to the exception handler, which in a
standard L4 application is the first thread started (the l4re-kernel
thread) or, if that one fails, the launcher (Moe in your case).
...
...indicated that sigma0 is not involved when the above requests are made and 
dispatched: there is a lot of logging from sigma0, but logging from Moe takes 
over after a certain point. And I don't see any logging from Moe when these 
page faults occur: they are described using the "L4Re[svr]" prefix, as shown 
above.
So, I don't really have much more to go on here. There's a chance that my 
rdhwr instruction support introduced a bug, I suppose, even though I've read 
through that code several times and can't see anything obviously wrong with 
it. I do wonder whether the initialisation routine for other programs is 
initialising the t9 register improperly, as noted previously.
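For reference, recognising RDHWR is the first step that any trap-and-emulate support for the instruction needs. This minimal sketch assumes the MIPS32 SPECIAL3 encoding (opcode 0x1f, rs 0, function 0x3b). `decode_rdhwr` is a hypothetical helper, not code from the patch discussed here. For example, `rdhwr $3, $29`, the form commonly used to read the UserLocal thread pointer for TLS, encodes as 0x7c03e83b.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of the MIPS32 RDHWR instruction (SPECIAL3 encoding):
 *   31..26 opcode = 0x1f, 25..21 rs = 0, 20..16 rt, 15..11 rd,
 *   5..0 funct = 0x3b */
bool decode_rdhwr(uint32_t insn, unsigned *rt, unsigned *rd)
{
    if ((insn >> 26) != 0x1fu)        /* not a SPECIAL3 instruction */
        return false;
    if (((insn >> 21) & 0x1fu) != 0)  /* rs field must be zero */
        return false;
    if ((insn & 0x3fu) != 0x3bu)      /* not the RDHWR function code */
        return false;
    *rt = (insn >> 16) & 0x1fu;       /* destination general register */
    *rd = (insn >> 11) & 0x1fu;       /* hardware register number */
    return true;
}
```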
The t9 issue is a likely cause. There are a couple of places where
.cpload is used.
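Here is why a bad t9 matters. In o32 position-independent code, `.cpload $t9` expands to `lui $gp, %hi(_gp_disp)`, `addiu $gp, $gp, %lo(_gp_disp)` and `addu $gp, $gp, $t9`. The link-time constant `_gp_disp` is the distance from the function's entry point to `_gp`. The following C sketch models that arithmetic to show that any error in t9 carries straight into gp. The function name and test values are hypothetical.

```c
#include <stdint.h>

/* Model of what ".cpload $t9" computes:
 *   lui   $gp, %hi(_gp_disp)
 *   addiu $gp, $gp, %lo(_gp_disp)
 *   addu  $gp, $gp, $t9
 * The result is only correct if $t9 holds the function's own address. */
uint32_t cpload_gp(uint32_t t9, uint32_t gp_disp)
{
    /* %lo is sign-extended by addiu, so %hi is rounded to compensate. */
    int32_t lo = (int32_t)(gp_disp & 0xffffu);
    if (lo >= 0x8000)
        lo -= 0x10000;
    uint32_t hi = ((gp_disp + 0x8000u) >> 16) & 0xffffu;

    uint32_t gp = hi << 16;   /* lui   */
    gp += (uint32_t)lo;       /* addiu */
    gp += t9;                 /* addu  */
    return gp;
}
```

If t9 is off by some amount, gp is off by exactly the same amount. Every subsequent $gp-relative GOT load then reads the wrong slot and can yield an arbitrary pointer.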
...
But then, I 
don't understand why the erroneously initialised program isn't just terminated 
when its page fault can't be handled.
That is the standard behaviour and the attached patch hopefully brings
it back.

Kind regards

Sarah
...
Although I've seen a fair amount of the L4Re internals now, I don't think I 
have any productive way of finding the problem here, unfortunately. I guess 
this exercise has provided some way of getting a "tour" of the framework, and 
maybe that will be useful in the future, but I had hoped that this board was 
already supported well enough to run the example programs.
Paul
_______________________________________________
l4-hackers mailing list
l4-hackers@os.inf.tu-dresden.de
http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
-- 
Sarah Hoffmann, sarah.hoffmann@kernkonzept.com

Kernkonzept GmbH, Dresden, Germany
https://kernkonzept.com/