Booting L4Re on the CI20: Panic in sigma0
paul at boddie.org.uk
Tue Jul 18 01:43:24 CEST 2017
And some more...
On Sunday 16. July 2017 02.24.03 Paul Boddie wrote:
> But with the --l4re-dbg option set to "all", after this output...
> L4Re: load binary 'rom/hello'
> L4Re: Start server loop
> ...I notice this continuously recurring message:
> L4Re[svr]: request: tag=0xfffb1026 proto=-5 obj=0x0
(Note that this is the only thing that occurs, doing so endlessly and very
Well, I haven't really figured this out at all. I thought it might be useful
to investigate what the message actually represents. First of all, it
originates from the Dispatcher::dispatch method in...
I think the "tag" breaks down into something like this:
0xfffb1026 -> label=0xfffb (-5) -> L4_PROTO_EXCEPTION
              flags=0x1         -> L4_MSGTAG_TRANSFER_FPU
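
For anyone wanting to repeat the breakdown, here is a minimal sketch of the decoding in C. It assumes the message tag layout as I understand it from the l4_msgtag_t description (words in bits 0-5, items in bits 6-11, flags in bits 12-15, label as a signed value in bits 16-31). The tag_fields structure and decode_tag function are names invented for this illustration and are not part of L4Re.

```c
#include <stdint.h>

/* Hypothetical helper type: the fields of a raw 32-bit message tag.
   Assumed layout: bits 0-5 words, bits 6-11 items, bits 12-15 flags,
   bits 16-31 label (signed). */
struct tag_fields {
    int label;
    unsigned flags;
    unsigned items;
    unsigned words;
};

/* Split a raw tag value, as printed in the log, into its fields. */
static struct tag_fields decode_tag(uint32_t raw)
{
    struct tag_fields f;
    int label = (int)(raw >> 16);

    /* Sign-extend the 16-bit label without relying on
       implementation-defined narrowing conversions. */
    if (label >= 0x8000)
        label -= 0x10000;

    f.label = label;
    f.flags = (raw >> 12) & 0xf;
    f.items = (raw >> 6) & 0x3f;
    f.words = raw & 0x3f;
    return f;
}
```

Under those assumptions, decode_tag(0xfffb1026) gives a label of -5, flags of 0x1, no items and 38 words.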
The code performing this logging doesn't indicate what the result of the
message dispatch was, so I added a trace statement to see, which yielded this
0xfc170000 -> label=0xfc17 (-1001) -> L4_EMSGTOOSHORT
As far as I can tell, after following the call chain through four or so
invocations, this might be produced in the handle_svr_obj_call function in...
I would try to add some debugging statements here as well, but doing so seems
to cause a cascade of library requirements across a range of components. So,
although I suspect that the request is malformed in some way, I have no firm
evidence that this is the case. However, one test for it is the following:
tag.words() + tag.items() * Item_words > Mr_words
Here, the left hand side of the comparison is 38 whereas Mr_words is
apparently 63, so the comparison is false and this test does not identify the
cause of the problem.
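
The same check can be reproduced outside the framework with a small sketch in C. The value of 63 for Mr_words is the one reported above; the value of 2 for Item_words is my assumption (it makes no difference for this tag, which carries no items), and the function name is invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    ITEM_WORDS = 2,  /* assumption: words occupied by one message item */
    MR_WORDS   = 63  /* value of Mr_words as reported above */
};

/* Return true if the words and items announced by a raw tag would not
   fit into the message registers, mirroring the comparison above.
   Assumed tag layout: bits 0-5 words, bits 6-11 items. */
static bool tag_exceeds_message_registers(uint32_t raw)
{
    unsigned words = raw & 0x3f;
    unsigned items = (raw >> 6) & 0x3f;

    return words + items * ITEM_WORDS > MR_WORDS;
}
```

For the tag 0xfffb1026 this yields false (38 + 0 * 2 is not greater than 63), which agrees with the observation that this test is not the one being triggered.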
Meanwhile, I looked into the nature of the request and found the UTCB-related
Attempting to determine the nature of the supposed exception, I managed to
l4_utcb_exc_is_pf returns 1 (page fault)
l4_utcb_exc_pfa returns 0x800d1308 (which is a kernel mode address on MIPS)
The program counter is given as 0x7000049c, and the exception cause value of
0x10 decodes to an "exception code" of 4 in the CP0_CAUSE register (address
error on load or instruction fetch).
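
To make the decoding of these values explicit, here is a rough sketch in C. It assumes the conventional MIPS32 arrangement, with the exception code in bits 2-6 of the cause value and the usual fixed segments of the 32-bit address space. The enum and function names are invented for this illustration and do not come from L4Re or Fiasco.

```c
#include <stdint.h>

/* Hypothetical names for the conventional MIPS32 address segments. */
enum mips32_segment {
    SEG_KUSEG,  /* 0x00000000-0x7fffffff: user mode addresses */
    SEG_KSEG0,  /* 0x80000000-0x9fffffff: kernel, unmapped, cached */
    SEG_KSEG1,  /* 0xa0000000-0xbfffffff: kernel, unmapped, uncached */
    SEG_KSEG2,  /* 0xc0000000-0xdfffffff: kernel or supervisor, mapped */
    SEG_KSEG3   /* 0xe0000000-0xffffffff: kernel, mapped */
};

/* Extract the exception code field (bits 2-6) from a cause value. */
static unsigned cause_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1f;
}

/* Classify a 32-bit virtual address by segment. */
static enum mips32_segment classify_address(uint32_t addr)
{
    if (addr < 0x80000000u) return SEG_KUSEG;
    if (addr < 0xa0000000u) return SEG_KSEG0;
    if (addr < 0xc0000000u) return SEG_KSEG1;
    if (addr < 0xe0000000u) return SEG_KSEG2;
    return SEG_KSEG3;
}
```

With these definitions, a cause value of 0x10 gives an exception code of 4, the faulting address 0x800d1308 falls into kseg0, and the program counter 0x7000049c falls into kuseg. In other words, a user mode program appears to be attempting to access a kernel mode address.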
I thought that enabling more logging in sigma0 might help, presuming that the
page fault would be propagated through the pager hierarchy. But changing
debug_ipc to 1 in...
...indicated that sigma0 is not involved when the above requests are made and
dispatched: there is a lot of logging from sigma0, but logging from Moe takes
over after a certain point. And I don't see any logging from Moe when these
page faults occur: they are described using the "L4Re[svr]" prefix, as shown
So, I don't really have much more to go on, here. There's a chance that my
rdhwr instruction support introduced a bug, I suppose, even though I've read
through that code several times and can't see anything obviously wrong with
it. I do wonder whether the initialisation routine for other programs is
initialising the t9 register improperly, as noted previously. But then, I
don't understand why the erroneously-initialised program isn't just terminated
when its page fault can't be handled.
Although I've seen a fair amount of the L4Re internals now, I don't think I
have any productive way of finding the problem here, unfortunately. I guess
this exercise has provided some way of getting a "tour" of the framework, and
maybe that will be useful in the future, but I had hoped that this board was
already supported to the point of running the example programs.