Hi Paul,
On 07/18/2017 01:43 AM, Paul Boddie wrote:
Well, I haven't really figured this out at all. I thought it might be useful to investigate what the message actually represents. First of all, it originates from the Dispatcher::dispatch method in...
pkg/l4re-core/l4re_kernel/server/src/dispatcher.cc
I think the "tag" breaks down into something like this:
0xfffb1026
  label = 0xfffb (-5) -> L4_PROTO_EXCEPTION
  flags = 0x1 -> L4_MSGTAG_TRANSFER_FPU
  items = 0x00
  words = 0x26 (38)
(Reference: pkg/l4re-core/l4sys/include/types.h)
The code performing this logging doesn't indicate what the result of the message dispatch was, so I added a trace statement to see, which yielded this "tag" information:
0xfc170000
  label = 0xfc17 (-1001) -> L4_EMSGTOOSHORT
  flags = 0x0
  items = 0x00
  words = 0x00
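For reference, both tag values quoted above can be unpacked with a few lines of Python. This is only a sketch: it assumes the message tag layout is words in bits 0-5, items in bits 6-11, flags in bits 12-15 and a signed 16-bit label in bits 16-31, which should be checked against pkg/l4re-core/l4sys/include/types.h. The name decode_msgtag is a made-up helper, not an L4Re function.

```python
def decode_msgtag(raw):
    """Split a raw 32-bit message tag into its fields (assumed layout)."""
    label = (raw >> 16) & 0xFFFF
    if label & 0x8000:
        # The label is treated as a signed 16-bit value.
        label -= 0x10000
    return {
        "label": label,
        "flags": (raw >> 12) & 0xF,   # flag nibble, e.g. 0x1 for the FPU transfer flag
        "items": (raw >> 6) & 0x3F,   # number of typed items
        "words": raw & 0x3F,          # number of untyped message words
    }


# The two tags discussed in this thread.
for raw in (0xFFFB1026, 0xFC170000):
    print(hex(raw), decode_msgtag(raw))
```

Under that assumed layout, the first tag has label -5 with 38 words, and the second has label -1001 with no words or items, matching the breakdowns given by hand.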
There is a bug in Fiasco where it sends the wrong message size. Please apply the attached patch to Fiasco. Afterwards, you should get more useful error messages when your L4 applications throw exceptions.
Attempting to determine the nature of the supposed exception, I managed to discover that...
l4_utcb_exc_is_pf returns 1 (page fault)
l4_utcb_exc_pfa returns 0x800d1308 (which is a kernel mode address on MIPS)
The program counter is given as 0x7000049c, and the exception cause of 0x10 decodes to an "exception code value" of 4 in the CP0_CAUSE register (address error, load or instruction fetch).
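The cause decoding can be reproduced offline. The sketch below assumes the reported value is the raw CP0 Cause register content, where the ExcCode field occupies bits 2 to 6 in the MIPS32 layout. The table lists only the common codes, and exc_code and exc_name are hypothetical helper names.

```python
# Common MIPS32 ExcCode values (Cause register bits 6..2).
EXC_CODES = {
    0: "Int",   # interrupt
    1: "Mod",   # TLB modification
    2: "TLBL",  # TLB miss on load or instruction fetch
    3: "TLBS",  # TLB miss on store
    4: "AdEL",  # address error on load or instruction fetch
    5: "AdES",  # address error on store
    6: "IBE",   # bus error on instruction fetch
    7: "DBE",   # bus error on data access
    8: "Sys",   # syscall
    9: "Bp",    # breakpoint
    10: "RI",   # reserved instruction
    11: "CpU",  # coprocessor unusable
    12: "Ov",   # arithmetic overflow
    13: "Tr",   # trap
}


def exc_code(cause):
    """Extract the ExcCode field from a raw Cause register value."""
    return (cause >> 2) & 0x1F


def exc_name(cause):
    """Map a raw Cause value to a mnemonic, if it is in the table above."""
    return EXC_CODES.get(exc_code(cause), "unknown/reserved")


print(exc_code(0x10), exc_name(0x10))
```

Note that codes 2 and 3 are the TLB exceptions that correspond to ordinary page faults, whereas code 4 is an address error.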
I thought that enabling more logging in sigma0 might help, presuming that the page fault would be propagated through the pager hierarchy. But changing debug_ipc to 1 in...
An address error generally means that you are trying to access a bad address (which would be the case with the PFA given above). This is different from a normal page fault, which corresponds to TLB exceptions. That is why sigma0 is not involved.
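To see why that PFA counts as a bad address for a user-mode program, it helps to classify it by segment. The sketch below assumes the classic fixed MIPS32 address map (no EVA or other segment remapping), and mips32_segment is a hypothetical helper name.

```python
def mips32_segment(addr):
    """Name the classic MIPS32 segment that a 32-bit virtual address falls in."""
    addr &= 0xFFFFFFFF
    if addr < 0x80000000:
        return "kuseg"   # user-accessible, TLB-mapped
    if addr < 0xA0000000:
        return "kseg0"   # kernel only, unmapped, cached
    if addr < 0xC0000000:
        return "kseg1"   # kernel only, unmapped, uncached
    if addr < 0xE0000000:
        return "kseg2"   # kernel/supervisor, TLB-mapped (also called ksseg)
    return "kseg3"       # kernel only, TLB-mapped


print(hex(0x800D1308), mips32_segment(0x800D1308))  # the reported PFA
print(hex(0x7000049C), mips32_segment(0x7000049C))  # the reported program counter
```

Under that assumption the faulting address lies in kseg0, which user mode may not touch at all, so the CPU raises an address error instead of a TLB miss, while the program counter itself is an ordinary user address.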
Exceptions are sent directly to the exception handler, which in a standard L4 application is the thread started first (the l4re-kernel thread) or, if that one fails, the launcher (moe in your case).
...indicated that sigma0 is not involved when the above requests are made and dispatched: there is a lot of logging from sigma0, but logging from Moe takes over after a certain point. And I don't see any logging from Moe when these page faults occur: they are described using the "L4Re[svr]" prefix, as shown above.
So, I don't really have much more to go on, here. There's a chance that my rdhwr instruction support introduced a bug, I suppose, even though I've read through that code several times and can't see anything obviously wrong with it. I do wonder whether the initialisation routine for other programs is initialising the t9 register improperly, as noted previously.
The t9 issue is a likely cause. There are a couple of places where .cpload is used.
But then, I don't understand why the erroneously-initialised program isn't just terminated when its page fault can't be handled.
That is the standard behaviour and the attached patch hopefully brings it back.
Kind regards
Sarah
Although I've seen a fair amount of the L4Re internals now, I don't think I have any productive way of finding the problem here, unfortunately. I guess this exercise has provided some way of getting a "tour" of the framework, and maybe that will be useful in the future, but I had hoped that this board was already supported to the point of already running the example programs.
Paul