And some more...
On Sunday 16. July 2017 02.24.03 Paul Boddie wrote:
But with the --l4re-dbg option set to "all", after this output...
L4Re: load binary 'rom/hello'
L4Re: Start server loop
...I notice this continuously recurring message:
L4Re[svr]: request: tag=0xfffb1026 proto=-5 obj=0x0
(Note that this is the only thing that occurs, doing so endlessly and very frequently.)
Well, I haven't really figured this out at all. I thought it might be useful to investigate what the message actually represents. First of all, it originates from the Dispatcher::dispatch method in...
pkg/l4re-core/l4re_kernel/server/src/dispatcher.cc
I think the "tag" breaks down into something like this:
0xfffb1026 -> label=0xfffb (-5) -> L4_PROTO_EXCEPTION
              flags=0x1         -> L4_MSGTAG_TRANSFER_FPU
              items=0x00
              words=0x26 (38)
(Reference: pkg/l4re-core/l4sys/include/types.h)
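For anyone wanting to check the arithmetic, the breakdown above can be reproduced with a few shifts and masks. This is only a sketch: the struct and function names are made up for illustration and are not L4Re definitions, and the bit layout (label in the top 16 bits, then a 4-bit flags field, then 6 bits each for items and words) is assumed from the breakdown above rather than copied from types.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type for illustration; not an L4Re definition. */
struct tag_fields {
    int label;       /* signed protocol label, bits 16..31 */
    unsigned flags;  /* flags field, bits 12..15 */
    unsigned items;  /* typed item count, bits 6..11 */
    unsigned words;  /* untyped word count, bits 0..5 */
};

/* Split a raw 32-bit tag value into its fields (assumed layout). */
void decode_tag(uint32_t raw, struct tag_fields *out)
{
    assert(out != NULL);

    /* Sign-extend the upper 16 bits by hand to stay portable. */
    int label = (int)(raw >> 16);
    if (label >= 0x8000)
        label -= 0x10000;

    out->label = label;
    out->flags = (raw >> 12) & 0xf;
    out->items = (raw >> 6) & 0x3f;
    out->words = raw & 0x3f;
}

/* Print the fields in roughly the notation used in this message. */
void print_tag(uint32_t raw)
{
    struct tag_fields f;
    decode_tag(raw, &f);
    printf("0x%08lx -> label=%d flags=0x%x items=%u words=%u\n",
           (unsigned long)raw, f.label, f.flags, f.items, f.words);
}
```

Calling print_tag(0xfffb1026) should report label=-5 with 38 words, matching the request seen in the log.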
The code performing this logging doesn't indicate what the result of the message dispatch was, so I added a trace statement to see, which yielded this "tag" information:
0xfc170000 -> label=0xfc17 (-1001) -> L4_EMSGTOOSHORT
              flags=0x0
              items=0x00
              words=0x00
As far as I can tell, having followed the dispatch through four or so levels of invocation, this might be produced in the handle_svr_obj_call function in...
pkg/l4re-core/l4sys/include/cxx/ipc_server
I would try to add some debugging statements here as well, but doing so seems to cause a cascade of library requirements across a range of components. So, although I suspect that the request is malformed in some way, I have no firm evidence that this is the case. However, one test for a malformed request in that code is the following:
tag.words() + tag.items() * Item_words > Mr_words
Here, the left-hand side of the comparison is 38 (38 words and no items), whereas Mr_words is apparently 63, so this test does not identify the cause of the problem.
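To make the arithmetic explicit, here is a small standalone sketch of that comparison. The function and constant names are hypothetical. The value 63 is the one reported above for Mr_words, while the figure of two words per typed item is my assumption and is not taken from this message (it makes no difference here, since the request has no items).

```c
#include <assert.h>

/* Assumption: each typed item occupies two message words. */
static const unsigned sketch_item_words = 2;
/* Value reported above for Mr_words. */
static const unsigned sketch_mr_words = 63;

/* Returns non-zero if a message with the given counts would not fit
   into the available message registers. */
int tag_exceeds_registers(unsigned words, unsigned items)
{
    /* Both counts are 6-bit fields in the tag. */
    assert(words <= 0x3f && items <= 0x3f);
    return words + items * sketch_item_words > sketch_mr_words;
}
```

With words=38 and items=0 this returns zero, which is consistent with the test not being the one that rejects the request.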
Meanwhile, I looked into the nature of the request and found the UTCB-related functions in...
pkg/l4re-core/l4sys/include/utcb.h
Attempting to determine the nature of the supposed exception, I managed to discover that...
l4_utcb_exc_is_pf returns 1 (page fault)
l4_utcb_exc_pfa returns 0x800d1308 (which is a kernel mode address on MIPS)
The program counter is given as 0x7000049c, and the exception cause is given as 0x10, which corresponds to an "exception code value" of 4 in the CP0_CAUSE register (address error, load or instruction fetch).
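The decoding of the cause value and the address check can be sketched as follows. The function names are invented for this illustration. The code assumes the MIPS32 layout, where the exception code sits in bits 6..2 of the Cause register and user-mode addresses (kuseg) end below 0x80000000. Only a handful of exception codes are named.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the ExcCode field (bits 6..2) from a MIPS32 Cause value. */
unsigned cause_exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1f;
}

/* Name a few of the standard exception codes; others are not listed. */
const char *exc_code_name(unsigned code)
{
    assert(code <= 0x1f);
    switch (code) {
    case 0:  return "Int";   /* interrupt */
    case 1:  return "Mod";   /* TLB modification */
    case 2:  return "TLBL";  /* TLB miss, load or instruction fetch */
    case 3:  return "TLBS";  /* TLB miss, store */
    case 4:  return "AdEL";  /* address error, load or instruction fetch */
    case 5:  return "AdES";  /* address error, store */
    case 8:  return "Sys";   /* syscall */
    default: return "other";
    }
}

/* Non-zero if the address lies above the user segment (kuseg). */
int is_outside_kuseg(uint32_t addr)
{
    return addr >= 0x80000000u;
}
```

With the values above, cause_exc_code(0x10) gives 4 (AdEL), the faulting address 0x800d1308 lies outside kuseg, and the program counter 0x7000049c lies inside it.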
I thought that enabling more logging in sigma0 might help, presuming that the page fault would be propagated through the pager hierarchy. But changing debug_ipc to 1 in...
pkg/l4re-core/sigma0/server/src/globals.h
...indicated that sigma0 is not involved when the above requests are made and dispatched: there is a lot of logging from sigma0, but logging from Moe takes over after a certain point. And I don't see any logging from Moe when these page faults occur: they are described using the "L4Re[svr]" prefix, as shown above.
So, I don't really have much more to go on, here. There's a chance that my rdhwr instruction support introduced a bug, I suppose, even though I've read through that code several times and can't see anything obviously wrong with it. I do wonder whether the initialisation routine for other programs is initialising the t9 register improperly, as noted previously. But then, I don't understand why the erroneously-initialised program isn't just terminated when its page fault can't be handled.
Although I've seen a fair amount of the L4Re internals now, I don't think I have any productive way of finding the problem here, unfortunately. I guess this exercise has provided a "tour" of the framework, and maybe that will be useful in the future, but I had hoped that this board was already supported to the point of running the example programs.
Paul