Creating tasks and the l4_task_map function

Philipp Eppelt philipp.eppelt at kernkonzept.com
Thu Sep 29 13:55:49 CEST 2022


Hi Paul,

I believe I have an idea of what you are observing.

>>> 00 00 00 00 ce e3 08 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
>>> ----------- ----------- ----------------------- -----------------------
>>> opcode      ???         offset                  hot_spot


On a 32-bit architecture, the opcode is 32 bits wide and the ??? is 32 bits
of padding before the offset. The offset is a 64-bit value, also on a
32-bit architecture, and needs to be aligned to 64 bits.
Both the client and the server have to adhere to the platform-specific
alignment constraints of the members in the data structure.
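
To illustrate the layout described above, here is a minimal sketch in
plain C. It does not use the L4Re RPC framework: align_up,
map_request_layout and marshal_map_request are hypothetical helpers made
up for this example, and the rule "align each member to its own size" is
an assumption read off the dump, not taken from the framework sources.
The sketch places a 32-bit opcode, a 64-bit offset and a 64-bit hot_spot
into a byte buffer and never writes the padding bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round pos up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t pos, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (pos + align - 1) & ~(align - 1);
}

/* Byte positions of the three fields in the message (hypothetical type). */
struct map_layout {
    size_t opcode;   /* 32-bit opcode */
    size_t offset;   /* 64-bit offset */
    size_t hot_spot; /* 64-bit hot_spot */
};

/* Compute the positions, aligning each member to its own size (assumption). */
static struct map_layout map_request_layout(void)
{
    struct map_layout l;
    size_t pos = 0;

    l.opcode = pos;
    pos += sizeof(uint32_t);
    pos = align_up(pos, sizeof(uint64_t)); /* skips the 4 padding bytes */
    l.offset = pos;
    pos += sizeof(uint64_t);
    l.hot_spot = align_up(pos, sizeof(uint64_t));
    return l;
}

/* Write the fields into buf (at least 24 bytes). Padding is left untouched. */
static void marshal_map_request(unsigned char *buf, uint32_t opcode,
                                uint64_t offset, uint64_t hot_spot)
{
    struct map_layout l = map_request_layout();

    memcpy(buf + l.opcode, &opcode, sizeof opcode);
    memcpy(buf + l.offset, &offset, sizeof offset);
    memcpy(buf + l.hot_spot, &hot_spot, sizeof hot_spot);
}
```

With a buffer that previously held other data, bytes 4 to 7 keep their
old contents. A receiver that reads the first 8 bytes as one 64-bit
opcode would then see a value different from the 32-bit opcode that was
written.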

From what I have understood, you are implementing a dataspace provider 
and are observing the provider generating the error -39 (-L4_EBADPROTO)?

EBADPROTO normally means that the protocol value in the l4_msgtag_t is 
not supported by the server.
In case of an unsupported opcode, servers report L4_ENOSYS.
Thus, I'm a bit confused about the EBADPROTO error code and the 
opcode/MR issue. What am I missing?
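
To make the distinction concrete, here is a minimal sketch in plain C of
how a server could separate the two checks. It is not the L4Re dispatch
code: dispatch, PROTO_DATASPACE, OP_MAP and OP_CLEAR are hypothetical
names and values chosen for illustration. The value 39 for the protocol
error is the one from this thread; the value 38 for the unsupported
opcode error is an assumption.

```c
#include <stdint.h>

/* Error numbers: 39 is taken from the thread, 38 is assumed here. */
enum { ERR_NOSYS = 38, ERR_BADPROTO = 39 };

/* Hypothetical protocol label and opcodes of a toy dataspace server. */
enum { PROTO_DATASPACE = 0x100 };
enum { OP_MAP = 0, OP_CLEAR = 1 };

/* Check the protocol from the message tag first, then the opcode. */
static int dispatch(long proto_label, uint32_t opcode)
{
    if (proto_label != PROTO_DATASPACE)
        return -ERR_BADPROTO; /* protocol not supported by this server */

    switch (opcode) {
    case OP_MAP:
    case OP_CLEAR:
        return 0;             /* a real server would handle the request */
    default:
        return -ERR_NOSYS;    /* protocol is fine, opcode is unknown */
    }
}
```

In this sketch a corrupted opcode with a valid protocol label yields the
unsupported opcode error, and the protocol error appears only when the
label itself does not match.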

Cheers
Philipp


On 9/28/22 18:36, Philipp Eppelt wrote:
> Hi Paul,
> 
> thanks for the detailed explanation and sorry for the long wait. I waded 
> through the templates and am wondering how this happens. I too am of the 
> opinion that on a 32bit platform the opcode is a 32-bit number and it 
> always is written to the first field in the MRs. (Currently, I'd say 
> it's an int).
> 
> I need to build a test to reproduce this to figure out what happens 
> within the templates, but I'm not sure if I understood the details of 
> your explanation, so let me put it in my own words:
> 
> * You observe the opcode of a Dataspace::map operation to be written 
> into a 64bit field in the MRs, but only in the lower 32bit. The upper 
> 32bit remain the old value.
> 
> * This happens only with IPC using the RPC framework. In IPC using code 
> that is written by hand - like L4::Factory.create() or L4::Task.map()  - 
> the opcode-field is a 32bit MR; on 64-bit architectures this is a 64bit 
> value and field respectively.
> 
> * As far as I understood you are writing the client-side of the 
> Dataspace.map() request yourself and do not use the RPC framework? With 
> that I mean writing the opcode to mr[0], the offset to mr[1], and so on.
> 
> But the server side then reads the mr[0] and mr[1] together as one 64bit 
> value and then replies with EBADPROTO?
> 
> On this last point I'm a bit lost on which side does what exactly. Can 
> you maybe write a bit of pseudo code on what happens on client and what 
> on server side to help me understand?
> 
> Cheers
> Philipp
> 
> 
> On 9/18/22 23:15, Paul Boddie wrote:
>> On Monday, 29 August 2022 00:37:31 CEST Paul Boddie wrote:
>>> On Sunday, 28 August 2022 23:07:38 CEST Adam Lackorzynski wrote:
>>>> On Sun Aug 21, 2022 at 00:18:57 +0200, Paul Boddie wrote:
>>>>> I just spent quite some time seeing errors like this...
>>>>>
>>>>> ext2svr | L4Re[rm]: mapping for page fault failed with error -39 at
>>>>> 0x1002fbc00 pc=0x10b7804
>>>>> ext2svr | L4Re: rom/ext2_server: Unhandled exception: PC=0x10b7804
>>>>> PFA=0x1002fbc00 LdrFlgs=0x0
>>>>>
>>>>> -39 being -L4_EBADPROTO (unsupported protocol), of course.
>>
>> [...]
>>
>>>>> Do you have any ideas as to why the first message register gets
>>>>> corrupted?
>>>>
>>>> No, still not. Any chance I could see a small example of this?
>>>
>>> I'll try and package up what I've been doing so that it can be more readily
>>> investigated. I was actually in the middle of this packaging process when I
>>> discovered the problem once again.
>>
>> Following up, I decided to give my code a test in 32-bit x86 and MIPS virtual
>> machines which caused the problem to be much more pronounced. This led me to
>> review a few things where I had misread the definitions of certain types (in
>> pkg/l4re-core/l4sys/include/l4int.h). However, I think that the nature of the
>> problem is actually as follows.
>>
>> When a map request is sent by the L4Re region mapper, the IPC framework pieces
>> together the necessary message. What isn't entirely obvious is the nature of
>> the opcode being used. I originally thought that it was of type l4_umword_t,
>> and dumping the bytes in the message, it does appear that the opcode is
>> actually a 32-bit value (compatible with l4_umword_t on a 32-bit platform) but
>> that the first operand only appears after an initial 64-bit unit containing
>> the opcode, even on a 32-bit platform.
>>
>> This might not produce problems on 64-bit platforms, although my original
>> report did concern such a platform, but problems are immediately evident on a
>> 32-bit platform. For example, here is a map request on x86 that caused
>> problems:
>>
>> 00 00 00 00 ce e3 08 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
>> ----------- ----------- ----------------------- -----------------------
>> opcode      ???         offset                  hot_spot
>>
>> So, when the IPC framework writes the opcode, it fills the first 32-bit unit
>> but apparently not the second half of the initial 64-bit unit. Consequently,
>> the contents of the second half of the word appear to persist from whatever
>> they were previously. Hence my annotation of "???" above. It looks like an
>> address from the program, perhaps an earlier page fault address.
>>
>> Often, zero bytes are involved, thus preserving the appropriate behaviour even
>> on 64-bit platforms where the opcode could be interpreted as the first 64-bit
>> (l4_umword_t) unit. Zeroing the first message register (as I noted in an
>> earlier message) fixed any cases where the prior non-zero contents leaked into
>> the message and corrupted the opcode when considered to be a 64-bit value.
>>
>> The problem on 32-bit platforms is that the operands are displaced and are not
>> found after the first 32-bit word. In the above example, this causes the
>> offset operand to be misinterpreted, along with the values that follow it.
>> Obviously, this brings programs needing dataspaces to a halt very quickly.
>>
>> The above behaviour contradicts the way the IPC messages are constructed by
>> Lua, for example. What I see on amd64/x86-64 is different from on x86 for the
>> same message payload. For example, an invocation of a factory create operation
>> with an opcode of 6:
>>
>> On amd64:
>>
>> 06 00 00 00 00 00 00 00 81 00 08 00 00 00 00 00 e8 03 00 00 00 00 00 00 ...
>> ----------------------- ----- ----- ----------- -----------------------
>> opcode                  type  size  unused      value (1000)
>>
>> On x86:
>>
>> 06 00 00 00 81 00 04 00 e8 03 00 00 ...
>> ----------- ----- ----- -----------
>> opcode      type  size  value (1000)
>>
>> Here, the opcode is dependent on the word size, and the Lua code is happy to
>> use the word size for the operands with no padding or gaps being introduced.
>> Other IPC messages also appear to use the word size for the opcode. For
>> example, when attach operations are invoked on the region mapper I have
>> implemented, the opcode is only a 32-bit value with no trailing data before
>> the initial operands.
>>
>> I imagine that none of this would manifest itself if I used precisely the same
>> libraries and/or code as other L4Re components, but then that rather makes the
>> system monolithic. There should be a degree of interoperability based on
>> message specifications and interface descriptions, and there should be some
>> consistency, too. What I have seen is that the dataspace IPC is not consistent
>> with other IPC.
>>
>> Paul
>>
> 

-- 
philipp.eppelt at kernkonzept.com - Tel. 0351-41 883 221
http://www.kernkonzept.com

Kernkonzept GmbH.  Sitz: Dresden.  Amtsgericht Dresden, HRB 31129.
Geschäftsführer: Dr.-Ing. Michael Hohmuth