2 L3 IPC

2.1 IPC Overview

L3 tasks communicate with each other via messages. The communication is synchronous, i.e. sender and receiver have to be ready to exchange the message. If the receiver isn't ready, the sender waits until it is ready or until the specified timeout expires, and vice versa. The communication is direct, i.e. there are neither communication channels nor links, only global thread and task identifiers.

L3 IPC is implemented with a single system call. Its parameters select the operation to be executed. There are five operations:

send a message to a certain thread (snd msg != nil msg, rcv msg == nil msg)

receive a message from a certain thread (snd msg == nil msg, rcv msg != nil msg, closed wait)

receive a message from any thread (snd msg == nil msg, rcv msg != nil msg, open wait)

send and receive a message (rpc) to and from a certain thread (snd msg != nil msg, rcv msg != nil msg, closed wait)

send a message to a certain thread and receive a message from any thread (snd msg != nil msg, rcv msg != nil msg, open wait)
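As a minimal sketch, the mapping from call parameters to the five operations can be written in C. The constant, enum and function names are hypothetical; the nil address (-1) and the option values closed (0) and open (2) are taken from section 2.2.1, and the option is assumed to matter only when a receive is included.

```c
#include <stdint.h>

/* Hypothetical names; the values come from section 2.2.1:
   nil address = -1, closed option = 0, open option = 2. */
#define IPC_NIL_ADDR 0xFFFFFFFFu
#define IPC_CLOSED   0u
#define IPC_OPEN     2u

typedef enum {
    IPC_OP_INVALID,         /* neither send nor receive: none of the five operations */
    IPC_OP_SEND,            /* send to a certain thread */
    IPC_OP_RECEIVE_CLOSED,  /* receive from a certain thread */
    IPC_OP_RECEIVE_OPEN,    /* receive from any thread */
    IPC_OP_CALL,            /* send to and receive from a certain thread (RPC) */
    IPC_OP_SEND_AND_WAIT    /* send to a certain thread, receive from any thread */
} ipc_op_t;

/* Map the send address, receive address and wait option of one IPC
   system call to the operation it performs. */
static ipc_op_t ipc_classify(uint32_t snd_addr, uint32_t rcv_addr, uint32_t option)
{
    int has_snd = (snd_addr != IPC_NIL_ADDR);
    int has_rcv = (rcv_addr != IPC_NIL_ADDR);
    int open    = (option == IPC_OPEN);

    if (has_snd && !has_rcv)
        return IPC_OP_SEND;              /* the option only matters for receiving */
    if (!has_snd && has_rcv)
        return open ? IPC_OP_RECEIVE_OPEN : IPC_OP_RECEIVE_CLOSED;
    if (has_snd && has_rcv)
        return open ? IPC_OP_SEND_AND_WAIT : IPC_OP_CALL;
    return IPC_OP_INVALID;
}
```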

A message is described by two so-called dopes. The first dope describes the layout of the message area, the second describes the message to be sent (or a received message). The message to be sent can be the complete message area or a subset of it, so one message area can serve both a send operation and a receive operation.

Each message contains a direct string (mandatory) and may contain indirect strings (optional), flexpages (optional) and dataspaces (optional). The parts of a message are strictly ordered: first the direct string, then the indirect strings, then the flexpages, and finally the dataspaces.

A direct string consists of a number of dwords. An indirect string has two parts, one for a message to be sent and another for a message to be received. Both are described by the address of an area and the length of that area. If an indirect string has to be sent, you fill the first part with its address and length. If you want to receive an indirect string, you allocate a buffer and fill the second part with its address and size.

under construction

The third part, the flexpage, is a page from the address space of the sender, which will be mapped into the receiver's address space. It is described by four parameters. The sender specifies the map mode it allows (read only, read/write), the address and size of the page to be sent and, as an additional parameter, the base address in the receiver's address space (the send base). The receiver specifies the address and size of the area into which the page should be mapped. So a flexpage can be described as a tuple of address and size: FlexPage(address, size).

The sent page may be bigger or smaller than the specified receive flexpage. In that case the kernel has to calculate an effective flexpage for the final mapping: it divides the bigger flexpage into flexpages of the size of the smaller one and decides which of these sub-flexpages is mapped. For that it needs an additional parameter from the sender, the send base, which provides the missing information when such a split occurs.

Consider the following situation: A pager provides different dataspaces and a client maps a dataspace from that pager. When a page fault occurs, the pager sends flexpages of different sizes, and the client accepts different sizes. The effective flexpages are calculated as follows:

size of send flexpage = size of receive flexpage

This is the trivial case. The page is mapped to the receive flexpage, with the size of the receive flexpage.

size of send flexpage < size of receive flexpage

This could be the normal situation. The client would accept more than the page causing the page fault, but the pager is not able to deliver more than one page. Now the kernel has to decide at which address in the receive flexpage the send flexpage should be mapped. The effective flexpages are calculated as follows:

effective send flexpage = send flexpage

effective receive flexpage =
```
( (rcvfpage AND -size of rcvfpage) +
    (sndbase AND (size of rcvfpage-1) AND -size of sndfpage),
  size of sndfpage )
```
The pager sends a flexpage (address, size) and a send base equal to the page fault address. That means the pager delivers at least the page causing the page fault, and if the client accepts a bigger flexpage, the kernel maps the page at the position corresponding to the page fault address.

size of send flexpage > size of receive flexpage

This is the other situation. The pager is able to deliver more than one page, but the client accepts, for instance, only one.

The effective flexpages are calculated as follows:

effective receive flexpage = receive flexpage

effective send flexpage =

( (sndfpage AND -size of sndfpage) +
    (rcvfpage AND (size of sndfpage-1) AND -size of rcvfpage),
  size of rcvfpage )
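The three cases can be condensed into one C function that applies the formulas above. This is a sketch assuming that flexpage sizes are powers of two and that addresses are aligned to their size; fpage_t and effective_fpages are hypothetical names, not part of the L3 interface.

```c
#include <stdint.h>

/* A flexpage as a tuple (address, size); size is assumed to be a power
   of two and address aligned to it. */
typedef struct {
    uint32_t address;
    uint32_t size;
} fpage_t;

/* Compute the effective send and receive flexpages. -size is the
   alignment mask for a power-of-two size (unsigned arithmetic). */
static void effective_fpages(fpage_t snd, fpage_t rcv, uint32_t sndbase,
                             fpage_t *eff_snd, fpage_t *eff_rcv)
{
    if (snd.size <= rcv.size) {
        /* Send flexpage is used whole; sndbase selects the slot of
           size snd.size inside the receive flexpage. */
        *eff_snd = snd;
        eff_rcv->address = (rcv.address & -rcv.size)
                         + (sndbase & (rcv.size - 1) & -snd.size);
        eff_rcv->size = snd.size;
    } else {
        /* Receive flexpage is used whole; its address selects the slot
           of size rcv.size inside the send flexpage. */
        *eff_rcv = rcv;
        eff_snd->address = (snd.address & -snd.size)
                         + (rcv.address & (snd.size - 1) & -rcv.size);
        eff_snd->size = rcv.size;
    }
}
```

For example, with a 4 KB send flexpage, a 64 KB receive flexpage at 0x10000000 and a send base of 0x10003456 (the fault address), the effective receive flexpage is (0x10003000, 0x1000).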

A flexpage is described by a structure consisting of the following elements:

the map code

The map code describes the mapping operation to be performed by the kernel. You can map a page with the following modes:

IPC_FPAGE_READ_ONLY
The flex page is mapped read only.

IPC_FPAGE_READ_WRITE
The flex page is mapped read write.

the send base

The send base is used by the kernel to calculate the final flexpage if the receive flexpage is bigger than the send flexpage. In the case of a page fault, it is normally the page fault address.

the send flex page

The send flexpage describes the flexpage to be sent.

the receive flex page

The receive flex page describes an area in which a process is willing to accept mappings.

The last elements of a message are the dataspaces, each described by a dataspace ID.

Direct and indirect strings are copied strictly, memory objects lazily. A message can have at most 257 dwords (direct string), 15 indirect strings, 15 flexpages and 255 dataspaces. Every message has at least two dwords, whether you use them or not; if you need fewer, you have to insert a dummy dword for each missing one.

This results from the special IPC design of L3: when you receive a message, the first two dwords are delivered in registers. Because a high proportion of messages are very short, e.g. acknowledgements from a server process, this leads to a performance gain. Another advantage is the use of indirect strings, which help to avoid copy operations at user level. For further information about L3 IPC see "Improving IPC by Kernel Design" by Jochen Liedtke, GMD.

Let's summarize:

L3 communication
- is synchronous,
- is direct,
- has timeouts,
- and is implemented with a single system call.

Messages
- contain two dopes describing the message, one for the size of the message area, the other for the message to be sent,
- have to have at least two dwords,
- are strictly ordered (1. dwords, 2. strings, 3. flexpages, 4. dataspaces),
- can be used for send and receive in one call (call(dest, &msg, &msg,-1,-1)),
- should be as big as the expected reply, if used for receive.

Let's take a look at some examples.

Consider the following situation:

A string needs to be sent to a terminal device driver. The string is located in an array named text. The terminal driver requires an operation code and a string. The message could look like the following:

struct {
   TMessageDope message_size;
   TMessageDope message;
   int device_order;
   int dummy;
   TStringDope text;
} msg;

Note that there are two dwords although you need only one. It is important to insert this dummy dword, because the IPC interface expects to find at least two dwords. You would fill in the elements of the message as follows:

   msg.message_size.DWord = ((2 - 2) << 8) + (1 << 16); /* 2 dwords (stored minus 2), 1 indirect string */
   msg.message            = msg.message_size;           /* send the whole message area */
   msg.device_order       = order_code_read;
   msg.text.StrAddress    = text;
   msg.text.StrLength     = strlen(text);

There are three things to notice:

The dummy element is ignored.

The number of dwords is zero, although there are two dwords. Every message contains at least these two dwords, which is why the number is decremented by two.

The message dope has the same value as the message size dope, because you want to send the whole message area.
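The dope values in this example follow the layout given in section 2.2.2. A small helper makes the encoding explicit; make_dope is a hypothetical name, flexpages are left out as in section 2.2.2, and its dwords argument is the total number of dwords including the two mandatory ones.

```c
#include <stdint.h>

/* Encode a size or send dope (section 2.2.2). The direct part is stored
   as "dwords - 2" because every message carries at least two dwords, so
   dwords must be >= 2. The low byte (dummy) is left at zero. */
static uint32_t make_dope(uint32_t dwords, uint32_t strings, uint32_t dataspaces)
{
    return ((dwords - 2) << 8) | (strings << 16) | (dataspaces << 24);
}
```

With it, the size dope above is make_dope(2, 1, 0), i.e. 0x00010000, and a message of only two dwords gives 0.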

Another situation:

A client requests a read operation from a server. The server expects a read operation code and the number of bytes to read, and replies with an error code and an indirect string containing the requested bytes. The message could look like the following:

struct {
   TMessageDope message_size;
   TMessageDope message;
   int order;
   int number;
   TStringDope buffer;
} msg;

As you can see, the message contains an indirect string, although you only want to send an order code and another dword: you need to supply a buffer for the server's reply. You can use the same message area for both operations. You would fill in the message elements like this:

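   char buffer[MAX_BUFFER_LENGTH];
   msg.message_size.DWord   = ((2 - 2) << 8) + (1 << 16); /* room for 2 dwords and 1 indirect string */
   msg.message.DWord        = ((2 - 2) << 8);             /* send only the 2 dwords */
   msg.order                = order_code_read;
   msg.number               = bytes_to_read;              /* hypothetical: number of bytes requested */
   msg.buffer.StrBufAddress = buffer;
   msg.buffer.StrLength     = MAX_BUFFER_LENGTH;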

As you can see, the dopes aren't equal: the message area is bigger than the message to be sent. You send only two dwords, but the message area is big enough for the incoming reply, so one message area serves both the send operation and the receive operation.

Remark: For compatibility reasons L3 supports the old V2 IPC calls. It emulates V2 IPC with two special dwords, the so-called emulation dope and emulation token (see 2.2.4).

2.2 IPC Parameters

2.2.1 Parameters of IPC

send/receive timeout

0 <= n <= 2**30 : n ms timeout

n = -1 : never

destination/source id

thread id (V3 type!)

intr id

nil id ( 0 )

address of message to be sent

linear address of message to be sent (dword aligned!)

nil address ( -1 ), no message will be sent

address of receive buffer

linear address of a message buffer (dword aligned!) for receiving a message; may be identical to the address of the message to be sent; the IPC includes a receive operation

nil address ( -1 ), IPC does not include receive operation

closed option

closed ( 0 ) : if IPC includes receive operation, a message will be accepted solely from the specified dest

open ( 2 ) : if IPC includes receive operation, messages from anyone are accepted

received message dope

result code 
+ (dwords     << 8 )  
+ (strings    << 16) 
+ (flexpages  << 20)   
+ (dataspaces << 24)
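A sketch of decoding the received message dope in C. The field widths are an assumption derived from the shift amounts above (strings occupy bits 16 to 19 because flexpages start at bit 20), which matches the limit of 15 strings and 15 flexpages stated in 2.1; rcv_dope_t and decode_rcv_dope are hypothetical names.

```c
#include <stdint.h>

/* Fields of a received message dope: bits 0-7 result code, 8-15 dwords,
   16-19 strings, 20-23 flexpages, 24-31 dataspaces (assumed widths). */
typedef struct {
    uint32_t result;      /* result code, see 2.2.3 */
    uint32_t dwords;      /* dwords received, minus 2 */
    uint32_t strings;
    uint32_t flexpages;
    uint32_t dataspaces;
} rcv_dope_t;

static rcv_dope_t decode_rcv_dope(uint32_t dope)
{
    rcv_dope_t d;
    d.result     =  dope        & 0xFFu;
    d.dwords     = (dope >> 8)  & 0xFFu;
    d.strings    = (dope >> 16) & 0x0Fu;
    d.flexpages  = (dope >> 20) & 0x0Fu;
    d.dataspaces = (dope >> 24) & 0xFFu;
    return d;
}
```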

2.2.2 Message structure

message structure:

DWORD message size dope   
DWORD message send dope  
DWORD direct_part[max dwords+2] 
STRINGDOPE indirect_part_option[max strings]  
DATASPACE dataspace_part_option[max dataspaces]

message size dope =

(max dwords     << 8)  
+ (max strings    << 16) 
+ (max dataspaces << 24)

max dwords: 0 <= n < 256, size of direct part is (n+2)*4
max strings: 0 <= n < 256, length of indirect part is n*4*4
max dataspaces: 0 <= n < 256, length of dataspace part is n*4

msg snd dope =

dummy 
+ (dwords << 8) 
+ (strings << 16) 
+ (dataspaces << 24)

dummy: 0 <= n < 256 ignored
dwords: 0 <= n < 256, message to be sent has n+2 (!) dwords in direct part
strings: 0 <= n < 256, message to be sent has n strings in indirect part
dataspaces: 0 <= n < 256, message to be sent has n dataspaces in dataspace part
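The size rules above determine how many bytes a message structure occupies. The following sketch computes that total from a size dope, assuming 4-byte dwords and the part sizes listed above; msg_struct_bytes is a hypothetical helper, and flexpages, which this layout does not mention, are ignored.

```c
#include <stdint.h>

/* Total size in bytes of a message structure described by a size dope
   (max dwords bits 8-15, max strings 16-23, max dataspaces 24-31). */
static uint32_t msg_struct_bytes(uint32_t size_dope)
{
    uint32_t max_dwords     = (size_dope >> 8)  & 0xFFu;
    uint32_t max_strings    = (size_dope >> 16) & 0xFFu;
    uint32_t max_dataspaces = (size_dope >> 24) & 0xFFu;

    uint32_t dopes    = 2 * 4;                 /* size dope + send dope */
    uint32_t direct   = (max_dwords + 2) * 4;  /* always at least two dwords */
    uint32_t indirect = max_strings * 4 * 4;   /* one 4-dword StringDope each */
    uint32_t ds       = max_dataspaces * 4;    /* one dword per dataspace */

    return dopes + direct + indirect + ds;
}
```

For the terminal example in 2.1 (size dope 0x00010000) it returns 32 bytes: 8 for the two dopes, 8 for the direct part and 16 for one string dope.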

StringDope

DWORD string length  
DWORD string address 
DWORD string buffer size 
DWORD string buffer address

string length/string address: describe the string to be sent; ignored if no string is to be sent.
string buffer size/string buffer address: describe the buffer for receiving a string; ignored if no corresponding string is received.

relation between size and send dope

For size and send dope of the same message structure must hold:

send dwords     <= size max dwords   AND 
send strings    <= size max strings  AND 
send dataspaces <= size max dataspaces

message buffer for send and receive

If a message structure is used for both send and receive operations, the following must hold:

max(dwords,      expected dwords in reply)     <= max dwords AND 
max(strings,     expected strings in reply)    <= max strings AND 
max(dataspaces,  expected dataspaces in reply) <= max dataspaces
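Both conditions can be checked before issuing an IPC. This is a sketch under the dope layout of 2.2.2; dope_fits and its reply_* parameters (the counts expected in the reply, with dwords counted minus two as in the dope) are hypothetical.

```c
#include <stdint.h>

/* Extract the dwords, strings and dataspaces fields of a size or send dope. */
#define DOPE_DWORDS(d)     (((d) >> 8)  & 0xFFu)
#define DOPE_STRINGS(d)    (((d) >> 16) & 0xFFu)
#define DOPE_DATASPACES(d) (((d) >> 24) & 0xFFu)

/* Return 1 if the message area described by size_dope can hold both the
   message described by send_dope and the expected reply, else 0. */
static int dope_fits(uint32_t size_dope, uint32_t send_dope,
                     uint32_t reply_dwords, uint32_t reply_strings,
                     uint32_t reply_dataspaces)
{
    uint32_t snd_dw = DOPE_DWORDS(send_dope);
    uint32_t snd_st = DOPE_STRINGS(send_dope);
    uint32_t snd_ds = DOPE_DATASPACES(send_dope);

    uint32_t need_dw = snd_dw > reply_dwords     ? snd_dw : reply_dwords;
    uint32_t need_st = snd_st > reply_strings    ? snd_st : reply_strings;
    uint32_t need_ds = snd_ds > reply_dataspaces ? snd_ds : reply_dataspaces;

    return need_dw <= DOPE_DWORDS(size_dope)
        && need_st <= DOPE_STRINGS(size_dope)
        && need_ds <= DOPE_DATASPACES(size_dope);
}
```

For the read example in 2.1, dope_fits(0x00010000, 0, 0, 1, 0) returns 1; expecting two strings in the reply makes it return 0.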

receive only buffer

If a message structure is used only as a receive buffer in an IPC, the contents of the send dope are irrelevant.

resend a received message

Simply storing the received message dope (EAX) in the receive buffer's send dope, and the received dword 0 (EBX) and dword 1 (ECX) in the receive buffer's direct part (the dword part), yields a sendable message structure.
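In C, the resend rule might look like the sketch below. msg_buf_t and prepare_resend are hypothetical, the size of the direct part is arbitrary, and the register values are assumed to have been saved by whatever IPC wrapper is in use.

```c
#include <stdint.h>

/* Hypothetical message structure (section 2.2.2): the two dopes followed
   by the direct part; 8 dwords are reserved only for illustration. */
typedef struct {
    uint32_t size_dope;
    uint32_t send_dope;
    uint32_t direct[2 + 6];
} msg_buf_t;

/* Make a received message sendable again from the registers returned by
   the IPC: EAX (received dope), EBX (dword 0), ECX (dword 1). The result
   code in the low byte of EAX ends up in the send dope's dummy field,
   which is ignored. */
static void prepare_resend(msg_buf_t *m, uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    m->send_dope = eax;
    m->direct[0] = ebx;
    m->direct[1] = ecx;
}
```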

2.2.3 Results

received msg dope =

result code 
+ (dwords     <<  8) 
+ (strings    << 16) 
+ (dataspaces << 24)

result code

0b000   ok, direct 
0b001   ok, redirected from inner clan
0b010   ok, redirected from outer clan 
0x10    send timeout 
0x18    receive timeout 
0x20    dest not existent 
0x28    source not existent 
0x30    erroneous (e.g. send to myself) 
0x40    dwords overflow 
0x48    dataspaces overflow 
0x50    strings overflow 
0x58    string overflow  
0x60    dwords overflow at rcv 
0x68    dataspaces overflow at rcv 
0x70    strings overflow at rcv 
0x78    string overflow at rcv

dwords

0 <= n < 256 , n+2 (!) dwords in direct msg part are received; dword 0 and 1 are contained in EBX and ECX, not (yet) in the message buffer!

strings

0 <= n < 256 , n indirect strings are received

dataspaces

0 <= n < 256 , n dataspaces are received

2.2.4 V3 - V2 IPC

How to send a message to a thread, which uses the V2 IPC:

V2 IPC uses a message vector describing the message to be sent/received. This vector allows the individual parts of the message to be arranged in any order, so you can compose two messages with the same components but different structures, e.g.:

     V2                       V2                   V3

DWORD     svcode     DWORD     svcode        DWORD     svcode
dataspace ds         dataspace ds            DWORD     dummy
string    name       DWORD     dummy         string    name
DWORD     dummy      string    name          dataspace ds

This isn't possible with V3 IPC. There is only one order: first the direct part, followed by the strings, if any, and last the dataspaces, if any. If you want to send a message to a thread that uses V2 IPC and depends on the correct order of the message components, you have to add two integers to the direct part of the message. The first integer, "v2_emu_dope", describes the V2 structure of the message (starting on the right side with the first element), and the second is a kind of token for the V2 emulation.

            v2_emu_dope 
            0x12345678

The v2_emu_dope consists of the following elements:

  00       INT       (2 bits) 
1101       String    (4 bits) 
  10       DATASPACE (2 bits) 
  11       nil       (2 bits) 

 
 
For instance: TEXT,INT,INT,INT,DATASPACE : 
 
 
11 11 11 11 11 11 11 11 11 11 10 00 00 00 11 01
                              -- -- -- -- ----- 
                               |  |  |  |   +------- 1. elem, string 
                               |  |  |  +----------- 2. elem, INT 
                               |  |  +-------------- 3. elem, INT 
                               |  +----------------- 4. elem, INT 
                               +-------------------- 5. elem, DS

The direct part of the V3 message looks like the following:

int1, int2, int3, 0xfffff80d, 0x12345678
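The packing of the v2_emu_dope can be sketched in C. The element codes come from the table above; v2_elem_t and v2_emu_dope are hypothetical names, and the function rejects element lists that do not fit into 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { V2_INT, V2_STRING, V2_DATASPACE } v2_elem_t;

/* Pack V2 element codes from bit 0 upward, first element first:
   INT = 00, DATASPACE = 10 (2 bits), string = 1101 (4 bits); the
   remaining high bits are filled with nil (11). Returns 0 on success,
   -1 if the elements need more than 32 bits. */
static int v2_emu_dope(const v2_elem_t *elems, size_t n, uint32_t *out)
{
    uint32_t dope = 0;
    unsigned pos = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t code, width;
        switch (elems[i]) {
        case V2_STRING:    code = 0xDu; width = 4; break;
        case V2_DATASPACE: code = 0x2u; width = 2; break;
        default:           code = 0x0u; width = 2; break;   /* V2_INT */
        }
        if (pos + width > 32)
            return -1;
        dope |= code << pos;
        pos += width;
    }
    for (; pos < 32; pos += 2)      /* unused positions are nil (11) */
        dope |= 0x3u << pos;

    *out = dope;
    return 0;
}
```

For the example above (string, INT, INT, INT, DATASPACE) it yields 0xfffff80d.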

2.2.5 V2 versus V3 ID-s

how to construct a V3 thread id out of a V2 thread id:

v3.high = 0x00043000 ; 
 
v3.low  = ((v2.low  & 0x00003fff) << 12) + 
          ((v2.high & 0x00000fff)      ) + 
          ((v2.high & 0x0003f000) << 18)

2.3 IPC C-Library

This is a first attempt to write a C library for the L3 IPC system interface. It was done in a straightforward manner and needs some future optimizations to utilize the special features of L3 IPC, like short messages.

2.3.1 Parameters

This chapter describes briefly the parameters of the IPC calls of the C library. All IPC calls have at least a msg and a timeout parameter, some have an additional thread parameter.

msg

The parameter msg represents the message to be sent. It should be the linear address of a structure containing at least the message size dope, the message dope and the direct part (see 2.2.2).

timeout

The parameter timeout specifies how long the caller will wait if the partner isn't ready. The value is given in milliseconds and means:

0 <= timeout <= 2**30 : timeout ms timeout

timeout == -1 : never

thread

The parameter thread is the id of the sending/receiving thread. It can be:

thread id (V3 type!)

intr id

nil id ( 0 )

return value

The return value describes the result of the requested operation. If the IPC includes a receive operation and a message was received, the return value is the message dope of the received message. If the operation was successful, bits 2 to 7 are zero ((result & 0xfc) == 0); otherwise the lower eight bits contain the error code of the operation (see 2.2.3). If the message was redirected at a clan border, the lower two bits are not zero.

There are different categories of errors.

Thread specific errors

Thread does not exist (destination not existent/source not existent)

Thread not ready (send timeout/receive timeout)

Message buffer specific errors

Message buffer not big enough (miscellaneous overflows)
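A caller can sort a return value into these categories by its low byte, using the result codes of section 2.2.3. The enum and function names below are hypothetical; the "erroneous" code (0x30) and any code not listed in 2.2.3 fall into a catch-all category.

```c
#include <stdint.h>

typedef enum {
    IPC_RES_OK,        /* delivered, possibly redirected at a clan border */
    IPC_RES_THREAD,    /* destination or source not existent */
    IPC_RES_TIMEOUT,   /* send or receive timeout */
    IPC_RES_OVERFLOW,  /* message buffer not big enough */
    IPC_RES_OTHER      /* e.g. erroneous (send to myself) */
} ipc_result_class_t;

/* Classify an IPC return value by the result code in its low byte. */
static ipc_result_class_t ipc_classify_result(uint32_t ret)
{
    uint32_t code = ret & 0xFFu;

    if ((code & 0xFCu) == 0)          /* 0b000, 0b001, 0b010 */
        return IPC_RES_OK;
    switch (code) {
    case 0x10: case 0x18:
        return IPC_RES_TIMEOUT;
    case 0x20: case 0x28:
        return IPC_RES_THREAD;
    case 0x40: case 0x48: case 0x50: case 0x58:
    case 0x60: case 0x68: case 0x70: case 0x78:
        return IPC_RES_OVERFLOW;
    default:
        return IPC_RES_OTHER;
    }
}
```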

2.3.2 Call

int Call(ThreadT thread, void *msg, void *reply, int send_timeout, int receive_timeout);

This function implements the normal RPC system call. The client process sends a message to a server process with thread id thread and waits for a reply from the server.

Parameters:

msg: The parameter msg represents the message to be sent. It should be the address of a structure containing at least the message size dope, the message dope and the direct part (see 2.2.2).
reply: The parameter reply is the linear address of a message buffer (see 2.2.2) for the received message. It should be at least as big as the expected reply. Remark: You can use the same message buffer for msg and reply (see "message buffer for send and receive" in 2.2.2).
send_timeout: The parameter send_timeout specifies how long the client will wait if the server isn't ready (see 2.3.1).
receive_timeout: The parameter receive_timeout specifies how long the client will wait for the server's reply (see 2.3.1).
thread: The parameter thread is the id of the receiving thread (see 2.3.1).
return value: The return value contains the message dope of the received reply if the call was successful; otherwise the lower eight bits describe the error (see 2.2.3 and 2.3.1).

2.3.3 ReplyAndWait

int ReplyAndWait(ThreadT thread, void *reply, ThreadT *client, void *order, int send_timeout, int receive_timeout);

This function sends a reply to the client thread thread and waits for the next order from any client. It is designed for implementing a server, which typically sends a message to a specific thread and then waits for a message from any thread.

Parameters:

reply

The parameter reply represents the message to be sent. It should be the address of a structure containing at least the message size dope, the message dope and the direct part (see 2.2.2).

order

The parameter order is the linear address of a message buffer (see 2.2.2) for the received message. It should be at least as big as the expected order.

Remark: You can use the same message buffer for reply and order (see "message buffer for send and receive" in 2.2.2).

send_timeout, receive_timeout

The parameter send_timeout specifies how long the caller will wait if the client isn't ready to receive the reply; receive_timeout specifies how long it will wait for the next order (see 2.3.1).

thread

The parameter thread is the id of the client thread that receives the reply (see 2.3.1).

client

The parameter client is a pointer to the buffer that will contain the id of the thread that sent the next order.

return value

The return value contains the message dope of the received order if the call was successful; otherwise the lower eight bits describe the error (see 2.2.3 and 2.3.1).

2.3.4 Send

int Send(ThreadT thread, void *msg, int timeout);

This function sends the message msg to the thread thread. If the thread isn't ready to receive the message, the sender waits up to timeout milliseconds (see 2.3.1).

Parameters:

msg: The parameter msg represents the message to be sent. It should be the address of a structure containing at least the message size dope, the message dope and the direct part (see 2.2.2).
timeout: The parameter timeout specifies how long the caller will wait if the destination isn't ready (see 2.3.1).
thread: The parameter thread is the id of the receiving thread (see 2.3.1).
return value: The return value is zero if the call was successful; otherwise the lower eight bits describe the error (see 2.2.3 and 2.3.1).

2.3.5 Receive

int Receive(ThreadT thread, void *msg, int timeout);

This function waits for a message from thread thread. If no message arrives within timeout milliseconds (see 2.3.1), it returns.

Parameters:

msg: The parameter msg represents the message buffer for the incoming message. It should be the address of a structure containing at least the message size dope, the message dope and the direct part, and it should be at least as big as the expected message (see 2.2.2).
timeout: The parameter timeout specifies how long the caller will wait for a message (see 2.3.1).
thread: The parameter thread is the id of the sending thread (see 2.3.1).
return value: The return value contains the message dope of the received message if the call was successful; otherwise the lower eight bits describe the error (see 2.2.3 and 2.3.1).

2.3.6 Wait

int Wait(ThreadT *thread, void *msg, int timeout);

This function waits for a message from any thread. If no message arrives within timeout milliseconds (see 2.3.1), it returns.

Parameters:

msg: The parameter msg represents the message buffer for the incoming message. It should be the address of a structure containing at least the message size dope, the message dope and the direct part, and it should be at least as big as the expected message (see 2.2.2).
timeout: The parameter timeout specifies how long the caller will wait for a message (see 2.3.1).
thread: The parameter thread is a pointer to the buffer that will contain the id of the sending thread (see 2.3.1).
return value: The return value contains the message dope of the received message if the call was successful; otherwise the lower eight bits describe the error (see 2.2.3 and 2.3.1).

2.4 Kernel Interface

System Call: IPC

Invocation: INT 50h

Input

EAX address of receive buffer + closed option

EBX send timeout

ECX address of message to be sent

EDX destination id low

ESI destination id high

EDI receive timeout

Output:

EAX received msg dope

EBX received dword0

ECX received dword1

EDX source id low

ESI source id high

EDI undefined

EBP undefined

DS,ES linear space

FS,GS undefined

Remark: Undefined means that there is no guarantee for the contents of that register. If its contents have to be preserved, the register has to be saved before the IPC invocation.

Marion Schalm, Jean Wolter, Michael Hohmuth
26.12.1995 (unfinished)