DICE_NO_REPLY error
Michael Scheibel
m.scheibel at sirrix.com
Mon Jul 24 19:52:43 CEST 2006
On Monday, 24 July 2006 13:05, Ronald Aigner wrote:
> Michael Scheibel wrote on 24.07.2006 12:52 this:
> >> Michael Scheibel wrote on 21.07.2006 17:20 this:
> >>> Running the generated stubs we noticed that ~0.89% of the IPC
> >>> transactions fail. In such a case the parameters passed to the server
> >>> still contain their initialization values on return. The client trace
> >>> says "ipc error c0" and the server trace says "DICE_NO_REPLY".
> >>
> >> IPC error code 0xc0 is "IPC: Receive operation aborted". This usually
> >> only occurs if the thread's state is changed (using the l4_exregs system
> >> call). My first guess is that either the server or the client is an L4Linux
> >> application and L4Linux is changing the thread state. That would
> >> explain the "ipc error c0". Because the IPC is aborted, the parameters
> >> are not overwritten and thus still contain their initial values.
> >
> > You're perfectly right, both client and server are L4Linux applications.
> > However, I still don't understand what the l4_exregs syscall does and how
> > it affects the state of the server and/or client thread. Second, I wonder
> > if there is an option to avoid the resulting IPC aborts. The only
> > solution I can think of is manual retransmission in case of a failure
> > (which is not really a solution but rather a work-around).
>
> For a full description of the functionality of the l4_exregs syscall,
> please refer to the L4 version 2 manual [1].
>
> The interesting effects for this scenario are: Either the client or the server
> thread is interrupted in its execution, because its execution
> environment -- L4Linux -- decided to suspend these threads (or one of
> them). This is fully valid, because L4Linux is the scheduler of these
> threads. After L4Linux resumes the threads they get an error code
> delivered informing them about the aborted IPC operation.
>
> Thus you really want that IPC error code to get reflected, because
> that's the only chance to determine if the IPC really got through and
> the results are valid or if the IPC was aborted and you rather should
> not use the results. An application can then handle the error code any
> way it sees fit. It could ignore the error and simply continue
> execution; it could simply retransmit the request; it could run a more
> complicated recovery protocol. So you don't want to lose that error
> code somewhere in the kernel or generated stub.
We've implemented a retransmission mechanism with exponential backoff
between attempts, but the client still regularly gets stuck in the
transmission loop:
---snip---
CORBA_Environment env = dice_default_environment;
UInt32 t = 1;

TRANSMISSION:
    CORBA_exception_free(&env);
    (void)do_IPC(&serverID, &buf, &bufLen, &buf2, &buf2Len, &env);
    if (!DICE_IS_NO_EXCEPTION(&env)) {
        l4_sleep(t *= 2);    // exponential backoff (ms)
        goto TRANSMISSION;   // yehaw
    }
---snip---
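For comparison, a bounded variant of the loop above could look roughly like
the sketch below. It is only an illustration under assumptions: ipc_call_fn
and sleep_fn are hypothetical stand-ins for the generated stub call and
l4_sleep(), not DICE or L4 signatures, and flaky_call/record_sleep exist
only so the sketch runs without L4. It caps both the number of attempts and
the delay, so a persistent failure is reported to the caller instead of
being retried forever with an ever-growing sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types: stand-ins for the generated stub call and
 * for l4_sleep(); neither signature is taken from the DICE/L4 headers. */
typedef int  (*ipc_call_fn)(void *ctx);   /* returns 0 on success */
typedef void (*sleep_fn)(unsigned ms);

/* Retry `call` with capped exponential backoff.
 * Returns the number of attempts used on success, or -1 if all
 * `max_attempts` attempts failed. */
static int call_with_backoff(ipc_call_fn call, void *ctx, sleep_fn do_sleep,
                             unsigned max_attempts, unsigned first_delay_ms,
                             unsigned max_delay_ms)
{
    assert(call != NULL && do_sleep != NULL);

    unsigned delay = first_delay_ms;
    if (delay > max_delay_ms)
        delay = max_delay_ms;

    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (call(ctx) == 0)
            return (int)attempt;
        if (attempt == max_attempts)
            break;                /* no sleep after the final attempt */
        do_sleep(delay);
        /* Double the delay, but never past the cap (also avoids overflow). */
        delay = (delay > max_delay_ms / 2) ? max_delay_ms : delay * 2;
    }
    return -1;
}

/* Demo stand-ins (assumptions for illustration only): the call fails
 * until its counter reaches zero; the sleep just records the total. */
static unsigned total_slept_ms;

static int flaky_call(void *ctx)
{
    unsigned *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}

static void record_sleep(unsigned ms)
{
    total_slept_ms += ms;
}
```

With the real stub, the callback would wrap the do_IPC() call plus the
CORBA_exception_free()/DICE_IS_NO_EXCEPTION() check, and a -1 return would
be the point to log the failure rather than loop again.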
Is there anything we have overlooked?
Michael
--
Sirrix AG security technologies - http://www.sirrix.com
Michael Scheibel eMail: m.scheibel at sirrix.com
Tel +49(234) 610 071-124
Public key on demand.
Fingerprint 009B 9963 7B28 4356 CA43 5BFD 17A4 AE0F 6943 4B54
This message may contain confidential and/or privileged information.
If you are not the addressee, you must not use, copy, disclose or
take any action based on this message or any information herein.
If you have received this message in error, please advise the sender
immediately by reply e-mail and delete this message.