DICE_NO_REPLY error

Michael Scheibel

21 Jul 2006

5:20 p.m.

Hello list,

we have compiled the following interface with DICE 2.2.9:

    UInt32 genBuf(
        [out, size_is(*bufLen), max_is(MAX_BUFFER_SIZE)] UInt8** buf,
        [out] UInt32* bufLen,
        [out, size_is(*buf2Len), max_is(MAX_BUFFER_SIZE)] UInt8** buf2,
        [out] UInt32* buf2Len);

UInt32 is defined as unsigned long, UInt8 as unsigned char, and MAX_BUFFER_SIZE is 4096.

The DICE options are:

    -ftrace-server=printf -ftrace-client=printf -fforce-corba-alloc -P-DRAM_BASE=0x0 -P-DUSE_DIETLIBC=y -P-DSYSTEM_x86_l4v2 -P-DARCH_x86 -P-DCPUTYPE_ -P-DL4API_l4v2 -template -Bpia32 -Biv2 -BmC

Running the generated stubs, we noticed that ~0.89% of the IPC transactions fail. In such a case the parameters passed to the server still contain their initialization values on return. The client trace says "ipc error c0" and the server trace says "DICE_NO_REPLY".

We would appreciate any hints about what is going wrong here.

Michael

-- 
Sirrix AG security technologies - http://www.sirrix.com
Michael Scheibel
eMail: m.scheibel@sirrix.com
Tel +49(234) 610 071-124
Public key on demand. Fingerprint 009B 9963 7B28 4356 CA43 5BFD 17A4 AE0F 6943 4B54

This message may contain confidential and/or privileged information. If you are not the addressee, you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message.

Ronald Aigner

23 Jul 2006

11:02 a.m.

Hello Michael,

Michael Scheibel wrote on 21.07.2006 17:20:

> Running the generated stubs we noticed that ~0.89% of the IPC transactions fail. In such a case the parameters passed to the server still contain their initialization values on return. The client trace says "ipc error c0" and the server trace says "DICE_NO_REPLY".

IPC error code 0xc0 is "IPC: Receive operation aborted". This usually only occurs if the thread's state is changed (using the l4_exregs system call). My first guess is that either the server or the client is an L4Linux application and L4Linux is changing the thread state. That would explain the "ipc error c0". Because the IPC is aborted, the parameters are not overwritten and thus still contain their initial values.

HTH, Ron.

-- 
Mit freundlichen Gruessen / with regards
ra3 @ inf.tu-dresden.de
http://os.inf.tu-dresden.de/~ra3/
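As an illustration of the last point, here is a minimal, self-contained C sketch of why output parameters keep their initial values after an aborted receive. It does not use the DICE or L4 headers: `mock_gen_buf`, `outputs_valid`, `IPC_OK` and `IPC_RECEIVE_ABORTED` are hypothetical stand-ins, and only the value 0xc0 is taken from the trace quoted above.

```c
#include <string.h>

/* Hypothetical stand-ins, not the DICE or L4 API. Only the value 0xc0
 * ("receive operation aborted") comes from the thread. */
#define IPC_OK              0x00u
#define IPC_RECEIVE_ABORTED 0xc0u

/* Mock of a generated client stub: on success it fills the out parameters,
 * on an aborted receive it returns before touching them. */
unsigned mock_gen_buf(int abort_receive,
                      unsigned char *buf, unsigned long *buf_len)
{
    static const unsigned char reply[] = { 0xde, 0xad, 0xbe, 0xef };

    if (abort_receive)
        return IPC_RECEIVE_ABORTED;   /* out parameters stay untouched */

    memcpy(buf, reply, sizeof reply);
    *buf_len = sizeof reply;
    return IPC_OK;
}

/* Returns 1 if the outputs may be used, 0 if the call did not complete. */
int outputs_valid(unsigned ipc_error)
{
    return ipc_error == IPC_OK;
}
```

A caller would test the returned code with `outputs_valid` before reading `buf` or `buf_len`; after an abort both still hold whatever the caller initialized them to.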

Michael Scheibel

24 Jul 2006

12:52 p.m.

On Sunday, 23 July 2006 11:02, Ronald Aigner wrote:

> IPC error code 0xc0 is "IPC: Receive operation aborted". This usually only occurs if the thread's state is changed (using the l4_exregs system call). My first guess is that either server or client are L4Linux applications and L4Linux is changing the thread state. That would explain the "ipc error c0". Because the IPC is aborted, the parameters are not overwritten and thus still contain their initial values.

You're perfectly right, both client and server are L4Linux applications. However, I still don't understand what the l4_exregs syscall does and how it affects the state of the server and/or client thread.

Second, I wonder if there is an option to avoid the resulting IPC aborts. The only solution I can think of is manual retransmission in case of a failure (which is not really a solution but rather a work-around).

Michael

Ronald Aigner

1:05 p.m.

Michael Scheibel wrote on 24.07.2006 12:52:

> You're perfectly right, both client and server are L4Linux applications. However, I still don't understand what the l4_exregs syscall does and how it affects the state of the server and/or client thread. Second, I wonder if there is an option to avoid the resulting IPC aborts. The only solution I can think of is manual retransmission in case of a failure (which is not really a solution but rather a work-around).

For a full description of the functionality of the l4_exregs syscall, please refer to the L4 version 2 manual [1].

The interesting effects for this scenario are: either the client or the server thread is interrupted in its execution, because its execution environment (L4Linux) decided to suspend these threads (or one of them). This is fully valid, because L4Linux is the scheduler of these threads. After L4Linux resumes the threads, they get an error code delivered informing them about the aborted IPC operation.

Thus you really want that IPC error code to be reflected, because that is the only way to determine whether the IPC really got through and the results are valid, or whether the IPC was aborted and you should not use the results. An application can then handle the error code any way it sees fit. It could ignore the error and simply continue execution; it could simply retransmit the request; it could run a more complicated recovery protocol. So you don't want to lose that error code somewhere in the kernel or generated stub.

Thus retransmission is not the only solution in case of failure, and surely not a work-around.

HTH, Ron.
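The three options named above (ignore, retransmit, recover) can be sketched as an explicit application-level policy. This is only an illustration in plain C: the enums and the `decide` function are hypothetical names and are not part of DICE, L4 or L4Linux.

```c
/* Hypothetical names for illustration; none of this is DICE or L4 API. */
enum ipc_outcome  { IPC_DELIVERED, IPC_ABORTED };
enum abort_policy { POLICY_IGNORE, POLICY_RETRANSMIT, POLICY_RECOVER };
enum next_step    { USE_RESULTS, CONTINUE_WITHOUT_RESULTS,
                    SEND_AGAIN, RUN_RECOVERY };

/* Maps the reported outcome and the application's chosen policy to the
 * next action. Results are only used when the IPC was delivered. */
enum next_step decide(enum ipc_outcome outcome, enum abort_policy policy)
{
    if (outcome == IPC_DELIVERED)
        return USE_RESULTS;

    switch (policy) {
    case POLICY_IGNORE:     return CONTINUE_WITHOUT_RESULTS;
    case POLICY_RETRANSMIT: return SEND_AGAIN;
    case POLICY_RECOVER:    return RUN_RECOVERY;
    }
    return RUN_RECOVERY;    /* conservative fallback for unknown policies */
}
```

The point of the sketch is that the decision needs the outcome as an input; if the error code were swallowed by the kernel or the stub, the application could not tell the two cases apart.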

Ronald Aigner

1:10 p.m.

Sorry, forgot the reference:

Ronald Aigner wrote on 24.07.2006 13:05:

> For a full description of the functionality of the l4_exregs syscall, please refer to the L4 version 2 manual [1].

[1] http://l4hq.org/docs/manuals/Ln-86-21.pdf

_______________________________________________
l4-hackers mailing list
l4-hackers@os.inf.tu-dresden.de
http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers

Michael Scheibel

7:52 p.m.

On Monday, 24 July 2006 13:05, Ronald Aigner wrote:

> Thus you really want that IPC error code to get reflected, because that's the only chance to determine if the IPC really got through and the results are valid or if the IPC was aborted and you rather should not use the results. An application can then handle the error code any way it sees fit. It could ignore the error and simply continue execution; it could simply retransmit the request; it could run a more complicated recovery protocol. So you don't want to lose that error code somewhere in the kernel or generated stub.

We've implemented a retransmission mechanism with exponential backoff between retransmissions, but the client still regularly gets stuck in the transmission loop:

---snip---
CORBA_Environment env = dice_default_environment;
UInt32 t = 1;

TRANSMISSION:
    CORBA_exception_free(&env);
    (void)do_IPC(&serverID, &buf, &bufLen, &buf2, &buf2Len, &env);
    if (!DICE_IS_NO_EXCEPTION(&env)) {
        l4_sleep(t *= 2);   // exponential backoff (ms)
        goto TRANSMISSION;  // yehaw
    }
---snip---

Is there anything we have overlooked?

Michael
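The loop above retries without an upper bound on either the number of attempts or the delay. As a sketch only, and not a fix for the aborts themselves, a bounded variant in plain C could look like the following. All names are hypothetical: `send_fn` and `sleep_fn` stand in for the generated stub call and for `l4_sleep`, and `flaky_send` and `no_sleep` exist only so the loop can be tried without any IPC.

```c
#include <stdint.h>

/* Hypothetical callback types: send_fn returns 0 on success and non-zero on
 * an IPC error; sleep_fn waits for the given number of milliseconds. */
typedef int  (*send_fn)(void *ctx);
typedef void (*sleep_fn)(uint32_t ms);

/* Calls send() up to max_attempts times, doubling the delay after each
 * failure and capping it at max_delay_ms. Returns the number of attempts
 * used on success, or 0 if every attempt failed. */
unsigned retry_with_backoff(send_fn send, void *ctx, sleep_fn sleep_ms,
                            unsigned max_attempts,
                            uint32_t first_delay_ms, uint32_t max_delay_ms)
{
    uint32_t delay = first_delay_ms;

    if (delay > max_delay_ms)
        delay = max_delay_ms;

    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (send(ctx) == 0)
            return attempt;
        if (attempt == max_attempts)
            break;                  /* no sleep after the final failure */
        sleep_ms(delay);
        /* Double the delay without overflow and without passing the cap. */
        delay = (delay > max_delay_ms / 2) ? max_delay_ms : delay * 2;
    }
    return 0;
}

/* Example callbacks for trying the loop out without any IPC. */
struct flaky { unsigned failures_left; };

int flaky_send(void *ctx)
{
    struct flaky *f = ctx;
    if (f->failures_left == 0)
        return 0;
    f->failures_left--;
    return -1;
}

uint32_t total_slept_ms;            /* sum of all delays requested so far */

void no_sleep(uint32_t ms)
{
    total_slept_ms += ms;
}
```

Returning 0 after the last attempt hands the failure back to the caller instead of looping forever, which keeps the error visible to the application.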

Ronald Aigner

9:55 p.m.

Hello,

Michael Scheibel wrote on 24.07.2006 19:52:

> Is there anything we have overlooked?

While discussing this issue with the L4Linux maintainer, he mentioned a rare corner case: it had something to do with page faults during IPC. Maybe he can elaborate on that?

Greetings, Ron.

Michael Scheibel

25 Jul 2006

5:53 p.m.

On Monday, 24 July 2006 21:55, Ronald Aigner wrote:

> While discussing this issue with the L4Linux maintainer, he mentioned a rare corner-case: it had something to do with pagefaults during IPC. Maybe he can elaborate on that?!

We have reduced the maximum amount of data being transmitted in an IPC call to be less than 4K and haven't noticed any more IPC aborts so far. Is this a known restriction of IPC between L4Linux tasks?

Michael
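The message above does not say how the transfer size was reduced. One possible approach, shown purely as an assumption, is to split a larger payload into several calls that each stay below 4 KiB. The sketch reads "less than 4K" as at most 4095 bytes per call; `CHUNK_MAX`, `chunk_count` and `chunk_len` are hypothetical names.

```c
#include <stddef.h>

/* Assumption: "less than 4K" is read as at most 4095 bytes per call. */
#define CHUNK_MAX 4095u

/* Number of calls needed to move total bytes in pieces of at most
 * CHUNK_MAX bytes. */
size_t chunk_count(size_t total)
{
    return total / CHUNK_MAX + (total % CHUNK_MAX != 0);
}

/* Length of chunk number index (0-based), or 0 if index is out of range. */
size_t chunk_len(size_t total, size_t index)
{
    if (index >= chunk_count(total))
        return 0;

    size_t offset = index * CHUNK_MAX;   /* safe: index is in range */
    size_t rest = total - offset;
    return rest < CHUNK_MAX ? rest : CHUNK_MAX;
}
```

A sender would loop over `chunk_count(total)` indices and transmit `chunk_len(total, i)` bytes starting at offset `i * CHUNK_MAX`, checking the IPC error code after every call.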

Adam Lackorzynski

26 Jul 2006

1:28 p.m.

On Tue Jul 25, 2006 at 17:53:05 +0200, Michael Scheibel wrote:

> We have reduced the maximum amount of data being transmitted in an IPC call to be less than 4K and haven't noticed any more IPC aborts so far. Is this a known restriction of IPC between L4Linux tasks?

It's a bug, we're working on fixing this.

Adam

-- 
Adam                 adam@os.inf.tu-dresden.de
  Lackorzynski         http://os.inf.tu-dresden.de/~adam/

Adam Lackorzynski

27 Jul 2006

7:52 p.m.

On Wed Jul 26, 2006 at 13:28:47 +0200, Adam Lackorzynski wrote:

> It's a bug, we're working on fixing this.

The bug should be fixed now, please update Fiasco and L4Linux from remote CVS.

Adam
