Hello,
I am encountering strange problems with the semaphore library from l4env. I use a vanilla revision 230 on an Intel PIII architecture. After some debugging and tracing, I get the following scenario:
1. Thread 03 of my process calls the timed semaphore down, which then sends a BLOCKTIMED IPC to the semaphore thread 02. The semaphore structure is then: counter = -1 pending = 0 queue = 03
2. Thread 08 calls the semaphore_up inline assembler code. Before the IPC call, the semaphore structure is as follows: counter = 0 pending = 0 queue = 03
3. Thread 03 gains the CPU after the timeout and sends the RELEASETIMED IPC: counter = 1 pending = 0 queue = empty
4. Thread 02 doesn't find any thread in the queue, so pending is set to 1: counter = 1 pending = 1 queue = empty
5. Thread 03 calls semaphore_down. No IPC is sent: counter = 0 pending = 1 queue = empty
6. Thread 03 calls semaphore_down again and the BLOCK IPC is sent. The IPC wakes 03 immediately, with a counter value of -1 and nobody in the queue: counter = -1 pending = 0 queue = empty
It seems to me that this state is not normal. Do you agree? Is this scenario possible, given that I clearly observe states 4, 5 and 6? What is pending for?
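The six steps above can be replayed with a tiny single-threaded model. This is only a sketch under assumptions: the struct, the function names and the "queued" counter are hypothetical stand-ins for illustration, not the l4env semaphore data structures or API, and the real code runs these steps in different threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the traced state; not the l4env l4semaphore_t. */
struct sem_model {
    int counter;  /* counter modified by the application threads */
    int pending;  /* wakeups that arrived while the queue was empty */
    int queued;   /* number of threads in the wait queue */
};

/* Application side of down: true if the caller must send a BLOCK IPC. */
static bool model_down(struct sem_model *s) { return --s->counter < 0; }

/* Application side of up: true if the caller must send a WAKEUP IPC. */
static bool model_up(struct sem_model *s) { return s->counter++ < 0; }

/* Semaphore thread, BLOCK request: true if the caller is replied to at once. */
static bool model_block(struct sem_model *s)
{
    if (s->pending > 0) { s->pending--; return true; }
    s->queued++;
    return false;
}

/* Semaphore thread, WAKEUP request: dequeue a waiter or remember the wakeup. */
static void model_wakeup(struct sem_model *s)
{
    if (s->queued > 0) s->queued--; else s->pending++;
}

/* Timeout path as traced: counter is incremented, RELEASETIMED dequeues. */
static void model_timeout_release(struct sem_model *s)
{
    s->counter++;
    if (s->queued > 0) s->queued--;
}

/* Replays steps 1 to 6; returns true if the last BLOCK returned immediately. */
static bool replay_trace(struct sem_model *s)
{
    bool must_block = model_down(s);   /* step 1: counter = -1 */
    assert(must_block);
    model_block(s);                    /*         queue = 03 */
    bool must_wake = model_up(s);      /* step 2: counter = 0, IPC not sent yet */
    model_timeout_release(s);          /* step 3: counter = 1, queue empty */
    if (must_wake) model_wakeup(s);    /* step 4: pending = 1 */
    must_block = model_down(s);        /* step 5: counter = 0, no IPC */
    assert(!must_block);
    must_block = model_down(s);        /* step 6: counter = -1, BLOCK IPC */
    assert(must_block);
    return model_block(s);             /* pending consumed, immediate reply */
}
```

Starting from counter = 0, pending = 0 and an empty queue, the model ends in the state reported in step 6: the second down returns although no producer has posted in between.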
Thanks in advance for your answers. Marc
Hi,
Are you just having a problem understanding the observed values or is there a real problem that you are running into? If the latter, could you provide a small example program so that we can reproduce it here?
Regards, Bjoern
from Bjoern Doebel :
Are you just having a problem understanding the observed values or is there a real problem that you are running into? If the latter, could you provide a small example program so that we can reproduce it here?
That's a real problem. I use several producers (08 and others) and one consumer (03) around a queue. I observed states 4, 5 and 6. This is annoying because semaphore_down returns while my queue is empty. At the beginning I looked for a memory overflow in my code... But then I found the scenario above, which could explain this strange state.
I tried a small change in archindep.h and the problem doesn't appear anymore. However, I find it very dirty and incomplete, because asm.h and the generic version would also have to be changed. As I don't know the semaphore code very well, maybe it breaks some cases?
Marc
--- include/archindep.h (revision 230)
+++ include/archindep.h (working copy)
@@ -54,7 +54,7 @@
 L4_INLINE int
 l4semaphore_down_timed(l4semaphore_t * sem, unsigned timeout)
 {
-  int old,tmp,ret;
+  int old,pending,tmp,ret;
   l4_umword_t dummy;
   l4_msgdope_t result;

@@ -66,6 +66,7 @@
   do
     {
       old = sem->counter;
+      pending = sem->pending;
       tmp = old - 1;
     }
   /* retry if someone else also modified the counter */
@@ -73,7 +74,7 @@
                              (l4_uint32_t)old, (l4_uint32_t)tmp));

-  if (tmp < 0)
+  if (pending || (tmp < 0))
     {
       /* we did not get the semaphore, block */
       ret = l4_ipc_call(l4semaphore_thread_l4_id,
Hi Marc,
the pending counter must be used only by the semaphore thread (the one you named thread 02) and never by the other application threads (which call sem_up and sem_down in archindep.h).
Please look at my last mail, because your program seems to call semaphore_down (by thread 03) twice without any semaphore_up in between (by other threads or by thread 03), which can be a cause of the strange behavior you observed.
Regarding the pending variable, it is only used by the semaphore thread and is never evaluated or manipulated by the application threads directly. So your proposed changes are incorrect.
Below is the description of what the semaphore thread and the pending bit are for (extracted from l4/pkg/semaphore/lib/src/semaphore.c):
"This (semaphore) thread is used by other threads to block if they did not get the semaphore. Threads call the semaphore thread, it enqueues the caller to the semaphore's wait queue (using the wait queue element provided in the message buffer by the caller thread). The thread owning the semaphore calls the semaphore thread to wakeup other threads waiting on that semaphore. The semaphore thread then sends the reply message to the waiting thread.
A separate thread is used to serialize the accesses to the wait queues of the semaphores, thus no additional synchronization is necessary.
There is one tricky situation if the wakeup message for a semaphore is received before the block message. This might happen if the thread waiting on that semaphore is preempted by the thread releasing the semaphore before it can send the message to the semaphore thread (see l4semaphore_down()).
In this case the semaphore thread receives a wakeup message but no thread is enqueued in the wait queue. In such situations, the wakeup message is saved in l4semaphore::pending and the next block message is replied immediately."
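The behavior described in this comment can be sketched as a small model of the semaphore thread's two request handlers. It is a minimal illustration, assuming a fixed-size FIFO of thread ids; the names wait_queue_model, handle_block, handle_wakeup, MAX_WAITERS and NO_THREAD are hypothetical and do not come from semaphore.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8     /* hypothetical capacity of the model queue */
#define NO_THREAD   (-1)  /* hypothetical marker: nobody was woken */

/* Hypothetical model of one semaphore as seen by the semaphore thread. */
struct wait_queue_model {
    int    waiters[MAX_WAITERS];  /* FIFO of blocked thread ids */
    size_t head, count;
    int    pending;               /* wakeups received with an empty queue */
};

/* BLOCK request: returns 1 if the caller is replied to immediately
 * (a saved wakeup is consumed), 0 if the caller is enqueued. */
static int handle_block(struct wait_queue_model *q, int thread_id)
{
    if (q->pending > 0) {
        q->pending--;
        return 1;
    }
    assert(q->count < MAX_WAITERS);
    q->waiters[(q->head + q->count) % MAX_WAITERS] = thread_id;
    q->count++;
    return 0;
}

/* WAKEUP request: returns the id of the woken thread, or NO_THREAD if the
 * queue was empty and the wakeup was saved in pending. */
static int handle_wakeup(struct wait_queue_model *q)
{
    if (q->count == 0) {
        q->pending++;
        return NO_THREAD;
    }
    int id = q->waiters[q->head];
    q->head = (q->head + 1) % MAX_WAITERS;
    q->count--;
    return id;
}
```

Because only one thread runs these handlers, the model needs no locking, which mirrors the serialization argument in the comment. A wakeup that arrives first is stored, and the next block request returns at once.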
Bye,
Alex.
l4-hackers mailing list l4-hackers@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
Alexander Boettcher boettcher@os.inf.tu-dresden.de:
There is one tricky situation if the wakeup message for a semaphore is received before the block message. This might happen if the thread waiting on that semaphore is preempted by the thread releasing the semaphore before it can send the message to the semaphore thread (see l4semaphore_down()).
I think what happens in my case is the mirror image of that tricky situation: the wakeup message is received just after a RELEASETIMED message.
I agree the correction I proposed is awful :).
Marc
Hi,
some of the state transitions you describe can't occur, at least between state 1 and state 3; see below:
Marc CHALAND wrote:
Hello,
I am encountering strange problems with the semaphore library from l4env. I use a vanilla revision 230 on an Intel PIII architecture. After some debugging and tracing, I get the following scenario:
1. Thread 03 of my process calls the timed semaphore down, which then sends a BLOCKTIMED IPC to the semaphore thread 02. The semaphore structure is then: counter = -1 pending = 0 queue = 03
2. Thread 08 calls the semaphore_up inline assembler code. Before the IPC call, the semaphore structure is as follows: counter = 0 pending = 0 queue = 03
2.1 Now thread 02 (the semaphore thread) has to run, because it is the only one that removes threads from the queue. Otherwise step 3 below isn't possible (the queue is empty, so thread 03 must have been removed from the queue by thread 02).
3. Thread 03 gains the CPU after the timeout and sends the RELEASETIMED IPC: counter = 1 pending = 0 queue = empty
Thread 03 increments the counter only if it really gets a timeout. However, the queue is already empty, which means that thread 02 ran between steps 2 and 3 (2.1). Therefore thread 03 should get a wakeup IPC from thread 02 instead of a timeout.
4. Thread 02 doesn't find any thread in the queue, so pending is set to 1: counter = 1 pending = 1 queue = empty
5. Thread 03 calls semaphore_down. No IPC is sent: counter = 0 pending = 1 queue = empty
6. Thread 03 calls semaphore_down again and the BLOCK IPC is sent. The IPC wakes 03 immediately, with a counter value of -1 and nobody in the queue: counter = -1 pending = 0 queue = empty
It seems to me that this state is not normal. Do you agree? Is this scenario possible, given that I clearly observe states 4, 5 and 6? What is pending for?
In states 5 and 6 the same thread (thread 03) calls semaphore_down twice without any semaphore_up in between by other threads. Why does thread 03 acquire the semaphore again when it already holds it? This will cause strange behavior of your application/semaphore or will cause deadlocks.
Regards,
Alex.
Alexander Boettcher boettcher@os.inf.tu-dresden.de:
Marc CHALAND wrote:
2. Thread 08 calls the semaphore_up inline assembler code. Before the IPC call, the semaphore structure is as follows: counter = 0 pending = 0 queue = 03
2.1 Now thread 02 (the semaphore thread) has to run, because it is the only one that removes threads from the queue. Otherwise step 3 below isn't possible (the queue is empty, so thread 03 must have been removed from the queue by thread 02).
Is there a mechanism that prevents thread 03 from running before the IPC call?
3. Thread 03 gains the CPU after the timeout and sends the RELEASETIMED IPC: counter = 1 pending = 0 queue = empty
Thread 03 increments the counter only if it really gets a timeout. However, the queue is already empty, which means that thread 02 ran between steps 2 and 3 (2.1). Therefore thread 03 should get a wakeup IPC from thread 02 instead of a timeout.
In the code, it seems that after the IPC timeout the counter is incremented immediately, isn't it? Afterwards, 03 sends the RELEASETIMED IPC and thread 02 removes the thread from the queue.
4. Thread 02 doesn't find any thread in the queue, so pending is set to 1: counter = 1 pending = 1 queue = empty
This matches what you said in 2.1. In fact, I think 08 is interrupted in such a way that when its IPC is sent, there is nothing in the queue, so pending is set to 1.
In states 5 and 6 the same thread (thread 03) calls semaphore_down twice without any semaphore_up in between by other threads.
Yes, 03 is a consumer, not a producer. I don't use the semaphore for a critical section but for a producer/consumer mechanism. 03 only does a down_timed before a pop from the queue, and the others do an up after a push into the queue.
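The usage pattern described here (up after a push, timed down before a pop) might look roughly like the following. This is a single-threaded sketch under assumptions: sem_stub, stub_up and stub_down_timed are hypothetical stand-ins that only mimic the return convention of a timed down (0 on success, nonzero on timeout); they are not the l4env calls, and the queue type is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 4  /* hypothetical queue capacity */

/* Hypothetical ring buffer shared by producers and the consumer. */
struct item_queue { int items[QUEUE_CAP]; size_t head, count; };

/* Hypothetical single-threaded stand-in for a counting semaphore. */
struct sem_stub { int count; };

static void stub_up(struct sem_stub *s) { s->count++; }

/* Returns 0 on success, nonzero where a real timed down would time out. */
static int stub_down_timed(struct sem_stub *s)
{
    if (s->count <= 0) return 1;
    s->count--;
    return 0;
}

/* Producer: push first, then signal the consumer. */
static int produce(struct item_queue *q, struct sem_stub *s, int value)
{
    if (q->count == QUEUE_CAP) return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = value;
    q->count++;
    stub_up(s);
    return 0;
}

/* Consumer: pop only if the timed down reported success. */
static int consume(struct item_queue *q, struct sem_stub *s, int *out)
{
    if (stub_down_timed(s) != 0)
        return -1;            /* timeout: leave the queue alone */
    assert(q->count > 0);     /* invariant that was violated in the report */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

The assertion in consume states the invariant the pattern relies on: every successful down corresponds to exactly one earlier push. The reported symptom is a successful down for which that invariant does not hold.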
Why does thread 03 acquire the semaphore again when it already holds it?
The first time, counter is > 0, so no IPC is sent. The second time, pending is also > 0, so the thread is woken immediately even though counter is < 0.
This will cause strange behavior of your application/semaphore or will cause deadlocks.
It causes semaphore_down to return without an error code while my queue is empty, which is annoying.
Marc
Hi,
can you please provide us with a (stripped-down) example of your consumer/producer problem? Currently I don't see why or when you are using/mixing down_timed and down in the consumer thread (do you check the result of down_timed?).
Alex.
Marc CHALAND wrote:
Alexander Boettcher boettcher@os.inf.tu-dresden.de:
Marc CHALAND wrote:
2. Thread 08 calls the semaphore_up inline assembler code. Before the IPC call, the semaphore structure is as follows: counter = 0 pending = 0 queue = 03
2.1 Now thread 02 (the semaphore thread) has to run, because it is the only one that removes threads from the queue. Otherwise step 3 below isn't possible (the queue is empty, so thread 03 must have been removed from the queue by thread 02).
Is there a mechanism that prevents thread 03 from running before the IPC call?
Thread 03 can be preempted by the kernel and another thread can be scheduled (between decrementing the counter and sending the IPC).
3. Thread 03 gains the CPU after the timeout and sends the RELEASETIMED IPC: counter = 1 pending = 0 queue = empty
Thread 03 increments the counter only if it really gets a timeout. However, the queue is already empty, which means that thread 02 ran between steps 2 and 3 (2.1). Therefore thread 03 should get a wakeup IPC from thread 02 instead of a timeout.
In the code, it seems that after the IPC timeout the counter is incremented immediately, isn't it?
After the IPC call has finished (for whatever reason), the result (res != 0) is checked. If it is a timeout, an abort, or a cancellation, then the counter is incremented; otherwise it is not.
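The control flow just described can be sketched as follows. This is an illustration under assumptions, not the code from archindep.h: C11 atomics stand in for the compare-and-exchange loop, the blocking IPC is replaced by a function pointer, and the names sketch_down_timed, ipc_result, block_fn and the two stubs are hypothetical. The RELEASETIMED message that follows a failed IPC is only indicated by a comment.

```c
#include <stdatomic.h>

/* Hypothetical result codes for the stubbed blocking IPC. */
enum ipc_result { IPC_OK = 0, IPC_TIMEOUT = 1 };

/* Hypothetical stand-in for the BLOCKTIMED IPC to the semaphore thread. */
typedef enum ipc_result (*block_fn)(unsigned timeout);

static enum ipc_result stub_block_ok(unsigned timeout)
{
    (void)timeout;
    return IPC_OK;        /* pretend a wakeup reply arrived */
}

static enum ipc_result stub_block_timeout(unsigned timeout)
{
    (void)timeout;
    return IPC_TIMEOUT;   /* pretend the IPC timed out */
}

/* Returns 0 if the semaphore was obtained, -1 if the blocking IPC failed. */
static int sketch_down_timed(atomic_int *counter, unsigned timeout,
                             block_fn block)
{
    int old = atomic_load(counter);

    /* retry if someone else also modified the counter;
     * on failure, old is reloaded with the current value */
    while (!atomic_compare_exchange_weak(counter, &old, old - 1))
        ;

    if (old - 1 < 0) {
        /* we did not get the semaphore, block */
        if (block(timeout) != IPC_OK) {
            /* timeout, abort or cancel: undo our decrement */
            atomic_fetch_add(counter, 1);
            /* the real code would now send RELEASETIMED (not modeled) */
            return -1;
        }
    }
    return 0;
}
```

Note the window between the failed IPC and the increment: a wakeup sent by a producer during that window is exactly the kind of interleaving discussed in this thread.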
Afterwards, 03 sends the RELEASETIMED IPC and thread 02 removes the thread from the queue.
4. Thread 02 doesn't find any thread in the queue, so pending is set to 1: counter = 1 pending = 1 queue = empty
This matches what you said in 2.1. In fact, I think 08 is interrupted in such a way that when its IPC is sent, there is nothing in the queue, so pending is set to 1.
Please provide us with your example code.
Hi,
thank you for the stripped-down example and the trace. After a while I found the cause of the bug: the pending bit was handled incorrectly for semaphore_down_timed calls. The fix is checked in and will appear in the public svn repository within the next few days.
Thank you Marc for reporting,
Alex.