Hi,
Sometimes I get the following message: semaphore/lib/src/semaphore.c:167:__enqueue_thread(): Error : l4semahore: failed to get priority of thread 17.08: invalid argument (-3)
In the sources, the comment says that the thread was not created by l4thread or is dead. But all threads are created by l4thread, and none is explicitly killed by another thread. Is there a mechanism that kills threads? Is it possible that a lock is requested while the thread is not yet completely created? Does anybody have an idea what is happening?
Regards Marc
Hi,
On Tue Jan 22, 2008 at 15:42:55 +0100, Marc CHALAND wrote:
Sometimes I get the following message: semaphore/lib/src/semaphore.c:167:__enqueue_thread(): Error : l4semahore: failed to get priority of thread 17.08: invalid argument (-3)
In the sources, the comment says that the thread was not created by l4thread or is dead. But all threads are created by l4thread, and none is explicitly killed by another thread. Is there a mechanism that kills threads? Is it possible that a lock is requested while the thread is not yet completely created? Does anybody have an idea what is happening?
The thread should exist, looking at the code path in the semaphore lib. The thread_id used for the prio-get comes from an ipc-wait, so the thread should be there (in the sense that there is a thread at all). The thread-lib might have some other state than ACTIVE for this thread. If you look in the debugger, is the thread there and is its state fine? When it happens, is it a permanent error or just sporadic? Does it happen during some setup phase or later, when things have settled?
Adam
2008/1/22, Adam Lackorzynski adam@os.inf.tu-dresden.de:
The thread should exist, looking at the code path in the semaphore lib. The thread_id used for the prio-get comes from an ipc-wait, so the thread should be there (in the sense that there is a thread at all). The thread-lib might have some other state than ACTIVE for this thread.
OK, I agree and I understand :).
If you look in the debugger, the thread is there and its state is fine?
This happens when we do heavy testing. The task has a standard structure: a manager thread and several worker threads. The message appears for worker threads. As a worker does not live long, it is difficult for me to catch information about those threads. Perhaps I should modify the semaphore lib and add an enter_kdebug()?
If it happens, is it a permanent error or just sporadic? Does it happen during some setup phase or after that when things have settled?
This happens only occasionally, and only after several minutes of heavy testing. As we have only one test infrastructure on which the problem occurs, we will investigate further once the l4rm synchro is validated.
Regards Marc
Here is some more info about this log: each time it appears, the state of the thread in jdb is, for example:
17.0d (deleted) a0 17.01 rcv,ipc_progr
The backtrace of each thread is mainly: l4rm_detach l4th_pages_free __do_cleanup_and_block
We observed one thread with the following backtrace: __modify_region l4rm_detach l4th_pages_free __do_cleanup_and_block
If I understand correctly, the thread is nearly dead but still tries to use a semaphore, doesn't it?
Regards Marc
On Wed Jan 23, 2008 at 11:39:49 +0100, Marc CHALAND wrote:
2008/1/22, Adam Lackorzynski adam@os.inf.tu-dresden.de:
The thread should exist, looking at the code path in the semaphore lib. The thread_id used for the prio-get comes from an ipc-wait, so the thread should be there (in the sense that there is a thread at all). The thread-lib might have some other state than ACTIVE for this thread.
OK, I agree and I understand :).
If you look in the debugger, the thread is there and its state is fine?
This happens when we do heavy testing. The task has a standard structure: a manager thread and several worker threads. The message appears for worker threads. As a worker does not live long, it is difficult for me to catch information about those threads. Perhaps I should modify the semaphore lib and add an enter_kdebug()?
If it happens, is it a permanent error or just sporadic? Does it happen during some setup phase or after that when things have settled?
This happens only occasionally, and only after several minutes of heavy testing. As we have only one test infrastructure on which the problem occurs, we will investigate further once the l4rm synchro is validated.
On Wed Jan 23, 2008 at 15:04:33 +0100, Marc CHALAND wrote:
Here is some more info about this log: each time it appears, the state of the thread in jdb is, for example:
17.0d (deleted) a0 17.01 rcv,ipc_progr
The backtrace of each thread is mainly: l4rm_detach l4th_pages_free __do_cleanup_and_block
We observed one thread with the following backtrace: __modify_region l4rm_detach l4th_pages_free __do_cleanup_and_block
If I understand correctly, the thread is nearly dead but still tries to use a semaphore, doesn't it?
No, the thread is still alive, the '(deleted)' just means that it has already been unregistered at the name service. Actually, the thread will also not go away, it will just sleep. So it is basically always there. Could you verify the theory that the threadlib has some strange state at that time? So basically that l4thread_get_prio is returning -L4_EINVAL and then what the value of l4th_tcbs[thread].state in l4th_tcb_get is (include/__tcb.h).
Adam
2008/1/24, Adam Lackorzynski adam@os.inf.tu-dresden.de:
No, the thread is still alive, the '(deleted)' just means that it has already been unregistered at the name service.
Is name service data stored in Fiasco's data so that it can be shown in jdb?
Could you verify the theory that the threadlib has some strange state at that time? So basically that l4thread_get_prio is returning -L4_EINVAL
Yes, return value is -L4_EINVAL.
and then what the value of l4th_tcbs[thread].state in l4th_tcb_get is (include/__tcb.h).
The state of the thread is 4: TCB_SHUTDOWN. Hope this helps?
Regards Marc
Hi Adam & all,
here is the call graph that seems to cause the issue:
lib/src/exit.c: in __do_exit() the state is set to TCB_SHUTDOWN.
Afterwards, the thread being shut down calls __do_cleanup_and_block() -> l4th_stack_free() -> l4th_pages_free() -> l4rm_detach() -> l4rm_lock_region_list() -> l4lock_lock() -> IPC to the semaphore lib if the lock is held [i.e. only under contention, with threads being created or shut down; otherwise no call to the semaphore lib is required] -> get_prio() -> sanity check for TCB_ACTIVE.
At this point the data structure [l4th_tcb_t] is still valid and associated with the thread. It is only released after the l4th_stack_free() invocation, by l4th_tcb_deallocate(tcb).
So in general there seems to be no problem, because the data structure is still valid and associated with the thread being shut down. However, a solution could be to extend the sanity check to accept TCB_SHUTDOWN in addition to TCB_ACTIVE [good or bad, I don't know], to set the TCB_SHUTDOWN state at a different point [possibly a bad idea], or to introduce another state and extend the sanity check accordingly.
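The first option (extending the sanity check) could look roughly like the sketch below. It is a simplified stand-alone model, not the L4Env code: struct tcb, get_prio_strict(), get_prio_relaxed(), MODEL_EINVAL and every enum value except TCB_SHUTDOWN = 4 are hypothetical names and values chosen for illustration only.

```c
/* Hypothetical model of the thread library's TCB states. Only the value
 * 4 for TCB_SHUTDOWN comes from this thread; the other names and values
 * are assumptions made for the sketch. */
enum tcb_state {
    TCB_UNUSED   = 0,  /* assumed */
    TCB_ACTIVE   = 2,  /* assumed */
    TCB_SHUTDOWN = 4   /* value reported in this thread */
};

/* Stand-in for L4_EINVAL; the logged error code was -3. */
#define MODEL_EINVAL 3

/* Minimal stand-in for l4th_tcb_t: just the fields the check needs. */
struct tcb {
    enum tcb_state state;
    int prio;
};

/* Strict lookup: only TCB_ACTIVE passes the sanity check, so a thread
 * that blocks on a contended lock during its own cleanup is rejected. */
static int get_prio_strict(const struct tcb *t)
{
    if (t->state != TCB_ACTIVE)
        return -MODEL_EINVAL;
    return t->prio;
}

/* Relaxed lookup: TCB_SHUTDOWN is accepted as well, because the TCB is
 * still valid until it is deallocated after the stack has been freed. */
static int get_prio_relaxed(const struct tcb *t)
{
    if (t->state != TCB_ACTIVE && t->state != TCB_SHUTDOWN)
        return -MODEL_EINVAL;
    return t->prio;
}
```

With a TCB in state TCB_SHUTDOWN, the strict variant returns -3, which matches the logged error, while the relaxed variant returns the stored priority. Both still reject an unused TCB.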
Best,
Alex B.
l4-hackers mailing list l4-hackers@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
Hi,
I forgot this: after the l4lock_lock, an l4lock_unlock is of course also called (3 lines after l4lock_lock). So the thread is also dequeued from the wait queue of the lock, and the lock/semaphore implementation no longer holds any (indirect) association to the l4th_tcb_t data structure.
Alex B.
On Thu Jan 24, 2008 at 11:13:40 +0100, Marc CHALAND wrote:
2008/1/24, Adam Lackorzynski adam@os.inf.tu-dresden.de:
No, the thread is still alive, the '(deleted)' just means that it has already been unregistered at the name service.
Is name service data stored in Fiasco's data so that it can be shown in jdb?
Could you verify the theory that the threadlib has some strange state at that time? So basically that l4thread_get_prio is returning -L4_EINVAL
Yes, return value is -L4_EINVAL.
and then what the value of l4th_tcbs[thread].state in l4th_tcb_get is (include/__tcb.h).
The state of the thread is 4: TCB_SHUTDOWN.
It did. Thank you and Alex for the analysis.
I've modified the threadlib a bit to cope with such a situation. The warning should be gone now.
Adam