Hi All,
I have a question about the dec_lock_cnt method in Context class under multiple processor. In its implementation, it checks if thread's home cpu is equal to current cpu. If not, it does not unset "_running_under_lock" variable, even if the _lock_cnt is 0. Why does it check if home cpu is equal to current cpu?
Consider the following case: Thread A is running on core 1 holding a helping lock, but is preempted by another high priority thread. Thread B is running on core 2, and tries to take the same helping lock. It finds the lock is held by A, so it helps A to run by migrating A to core 2. When A releases its lock, finding the current cpu is not its home cpu, so it does not set "_running_under_lock" to false. Then A continues to run on core 1 and tries to take that lock again. But now it can never take the lock, as the implementation of helping lock.
PRIVATE inline bool NO_INSTRUMENT Switch_lock::set_lock_owner(Context *o) { bool have_no_locks = o->_lock_cnt < 1;
if (have_no_locks) { assert_kdb (current_cpu() == o->home_cpu()); for (;;) { if (EXPECT_FALSE(access_once(&o->_running_under_lock))) continue; if (EXPECT_TRUE(mp_cas(&o->_running_under_lock, Mword(false), Mword(true)))) break; } } ... }
Do I misunderstand anything here?
Thank you very much. Best, Yuxin
Hi All,
I have some further understanding about the code. If thread A can go back to its home cpu(core 1), this means it is not running on the core 2. And when A release its lock, it will switch to its helper. During the context switch from thread A to its helper on core 2, the " _running_under_lock" of thread A should be set to false. Thus thread A is able to continue to run on its home cpu.
I think this logic has no problem. But I observed a thread going into the infinite loop in Switch_lock::set_lock_owner method. in my test program. Because of inappropriate memory barrier, or anything else?
Thank you very much. Best, Yuxin
On Wed, Aug 27, 2014 at 2:28 PM, Yuxin Ren ryx@gwmail.gwu.edu wrote:
Hi All,
I have a question about the dec_lock_cnt method in Context class under multiple processor. In its implementation, it checks if thread's home cpu is equal to current cpu. If not, it does not unset "_running_under_lock" variable, even if the _lock_cnt is 0. Why does it check if home cpu is equal to current cpu?
Consider the following case: Thread A is running on core 1 holding a helping lock, but is preempted by another high priority thread. Thread B is running on core 2, and tries to take the same helping lock. It finds the lock is held by A, so it helps A to run by migrating A to core 2. When A releases its lock, finding the current cpu is not its home cpu, so it does not set "_running_under_lock" to false. Then A continues to run on core 1 and tries to take that lock again. But now it can never take the lock, as the implementation of helping lock.
PRIVATE inline bool NO_INSTRUMENT Switch_lock::set_lock_owner(Context *o) { bool have_no_locks = o->_lock_cnt < 1;
if (have_no_locks) { assert_kdb (current_cpu() == o->home_cpu()); for (;;) { if (EXPECT_FALSE(access_once(&o->_running_under_lock))) continue; if (EXPECT_TRUE(mp_cas(&o->_running_under_lock, Mword(false), Mword(true)))) break; } } ... }
Do I misunderstand anything here?
Thank you very much. Best, Yuxin
On Fri Aug 29, 2014 at 13:46:38 -0400, Yuxin Ren wrote:
I have some further understanding about the code. If thread A can go back to its home cpu(core 1), this means it is not running on the core 2. And when A release its lock, it will switch to its helper. During the context switch from thread A to its helper on core 2, the " _running_under_lock" of thread A should be set to false. Thus thread A is able to continue to run on its home cpu.
I think this logic has no problem. But I observed a thread going into the infinite loop in Switch_lock::set_lock_owner method. in my test program. Because of inappropriate memory barrier, or anything else?
On Wed, Aug 27, 2014 at 2:28 PM, Yuxin Ren ryx@gwmail.gwu.edu wrote:
I have a question about the dec_lock_cnt method in Context class under multiple processor. In its implementation, it checks if thread's home cpu is equal to current cpu. If not, it does not unset "_running_under_lock" variable, even if the _lock_cnt is 0. Why does it check if home cpu is equal to current cpu?
This one should work better:
Context::dec_lock_cnt() { int n = _lock_cnt - 1; write_now(&_lock_cnt, n); if (EXPECT_TRUE(!n && home_cpu() == current_cpu())) { Mem::mp_wmb(); write_now(&_running_under_lock, Mword(false)); } }
Adam
Thanks a lot! I will try it. But could you explain to me why this is better? I am really curious about how to implement and debug such staff.
Best Yuxin
On Mon, Sep 1, 2014 at 5:58 AM, Adam Lackorzynski <adam@os.inf.tu-dresden.de
wrote:
On Fri Aug 29, 2014 at 13:46:38 -0400, Yuxin Ren wrote:
I have some further understanding about the code. If thread A can go back to its home cpu(core 1), this means it is not running on the core 2. And when A release its lock, it will switch to its helper. During the context switch from thread A to its helper on core 2, the " _running_under_lock" of thread A should be set to false. Thus thread A is able to continue to run on its home cpu.
I think this logic has no problem. But I observed a thread going into the infinite loop in Switch_lock::set_lock_owner method. in my test program. Because of inappropriate memory barrier, or anything else?
On Wed, Aug 27, 2014 at 2:28 PM, Yuxin Ren ryx@gwmail.gwu.edu wrote:
I have a question about the dec_lock_cnt method in Context class under multiple processor. In its implementation, it checks if thread's home cpu is equal to
current
cpu. If not, it does not unset "_running_under_lock" variable, even if the _lock_cnt is 0. Why does it check if home cpu is equal to current cpu?
This one should work better:
Context::dec_lock_cnt() { int n = _lock_cnt - 1; write_now(&_lock_cnt, n); if (EXPECT_TRUE(!n && home_cpu() == current_cpu())) { Mem::mp_wmb(); write_now(&_running_under_lock, Mword(false)); } }
Adam
Adam adam@os.inf.tu-dresden.de Lackorzynski http://os.inf.tu-dresden.de/~adam/
l4-hackers mailing list l4-hackers@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
On Mon Sep 01, 2014 at 23:04:37 +0800, Yuxin Ren wrote:
Thanks a lot! I will try it. But could you explain to me why this is better? I am really curious about how to implement and debug such staff.
For this type of code probably just being relaxed, thinking exactly on what can happen and knowing how SMP systems work memory-wise, on all architectures. Since that doesn't really work in practice it's also running test programs, tracing, logging, making some sense out of this, fixing things, and start all over again :)
Adam
Unfortunately this modification does not help. I have a question about the switch_lock code.
PRIVATE inline bool NO_INSTRUMENT Switch_lock::set_lock_owner(Context *o) { bool have_no_locks = o->_lock_cnt < 1;
if (have_no_locks) { assert_kdb (current_cpu() == o->home_cpu()); for (;;) { if (EXPECT_FALSE(access_once(&o->_running_under_lock))) continue; if (EXPECT_TRUE(mp_cas(&o->_running_under_lock, Mword(false), Mword(true)))) break; } } ... }
In which case, the variable "_running_under_lock" is set to true but the lock_cnt is 0? Now in my test, the kernel goes into this dead loop. I cannot figure out how to solve this problem.
Thank you very much. Yuixn
On Sun, Aug 31, 2014 at 5:58 PM, Adam Lackorzynski < adam@os.inf.tu-dresden.de> wrote:
On Fri Aug 29, 2014 at 13:46:38 -0400, Yuxin Ren wrote:
I have some further understanding about the code. If thread A can go back to its home cpu(core 1), this means it is not running on the core 2. And when A release its lock, it will switch to its helper. During the context switch from thread A to its helper on core 2, the " _running_under_lock" of thread A should be set to false. Thus thread A is able to continue to run on its home cpu.
I think this logic has no problem. But I observed a thread going into the infinite loop in Switch_lock::set_lock_owner method. in my test program. Because of inappropriate memory barrier, or anything else?
On Wed, Aug 27, 2014 at 2:28 PM, Yuxin Ren ryx@gwmail.gwu.edu wrote:
I have a question about the dec_lock_cnt method in Context class under multiple processor. In its implementation, it checks if thread's home cpu is equal to
current
cpu. If not, it does not unset "_running_under_lock" variable, even if the _lock_cnt is 0. Why does it check if home cpu is equal to current cpu?
This one should work better:
Context::dec_lock_cnt() { int n = _lock_cnt - 1; write_now(&_lock_cnt, n); if (EXPECT_TRUE(!n && home_cpu() == current_cpu())) { Mem::mp_wmb(); write_now(&_running_under_lock, Mword(false)); } }
Adam
Adam adam@os.inf.tu-dresden.de Lackorzynski http://os.inf.tu-dresden.de/~adam/
l4-hackers mailing list l4-hackers@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
On Mon Sep 08, 2014 at 11:54:32 -0400, Yuxin Ren wrote:
Unfortunately this modification does not help. I have a question about the switch_lock code.
PRIVATE inline bool NO_INSTRUMENT Switch_lock::set_lock_owner(Context *o) { bool have_no_locks = o->_lock_cnt < 1;
if (have_no_locks) { assert_kdb (current_cpu() == o->home_cpu()); for (;;) { if (EXPECT_FALSE(access_once(&o->_running_under_lock))) continue; if (EXPECT_TRUE(mp_cas(&o->_running_under_lock, Mword(false), Mword(true)))) break; } } ... }
In which case, the variable "_running_under_lock" is set to true but the lock_cnt is 0? Now in my test, the kernel goes into this dead loop.
Do you know when it was set to true? Maybe that would help finding out why it is stuck there?
Adam
There are three places where "_running_under_lock" is set to true;
First one is just in the above loop. So this code will run only after it goes into the loop. This does not help explain why the code can go into the loop.
The second is in Context::running_on_different_cpu() method in context.cpp file. if (EXPECT_FALSE(lock_cnt()) && EXPECT_FALSE(!mp_cas(&_running_under_lock, Mword(false), Mword(true)))) return true; But this only happens when lock_cnt is non-zero.
The third one is in Context::need_help method in context.cpp file. The method is called only when a thread tries to help the lock holder. I think this implies that the lock holder holds a lock and as a result, its lock_cnt is not 0.
Just from those code, I think it is impossible for a thread that its "_running_under_lock" is true while its lock_cnt is 0. However now that the code is here, I want to know why the author add the "extra" check here? The author should have some reason to do so, right?
Thanks a lot!! Best, Yuxin
On Tue, Sep 9, 2014 at 5:03 PM, Adam Lackorzynski <adam@os.inf.tu-dresden.de
wrote:
On Mon Sep 08, 2014 at 11:54:32 -0400, Yuxin Ren wrote:
Unfortunately this modification does not help. I have a question about the switch_lock code.
PRIVATE inline bool NO_INSTRUMENT Switch_lock::set_lock_owner(Context *o) { bool have_no_locks = o->_lock_cnt < 1;
if (have_no_locks) { assert_kdb (current_cpu() == o->home_cpu()); for (;;) { if (EXPECT_FALSE(access_once(&o->_running_under_lock))) continue; if (EXPECT_TRUE(mp_cas(&o->_running_under_lock, Mword(false), Mword(true)))) break; } } ... }
In which case, the variable "_running_under_lock" is set to true but the lock_cnt is 0? Now in my test, the kernel goes into this dead loop.
Do you know when it was set to true? Maybe that would help finding out why it is stuck there?
Adam
Adam adam@os.inf.tu-dresden.de Lackorzynski http://os.inf.tu-dresden.de/~adam/
l4-hackers mailing list l4-hackers@os.inf.tu-dresden.de http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
Hi,
On Tue Sep 09, 2014 at 17:18:16 -0400, Yuxin Ren wrote:
There are three places where "_running_under_lock" is set to true;
First one is just in the above loop. So this code will run only after it goes into the loop. This does not help explain why the code can go into the loop.
The second is in Context::running_on_different_cpu() method in context.cpp file. if (EXPECT_FALSE(lock_cnt()) && EXPECT_FALSE(!mp_cas(&_running_under_lock, Mword(false), Mword(true)))) return true; But this only happens when lock_cnt is non-zero.
The third one is in Context::need_help method in context.cpp file. The method is called only when a thread tries to help the lock holder. I think this implies that the lock holder holds a lock and as a result, its lock_cnt is not 0.
Just from those code, I think it is impossible for a thread that its "_running_under_lock" is true while its lock_cnt is 0. However now that the code is here, I want to know why the author add the "extra" check here? The author should have some reason to do so, right?
Yes, probably. I can just give a fuzzy answer but the new release (due soon) will have a few changes in this area which are hopefully helping you too.
Adam
l4-hackers@os.inf.tu-dresden.de