Re: L4-Linux worked on 486! - l4-hackers

18 Aug 1999


Edmund,

good news: I think I have found the problem!
edmundo@rano.demon.co.uk writes:
...
> I've investigated the ready queue after hitting "irq still active" by
> doing "p (class thread_t*)0xc0000000" and "p $.ready_next" repeatedly.
> In each case the threads corresponding to irqs 5 and 14 seemed to be
> ready. In one case, so was the thread corresponding to irq 0.
> Just to confirm: irq 5 = 0xc014a800, irq 14 = 0xc014f000
> I think I'm dealing with irq 5 when the error occurs because the value
> 0x20 is in eax and ebx.

Yep.

...
> So presumably I should investigate the kernel stack for irq thread 14,
> which was apparently preempted in some mysterious fashion by an
> interrupt thread of lower priority. I see there's a kernel_sp in
> thread_t. [...]

I had a very similar situation, and from looking at the stack of the
preempted higher-priority thread, I could tell that the
higher-priority thread was voluntarily switching to the lower-priority
thread by calling schedule().  However, schedule() should not have
switched to the lower-priority thread if the higher-priority thread
was runnable...
Anyway, a close look at the scheduler revealed a race condition where
a high-priority thread could become runnable after schedule() had
already decided to switch to a lower-priority one.  I think I have
eliminated that race now.  Please try the latest version of thread.cc
from CVS (>= 1.51).
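
To illustrate the kind of race I mean, here is a minimal sketch.  It is
not the code from thread.cc, and it is not necessarily how the fix in
1.51 works: the Thread struct, pick_highest(), schedule_racy() and
schedule_recheck() are names made up for this example, and the irq
callback only simulates an interrupt arriving between the scheduling
decision and the actual switch.

```cpp
#include <functional>
#include <vector>

// Hypothetical, simplified thread model (not the kernel's thread_t).
struct Thread {
    int prio;    // larger value = higher priority
    bool ready;  // true if the thread is runnable
};

// Return the index of the highest-priority ready thread, or -1 if none.
int pick_highest(const std::vector<Thread>& ts) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(ts.size()); ++i)
        if (ts[i].ready && (best < 0 || ts[i].prio > ts[best].prio))
            best = i;
    return best;
}

// Racy variant: the choice is made, then an interrupt may make a
// higher-priority thread ready, and the stale choice is used anyway.
int schedule_racy(std::vector<Thread>& ts, const std::function<void()>& irq) {
    int next = pick_highest(ts);
    irq();        // simulated interrupt in the window before the switch
    return next;  // may be a lower-priority thread than the best one now
}

// One possible remedy (an assumption, not the actual fix): after the
// window, re-check the decision and retry until it is still valid.
int schedule_recheck(std::vector<Thread>& ts, const std::function<void()>& irq) {
    for (;;) {
        int next = pick_highest(ts);
        irq();    // same simulated window as above
        // In a real kernel, interrupts would be disabled from this
        // point until the switch has happened.
        if (pick_highest(ts) == next)
            return next;
    }
}
```

With a ready low-priority thread at index 0 and a not-yet-ready
high-priority thread at index 1, an irq callback that marks thread 1
ready makes schedule_racy() still return 0, while schedule_recheck()
returns 1.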
Thanks again for testing!
Michael
(Taking a mental note not to throw away my good old 486 so soon
because slow machines reveal many more races...)
-- 
hohmuth@innocent.com, hohmuth@inf.tu-dresden.de
http://home.pages.de/~hohmuth/