Edmund,
good news: I think I have found the problem!
edmundo@rano.demon.co.uk writes:
I've investigated the ready queue after hitting "irq still active" by doing "p (class thread_t*)0xc0000000" and "p $.ready_next" repeatedly. In each case the threads corresponding to irqs 5 and 14 seemed to be ready. In one case, so was the thread corresponding to irq 0.
Just to confirm: irq 5 = 0xc014a800, irq 14 = 0xc014f000
I think I'm dealing with irq 5 when the error occurs because the value 0x20 is in eax and ebx.
Yep.
So presumably I should investigate the kernel stack for irq thread 14, which was apparently preempted in some mysterious fashion by an interrupt thread of lower priority. I see there's a kernel_sp in thread_t. [...]
I had a very similar situation, and from looking at the stack of the preempted higher-priority thread, I could tell that it was voluntarily switching to the lower-priority thread by calling schedule(). However, schedule() should not have switched to the lower-priority thread if the higher-priority thread was runnable...
Anyway, a close look at the scheduler revealed a race condition where a high-priority thread could become runnable after the decision to switch to a lower-priority one had already been made. I think I have eliminated that race now. Please try the latest version of thread.cc from CVS (>= 1.51).
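For illustration, here is a heavily simplified model of this kind of race. It is only a sketch and not the code from thread.cc: every name in it (Thread, Scheduler, irq_window, schedule_racy, schedule_checked) is made up, the priority values are arbitrary, and the "interrupt" is just a callback that runs between choosing the next thread and switching to it. The checked variant re-validates the choice, which on real hardware would have to happen with interrupts disabled.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model; none of these names are taken from the real kernel.
struct Thread {
    int  prio;    // larger value = higher priority (arbitrary numbers)
    bool ready;   // true if the thread is runnable
};

struct Scheduler {
    std::vector<Thread*> threads;
    // Stand-in for an interrupt arriving in the middle of schedule().
    std::function<void()> irq_window = [] {};

    // Highest-priority ready thread, or nullptr if none is ready.
    Thread* pick() const {
        Thread* best = nullptr;
        for (Thread* t : threads)
            if (t->ready && (!best || t->prio > best->prio))
                best = t;
        return best;
    }

    // Racy variant: a thread is chosen, then an interrupt may make a
    // higher-priority thread ready, but the stale choice is still used.
    Thread* schedule_racy() {
        Thread* next = pick();
        irq_window();            // interrupts are still enabled here
        return next;             // may no longer be the right thread
    }

    // Checked variant: repeat until the choice is still valid after the
    // window. The final pick() models a re-check with interrupts disabled.
    Thread* schedule_checked() {
        for (;;) {
            Thread* next = pick();
            irq_window();        // interrupts are still enabled here
            if (pick() == next)  // imagine interrupts disabled from here
                return next;
        }
    }
};

struct Result { int racy_prio; int checked_prio; };

// Scenario: the higher-priority thread has blocked and called schedule();
// it becomes ready again just after the lower-priority thread was chosen.
Result demo() {
    Thread low{1, true};     // e.g. the irq 5 thread
    Thread high{2, false};   // e.g. the irq 14 thread
    Scheduler s;
    s.threads = {&low, &high};
    s.irq_window = [&] { high.ready = true; };

    Thread* racy = s.schedule_racy();
    high.ready = false;      // reset the scenario for the second run
    Thread* checked = s.schedule_checked();

    assert(racy && checked);
    return {racy->prio, checked->prio};
}
```

In this model, schedule_racy() returns the low-priority thread even though the high-priority one is ready by the time the switch would happen, while schedule_checked() notices the change and returns the high-priority thread.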
Thanks again for testing!
Michael
(Taking a mental note not to throw away my good old 486 so soon, because slow machines reveal many more races...)
l4-hackers@os.inf.tu-dresden.de