Edmund GRIMLEY EVANS edmundo@rano.demon.co.uk writes:
From what you wrote, it sounds like the problem could be solved in either of two ways:
- make L4 priorities correspond to PIC priorities
- use a specific instead of a non-specific EOI
That's correct. We went for the first solution because, AFAIK, the second one (specific EOI) is not allowed in the PIC mode we use (the special fully nested mode), or at least it is not documented what happens when it is used in this mode.
Can either of these be implemented as a quick fix even if the long-term solution is moving irq ack into the microkernel?
I've done this already; what you've dug up is my implementation of the first solution:
I notice that linux22/arch/l4-i386/kernel/irq.c has:
static const /* prio for irq 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 */
          char irq_prio[] = {1, 0,15, 6, 5, 4, 3, 2,14,13,12,11,10, 9, 8, 7};
Can I change this? What to?
My wild guess would be {15,14,13,4,3,2,1,0,12,11,10,9,8,7,6,5} ... but there must be some reason for it being the way it is ...
This array sorts the priority of the interrupt threads in the order of the hardware priorities. It reflects the priorities after they have been changed by programming the PIC in irq_init.cc: irq_t::init().
Jean already explained why this is the correct order: we have to make sure that IRQ8 (Fiasco's timer interrupt) has the highest priority, so that it always gets through even while other irqs are in service (i.e., haven't been acknowledged yet). That's why IRQ2, the cascade irq for the slave PIC (which hosts IRQ8), gets the highest priority on the master PIC.
So, you shouldn't change this array.
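The ordering can be reproduced mechanically. The following is only a sketch under stated assumptions: the master PIC has been rotated so that IRQ1 is its lowest-priority input (which makes IRQ2 the highest), the slave keeps its default order, and thread priorities are handed out from 15 downwards. derive_irq_prio and its parameter are hypothetical names for illustration; they are not taken from irq.c or from Fiasco.

```c
#include <assert.h>

#define NR_IRQS     16
#define CASCADE_IRQ 2   /* master input that the slave PIC hangs off */

/*
 * Fill prio[irq] with 15 (highest) down to 0 (lowest), following the
 * assumed hardware order: walk the master's inputs starting just after
 * its lowest-priority input, and splice in the slave's IRQ8..IRQ15
 * right after the cascade input.
 */
static void derive_irq_prio(int master_lowest, char prio[NR_IRQS])
{
    int next = NR_IRQS - 1;

    assert(master_lowest >= 0 && master_lowest < 8);

    for (int i = 1; i <= 8; i++) {
        int irq = (master_lowest + i) % 8;

        prio[irq] = (char)next--;
        if (irq == CASCADE_IRQ)
            for (int s = 8; s < NR_IRQS; s++)
                prio[s] = (char)next--;
    }
}
```

With master_lowest = 1 this yields {1, 0,15, 6, 5, 4, 3, 2,14,13,12,11,10, 9, 8, 7}, the array in irq.c. With master_lowest = 7 (the unrotated default order) it yields {15,14,13,4,3,2,1,0,12,11,10,9,8,7,6,5}, which is the guess quoted above.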
Now, to debug your problem (some irq's in-service flag is still set after we think we've acknowledged the irq), here are some tips:
I suggest you try to find out if there are other irq threads which are in the ready queue and which have been preempted before they have acknowledged their interrupt. In an L4Linux setup, these threads are 5.10 to 5.20 (in hexadecimal task.lthread format, as understood by the built-in kernel debugger Jdb). Then we can try to understand why that thread has been preempted by walking up its kernel stack.
Some random hints:
In Jdb, you can use `r n' and `r p' to traverse the ready queue (mnemonics: ``ready next'', ``ready prev''). Look at a thread_t by typing something like `t 5.10'. For Gdb, .gdbinit has a macro which can be used like this: "tcb 5 0x10" prints thread 5.10's thread_t.
I've added a command `k p' to Jdb which prints out the state of the PICs. Maybe you can see something suspect there?
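When looking at PIC state, the in-service registers (ISR) are the interesting part for this bug. As a rough sketch (the output format of `k p' is not shown in this mail, so the two ISR bytes are simply assumed to be available as numbers, and both helper names are hypothetical), the master and slave ISR can be combined into one 16-bit mask in which bit n stands for IRQ n:

```c
#include <stdint.h>

/* Combine the master ISR (IRQ0..7) and slave ISR (IRQ8..15) into one mask. */
static uint16_t in_service_mask(uint8_t master_isr, uint8_t slave_isr)
{
    return (uint16_t)(master_isr | (slave_isr << 8));
}

/* Returns 1 if IRQ 'irq' (0..15) is still marked in service, else 0. */
static int irq_in_service(uint16_t mask, int irq)
{
    return (mask >> irq) & 1;
}
```

For example, a master ISR of 0x24 together with a slave ISR of 0x40 would mean that IRQs 2, 5 and 14 are still in service.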
You can switch back and forth between remote Gdb and Jdb by using "set jdb::use_nested = 0" in Gdb and by pressing `V s' in Jdb. (Of course, all this only works if you haven't completely disabled Jdb with the "-nojdb" command-line switch.)
There was a long discussion about that and it looks like irq acknowledge will move to the micro kernel.
Was this an e-mail discussion accessible to people like myself, or a discussion in the coffee room in TU Dresden?
An electronic coffee room meant for L4 implementors on various architectures (a private mailing list hosted by UNSW). Do you think that more of the internal kernel-design-and-interface discussion should take place in public? If so, we might try to persuade the other members of that other mailing list to move more of their discussions here.
Michael
I've investigated the ready queue after hitting "irq still active" by doing "p (class thread_t*)0xc0000000" and "p $.ready_next" repeatedly. In each case the threads corresponding to irqs 5 and 14 seemed to be ready. In one case, so was the thread corresponding to irq 0.
Just to confirm: irq 5 = 0xc014a800, irq 14 = 0xc014f000
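Both addresses are at least consistent with a simple layout, sketched below. All constants are assumptions inferred from the two addresses above and from the 5.10 to 5.20 thread range mentioned earlier in the thread, not taken from the Fiasco source: thread_t slots of 2 KB starting at 0xc0000000, 128 lthreads per task, and the thread for irq n being lthread 0x10 + n of task 5. The function names are hypothetical.

```c
#include <stdint.h>

#define TCB_AREA_BASE     0xc0000000u /* address used with gdb above */
#define TCB_SIZE          0x800u      /* assumed: 2 KB per thread_t slot */
#define LTHREADS_PER_TASK 128u        /* assumed: 7-bit lthread number */
#define IRQ_TASK          5u          /* irq threads are 5.10 and up */
#define IRQ_LTHREAD_BASE  0x10u       /* assumed: irq n is lthread 0x10 + n */

/* Address of the thread_t for task.lthread under the assumed layout. */
static uint32_t tcb_addr(uint32_t task, uint32_t lthread)
{
    return TCB_AREA_BASE + (task * LTHREADS_PER_TASK + lthread) * TCB_SIZE;
}

/* Address of the thread_t for the thread that handles 'irq'. */
static uint32_t irq_tcb_addr(uint32_t irq)
{
    return tcb_addr(IRQ_TASK, IRQ_LTHREAD_BASE + irq);
}
```

Under these assumptions irq_tcb_addr(5) gives 0xc014a800 and irq_tcb_addr(14) gives 0xc014f000, matching the two values above.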
I think I'm dealing with irq 5 when the error occurs because the value 0x20 is in eax and ebx.
So presumably I should investigate the kernel stack for irq thread 14, which was apparently preempted in some mysterious fashion by an interrupt thread of lower priority. I see there's a kernel_sp in thread_t. Can anyone tell me how to get gdb to analyse that stack for me, to save me picking out the values that look like program addresses from the hex dump by hand?
By the way, /proc/interrupts says:
           CPU0
  0:      38599   L4-IPC-timeout   timer
  1:        348   L4-IPC-IRQ       keyboard
  2:          0   L4-IPC-IRQ       non-Linux
  3:          0   L4-IPC-IRQ       non-Linux
  5:         26   L4-IPC-IRQ       NE2000
  8:          0   L4-IPC-IRQ       non-Linux
 13:          0   L4-IPC-IRQ       math error
 14:      18002   L4-IPC-IRQ       ide0
I haven't been able to contact os.inf.tu-dresden.de, so I'm still using the same irq.c as before I went on holiday. It seems to be 1.16 with Michael's patch that was sent to l4-hackers on 1999-07-23 plus the following patch:
--- irq.c.mh	Sat Jul 31 18:53:15 1999
+++ irq.c	Sun Aug 15 18:53:35 1999
@@ -1199,10 +1199,6 @@
 __sti();
 #endif
 
-#ifdef SANITY
-irqs_in_progress &= ~(1 << TIMER_IRQ);
-#endif
-
 l4_i386_ipc_receive(L4_NIL_ID, 0,
                     &dummy, &dummy,
                     L4_IPC_TIMEOUT(0,0,4*39,12,0,0), /* ca. 10 ms */
@@ -1242,7 +1238,11 @@
 handle_irq(TIMER_IRQ, cpu); /* we're unblocked -- execute handler */
 
 spin_unlock(&irq_controller_lock);
-
+
+#ifdef SANITY
+irqs_in_progress &= ~(1 << TIMER_IRQ);
+#endif
+
 execute_bottom_halves(TIMER_IRQ);
 }
 } /* timer_irq_thread */
Edmund
l4-hackers@os.inf.tu-dresden.de