Hello,
I implemented a statistical profiler for Fiasco using performance monitoring counters (the NMI handler is attached to handle_slow_trap()). It works fairly well for "mathematical" workloads, but as soon as interrupt activity gets involved to a greater extent, the system freezes. For example, keeping a button pressed while sampling will freeze the machine within a few seconds. The NMI watchdog does not trigger during that freeze. Higher sampling rates also cause these freezes.
So imho there seems to be a problem when irq_interrupt() or Irq::hit() is interrupted by an NMI. What do you think? Any hints on how to fix this or on how to debug this will be greatly appreciated.
Thanks in advance. I will happily provide further information if needed.
On Fri, 2007-08-03 at 09:53 +0200, Stefan Scheler wrote:
> Hello,
> I implemented a statistical profiler for fiasco using performance monitoring counters (nmi handler is attached to handle_slow_trap()). It works fairly well for "mathematical" workloads but as soon as interrupt activity gets involved to a greater extent, the system freezes. For example, keeping a button pressed while sampling will freeze the machine within a few seconds. The NMI watchdog does not trigger during that freeze. Higher sampling rates also cause these freezes.
> So imho there seems to be a problem when irq_interrupt() or Irq::hit() is interrupted by an NMI. What do you think? Any hints on how to fix this or on how to debug this will be greatly appreciated.
> Thanks in advance. I will happily provide further information if needed.
I don't think that there is a problem with an NMI in Fiasco's IRQ routines, except that there may be stack overruns.
I'd suggest using a task gate for your NMI and running it on a completely different stack, because some parts of the code are extremely sensitive to NMIs, basically the sysenter path of the Fiasco kernel.
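To illustrate the task-gate suggestion, here is a minimal sketch of how a 32-bit x86 task gate descriptor is encoded. The names `idt_entry` and `make_task_gate` are hypothetical and are not taken from the Fiasco sources; the bit layout follows the IA-32 descriptor format (TSS selector in bits 31..16 of the low word, type 0x5, DPL and present bit in the high word). The TSS that the gate refers to would have to supply its own ESP, SS and EIP, which is what puts the handler on a separate stack.

```c
#include <stdint.h>

/* One 8-byte IDT entry in 32-bit protected mode, as two 32-bit words.
 * Hypothetical helper type, not Fiasco's own representation. */
struct idt_entry {
    uint32_t lo;
    uint32_t hi;
};

/* Build a task gate that refers to the TSS named by tss_selector.
 * A task gate carries no handler offset: the CPU loads EIP, ESP and SS
 * from the TSS on entry. */
static struct idt_entry make_task_gate(uint16_t tss_selector, unsigned dpl)
{
    struct idt_entry e;
    e.lo = (uint32_t)tss_selector << 16;   /* bits 31..16: TSS selector */
    e.hi = (1u << 15)                      /* P: descriptor present     */
         | ((dpl & 3u) << 13)              /* DPL                       */
         | (0x5u << 8);                    /* type 0b00101: task gate   */
    return e;
}
```

With a placeholder selector of 0x60 and DPL 0, `make_task_gate(0x60, 0)` yields the words 0x00600000 and 0x00008500; the entry would then be stored in IDT vector 2, the NMI vector.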
Hello,
> I'd suggest to use a task gate for your NMI and run it on a completely different stack. Because there are parts of code that are extremely sensible to NMIs and this is basically the sysenter path of the Fiasco kernel.
I tried that (see attached patch). Is that the correct way of implementing this? It works exactly once. When I reset the counters the second time I get "APIC error 00000000(00000000)" and the machine freezes.
On Sat Aug 04, 2007 at 11:50:36 +0200, Stefan Scheler wrote:
> > I'd suggest to use a task gate for your NMI and run it on a completely different stack. Because there are parts of code that are extremely sensible to NMIs and this is basically the sysenter path of the Fiasco kernel.
> I tried that (see attached patch). Is that the correct way of implementing this? It works exactly once. When I reset the counters the second time I get "APIC error 00000000(00000000)" and the machine freezes.
> -- Stefan Scheler
> +++ src/kern/config_gdt.h (working copy)
> @@ -22,6 +22,7 @@
>  #define GDT_DATA_USER (0x20) // #4
>  #define GDT_TSS (0x28) // #5: hardware task segment
>  #define GDT_TSS_DBF (0x30) // #6: tss for dbf handler
> +#define GDT_TSS_PMI (0x32) // #7: tss for pmi handler
JFYI, the 0x32 seems wrong, as you're overwriting the dbf slot. You need to use some free slot, e.g. 0x60.
Adam
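The reason 0x32 collides with the dbf slot can be seen by decoding the selector: on x86, the low three bits of a segment selector hold the RPL and the table indicator, and only bits 15..3 index the descriptor table. The sketch below is illustrative only; `selector_fields`, `decode_selector` and `gdt_selector` are hypothetical names, not Fiasco code.

```c
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector. */
struct selector_fields {
    unsigned index;  /* bits 15..3: descriptor table index  */
    unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT             */
    unsigned rpl;    /* bits 1..0: requested privilege level */
};

/* Split a selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}

/* Selector for GDT entry `index` with RPL 0: each entry is 8 bytes. */
static uint16_t gdt_selector(unsigned index)
{
    return (uint16_t)(index << 3);
}
```

Decoding 0x32 gives index 6 with RPL 2, which is the same GDT entry as 0x30 (GDT_TSS_DBF), while 0x60 decodes to index 12. Slot #7 would correspond to `gdt_selector(7)`, i.e. 0x38, though the thread does not establish whether that entry is free in Fiasco's GDT.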
Well, I managed to get my code working. I accidentally set the OVF_PMI flag on every invocation of the NMI handler, which you apparently aren't supposed to do. I think that caused all the trouble. No more freezes so far!
l4-hackers@os.inf.tu-dresden.de