Hello,

I implemented a statistical profiler for Fiasco using performance monitoring counters (the NMI handler is attached to handle_slow_trap()). It works fairly well for "mathematical" workloads, but as soon as interrupt activity gets involved to a greater extent, the system freezes. For example, keeping a button pressed while sampling will freeze the machine within a few seconds. The NMI watchdog does not trigger during that freeze. Higher sampling rates also cause these freezes.

So imho there seems to be a problem when irq_interrupt() or Irq::hit() is interrupted by an NMI. What do you think? Any hints on how to fix this or on how to debug this will be greatly appreciated.

Thanks in advance. I will happily provide further information if needed.

-- Stefan Scheler
On Fri, 2007-08-03 at 09:53 +0200, Stefan Scheler wrote:
> Hello,
>
> I implemented a statistical profiler for fiasco using performance monitoring counters (nmi handler is attached to handle_slow_trap()). It works fairly well for "mathematical" workloads but as soon as interrupt activity gets involved to a greater extent, the system freezes. For example, keeping a button pressed while sampling will freeze the machine within a few seconds. The NMI watchdog does not trigger during that freeze. Higher sampling rates also cause these freezes.
>
> So imho there seems to be a problem when irq_interrupt() or Irq::hit() is interrupted by an NMI. What do you think? Any hints on how to fix this or on how to debug this will be greatly appreciated.
>
> Thanks in advance. I will happily provide further information if needed.
I don't think that there is a problem with an NMI in Fiasco's IRQ routines, except that there may be stack overruns. I'd suggest using a task gate for your NMI and running it on a completely different stack, because some parts of the code are extremely sensitive to NMIs; this is basically the sysenter path of the Fiasco kernel.

-- Alex
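For readers unfamiliar with task gates, here is a minimal sketch of how an IA-32 task-gate descriptor is laid out in the IDT. It is not Fiasco code: `make_task_gate` is a hypothetical helper, and the selector value used in the usage note is a placeholder. A task gate carries no handler address; it only names a TSS selector, and the CPU loads the handler's stack and instruction pointer from that TSS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Fiasco): builds the 64-bit IDT entry
// for an IA-32 task gate.
//   bits 31..16  TSS segment selector
//   bits 44..40  type, 00101b for a task gate
//   bits 46..45  descriptor privilege level (DPL)
//   bit  47      present
// All remaining bits are reserved and left at zero.
constexpr std::uint64_t make_task_gate(std::uint16_t tss_selector, unsigned dpl)
{
    return (static_cast<std::uint64_t>(tss_selector) << 16)
         | (static_cast<std::uint64_t>(0x5) << 40)
         | (static_cast<std::uint64_t>(dpl & 0x3) << 45)
         | (static_cast<std::uint64_t>(1) << 47);
}
```

With a placeholder TSS selector of 0x38 and DPL 0, `make_task_gate(0x38, 0)` yields 0x0000850000380000: the selector sits in the upper half of the low dword and the access byte 0x85 marks a present, kernel-only task gate.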
Hello,
> I'd suggest using a task gate for your NMI and running it on a completely different stack, because some parts of the code are extremely sensitive to NMIs; this is basically the sysenter path of the Fiasco kernel.
I tried that (see attached patch). Is that the correct way of implementing this? It works exactly once. When I reset the counters the second time I get "APIC error 00000000(00000000)" and the machine freezes.

-- Stefan Scheler

Index: src/kern/ia32/entry-ia32.S
===================================================================
--- src/kern/ia32/entry-ia32.S (revision 130)
+++ src/kern/ia32/entry-ia32.S (working copy)
@@ -99,6 +99,14 @@
         popl    %eax
         jmp     thread_handle_double_fault
 
+        .globl  entry_vec02_pmi
+entry_vec02_pmi:
+        cli
+        call    pmihandler
+        sti
+
+        iret
+
 
 /* PPro spurious interrupt bug:
  * See "Pentium Pro Processor Specification Update / January 1999"
@@ -345,3 +353,7 @@
         .global dbf_stack_top
 dbf_stack_top:
 
+        .bss
+        .space  4096
+        .global pmi_stack_top
+pmi_stack_top:
Index: src/kern/ia32/cpu-ia32.cpp
===================================================================
--- src/kern/ia32/cpu-ia32.cpp (revision 130)
+++ src/kern/ia32/cpu-ia32.cpp (working copy)
@@ -45,6 +45,7 @@
   static Gdt *gdt asm ("CPU_GDT");
   static Tss *tss asm ("CPU_TSS");
   static Tss *tss_dbf;
+  static Tss *tss_pmi;
 };
 
 IMPLEMENTATION[ia32]:
@@ -72,6 +73,7 @@
 Gdt *Cpu::gdt;
 Tss *Cpu::tss;
 Tss *Cpu::tss_dbf;
+Tss *Cpu::tss_pmi;
 
 extern "C" void entry_sys_fast_ipc (void);
 extern "C" void entry_sys_fast_ipc_c (void);
@@ -549,6 +551,33 @@
   tss_dbf->_io_bit_map_offset = 0x8000;
 }
 
+extern "C" void entry_vec02_pmi ();
+extern "C" Address pmi_stack_top;
+
+PUBLIC static FIASCO_INIT
+void
+Cpu::init_tss_pmi (Address tss_pmi_mem, Address kdir)
+{
+  tss_pmi = reinterpret_cast<Tss*>(tss_pmi_mem);
+
+  gdt->set_entry_byte (Gdt::gdt_tss_pmi/8, tss_pmi_mem, sizeof(Tss)-1,
+                       Gdt_entry::Access_kernel | Gdt_entry::Access_tss |
+                       Gdt_entry::Accessed, 0);
+
+  tss_pmi->_cs = Gdt::gdt_code_kernel;
+  tss_pmi->_ss = Gdt::gdt_data_kernel;
+  tss_pmi->_ds = Gdt::gdt_data_kernel;
+  tss_pmi->_es = Gdt::gdt_data_kernel;
+  tss_pmi->_fs = Gdt::gdt_data_kernel;
+  tss_pmi->_gs = Gdt::gdt_data_kernel;
+  tss_pmi->_eip = (Address)entry_vec02_pmi;
+  tss_pmi->_esp = (Address)&pmi_stack_top;
+  tss_pmi->_ldt = 0;
+  tss_pmi->_eflags = 0x00000082;
+  tss_pmi->_cr3 = kdir;
+  tss_pmi->_io_bit_map_offset = 0x8000;
+}
+
 PUBLIC static inline NEEDS["gdt.h"]
 void
 Cpu::set_gdt ()
Index: src/kern/ia32/kmem-ia32.cpp
===================================================================
--- src/kern/ia32/kmem-ia32.cpp (revision 130)
+++ src/kern/ia32/kmem-ia32.cpp (working copy)
@@ -372,6 +372,10 @@
   Cpu::init_tss_dbf (alloc_from_page(& cpu_page_vm, sizeof(Tss)),
                      virt_to_phys(kdir));
 
+  // allocate the task segment for the pmi handler
+  Cpu::init_tss_pmi (alloc_from_page(& cpu_page_vm, sizeof(Tss)),
+                     virt_to_phys(kdir));
+
   // Allocate the task segment as the last thing from cpu_page_vm
   // because with IO protection enabled the task segment includes the
   // rest of the page and the following IO bitmat (2 pages).
Index: src/kern/config_gdt.h
===================================================================
--- src/kern/config_gdt.h (revision 130)
+++ src/kern/config_gdt.h (working copy)
@@ -22,6 +22,7 @@
 #define GDT_DATA_USER (0x20) // #4
 #define GDT_TSS (0x28) // #5: hardware task segment
 #define GDT_TSS_DBF (0x30) // #6: tss for dbf handler
+#define GDT_TSS_PMI (0x32) // #7: tss for pmi handler
 
 #endif
Index: src/kern/shared/entry-ia32-ux.S
===================================================================
--- src/kern/shared/entry-ia32-ux.S (revision 130)
+++ src/kern/shared/entry-ia32-ux.S (working copy)
@@ -92,7 +92,11 @@
 GATE_ENTRY(0x01,entry_vec01_debug,ACC_PL_K | ACC_INTR_GATE)
 #endif
 /* XXX IA32 has to handle NMI occured exactly at entry_sys_fast_ipc */
+#ifndef CONFIG_PCSAMPLING
 EXCEP_USR(0x02,vec02_nmi)
+#else
+GATE_ENTRY(0x02, GDT_TSS_PMI, ACC_PL_U | ACC_TASK_GATE)
+#endif
 EXCEP_USR(0x03,vec03_breakpoint)
 EXCEP_USR(0x04,vec04_into)
 EXCEP_USR(0x05,vec05_bounds)
Index: src/kern/shared/gdt.cpp
===================================================================
--- src/kern/shared/gdt.cpp (revision 130)
+++ src/kern/shared/gdt.cpp (working copy)
@@ -30,6 +30,7 @@
   gdt_code_user = GDT_CODE_USER,
   gdt_data_user = GDT_DATA_USER,
   gdt_tss_dbf = GDT_TSS_DBF,
+  gdt_tss_pmi = GDT_TSS_PMI,
   gdt_utcb = GDT_UTCB,
   gdt_ldt = GDT_LDT,
   gdt_user_entry1 = GDT_USER_ENTRY1,
On Sat Aug 04, 2007 at 11:50:36 +0200, Stefan Scheler wrote:
> > I'd suggest to use a task gate for your NMI and run it on a completely different stack. Because there are parts of code that are extremely sensible to NMIs and this is basically the sysenter path of the Fiasco kernel.
>
> I tried that (see attached patch). Is that the correct way of implementing this? It works exactly once. When I reset the counters the second time I get "APIC error 00000000(00000000)" and the machine freezes.
>
> -- Stefan Scheler
>
> +++ src/kern/config_gdt.h (working copy)
> @@ -22,6 +22,7 @@
>  #define GDT_DATA_USER (0x20) // #4
>  #define GDT_TSS (0x28) // #5: hardware task segment
>  #define GDT_TSS_DBF (0x30) // #6: tss for dbf handler
> +#define GDT_TSS_PMI (0x32) // #7: tss for pmi handler
JFYI, the 0x32 seems wrong, as you're overwriting the dbf slot. You need to use some free slot, e.g. 0x60.

Adam
--
Adam                 adam@os.inf.tu-dresden.de
  Lackorzynski         http://os.inf.tu-dresden.de/~adam/
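The collision can be checked with a little selector arithmetic. The sketch below is illustrative only and not Fiasco code: `Selector`, `decode_selector` and `gdt_slot` are hypothetical names. It relies on the x86 selector layout (descriptor index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0) and on the `Gdt::gdt_tss_pmi/8` indexing used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of the x86 segment selector layout.
struct Selector {
    std::uint16_t index;   // descriptor table index, bits 15..3
    bool          in_ldt;  // table indicator, bit 2 (false = GDT)
    std::uint8_t  rpl;     // requested privilege level, bits 1..0
};

constexpr Selector decode_selector(std::uint16_t raw)
{
    return Selector{static_cast<std::uint16_t>(raw >> 3),
                    (raw & 0x4) != 0,
                    static_cast<std::uint8_t>(raw & 0x3)};
}

// Same integer division as the `Gdt::gdt_tss_pmi/8` expression in the patch:
// GDT entries are 8 bytes wide, so the low three selector bits are dropped.
constexpr unsigned gdt_slot(std::uint16_t raw)
{
    return raw / 8;
}

// 0x30 (dbf TSS) and 0x32 (pmi TSS in the patch) land in the same GDT slot.
static_assert(gdt_slot(0x30) == 6, "dbf TSS occupies slot 6");
static_assert(gdt_slot(0x32) == 6, "0x32 also maps to slot 6");
// The value suggested above, 0x60, maps to a different slot.
static_assert(gdt_slot(0x60) == 12, "0x60 maps to slot 12");
```

Read this way, 0x32 is not "slot #7" but slot 6 with RPL 2, so writing the PMI TSS descriptor there replaces the double-fault TSS descriptor. Any valid replacement has to be a multiple of 8 that no other descriptor uses.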
Well, I managed to get my code working. I accidentally set the OVF_PMI flag on every invocation of the NMI handler, which you apparently aren't supposed to do. I think that caused all the trouble. No more freezes so far!

-- Stefan Scheler
participants (3)
- Adam Lackorzynski
- Alexander Warg
- Stefan Scheler